How Prometheus computes rate and irate internally

The main code lives in extrapolatedRate and funcRate in https://github.com/prometheus/prometheus/blob/master/promql/functions.go . funcRate looks like this:

func funcRate(vals []parser.Value, args parser.Expressions, enh *EvalNodeHelper) Vector {
  return extrapolatedRate(vals, args, enh, true, true)
}

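Right next to it are funcDelta and funcIncrease, which back PromQL's delta and increase. Both of them also call extrapolatedRate; the only difference is the last two boolean flags they pass in, which extrapolatedRate uses to branch its calculation. In other words, rate, delta, and increase share part of the same arithmetic.

The last two parameters of extrapolatedRate are declared as isCounter bool, isRate bool, and funcRate passes true for both, so rate is only meaningful on counter-type metrics.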
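As a quick summary of the flag combinations, here is a minimal sketch. The type rateFlags and the map flagTable are hypothetical names used only for illustration; the boolean values reflect what funcRate, funcIncrease, and funcDelta pass to extrapolatedRate in the version of functions.go discussed here.

```go
package main

import "fmt"

// rateFlags is a hypothetical helper type that holds the two booleans
// passed as the last arguments of extrapolatedRate.
type rateFlags struct {
	isCounter bool // apply counter-reset correction
	isRate    bool // divide the result by the range in seconds
}

// flagTable maps each PromQL function to the flags its Go wrapper passes.
var flagTable = map[string]rateFlags{
	"rate":     {isCounter: true, isRate: true},
	"increase": {isCounter: true, isRate: false},
	"delta":    {isCounter: false, isRate: false},
}

func main() {
	for _, name := range []string{"rate", "increase", "delta"} {
		f := flagTable[name]
		fmt.Printf("%-8s isCounter=%v isRate=%v\n", name, f.isCounter, f.isRate)
	}
}
```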

Selecting the data points

Start with this piece of code:

var (
  counterCorrection float64
  lastValue         float64
)
for _, sample := range samples.Points {
  if isCounter && sample.V < lastValue {
    counterCorrection += lastValue
  }
  lastValue = sample.V
}
resultValue := lastValue - samples.Points[0].V + counterCorrection

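counterCorrection is exactly what the name says: a correction value. A counter can reset, for example when the exporter restarts. Suppose there are 6 values within 60 seconds and a reset happens after the fourth one (a sequence consistent with the arithmetic below is 2 4 6 8 2 4).

The 2 that follows is smaller than lastValue 8, so counterCorrection = 8

The final resultValue = 4 + 8 - 2 . Resets are rare, of course. Without a reset, using the data 2 4 6 8 10 12 , the result is simply the last value minus the first value: resultValue = 12 - 2 + 0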
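The reset handling can be replayed on plain numbers. The sketch below is not the Prometheus code itself: rawIncrease is a hypothetical function that runs the same loop over a []float64 instead of samples.Points, and the reset sequence 2 4 6 8 2 4 is an assumed input chosen to match the arithmetic above.

```go
package main

import "fmt"

// rawIncrease replays the counter-correction loop on a plain slice.
// It ignores timestamps and extrapolation; it only reproduces resultValue.
func rawIncrease(values []float64, isCounter bool) float64 {
	if len(values) == 0 {
		return 0
	}
	var counterCorrection, lastValue float64
	for _, v := range values {
		// A drop in a counter is treated as a reset: remember the value
		// reached before the reset so it is not lost.
		if isCounter && v < lastValue {
			counterCorrection += lastValue
		}
		lastValue = v
	}
	return lastValue - values[0] + counterCorrection
}

func main() {
	fmt.Println(rawIncrease([]float64{2, 4, 6, 8, 2, 4}, true))   // 10 (4 - 2 + 8)
	fmt.Println(rawIncrease([]float64{2, 4, 6, 8, 10, 12}, true)) // 10 (12 - 2 + 0)
	fmt.Println(rawIncrease([]float64{2, 4, 6, 8, 2, 4}, false))  // 2  (no correction for non-counters)
}
```

With isCounter set to false, the drop is taken at face value, which is the behaviour wanted for gauges.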

The formula

The result is divided by the length of the range in seconds:

if isRate {
  resultValue = resultValue / ms.Range.Seconds()
}

Compare with irate

// Take the last data point.
lastSample := samples.Points[len(samples.Points)-1]
// Take the second-to-last data point.
previousSample := samples.Points[len(samples.Points)-2]

var resultValue float64
if isRate && lastSample.V < previousSample.V {
  // Counter reset: use the last value as-is.
  resultValue = lastSample.V
} else {
  // Last value minus second-to-last value.
  resultValue = lastSample.V - previousSample.V
}
// Time interval between the last two points (in milliseconds).
sampledInterval := lastSample.T - previousSample.T
if sampledInterval == 0 {
  // Avoid dividing by 0.
  return out
}

if isRate {
  // Convert to seconds, then divide the result by that number of seconds.
  resultValue /= float64(sampledInterval) / 1000
}
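To make the difference from rate concrete, here is a minimal standalone sketch of the last-two-samples logic. The point type and the instantRate function are hypothetical stand-ins, not Prometheus types; timestamps are assumed to be in milliseconds as in the snippet above, and the sample data is made up.

```go
package main

import "fmt"

// point is a hypothetical stand-in for a sample:
// T is a timestamp in milliseconds, V is the value.
type point struct {
	T int64
	V float64
}

// instantRate computes a per-second rate from the last two points only.
// ok is false when there are fewer than two points or they share a timestamp.
func instantRate(points []point) (rate float64, ok bool) {
	if len(points) < 2 {
		return 0, false
	}
	last := points[len(points)-1]
	prev := points[len(points)-2]

	var result float64
	if last.V < prev.V {
		// Counter reset between the two points: use the last value as-is.
		result = last.V
	} else {
		result = last.V - prev.V
	}

	interval := last.T - prev.T
	if interval == 0 {
		return 0, false // avoid dividing by zero
	}
	return result / (float64(interval) / 1000), true
}

func main() {
	pts := []point{{0, 2}, {10000, 4}, {20000, 6}, {30000, 8}, {40000, 2}, {50000, 4}}
	r, ok := instantRate(pts)
	fmt.Println(r, ok) // 0.2 true: only the last two points (2 -> 4 over 10 s) matter
}
```

Everything before the last two points, including the earlier reset, has no effect on the result, which is why irate reacts faster but is noisier than rate.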

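Conclusion

The official documentation and the gitbooks in circulation render rate in Chinese as 增长率 ("growth rate"), which is wrong; it should be understood as the average per-second increase in value. To check this in practice, query both node_time_seconds[1m] and rate(node_time_seconds[1m]) , then compute the value by hand and see whether it matches the result of rate.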

$ node_time_seconds[1m]
node_time_seconds{instance="exporter:9100",job="node-resources"}
1596077182.3093214 @1596077182.307 // first point
1596077192.3132203 @1596077192.307
1596077202.311446 @1596077202.307
1596077212.309673 @1596077212.307
1596077222.316771 @1596077222.307
1596077232.3151288 @1596077232.307 // last point
node_time_seconds{instance="10.0.23.29:9100",job="node-resources"}
1596077178.6314309 @1596077178.633  // first point
1596077188.6312084 @1596077188.633
1596077198.633293 @1596077198.634
1596077208.6332283 @1596077208.634
1596077218.6320524 @1596077218.633
1596077228.635078 @1596077228.633  // last point

$ rate(node_time_seconds[1m])
{instance="exporter:9100",job="node-resources"}    1.0001161479949952
{instance="10.0.23.29:9100",job="node-resources"}  1.0000729417800902

Start with the instance 10.0.23.29 :

(1596077228.635078 - 1596077178.6314309) /
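    (1596077228.633 - 1596077178.633) // the web UI shows timestamps in seconds; in Go they are in milliseconds (three more digits), hence the /1000 in the code to convert to seconds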
            50.003647089       /  50    =  1.00007294178

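This was computed with an online calculator found via Google (a bit more precise than the Windows calc). Because the values are float64, some precision is lost, but the result matches. Now the other instance: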

(1596077232.3151288 - 1596077182.3093214) /
    (1596077232.307 - 1596077182.307)
            50.0058073997     /    50 = 1.00011614799
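The same hand calculation can be repeated in Go, which uses the same float64 arithmetic as Prometheus. This is only a sketch of the manual check above: perSecond is a hypothetical helper, the values are copied from the query output, and the timestamps are written as millisecond integers (an assumption matching the /1000 conversion in the code).

```go
package main

import "fmt"

// perSecond divides the value difference between the first and last sample
// by the time between them; timestamps are in milliseconds.
func perSecond(firstV, lastV float64, firstT, lastT int64) float64 {
	return (lastV - firstV) / (float64(lastT-firstT) / 1000)
}

func main() {
	// instance="10.0.23.29:9100"
	a := perSecond(1596077178.6314309, 1596077228.635078, 1596077178633, 1596077228633)
	// instance="exporter:9100"
	b := perSecond(1596077182.3093214, 1596077232.3151288, 1596077182307, 1596077232307)

	// Compare with the values returned by rate(node_time_seconds[1m]).
	fmt.Printf("%.11f (rate reported 1.0000729417800902)\n", a)
	fmt.Printf("%.11f (rate reported 1.0001161479949952)\n", b)
}
```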

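increase is the last point minus the first point, without the division by seconds. (In the code, extrapolatedRate also extrapolates that difference toward the edges of the range window before the isRate division, which is why dividing by the 50-second span between the first and last sample above reproduces a rate taken over a 60-second window.) So, as long as the counter has not reset, the following two are equal: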

increase(node_time_seconds[1m]) / 60 == rate(node_time_seconds[1m])
