TCP RTO计算方法和思考以及go实现验证

概述

最近在研究tcp的重传机制，tcp的重传大概有三种，超时重传(rto)、快速重传(fack)、早期重传(er)。今天讲讲rto，并用go实现其算法，探究一下。主要参考tcp/ip-guid

基本概念

RTO即超时重传时间
RTT数据包往返时间
平均偏差是指单项测定值与平均值的偏差（取绝对值）之和，除以测定次数。

RTO计算算法

RTO的计算依赖于RTT值，或者说一系列RTT值。rto=f(rtt)
在Linux中，最开始实现的是一个比较简单的经典算法RFC793，后来1988年提出了新的算法计算rto值，文档为RFC6298. 下面对比一下两种算法。

经典算法

rfc原文

    An Example Retransmission Timeout Procedure

      Measure the elapsed time between sending a data octet with a
      particular sequence number and receiving an acknowledgment that
      covers that sequence number (segments sent do not have to match
      segments received).  This measured elapsed time is the Round Trip
      Time (RTT).  Next compute a Smoothed Round Trip Time (SRTT) as:

        SRTT = ( ALPHA * SRTT ) + ((1-ALPHA) * RTT)

      and based on this, compute the retransmission timeout (RTO) as:

        RTO = min[UBOUND,max[LBOUND,(BETA*SRTT)]]

      where UBOUND is an upper bound on the timeout (e.g., 1 minute),
      LBOUND is a lower bound on the timeout (e.g., 1 second), ALPHA is
      a smoothing factor (e.g., .8 to .9), and BETA is a delay variance
      factor (e.g., 1.3 to 2.0).

其中说明了计算rto的计算公式：

SRTT = ( ALPHA * SRTT ) + ((1-ALPHA) * RTT)  
RTO = min[UBOUND,max[LBOUND,(BETA*SRTT)]]

先计算SRTT(Smoothed Round Trip Time)平滑RTT时间，然后再比较设定的最大最小RTO来计算RTO值来限定最大最小RTO值。其中 ALPHA是一个平滑因子，取值为0.8或者0.9，BETA也是一个因子，建议是1.3-2.0。这个算法很简单我们用go来实现一下，ALPHA取0.8，BETA取1.6。

代码实现

package main

import (
	"fmt"
)

const (
	ALPHA = 0.8
	BETA  = 1.6
)

const (
	LBOUND = 100
	UBOUND = 60000
)

type TCP struct {
	srtt int32
	rto  uint32
}

func main() {
	tcp := &TCP{}
	lastRTO := uint32(0)
	rtts := []int32{
		100, 103, 105, 102, 100, 103, 105, 100, 103, 105, 102, 100, 102, 105, 100, 103, 105, 104, 100, 103, 105,
		200, 203, 205, 202, 200, 203, 205, 200, 203, 205, 202, 200, 202, 205, 200, 203, 205, 204, 200, 203, 205,
		400, 403, 405, 404,
		100, 103, 105, 102, 100, 103, 105, 100, 103, 105, 102, 100, 102, 105, 100, 103, 105, 104, 100, 103, 105,
		// 400, 403, 405, 400, 403, 405, 404, 400, 404, 405, 400, 403, 405, 404, 400, 403, 405,
	}
	for i, rtt := range rtts {
		if lastRTO == 0 {
			lastRTO = uint32(rtt)
		}
		rto := tcp.updateRTT(rtt)
		if uint32(rtt) > lastRTO {
			fmt.Printf("warn: loss packet, index: %d\n", i-1)
		}
		fmt.Printf("index: %d, rtt: %d, rto: %d\n", i, rtt, rto)
		lastRTO = rto
	}
}

func (t *TCP) updateRTT(rtt int32) uint32 {
	var rto uint32
	if t.srtt == 0 {
		t.srtt = rtt
		rto = uint32(rtt)
	} else {
		t.srtt = int32(ALPHA*float32(t.srtt) + ((1 - ALPHA) * float32(rtt)))
		rto = min(UBOUND, max(LBOUND, uint32(BETA*float32(t.srtt))))
	}
	return rto
}

func min(a, b uint32) uint32 {
	if a <= b {
		return a
	}
	return b
}

func max(a, b uint32) uint32 {
	if a >= b {
		return a
	}
	return b
}

运行结果

index: 0, rtt: 100, rto: 100
warn: loss packet, index: 0
index: 1, rtt: 103, rto: 160
index: 2, rtt: 105, rto: 161
index: 3, rtt: 102, rto: 161
index: 4, rtt: 100, rto: 160
index: 5, rtt: 103, rto: 160
index: 6, rtt: 105, rto: 161
index: 7, rtt: 100, rto: 160
index: 8, rtt: 103, rto: 160
index: 9, rtt: 105, rto: 161
index: 10, rtt: 102, rto: 161
index: 11, rtt: 100, rto: 160
index: 12, rtt: 102, rto: 160
index: 13, rtt: 105, rto: 161
index: 14, rtt: 100, rto: 160
index: 15, rtt: 103, rto: 160
index: 16, rtt: 105, rto: 161
index: 17, rtt: 104, rto: 161
index: 18, rtt: 100, rto: 160
index: 19, rtt: 103, rto: 160
index: 20, rtt: 105, rto: 161
warn: loss packet, index: 20
index: 21, rtt: 200, rto: 192
warn: loss packet, index: 21
index: 22, rtt: 203, rto: 217
index: 23, rtt: 205, rto: 238
index: 24, rtt: 202, rto: 254
index: 25, rtt: 200, rto: 267
index: 26, rtt: 203, rto: 278
index: 27, rtt: 205, rto: 288
index: 28, rtt: 200, rto: 294
index: 29, rtt: 203, rto: 299
index: 30, rtt: 205, rto: 304
index: 31, rtt: 202, rto: 307
index: 32, rtt: 200, rto: 308
index: 33, rtt: 202, rto: 310
index: 34, rtt: 205, rto: 313
index: 35, rtt: 200, rto: 313
index: 36, rtt: 203, rto: 315
index: 37, rtt: 205, rto: 316
index: 38, rtt: 204, rto: 318
index: 39, rtt: 200, rto: 318
index: 40, rtt: 203, rto: 318
index: 41, rtt: 205, rto: 320
warn: loss packet, index: 41
index: 42, rtt: 400, rto: 384
warn: loss packet, index: 42
index: 43, rtt: 403, rto: 435
index: 44, rtt: 405, rto: 476
index: 45, rtt: 404, rto: 510
index: 46, rtt: 100, rto: 440
index: 47, rtt: 103, rto: 384
index: 48, rtt: 105, rto: 340
index: 49, rtt: 102, rto: 304
index: 50, rtt: 100, rto: 275
index: 51, rtt: 103, rto: 252
index: 52, rtt: 105, rto: 235
index: 53, rtt: 100, rto: 219
index: 54, rtt: 103, rto: 208
index: 55, rtt: 105, rto: 200
index: 56, rtt: 102, rto: 192
index: 57, rtt: 100, rto: 185
index: 58, rtt: 102, rto: 180
index: 59, rtt: 105, rto: 177
index: 60, rtt: 100, rto: 172
index: 61, rtt: 103, rto: 171
index: 62, rtt: 105, rto: 169
index: 63, rtt: 104, rto: 168
index: 64, rtt: 100, rto: 166
index: 65, rtt: 103, rto: 164
index: 66, rtt: 105, rto: 164

从结果中可以看出，这边丢失了5个数据包，假设这里没有真实的丢包，而仅仅是网络抖动，那么可以知道这个算法抗网络的抖动性很低。

Jacobaon/Karels 算法

这是一种比经典算法更新的算法，当然表现也更好，等会亲手试试，看看结果就知道了。

rfc原文

The Basic Algorithm

   To compute the current RTO, a TCP sender maintains two state
   variables, SRTT (smoothed round-trip time) and RTTVAR (round-trip
   time variation).  In addition, we assume a clock granularity of G
   seconds.

   The rules governing the computation of SRTT, RTTVAR, and RTO are as
   follows:

   (2.1) Until a round-trip time (RTT) measurement has been made for a
         segment sent between the sender and receiver, the sender SHOULD
         set RTO <- 1 second, though the "backing off" on repeated
         retransmission discussed in (5.5) still applies.

         Note that the previous version of this document used an initial
         RTO of 3 seconds [PA00].  A TCP implementation MAY still use
         this value (or any other value > 1 second).  This change in the
         lower bound on the initial RTO is discussed in further detail
         in Appendix A.

   (2.2) When the first RTT measurement R is made, the host MUST set

            SRTT <- R
            RTTVAR <- R/2
            RTO <- SRTT + max (G, K*RTTVAR)

         where K = 4.

   (2.3) When a subsequent RTT measurement R' is made, a host MUST set

            RTTVAR <- (1 - beta) * RTTVAR + beta * |SRTT - R'|
            SRTT <- (1 - alpha) * SRTT + alpha * R'

         The value of SRTT used in the update to RTTVAR is its value
         before updating SRTT itself using the second assignment.  That
         is, updating RTTVAR and SRTT MUST be computed in the above
         order.

         The above SHOULD be computed using alpha=1/8 and beta=1/4 (as
         suggested in [JK88]).

         After the computation, a host MUST update
         RTO <- SRTT + max (G, K*RTTVAR)

   (2.4) Whenever RTO is computed, if it is less than 1 second, then the
         RTO SHOULD be rounded up to 1 second.

         Traditionally, TCP implementations use coarse grain clocks to
         measure the RTT and trigger the RTO, which imposes a large
         minimum value on the RTO.  Research suggests that a large
         minimum RTO is needed to keep TCP conservative and avoid
         spurious retransmissions [AP99].  Therefore, this specification
         requires a large minimum RTO as a conservative approach, while

         at the same time acknowledging that at some future point,
         research may show that a smaller minimum RTO is acceptable or
         superior.

   (2.5) A maximum value MAY be placed on RTO provided it is at least 60
         seconds.

rto的计算公式：

第一次rtt计算： 
SRTT = R
RTTVAR = R/2
RTO = SRTT + max (G, K*RTTVAR)
之后：
RTTVAR = (1 - beta) * RTTVAR + beta * |SRTT - R'|
SRTT = (1 - alpha) * SRTT + alpha * R'
RTO = SRTT + max (G, K*RTTVAR)

SRTT(smoothed round-trip time)平滑RTT时间
RTTVAR(round-trip time variation)RTT变量，其实就是rtt平均偏差
G表示系统时钟的粒度，一般很小，us级别。 beta = ¹⁄₄, alpha = ¹⁄₈

代码实现

package main

import (
	"fmt"
)

const (
	LBOUND = 100
	UBOUND = 60000
)

type TCP struct {
	srtt, rttvar int32
	rto          uint32
}

func main() {
	tcp := &TCP{}
	lastRTO := uint32(0)
	rtts := []int32{
		100, 103, 105, 102, 100, 103, 105, 100, 103, 105, 102, 100, 102, 105, 100, 103, 105, 104, 100, 103, 105,
		200, 203, 205, 202, 200, 203, 205, 200, 203, 205, 202, 200, 202, 205, 200, 203, 205, 204, 200, 203, 205,
		400, 403, 405, 404,
		100, 103, 105, 102, 100, 103, 105, 100, 103, 105, 102, 100, 102, 105, 100, 103, 105, 104, 100, 103, 105,
		// 400, 403, 405, 400, 403, 405, 404, 400, 404, 405, 400, 403, 405, 404, 400, 403, 405,
	}
	for i, rtt := range rtts {
		if lastRTO == 0 {
			lastRTO = uint32(rtt)
		}
		rto := tcp.updateRTT(rtt)
		if uint32(rtt) > lastRTO {
			fmt.Printf("warn: loss packet, index: %d\n", i-1)
		}
		fmt.Printf("index: %d, rtt: %d, rto: %d\n", i, rtt, rto)
		lastRTO = rto
	}
}

func (t *TCP) updateRTT(rtt int32) uint32 {
	var rto uint32
	if t.srtt == 0 {
		t.srtt = rtt
		t.rttvar = rtt >> 1
	} else {
		delta := rtt - t.srtt
		if delta < 0 {
			delta = -delta
		}
		t.rttvar = (3*t.srtt + delta) >> 2
		t.srtt = (7*t.srtt + rtt) >> 3
	}
	rto = uint32(t.srtt + 4*t.rttvar)
	rto = min(UBOUND, max(LBOUND, rto))
	return rto
}

func min(a, b uint32) uint32 {
	if a <= b {
		return a
	}
	return b
}

func max(a, b uint32) uint32 {
	if a >= b {
		return a
	}
	return b
}

运行结果

index: 0, rtt: 100, rto: 300
index: 1, rtt: 103, rto: 400
index: 2, rtt: 105, rto: 404
index: 3, rtt: 102, rto: 400
index: 4, rtt: 100, rto: 400
index: 5, rtt: 103, rto: 400
index: 6, rtt: 105, rto: 404
index: 7, rtt: 100, rto: 400
index: 8, rtt: 103, rto: 400
index: 9, rtt: 105, rto: 404
index: 10, rtt: 102, rto: 400
index: 11, rtt: 100, rto: 400
index: 12, rtt: 102, rto: 400
index: 13, rtt: 105, rto: 404
index: 14, rtt: 100, rto: 400
index: 15, rtt: 103, rto: 400
index: 16, rtt: 105, rto: 404
index: 17, rtt: 104, rto: 404
index: 18, rtt: 100, rto: 400
index: 19, rtt: 103, rto: 400
index: 20, rtt: 105, rto: 404
index: 21, rtt: 200, rto: 512
index: 22, rtt: 203, rto: 547
index: 23, rtt: 205, rto: 581
index: 24, rtt: 202, rto: 609
index: 25, rtt: 200, rto: 628
index: 26, rtt: 203, rto: 650
index: 27, rtt: 205, rto: 672
index: 28, rtt: 200, rto: 685
index: 29, rtt: 203, rto: 701
index: 30, rtt: 205, rto: 713
index: 31, rtt: 202, rto: 724
index: 32, rtt: 200, rto: 731
index: 33, rtt: 202, rto: 741
index: 34, rtt: 205, rto: 748
index: 35, rtt: 200, rto: 754
index: 36, rtt: 203, rto: 760
index: 37, rtt: 205, rto: 770
index: 38, rtt: 204, rto: 775
index: 39, rtt: 200, rto: 772
index: 40, rtt: 203, rto: 777
index: 41, rtt: 205, rto: 782
index: 42, rtt: 400, rto: 1007
index: 43, rtt: 403, rto: 1082
index: 44, rtt: 405, rto: 1150
index: 45, rtt: 404, rto: 1207
index: 46, rtt: 100, rto: 1272
index: 47, rtt: 103, rto: 1156
index: 48, rtt: 105, rto: 1055
index: 49, rtt: 102, rto: 976
index: 50, rtt: 100, rto: 907
index: 51, rtt: 103, rto: 840
index: 52, rtt: 105, rto: 782
index: 53, rtt: 100, rto: 741
index: 54, rtt: 103, rto: 693
index: 55, rtt: 105, rto: 651
index: 56, rtt: 102, rto: 625
index: 57, rtt: 100, rto: 599
index: 58, rtt: 102, rto: 566
index: 59, rtt: 105, rto: 538
index: 60, rtt: 100, rto: 526
index: 61, rtt: 103, rto: 503
index: 62, rtt: 105, rto: 485
index: 63, rtt: 104, rto: 479
index: 64, rtt: 100, rto: 473
index: 65, rtt: 103, rto: 459
index: 66, rtt: 105, rto: 446

即使和经典算法一样的rtt样本，但却没有出现任何一个丢包，证明其抗网络抖动性是比较强的，但这里也有个问题，就是算出来的rto值比真实rtt值大很多，如果网络中出现真实丢包，那重传将会慢很多，导致速度很慢，我想在Linux中应该不是完全按照这个算法来计算rto值的，而且上面rfc文档说当rto小于1s时，应该让rto等于1s，这个应该也说不过去啊，rto等于1s，会不会有点大了，这里也没有说结合快速重传等，那只靠rto超时重传速度会慢死。

RTO重传间隔是指数增加的

上面我们介绍的是初次重传时的RTO，如果重传后还没收到另一端的响应，下一次重传RTO则会指数增加，例如第一次重传RTO是1，之后分别2，4，8，16…。这种行为叫做指数回避策略，所以对于tcp来说，当丢包率高时，有可能一个包要很久才能穿过去。

TCP RTO计算方法以及go实现验证

TCP RTO计算方法和思考以及go实现验证

概述

基本概念

RTO计算算法

经典算法

rfc原文

代码实现

运行结果

Jacobaon/Karels 算法

rfc原文

代码实现

运行结果

RTO重传间隔是指数增加的

END