Improving TCP Congestion Control with Machine Intelligence

  • [Authors: Y. Kong et al.]
  • [Keywords: TCP, Congestion Control, Reinforcement Learning]

Motivation

Previous congestion control methods (NewReno, Vegas[1], Cubic[2], Compound[3]) are:

  • Mechanism-driven rather than objective-driven
  • Limited to pre-defined reactions to specific feedback signals
  • Unable to learn from or adapt to experience

Related Work

RemyCC [4]

  • Delay-throughput tradeoff as the objective function
  • Offline training generates lookup tables (see the sketch below)
  • Inflexible to changes in the network and traffic models
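
For intuition, here is the lookup-table idea in a few lines of Python. This is a sketch only: the congestion signals, quantization step, and multiplier-style action are illustrative stand-ins, not RemyCC's actual rule format.

    def quantize(ack_ewma, send_ewma, rtt_ratio, step=0.01):
        """Bucket continuous congestion signals so they can index a finite table."""
        return (round(ack_ewma / step), round(send_ewma / step),
                round(rtt_ratio, 1))

    # Filled offline by searching, under an assumed network and traffic model,
    # for the action that maximizes a delay-throughput objective.
    rule_table = {}  # quantized state -> cwnd multiplier (hypothetical format)

    def on_ack(state, cwnd):
        # At run time the sender is just a table lookup; unseen states
        # fall back to leaving cwnd unchanged.
        action = rule_table.get(quantize(*state), 1.0)
        return cwnd * action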

Q-TCP [5]

  • Based on Q-learning
  • Designed mostly with a single flow in mind
  • Assumes sufficient buffering at the bottleneck

Contribution

Loss-prediction-based TCP (LP-TCP)

Teaches TCP to adjust its cwnd so as to minimize packet loss events during congestion avoidance.

LP-TCP

  • When a new ACK is received, cwnd += 1/cwnd
  • Before sending a packet (the full send/hold loop is sketched after this list)
    • Sensing engine updates the feature vector
      • cwnd
      • ewma of ACK intervals
      • ewma of sending intervals
      • minimum of ACK intervals
      • minimum of sending intervals
      • minimum of RTT
      • time series (TS) of ACK intervals
      • TS of sending intervals
      • TS of RTT ratios
    • Loss predictor outputs loss probability p
      • Training data gathered from NewReno simulations in NS2
      • A random forest classifier trained offline
      • LP re-trained upon network changes
    • If p < threshold, the actuator sends the packet
    • Otherwise, the packet is not sent, and cwnd -= 1
  • The threshold is set to the value that maximizes the objective
    • M_e = log(throughput) - 0.1 log(delay)
    • where delay = RTT - RTT_min
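
Putting the steps above together, here is a minimal Python sketch of the LP-TCP sender. It assumes scikit-learn's RandomForestClassifier as the offline-trained loss predictor; the class and method names and the feature bookkeeping are mine, not the authors' implementation.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def train_loss_predictor(X, y):
        """Offline step: X holds feature vectors from NewReno simulations
        in NS2; y marks each sent packet as delivered (0) or lost (1)."""
        lp = RandomForestClassifier(n_estimators=100)
        lp.fit(X, y)
        return lp

    class LPTCPSender:
        """Sketch of LP-TCP's congestion-avoidance logic; the sensing
        engine that maintains each feature is elided."""

        def __init__(self, loss_predictor, threshold):
            self.lp = loss_predictor
            # Chosen offline to maximize M_e = log(throughput) - 0.1 log(delay).
            self.threshold = threshold
            self.cwnd = 1.0

        def on_new_ack(self):
            # Additive increase: cwnd += 1/cwnd per new ACK.
            self.cwnd += 1.0 / self.cwnd

        def before_send(self, features):
            """features: the vector above (cwnd, EWMAs, minima, time series).
            Returns True if the actuator should send the packet now."""
            p_loss = self.lp.predict_proba(np.asarray([features]))[0][1]
            if p_loss < self.threshold:
                return True
            self.cwnd = max(1.0, self.cwnd - 1)  # hold the packet, back off
            return False

The threshold itself would be picked offline, e.g. by sweeping candidate values in simulation and keeping the one with the highest M_e.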

Reinforcement-learning-based TCP (RL-TCP)

Learns to adjust cwnd to maximize a utility function.

RL - State, Action, Reward

  • State s_n
    • ewma of the ACK inter-arrival time
    • ewma of packet inter-sending time
    • RTT ratio
    • slow start threshold
    • current cwnd size
  • Action a_n
    • cwnd += a_n, where a_n ∈ {-1, 0, +1, +3} (see the sketch below)
  • Reward R_{n+1}
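
Read concretely, the state and action above could look like the following sketch (the field and function names are mine):

    from dataclasses import dataclass

    ACTIONS = (-1, 0, +1, +3)  # candidate cwnd increments a_n

    @dataclass
    class RLState:
        ack_ewma: float    # ewma of the ACK inter-arrival time
        send_ewma: float   # ewma of packet inter-sending time
        rtt_ratio: float   # e.g. current RTT over the minimum RTT
        ssthresh: float    # slow start threshold
        cwnd: float        # current cwnd size

    def apply_action(state: RLState, a_index: int) -> RLState:
        # cwnd += a_n, never letting the window drop below one segment.
        state.cwnd = max(1.0, state.cwnd + ACTIONS[a_index])
        return state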

RL - Learning

  • Q-learning
    • Q-function updated every RTT
    • On-policy update rule (SARSA)
    • Temporal credit assignment of the reward

RL - Action Selection

  • ε-greedy exploration and exploitation (see the sketch below)
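
A minimal tabular sketch of the learning loop under stated assumptions: SARSA (on-policy) updates applied once per RTT and ε-greedy action selection. The hyperparameters and state discretization are illustrative, and the paper's temporal credit assignment of the reward is omitted for brevity.

    import random
    from collections import defaultdict

    class SarsaAgent:
        """Tabular SARSA with epsilon-greedy exploration.

        States must be hashable, e.g. a tuple of discretized features.
        """

        def __init__(self, n_actions=4, alpha=0.1, gamma=0.9, epsilon=0.1):
            self.Q = defaultdict(lambda: [0.0] * n_actions)  # Q[s][a]
            self.n_actions = n_actions
            self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

        def select_action(self, state):
            # Epsilon-greedy: explore with probability epsilon, else greedy.
            if random.random() < self.epsilon:
                return random.randrange(self.n_actions)
            q = self.Q[state]
            return q.index(max(q))

        def update(self, s, a, reward, s_next, a_next):
            # On-policy target: bootstraps from the action actually taken next.
            target = reward + self.gamma * self.Q[s_next][a_next]
            self.Q[s][a] += self.alpha * (target - self.Q[s][a])

Once per RTT the sender would observe R_{n+1}, choose a_{n+1} with select_action, call update(s_n, a_n, R_{n+1}, s_{n+1}, a_{n+1}), and apply cwnd += ACTIONS[a_{n+1}] as in the earlier sketch.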

Evaluation

LP-TCP

LP-TCP predicts all packet losses during congestion avoidance and keeps the cwnd at the network ceiling.

Evaluation - Single Sender, Varying Buffer Size

  • LP-TCP achieves the best objective value M_e when the buffer size L = 5
  • RL-TCP achieves the best M_e when L = 50 and L = 150
  • RL-TCP's performance is less sensitive to the buffer size than the other schemes'

Evaluation - Multiple Senders

  • Buffer size L = 50
  • Homogeneous case: 4 senders running the same scheme
  • Mixed case: 3 NewReno senders and 1 LP-TCP or RL-TCP sender

Future Work

  • Explore policy-based RL-TCP
  • Improve fairness for learning-based TCP congestion control schemes

References

[1] L. S. Brakmo and L. L. Peterson, "TCP Vegas: End to end congestion avoidance on a global Internet," IEEE JSAC, 1995.
[2] S. Ha, I. Rhee, and L. Xu, "CUBIC: A new TCP-friendly high-speed TCP variant," ACM SIGOPS Operating Systems Review, 2008.
[3] K. Tan, J. Song, Q. Zhang, and M. Sridharan, "A Compound TCP approach for high-speed and long distance networks," IEEE INFOCOM, 2006.
[4] K. Winstein and H. Balakrishnan, "TCP ex Machina: Computer-generated congestion control," ACM SIGCOMM, 2013.
[5] W. Li, F. Zhou, K. R. Chowdhury, and W. Meleis, "QTCP: Adaptive congestion control with reinforcement learning," IEEE Transactions on Network Science and Engineering, 2018.