ARQ Storms: What are they, and how to avoid them!

Ciro A. Noronha, Ph.D., RIST Forum President; CTO, Cobalt Digital

ARQ (Automatic Repeat reQuest) is a packet recovery technique in which the receiver, on determining that a packet has been lost, sends a retransmission request for that specific packet. The technique was invented in the 1960s for packet radio and is in every networking textbook. It is sometimes also called “Selective Retransmission”. Pretty much all modern UDP-based media transmission protocols use ARQ.
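As a minimal sketch of the receiver side, the snippet below finds the gaps in a run of received sequence numbers; each gap is a packet the receiver would ask for again. The function name `find_missing` is hypothetical, and the sketch assumes plain increasing integers, ignoring the sequence-number wraparound and reordering that a real protocol has to handle.

```python
def find_missing(received_seqs):
    """Return the sequence numbers missing between the lowest and highest seen.

    Assumption: sequence numbers are plain integers that never wrap around.
    """
    seen = set(received_seqs)
    if not seen:
        return []
    return [s for s in range(min(seen), max(seen) + 1) if s not in seen]


# Packets 4 and 5 never arrived, so the receiver would request exactly those two.
missing = find_missing([1, 2, 3, 6, 7])
print(missing)
```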

Understanding ARQ’s limitations

ARQ is a good technique for recovering packets lost during transport of broadcast video, but this is a situation where too much of a good thing may be bad. Let’s say that a block of packets is lost. The receiver asks for them, and so the sender retransmits this block of packets while still transmitting the original content. If the path bandwidth is limited and the sender bursts the retransmission block, this burst can create additional packet loss. The receiver then asks for more retransmissions, the sender sends them alongside the original stream, and still more packets are lost. If left unchecked, this situation can go downhill fast and take down the link, which becomes so overwhelmed by loss that nothing gets through. This situation, where retransmissions of lost packets cause further congestion and additional packet loss, is known as an ARQ storm.

An ARQ storm is what I experienced last year at VidTrans. I had a couple of RIST streams, along with one SRT stream that I had no control over. After a block loss, the SRT stream saturated the link. SRT can make matters even worse because, depending on the version, its default behavior is to re-request a retransmission after RTT/2, which guarantees that there are more retransmissions than necessary.
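The feedback loop can be illustrated with a deliberately crude toy model. Everything in it is an assumption made for illustration: time advances in ticks, the link carries a fixed number of packets per tick and drops the excess, the sender bursts every requested retransmission in the next tick, and each dropped packet triggers `requests_per_loss` retransmissions (a value of 2 loosely mimics a receiver that re-requests before the first retransmission could have arrived). It is not a model of RIST, SRT, or any real link.

```python
def simulate(capacity, stream_rate, initial_loss, requests_per_loss=2, ticks=6):
    """Return the number of packets dropped in each tick (toy model)."""
    # Retransmissions the sender will burst in the next tick.
    pending = initial_loss * requests_per_loss
    dropped_per_tick = []
    for _ in range(ticks):
        offered = stream_rate + pending        # live stream plus unthrottled burst
        dropped = max(0, offered - capacity)   # the link drops whatever does not fit
        dropped_per_tick.append(dropped)
        pending = dropped * requests_per_loss  # every drop is requested again
    return dropped_per_tick


# Link fits 100 packets per tick, the stream uses 80, and 30 packets are lost at once.
print(simulate(100, 80, 30, requests_per_loss=2))  # [40, 60, 100, 180, 340, 660]
print(simulate(100, 80, 30, requests_per_loss=1))  # [10, 0, 0, 0, 0, 0]
```

In this sketch the loss grows every tick once redundant requests push the offered load past what the spare capacity can absorb, while a single request per lost packet lets the backlog drain. A real link loses packets at much finer time scales than one tick, so a burst alone can do more damage than this model suggests.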

Preventing ARQ storms

The solution to the ARQ storm is relatively straightforward and is listed in the RIST Simple Profile Specification: throttle the retransmissions. The storm can be avoided by enforcing a configurable bandwidth limit on retransmitted packets. Although this may mean that you can’t recover all the lost packets, that matters less than keeping the link up and functional. With the RIST protocol, users can be even more sophisticated and decide which packets to retransmit first.
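One common way to enforce a bandwidth limit of this kind is a token bucket. The sketch below is an illustration only and is not taken from the RIST specification: the class name `RetransmitThrottle` and the parameters `limit_bps` and `burst_bytes` are hypothetical, and the caller passes in the current time so the behavior is easy to follow.

```python
class RetransmitThrottle:
    """Token bucket that caps retransmission bandwidth (hypothetical helper)."""

    def __init__(self, limit_bps, burst_bytes):
        self.rate = limit_bps / 8.0       # refill rate in bytes per second
        self.burst = float(burst_bytes)   # largest burst the bucket can hold
        self.tokens = float(burst_bytes)  # start with a full bucket
        self.last = 0.0                   # time of the previous call, in seconds

    def allow(self, packet_bytes, now):
        """Return True if a retransmission of this size may be sent at time `now`."""
        elapsed = now - self.last
        self.last = now
        # Refill for the time that has passed, never beyond the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        # Over budget: skip this retransmission so the live stream keeps flowing.
        return False


# 80 kbit/s retransmission budget, with room for a 2000-byte burst.
throttle = RetransmitThrottle(limit_bps=80_000, burst_bytes=2_000)
print(throttle.allow(1500, now=0.0))  # True: the bucket starts full
print(throttle.allow(1500, now=0.0))  # False: only 500 bytes of budget remain
print(throttle.allow(1500, now=0.5))  # True: the bucket has refilled
```

A retransmission that is refused here is simply not sent. Whether to drop it or queue it, and in which order to serve queued requests, is a policy choice left to the implementer.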

The RIST Activity Group (AG) in the Video Services Forum is continually working to develop existing and new RIST specifications to include additional features, functions and capabilities. The members of the RIST AG have years of experience designing and deploying ARQ-based links (hundreds of years, if you add them all), and will freely share their expertise.

The group is currently developing best-practice guidance for ARQ-based systems. These guidelines will be freely available on the Video Services Forum website when published.

To find out more about the work of the RIST Activity Group or to discuss the best way to deploy ARQ-based links for broadcast video, get in touch.

Helen Weedon