Jitter Buffer

Jitter

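Jitter is the variation in packet transit delay caused by queuing, contention, and serialization effects on the network path.

In general, higher levels of jitter are more likely on slow or heavily congested links.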
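One common way to put a number on this variation is the running interarrival jitter estimate defined in RFC 3550, which smooths the change in transit time between consecutive packets with a gain of 1/16. The sketch below is a minimal illustration of that formula; the function name and the sample send and arrival times are made up for the example.

```python
def update_jitter(jitter, prev_transit, arrival, rtp_timestamp):
    """One step of the RFC 3550 interarrival jitter estimate.

    All values must be in the same unit (milliseconds here, for readability).
    Returns the new jitter estimate and the transit time of this packet.
    """
    transit = arrival - rtp_timestamp
    if prev_transit is None:
        # First packet: nothing to compare against yet.
        return jitter, transit
    d = abs(transit - prev_transit)
    jitter += (d - jitter) / 16.0
    return jitter, transit


# Hypothetical stream: packets sent every 20 ms, arriving with varying delay.
send_times = [0, 20, 40, 60, 80]
arrival_times = [50, 72, 88, 115, 130]

jitter, prev = 0.0, None
for sent, arrived in zip(send_times, arrival_times):
    jitter, prev = update_jitter(jitter, prev, arrived, sent)
print(round(jitter, 3))
```

If every packet had the same transit time, the estimate would stay at zero regardless of how large that transit time is: jitter measures variation, not delay itself.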

Audio Quality

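  • Continuity of speech

Sound is a continuously varying signal. If a short stretch of silence is suddenly inserted, it is heard as a pop or as noise.

When packets are lost, the receiver generates substitute audio, for example by muting or by repeating the last packet.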

  • Latency in conversation

A one-way latency of up to 200 ms is considered acceptable

Voice overlap becomes a concern when the one-way latency is more than 200 ms
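The two concealment strategies mentioned above for lost packets (mute, repeat last packet) can be sketched as follows. This is only an illustration: the frame size, the use of `None` to mark a lost frame, and the function name are assumptions, and real decoders usually apply more elaborate concealment.

```python
FRAME_SAMPLES = 160  # assumed: 20 ms of 8 kHz mono audio per frame


def conceal(frames, strategy="repeat"):
    """Fill gaps (None entries) in a list of decoded frames.

    "mute" substitutes silence. "repeat" replays the last good frame and
    falls back to silence if no frame has been received yet.
    """
    last_good = None
    out = []
    for frame in frames:
        if frame is not None:
            last_good = frame
            out.append(frame)
        elif strategy == "repeat" and last_good is not None:
            out.append(list(last_good))
        else:
            out.append([0] * FRAME_SAMPLES)
    return out


a = [1] * FRAME_SAMPLES
b = [2] * FRAME_SAMPLES
played = conceal([a, None, b], strategy="repeat")  # the gap replays frame a
```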

Jitter Types

  1. Type A – constant jitter. This is a roughly constant level of packet to packet delay variation.

  2. Type B – transient jitter. This is characterized by a substantial incremental delay that may be incurred by a single packet.

  3. Type C – short term delay variation. This is characterized by an increase in delay that persists for some number of packets, and may be accompanied by an increase in packet to packet delay variation. Type C jitter is commonly associated with congestion and route changes.
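To make the three types easier to picture, the sketch below generates a synthetic per-packet delay series for each one. All numbers (base delay, spike size, length of the congested stretch) are arbitrary values chosen for illustration, not measurements.

```python
import random


def simulate_delays(kind, n=50, base_ms=40.0, seed=0):
    """Return n synthetic one-way delays (ms) showing jitter type "A", "B" or "C"."""
    rng = random.Random(seed)
    delays = []
    for i in range(n):
        # Type A: a roughly constant level of packet-to-packet variation.
        d = base_ms + rng.uniform(-5.0, 5.0)
        if kind == "B" and i == 25:
            # Type B: a single packet incurs a large extra delay.
            d += 80.0
        if kind == "C" and 20 <= i < 35:
            # Type C: delay rises for a run of packets, with more variation.
            d += 30.0 + rng.uniform(0.0, 10.0)
        delays.append(d)
    return delays


type_a = simulate_delays("A")
type_b = simulate_delays("B")
type_c = simulate_delays("C")
```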

Jitter Buffer Overview

The network delivers RTP packets asynchronously, with variable delays.

To be able to play the audio stream with reasonable quality, the receiving endpoint needs to turn the variable delays into constant delays.

This can be done by using a jitter buffer.

For the receiver, the first priority in improving voice quality is to reduce packet loss and to keep the played-out audio as continuous as possible and in its original order.

Jitter is defined as a variation in the delay of received packets.

A jitter buffer introduces a small delay so that it can collect a number of packets, put them back in the proper order, and space them evenly before passing them on for decompression.

The (fixed) jitter buffer implementation is quite simple.

For example:

  • create a buffer to hold 100ms of audio (jitter buffer max size = 100ms)

  • place incoming audio frames to the buffer

  • start the playout when the buffer has at least 40ms data (delay = 40ms)
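The three steps above can be sketched as a small class. This is a minimal illustration rather than a production design: the 20 ms frame duration, the class and method names, and the drop-on-overflow policy are assumptions, and packet reordering is left out to keep the example short.

```python
from collections import deque

FRAME_MS = 20  # assumed frame duration; the text does not state one


class FixedJitterBuffer:
    def __init__(self, max_ms=100, start_ms=40, frame_ms=FRAME_MS):
        self.capacity = max_ms // frame_ms          # 100 ms -> 5 frames
        self.start_frames = start_ms // frame_ms    # 40 ms -> 2 frames
        self.frames = deque()
        self.playing = False

    def push(self, frame):
        """Store an incoming frame. Returns False if the buffer is full."""
        if len(self.frames) >= self.capacity:
            return False  # overflow: the frame is dropped
        self.frames.append(frame)
        if len(self.frames) >= self.start_frames:
            self.playing = True
        return True

    def pop(self):
        """Called once per frame interval by the playout side.

        Returns the next frame, or None while prefilling or on underrun.
        """
        if not self.playing or not self.frames:
            return None
        return self.frames.popleft()


jb = FixedJitterBuffer()
jb.push("f0")
first = jb.pop()    # None: only 20 ms buffered, playout has not started
jb.push("f1")
second = jb.pop()   # "f0": 40 ms buffered, playout begins
```

Once playout has started, this sketch keeps playing and returns `None` on underrun, leaving it to the caller to conceal the gap.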

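How long should the jitter buffer be?

  • If the latency/delay is set small, audio plays out promptly, but packet loss becomes more likely, which degrades audio quality.

  • If the latency/delay is set large, packet loss becomes less likely, but excessive delay gets in the way of conversation.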
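The trade-off can be made concrete with a small calculation. Assume each packet is scheduled for playout a fixed time after it was sent; any packet whose network delay exceeds that playout delay arrives too late and is effectively lost. The delay values below are made up for illustration, and the function name is hypothetical.

```python
def late_loss_rate(network_delays_ms, playout_delay_ms):
    """Fraction of packets that arrive after their scheduled playout time."""
    late = sum(1 for d in network_delays_ms if d > playout_delay_ms)
    return late / len(network_delays_ms)


# Made-up one-way network delays for ten packets, in milliseconds.
delays = [42, 45, 48, 51, 55, 60, 62, 70, 95, 130]

rates = {p: late_loss_rate(delays, p) for p in (50, 80, 150)}
for playout_ms, rate in rates.items():
    print(f"playout delay {playout_ms} ms -> {rate:.0%} late")
```

With these sample delays, a 50 ms playout delay loses 70% of the packets, 80 ms loses 20%, and 150 ms loses none, at the cost of added conversational latency.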

At the sending side, packets are sent in a continuous stream with the packets spaced evenly apart. Due to network congestion, improper queuing, or configuration errors, this steady stream can become lumpy, or the delay between each packet can vary instead of remaining constant.

When a router receives a Real-time Transport Protocol (RTP) audio stream for Voice over IP (VoIP), it must compensate for the jitter that is encountered. The mechanism that handles this function is the playout delay buffer. The playout delay buffer must buffer these packets and then play them out in a steady stream to the digital signal processors (DSPs) to be converted back to an analog audio stream. The playout delay buffer is also referred to as the jitter buffer.

If the jitter is so large that it causes packets to be received out of the range of this buffer, the out-of-range packets are discarded and dropouts are heard in the audio. For losses as small as one packet, the DSP interpolates what it thinks the audio should be and no problem is audible. When jitter exceeds what the DSP can do to make up for the missing packets, audio problems are heard.

Jitter Buffer Purpose

  • Absorb packet arrival variability to a decoder interface

    • Buffers sets of data

    • Compromises between buffering delay and concealment

  • Enable stream synchronization

    • Delays a stream to match the playout of another; lip sync is an example

  • Delay the first packet decoding

  • Re-order the arrived packets

  • Determine packet loss
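Reordering and loss detection both rely on the RTP sequence number, which is 16 bits wide and wraps around from 65535 to 0. The sketch below shows one way to order a batch of received sequence numbers and list the gaps. The function names are hypothetical, and the approach assumes all packets in the batch lie within half the sequence space of the first one.

```python
def seq_delta(a, b):
    """Signed distance from b to a for 16-bit sequence numbers (handles wrap)."""
    return ((a - b + 0x8000) & 0xFFFF) - 0x8000


def reorder_and_find_losses(seqs):
    """Return (sequence numbers in stream order, missing sequence numbers)."""
    if not seqs:
        return [], []
    first = seqs[0]
    # Sort by distance from the first packet so that wraparound is respected;
    # set() also discards duplicate packets.
    ordered = sorted(set(seqs), key=lambda s: seq_delta(s, first))
    missing = []
    for prev, cur in zip(ordered, ordered[1:]):
        gap = seq_delta(cur, prev)
        missing.extend((prev + i) & 0xFFFF for i in range(1, gap))
    return ordered, missing


# Packets arrive out of order across the wrap point; packet 1 never arrives.
ordered, missing = reorder_and_find_losses([65534, 0, 65535, 2])
```

In a live buffer a gap can only be declared a loss once the missing packet's playout time has passed, since it may still arrive late.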

Jitter Buffer Types

Fixed Jitter Buffer

  • Latency is fixed

  • Delta/jitter statistics are not needed, since the buffer never consults them

  • Easy to implement.

  • Voice quality suffers when jitter changes by a large amount.

Adaptive Jitter Buffer

Traits:

  • Latency changes dynamically, following real-time changes in jitter

  • Key point: time scaling (adjust playout time without affecting voice quality)

    • Scaling on voice packets

    • Scaling on non-voice packets

  • Not easy to implement.

  • With a good time-scaling algorithm, voice quality can remain largely unaffected by large changes in jitter.
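An adaptive buffer first needs a target playout delay that follows the observed jitter. The sketch below shows one simple way to derive it: exponentially weighted averages of the delay and of its deviation, with the target set to the mean plus a multiple of the deviation. The class name, the smoothing factor, and the multiplier of 4 are assumptions chosen for illustration. The time-scaling step itself (stretching or compressing audio to move toward the target) is not shown.

```python
class AdaptiveDelayEstimator:
    """Tracks a target playout delay from observed per-packet network delay."""

    def __init__(self, alpha=0.125, safety=4.0):
        self.alpha = alpha    # smoothing factor (assumed value)
        self.safety = safety  # deviations of headroom (assumed value)
        self.mean = None
        self.dev = 0.0

    def update(self, delay_ms):
        """Feed one delay sample and return the new target delay in ms."""
        if self.mean is None:
            self.mean = float(delay_ms)
        else:
            err = delay_ms - self.mean
            self.mean += self.alpha * err
            self.dev += self.alpha * (abs(err) - self.dev)
        return self.target_ms()

    def target_ms(self):
        if self.mean is None:
            return 0.0
        return self.mean + self.safety * self.dev


est = AdaptiveDelayEstimator()
for d in [40] * 20:                     # steady network
    calm = est.update(d)
for d in [40, 90, 45, 100, 50, 95]:     # jitter increases
    busy = est.update(d)
```

With a steady 40 ms delay the target settles at 40 ms; once the delay starts to swing, the deviation term grows and the target rises, which is the signal a time-scaling stage would act on.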