Rate-Distortion Optimized Packet Scheduling for Video Streaming

The most significant recent advance in streaming technology is the emergence of rate-distortion optimized (RaDiO) streaming techniques that take into account packet importance and knowledge about the channel in a Lagrangian rate-distortion cost function J=D+λR. Our vision is that rate-distortion optimized packet scheduling will become the intellectual core of the young field of media streaming.

We developed an advanced framework for rate-distortion optimized video streaming which, for the first time, incorporates accurate non-additive distortion modeling for lost video packets, incorporates dependent packet delay and loss statistics, and employs unconventional methods for acknowledging and requesting video packets. We assessed the performance of these techniques for several video streaming architectures, applications, and coding techniques, such as streaming with diversity, proxy-driven streaming, and multiple description coding, which could not be previously combined with rate-distortion optimized streaming due to its initial limitations. Furthermore, as part of this research we have also shown that contrary to conventional wisdom layered scalable video representations are superior to multiple description coding for streaming with packet path diversity, when packet schedules are optimally coordinated. Finally, in another related avenue of this research we have studied low-complexity variants of our advanced framework which can be considered as the first practical implementation of rate-distortion optimized streaming in the Internet today. These are in turn based on Distortion Chains, which is a novel framework for distortion modeling in lossy video communication that we have designed in part for this purpose.

Distortion Modeling for Rate-Distortion Preambles
Any rate-distortion optimized packet scheduler depends critically on an accurate prediction of the video quality that results at the client, if some of the packets arrive late and others are lost or omitted. If the importance of some packets is overestimated, while the importance of others is underestimated, the overall video quality suffers, since the scheduler optimizes the transmission policy based on flawed assumptions.

In general, it is not feasible to measure and tabulate the distortion values for all combinations of packet losses, as the size of this table would grow exponentially with the number of packets. Therefore, a model is needed to extrapolate distortion values for general loss patterns from a limited number of measurements. We developed a new and flexible model, based on a directed acyclic graph (DAG) that, in addition to capturing packet dependencies as in prior works on rate-distortion optimized streaming, also captures possible distortion interactions. An example of a distortion interaction is the independent encoding of the same video frame at two different rates, sent in two independent packets. Each received packet can be independently decoded, but, if both packets arrive, the packet containing the coarser representation of the frame is discarded. Another example is the temporal propagation of previous concealment errors through an I-frame, if the corresponding I-frame packets are lost.

Figure 1. Proposed distortion model comprising packet nodes (boxes) and media unit nodes (circles). State information representing received packet (solid lines) or distortion (dashed lines) is propagated to generate the distortion values D_i associated with each media unit.

In our DAG model, there are two types of nodes (Figure 1). The boxes correspond to packets. The circles correspond to video frames (or, more generally, media units) and capture distortion interactions. Each edge in the graph originating from a node propagates a state variable. The state variable either indicates which of the packet associated with the node and its ancestors have been decoded (solid lines), or it represents a distortion value associated with the media unit represented by the node (dashed lines). For example, it can be seen from Figure 1 that Media Unit 1 has four independent encodings and that Media Unit 2 has two dependent layered encodings, where the second layer cannot be decoded without the first layer. Based on the propagated state information on edges directed towards a media unit node, the node computes the distortion D_i associated with media unit i.
Modeling Dependent Packet Delays and Losses
Another critical component of a rate-distortion optimized packet scheduler is an accurate model of packet loss and delay. The scheduler must determine the probability of each packet loss pattern. Losses can be due, e.g., to packet dropping in routers, to adverse wireless access link conditions, or to arrivals after the deadline.

Rate-distortion optimized packet schedulers to date have treated loss and delay of successive packets as independent events. However, in the Internet, packets usually arrive in the same order as they were transmitted and end-to-end delays of successive packets are strongly correlated. Losses often appear in bursts that are correlated to increases in delay. In our advanced framework, we model the forward and the backward channel on a network path using a K-state Hidden Markov Model (HMM), whose state transitions are described by a state transition probability matrix. In addition, each state of the model is characterized with random packet loss or delay. That is, given that the model is in a certain state, a packet is either lost with probability epsilon, or if it is not lost, it is delayed by a random amount. For the purpose of illustration we show in Figure 2 an HMM for K=2 with the corresponding delay distribution for each state, with mass epsilon at infinite delay to signify a loss.

Figure 2. A two-state HMM for the forward and the backward channel on a network path.
Computing the Optimal Transmission Policy π*
Suppose there are L data units in the multimedia session. Let π_l be the transmission policy for data unit l ∈ {1,...,L} and let π = {π₁,...,π_L} be the vector of transmission policies for all L data units. In our framework, a transmission policy π is a schedule according to which an agent in the streaming process performs transmission actions in regard to a data unit. The specific form of π will depend on the actual streaming scenario under consideration. In particular, let t₀,t₁,...,t_N-1 denote a finite set of N transmission opportunities at which transmission actions for a data unit can be executed. Then, in the case of sender-driven streaming, π will correspond to the transmission actions a₀,a₁,...,a_N-1 according to which the server will send (a_j = 1) or not send (a_j = 0) a packet with the data unit at every transmission slot t_j, for j = 0,1,...,N-1. In the case of receiver-driven streaming, π will denote the schedule according to which the receiving agent, i.e., the client, will send request packets to the server requesting transmission of the data unit. Finally, in the case of hybrid receiver/sender-driven streaming, as studied below, π will correspond to the transmission actions of the intervening proxy-server which simultaneously resends cached data packets to the client and requests transmission of new data packets from the server. It should be mentioned that within our optimization framework we also study the advanced scenarios of streaming with diversity and streaming with rich acknowledgements or rich requests, as described in the following. There, the notion of transmission policy is yet different from the cases described above since the execution of transmission actions in each of these scenarios will need to take into consideration their specific characteristics.

Now, for a given scenario, any given policy vector π induces an expected distortion D(π) and an expected number of transmitted bytes R(π) for the multimedia presentation. For a given policy π these quantities can be computed using the source and channel models described above. Then, we seek the policy vector that minimizes D(π) subject to a constraint on R(π). This can be achieved by minimizing the Lagrangian J(π) = D(π) + λ R(π) for some Lagrange multiplier λ > 0, thus achieving a point on the lower convex hull of the set of all achievable distortion-rate pairs. We minimize the objective funtion J(π) using an iterative descent algorithm, one variable at a time, until convergence. Its core step involves finding a point on the lower convex hull of a set of points in the so-called "error-cost" plane, { ( ρ(π), ε(π) ) : π ∈ Π }, where Π is a family of transmission policies corresponding to the scenario at hand, ε(π) is the expected error or the probability that a data unit does not arrive at the client on time under policy π, and ρ(π) is the expected cost or the expected number of transmitted bytes per source byte under policy π. The point of interest corresponds to the policy π* minimizing ε(π) + λ ρ(π), for a given Lagrange multiplier λ > 0. Computing π* is typically done via a dynamic programming algorithm on the Markov decision tree associated with the policies from the family Π. We have derived instances of Markov decision trees and the associated optimization algorithms for finding π* for each of the streaming scenarios considered next. Figure 3 below illustrates a decision tree associated with the Markov process of transmitting a data unit in sender-driven streaming.

Figure 3. Markov decision tree for sender-driven streaming: Observations o_i correspond to prospective acknowledgements received at server by t_i+1 due to prior transmissions of the data unit. t_DTS is the delivery deadline by which the data unit needs to be received by the client. The sequence of action-observation pairs (a₀,o₀) ο (a₁,o₁) ο · · · ο (a_i,o_i) leading up to time t_i+1, determines the state q_i+1 at time t_i+1.
Proxy-driven Streaming from the Network Edge
In this project, we propose a framework for streaming of packetized media over a lossy packet network through an intermediate proxy server to a client, in a rate-distortion optimized way. The proxy, located at the edge of the backbone network (see Figure 4), coordinates the communication between the media server and the client using a hybrid receiver/sender-driven streaming in a rate-distortion optimization framework. Note that proxy-driven streaming is the most complicated case, as it combines sender- and receiver-driven scheduling. Possible actions are requests of packets from the server and retransmissions of proxy-buffered packets to the client. Observations are acknowledgments received from the client or requested packets arriving from the server. Proxy-driven streaming improves the end-to-end performance when compared to a sender- or receiver-driven RaDiO streaming system and in addition relieves the backbone network from the traffic load created by retransmissions of media packets lost in the last hop to the client. The proposed framework enables the proxy to determine at every instant which packets, if any, it should either request from the media server or retransmit directly to the client, in order to meet a constraint on the average transmission rate while minimizing the average end-to-end distortion.

Figure 4. Hybrid receiver/sender-driven streaming from a proxy server located at the network edge.
Diversity in Rate-distortion Optimized Streaming of Packetized Media
We consider diversity for media streaming in a rate-distortion optimization framework. In a sender-driven transmission, diversity is achieved by using multiple transmission paths over the network (see Figure 5a). In a receiver-driven transmission, diversity is achieved by requesting media packets from multiple servers (see Figure 5b). In the first scenario, the proposed framework enables the sender to decide at every instant which packets, if any, to transmit and over which transmission paths in order to meet a rate constraint while minimizing the end-to-end distortion. In the second scenario, the framework enables the receiver to decide at every instant which packets, if any, to request for transmission and from which servers again in order to meet a rate constraint while minimizing the end-to-end distortion. Note that by determining the best path(s) to send each packet, the packet scheduler also becomes a rate-distortion optimized packet router. Experimental results demonstrate the benefit of exploiting packet diversity in rate-distortion optimized streaming of packetized media.

Figure 5. Diversity in packet switched networks. (a) Path diversity. (b) Server diversity.
Streaming with Rich Acknowledgements and Rich Requests
We consider unconventional procedures for acknowledging reception of media packets in sender-driven streaming and for requesting transmission of media packets in receiver-driven streaming. The advantage of using them over conventional approaches is twofold: first, the end-to-end distortion-rate performance improves, and second, they consume less bandwidth on the backward channel, from the client to the server, which can be extremely useful for two classes of networks: networks with asymmetric links where the relative cost of sending a packet on the backward channel is higher and wireless sensor networks that have stringent power constraints.

For sender-driven transmission, instead of separately acknowledging each media packet as it arrives, we periodically send to the server a single acknowledgement packet, denoted henceforth rich acknowledgement, that contains information about all media packets that have arrived at the client by the time the rich acknowledgment is sent. This information in essence reflects the current state of the client's buffer, i.e., which packets have been received by the client thus far. The proposed scenario of streaming with rich acknowledgements is illustrated in Figure 6.

Figure 6. Streaming on demand using rich acknowledgements.

Similarly, for receiver-driven streaming, instead of separately requesting individual media packets, the client periodically sends to the server a single request packet, denoted henceforth rich request, that simultaneously requests transmission of different media packets. In addition, rather than sending via the rich request a single transmission decision (send or not send) for all media packets under consideration, the client sends complete transmission schedules over a window of time for every media packet. The proposed scenario of streaming with rich requests is illustrated in Figure 7.

Figure 7. Streaming on demand using rich requests.