Rate-Distortion
Optimized Packet Scheduling for Video Streaming
The most significant recent advance in streaming technology is
the emergence of rate-distortion optimized (RaDiO) streaming techniques that
take into account packet importance and knowledge about the channel in a
Lagrangian rate-distortion cost function J=D+λR. Our vision is that
rate-distortion optimized packet scheduling will become the intellectual core of
the young field of media streaming.
We developed an advanced framework for rate-distortion
optimized video streaming which, for the first time, incorporates accurate
non-additive distortion
modeling for lost video packets, incorporates dependent packet delay and loss
statistics, and employs unconventional methods for acknowledging and requesting
video packets. We assessed the performance of these techniques for several video
streaming architectures, applications, and coding techniques, such as streaming
with diversity, proxy-driven streaming, and multiple description coding, which
could not be previously combined with rate-distortion optimized streaming due to
its initial limitations. Furthermore, as part of this research we have also
shown that contrary to conventional wisdom layered scalable video
representations are superior to multiple description coding for streaming with
packet path diversity, when packet schedules are optimally coordinated. Finally,
in another related avenue of this research we have studied low-complexity variants
of our advanced framework which can be considered as the first practical
implementation of rate-distortion optimized streaming in the Internet today.
These are in turn based on Distortion Chains,
which is a novel framework for distortion modeling in lossy video communication
that we have designed in part for this purpose.
- Distortion Modeling for
Rate-Distortion Preambles
Any rate-distortion optimized packet scheduler depends
critically on an accurate prediction of the video quality that results at the
client, if some of the packets arrive late and others are lost or omitted. If
the importance of some packets is overestimated, while the importance of
others is underestimated, the overall video quality suffers, since the
scheduler optimizes the transmission policy based on flawed assumptions.
In general, it is not feasible to measure and tabulate the
distortion values for all combinations of packet losses, as the size of this
table would grow exponentially with the number of packets. Therefore, a model
is needed to extrapolate distortion values for general loss patterns from a
limited number of measurements. We developed a new and flexible model, based
on a directed acyclic graph (DAG) that, in addition to capturing packet
dependencies as in prior works on rate-distortion optimized streaming, also
captures possible distortion interactions. An example of a distortion
interaction is the independent encoding of the same video frame at two
different rates, sent in two independent packets. Each received packet can be
independently decoded, but, if both packets arrive, the packet containing the
coarser representation of the frame is discarded. Another example is the
temporal propagation of previous concealment errors through an I-frame, if the
corresponding I-frame packets are lost.
Figure 1. Proposed distortion model comprising packet
nodes (boxes) and media unit nodes (circles). State information representing
received packet (solid lines) or distortion (dashed lines) is propagated to
generate the distortion values Di associated with each
media unit.
In our DAG model, there are two types of nodes (Figure 1).
The boxes correspond to packets. The circles correspond to video frames (or,
more generally, media units) and capture distortion interactions. Each edge in
the graph originating from a node propagates a state variable. The state
variable either indicates which of the packet associated with the node and its
ancestors have been decoded (solid lines), or it represents a distortion value
associated with the media unit represented by the node (dashed lines). For
example, it can be seen from Figure 1 that Media Unit 1 has four independent
encodings and that Media Unit 2 has two dependent layered encodings, where the
second layer cannot be decoded without the first layer. Based on the
propagated state information on edges directed towards a media unit node, the
node computes the distortion Di associated with media unit
i.
- Modeling Dependent
Packet Delays and Losses
Another critical component of a rate-distortion optimized
packet scheduler is an accurate model of packet loss and delay. The scheduler
must determine the probability of each packet loss pattern. Losses can be due,
e.g., to packet dropping in routers, to adverse wireless access link
conditions, or to arrivals after the deadline.
Rate-distortion optimized packet schedulers to date have
treated loss and delay of successive packets as independent events. However,
in the Internet, packets usually arrive in the same order as they were
transmitted and end-to-end delays of successive packets are strongly
correlated. Losses often appear in bursts that are correlated to increases in
delay. In our advanced framework, we model the forward and the backward
channel on a network path using a K-state Hidden Markov Model (HMM),
whose state transitions are described by a state transition probability
matrix. In addition, each state of the model is characterized with random
packet loss or delay. That is, given that the model is in a certain state, a
packet is either lost with probability epsilon, or if it is not lost, it is
delayed by a random amount. For the purpose of illustration we show in Figure
2 an HMM for K=2 with the corresponding delay distribution for each
state, with mass epsilon at infinite delay to signify a loss.
Figure 2. A two-state HMM for the forward and the
backward channel on a network path.
- Computing the Optimal
Transmission Policy π*
Suppose there are L data units in the multimedia session.
Let πl be the transmission policy for data unit l ∈
{1,...,L} and let π = {π1,...,πL}
be the vector of transmission policies for all L data units. In our framework,
a transmission policy π is a schedule according to which an
agent in the streaming process performs transmission actions in regard to a data unit.
The specific form of π will depend on the actual streaming scenario under consideration.
In particular, let t0,t1,...,tN-1
denote a finite set of N transmission opportunities at which transmission actions
for a data unit can be executed. Then, in the case of sender-driven streaming, π
will correspond to the transmission actions a0,a1,...,aN-1
according to which the server will send (aj = 1) or not send (aj = 0)
a packet with the data unit at every transmission slot tj, for j = 0,1,...,N-1.
In the case of receiver-driven streaming, π will denote the schedule according to which
the receiving agent, i.e., the client, will send request packets to the server requesting
transmission of the data unit. Finally, in the case of hybrid receiver/sender-driven streaming, as studied
below, π will correspond to the transmission actions of the intervening proxy-server
which simultaneously resends cached data packets to the client and requests transmission of new
data packets from the server. It should be mentioned that within our optimization framework
we also study the advanced scenarios of streaming with diversity and streaming with rich
acknowledgements or rich requests, as described in the following. There, the notion of
transmission policy is yet different from the cases described above since the execution
of transmission actions in each of these scenarios will need to take into consideration
their specific characteristics.
Now, for a given scenario, any given policy vector π induces an
expected distortion D(π) and an expected number of transmitted bytes
R(π) for the multimedia presentation. For a given policy π
these quantities can be computed using the source and channel models described above.
Then, we seek the policy vector that minimizes D(π) subject to a constraint
on R(π). This can be achieved by minimizing the Lagrangian J(π)
= D(π) + λ R(π) for some
Lagrange multiplier λ > 0, thus achieving a point on the lower convex hull of the
set of all achievable distortion-rate pairs. We minimize the objective funtion J(π)
using an iterative descent algorithm, one variable at a time, until convergence. Its core
step involves finding a point on the lower convex hull of a set of points in the
so-called "error-cost" plane, { ( ρ(π), ε(π) ) :
π ∈ Π }, where Π is a family of transmission policies
corresponding to the scenario at hand, ε(π) is the expected
error or the probability that a data unit does not arrive at the
client on time under policy π, and ρ(π) is the expected
cost or the expected number of transmitted bytes per source byte under policy π.
The point of interest corresponds to the policy π* minimizing ε(π)
+ λ ρ(π), for a given Lagrange multiplier λ > 0.
Computing π* is typically done via a dynamic programming algorithm on the Markov decision tree
associated with the policies from the family Π. We have derived instances of Markov decision
trees and the associated optimization algorithms for finding π* for each of the
streaming scenarios considered next. Figure 3 below illustrates a decision tree associated
with the Markov process of transmitting a data unit in sender-driven streaming.
Figure 3. Markov decision tree for sender-driven streaming: Observations
oi correspond to prospective acknowledgements received at server by ti+1
due to prior transmissions of the data unit. tDTS is the delivery deadline by which
the data unit needs to be received by the client. The sequence of action-observation pairs
(a0,o0) ο (a1,o1) ο
· · · ο (ai,oi) leading up to
time ti+1, determines the state qi+1 at time ti+1.
- Proxy-driven Streaming
from the Network Edge
In this project, we propose a framework for streaming of
packetized media over a lossy packet network through an intermediate proxy
server to a client, in a rate-distortion optimized way. The proxy, located at
the edge of the backbone network (see Figure 4), coordinates the communication
between the media server and the client using a hybrid receiver/sender-driven
streaming in a rate-distortion optimization framework. Note that proxy-driven
streaming is the most complicated case, as it combines sender- and
receiver-driven scheduling. Possible actions are requests of packets from the
server and retransmissions of proxy-buffered packets to the client.
Observations are acknowledgments received from the client or requested packets
arriving from the server. Proxy-driven streaming improves the end-to-end
performance when compared to a sender- or receiver-driven RaDiO streaming
system and in addition relieves the backbone network from the traffic load
created by retransmissions of media packets lost in the last hop to the
client. The proposed framework enables the proxy to determine at every instant
which packets, if any, it should either request from the media server or
retransmit directly to the client, in order to meet a constraint on the
average transmission rate while minimizing the average end-to-end
distortion.
Figure 4. Hybrid receiver/sender-driven streaming from a
proxy server located at the network edge.
- Diversity in
Rate-distortion Optimized Streaming of Packetized Media
We consider diversity for media streaming in a
rate-distortion optimization framework. In a sender-driven transmission,
diversity is achieved by using multiple transmission paths over the network
(see Figure 5a). In a receiver-driven transmission, diversity is achieved by
requesting media packets from multiple servers (see Figure 5b). In the first
scenario, the proposed framework enables the sender to decide at every instant
which packets, if any, to transmit and over which transmission paths in order
to meet a rate constraint while minimizing the end-to-end distortion. In the
second scenario, the framework enables the receiver to decide at every instant
which packets, if any, to request for transmission and from which servers
again in order to meet a rate constraint while minimizing the end-to-end
distortion. Note that by determining the best path(s) to send each packet, the
packet scheduler also becomes a rate-distortion optimized packet router.
Experimental results demonstrate the benefit of exploiting packet diversity in
rate-distortion optimized streaming of packetized media.
Figure 5. Diversity in packet switched
networks. (a) Path diversity. (b) Server diversity.
- Streaming with Rich
Acknowledgements and Rich Requests
We consider unconventional procedures for acknowledging
reception of media packets in sender-driven streaming and for requesting
transmission of media packets in receiver-driven streaming. The advantage of
using them over conventional approaches is twofold: first, the end-to-end
distortion-rate performance improves, and second, they consume less bandwidth
on the backward channel, from the client to the server, which can be extremely
useful for two classes of networks: networks with asymmetric links where the
relative cost of sending a packet on the backward channel is higher and
wireless sensor networks that have stringent power constraints.
For sender-driven transmission, instead of separately
acknowledging each media packet as it arrives, we periodically send to the
server a single acknowledgement packet, denoted henceforth rich
acknowledgement, that contains information about all media packets that
have arrived at the client by the time the rich acknowledgment is sent. This
information in essence reflects the current state of the client's buffer,
i.e., which packets have been received by the client thus far. The proposed
scenario of streaming with rich acknowledgements is illustrated in Figure
6.
Figure 6. Streaming on demand using rich
acknowledgements.
Similarly, for receiver-driven streaming, instead of
separately requesting individual media packets, the client periodically sends
to the server a single request packet, denoted henceforth rich
request, that simultaneously requests transmission of different media
packets. In addition, rather than sending via the rich request a single
transmission decision (send or not send) for all media packets under
consideration, the client sends complete transmission schedules over a window
of time for every media packet. The proposed scenario of streaming with rich
requests is illustrated in Figure 7.
Figure 7. Streaming on demand using rich
requests.
|