EP1032881A1 - Robust, reliable compression and packetization scheme for transmitting video

Robust, reliable compression and packetization scheme for transmitting video

Info

Publication number
EP1032881A1
EP1032881A1 EP98939098A EP98939098A EP1032881A1 EP 1032881 A1 EP1032881 A1 EP 1032881A1 EP 98939098 A EP98939098 A EP 98939098A EP 98939098 A EP98939098 A EP 98939098A EP 1032881 A1 EP1032881 A1 EP 1032881A1
Authority
EP
European Patent Office
Prior art keywords
coding
frame
video
packet
frames
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP98939098A
Other languages
English (en)
French (fr)
Other versions
EP1032881A4 (de
Inventor
Zhigang Chen
Roy H. Campbell
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Publication of EP1032881A1 publication Critical patent/EP1032881A1/de
Publication of EP1032881A4 publication Critical patent/EP1032881A4/de
Withdrawn legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/434 Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/513 Processing of motion vectors
    • H04N19/517 Processing of motion vectors by encoding
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • H04N19/63 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets
    • H04N19/64 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets characterised by ordering of coefficients or of bits for transmission
    • H04N19/647 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets characterised by ordering of coefficients or of bits for transmission using significance based coding, e.g. Embedded Zerotrees of Wavelets [EZW] or Set Partitioning in Hierarchical Trees [SPIHT]
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/236 Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
    • H04N19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N19/65 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using error resilience
    • H04N19/85 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/89 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving methods or arrangements for detection of transmission errors at the decoder

Definitions

  • The Internet and its most important application, the World Wide Web (WWW), have experienced exponential growth and gained widespread recognition during the past few years.
  • The Internet and the WWW show the promise of becoming a global platform for computing, communication and collaboration.
  • One reason for the phenomenal success of the Internet and the WWW is the successful integration of textual and graphical data and transmission of these static data types.
  • The value of real time media, like real time video and audio, on the Internet and WWW has been widely recognized [Adi93, Adi94, BLCGP92, KS92, CTCL95, wp96a, wp96b, Inc97].
  • Supporting dynamic real time media, such as real time video and audio, on the Internet enables new applications like real time visual communication, entertainment, and distance learning and training, while enhancing the capability of existing ones.
  • The present invention is directed to the problem of how to effectively encode and transmit video over the Internet.
  • Bit Rate and Bandwidth Gap: A large gap exists between the compressed video bit rate and Internet bandwidth. Even with sophisticated video compression, the bit rate of digital video is often too high for most Internet connections. For example, a compressed full frame rate (30 f/s), broadcast quality (720 x 480) video runs at a bit rate of 3-8 Mbps [BK95] using MPEG compression. A good Internet connection with a shared T1 line has a maximum bandwidth of 1.5 Mbps. Even with compromised video frame rate, quality and frame size, the bit rate is often too high for average and low bandwidth connections. For example, a 10 f/s, 320 x 240 video typically has a bit rate of 100 Kbps to 400 Kbps with MPEG compression. Currently, a home user with dial-up or ISDN service can get a typical bit rate in the range from 14 Kbps to 128 Kbps.
  • Video compression algorithms achieve compression by exploiting the similarities in the uncompressed video stream and removing redundancy. Similarities in video take two forms, spatial redundancy and temporal redundancy. Within one video frame, neighboring pixels tend to be similar in intensity and color values. Across video frames, frames tend to be similar because of the slow, continuous movement and change in the video sequence. As discussed later herein, spatial redundancy is often removed by transformation and variable length coding. When variable length encoding introduces state information in the bitstream, a bit error can cause the decoder to lose synchronization with the correct decoding state and consequently the decoding process may collapse. Temporal redundancy is removed by predictive coding, which codes only the difference between the current frame and its reference frame.
  • When a frame is to be coded, it uses a coded frame from the past and/or the future as the reference frame. Only the difference between the frame and its reference frame is coded. Difference coding is a major factor for achieving compression. For example, for a sequence of 10 frames of H.263 encoded video, the display order of frames is from left to right and from top to bottom. The first frame is coded as an independent frame that does not use any reference frame. Each of the subsequent 9 frames uses its immediately previous frame as a reference frame. The size for the independently coded frame may be, e.g., 1236 bytes; the average size for the difference coded 9 frames may be 258 bytes. Using difference coding can achieve a size reduction of about 70% for this sequence.
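The size reduction quoted for the 10 frame example can be checked with a few lines of arithmetic. This is a minimal sketch; it assumes (the text does not state this) that every frame would cost about as much as the independent frame, 1236 bytes, if no difference coding were used. The function name is illustrative only.

```python
def difference_coding_saving(i_frame_bytes, avg_diff_frame_bytes, n_frames):
    """Fraction of bytes saved by coding one independent frame plus
    (n_frames - 1) difference frames, compared with coding every frame
    independently (assumed to cost i_frame_bytes each)."""
    independent_total = i_frame_bytes * n_frames
    difference_total = i_frame_bytes + avg_diff_frame_bytes * (n_frames - 1)
    return 1 - difference_total / independent_total


# Numbers from the example: one 1236-byte frame, nine frames averaging 258 bytes.
saving = difference_coding_saving(1236, 258, 10)
print(f"{saving:.1%}")
```

Under that assumption the saving comes out slightly above 70%, in line with the figure given in the example.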
  • Although difference coding is essential in achieving efficient compression, it also introduces dependencies between frames, since a difference coded frame needs its reference frame for correct decoding. Loss of the reference frame will cause damage to the decoding of the difference encoded frame. Sometimes, since the reference frame is also difference coded, the dependencies among the frames form a chain, propagating damage. For example, if the sixth frame in the ten frame sequence discussed above is lost in transmission, the decoder uses frame 5 as a replacement, damaging the decoding of frame 6. Since frames 7-10 all depend on their immediate predecessors, the damage caused by the loss of frame 6 propagates to all these frames.
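The propagation of damage along a chain of difference coded frames can be sketched as follows. The function name and the `i_frames` parameter are illustrative assumptions; the model simply treats every non-independent frame as depending on its immediate predecessor, as in the ten frame example.

```python
def damaged_frames(n_frames, lost_frame, i_frames=(1,)):
    """Return the 1-based frame numbers that decode incorrectly when
    `lost_frame` is lost. Every frame not listed in `i_frames` is assumed
    to reference its immediate predecessor."""
    damaged = []
    broken = False
    for frame in range(1, n_frames + 1):
        if frame == lost_frame:
            broken = True       # the lost frame itself cannot be decoded
        elif frame in i_frames:
            broken = False      # an independent frame stops the propagation
        if broken:
            damaged.append(frame)
    return damaged


print(damaged_frames(10, 6))                   # loss of frame 6, single I frame
print(damaged_frames(10, 6, i_frames=(1, 8)))  # an extra I frame at position 8
```

With a single independent frame at the start, losing frame 6 damages frames 6 through 10; adding a hypothetical independent frame at position 8 limits the damage to frames 6 and 7.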
  • Low bit rate video coding relies on efficient compression, which is achieved by introducing dependencies between different parts of the encoded bitstream. When one part of the stream is lost, the parts which depend on it are damaged. Sometimes the damage can be propagated. Assuming an accurate similarity measurement and assessment, typically the more dependency a compression scheme introduces into the bitstream, the more efficient the compression scheme is and the lower bit rate it can generate. However, the more dependencies in the bitstream, the more damage results when part of the stream is lost. In other words, the aforementioned efficient compression is less robust because it is susceptible to packet loss. Thus, there is a conflict between low bit rate and robustness. For effective Internet video delivery, meeting either one of these requirements does not necessarily improve the overall performance.
  • Existing research on Internet video transmission is divided into two camps, with two different approaches to addressing the conflict between coding efficiency and coding robustness.
  • FEC: Forward Error Correction.
  • Bit error due to bit corruption during transmission is insignificant compared to packet loss [CS91, Ste90, Par93].
  • Although a bit error can cause loss of synchronization between the encoder and decoder because variable length coding is used, a bit error is usually corrected in the IP layer and is therefore hidden from the applications.
  • Another popular approach to dealing with error and loss is layered coding.
  • Video data is partitioned into important data, like a lower frequency band, and unimportant data, like a higher frequency enhancement band.
  • Different partitions are coded into different layers so that important layers can be sent over a channel that has better transmission behavior, such as low delay and a low loss rate.
  • The enhancing layer is sent through a channel that has fewer quality of service guarantees.
  • This approach is very suitable for networks where packets can be assigned different priorities and ensured different quality of service. In the Internet environment, however, quality of service guarantees do not exist and no distinctions are made between packet types. Improving the coding efficiency has been the focus for most traditional video compression research [BK95, MPFL97, G. 91, Uni90,
  • The present invention provides a practical solution to the above problems, based on a comprehensive Internet traffic behavior study and with the results used as guidelines for the design and implementation of low bit rate and robust video coding and transmission schemes.
  • The invention addresses problems in both coding and transmission.
  • For coding, a hybrid coding scheme is proposed to increase the robustness.
  • For transmission, an effective dependency isolation algorithm is designed to minimize the propagation of packet loss damage.
  • An Internet video coding and transmission scheme can thus be realized by properly balancing the bit rate and robustness through improved I frame coding and efficient frame packetization.
  • The invention is based on three major components: a comprehensive Internet video traffic experiment to study packet loss and delay behavior, a hybrid wavelet/H.263 coding scheme that improves robustness while keeping the bit rate low, and an efficient packetization scheme which minimizes packet loss damage.
  • The video traffic experiment is used to study the unreliable nature of the Internet. Delay and loss behavior are studied, and analysis of their impact is used to guide the design and implementation of the coding and transmission schemes.
  • The experiment is conducted in a wide area network involving a wide range of Internet connectivities.
  • The experimental results are analyzed and their implications for video coding and transmission are presented.
  • The unreliable nature of the Internet is well known [CS91, Ste90]. Packets can get delayed, duplicated or lost. Reliable transmission of textual data is achieved through retransmission provided by TCP.
  • The flow control and retransmission scheme in TCP is designed to work for all network situations with different levels of unreliability. Transmitted data is 100% correct despite the delay and loss rate. The requirement of 100% correctness but flexible delay tolerance makes it possible to have a conservative, universal error correction and handling scheme.
  • Internet delivery of video is much more flexible in terms of correctness, but delay is less tolerated. That video delivery can tolerate some loss opens up a set of error handling schemes, including retransmission, layered coding, and FEC. These schemes are applicable in different loss and delay situations.
  • Packet loss pattern: Is there a pattern of packet loss? Is the packet loss bursty?
  • Loss rate vs. packet size: Is there a relationship between packet loss and packet size?
  • The sender contacts the receiver and establishes the two connections.
  • The sender also sends the parameters for the current experiment to the receiver.
  • The setup phase is followed by the experiment phase, during which the actual experiment is conducted. Finally, when the experiment is done, the result is sent back from the receiver to the sender.
  • The host machine at DCL, UIUC, is designated as the sender machine.
  • The other three machines are designated as receiver machines.
  • Figure 1 shows this configuration.
  • The sender machine conducts experiments with one of the receiver machines at a time.
  • the experiments are done in groups at different times during work days. Each group consists of 20 to 32 individual experiments with different settings. They are carried out one by one with a 5 minute interval. The interval is designed to let the network settle down so that experiments will not affect each other.
  • Packet size specifies how large one packet is. This size does not count the extra header bytes that the IP and UDP layers add.
  • Sending interval specifies how often a packet is injected into the network. The combination of packet size and sending interval determines the bandwidth usage. Packet sizes are chosen from 500, 1024, 2048, 5120, and 8000 bytes to simulate MPEG B, P and I frame sizes. The sending interval simulates the periodic real time behavior of video. The sending intervals are chosen from 50, 100, 200, and 500 milliseconds to simulate 20, 10, 5 and 2 frames per second video sequences. Besides the data segment, each packet has a 4 byte header filled with a sequence number.
  • The sequence number is extracted by the receiver machine and is used to identify which packet is lost.
  • The arrival time at the server side is recorded. This information is sent back to the sender machine at the end of the experiment over a reliable TCP connection for analysis. Based on this information, the sender machine measures packet loss rate, inter-arrival delay, and bandwidth usage.
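The probe traffic described above can be sketched in a few helper functions. This is only an illustration, not the experiment's actual code: the byte order of the 4 byte sequence header, the zero-filled data segment, and the choice to count the header inside the packet size are all assumptions, and every function name is hypothetical.

```python
import struct

# 4-byte unsigned sequence number; network byte order is an assumption.
HEADER = struct.Struct("!I")


def make_probe(seq, packet_size):
    """Build one probe packet: sequence header followed by zero padding.
    Assumes packet_size counts the 4 header bytes."""
    return HEADER.pack(seq) + bytes(packet_size - HEADER.size)


def sequence_of(packet):
    """Extract the sequence number from a received probe packet."""
    return HEADER.unpack_from(packet)[0]


def lost_sequence_numbers(received_packets, n_sent):
    """Sequence numbers in 0..n_sent-1 that never arrived."""
    seen = {sequence_of(p) for p in received_packets}
    return [seq for seq in range(n_sent) if seq not in seen]


def bandwidth_bps(packet_size, interval_ms):
    """Bandwidth usage in bits per second, excluding IP and UDP headers."""
    return packet_size * 8 * 1000 / interval_ms


# Simulate five 500-byte probes of which number 3 is dropped.
arrived = [make_probe(s, 500) for s in range(5) if s != 3]
print(lost_sequence_numbers(arrived, 5))
print(bandwidth_bps(1024, 100))  # 1024-byte packets every 100 ms
```

For instance, 1024 byte packets sent every 100 milliseconds correspond to roughly 82 Kbps of payload, which falls within the dial-up and ISDN range quoted earlier.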
  • Loss Rate vs. Packet Size: Figure 2 plots the loss rate vs. packet size and sending interval for the first group of experiments at each site. The experiments show a clear correlation between packet size and packet loss rate: the loss rate increases with packet size.
  • The experimental video traffic packet is sent through UDP by the sender. When it reaches the IP layer, it is assembled into IP packets.
  • The IP layer can accommodate packets as large as 64 Kbytes [Ste90]. Therefore the IP layer does not perform any fragmentation, since our experiment packet size ranges from 500 to 8000 bytes.
  • The packet is broken down into smaller transmission units suitable for different networks. For example, the maximum transmission unit (MTU) on the Ethernet is 1500 bytes. Therefore, en route to the destination, a large IP packet will be decomposed into unit packets when entering a network.
  • MTU: maximum transmission unit.
  • The size of the unit packet is determined by the MTU of the specific network, and the smaller packets are reassembled back into the original IP packet when exiting the network. Given that the probability of a unit packet being lost is fixed, the more unit packets an IP packet needs to be decomposed into, the more likely it will be lost. Therefore, a large packet size results in a higher loss rate. Further examination of the results agrees with this analysis. Assuming that most of the networks on the Internet are Ethernet using an MTU of 1500 bytes, a video packet with a size of 8000 bytes will be broken down into about 6 packets. Let the probability of one packet getting lost be x; then the probability for at least one of the 6 packets to be lost is $1 - (1 - x)^6$.
  • The average loss rates for a packet size of 1024 bytes at the three sites are 3.5%, 12%, and 24%.
  • The expected loss rates are therefore 19%, 57%, and 80%, respectively.
  • The experiment shows actual average loss rates of 21%, 62% and 76%, which are very close to the expected loss rates.
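The fragmentation argument can be evaluated numerically. The sketch below assumes independent loss of each unit packet and ignores IP and UDP header bytes when counting fragments; the function names are illustrative.

```python
import math


def fragments(packet_bytes, mtu=1500):
    """Number of unit packets needed for one packet (headers ignored)."""
    return math.ceil(packet_bytes / mtu)


def expected_loss(unit_loss, n_fragments):
    """Probability that at least one of n_fragments unit packets is lost,
    assuming each is lost independently with probability unit_loss."""
    return 1 - (1 - unit_loss) ** n_fragments


n = fragments(8000)  # an 8000-byte packet over a 1500-byte MTU
for unit_loss in (0.035, 0.12, 0.24):
    print(f"x = {unit_loss:.3f} -> expected loss {expected_loss(unit_loss, n):.1%}")
```

With the rates as quoted, the formula reproduces the first and third expected values (about 19% and 81%). For the rounded 12% input it gives roughly 54% rather than 57%, which may simply reflect rounding of the measured rate.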
  • Packet Loss vs. Arrival Interval: In an ideal situation, in which packets are not lost and the network delay is the same for all packets, packets will arrive at the receiver side at a fixed interval that is equal to the sending interval. However, in a packet switched network like the Internet, different packets may experience different network delays. As congestion begins to build up, packets are likely to experience increased delay until a packet is lost. This prediction of congestion has been used in some transmission systems [BDS96]. Our experiment, however, shows no noticeable increase in delay before loss. Figures 3 to 5 plot the arrival interval for a packet size of 1024 bytes for the first group of experiments at each site. The majority of the losses happen without any increase in the arrival interval of the previous packets.
  • Switch delays or queuing delays are not the dominant delays. If propagation delays and processing delays are dominant, the queuing delay would not be reflected in the final arrival interval.
  • The first 30 packets arrived at the server side at the same time. This occasional phenomenon can be explained by looking at the packet route setup. Once the client and server established UDP connections, the probe packets were injected into the network. The first packet, when traversing the network, required the routers to perform destination lookup [CS91, Ste90, Par93, Tan89]. In the Internet routing implementation, the lookup result is cached for later packets, so the later packets can get through quickly and eventually catch up with the first packet. Therefore the initial series of packets arrives at the destination at the same time.
  • Figures 7A-7C show the loss pattern of the first group of experiments from the Maryland site. The number of consecutive losses is plotted. In all experiments except the one with a packet size of 8000 bytes and a sending interval of 50 milliseconds, the number of lost packets that occur in N consecutive losses is always greater than that in M consecutive losses, where N < M. The majority of the packet losses happen independently. Losses of two consecutive packets are few, and it is rare to lose more than 2 packets in a row. Packets are most likely lost in a random fashion rather than in a burst.
  • Packet loss is usually assumed to be bursty. This is because when an intermediate switch becomes congested, the congestion will last for a while before the situation returns to normal. Packets arriving during this period of time will be dropped. Therefore, packet drops can potentially happen in a burst.
  • In this experiment, the bursty effect is not so obvious. Packet loss behaves in a random fashion. Again, this may be caused by the many switches that are involved in this experiment.
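Whether losses are bursty can be judged from the received sequence numbers by counting runs of consecutive missing packets. This is a minimal sketch of such an analysis, not the code used in the experiment; the function name is hypothetical.

```python
from collections import Counter


def loss_run_lengths(received_seqs, n_sent):
    """Map each run length of consecutive lost packets to the number of
    such runs, given the sequence numbers (0..n_sent-1) that arrived."""
    received = set(received_seqs)
    runs = Counter()
    run = 0
    for seq in range(n_sent):
        if seq in received:
            if run:
                runs[run] += 1  # a run of losses just ended
            run = 0
        else:
            run += 1
    if run:
        runs[run] += 1          # losses at the very end of the trace
    return dict(runs)


# Packets 2, 5 and 6 are missing: one isolated loss and one run of two.
print(loss_run_lengths([0, 1, 3, 4, 7, 8, 9], 10))
```

Multiplying each run length by its count gives the number of lost packets that occur in runs of that length, which is the quantity plotted in the loss pattern figures. A histogram dominated by runs of length 1 indicates random rather than bursty loss.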
  • Packet Size: The experiment shows that the loss rate increases with packet size. The increase in loss rate is consistent with the packet segmentation analysis. It is reasonable to use large packets, since statistically the loss rate for a given amount of data is the same as with small packets. Additionally, using large packets can reduce the overall overhead imposed by the IP and UDP headers on each packet. However, using a large packet size increases the bandwidth usage if a retransmission based scheme is used for loss recovery. This is explained by the following simple example. Suppose we have an 8 Kbyte video frame. It can be sent as one 8 Kbyte packet or as 8 packets of 1 Kbyte. Under the same loss rate, the chance for the 8 Kbyte packet to arrive equals the chance of all 8 small packets arriving. So to the IP layer, the amount of data that will be transferred by sending one large packet is equivalent to the amount of data sent by 8 small packets. However, at the application level, when a packet is lost, the application can still get some packets when using small packets, but it will probably get none when using large packets. Under a retransmission scheme, the application using small packets needs only to request those lost small packets.
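The retransmission cost in the 8 Kbyte example can be estimated as follows. The sketch assumes that each 1 Kbyte unit is lost independently with the same probability, that a packet is lost whenever any of its units is lost, and that only one round of retransmission is counted. The 5% unit loss rate is an arbitrary illustrative value, and the function name is hypothetical.

```python
def expected_retransmit_bytes(frame_bytes, unit_bytes, unit_loss, packet_bytes):
    """Expected number of bytes that must be resent for one frame in a
    single retransmission round."""
    units_per_packet = packet_bytes // unit_bytes
    packets = frame_bytes // packet_bytes
    # A packet is lost if any of the units it is carried in is lost.
    packet_loss = 1 - (1 - unit_loss) ** units_per_packet
    return packets * packet_loss * packet_bytes


frame, unit, x = 8192, 1024, 0.05
large = expected_retransmit_bytes(frame, unit, x, packet_bytes=8192)
small = expected_retransmit_bytes(frame, unit, x, packet_bytes=1024)
print(f"one 8 Kbyte packet : {large:.0f} bytes resent on average")
print(f"eight 1 Kbyte packets: {small:.0f} bytes resent on average")
```

Under these assumptions the single large packet costs several times more retransmitted data than the eight small packets, because losing any one unit forces the whole frame to be resent.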
  • Video Bit Rate: The experiments show that in the wide area network environment, the loss rate of a particular traffic stream is not directly related to the sending rate. This might be explained by the fact that some intermediate switches are handling so many traffic streams that the change in one stream has little impact on the overall result. This observation may encourage higher bandwidth usage. However, an excessively high bit rate is neither feasible nor desirable. It is not feasible because the bottleneck for the majority of low bit rate connections is at their modems or ISP entry points.
  • Packet Loss Handling: An ideal coding and transmission scheme should have a good recovery scheme for random loss. Errors are typically recovered by retransmission or by redundancy in the bit stream. Retransmission increases the delay, while the use of redundancy increases the bit rate and may not work well under a very high loss rate. These recovery schemes may not work in situations with strict time requirements and limited bit rates. When unrecoverable losses occur, the coding scheme and transmission scheme should be flexible enough to recover the video to the best possible quality with partially arrived messages.
  • The codec is designed to enhance the robustness of the bitstream against packet losses while keeping the bit rate reasonably low. It takes advantage of the improved predictive coding in H.263 and the low bit rate I frame coding provided by wavelet coding.
  • H.263 Low Bit Rate Coding: H.263 [Uni95, Tel96, Res95] is based on H.261 [Uni90]. It is targeted at low bit rate video coding applications, with a bit rate low enough to get through phone lines (plain old telephone service), in the range of 10 to 30 kb/s. Prototype H.263 systems [Tel96, Res95] have demonstrated that its coding is very efficient.
  • H.263 is a 16 x 16 macroblock, 8 x 8 subblocks DCT-based coding scheme with motion estimation and motion compensation. It supports standard ITU formats. There are three kinds of frames defined in the standard. Two of them, I frame and P frame, are inherited from H.261.
  • H.263 introduces the PB frame as the third frame type.
  • A PB frame combines two frames in one: a P frame in the future and the current B frame, which is bidirectionally predicted from the previous P frame and the P frame in the future.
  • I frame encoding is a DCT-based coding similar to that used in MPEG.
  • Unrestricted motion vector mode: Motion vectors can point across the frame boundary to allow more accurate motion estimation.
  • Syntax-based arithmetic coding mode: Arithmetic coding [WNC87] can be used to replace the variable length coding.
  • Advanced prediction mode: 8 x 8 blocks can be used instead of 16 x 16 macroblocks for motion estimation, allowing more accurate motion estimation.
  • The overhead is that four pairs of motion vectors are needed instead of one.
  • Half-pel motion estimation: Instead of being based on integer pixels, motion is estimated to an accuracy of half a pixel.
  • These modes make H.263 very effective in predictive coding and allow it to produce very low bit rate bitstreams.
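Half-pel motion estimation needs reference samples at positions between pixels. The text does not describe how those samples are formed; the sketch below assumes bilinear interpolation with round-half-up integer arithmetic, and the function name and coordinate convention (positions given in half-pixel units) are illustrative only.

```python
def half_pel_sample(ref, y2, x2):
    """Sample the reference frame `ref` (2D list of ints) at position
    (y2 / 2, x2 / 2), where y2 and x2 are given in half-pixel units.
    Assumes bilinear interpolation with rounding to nearest."""
    y, x = y2 // 2, x2 // 2
    a = ref[y][x]
    if y2 % 2 == 0 and x2 % 2 == 0:
        return a                                   # integer pixel position
    if y2 % 2 == 0:
        return (a + ref[y][x + 1] + 1) // 2        # horizontal half position
    if x2 % 2 == 0:
        return (a + ref[y + 1][x] + 1) // 2        # vertical half position
    # Diagonal half position: average of the four surrounding pixels.
    return (a + ref[y][x + 1] + ref[y + 1][x] + ref[y + 1][x + 1] + 2) // 4


ref = [[10, 20],
       [30, 40]]
print(half_pel_sample(ref, 0, 1), half_pel_sample(ref, 1, 0), half_pel_sample(ref, 1, 1))
```

A motion search would compare the current block against reference blocks built from such samples, which is what allows a match at half-pixel displacement.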
  • The coding of the I frame remains similar to the other standards like H.261 and MPEG, except that arithmetic coding can be used instead of the traditional variable length coding.
  • Predictive frame coding can propagate the damage caused by one lost frame.
  • Loss of the first P frame damages the decoding of the second P frame.
  • Because the third frame uses the second P frame as a reference frame, the damaged second frame causes the third frame to be incorrectly decoded. The initial damage by the loss of the first P frame propagates to the rest of the sequence.
  • I frames can effectively provide this kind of resynchronization.
  • The more I frames that are introduced into the sequence, the more resistant the bitstream is against frame losses. For example, for the Miss America video sequence, assuming a packet loss rate of 10%, packet loss happens once in every ten packets. If a frame rate of 10 f/s is used and an I frame is inserted every 10 frames, then loss of a packet will cause an average of 5 frames to be damaged, and the effect lasts half a second before an I frame repairs the damage. More I frames need to be inserted to reduce the number of frames that can be affected. The above analysis makes it reasonable to add more I frames into the bitstream.
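The relation between the I frame interval and the average damage can be sketched as below. The model is an assumption: one frame, chosen uniformly at random within a group that begins with an I frame, is lost, and every frame from the lost one up to the end of the group counts as damaged. The function name is hypothetical.

```python
def mean_damaged_frames(i_frame_interval):
    """Average number of damaged frames when one uniformly chosen frame in
    a group of `i_frame_interval` frames is lost and the damage lasts until
    the next I frame."""
    n = i_frame_interval
    # Losing the frame at 1-based position k damages frames k..n.
    return sum(n - k + 1 for k in range(1, n + 1)) / n


frame_rate = 10  # frames per second, as in the example
for interval in (10, 5, 2):
    frames = mean_damaged_frames(interval)
    print(f"I frame every {interval:2d} frames: {frames:.1f} frames, "
          f"{frames / frame_rate:.2f} s damaged on average")
```

For an interval of 10 frames this model gives 5.5 frames, or about half a second at 10 f/s, which is consistent with the rough figure in the example; shorter intervals reduce the damage proportionally.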
  • DCT-based I frame coding at low bit rates produces blocky pictures.
  • Wavelet I frame coding produces low bit rate compression and does not affect motion estimation and motion compensation processes of the predictive frame coding.
  • Wavelet Based Still Image Coding: The structure of the wavelet transform-based image coder also complies with the generic structure described in Figure 2.2 of Chapter 2 of the above-cited Chen thesis.
  • By using a combination of low-pass and high-pass filters, wavelet transformations successively convert an image from the space domain to multiple frequency subbands.
  • The image is first transformed into four frequency subbands, as shown in Figure 8.
  • The four subbands, denoted by LL1, HL1, LH1 and HH1, correspond to the combinations of low frequency and high frequency components along the horizontal and vertical dimensions.
  • HL1 denotes a subband that has a high frequency component along the vertical direction and low frequency component along the horizontal direction.
  • Subband LL1 has the low frequency components for both dimensions. It is further transformed to get a finer scale (see Figure 9). This process continues until a desired scale is reached.
  • Figure 10 shows a 3 level wavelet transformation of an example image.
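The successive subband decomposition can be illustrated with the simplest possible filter pair. The text does not say which wavelet filters are used, so the averaging and differencing Haar filters below are an assumption, and the function names are illustrative. The subband naming follows the convention stated above, where HL is low frequency horizontally and high frequency vertically.

```python
def haar_1d(values):
    """One level of an averaging Haar filter pair: (low-pass, high-pass)."""
    lows = [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]
    highs = [(values[i] - values[i + 1]) / 2 for i in range(0, len(values), 2)]
    return lows, highs


def haar_2d_level(image):
    """Split an image (list of rows, even dimensions) into LL, HL, LH, HH."""
    # Horizontal pass: filter along each row.
    row_low, row_high = [], []
    for row in image:
        lo, hi = haar_1d(list(row))
        row_low.append(lo)
        row_high.append(hi)

    def vertical(rows):
        # Vertical pass: filter along each column, then transpose back.
        filtered = [haar_1d(list(col)) for col in zip(*rows)]
        lo_cols = [lo for lo, _ in filtered]
        hi_cols = [hi for _, hi in filtered]
        return [list(r) for r in zip(*lo_cols)], [list(r) for r in zip(*hi_cols)]

    ll, hl = vertical(row_low)    # horizontally low; vertically low / high
    lh, hh = vertical(row_high)   # horizontally high; vertically low / high
    return ll, hl, lh, hh


def wavelet_decompose(image, levels):
    """Repeatedly transform the LL subband; return the final LL subband and
    the (HL, LH, HH) detail subbands of each level, finest first."""
    details = []
    ll = image
    for _ in range(levels):
        ll, hl, lh, hh = haar_2d_level(ll)
        details.append((hl, lh, hh))
    return ll, details


ll, hl, lh, hh = haar_2d_level([[1, 3], [1, 3]])
print(ll, hl, lh, hh)
```

In the small example the image varies only along the horizontal direction, so the energy that is not in LL appears in LH, while HL and HH are zero.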
  • Wavelet Transformation vs. DCT Transformation: At a very low bit rate (less than 1 bit per pixel), DCT based transformation produces very blocky images, while the wavelet based compression scheme gives reasonable results.
  • The top of Figure 11 shows a picture compressed with JPEG at 0.08 bits per pixel (bpp). The blocky effect is very noticeable.
  • The lower part of Figure 11 is the same image compressed with the zero-tree based wavelet compression algorithm detailed later in this chapter.
  • The blocky effect at very low bit rate is inherent in the DCT based compression schemes.
  • The frequency domain is decomposed into 64 fixed-length frequency subbands. Each coefficient of the 64 subbands represents the overall energy level of that subband for the block.
  • Wavelet based compression algorithms tend to allocate bits evenly to low and high frequency parts. This is because unlike DCT, wavelet has a nice space-time locality property [SRO96]. After an image is transformed, coefficients with large magnitude within each frequency band tend to form clusters that correspond to the high energy edge locations in the original image. These clusters resemble the edges in the space domain.
  • Figure 10 shows the coefficients of the example Lena image after 3 levels of wavelet decomposition. For the purpose of illustration, the coefficients are normalized to the range of 0 to 255 for drawing. We can see from the figure that in each subband, large significant coefficients correspond to the edges of the original image. Each subband therefore gives an outline of the original image, and these outlines resemble each other across all the subbands. Wavelet coding algorithms take advantage of this clustering within each subband and resemblance across subbands offered by the time-frequency locality property. Therefore, even at very low bit rates, the wavelet compression algorithms can maintain both low and high frequency components.
  • The zero-tree approach captures the clustering within subbands and resemblance between subbands by a tree-like data structure called a zero-tree. In this structure, if a coefficient is insignificant with regard to a certain threshold, then the coefficients that represent the same spatial area in the higher frequency subbands are likely to be insignificant.
  • the Zero-tree method introduces a special zero-tree symbol to denote such a tree.
  • the morphological based approach uses morphological operations to capture the intra-band clustering. It uses the statistical model of one subband to guide the coding of another subband, taking advantage of the similarity between subbands.
  • An I frame coder based on the zero-tree approach is implemented.
  • the implementation is kept simple and efficient. While image coding usually does not have very strict time constraints and can afford sophisticated computation, I frame coding and decoding in video have time restrictions. Also the computational complexity needs to be controlled since video is often accompanied by other timely media like audio. Excessive consumption of CPU cycles will cause degradation of the overall playback performance.
  • each subband represents a certain frequency band.
  • coefficients form clusters corresponding to positions in the space domain.
  • The outlines of the original image shape formed by the clusters resemble each other.
  • Each coefficient in the lower frequency subbands has four coefficients in the next higher subband representing the same spatial area.
  • a coefficient in a lower frequency band is regarded as the parent of the four coefficients covering the same spatial area in the next higher frequency band. This parent-children relationship generates a tree structure representation of coefficients across subbands.
  • Each coefficient has four children in the next higher frequency subband and one parent in the next lower subband. The exceptions to this rule are that coefficients in the last subband (representing the highest frequency subband) do not have any children and coefficients in the first subband (lowest frequency subband) do not have parents.
  • Figure 12 shows this tree structure.
  • The parent-children tree structure captures both the intra-band clustering and inter-band resemblance. Coefficients at the same level in a parent-children tree represent neighboring coefficients in the same subband. The tree also captures the resemblance across subbands. A coefficient and its children are likely to have similar energy levels because they represent the same spatial areas.
  • the zero-tree approach takes advantage of these properties by using a special data structure called zero-tree.
  • the Zero-tree is a special parent-children tree in which all the nodes of the tree are insignificant to a certain threshold. For example, given a threshold 1024, a parent-children tree is said to be a zero-tree if all of its nodes have values less than 1024.
  • The zero-tree captures intra-band clustering and inter-band resemblance. If a coefficient is insignificant with regard to a certain threshold, it is likely that all its children will be insignificant too. Therefore a zero-tree can be represented by only one symbol (the zero-tree root) in the encoded bitstream.
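The parent-children relationship and the zero-tree test can be sketched as below. This is a simplified illustration, not the coder described in the text: `pyramid` is assumed to hold the subbands of a single orientation ordered from lowest to highest frequency, each twice as wide and tall as the previous one, and all names are hypothetical.

```python
def children(row, col):
    """Four coefficients covering the same spatial area one level finer."""
    return [(2 * row, 2 * col), (2 * row, 2 * col + 1),
            (2 * row + 1, 2 * col), (2 * row + 1, 2 * col + 1)]


def is_zerotree(pyramid, level, row, col, threshold):
    """True if the coefficient and all its descendants are below threshold."""
    if abs(pyramid[level][row][col]) >= threshold:
        return False
    if level + 1 == len(pyramid):  # highest frequency subband: no children
        return True
    return all(is_zerotree(pyramid, level + 1, r, c, threshold)
               for r, c in children(row, col))


# Example with the threshold 1024 used in the text.
pyramid = [
    [[900]],                          # lowest frequency subband
    [[100, 50], [30, 1200]],          # next higher subband
    [[10] * 4 for _ in range(4)],     # highest frequency subband
]
print(is_zerotree(pyramid, 0, 0, 0, 1024))  # False: descendant 1200 is significant
print(is_zerotree(pyramid, 1, 0, 0, 1024))  # True: 100 and its four children are small
```

When the test succeeds, the whole subtree can be replaced by a single zero-tree root symbol.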
  • Embedded coding has the property of being able to stop at any point in the bitstream for decoding with maximum contribution. It is ideal for rate control since the encoding or sending process can control precisely the sending rate by stopping whenever the rate exceeds the predefined rate.
  • Embedded coding in the zero-tree approach uses bit plane scanning and zigzag subband scanning. Before the encoding process, a magnitude of the largest coefficient is determined and based on that value an initial threshold is determined.
  • The embedded coding process can be viewed as two nested iterations (Figure 14).
  • The first is the iteration through the bit planes, from the largest coefficients to 0.
  • The second iteration is within each bit plane: subbands are scanned in a zigzag manner. Coding of each subband can be regarded as a two-step process. In the first step the significant coefficients and their positions are identified. This information is called the significance map. Coding of the significance map is followed by the refinement pass, which refines by one bit the coefficients of this subband that were previously found significant.
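The outer bit plane iteration can be sketched as follows. The rule for the initial threshold (largest power of two not exceeding the largest magnitude) is an assumption borrowed from common embedded zero-tree coders, since the text only says the threshold is derived from the largest coefficient. Subband scanning order and symbol coding are omitted, and integer coefficients are assumed.

```python
def initial_threshold(coefficients):
    """Largest power of two not exceeding the largest magnitude (assumed rule)."""
    largest = max(abs(c) for c in coefficients)
    threshold = 1
    while threshold * 2 <= largest:
        threshold *= 2
    return threshold


def bit_plane_passes(coefficients):
    """Yield (threshold, newly_significant, refined) for each bit plane.

    newly_significant: indices first found significant in this pass
    (the significance map); refined: indices found significant in
    earlier passes, which receive one more bit of precision.
    """
    threshold = initial_threshold(coefficients)
    significant = []
    while threshold >= 1:
        refined = list(significant)
        newly = [i for i, c in enumerate(coefficients)
                 if i not in significant and abs(c) >= threshold]
        significant.extend(newly)
        yield threshold, newly, refined
        threshold //= 2


for step in bit_plane_passes([1500, -700, 90, 3]):
    print(step)
```

Because the passes run from the most significant bit plane downward, stopping after any pass leaves the decoder with the most important bits of every coefficient seen so far.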
  • each coefficient can have one of the following statuses:
  • Zero-tree node: The coefficient is determined to be part of an already coded zero-tree, so it is skipped.
  • Zero-tree root: This coefficient is insignificant with regard to this threshold and all its children are insignificant too. A zero-tree symbol is coded.
  • Isolated zero: This coefficient is insignificant but one or more of its children are significant. An isolated zero symbol is coded.
  • the wavelet I frame codec includes both an encoder and a decoder. It is combined with the Telenor H.263 codec [Res95] to form a hybrid H.263 wavelet codec.
  • The hybrid codec codes each video frame as an I frame or P frame according to an I frame distance factor.
  • The I frame distance factor specifies how often a frame needs to be coded as an I frame. For example, an I frame distance factor of 5 specifies that one I frame is coded in every 5 frames, generating a coding sequence of IPPPPIPPPP.... The performance is measured on an SGI Indy with a 150 MHz IP22 Processor with MIPS R4010 Floating Point Chip and MIPS R4400 Processor Chip.
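The mapping from a distance factor to a coding sequence can be illustrated with a short sketch (the function name is hypothetical):

```python
def coding_sequence(num_frames, distance_factor):
    """Frame types for a sequence: one I frame every distance_factor frames."""
    return "".join("I" if index % distance_factor == 0 else "P"
                   for index in range(num_frames))


print(coding_sequence(10, 5))   # IPPPPIPPPP
print(coding_sequence(12, 10))  # IPPPPPPPPPIP
```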
  • the compiler is the C++ compiler shipped with the Irix 5.3 Unix system.
  • The compiler option -O is used for compiling the codecs.
  • Two video sequences are used for the measurements.
  • The Miss America video is a talking head sequence with little motion.
  • The Jesus to a Child sequence is a music video with higher motion.
  • the video sequences are coded first using the regular H.263 codec [Res95, Tel96] and then using the hybrid wavelet H.263 codec.
  • Two distance factors, 5 and 10, are used.
  • the following sections present the coding efficiency and coding time of these two schemes.
  • the performance is compared to the modified H.263 codec with support for the I frame distance in which the I frame is coded through regular H.263 I frame coding.
  • Tables 6 and 7 show the coding results using the modified H.263 coder with the conventional H.263 I frame coder with distance factors 10 and 5.
  • Tables 8 and 9 show the coding results for the hybrid wavelet H.263 codec with wavelet I frame coder with the same distance factors.
  • the measurements in the tables show the number of I and P frames, total I frame sizes and P frame sizes, the average Peak Signal to Noise Ratio of the luminance and two chrominance planes, and total size.
  • I frame sizes are reduced by about 38% and 39% for the two video sequences.
  • The overall size reductions are 19% and 26% for the Miss America video sequence with distance factors 10 and 5.
  • the size reductions for the Jesus to a Child video are 15% and 22%.
  • the reduction of the video sequence size does not degrade the video quality.
  • The PSNR of all three components coded with the hybrid codec is equal to or higher than that of the conventional H.263 I frame codec.
  • The computational cost of the hybrid wavelet H.263 codec is measured and analyzed in this section. Nearly all three stages in the wavelet I frame coding are more computationally involved than in the DCT transformation. At the transformation stage, wavelet transformation is more expensive since it uses floating point operations while DCT transformation uses integer operations. At the quantization step, the bit plane scanning and zigzag subband scanning of the wavelet coding has more organizational complexity than the simple division based quantization matrix used in DCT. Finally, wavelet coding uses adaptive arithmetic coding at the final compression stage, which is more expensive compared to the Huffman coding in DCT, which has been implemented efficiently as table lookup.
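For reference, the quality figure used in these comparisons can be computed as below. The text does not spell out its formula; this sketch assumes the standard PSNR definition for 8-bit samples (peak value 255) applied to one plane at a time, with planes given as flat lists of pixel values.

```python
import math


def psnr(original, decoded, peak=255):
    """Peak Signal to Noise Ratio in dB between two equally sized planes."""
    squared_error = sum((a - b) ** 2 for a, b in zip(original, decoded))
    mse = squared_error / len(original)
    if mse == 0:
        return math.inf  # identical planes
    return 10 * math.log10(peak ** 2 / mse)


print(round(psnr([100] * 4, [101] * 4), 2))  # 48.13
```

The luminance plane and the two chrominance planes are each passed through the same function, giving the three PSNR values reported per sequence.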
  • wavelet compression has been used mainly for still image coding.
  • The cost of the wavelet coding can be amortized since wavelet encoding and decoding happen only at the fixed interval specified by the distance factor; between I frames, inexpensive DCT-based coding is used.
  • Tables 10 and 11 show the encoding time of the H.263 I frame codec with distance factors 10 and 5.
  • Tables 12 and 13 present the encoding time of the hybrid codec with the same distance factors 10 and 5. The tables show that the I frame takes about 2.5 times longer to code when the hybrid codec replaces the conventional H.263 I frame codec.
  • the overall encoding frame rate is decreased by only about 2% to 12%. This is because coding of the P frames, with extensive and sophisticated motion estimation and motion compensation, dominates the total coding time. As a result, the increased coding time caused by wavelet I frame coding does not severely degrade the overall encoding performance.
  • Tables 14 and 15 present the decoding time of the H.263 codec with distance factor 10 and 5.
  • Tables 16 and 17 show the decoding time of the hybrid codec.
  • The decoding time of a wavelet I frame is about 5 times greater than that of an H.263 I frame.
  • The decoding frame rate drops from 25 to 30 f/s to 12 to 18 f/s. The degradation is large, but 12 to 18 f/s is still an acceptable playback rate for low bit rate connections.
  • a hybrid coding scheme has been described based on H.263 and wavelet.
  • H.263 offers excellent prediction frame coding and is ideal for low bit rate coding.
  • A zero-tree based wavelet I frame coder is described, and it proves to be effective in increasing robustness while keeping the bit rate low.
  • the packetization method utilizes the macroblock level dependency structure and packetizes macroblocks to minimize dependency between packets.
  • The design and implementation of this packetization method are described, and its efficiency and improvement over traditional packetization methods are measured.
  • Figure 1.2 of Chapter 1 of the above-cited Chen thesis shows the effect of cascade damage caused by the loss of one P frame in an H.263 inter-frame coded sequence.
  • One way to improve the robustness is to introduce more intra-coded macroblocks and frames.
  • Intra-coded frames provide resynchronization points and prevent the damage from propagating. This method has been adopted in traditional coding schemes such as NV and F/S [Ron92, IR], where all blocks are intra-coded to achieve maximum robustness. However, using exclusively intra-coded frames results in a high bit rate.
  • We have described above a proposed hybrid H.263 wavelet codec which utilizes the wavelet transform for I frame coding to lower the I frame size. Regularly inserting I frames maintains a reasonably low bit rate.
  • the I frame vs. P frame size ratio is still large.
  • the wavelet coded I frame size drops about 40% as described above.
  • I frame size is still big compared to the small P frame.
  • The bit rate increase from a distance factor of 10 is manageable, but a very small distance factor will increase the bit rate dramatically. Therefore, efficient predictive coding is still needed to meet the low bit rate requirement.
  • Dependencies in the bitstream commonly form chains.
  • Suppose three parts (sections) of a bitstream, A, B, and C, are inter-coded macroblocks or frames.
  • If C depends on B and B references A, then A, B and C form a dependency chain. Damage propagates when A, B and C are separated into different packets. Losing A damages B and the damage propagates to C. This problem can be solved by keeping the dependent parts in the same packet.
  • Traditional packetization schemes which are often based on frame or simple macroblock groupings like those used in [Smi93, MJ95, CTCL95], do not consider the dependency structure.
  • An objective of the present invention is to analyze the dependency structure and to isolate and minimize the dependency between packets. In the following discussion, we investigate the P frame macroblock level dependency structure and propose a new packetization scheme that minimizes dependency between packets. The dependency isolation scheme prevents damage propagation, minimizing packet loss damage.
  • The DCT-based transform coding scheme has two types of basic frames: intra-coded frame (I frame) and inter-coded frame (P frame).
  • I frame: intra-coded frame
  • P frame: inter-coded frame
  • An intra-coded frame is coded independently of other frames.
  • a predictive coded frame uses an I frame or another P frame as its reference frame. Only the difference is coded between the P frame and its reference frame. Correct decoding of a P frame relies on the availability of the reference frame.
  • the dependency between two frames is actually a dependency between the macroblocks of the two frames.
  • video frames are partitioned into macroblocks.
  • a macroblock covers a 16 x 16 pixel area.
  • a macroblock is considered as one unit when coded.
  • I blocks are coded independently of other blocks.
  • the macroblock is transformed, coded using DCT transform and entropy coding.
  • I frames are made of I blocks. I blocks also appear in P frames when the motion estimation process cannot find a good match with blocks in the reference frame.
  • A P block is encoded as the difference between this block and the blocks it references, along with its motion vectors.
  • Skipped P block: The block is skipped. The difference between this block and the block in the same position in the reference frame is measured by motion estimation to be below a threshold and the difference is regarded as zero.
  • the decoding process simply copies the reconstructed image data from the corresponding block in the reference frame.
  • Decoding an I block is a single decoding process which performs variable length decoding, inverse quantization and inverse DCT transformation.
  • Decoding a P block is a two-step process with a regular decoding step and a reconstruction step. First the difference and motion vectors are decoded (decoding process), then the image data from the corresponding block in the reference frame is copied and added to the decoded difference (reconstruction process).
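The reconstruction step can be sketched as below. This is a simplified illustration that assumes integer-pixel motion vectors given as (dx, dy), a reference area that lies fully inside the frame, and no clipping of the result; the function and parameter names are hypothetical.

```python
def reconstruct_p_block(reference, top, left, motion_vector, residual):
    """Add the decoded difference to the motion-compensated reference area.

    reference: 2D list of reconstructed pixels of the reference frame
    top, left: pixel position of the block being reconstructed
    residual:  2D list holding the decoded difference for the block
    """
    dx, dy = motion_vector
    size = len(residual)
    return [[reference[top + dy + r][left + dx + c] + residual[r][c]
             for c in range(size)]
            for r in range(size)]


# Tiny 4 x 4 "frame" with 2 x 2 blocks to keep the example readable.
reference = [[row * 4 + col for col in range(4)] for row in range(4)]
print(reconstruct_p_block(reference, 2, 2, (-1, -1), [[1, 0], [0, -1]]))
# [[6, 6], [9, 9]]

# A skipped P block behaves like a zero motion vector with zero difference.
print(reconstruct_p_block(reference, 2, 2, (0, 0), [[0, 0], [0, 0]]))
# [[10, 11], [14, 15]]
```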
  • Figure 16 shows the macroblock dependency structure.
  • Two blocks in the I frame are coded as I blocks and are used by two P blocks in the first P frame.
  • the skipped P block in the first P frame does not have anything to be coded. It has only an indication bit specifying that it is a skipped P block and the decoding process can just copy the previous P block in the same position.
  • the skipped P block also has zero difference and zero as motion vector.
  • the coded P block in the first P frame codes the difference and has a motion vector [0; 0].
  • the coded P block in the second P frame has non-zero motion vectors and it depends on the coded P block in the first
  • the reference area can cover up to 4 macroblocks. Therefore, it references a range of blocks from 1 to 4.
  • A coded P block with motion vector [-8, -8] references an area that covers four macroblocks in the reference frame.
  • Packetization can take the dependency structure into consideration and minimize the damage caused by packet loss. Under the same packet loss rate, a packetization scheme that produces independent packets localizes the damage resulting from packet loss to the lost macroblocks. This is better than an arbitrary packetization scheme where loss of one packet not only renders its own macroblocks not decodable, but also damages macroblocks that depend on it.
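The number of reference macroblocks touched by a motion vector can be worked out with a short sketch. It assumes 16 x 16 macroblocks, integer-pixel motion vectors written as (dx, dy), and ignores frame boundaries; the names are illustrative.

```python
MB_SIZE = 16


def referenced_macroblocks(mb_row, mb_col, motion_vector):
    """Macroblock (row, col) indices covered by the displaced 16 x 16 area."""
    dx, dy = motion_vector
    top = mb_row * MB_SIZE + dy
    left = mb_col * MB_SIZE + dx
    rows = range(top // MB_SIZE, (top + MB_SIZE - 1) // MB_SIZE + 1)
    cols = range(left // MB_SIZE, (left + MB_SIZE - 1) // MB_SIZE + 1)
    return [(r, c) for r in rows for c in cols]


print(referenced_macroblocks(1, 1, (0, 0)))    # [(1, 1)]
print(referenced_macroblocks(1, 1, (-8, 0)))   # [(1, 0), (1, 1)]
print(referenced_macroblocks(1, 1, (-8, -8)))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

A displacement that is a multiple of 16 in both directions touches one macroblock; any other displacement straddles two or four.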
  • Figure 18 shows this concept. Assuming a simplified dependency structure as shown at the top part of the figure, two packetization methods are shown. The first, labeled "Common method" at the bottom of the figure, packetizes each P frame as one packet. The star and the dot elements in the original sequence still depend upon each other. Packet 3 relies on Packet 2 for correct decoding and Packet 2 relies on Packet 1. In this packetization method, when Packet 1 is dropped, Packets 2 and 3 will be damaged.
  • The second method, labeled “Dependency isolation”, packetizes the macroblocks according to their dependency relationships. All the star and dot elements are put into two separate packets. The packets preserve the dependency within the packet and eliminate dependency between packets. Therefore, the packets are independent of each other, so loss of one packet will not affect others. As with the first scheme, the same number of packets gets through under the same loss rate. However, using the dependency isolation method all packets that arrive can be decoded correctly, while the packets delivered by the common method suffer from damage caused by the lost packets.
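The difference between the two methods can be illustrated with a small simulation. The unit names, the two three-frame chains, and the helper function are hypothetical stand-ins for the star and dot elements of the figure.

```python
def decodable_units(depends_on, packet_of, lost_packets):
    """Units that arrive and whose whole reference chain also arrived."""
    status = {}

    def decodable(unit):
        if unit not in status:
            parent = depends_on.get(unit)
            status[unit] = (packet_of[unit] not in lost_packets
                            and (parent is None or decodable(parent)))
        return status[unit]

    return sorted(unit for unit in packet_of if decodable(unit))


# Two dependency chains across three frames: star1 <- star2 <- star3, same for dot.
depends_on = {"star2": "star1", "star3": "star2",
              "dot2": "dot1", "dot3": "dot2"}

# Common method: one packet per frame.
by_frame = {"star1": 1, "dot1": 1, "star2": 2, "dot2": 2, "star3": 3, "dot3": 3}
# Dependency isolation: one packet per chain.
by_chain = {"star1": 1, "star2": 1, "star3": 1, "dot1": 2, "dot2": 2, "dot3": 2}

print(decodable_units(depends_on, by_frame, {1}))  # []
print(decodable_units(depends_on, by_chain, {1}))  # ['dot1', 'dot2', 'dot3']
```

With one packet per frame, losing Packet 1 leaves nothing decodable; with one packet per chain, everything that arrives is decodable.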
  • Sort and Merge Packetization: The previous discussion involves the dependency isolation packetization method for a simplified dependency structure.
  • The dependency structure in H.263 is much more complex. This is because motion vectors point to a macroblock region in the reference frame which may cross macroblock boundaries. Therefore a block may depend on multiple macroblocks, as shown in Figure 17.
  • When a group of P frames is analyzed, the dependency relationship between macroblocks across frames can be effectively described as a tree, as depicted in
  • the coded P block in frame 3 references 4 blocks in frame 2.
  • the 4 blocks in frame 2 reference 9 blocks in frame 1.
  • the coded P block in frame 3 represents a tree root and each of the other blocks involved can be regarded as a node in the dependency tree.
  • the objective for the packetization algorithm is to packetize the trees so that losing one tree will not affect others.
  • Each macroblock of the last P frame may have a dependency tree.
  • each dependency tree can be collected as one packet.
  • one packet can be organized from the dependency tree consisting of one macroblock in frame 3, four macroblocks in frame 2, and six macroblocks in frame
  • each dependency tree will be put in one separate packet, resulting in simple packetization, completely eliminating dependency between packets.
  • the disadvantage is that dependency trees may share nodes at their lower levels, causing some reference blocks to be packetized multiple times. This is shown in Figure 20.
  • Two dependency trees are formed for the two illustrated macroblocks in frame 3. While the two trees have distinct reference macroblocks at frame 2, they share two common reference blocks at frame 1. Packetizing each tree as one packet causes maximum redundancy. The solution at the other end of the spectrum is to packetize all the trees in one packet to eliminate dependency and redundancy. This is not feasible because the packet has a size limitation. The right packetization must lie between these two extreme cases to minimize dependency while maintaining a small packet size.
  • the number of common nodes between two dependency trees can be regarded as the number of dependencies between them. Given dependency trees A and B sharing m common nodes, if A and B are put into two different packets and redundant packetization is not allowed, then the dependencies between the two packets increase by m. Losing the packet with these common nodes will damage the other packet. The more common nodes A and B share, the more severe the damage will be. If A and B are put in one packet, then m dependencies between the two packets are eliminated. To minimize dependencies between packets, an intuitive approach would combine dependency trees with the largest number of common nodes into the same packet, eliminating the largest number of dependencies first. This is based on the observation that the most dynamic part of the frames, for example a moving object, tends to cause the largest
  • A packetization algorithm can eliminate most dependency if it can capture the moving objects and group the dependency trees covering the motion together into one packet. Since the number of nodes a tree holds in common with neighboring trees is a good indication of motion, a packetization algorithm can be designed to group dependency trees according to their common nodes. This motivates a sort and merge algorithm.
  • the sort and merge algorithm for packetization is shown in Figure 23.
  • the input is the dependency trees of macroblocks for the last P frame.
  • the output is an array of packets, each consisting of one or more dependency trees. Each packet does not exceed a predefined packet size. At the beginning, each dependency tree is initialized as one packet.
  • the main step of the algorithm is a loop of sort and merge until no more packets can be combined.
  • In each loop iteration, first the number of common nodes with neighbors is calculated for each packet, then the packets are sorted and a pair of packets is picked for a merge. Two packets may be merged if they have the largest possible number of common nodes and their merge will not create a packet exceeding the predefined maximum size.
  • the last step in the loop body is the update of neighboring relationships. After two packets are merged, they become one packet and their neighbors are combined. Packets neighboring either or both of the two merged packets are updated with the new neighboring relationship.
  • The packetization algorithm is a greedy algorithm. It picks the packets to merge which have the largest number of dependencies, eliminating most dependencies between two trees first. Another feature of this algorithm is that it can capture the moving objects into packets. This is because dependency trees that form a moving object have the most common nodes between them. The packetization algorithm merges these trees together first; as a result, the dependency trees that form a moving object are merged and captured in one packet.
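A minimal sketch of such a greedy sort and merge loop is given below. It is a simplification under stated assumptions, not the algorithm of Figure 23: every pair of packets is compared rather than only neighboring ones, trees are plain sets of hypothetical macroblock identifiers, `sizes` maps each identifier to its coded size in bytes, and each initial tree is assumed to fit within the size limit.

```python
from itertools import combinations


def sort_and_merge(trees, sizes, max_packet_size):
    """Greedily merge dependency trees that share the most nodes."""
    packets = [set(tree) for tree in trees]  # each tree starts as one packet

    def packet_size(nodes):
        return sum(sizes[node] for node in nodes)

    while True:
        candidates = []
        for a, b in combinations(range(len(packets)), 2):
            common = len(packets[a] & packets[b])
            merged_size = packet_size(packets[a] | packets[b])
            if common > 0 and merged_size <= max_packet_size:
                candidates.append((common, a, b))
        if not candidates:
            return packets  # no more packets can be combined
        # Sort so the pair with the most common nodes comes first.
        candidates.sort(key=lambda candidate: -candidate[0])
        _, a, b = candidates[0]
        packets[a] |= packets[b]  # a < b, so index a stays valid
        del packets[b]


trees = [{"a3", "a2", "x1", "y1"}, {"b3", "b2", "x1", "y1"},
         {"c3", "c2", "y1"}, {"d3", "d2", "d1"}]
sizes = {node: 100 for tree in trees for node in tree}
for packet in sort_and_merge(trees, sizes, max_packet_size=700):
    print(sorted(packet))
```

In this example the first two trees share two nodes and are merged; the third tree shares a node with the merged packet, but merging it would exceed 700 bytes, so it stays separate.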
  • the packetization algorithm is applied to the H.263 coding scheme.
  • The video sequence is segmented into groups. Each group has an I frame and a number of P frames determined by the distance factor. For example, a video sequence coded by the hybrid codec with a distance factor of 10 will have one wavelet I frame and 9 H.263 P frames. The coded wavelet I frame as described above serves as a resynchronization point to limit the packet loss damage to within each frame group.
  • the H.263 video coding and decoding algorithms must be modified. These include:
  • the modifications essentially change the coder and decoder of the H.263 coding algorithm from frame based encoding and decoding to macroblock based.
  • the original H.263 is frame based and macroblocks are ordered according to their original position in the frame.
  • the encoding process sequentially encodes each block and arranges them in a certain order in the coded bitstream. Once the frame size and macro block size information is available to the decoding process, it accordingly decodes each block and determines the block's position from the order of the blocks in the encoded bitstream. In such a scheme, the coding of macroblock position is implicit. When macroblocks are separated from each frame and reorganized into different packets, their original frame positions need to be carried to the decoder.
  • a skipped P block is represented by 1 bit in the bitstream to preserve its position since there is no explicit coding of macroblock positions. This might not be necessary if there is another way to code the macroblock positions.
  • The H.263 coding algorithm is highly optimized. In addition to improvements and fine tuning of the inter-frame coding, the coding of macroblocks within one frame is also improved.
  • The motion vectors for neighboring macroblocks are coded with differential prediction. For each vector, a predictor is computed according to the motion vectors of its neighboring macroblocks. Then the difference between the motion vector and its predictor is coded. This is shown in Figure 24, adapted from [Uni95]: the coding of MV, the current motion vector, uses as its predictor the median of the motion vectors of three neighboring macroblocks.
  • Predictive coding of motion vectors is justified because motion vectors for neighboring macroblocks tend to be similar.
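The differential prediction can be sketched as follows. The median is assumed to be taken separately for the horizontal and vertical components, and the special cases at picture borders are left out; the function names are illustrative.

```python
def median_predictor(mv1, mv2, mv3):
    """Component-wise median of three neighboring motion vectors."""
    return tuple(sorted(components)[1] for components in zip(mv1, mv2, mv3))


def motion_vector_difference(mv, neighbors):
    """Difference between a motion vector and its predictor (what gets coded)."""
    predictor = median_predictor(*neighbors)
    return (mv[0] - predictor[0], mv[1] - predictor[1])


neighbors = [(2, 0), (4, -2), (3, 5)]
print(median_predictor(*neighbors))                  # (3, 0)
print(motion_vector_difference((4, 1), neighbors))   # (1, 1)
```

Because the predictor depends on neighboring macroblocks, a macroblock moved into a different packet can no longer rely on it, which is the dependency that disabling predictive coding removes.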
  • the H.263 coder is modified to disable the predictive coding of motion vectors.
  • the overhead caused by this modification is not so significant.
  • the overhead for disabling the predictive coding of motion vector is less than 4%.
  • For each dependency tree to be packetized, the position of each macroblock can also be implicitly coded. Rather than relying on the orderly encoding and decoding of macroblocks in the frame based encoding and decoding scheme, this algorithm relies on the knowledge of the motion vectors.
  • the motion vector of first macroblock on the top of tree is used to determine which macroblocks are needed from the next level. These macroblocks are inserted to the second level.
  • the macroblocks it depends on in the next level are inserted in the next level queue according to their order.
  • When macroblocks are inserted into the queue, each one is checked against the queue contents and discarded if the same macroblock is already in the queue.
  • Figure 25 demonstrates such an organization algorithm and Figure 26 shows the organization of macroblocks for one example dependency tree.
  • the positions of the macroblocks are implicitly coded in such an organization scheme and can be recovered at the decoder side.
  • the decoder needs only the position of the topmost block (root block), from which it can derive the positions for all other macroblocks by executing the same sort and insert algorithm. Therefore the position information is implicitly carried in the motion vector and the organization process.
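The idea that positions travel implicitly can be sketched as below. This is a simplified illustration, not the algorithm of Figure 25: it assumes 16 x 16 macroblocks, integer-pixel motion vectors (dx, dy), every block in the tree being a coded P block, and a fixed number of tree levels. All names are hypothetical, and depth 0 denotes the last P frame.

```python
MB_SIZE = 16


def referenced(pos, mv):
    """Macroblock positions covered by the area a motion vector points to."""
    (row, col), (dx, dy) = pos, mv
    top, left = row * MB_SIZE + dy, col * MB_SIZE + dx
    rows = range(top // MB_SIZE, (top + MB_SIZE - 1) // MB_SIZE + 1)
    cols = range(left // MB_SIZE, (left + MB_SIZE - 1) // MB_SIZE + 1)
    return [(r, c) for r in rows for c in cols]


def walk_tree(root, levels, next_mv):
    """Visit the tree level by level, queueing each macroblock only once."""
    visited, queue = [], [root]
    for depth in range(levels):
        next_queue = []
        for pos in queue:
            mv = next_mv(depth, pos)
            visited.append((depth, pos, mv))
            for ref in referenced(pos, mv):
                if ref not in next_queue:  # discard duplicates
                    next_queue.append(ref)
        queue = next_queue
    return visited


def packetize(root, levels, motion_vectors):
    """Encoder side: emit motion vectors in visiting order, without positions."""
    visited = walk_tree(root, levels,
                        lambda depth, pos: motion_vectors.get((depth, pos), (0, 0)))
    return [mv for _, _, mv in visited]


def recover_positions(root, levels, mv_stream):
    """Decoder side: replay the same walk to recover every position."""
    stream = iter(mv_stream)
    visited = walk_tree(root, levels, lambda depth, pos: next(stream))
    return [(depth, pos) for depth, pos, _ in visited]


motion_vectors = {(0, (1, 1)): (-8, -8), (1, (1, 1)): (8, 0)}
stream = packetize((1, 1), 3, motion_vectors)
print(recover_positions((1, 1), 3, stream))
```

Only the root position and the motion vectors are transmitted; the decoder obtains the remaining positions by executing the same queue construction as the encoder.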
  • the same algorithm can be used except that the position of each top level macroblock must be stored for starting position calculation. After the first dependency tree is packetized, a queue similar to that shown in Figure 26 is constructed.
  • For each macroblock, the algorithm checks the queue to see if it has already been packetized. The macroblock is discarded if it has been packetized; otherwise it is inserted into the queue and added to the packet.
  • the depacketization and decoding is a reverse of the packetization process where the queues can be reconstructed.
  • This algorithm is notable for using its execution path as a means of compression.
  • the macroblock positions are not carried by the natural ordering of blocks as used in the conventional H.263 scheme.
  • the position information is not coded either to avoid the large overhead of extra data. Instead the information is carried in the sorting algorithm assisted by the availability of the motion vectors.
  • the decoder and encoder use the same sorting algorithm.
  • Packetization and depacketization algorithms are the reverse of each other. Consequently, the macroblock positions are carried in the sorting algorithm execution path. This is a means of compression using computation process rather than computation result.
  • Conventional compression involves computation as well as computation results. Take DCT transformation for example: the compression is carried out in the transformation and quantization, but the computation path is not necessarily known to the decoding process. This is because the computation process produces a bitstream and the decoding process can look up its dictionary and recover from the bitstream.
  • the compression of the macroblock positions does not produce a bitstream; there is no information transmitted to the decoder for a dictionary lookup. Instead the information is all carried in the sorting algorithm.
  • Macroblock Decoding and Reconstruction Order: As stated previously, the decoding of a coded P macroblock is a two-step process. First the difference between the macroblock and its reference blocks is decoded, then a reconstruction process occurs in which the difference is added to the reference block pointed to by the motion vectors.
  • Frame based H.263 decoding bundles these two steps together because all the reference blocks are available immediately after decoding. Packetizing the macroblocks horizontally eliminates the natural decoding order of the macroblocks. Since packets can be delivered out of order, reference macroblocks may not arrive in time for decoding. In such situations, the reconstruction step has to be delayed; decoding and reconstruction, which are bundled in the frame based decoding process, must be separated.
  • the skipped blocks in the frame based H.263 coding scheme are represented by one COD bit, signaling that these macroblocks are not coded. Reconstruction simply copies the corresponding macroblocks in the reference frame. These skipped macroblocks, when present in a dependency tree, still need to be coded to represent the macroblock position in the dependency tree. However, skipped macroblocks that are not part of the dependency tree can be discarded. Discarded macroblocks can be treated as lost macroblocks, so accuracy is not affected.
  • Figure 27 shows how skipped macroblocks can be pruned from the dependency tree.
  • The circled nodes in the figure are skipped P blocks. Since they are leaf nodes and do not carry position information, they can be pruned from the dependency tree and the decoder can treat them as lost blocks.
  • Video Sequence Selection: To test the efficiency of the packetization scheme, we selected 10 video sequences: 5 video sequences from the standard H.263 test sequences from bonde.nta.no, and 5 from the Vosaic [LLC96] video library. The video sequences are selected intuitively by visual appearance to cover a wide range of videos, from low motion talking head videos to rapid motion music videos and movie previews. Table 18 presents the selected video sequences and their descriptions.
  • the video sequences are analyzed and categorized according to their motion levels.
  • The number of blocks of each kind is counted and averaged. Finally the average P frame size is used to reflect the overall motion level; a small frame size indicates a dominant number of skipped P blocks. A large size results from a large number of coded P blocks and I blocks. Video sequences with similar P frame sizes and similar average numbers of coded P blocks and I blocks are regarded as being at the same motion level. Table 19 shows the result of this motion level measurement.
  • The categorization confirms the visual perception. The measurement method divides the video sequences into three categories. The first three video sequences are talking head videos with low motion. P frame sizes are below 150 bytes, and no or very few blocks are coded as I blocks, indicating no scene change. The average number of P blocks is below 30, suggesting that the movement is slow.
  • The second group of videos also has few I blocks, but it has substantially more coded P blocks than the first group. Although this group has no or few scene changes, each scene exhibits more motion.
  • The videos in the third group have numerous I blocks and P blocks, indicating many scene changes and high motion levels within each scene.
  • The three selected video sequences are first coded with a regular H.263 codec.
  • The first frame is coded as an I frame and all subsequent frames are coded as P frames.
  • A frame based packetization scheme is used and the robustness is measured under different packet loss rates. Table 20 gives the resulting PSNR values, which are also plotted in Figure 28.
  • The original coding and primitive packetization are extremely vulnerable to packet losses. Packet loss of more than 5% causes the medium and high motion level videos to deteriorate. The low motion video shows slightly better results, but it still cannot tolerate packet loss of more than 10%.
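  • For readers unfamiliar with the metric, the following is a minimal sketch of the standard PSNR definition for 8-bit samples, PSNR = 10 log10(peak^2 / MSE). The text does not state how its PSNR values were computed (for example, which colour planes were used or how frames were averaged), so the function name `psnr` and its flat-list interface are assumptions made for illustration.

```python
import math


def psnr(original, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists.

    `original` and `decoded` are flat sequences of pixel values; `peak`
    is the maximum possible sample value (255 for 8-bit video).
    """
    if len(original) != len(decoded) or len(original) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / mse)
```

  • A higher value means the decoded frame is closer to the original, which is why quality differences between schemes are quoted in dB.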
  • The Miss America sequence shows the best quality at all loss rates. Throughout the Miss America video sequence, the frames do not change much and the changes are not significant; therefore, the loss of these updates has relatively little impact and the playback can maintain slightly better quality.
  • For the medium and high motion videos, the two sequences show similar quality degradation for loss rates in the range of 5% to 10%. However, for loss rates in the range of 10% to 40%, the medium motion Jesus to a Child sequence degrades more sharply, which can be explained by the propagation of packet loss damage.
  • FIG. 30: The video sequence is severely damaged by the packet loss and its propagation.
  • The three selected sequences are coded and tested using the new hybrid codec and dependency isolation packetization method. Different distance factors are used for each video.
  • The video sequences are tested against different packet loss rates. Since the video sequences have relatively few frames, each combination of loss rate and distance factor is tested multiple times (10 times or more) with different random seeds.
  • Tables 21-23 show the results from using the hybrid coding scheme and the dependency isolation packetization scheme with different distance factors under different packet loss rates. Figures 31-33 plot these results. At each packet loss rate, the new scheme consistently outperforms the basic scheme by 2 dB to 10 dB. The new scheme shows better robustness and improves the overall playback quality dramatically.
  • The tolerable packet loss rates are pushed up.
  • With the basic scheme, the Miss America sequence can have reasonable quality at packet loss rates from 0% to less than 10%.
  • With the new scheme, the range is extended to 0% to nearly 50%.
  • For the Jesus to a Child sequence, the basic method experiences a sharp drop in quality for the packet loss range of 10% to 40%.
  • The new scheme holds off the quality drop up to 50%, so acceptable playback quality can be achieved in the range from 0% to nearly
  • Figure 34 shows the effects on the Jesus to a Child sequence under 40% loss with the new scheme. This shows that the packetization scheme effectively controls the propagation of loss damage. Even with the high motion video sequence Energy, reasonable playback quality can be achieved in the range of 0% to 20%, improving on the original 0% to 5% interval.
  • The distance factor determines how often an I frame is coded in the bitstream: the more I frames in the bitstream, the more robust the bitstream is against packet loss. However, because I frames are larger than P frames, small distance factors yield large bit rates. For example, as described earlier, changing the distance factor in the Miss America video sequence from 10 to 5 increases the bit rate by 43%. Under a given bit rate restriction, selecting a reasonable distance factor is therefore important.
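  • The bit rate tradeoff can be made concrete with a simple model. This is only a sketch under stated assumptions: it assumes that a distance factor d means one I frame followed by d - 1 P frames, and that all I frames have the same size S_I and all P frames the same size S_P. Neither assumption is confirmed by the text, and the symbols are introduced here for illustration.

```latex
% Average coded size per frame for distance factor d (assumed model)
\bar{S}(d) = \frac{S_I + (d-1)\,S_P}{d}

% Relative bit rate when the distance factor is halved from 10 to 5
\frac{\bar{S}(5)}{\bar{S}(10)}
  = \frac{(S_I + 4 S_P)/5}{(S_I + 9 S_P)/10}
  = \frac{2\,(S_I + 4 S_P)}{S_I + 9 S_P}
```

  • Under this model the ratio approaches 1 when S_I is close to S_P and approaches 2 when S_I is much larger than S_P, which is why reducing the I frame size makes small distance factors affordable.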
  • Figures 35-37 show the PSNR measurements of the three test videos under different loss rates as a function of the distance factor. For a given loss rate and a desired PSNR value, a distance factor can be determined.
  • The distance factor should be 9 or less.
  • If the network packet loss rate is around 10%, then to get a desired PSNR of 28, a distance factor of 4 or less should be used.
  • The packetization algorithm relies on the macroblock level dependency structure, which normally forms dependency trees.
  • The dependency tree of a macroblock is largest when its motion vector points to a region covering four macroblocks.
  • Another possible way to reduce dependency between packets at the coding level is to restrict the motion vector so that fewer macroblocks in the reference frame will be involved, as shown in Figure 38.
  • This restriction may result in motion vectors that are not optimal for the compression ratio. Again, this is a tradeoff between the compression ratio and making the coding scheme easier to packetize and transmit. This argues for an encoding scheme that takes network transmission into consideration and provides flexibility for the packetization process.
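  • The relationship between a motion vector and the number of reference macroblocks it involves can be sketched as follows. The sketch assumes 16x16 macroblocks and integer-pixel motion vectors (half-pixel vectors are ignored), and clips the result to the frame. The function name and its parameters are hypothetical and chosen for illustration only.

```python
MB_SIZE = 16  # macroblock width and height in pixels (assumed)


def reference_macroblocks(mb_col, mb_row, mv_x, mv_y, cols, rows):
    """Return the set of (col, row) reference macroblocks that the
    prediction region of one macroblock overlaps.

    (mb_col, mb_row) is the macroblock position, (mv_x, mv_y) the motion
    vector in whole pixels, and cols x rows the frame size in macroblocks.
    """
    left = mb_col * MB_SIZE + mv_x
    top = mb_row * MB_SIZE + mv_y
    right = left + MB_SIZE - 1
    bottom = top + MB_SIZE - 1
    covered = set()
    for row in {top // MB_SIZE, bottom // MB_SIZE}:
        for col in {left // MB_SIZE, right // MB_SIZE}:
            if 0 <= col < cols and 0 <= row < rows:
                covered.add((col, row))
    return covered
```

  • In this sketch a zero vector involves one reference macroblock, a vector with one non-aligned component involves two, and a vector with two non-aligned components involves four, so constraining one or both components reduces the fan-out of the dependency tree.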
  • A goal of the present invention is to produce a low bit rate and robust video coding and transmission scheme.
  • This thesis makes three major contributions to the field. First, it proposes a practical approach to the transmission problem by characterizing the behavior of Internet video traffic. Video traffic experiments are conducted over a WAN to study the network transmission delay and loss behavior. The results are then used to guide the design and implementation of efficient coding and transmission schemes for digital video over the Internet.
  • A second major contribution of the present invention is an efficient macroblock level packetization scheme that minimizes the packet loss damage.
  • Traditional packetization algorithms are frame or GOB based. They do not take into consideration the dependency between macroblocks. The resulting packets may be heavily dependent on each other. As a result, when some packets are lost during the transmission, the arrived packets may also be damaged since they may need to use the lost macroblocks as references.
  • The packetization scheme described in this thesis breaks the natural macroblock ordering and packetizes macroblocks according to their dependency relationships. It packetizes macroblocks horizontally and minimizes the dependency between packets. When packets are lost, the damage is minimized.
  • The packetization algorithm introduces minimum bit rate overhead by implicit coding of the macroblock positions, and also avoids transmitting skipped P macroblocks.
  • The scheme overcomes a number of difficulties incurred by macroblock packetization and decoding by modifying the original frame based encoding and decoding scheme to be more macroblock level based.
  • A coding scheme based on macroblocks provides efficiency, flexibility, and robustness for network transmission.
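  • One plausible reading of dependency-based packetization is sketched below. It is not the patented algorithm: it simply groups macroblocks that are linked by reference dependencies and then fills packets with whole groups, so that a lost packet does not damage macroblocks carried in other packets. The function names, the string macroblock identifiers, and the byte budget are all assumptions made for illustration.

```python
def dependency_groups(deps):
    """Group macroblocks connected by reference dependencies.

    `deps` maps a macroblock id to the list of ids it references.
    Returns a list of sets, one per connected group (union-find).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for mb, refs in deps.items():
        find(mb)
        for ref in refs:
            parent[find(mb)] = find(ref)

    groups = {}
    for mb in list(parent):
        groups.setdefault(find(mb), set()).add(mb)
    return list(groups.values())


def pack_groups(groups, sizes, max_bytes):
    """Fill packets with whole dependency groups, first-fit in sorted order.

    A group larger than `max_bytes` gets a packet of its own; splitting
    such a group is outside the scope of this sketch.
    """
    packets, current, used = [], [], 0
    for group in sorted(groups, key=sorted):
        need = sum(sizes[mb] for mb in group)
        if current and used + need > max_bytes:
            packets.append(current)
            current, used = [], 0
        current.extend(sorted(group))
        used += need
    if current:
        packets.append(current)
    return packets
```

  • With two independent dependency chains, for example, the sketch places each chain in its own packet once the byte budget is exceeded, so losing one packet leaves the other chain fully decodable.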
  • A third contribution of the present invention is the design and implementation of a hybrid wavelet/H.263 coding scheme.
  • Robust transmission requires more I frames in the video sequence to provide resynchronization points.
  • H.263 is very efficient in inter-frame coding.
  • The coding of the I frame has not been improved much, and the I frame is usually large in size.
  • The discrepancy between I frame size and P frame size prevents the insertion of more I frames in the video sequence, since the large I frame size increases the bit rate dramatically.
  • Wavelet coding is efficient for still image and intra-frame coding. However, it has difficulty performing inter-frame and difference coding. Wavelet coding is also more computation intensive than simple DCT-based coding.
  • This invention implements a hybrid coding scheme using wavelet for I frame coding and H.263 for inter-frame coding, which is ideal for low bit rate and robust transmission.
  • The hybrid coding scheme is implemented to reduce the bit rate while increasing the robustness of the coding scheme.
  • The macroblock based packetization scheme also helps to split the video stream into a higher priority layer, which carries the important information, and an enhancement layer, which carries the improvements.
  • The prioritized portion can be transmitted in the allocated bandwidth band and the enhancement layer in the best effort channel.
  • Macroblock packetization can split the video stream into two parts, which can minimize the damage caused by packet loss in the best effort band.
  • Macroblock Level Analysis in Object Identification. Macroblock dependency analysis as described above is useful in identifying moving objects and moving regions. The identification of moving objects enables a number of interesting applications, such as video hyperlinks and layered object coding and transmission.
  • Video hyperlinks allow interactive and non-linear access to video data.
  • Macroblock level motion vector analysis allows the identification of object outlines. The outlines can be associated with a hyperlink, which upon a mouse click leads to another video or other hyperdocuments.
  • Traditional video hyperlinks like those implemented in Hyper-G [AKM95] and Vosaic [CTCL95] rely on user assistance in identifying interesting objects. Usually the identification is primitive. Interesting objects, often moving objects, are given rectangular outlines to indicate that they are hyper objects. The process for making these outlines is not easy. Macroblock level motion identification can assist the user and automate the process.
  • A second application is object layering.
  • Current trends in video coding allow layering of objects in the video sequence.
  • The objects can be coded using different resolutions and compression ratios [Chi96]. While this idea comes from the traditional video coding camp as a way to increase the compression ratio, it is also useful in video transmission.
  • Layered object coding allows differentiation of objects and the packetization and transmission algorithm can assign different priority to the objects depending upon which are of current interest. It also preserves bandwidth. This can be exemplified by a situation where a person is talking in front of a black board. If the person is of current interest, the transmission algorithm can assign higher priority to the packets which capture the movement of the person. On the other hand, if the writing on the board is important, then packets capturing the board are assigned higher priority and the packets capturing the person can be dropped.
  • [AMV96] E. Amir, S. McCanne, and M. Vetterli. A Layered DCT Coder for Internet Video. In Proc. of ICIP'96, Lausanne, Switzerland, 1996.
  • Usenet comp.compression Newsgroup. Usenet comp.compression Frequently Asked Questions. Available from: ftp://rtfm.mit.edu/pub/usenet/news.answers/compression-faq, 1996.
  • [CTCL95] Z. Chen, S. Tan, R. Campbell, and Y. Li. Real time video and audio in the World Wide Web. In Proc. Fourth International World Wide Web Conference, 1995.
  • ATM networks: Credit update protocol, adaptive credit allocation, and statistical multiplexing. In Proc. SIGCOMM'94, 1994.
  • Ron Frederick. Experiences with software real time video compression. Technical report, Xerox Palo Alto Research Center, July 1992. Available on the WWW via ftp://parcftp.xerox.com/pub/net-research/nv-paper.ps.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
EP98939098A 1997-07-28 1998-07-28 A robust, reliable compression and packetization scheme for transmitting video Withdrawn EP1032881A4 (de)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US5387197P 1997-07-28 1997-07-28
US53871P 1997-07-28
PCT/US1998/015564 WO1999005602A1 (en) 1997-07-28 1998-07-28 A robust, reliable compression and packetization scheme for transmitting video

Publications (2)

Publication Number Publication Date
EP1032881A1 (de) 2000-09-06
EP1032881A4 (de) 2005-08-31

Family

ID=21987115

Family Applications (1)

Application Number Title Priority Date Filing Date
EP98939098A Withdrawn EP1032881A4 (de) 1997-07-28 1998-07-28 A robust, reliable compression and packetization scheme for transmitting video

Country Status (3)

Country Link
EP (1) EP1032881A4 (de)
AU (1) AU8759198A (de)
WO (1) WO1999005602A1 (de)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1454492A4 (de) * 2001-07-18 2005-01-05 Polycom Inc System and method for improving the quality of video communication over a packet-based network
JP4040577B2 (ja) * 2001-11-26 2008-01-30 Koninklijke Philips Electronics N.V. Schema, parsing method, and method for generating a bitstream based on a schema
EP1661317B1 (de) 2003-08-26 2007-12-26 Nxp B.V. Data separation and fragmentation in a wireless network to improve video performance
EP2019522B1 (de) 2007-07-23 2018-08-15 Polycom, Inc. Apparatus and method for packet loss recovery with congestion avoidance
EP2101503A1 (de) 2008-03-11 2009-09-16 British Telecommunications Public Limited Company Video coding
EP2200319A1 (de) 2008-12-10 2010-06-23 BRITISH TELECOMMUNICATIONS public limited company Multiplexed video streaming
EP2219342A1 (de) 2009-02-12 2010-08-18 BRITISH TELECOMMUNICATIONS public limited company Video streaming
CN112822549B (zh) * 2020-12-30 2022-08-05 Peking University Video stream decoding method, system, terminal and medium based on slice reorganization
CN112788336B (zh) * 2020-12-30 2023-04-14 Beijing Institute of Big Data Research Sorting and restoration method, system, terminal and marking method for data elements
CN112822488B (zh) * 2020-12-30 2023-04-07 Peking University Video encoding and decoding system, method, device, terminal and medium based on block reorganization
CN112788344B (zh) * 2020-12-30 2023-03-21 Beijing Institute of Big Data Research Video decoding method, device, system, medium and terminal based on coding unit reorganization

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5572333A (en) * 1993-07-16 1996-11-05 Pioneer Electronic Corporation Compressed data recording method using integral logical block size and physical block size ratios
US5629736A (en) * 1994-11-01 1997-05-13 Lucent Technologies Inc. Coded domain picture composition for multimedia communications systems
WO1997017797A2 (en) * 1995-10-25 1997-05-15 Sarnoff Corporation Apparatus and method for quadtree based variable block size motion estimation

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5615287A (en) * 1994-12-02 1997-03-25 The Regents Of The University Of California Image compression technique
JP2842796B2 (ja) * 1994-12-06 1999-01-06 Fujitsu Limited Method and apparatus for encrypting moving images, and method and apparatus for decrypting encrypted moving images

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LEE H-J ET AL: "AN ERROR CONCEALMENT ALGORITHM FOR WAVELET-CODED IMAGES OVER PACKET-SWITCHED NETWORKS" PROCEEDINGS OF THE SPIE, SPIE, BELLINGHAM, VA, US, vol. 2667, 29 January 1996 (1996-01-29), pages 222-233, XP000770115 ISSN: 0277-786X *
MASAHIRO WADA: "SELECTIVE RECOVERY OF VIDEO PACKET LOSS USING ERROR CONCEALMENT" IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, IEEE INC. NEW YORK, US, vol. 7, no. 5, 1 June 1989 (1989-06-01), pages 807-814, XP000036749 ISSN: 0733-8716 *
See also references of WO9905602A1 *

Also Published As

Publication number Publication date
EP1032881A4 (de) 2005-08-31
WO1999005602A1 (en) 1999-02-04
AU8759198A (en) 1999-02-16

Similar Documents

Publication Publication Date Title
US6680976B1 (en) Robust, reliable compression and packetization scheme for transmitting video
McCanne et al. Low-complexity video coding for receiver-driven layered multicast
Wang et al. Error control and concealment for video communication: A review
Radha et al. Scalable internet video using MPEG-4
RU2385541C2 (ru) Resizing of the buffer in an encoder and decoder
CN1242623C (zh) Video coding method, decoding method, and related encoder and decoder
EP1439705A2 (de) Apparatus for processing, transmitting and receiving moving image data
Farber et al. Extensions of ITU-T recommendation H. 324 for error-resilient video transmission
CN101729898B (zh) Video encoding and decoding methods and video encoding and decoding devices
EP1479244B1 (de) Unequal error protection of video based on motion vector characteristics
JP2008306734A (ja) Video decoder, method for concealing errors therein, and method for generating video images
WO2000072599A1 (en) Media server with multi-dimensional scalable data compression
Wakeman Packetized video—options for interaction between the user, the network and the codec
Wenger et al. Using RFC2429 and H. 263+ at low to medium bit-rates for low-latency applications
Parthasarathy et al. Design of a transport coding scheme for high-quality video over ATM networks
WO1999005602A1 (en) A robust, reliable compression and packetization scheme for transmitting video
Willebeek-LeMair et al. Bamba—Audio and video streaming over the Internet
JP2005033556A (ja) Data transmission device, data transmission method, data reception device, data reception method
Chen Coding and transmission of digital video on the Internet
Varadarajan et al. An adaptive, perception-driven error spreading scheme in continuous media streaming
Pereira et al. Multiple description coding for internet video streaming
Hassan et al. Adaptive and ubiquitous video streaming over Wireless Mesh Networks
Heybey Video coding and the application level framing protocol architecture
Han et al. Multi-resolution layered coding for real-time image transmission: Architectural and error control considerations
Dong et al. User-Oriented Qualitative Communication for JPEG/MPEG Packet Transmission

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20000303

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): DE FR GB

RIC1 Information provided on ipc code assigned before grant

Ipc: 7H 04N 7/26 B

Ipc: 7H 04N 7/64 B

Ipc: 7G 06F 13/00 A

A4 Supplementary search report drawn up and despatched

Effective date: 20050711

17Q First examination report despatched

Effective date: 20080129

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20100402