AU2012100851A4 - Multiple description video codec based on adaptive temporal sub-sampling - Google Patents

Multiple description video codec based on adaptive temporal sub-sampling Download PDF

Info

Publication number
AU2012100851A4
Authority
AU
Australia
Prior art keywords
video
frame
sampling
decoder
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2012100851A
Inventor
Huihui Bai
Yao Zhao
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiaotong University
Original Assignee
Beijing Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiaotong University filed Critical Beijing Jiaotong University
Priority to AU2012100851A priority Critical patent/AU2012100851A4/en
Application granted granted Critical
Publication of AU2012100851A4 publication Critical patent/AU2012100851A4/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current


Abstract

An effective MD video codec based on adaptive temporal sub-sampling, with the detailed process as follows. Firstly, the inter-frame motion information of the original video is regulated by adaptive temporal sub-sampling. Then the modified video is split into two correlated video sub-sequences, which are compressed as two descriptions and transmitted over separate channels. At the decoder, if both descriptions are received, a video of higher quality is reconstructed by the central decoder. If only one description is received, a video of acceptable quality is reconstructed by the side decoder. The invention supports robust, efficient video transmission over unreliable channels and, because it requires no modification to the current standard video codec, remains compatible with current video standards. (Selected drawing: Figure 1, the framework of the MD encoder and decoder; Figure 2, an example of pre-/post-processing.)

Description

AUSTRALIA Regulation 3.2 Patents Act 1990 Complete Specification Innovation Patent APPLICANT: Beijing Jiaotong University Invention Title: MULTIPLE DESCRIPTION VIDEO CODEC BASED ON ADAPTIVE TEMPORAL SUB-SAMPLING The following statement is a full description of this invention, including the best method of performing it known to me: Multiple Description Video Codec Based on Adaptive Temporal Sub-sampling Field of the Invention The present invention relates to an effective Multiple Description (MD) video codec based on adaptive temporal sub-sampling. Prior Art In recent years, with the development of the Internet and wireless communications, video transmission has been widely used in video telephony, teleconferencing, distance education, telemedicine, advertising, entertainment, information retrieval and other fields. In packet-switching and narrow-band networks, real-time reliable video transmission has become an inevitable demand. However, practical Internet and wireless communication networks are not very reliable. On the Internet, channel interference, network congestion, routing delay and other problems often result in data errors and packet loss. In wireless communication networks, random bit errors, burst errors and other problems often lead to the loss of a large amount of video data and the complete failure of the video transmission. These problems are fatal for compressed video data because the compressed data usually consist of variable-length codes. Therefore, an error or packet loss leads to error propagation, which not only seriously degrades video quality but can cause the whole video communication system to fail. This has become the bottleneck of real-time video technology over networks. Conventional video coding mainly focuses on improving compression efficiency, and mainly relies on the error-correction ability of channel coding when errors occur during video transmission.
Newly established international video coding standards, such as fine granularity scalable (FGS) coding in MPEG-4, have begun to adopt new coding frameworks that can adapt to network transmission. In FGS, the base layer adopts stronger error protection, such as stronger FEC (Forward Error Correction) and ARQ (Automatic Repeat reQuest), to guarantee robust transmission. However, this method has the following problems. Firstly, an error in the base layer leads to severe quality decline in the reconstructed video. Secondly, the enhancement layer depends on the base layer: if the base layer is lost, the enhancement layer is useless. Finally, repeated ARQ may lead to serious delay, and stronger FEC may
also lead to serious delay due to its complexity, which may further worsen real-time video display. Multiple description (MD) video coding has emerged to address real-time video transmission over unreliable networks. Multiple description coding (MDC) rests on two assumptions. The first is that multiple channels exist between the source and the receiver. The second is that the probability of all channels failing at the same time is very low. MDC encodes the source message into several bit streams (descriptions) carrying different information, which can then be transmitted over the channels. If only one channel works, its description can be decoded individually to guarantee a minimum fidelity in the reconstruction at the receiver. When more channels work, the descriptions from those channels can be combined to yield a higher-fidelity reconstruction. MDC can be used in many fields, such as the Internet, MIMO wireless channels, speech coding, image coding, video coding and other systems. Compared with scalable coding, MDC can support video transmission over networks without priority limitations. The history of MDC can be traced back to the 1970s. To provide uninterrupted telephone service over the telephone network, Bell Laboratories split the speech signal into two signals by odd and even means, which could be transmitted over two separate channels. The corresponding theories of MDC were then developed. When Vaishampayan proposed the first practical MDC scheme based on scalar quantization, MDC research shifted from pure theory to practical systems. Many MDC methods have since emerged, mainly including MDC based on scalar quantization, MDC based on correlating transforms, and MDC based on lattice vector quantization. MDC for video began in the 1990s and mainly includes two approaches: MDC based on the motion-compensation loop, and MDC based on pre-/post-processing.
Since an MD video coding system has both a side decoder and a central decoder, the reference frames used by the MD encoder may differ greatly from those used by the decoder, which leads to the drift problem and worse decoding quality. This is a significant problem in MDC. In MDC based on the motion-compensation loop, part or all of the error information from motion prediction can be treated as redundant information and sent to the decoder, which avoids the drift problem to some extent. However, the decoding quality is improved at the cost of compression efficiency and encoding complexity. Furthermore, it is not compatible with current video standards, so it is hard to use in practical applications. In MDC based on pre-/post-processing, redundant information is added into the original video as pre-processing before encoding. The pre-processed video is then split into multiple video sub-sequences, and each sub-sequence is encoded and decoded independently. Therefore, it solves the drift problem efficiently. At the decoder, the more video sub-sequences are received, the better the reconstruction quality that can be achieved. Furthermore, this method is compatible with current standards, so it is promising for practical applications. In pre-/post-processing based on spatial sampling, the DCT (Discrete Cosine Transform) is first applied to each frame of the original video. Redundancy is then added into the original video by zero-padding the DCT coefficients. After the inverse DCT, each pre-processed frame can be split into two descriptions by spatial sampling. As an example, the pixels from odd rows and odd columns together with those from even rows and even columns can be organized as one description, while the remaining pixels form the other description. These two descriptions can be compressed independently by a standard encoder.
At the decoder, if both descriptions are received, the reconstructed video is obtained by reversing the process of the MD encoder. If only one description is received, the lost pixels are reconstructed by duplicating their neighboring pixels in the same column. In conventional temporal sampling, the original video is split into two sub-sequences by odd and even means, and each sub-sequence is encoded and decoded independently. Although no redundancy is added into the original video, the temporal correlation between frames may be destroyed. As mentioned above, redundancy allocation is very significant to the performance of an MD codec. More redundancy means more correlation between descriptions, which helps side reconstruction but often lowers compression efficiency. Therefore, the design of MDC aims at efficient redundancy allocation, which can achieve a better trade-off between reconstruction quality and compression efficiency. However, in current methods redundancy is added in the frequency domain or the spatial domain: the motion information of the original video is not taken into account, and the temporal correlation of the video sequence is neglected, which does not help motion estimation and motion compensation. This leads to low compression efficiency and makes it hard to estimate the lost information at the side decoder. Accordingly, to solve these problems, the present inventors propose an effective MD video codec based on adaptive temporal sub-sampling. Summary of the Invention The object of the present invention is to provide an effective MD video codec based on adaptive temporal sub-sampling. Its characteristics are as follows. The inter-frame motion information of the original video is regulated by adaptive temporal sub-sampling. The modified video is split into two correlated video sub-sequences, which are compressed as two descriptions and transmitted over channels.
At the decoder, if both descriptions are received, a video of higher quality is reconstructed by the central decoder. If only one description is received, a video of acceptable quality is reconstructed by the side decoder. The motion information of the original video sequence is regulated as follows. If the motion is smooth, down-sampling is adopted to realize frame skipping. If the motion is sudden and obvious, up-sampling is adopted to realize frame interpolation. For any two neighboring frames, the motion vectors of all macroblocks are computed and the maximum modulus is obtained. The maximal motion vector is then compared with a threshold, which is used to judge the motion intensity. The number of skipped and interpolated frames is even. The modified video is split into two correlated video sub-sequences by odd and even means, which are compressed as two descriptions and transmitted over channels. The characteristics of the side decoder are as follows. 1) If the label of the current frame is original but the label of the following frame is interpolation, the current frame is simply the reconstructed one. 2) If the label of the current frame is interpolation but the label of the following frame is original, the current frame is the interpolated frame and is regarded as the reconstructed one. 3) If the labels of the current frame and the following frame are both interpolation, the continuous frames labeled interpolation are merged into one reconstructed frame. 4) If the labels of the current frame and the following frame are both original, a new frame is interpolated between these two frames. Compared with current methods, the present invention improves not only compression performance but also visual quality.
According to the motion information, the temporal correlation of the video sub-sequences can be maintained for efficient motion estimation and motion compensation, which improves compression efficiency. Furthermore, better temporal correlation helps the estimation of lost information at the side decoder, which improves the reconstruction quality there. Because the invention is based on pre-/post-processing, the current standard source codec and channel codec are not modified. Therefore, the proposed MD video codec is compatible with the video standard, which makes it promising for practical applications. Brief Description of Accompanying Drawings Figure 1 shows the framework of the MD encoder and decoder in this invention; Figure 2 shows an example of pre-/post-processing; Figure 3a shows a rate-distortion comparison with spatial sampling when two descriptions are received; Figure 3b shows a rate-distortion comparison with spatial sampling when one description is received; Figure 4a shows a rate-distortion comparison with conventional temporal sampling when two descriptions are received; Figure 4b shows a rate-distortion comparison with conventional temporal sampling when one description is received. Detailed Description of Preferred Embodiments In Figure 1, an effective MD video codec based on adaptive temporal sub-sampling is proposed. Firstly, the inter-frame motion information of the original video is regulated by pre-processing. By odd and even means, the modified video is split into two correlated video sub-sequences with smoother motion. These two sub-sequences are compressed by a standard encoder as two descriptions and transmitted over channels. At the decoder, if both descriptions are received, the video of higher quality is reconstructed by interleaving the odd and even frames.
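The odd/even splitting at the encoder and the interleaving at the central decoder described above can be sketched as follows (a minimal illustration treating frames as opaque items, not actual decoded pictures):

```python
def split_odd_even(frames):
    """Split a (pre-processed) frame sequence into two descriptions.

    Description 1 takes the 1st, 3rd, 5th, ... frames; description 2
    takes the 2nd, 4th, 6th, ... frames.
    """
    return frames[0::2], frames[1::2]

def interleave(desc1, desc2):
    """Central-decoder realignment: interleave the two descriptions
    back into a single sequence in display order."""
    merged = []
    for i in range(max(len(desc1), len(desc2))):
        if i < len(desc1):
            merged.append(desc1[i])
        if i < len(desc2):
            merged.append(desc2[i])
    return merged

frames = list(range(10))          # stand-in for 10 video frames
d1, d2 = split_odd_even(frames)
assert interleave(d1, d2) == frames   # central decoder restores the order
```

With compression ignored, splitting followed by interleaving is an exact round trip, which is why the central decoder only needs the labels to undo the adaptive sampling afterwards.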
In post-processing, frame interpolation is used to estimate the lost information, and down-sampling is used to remove the redundant frames. (1) Pre-processing In pre-processing, adaptive redundancy is added by variable-frame-rate sampling. In other words, redundant frames are interpolated adaptively between frames with sudden and obvious motion. Sampling includes up-sampling and down-sampling. For any two neighboring frames, the motion vectors of all macroblocks are computed and the maximum modulus is obtained. This maximum is then compared with a threshold T, which is used to judge the motion intensity. The threshold is a statistical result derived from the maximum moduli of the motion vectors between all adjacent frames; for a given video sequence, the threshold is fixed. If the current maximum modulus is smaller than the threshold, the motion is smooth, so down-sampling is adopted to realize frame skipping. Otherwise, if the current maximum modulus is larger than the threshold, the motion is sudden and obvious, so up-sampling is adopted to realize frame interpolation. To keep the two descriptions balanced, the number of skipped and interpolated frames is even. The labels '1', '0' and '2' are used to distinguish original frames, interpolated frames and skipped frames, respectively. These labels can be compressed by entropy coding; in fact, compared with the total bit rate of the proposed scheme, the size of the labels is negligible. (2) Post-processing In post-processing, a central decoder and a side decoder are designed. Central decoder Since the two descriptions are generated by odd and even means, at the central decoder the video streams from the standard decoder are first interleaved and realigned in the same way.
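The pre-processing decision rule above — compare the maximum motion-vector modulus of a neighboring frame pair against the threshold T — can be sketched in Python. This is a minimal illustration: the motion field and threshold values below are hypothetical, and in a real encoder the vectors would come from block-based motion estimation.

```python
import math

def max_mv_modulus(motion_vectors):
    """Largest motion-vector magnitude over all macroblocks of a frame pair.

    motion_vectors: iterable of per-macroblock (dx, dy) vectors.
    """
    return max(math.hypot(dx, dy) for dx, dy in motion_vectors)

def sampling_decision(motion_vectors, threshold):
    """Temporal-sampling decision for one pair of neighboring frames.

    Max modulus above the threshold means sudden/obvious motion, so
    up-sample (interpolate a frame); otherwise the motion is smooth,
    so down-sample (skip a frame).
    """
    if max_mv_modulus(motion_vectors) > threshold:
        return "interpolate"
    return "skip"

# Hypothetical motion field: mostly small vectors plus one large one.
mvs = [(0.5, 0.5), (1.0, 0.0), (6.0, 8.0)]    # max modulus = 10.0
print(sampling_decision(mvs, threshold=5.0))   # -> interpolate
print(sampling_decision(mvs, threshold=12.0))  # -> skip
```

The same comparison, applied per frame pair with a sequence-wide fixed T, yields the '1'/'0'/'2' label stream that the decoders later consume.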
According to the labels, the interpolated frames can be down-sampled and the skipped frames interpolated to obtain the central reconstruction. Side decoder If only one channel works, that is, the side decoder is employed, all skipped frames with label '2' are first interpolated and their labels reset to '1'. Four possibilities then exist at the side decoder. 1) If the current label is '1' but the following label is '0', the represented frame is simply the reconstructed one. 2) If the current label is '0' but the following label is '1', the represented frame is the interpolated frame and is regarded as the reconstructed one. 3) If the current label is '0' and the following label is also '0', the continuous frames labeled '0' are merged into one reconstructed frame. 4) If the current label is '1' and the following label is also '1', a new frame is interpolated between the two frames labeled '1'. In Figure 2, a simple example illustrates the pre- and post-processing. The original video sequence has 10 frames, denoted F1 to F10. After pre-processing, the motion-modified video has 16 frames. Figure 2 shows that an even number of frames is interpolated adaptively, such as two frames between F1 and F2 and four frames between F4 and F5; at the same time, an even number of frames is skipped adaptively, such as the two frames F5 and F8. After splitting by odd and even means, the generated descriptions are denoted video on channel 1 and video on channel 2, and their labels are '101001201' and '011002101', respectively. At the receiver, the skipped frames with label '2' are first reconstructed and their labels reset to '1', so the labels are updated to '101001101' on channel 1 and '011001101' on channel 2, respectively. When only channel 1 works, the reconstruction from side decoder 1 is achieved as shown in the figure.
The two interpolated frames between F3 and F5 are merged into a new reconstructed frame, while a new frame is interpolated between F5 and F7 to estimate the lost F6. When only channel 2 works, the two interpolated frames between F4 and F6 are merged into a new reconstructed frame, while a new frame is interpolated between F6 and F8 to estimate the lost F7. On the other hand, if both channels work, all frames except the two skipped frames F6 and F7 can be recovered without loss, apart from the compression by the standard codec. Figures 3 and 4 show the experimental results on the standard video sequences "Foreman" and "Coastguard" in QCIF format. In Figure 3, compared with spatial sampling, the proposed scheme achieves about 1-2 dB gain in central reconstruction and about 1-3 dB in side reconstruction at the same bit rate. In Figure 4, compared with conventional temporal sampling, the proposed scheme achieves about 0.8-1.5 dB gain in central reconstruction and about 0.5-1 dB in side reconstruction at the same bit rate. At a bit rate of 140 kbps, the proposed scheme also achieves better visual quality than conventional temporal sampling. Throughout this specification and the claims which follow, unless the context requires otherwise, the word "comprise", and variations such as "comprises" and "comprising", will be understood to imply the inclusion of a stated integer or step or group of integers or steps but not the exclusion of any other integer or step or group of integers or steps. The reference to any prior art in this specification is not, and should not be taken as, an acknowledgement or any form of suggestion that the prior art forms part of the common general knowledge in Australia.
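The '2'-to-'1' relabeling and the four side-decoder cases described above can be sketched as a label-driven reconstruction planner. This is a sketch under assumed semantics: no frames are actually decoded, only the per-frame actions are derived, and the label string used is the channel-1 example from Figure 2.

```python
def side_decode_actions(labels):
    """Plan side-decoder reconstruction from per-frame labels.

    labels: string over {'1','0','2'} meaning original / interpolated /
    skipped. Skipped frames are first interpolated and relabeled '1';
    the four specification cases are then applied pairwise.
    """
    labels = labels.replace('2', '1')   # step 1: restore skipped frames
    actions = []
    i = 0
    while i < len(labels):
        if labels[i] == '1':
            actions.append('keep')      # cases 1 and 4: frame is kept as-is
            if i + 1 < len(labels) and labels[i + 1] == '1':
                # case 4: insert a new frame between two originals
                actions.append('interpolate-between')
            i += 1
        else:
            # cases 2 and 3: a lone '0' stands for itself; a run of '0's
            # is merged into a single reconstructed frame
            run = 1
            while i + run < len(labels) and labels[i + run] == '0':
                run += 1
            actions.append('merge-run' if run > 1 else 'use-interpolated')
            i += run
    return actions

# Channel-1 labels from the Figure 2 example.
print(side_decode_actions('101001201'))
```

Tracing '101001201' by hand gives '101001101' after relabeling, then keep / use-interpolated / keep / merge-run / keep / interpolate-between / keep / use-interpolated / keep — the merge of the two interpolated frames between F3 and F5 and the new frame between F5 and F7, matching the narrative above.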

Claims (5)

1. An MD video codec based on adaptive temporal sub-sampling, characterized in that motion information of an inter-frame in an original video is regulated by adaptive temporal sub-sampling; a modified video is split into two correlated video sub-sequences, which are compressed as two descriptions and transmitted over channels; at a decoder, if two descriptions are received, then a video with higher quality is reconstructed using a central decoder; and if only one description is received, then a video with acceptable quality is reconstructed using a side decoder.
2. The MD video codec according to Claim 1, characterized in that in regulating the motion information of an original video sequence, if the motion information is smooth, down-sampling is adopted to realize frame skipping; and if the motion information is sudden and obvious, up-sampling is adopted to realize frame interpolation; or in the side decoder, if a label of a current frame is original but a label of a following frame is interpolation, then the current frame is just the reconstructed one; if the label of the current frame is interpolation but the label of the following frame is original, the current frame is the interpolated frame and is regarded as the reconstructed one; if the label of the current frame is interpolation and the label of the following frame is also interpolation, the continuous frames represented by interpolation are merged into a reconstructed frame; and if the label of the current frame is original and the label of the following frame is also original, a new frame is interpolated between the two frames.
3. The MD video codec according to Claim 2, characterized in that for any two neighboring frames, all motion vectors for each macroblock are computed and the maximum modulus is obtained; then the maximal motion vector is compared with a threshold T, which is used to judge a motion intensity.
4. The MD video codec according to Claim 2, characterized in that the number of frame skipping and frame interpolation is even.
5. The MD video codec according to Claim 1, characterized in that the modified video is split into two correlated video sub-sequences by odd and even means, which are compressed as two descriptions and transmitted over channels.
AU2012100851A 2012-06-08 2012-06-08 Multiple description video codec based on adaptive temporal sub-sampling Ceased AU2012100851A4 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2012100851A AU2012100851A4 (en) 2012-06-08 2012-06-08 Multiple description video codec based on adaptive temporal sub-sampling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2012100851A AU2012100851A4 (en) 2012-06-08 2012-06-08 Multiple description video codec based on adaptive temporal sub-sampling

Publications (1)

Publication Number Publication Date
AU2012100851A4 true AU2012100851A4 (en) 2012-07-12

Family

ID=46634967

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2012100851A Ceased AU2012100851A4 (en) 2012-06-08 2012-06-08 Multiple description video codec based on adaptive temporal sub-sampling

Country Status (1)

Country Link
AU (1) AU2012100851A4 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109963157A (en) * 2017-12-14 2019-07-02 达音网络科技(上海)有限公司 Integer Multiple Description Coding
CN109963157B (en) * 2017-12-14 2023-04-25 达音网络科技(上海)有限公司 Method for encoding a media stream and decoding a plurality of descriptions for a media bitstream

Similar Documents

Publication Publication Date Title
KR101055738B1 (en) Method and apparatus for encoding/decoding video signal using prediction information of intra-mode macro blocks of base layer
JP5118127B2 (en) Adaptive encoder assisted frame rate upconversion
US8619854B2 (en) Scalable video encoding and decoding method using switching pictures and apparatus thereof
CN100512446C (en) A multi-description video encoding and decoding method based on self-adapted time domain sub-sampling
JP4981927B2 (en) CAVLC extensions for SVCCGS enhancement layer coding
US20070009039A1 (en) Video encoding and decoding methods and apparatuses
US20050094726A1 (en) System for encoding video data and system for decoding video data
US8218619B2 (en) Transcoding apparatus and method between two codecs each including a deblocking filter
JP2007520149A (en) Scalable video coding apparatus and method for providing scalability from an encoder unit
KR20070100081A (en) Method and apparatus for encoding and decoding fgs layers using weighting factor
CN101626512A (en) Method and device of multiple description video coding based on relevance optimization rule
US9538185B2 (en) Multi-description-based video encoding and decoding method, device and system
CN100555332C (en) Use comprises that the prediction of a plurality of macro blocks and nonanticipating picture are to picture sequence Methods for Coding and device
CN105103554A (en) Method for protecting video frame sequence against packet loss
WO2013145021A1 (en) Image decoding method and image decoding apparatus
JP2005535219A (en) Method and apparatus for performing multiple description motion compensation using hybrid prediction code
US20060093031A1 (en) Method and apparatus for performing multiple description motion compensation using hybrid predictive codes
AU2012100851A4 (en) Multiple description video codec based on adaptive temporal sub-sampling
WO2013097548A1 (en) Rotation-based multiple description video coding and decoding method, apparatus and system
KR101774237B1 (en) Method for coding image using adaptive coding structure selection and system for coding image using the method
Huo et al. Tree-structured multiple description coding for multiview mobile tv and camera-phone networks
Zhang et al. Adaptive multiple description video coding and transmission for scene change
Bai et al. Entropy analysis on multiple description video coding based on pre-and post-processing
Bai et al. Multiple description video coding based on lattice vector quantization
Liu et al. Low-delay distributed multiple description coding for error-resilient video transmission

Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry