WO2008128388A1

WO2008128388A1 - Method for encoding video data in a scalable manner

Info

Publication number: WO2008128388A1
Application number: PCT/CN2007/002031
Authority: WO
Inventors: Lihua Zhu; Jiheng Yang; Zhibo Chen
Original assignee: Thomson Licensing
Priority date: 2007-04-18
Filing date: 2007-06-29
Publication date: 2008-10-30
Also published as: EP2160902A4; CN101663893A; CN101653002A; BRPI0721501A2; KR20100015642A; US20100142613A1; EP2160902A1; CN101663893B; JP2010531554A

Abstract

A method for encoding video data in a scalable manner according to H.264/SVC standard comprises the steps of: inserting a scalable nesting Supplemental Enhancement Information message for each layer of the data stream comprising at least one reference to the layer and a link to a Supplemental Enhancement Information message, following the nesting Supplemental Enhancement Information message, inserting the Supplemental Enhancement Information message for each scalable nesting Supplemental Enhancement Information message comprising the video usability information for the layer.

Description

Method for encoding video data in a scalable manner

FIELD OF THE INVENTION

The invention concerns a method to encode video data in a scalable manner.

BACKGROUND OF THE INVENTION

The invention concerns mainly the field of video coding when data can be coded in a scalable manner. Coding video data according to several layers can be of great help when the terminals for which the data are intended have different capacities and therefore cannot decode the full data stream but only part of it. When the video data are coded according to several layers in a scalable manner, the receiving terminal can extract from the received bit-stream the data according to its profile.

Several video coding standards exist today which can code video data according to different layers and/or profiles. Among them, one can cite H.264/SVC, also referenced as ITU-T H.264 standard.

However, one existing problem is the overhead created by transmitting more data than the receiving end often needs.

Indeed, for instance in H.264/SVC or MVC (SVC standing for scalable video coding and MVC standing for multi-view video coding), the transmission of several layers requires the transmission of many headers in order to transmit all the parameters required by the different layers. In the current release of the standard, one header comprises the parameters corresponding to all the layers. So, when only the base layer needs to be transmitted, all the information related to the enhancement layers still has to be transmitted. This places a heavy load on the network, since the parameters for all the layers are transmitted even when the devices to which the data are addressed do not request the data of all the layers.

The invention proposes to solve at least one of these drawbacks.

SUMMARY OF THE INVENTION

To this end, the invention proposes a method for encoding video data in a scalable manner according to H.264/SVC standard. According to the invention, the method comprises the steps of

- inserting a scalable nesting Supplemental Enhancement Information message for each layer of the data stream comprising at least one reference to the layer and a link to a Supplemental Enhancement Information message,

- following the nesting Supplemental Enhancement Information message, inserting said Supplemental Enhancement Information message for each scalable nesting Supplemental Enhancement Information message comprising the video usability information for said layer.

According to a preferred embodiment, the Supplemental Enhancement Information message comprises a reference to the Sequence Parameter Set (SPS) that said layer is linked to.

According to a preferred embodiment, the Supplemental Enhancement Information message comprises the video usability information as defined in the H.264/SVC standard.

In some coding methods, the parameters for all the layers are transmitted as a whole, no matter how many layers are transmitted. This places a heavy load on the network. The main reason is that some of the parameters are layer dependent while others are common to all layers; since one header is defined for all parameters, the layer-dependent and layer-independent parameters are all transmitted together.

Thanks to the invention, the layer-dependent parameters are transmitted only when needed, that is, when the data coded according to these layers are transmitted, instead of transmitting the whole header comprising the parameters for all the layers.

BRIEF DESCRIPTION OF THE DRAWINGS

Other characteristics and advantages of the invention will appear through the description of a non-limiting embodiment of the invention, illustrated with the help of the enclosed drawings:

- Figure 1 represents the structure of the NAL unit used for scalable layers coding according to the prior art,

- Figure 2 represents an embodiment of the structure as proposed in the current invention,

- Figure 3 represents an overview of the scalable video coder according to a preferred embodiment of the invention,

- Figure 4 represents an overview of the data stream according to a preferred embodiment of the invention,

- Figure 5 represents an example of a bitstream according to a preferred embodiment of the invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

According to the preferred embodiment described here, the video data are coded according to H.264/SVC. SVC proposes the transmission of video data according to several spatial levels, temporal levels, and quality levels. For one spatial level, one can code according to several temporal levels and, for each temporal level, according to several quality levels. Therefore, when m spatial levels, n temporal levels and O quality levels are defined, the video data can be coded according to m*n*O different levels. According to the client capabilities, different layers are transmitted up to a certain level corresponding to the maximum of the client capabilities.
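As a rough illustration of the m*n*O count, the following Python sketch enumerates every combination of spatial, temporal and quality level. The function name and the example values of m, n and O are illustrative assumptions, not values taken from the standard.

```python
from itertools import product

def enumerate_levels(m, n, o):
    """Return every (spatial, temporal, quality) triple, numbered from 1."""
    return list(product(range(1, m + 1), range(1, n + 1), range(1, o + 1)))

# Illustrative values only: 3 spatial, 3 temporal and 2 quality levels.
levels = enumerate_levels(3, 3, 2)
print(len(levels))  # 3 * 3 * 2 = 18
```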

As shown on figure 1, which represents the prior art, currently in SVC the SPS is a syntax structure which contains syntax elements that apply to zero or more entire coded video sequences, as determined by the content of a seq_parameter_set_id syntax element found in the picture parameter set referred to by the pic_parameter_set_id syntax element found in each slice header. In SVC, the values of some syntax elements conveyed in the SPS are layer dependent. These syntax elements include, but are not limited to, the timing information, the HRD (standing for "Hypothetical Reference Decoder") parameters and the bitstream restriction information. Therefore, it is necessary to allow the transmission of the aforementioned syntax elements for each layer. One Sequence Parameter Set (SPS) comprises all the needed parameters for all the corresponding spatial (Di), temporal (Ti) and quality (Qi) levels, whether or not all the layers are transmitted.

The SPS comprises the VUI (standing for Video Usability Information) parameters for all the layers. The VUI parameters represent a large quantity of data, as they comprise the HRD parameters for all the layers. In practical applications, as the channel rate is constrained, only certain layers are transmitted through the network. As the SPS represents a basic syntax element in SVC, it is transmitted as a whole. Therefore, no matter which layer is transmitted, the HRD parameters for all the layers are transmitted.

As shown on figure 2, in order to reduce the overhead of the Sequence Parameter Set (SPS) for scalable video coding, the invention proposes to use a nesting_sei prefix/suffix NAL and to store the VUI parameters in a SEI message. The scalable_nesting, also called nesting SEI (and represented as NSEI on the drawings), acts as a header of a prefix/suffix type NAL unit indicating the layer information. The scalable_nesting is linked, thanks to the vui_parameter_sei() field, to the vui_parameter_sei message comprising all the properties of the layer specified by the nesting SEI. The following table 1 illustrates the scalable_nesting as defined by the prefix/suffix NAL.

Table 1

A scalable nesting SEI message concerns an access unit. When present, this SEI message appears before any VCL NAL unit of the corresponding access unit. The scalable nesting SEI is contained in a NAL unit. The scope to which the nested SEI message applies is indicated by the syntax element all_pictures_in_au_flag and, when present, by the syntax elements num_pictures_minus1, dependency_id[ i ] and quality_id[ i ].

- all_pictures_in_au_flag equal to 1 indicates that the nested SEI message applies to all the coded pictures of the access unit. all_pictures_in_au_flag equal to 0 indicates that the applicable scope of the nested SEI message is signaled by the syntax elements num_pictures_minus1, dependency_id[ i ] and quality_id[ i ].

- num_pictures_minus1 plus 1 indicates the number of coded pictures to which the nested SEI message applies.

- dependency_id[ i ] and quality_id[ i ] indicate, respectively, the dependency_id (spatial level) and the quality_id of the i-th coded picture to which the nested SEI message applies.

- sei_nesting_zero_bit is equal to 0.
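The scope rule described by these syntax elements can be sketched in Python as follows. This is only an in-memory model under stated assumptions: the class name, the `pictures` list and the `applies_to` method are hypothetical helpers, and the bit-level layout of Table 1 is not reproduced.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ScalableNesting:
    """Hypothetical in-memory view of the scalable nesting SEI fields."""
    all_pictures_in_au_flag: int
    # (dependency_id, quality_id) pairs; only consulted when the flag is 0.
    pictures: List[Tuple[int, int]] = field(default_factory=list)

    def applies_to(self, dependency_id: int, quality_id: int) -> bool:
        """True if the nested SEI message applies to the given coded picture."""
        if self.all_pictures_in_au_flag == 1:
            return True  # applies to every coded picture of the access unit
        return (dependency_id, quality_id) in self.pictures

# Nested message scoped to the single picture with dependency_id 1, quality_id 0.
nesting = ScalableNesting(all_pictures_in_au_flag=0, pictures=[(1, 0)])
print(nesting.applies_to(1, 0))  # True
print(nesting.applies_to(2, 0))  # False
```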

The following table illustrates the sei message containing the parameters specific to each layer.

Table 2

The sequence_parameter_set_id identifies the sequence parameter set (SPS) to which the current vui_parameter_sei message maps and which includes the common sequence parameter properties for the current layer.

The other parameters mentioned in table 2 are defined in the standard H.264/SVC.
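The link from a vui_parameter_sei message back to its SPS can be sketched in Python as a simple lookup. The class names, the dictionary-based fields and the placeholder values below are assumptions made for illustration; only seq_parameter_set_id and sequence_parameter_set_id are names taken from the text.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class Sps:
    seq_parameter_set_id: int
    common: Dict[str, object]        # properties shared by all layers (hypothetical field)

@dataclass
class VuiParameterSei:
    sequence_parameter_set_id: int   # identifies the SPS this message maps to
    vui: Dict[str, object]           # layer-specific VUI values (hypothetical field)

def layer_parameters(sei: VuiParameterSei, sps_table: Dict[int, Sps]) -> Dict[str, object]:
    """Combine the common SPS properties with the layer-specific VUI values."""
    sps = sps_table[sei.sequence_parameter_set_id]
    merged = dict(sps.common)
    merged.update(sei.vui)
    return merged

# Placeholder values only; real fields would follow Table 2.
sps_table = {0: Sps(0, {"common_property": "shared"})}
sei = VuiParameterSei(0, {"hrd_parameters": "layer-specific"})
params = layer_parameters(sei, sps_table)
```

With this arrangement, a receiver that never obtains the vui_parameter_sei of a layer simply never holds the layer-specific values for it, while the common SPS properties remain available.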

The following table 3 illustrates the modification to be done to the existing definition of the sei_payload as currently defined in the standard H.264/SVC. The vui_parameter_sei is defined as being of type 30. In other embodiments of the invention, it can be any other field which is still made available by the standard H.264/SVC.

Table 3
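The effect of this modification can be pictured as one more entry in a payload-type dispatch. The Python sketch below is an assumption-laden illustration: the parser body is a placeholder, and every name except the payload type value 30 and vui_parameter_sei is hypothetical.

```python
VUI_PARAMETER_SEI_TYPE = 30  # payload type chosen in this embodiment

def parse_vui_parameter_sei(payload: bytes) -> dict:
    # Placeholder: a real parser would decode the fields listed in Table 2.
    return {"kind": "vui_parameter_sei", "size": len(payload)}

# Hypothetical dispatch table keyed by SEI payload type.
SEI_PARSERS = {VUI_PARAMETER_SEI_TYPE: parse_vui_parameter_sei}

def sei_payload(payload_type: int, payload: bytes) -> dict:
    """Route a SEI payload to its parser, or report it as unhandled."""
    parser = SEI_PARSERS.get(payload_type)
    if parser is None:
        return {"kind": "unhandled", "payload_type": payload_type}
    return parser(payload)

result = sei_payload(30, b"\x00\x01")
```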

Figure 3 shows an embodiment of a scalable video coder 1 according to the invention. A video is received at the input of the scalable video coder 1.

The video is coded according to different spatial levels. Spatial levels mainly refer to different levels of resolution of the same video. For example, at the input of a scalable video coder, one can have a CIF sequence (352 by 288) or a QCIF sequence (176 by 144), each of which represents one spatial level.

Each of the spatial levels is sent to a hierarchical motion compensated prediction module. The spatial level 1 is sent to the hierarchical motion compensated prediction module 2", the spatial level 2 is sent to the hierarchical motion compensated prediction module 2' and the spatial level n is sent to the hierarchical motion compensated prediction module 2.

The spatial levels are coded on 3 bits, using the dependency_id; therefore, the maximum number of spatial levels is 8.
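The limit follows directly from the field width, as this short Python check shows. The constant and variable names are illustrative.

```python
DEPENDENCY_ID_BITS = 3  # width of dependency_id stated above

max_spatial_levels = 1 << DEPENDENCY_ID_BITS       # 2 ** 3
valid_dependency_ids = range(max_spatial_levels)   # values 0 through 7
print(max_spatial_levels)  # 8
```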

Once hierarchical motion compensated prediction is done, two kinds of data are generated, one being motion, which describes the disparity between the different layers, the other being texture, which is the estimation error.

For each of the spatial levels, the data are coded according to a base layer and to an enhancement layer. For spatial level 1, data are coded through enhancement layer coder 3" and base layer coder 4"; for spatial level 2, data are coded through enhancement layer coder 3' and base layer coder 4'; for spatial level n, data are coded through enhancement layer coder 3 and base layer coder 4.

After the coding, the headers are prepared: for each spatial layer, an SPS message, a PPS message and several NSEI-VUI_SEI messages are created.

For spatial level 1, as represented on figure 3, SPS and PPS 5" are created and a set of NSEI-VUI_SEI^1_1, NSEI-VUI_SEI^1_2, ..., NSEI-VUI_SEI^1_{n*O} is also created according to this embodiment of the invention.

For spatial level 2, as represented on figure 3, SPS and PPS 5' are created and a set of NSEI-VUI_SEI^2_1, NSEI-VUI_SEI^2_2, ..., NSEI-VUI_SEI^2_{n*O} is also created according to this embodiment of the invention.

For spatial level n, as represented on figure 3, SPS and PPS 5 are created and a set of NSEI-VUI_SEI^n_1, NSEI-VUI_SEI^n_2, ..., NSEI-VUI_SEI^n_{n*O} is also created according to this embodiment of the invention.

The bitstreams encoded by the base layer coding modules and the enhancement layer coding modules follow the plurality of SPS, PPS and NSEI-VUI_SEI headers in the global bitstream.

On figure 3, 8" comprises SPS and PPS 5", NSEI-VUI_SEI^1_1, NSEI-VUI_SEI^1_2, ..., NSEI-VUI_SEI^1_{n*O} 6" and bitstream 7", which constitute all the encoded data associated with spatial level 1.

On figure 3, 8' comprises SPS and PPS 5', NSEI-VUI_SEI^2_1, NSEI-VUI_SEI^2_2, ..., NSEI-VUI_SEI^2_{n*O} 6' and bitstream 7', which constitute all the encoded data associated with spatial level 2.

On figure 3, 8 comprises SPS and PPS 5, NSEI-VUI_SEI^n_1, NSEI-VUI_SEI^n_2, ..., NSEI-VUI_SEI^n_{n*O} 6 and bitstream 7, which constitute all the encoded data associated with spatial level n.

The different NSEI-VUI_SEI headers are compliant with the headers described in the above tables.

Figure 4 represents a bitstream as coded by the scalable video encoder of figure 3.

The bitstream comprises one SPS for each of the spatial levels. When m spatial levels are encoded, the bitstream comprises SPS1, SPS2 and SPSm, represented by 10, 10' and 10" on figure 4.

In the bitstream, each SPS, coding the general information relative to the spatial level, is followed by a header 10 of NSEI-VUI_SEI type, itself followed by the corresponding encoded video data, each corresponding to one temporal level and one quality level.

Therefore, when one level corresponding to one quality level is not transmitted, the corresponding header is not transmitted either, since there is one NSEI-VUI_SEI header corresponding to each level.

So, let's take an example to illustrate the data stream to be transmitted as shown on figure 5.

Figure 5 illustrates the transmission of the following levels. The references indicated in the bitstream correspond to the references used in figure 2.

The following layers are transmitted:

• spatial layer 1
  ■ temporal level 1
    o quality level 1
  ■ temporal level 2
    o quality level 1

• spatial layer 2
  ■ temporal level 1
    o quality level 1

• spatial layer 3
  ■ temporal level 1
    o quality level 1
  ■ temporal level 2
    o quality level 1
  ■ temporal level 3
    o quality level 1

Therefore, one can see that the parameters for all the layers are not all transmitted, but only the ones corresponding to the requested layers, as they are comprised in the NSEI-VUI_SEI messages and no longer in the SPS messages.
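The ordering of this example stream can be sketched in Python. The sketch assumes the layers are listed grouped by spatial layer; the unit labels and the `build_stream` helper are hypothetical, and PPS units are left out for brevity.

```python
# Layers transmitted in the example, as (spatial, temporal, quality) triples.
requested = [
    (1, 1, 1), (1, 2, 1),
    (2, 1, 1),
    (3, 1, 1), (3, 2, 1), (3, 3, 1),
]

def build_stream(layers):
    """Emit one SPS per spatial layer, then one NSEI-VUI_SEI header and the
    coded data for each transmitted (temporal, quality) level."""
    stream = []
    seen_spatial = set()
    for spatial, temporal, quality in layers:
        if spatial not in seen_spatial:
            stream.append(f"SPS{spatial}")
            seen_spatial.add(spatial)
        stream.append(f"NSEI-VUI_SEI(s{spatial},t{temporal},q{quality})")
        stream.append(f"DATA(s{spatial},t{temporal},q{quality})")
    return stream

stream = build_stream(requested)
headers = [unit for unit in stream if unit.startswith("NSEI")]
print(len(headers))  # 6: one header per transmitted level, none for the others
```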

Claims

1. Method for encoding video data in a scalable manner according to H.264/SVC standard characterized in that it comprises the steps of

2. Method according to claim 1 characterized in that said Supplemental Enhancement Information message comprises a reference to the Sequence Parameter Set (SPS) that said layer is linked to.

3. Method according to claim 2 characterized in that said Supplemental Enhancement Information message comprises the video usability information as defined in the H.264/SVC standard.