WO2007020233A2

WO2007020233A2 - Method for encoding and decoding interlaced high-resolution and progressive low-resolution images

Info

Publication number: WO2007020233A2
Application number: PCT/EP2006/065228
Authority: WO
Inventors: Gwenaelle Marquant; Jérome Vieron; Patrick Lopez
Original assignee: Thomson Licensing
Priority date: 2005-08-18
Filing date: 2006-08-10
Publication date: 2007-02-22
Also published as: WO2007020233A3

Abstract

The inventive method consists in encoding a low-resolution image (2) to deliver an encoded low-resolution image, decoding (3) the low-resolution image to deliver a reconstructed low-resolution image, upsampling (4) the reconstructed image to deliver a prediction image, separating (5) the reconstructed upsampled image into even and odd prediction frames, deinterleaving (6) a high-resolution image to deliver even and odd high-resolution frames, and encoding (7, 8) the even or odd high-resolution frame by computing the difference between the high-resolution frame and the prediction frame of the same parity, so as to obtain a residue to be encoded.

Description

METHOD FOR ENCODING AND DECODING INTERLACED HIGH-RESOLUTION AND LOW-RESOLUTION PROGRESSIVE IMAGES

The invention relates to a video coding and decoding method and device with spatial, temporal and/or SNR scalability between interlaced and progressive video. It relates more particularly to the coding of high resolution interlaced images and low resolution progressive images. The ratio between the resolutions may, moreover, be non-dyadic.

The field is that of data compression according to the MPEG standards, in particular the MPEG4-SVC standard currently being defined, described for example in the document by J. Reichel, M. Wien and H. Schwarz entitled "Scalable video model 3.0", ISO/IEC JTC1/SC29/WG11/N6716, Palma de Mallorca, Spain, October 2004.

The problem is to be able to generate, for the coding of a video content, a scalable bit stream that can be decoded at some resolutions in interlaced mode and at others in progressive mode, where the high/low resolution ratio is not necessarily equal to 2 and may differ horizontally and vertically. Different scenarios can be envisaged, for example:

- enhancement layer relative to the original video in 1080i 60 Hz, and base layer in 720p 60 Hz or 30 Hz,

- enhancement layer in 1080i 60 Hz, and base layer in SDi 60 Hz,

- enhancement layer in 1080i 60 Hz, and base layer in 1080p 30 Hz.

The suffix i denotes an interlaced scan and the suffix p a progressive scan.
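As a rough illustration of why these scenarios are non-dyadic, the sketch below computes the horizontal and vertical resolution ratios for two of them. The 1920x1080 and 1280x720 picture sizes are those given later in the description; the 720x480 size used for the SD case is an assumption made for this example only, and the function names are hypothetical.

```python
from fractions import Fraction

def resolution_ratios(high, low):
    """Return (horizontal, vertical) high/low ratios for (width, height) pairs."""
    return Fraction(high[0], low[0]), Fraction(high[1], low[1])

def is_dyadic(ratio):
    """True when the ratio is a power of two (1, 2, 4, ...)."""
    n, d = ratio.numerator, ratio.denominator
    return d == 1 and n & (n - 1) == 0

# 1080i enhancement layer over a 720p base layer.
h, v = resolution_ratios((1920, 1080), (1280, 720))
print(h, v, is_dyadic(h))

# 1080i over SD; 720x480 is an assumed SD size, not taken from the text.
h_sd, v_sd = resolution_ratios((1920, 1080), (720, 480))
print(h_sd, v_sd, h_sd == v_sd)
```

In the first case both ratios are 3/2; in the assumed SD case the horizontal and vertical ratios differ, which is the situation the text describes as "distinct horizontally and vertically".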

With known schemes, the compatible compression of such scenarios is not achieved efficiently in terms of compression ratio, image quality or simplicity of implementation.

One of the aims of the invention is to overcome the aforementioned drawbacks. The subject of the invention is a method for coding video images with spatial scalability, coding a first low-resolution progressive image and at least one second, higher-resolution interlaced image from the low resolution image, the first image having a video portion in common with the second image, characterized in that it comprises:

a step of encoding the low resolution image to provide a coded low resolution image,

a step of decoding the low resolution image to provide a reconstructed low resolution image,

a step of oversampling the reconstructed image to provide a prediction image,

a step of separating the oversampled reconstructed image into an even prediction frame and an odd prediction frame,

a step of deinterleaving said second image to provide an even high resolution frame and an odd high resolution frame,

a coding step of an even or odd high resolution frame performing a difference calculation between a high resolution frame and the same parity prediction frame, to give a residue to be coded.
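The steps listed above can be sketched on small images stored as lists of rows. This is only a sketch under stated assumptions: `fake_codec_roundtrip` (a coarse quantizer) stands in for a real base-layer coder and decoder, and `upsample_nearest` stands in for the oversampling step; both names and behaviours are hypothetical and are not the methods used in the invention.

```python
def split_fields(image):
    """Separate an image into (even-line frame, odd-line frame)."""
    return image[0::2], image[1::2]

def fake_codec_roundtrip(image, step=8):
    """Stand-in for encode + decode: quantize samples to multiples of step."""
    return [[(p // step) * step for p in row] for row in image]

def upsample_nearest(image, factor=2):
    """Stand-in oversampler: repeat each sample and each line factor times."""
    out = []
    for row in image:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out

def field_residue(hr_field, pred_field):
    """Residue to be coded: high resolution frame minus same-parity prediction."""
    return [[h - p for h, p in zip(hr, pr)]
            for hr, pr in zip(hr_field, pred_field)]

def encode_enhancement(hr_interlaced, lr_progressive):
    recon_lr = fake_codec_roundtrip(lr_progressive)   # encode, then decode
    prediction = upsample_nearest(recon_lr)           # oversampling
    pred_even, pred_odd = split_fields(prediction)    # even/odd prediction frames
    hr_even, hr_odd = split_fields(hr_interlaced)     # deinterleaving
    return (field_residue(hr_even, pred_even),
            field_residue(hr_odd, pred_odd))

hr = [[10, 12, 20, 22], [11, 13, 21, 23], [30, 32, 40, 42], [31, 33, 41, 43]]
lr = [[11, 21], [31, 41]]
res_even, res_odd = encode_enhancement(hr, lr)
print(res_even)
print(res_odd)
```

The point of the sketch is the parity matching: the even high resolution frame is only ever differenced against the even prediction frame, and likewise for the odd one.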

According to a particular implementation, it comprises a step of deinterleaving said second image and a step of sub-sampling the deinterleaved image to provide a first low resolution progressive image to be encoded.

According to a particular implementation, the step of encoding a high resolution frame uses a prediction frame derived from a low resolution image synchronous with the high resolution frame to be coded, or a prediction frame derived from a low resolution image synchronous with a high resolution frame preceding or following the current high resolution frame to be coded.

According to one particular implementation, the oversampling step is performed according to the ESS (Extended Spatial Scalability) method.

According to one particular implementation, the decoding of the low resolution image corresponds to the calculation of the local decoded image, providing the reconstructed image during the coding step of the low resolution image.

The invention also relates to a method for decoding images, characterized in that it comprises:

a step of decoding a progressive low resolution image, performing a calculation of a reconstructed image,

a step of oversampling the reconstructed image to obtain an oversampled image,

a step of deinterleaving the oversampled image to obtain a predicted frame,

a step of decoding a higher resolution frame by adding the corresponding residue to the predicted frame of the same parity.
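The decoding steps can be sketched in the same way. The sketch assumes the oversampled reconstructed image and the two decoded residues are already available as lists of rows; the final re-interleaving of the two frames into one interlaced image is an assumption added for illustration and is not one of the listed steps. All function names are hypothetical.

```python
def split_fields(image):
    """Deinterleave an image into (even-line frame, odd-line frame)."""
    return image[0::2], image[1::2]

def add_residue(pred_field, residue):
    """Reconstruct a frame: same-parity prediction plus decoded residue."""
    return [[p + r for p, r in zip(pr, rr)]
            for pr, rr in zip(pred_field, residue)]

def interleave(even_field, odd_field):
    """Weave two frames with equal line counts back into one image."""
    out = []
    for even_line, odd_line in zip(even_field, odd_field):
        out.append(even_line)
        out.append(odd_line)
    return out

def decode_enhancement(upsampled_recon, residue_even, residue_odd):
    pred_even, pred_odd = split_fields(upsampled_recon)
    return interleave(add_residue(pred_even, residue_even),
                      add_residue(pred_odd, residue_odd))

upsampled = [[8, 8], [8, 8], [24, 24], [24, 24]]
print(decode_enhancement(upsampled, [[2, 4], [6, 8]], [[3, 5], [7, 9]]))
```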

According to a particular implementation, the oversampling step is performed according to the ESS (Extended Spatial Scalability) method.

The invention consists of a global architecture for processing video texture data available in interlaced form so as to provide spatial, temporal and quality (SNR) scalability. Given an interlaced video source, suitable data separation and spatio-temporal filtering operations make it possible to generate scalable video sequences at sub-resolutions.

The interlaced video source is partitioned into two layers:

a base layer, consisting of a completely scalable description, i.e. scalable in the spatial, temporal and quality domains, of the video source in progressive mode. This base layer describes the video at low spatial resolution (LR),

an enhancement layer, which is also scalable and which allows the perfect reconstruction of the original video in high spatial resolution interlaced mode when it is associated with the base layer.

The adopted algorithmic solution makes it possible to suitably deal with the cases of spatial scalability between a high interlaced resolution and a low progressive resolution, the image frequency or "frame rate" remaining however identical.

Other features and advantages of the invention will become clear in the following description given by way of non-limiting example and with reference to the appended figures which represent:

- Figure 1, an encoding architecture,

- Figure 2, the evolution of the format of the video,

- Figure 3, the prediction modes for the high frame of the source image,

- Figure 4, the prediction modes for the low frame of the source image.

The source video sequence is in interlaced mode with a 2 MHz frequency resolution 2MxN.

The complete architecture of the encoding is shown in Figure 1.

The process can be broken down into several steps as described below. The video source is first deinterlaced by the deinterleaving and subsampling circuit, referenced 1 in the figure, to obtain a progressive mode image of the same width and height as the interlaced source. The resulting progressive sequence is then subsampled, by this same circuit 1, to the desired low spatial resolution (LR), to provide the base layer.

This subsampling uses known filtering methods. For example, after applying a low-pass filter, subsampling can be performed with a separable linear filter whose coefficients correspond to a cardinal sine (sinc) weighted by an attenuation window. For example, the following filter can be used:

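[Filter equation not recoverable from the source text; the only legible fragment is a condition of the form "< 2".] Here x is the fractional part of the position of the pixel to interpolate in the source image.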

A half-band low-pass filter can be used. The separable filter proposed for information purposes for the MPEG-4 standard and defined in the document ISO/IEC JTC1/SC29/WG11 N3515, entitled "Generic Extended Spatial Scalability", can be applied. This filter of length 13 has the coefficients (2, 0, -4, -3, 5, 19, 26, 19, 5, -3, -4, 0, 2), each coefficient being divided by 128. Low-pass filtering limits the risk of spectral aliasing and is carried out before the subsampling stage.
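A minimal one-dimensional sketch of low-pass filtering followed by subsampling with the 13 coefficients quoted above. Several choices here are assumptions for illustration: edge samples are replicated at the borders, the subsampling factor is fixed at 2 (the text allows non-dyadic ratios), and the divisor is exposed as a parameter. Note that the thirteen coefficients sum to 64, so with the stated divisor of 128 a constant signal comes out at half its level, whereas a divisor of 64 preserves it.

```python
TAPS = [2, 0, -4, -3, 5, 19, 26, 19, 5, -3, -4, 0, 2]

def lowpass(signal, taps=TAPS, divisor=128):
    """Apply the symmetric FIR filter, replicating samples at the borders."""
    half = len(taps) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0
        for k, coeff in enumerate(taps):
            j = min(max(i + k - half, 0), last)  # clamp index to the signal
            acc += coeff * signal[j]
        out.append(acc / divisor)
    return out

def downsample_by_2(signal, divisor=128):
    """Low-pass filter first, then keep every second sample."""
    return lowpass(signal, divisor=divisor)[::2]

flat = [128] * 20
print(sum(TAPS))
print(downsample_by_2(flat)[:3])
print(downsample_by_2(flat, divisor=64)[:3])
```

For a two-dimensional image, a separable filter of this kind would be applied along rows and then along columns.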

Then, this base layer is encoded by a coder, referenced 2, using encoding software, for example MPEG4-AVC as described in the ITU-T and ISO/IEC JTC1 document entitled "Advanced video coding for generic audiovisual services", ITU-T Recommendation H.264 - ISO/IEC 14496-10 AVC, 2003. This subsampled layer corresponds to the base layer in progressive mode.

The next steps predict the texture for the high resolution (HR). For this, the bit stream constituting the base layer is decoded by the decoder associated with the coding software previously used, referenced 3, giving a reconstructed base layer denoted r_BL. The latter is then oversampled by the ratio needed to recover the dimensions of the high resolution video. This is the purpose of the oversampling circuit 4. The oversampling is performed with the ESS (Extended Spatial Scalability) method, described for example in document M11957 of ISO/IEC JTC1/SC29/WG11, entitled "Generic Extended Spatial Scalability", by François, Vieron, Marquant, Burdin and Lopez, Busan, 2005. Thus, in the case 1080i 60 Hz - 720p 30 Hz, the ESS process is applied with a ratio of 3/2.

The video obtained is, however, in progressive mode. Consequently, a circuit 5 for separating the even and odd lines splits each progressive image into two frames, one formed by the even lines and the other by the odd lines of the progressive image, denoted r_BL Up TOP and r_BL Up BOT respectively.

These frames are used when coding each frame of the source interlaced image, the latter frames being obtained by deinterleaving the source image in the deinterleaver circuit 6, to provide the enhancement layer. Thus, the encoder 7 and the encoder 8 respectively code the low frame and the high frame of the source image, using, for one coding mode, the corresponding low resolution reconstructed frames. The even or high frame is coded first and can be used as a reference frame for coding the odd or low frame.

Figure 2 shows how the format of the video evolves after each block of operations, in the case 1080i 60 Hz - 720p 30 Hz. The interlaced source image of 1080 lines of 1920 pixels at a frequency of 60 Hz, the so-called 1080i60 format, is deinterleaved and downsampled to provide progressive images of 720 lines of 1280 pixels at a frequency of 30 Hz, the so-called 720p30 format, for the coding of the base layer. The reconstructed images are oversampled to provide an image r_BL Up in 1080p30 format. From this image are extracted a high frame r_BL Up TOP and a low frame r_BL Up BOT, each in 540p30 format, which are the prediction frames for the coding of the frames of the source image.

Figures 3 and 4 show the source image prediction modes used by encoders 7 and 8 for coding the enhancement layer.
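The format evolution described for Figure 2 can be tabulated with a small sketch. The `VideoFormat` record and the `figure2_chain` function are hypothetical names introduced for this example; the numbers are those quoted in the text.

```python
from typing import NamedTuple

class VideoFormat(NamedTuple):
    width: int
    lines: int
    scan: str      # "i" for interlaced, "p" for progressive
    rate_hz: int

def figure2_chain(source, base):
    """Formats after oversampling the base layer and splitting it by line parity."""
    upsampled = VideoFormat(source.width, source.lines, "p", base.rate_hz)
    field = VideoFormat(source.width, source.lines // 2, "p", base.rate_hz)
    return {
        "source": source,
        "base layer": base,
        "r_BL Up": upsampled,
        "r_BL Up TOP": field,
        "r_BL Up BOT": field,
    }

chain = figure2_chain(VideoFormat(1920, 1080, "i", 60),
                      VideoFormat(1280, 720, "p", 30))
for name, fmt in chain.items():
    print(name, fmt)
```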

The texture of the enhancement layer is encoded from a prediction texture for the high or even frame and a prediction texture for the low or odd frame.

The prediction of the high frame, also called "top field", depends on the prediction modes chosen, among others:

- intra-layer prediction modes, for example as described in MPEG4-AVC (H.264/AVC), represented by the dashed lines in Figure 3. This is a temporal prediction from a previous or next high resolution frame of identical or opposite parity,

- inter-layer prediction modes of the high frame (i.e. top field), where the prediction is performed between the "high frame" Top_input and r_BL Up TOP as shown in Figure 2, these frames being named T_ipt and T_up respectively in Figure 3. These modes, symbolized by solid lines in Figure 3, are called temporal prediction modes for frames at different times, i.e. between T_ipt(k) and T_up(k-1) or T_up(k+1), and synchronous prediction modes for frames at the same time k, i.e. between T_ipt(k) and T_up(k).

The prediction of the low frame, also called "bottom field", depends on the prediction modes chosen, among others:

- intra-layer prediction modes, represented by dashed lines in Figure 4. This is a temporal prediction from a previous or next high resolution frame of identical or opposite parity,

- inter-layer prediction modes of the low frame (i.e. bottom field), where the prediction is made between the HR frame Bot_input and the odd frame from the circuit 5, or the even frame from the circuit 5 via the encoder 8, as represented in Figure 1 for the encoder 7. In Figures 4 and 2 these frames are called, respectively, "low frame" or B_ipt for the HR frame, r_BL Up BOT corresponding to B_up for the odd frame, and r_BL Up TOP corresponding to T_up for the even frame. These modes, symbolized by solid lines in Figure 4, are called temporal prediction modes for staggered frames, because of the time shift ts between even and odd frames, i.e. between B_ipt(k+ts) and T_up(k) and between B_ipt(k+ts) and B_up(k). They are called synchronous prediction modes for frames at the same time k+ts, i.e. between B_ipt(k+ts) and the frames Bm(k+ts) and Tm(k+ts). These last frames correspond respectively to interpolations between B_up(k) and B_up(k+1) and between T_up(k) and T_up(k+1).
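The text does not specify how Bm and Tm are interpolated. As a sketch only, the example below assumes a linear blend between the frames at times k and k+1, with the time shift ts expressed as a fraction of the base-layer picture period (0.5 by default). The function name and the weighting are assumptions.

```python
def interpolate_field(field_k, field_k1, ts=0.5):
    """Assumed linear blend: (1 - ts) * field(k) + ts * field(k + 1)."""
    return [[(1 - ts) * a + ts * b for a, b in zip(row_k, row_k1)]
            for row_k, row_k1 in zip(field_k, field_k1)]

b_up_k = [[0, 10], [20, 40]]
b_up_k1 = [[10, 30], [20, 80]]
bm = interpolate_field(b_up_k, b_up_k1)   # candidate Bm(k + ts)
print(bm)
```

The same function applied to T_up(k) and T_up(k+1) would give a candidate Tm(k+ts) under the same assumption.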

At the decoder, the base layer is first decoded. Then, the reconstructed images are oversampled and divided with the same methods as those used at the encoder. Once these prediction frames have been obtained, the enhancement layer data relating to the corresponding high resolution frames are decoded and added in order to reconstruct the content of the high resolution video.

The decoder performs the inverse operations of the encoder. The residue information of the enhancement layer is added to the data corresponding to the prediction texture. The prediction image used is the one defined, for the macroblock being processed, by the coding information of the received data stream. Thus, the image relating to the base layer is oversampled, and the oversampled image is separated into two frames, as described in Figure 2. The temporal interpolation operations that produce the prediction frames Tm and Bm are carried out in the same way.

Given a video sequence, several solutions are possible for managing high resolution and low resolution interlaced data. Among these solutions:

- A first, simplified solution where only the frames are taken into account: this is the frame mode or "Field" mode, as proposed in Figure 1.

- A second solution, called PAFF coding (Picture Adaptive Frame/Field), where each interlaced high resolution image is either coded as frames, as in Figure 1, or coded as an image, by temporal prediction from the other high resolution interlaced images or by spatial prediction from the oversampled base layer.

- Another solution, called MBAFF (MacroBlock Adaptive Frame/Field), which uses the same approach as PAFF coding but at the level of each macroblock.

The choice among these different coding modes, combined with our coding method, is carried out in a conventional manner, for example as a function of the coding cost.
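A cost-based choice can be sketched as follows. The text does not define the cost; the sum of absolute differences (SAD) of the residue is used here as a stand-in, whereas a real encoder would typically weigh rate against distortion. The candidate names and sample values are hypothetical.

```python
def sad(field, prediction):
    """Sum of absolute differences, used here as a stand-in coding cost."""
    return sum(abs(a - b)
               for row_f, row_p in zip(field, prediction)
               for a, b in zip(row_f, row_p))

def choose_mode(field, candidates):
    """Return the name of the candidate prediction with the lowest cost."""
    return min(candidates, key=lambda name: sad(field, candidates[name]))

current = [[10, 12], [30, 32]]
candidates = {
    "synchronous inter-layer": [[9, 12], [30, 33]],
    "temporal inter-layer": [[4, 20], [25, 40]],
}
print(choose_mode(current, candidates))
```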

The MBAFF and PAFF coding modes are known and are described, for example, in the H.264 standard.

Claims

1. A method of encoding video images with spatial scalability, encoding a first low resolution progressive image and at least one second, higher resolution interlaced image from the low resolution image, the first image having a video part in common with the second image, characterized in that it comprises:

a coding step of the low-resolution image (2) to provide an encoded low-resolution image,

a step of decoding the low resolution image (3) to provide a reconstructed low resolution image,

a step of oversampling (4) the reconstructed image to provide a prediction image,

a step of separating (5) the oversampled reconstructed image into an even prediction frame and an odd prediction frame,

a step of deinterleaving (6) said second image to provide an even high resolution frame and an odd high resolution frame,

a step of coding (7, 8) an even or odd high resolution frame, performing a calculation of the difference between a high resolution frame and the prediction frame of the same parity, to give a residue to be coded.

2. Method according to claim 1, characterized in that it comprises a deinterleaving step (1) of said second image and a step of sub-sampling (1) the deinterleaved image to provide a first low resolution progressive image to be coded.

3. Method according to claim 1, characterized in that the step of encoding a high resolution frame (7, 8) uses a prediction frame from a synchronous low resolution image of the high resolution frame to be coded or a prediction frame from a synchronous low resolution image of a previous or next high resolution frame to the current high resolution frame to be encoded.

4. Method according to claim 1, characterized in that the oversampling step is performed according to the ESS (Extended Spatial Scalability) method.

5. Method according to claim 1, characterized in that the decoding of the low resolution image corresponds to the calculation of the local decoded image to provide the reconstructed image during the coding step of the low resolution image.

6. Method for decoding coded images according to the method of claim 1, characterized in that it comprises:

a step of decoding a progressive low resolution image performing a calculation of a reconstructed image,

a step of oversampling (4) of the reconstructed image to obtain an oversampled image,

a deinterleaving step (5) of the oversampled image to obtain a predicted frame,

a step of decoding a frame of higher resolution by adding to the predicted frame of the same parity, the corresponding residue.

7. Method according to claim 6, characterized in that the oversampling step (4) is performed according to the ESS (Extended Spatial Scalability) method.