WO2007063017A1 - Method of predicting motion and texture data - Google Patents

Method of predicting motion and texture data

Info

Publication number
WO2007063017A1
Authority
WO
WIPO (PCT)
Prior art keywords
picture
low resolution
pictures
high resolution
pixels
Prior art date
Application number
PCT/EP2006/068776
Other languages
French (fr)
Inventor
Edouard Francois
Patrick Lopez
Vincent Bottreau
Jérome Vieron
Ying Chen
Original Assignee
Thomson Licensing
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Thomson Licensing
Priority to CN200680045076.XA (CN101322414B)
Priority to EP06807830.2A (EP2005760B1)
Priority to JP2008542726A (JP5037517B2)
Priority to US12/085,791 (US8520141B2)
Publication of WO2007063017A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/59 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/105 Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/157 Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/16 Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter for a given display mode, e.g. for interlaced or progressive display mode
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N19/33 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the spatial domain
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation

Definitions

  • the invention relates to a method for generating, for pictures of a high resolution progressive sequence, at least one motion predictor and, where appropriate, at least one texture predictor from motion data and, where appropriate, texture data associated with pictures of a low resolution interlaced sequence.
  • Scalability represents the ability to stagger information to make it decodable at multiple resolution and/or quality levels. More specifically, a data stream generated by this type of encoding method is divided into several layers, in particular a basic layer and one or more enhancement layers. These methods are used in particular to adapt a single data stream to variable transport conditions (bandwidth, error ratios, etc.), and to the expectations of the customers and the varying capabilities of their receivers (CPU, specifications of the display device, etc.).
  • the part of the data stream corresponding to low resolution pictures of the sequence can be decoded independently of the part of the data stream corresponding to the high resolution pictures.
  • the part of the data stream corresponding to high resolution pictures of the sequence can be decoded only from the part of the data stream corresponding to the low resolution pictures.
  • Hierarchical encoding with spatial scalability makes it possible to encode a first data part called basic layer, relative to the low resolution pictures and, from this basic layer, a second data part called enhancement layer, relative to the high resolution pictures.
  • each macroblock of the high resolution picture is temporally predicted according to a conventional prediction mode (for example, bidirectional prediction mode, direct prediction mode, early prediction mode, etc.) or indeed is predicted according to an inter-layer prediction mode.
  • in the inter-layer prediction mode, motion data (for example, a partitioning of the macroblock into blocks, possibly motion vectors and reference picture indices) and, where appropriate, texture data associated with a block of pixels of the high resolution picture is deduced or inherited from the motion data, respectively texture data, associated with blocks of pixels of a low resolution picture.
  • the known methods do not allow such predictors to be generated in the case where the low resolution sequence is interlaced and the high resolution sequence is progressive.
  • the invention relates in particular to a method for generating for at least one block of pixels of a picture of a sequence of high resolution progressive pictures, called high resolution sequence, at least one motion predictor from motion data associated with the pictures of a sequence of low resolution interlaced pictures, called low resolution sequence, of the same temporal frequency as the high resolution sequence.
  • Each interlaced picture comprises a top field interlaced with a bottom field and can be coded in field mode or in frame mode.
  • Each progressive picture and each field of an interlaced picture has associated with it a temporal reference.
  • At least one motion predictor is generated for a block of pixels of the high resolution picture on the basis of the motion data associated with at least one block of pixels of the top or bottom field of a low resolution picture of the same temporal reference as the high resolution picture if the low resolution picture is coded in field mode. If the high resolution picture is of the same temporal reference as the top field of a low resolution picture and if the low resolution picture is coded in frame mode, a motion predictor is generated for a block of pixels of the high resolution picture on the basis of the motion data associated with at least one block of pixels of the low resolution picture. Otherwise no motion predictor is generated.
  • a motion predictor is generated by sub-sampling the said motion data associated with at least one block of pixels of the top or bottom field of the low resolution picture of the same temporal reference as the high resolution picture with a horizontal inter-layer ratio in the horizontal direction of the picture and a first vertical inter-layer ratio in the vertical direction of the picture.
  • a motion predictor is generated by subsampling the said motion data associated with at least one block of pixels of the low resolution picture with the horizontal inter-layer ratio in the horizontal direction of the picture and a second vertical inter-layer ratio in the vertical direction of the picture.
  • a texture predictor is generated for a block of pixels of the high resolution picture on the basis of the texture data associated with at least one block of pixels of the top or bottom field of a low resolution picture of the same temporal reference as the high resolution picture.
  • the texture predictor is generated by subsampling the said texture data associated with at least one block of pixels of the top or bottom field of the low resolution picture of the same temporal reference as the high resolution picture with the horizontal inter-layer ratio in the horizontal direction of the picture and the first vertical inter-layer ratio in the vertical direction of the picture.
  • the horizontal inter-layer ratio is equal to the width of the high resolution pictures divided by the width of the fields of the low resolution pictures
  • the first vertical inter-layer ratio is equal to the height of the high resolution pictures divided by the height of the fields of the low resolution pictures
  • the second vertical inter-layer ratio is equal to the height of the high resolution pictures divided by the height of the low resolution pictures.
  • the motion data associated with the low resolution pictures comprises motion vectors, and the motion vectors associated with a low resolution picture coded in frame mode, or with each of the top and bottom fields of a low resolution picture coded in field mode, have the same parity.
  • the method is used by a method of encoding high resolution pictures from low resolution pictures.
  • the low resolution pictures are encoded according to the MPEG-4 AVC standard.
  • the method is used by a method of decoding high resolution pictures from low resolution pictures.
  • figure 1 represents an interlaced sequence of low resolution pictures and a progressive sequence of high resolution pictures with the same temporal frequency
  • figure 2 illustrates the method of generating texture predictors according to the invention in the case where the sequence of low resolution pictures is interlaced and the sequence of high resolution pictures is progressive
  • figure 3 illustrates the method of generating motion predictors according to the invention in the case where the sequence of low resolution pictures is interlaced and the sequence of high resolution pictures is progressive
  • figure 5 illustrates the sub-sampling by a factor of 2 in the horizontal direction of the picture of two macroblocks MB1 and MB2 of a low resolution picture and the resulting partitioning for the corresponding predictor macroblock MB_pred
  • the invention relates to an inter-layer prediction method which consists in generating motion predictors and, where appropriate, texture predictors for pictures of a sequence of high resolution progressive pictures, called high resolution sequence, from pictures of a sequence of low resolution interlaced pictures, called low resolution sequence.
  • the sequences are divided into groups of pictures (GOP).
  • Each low resolution picture comprises a top field interlaced with a bottom field.
  • an interlaced picture of index k is made up of a top field referenced kT and a bottom field referenced kB and a progressive picture is referenced by its index k.
  • a temporal reference is associated with each picture of the high resolution sequence and with each field of the low resolution sequence.
  • the low resolution pictures, also referenced LR pictures, have a width w (w representing a number of pixels or columns) and a height of 2h (2h representing a number of pixels or lines and meaning 2 multiplied by h).
  • Each field of a low resolution interlaced picture has a width w and a height h.
  • the high resolution pictures, also referenced HR pictures, have a width W (W representing a number of pixels or columns) and a height of 2H (2H representing a number of pixels or lines and meaning 2 multiplied by H).
  • the interlaced pictures can be encoded either in field picture mode, i.e. each field is encoded as a separate picture, or in frame picture mode, i.e. the two fields are encoded together.
  • the lines of a picture are numbered from 0 and therefore the first line is an even line and the second line (numbered 1) is an odd line.
  • the invention therefore consists in generating, for pictures of the high resolution sequence or for at least one block of pixels of the latter, at least one motion predictor and, where appropriate, at least one texture predictor.
  • a texture predictor associated with a high resolution picture or with at least one block of pixels of a high resolution picture is a picture or a prediction block which associates with each of its pixels texture data (for example, a luminance value and, where appropriate, chrominance values), which is generated from texture data associated with at least one picture (or field) or at least one block of pixels of a low resolution picture (or at least one block of pixels of a field) according to a method of sub-sampling the texture such as the ESS method applied to the texture (ESS standing for Extended Spatial Scalability) which is described in sections S.8.3.6.4 and S.8.5.14.2 of document ISO/IEC MPEG & ITU-T VCEG, entitled "Joint Scalable Video Model JSVM3 Annex-S", referenced JVT-P202, J. Reichel, H. Schwarz, M. Wien.
  • a motion predictor associated with a high resolution picture or with at least one block of pixels of a high resolution picture is defined as a prediction picture or a prediction block with which is associated motion data (for example, a type of partitioning, possibly reference picture indices making it possible to identify the reference pictures to which the motion vectors point).
  • the motion predictor is generated from motion data associated with at least one picture (or field) or at least one block of pixels of a low resolution picture (or at least one block of pixels of a field) according to a motion sub-sampling method such as the ESS method applied to the motion which is described in section S.8.4.1.6.3 of JSVM3, or such as the modified ESS method, described below, derived from the ESS method applied to the motion.
  • the modified ESS method, referenced MESS in figure 3, makes it possible in particular to process high and/or low resolution interlaced sequences. More specifically, it makes it possible to deal with the case where the height or the width of the high resolution picture is less than that of the low resolution picture.
  • the modified ESS method also avoids having the motion predictors include invalid motion vectors, i.e. vectors that point to unavailable reference pictures, when the prediction method according to the invention is used by a hierarchical encoding or decoding method.
  • an intermediate motion predictor is generated by sub-sampling by 2 the motion data associated with the low resolution picture, more particularly, the motion data associated with each of the macroblocks of the low resolution picture, in the vertical direction of the picture, in the horizontal direction of the picture or in both directions.
  • the method of sub-sampling by 2 is repeated in the vertical direction of the picture as long as the height of said intermediate predictor is greater than the height of the high resolution picture and it is repeated in the horizontal direction of the picture as long as the width of said intermediate predictor is greater than the width of the high resolution picture.
  • the sub-sampling consists in particular in dividing by two the coordinates of the motion vectors associated with the blocks of pixels.
  • a macroblock MB of the intermediate motion predictor is generated.
  • the size of the blocks of pixels in a macroblock is indicated above said macroblock.
  • the macroblock MB1 is not divided
  • the macroblock MB2 is divided into two blocks measuring 8 by 16 pixels (denoted 8x16) and the macroblock MB generated from these two macroblocks is divided into four 8x8 blocks, two of which are divided into 4x8 blocks.
  • the indices of reference pictures are made uniform between the blocks of 8 by 8 pixel size within a macroblock MB, and isolated intra-type blocks within a macroblock MB are deleted in the same way as in the ESS inter-layer prediction method applied to the motion and described in JSVM3.
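  • the motion predictor associated with the high resolution picture is generated from the last intermediate motion predictor generated in this way, by applying the ESS method (section S.8.4.1.6.3 of JSVM3) with an inter-layer ratio equal to $W / w_i$ in the horizontal direction of the picture and $2H / 2h_i$ in the vertical direction of the picture, where $w_i$ and $2h_i$ are respectively the width and the height of the last intermediate motion predictor generated.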
  • the motion vector inheritance method is modified so as not to generate invalid motion vectors, i.e. vectors that point to fields or frame pictures that are not available in the temporal breakdown process. In this case, if all the motion vectors associated with a prediction macroblock MB_pred are invalid then the inter-layer motion prediction is not authorized for this macroblock. Otherwise, (i.e. if at least one of the vectors is valid), the ESS prediction method applied to the motion is used.
  • the method according to the invention is described for a picture but can be applied to a part of a picture and in particular to a block of pixels, for example a macroblock. It makes it possible for example to handle the case of a low resolution interlaced sequence in the SD format, i.e. of dimension 720 by 288 pixels, 60 Hz and of a high resolution progressive sequence in the 720p format, i.e. 1280 by 720 pixels, 60 Hz or else the case of a low resolution interlaced sequence in the 1080i format, i.e. of dimension 1920 by 540 pixels, 60 Hz and of a high resolution progressive sequence in the 1080p format, i.e. 1920 by 1080 pixels, 60 Hz.
  • Texture predictors associated with the high resolution pictures of index 2k and 2k+1 in figure 1 are generated in the following manner, as illustrated by figure 2:
  • a frame texture predictor of dimension W by 2H associated with the high resolution picture of index 2k is generated (20) on the basis of the texture data of the top field of the low resolution picture of index k
  • a frame texture predictor of dimension W by 2H associated with the high resolution picture of index 2k+1 is generated (21) on the basis of the texture data of the bottom field of the low resolution picture of index k
  • motion predictors associated with the high resolution pictures of index 2k and 2k+1 in figure 1 are generated in the following manner, as illustrated by figure 3:
  • a frame motion predictor of dimension W by 2H associated with the high resolution picture of index 2k is generated (31) on the basis of the motion data of the top field of the low resolution picture of index k by applying the modified ESS method with an inter-layer ratio of $W/w$ in the horizontal direction of the picture and $2H/h$ in the vertical direction of the picture
  • a frame motion predictor of dimension W by 2H associated with the high resolution picture of index 2k+1 is generated (32) on the basis of the motion data of the bottom field of the low resolution picture of index k by applying the modified ESS method with an inter-layer ratio of $W/w$ in the horizontal direction of the picture and $2H/h$ in the vertical direction of the picture.
  • all motion predictors may be generated in order to select the most appropriate one according to a given criterion, e.g. a rate-distortion criterion. If said method is used by a decoding method then a single motion predictor (respectively a single texture predictor) is generated, the type of predictor being specified in the bitstream.
  • the invention is not limited to the abovementioned exemplary embodiments.
  • those skilled in the art can apply any variant to the embodiments described and combine them to benefit from their different advantages.
  • the method according to the invention can be applied to a part of the high resolution picture.
  • the invention has been described in the case where the top field of an interlaced picture is displayed first ("top field first" case) and can be extended directly to the case where the bottom field is displayed first ("bottom field first" case) by reversing the top and bottom fields.
  • the invention can also be extended to the case of several high resolution sequences (i.e. several enhancement layers).
  • the invention is advantageously used by a method of encoding or decoding a sequence of pictures or video.
  • the sequence of low resolution pictures is encoded according to the MPEG-4 AVC encoding standard defined in document ISO/IEC 14496-10 ("Information technology -- Coding of audio-visual objects -- Part 10: Advanced Video Coding").

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention relates to a method for generating for at least one block of pixels of a picture of a sequence of progressive pictures at least one motion predictor and at least one texture predictor from motion data, respectively texture data, associated with the pictures of a sequence of low resolution interlaced pictures.

Description

METHOD OF PREDICTING MOTION AND TEXTURE DATA
1. Background of the invention
The invention relates to a method for generating, for pictures of a high resolution progressive sequence, at least one motion predictor and, where appropriate, at least one texture predictor from motion data and, where appropriate, texture data associated with pictures of a low resolution interlaced sequence.
2. State of the art
Hierarchical encoding methods with spatial scalability are known. Scalability represents the ability to stagger information to make it decodable at multiple resolution and/or quality levels. More specifically, a data stream generated by this type of encoding method is divided into several layers, in particular a basic layer and one or more enhancement layers. These methods are used in particular to adapt a single data stream to variable transport conditions (bandwidth, error ratios, etc.), and to the expectations of the customers and the varying capabilities of their receivers (CPU, specifications of the display device, etc.). In the particular case of spatial scalability, the part of the data stream corresponding to low resolution pictures of the sequence can be decoded independently of the part of the data stream corresponding to the high resolution pictures. On the other hand, the part of the data stream corresponding to high resolution pictures of the sequence can be decoded only from the part of the data stream corresponding to the low resolution pictures.
Hierarchical encoding with spatial scalability makes it possible to encode a first data part called basic layer, relative to the low resolution pictures and, from this basic layer, a second data part called enhancement layer, relative to the high resolution pictures. Normally, each macroblock of the high resolution picture is temporally predicted according to a conventional prediction mode (for example, bidirectional prediction mode, direct prediction mode, early prediction mode, etc.) or indeed is predicted according to an inter-layer prediction mode. In this latter case, motion data (for example, a partitioning of the macroblock into blocks, possibly motion vectors and reference picture indices) and, where appropriate, texture data associated with a block of pixels of the high resolution picture is deduced or inherited from the motion data, respectively texture data, associated with blocks of pixels of a low resolution picture. The known methods do not allow such predictors to be generated in the case where the low resolution sequence is interlaced and the high resolution sequence is progressive.
3. Summary of the invention
The object of the invention is to overcome at least one of these drawbacks of the prior art.
The invention relates in particular to a method for generating, for at least one block of pixels of a picture of a sequence of high resolution progressive pictures, called high resolution sequence, at least one motion predictor from motion data associated with the pictures of a sequence of low resolution interlaced pictures, called low resolution sequence, of the same temporal frequency as the high resolution sequence. Each interlaced picture comprises a top field interlaced with a bottom field and can be coded in field mode or in frame mode. Each progressive picture and each field of an interlaced picture has associated with it a temporal reference. According to the invention, if the low resolution picture is coded in field mode, at least one motion predictor is generated for a block of pixels of the high resolution picture on the basis of the motion data associated with at least one block of pixels of the top or bottom field of a low resolution picture of the same temporal reference as the high resolution picture. If the high resolution picture is of the same temporal reference as the top field of a low resolution picture and if the low resolution picture is coded in frame mode, a motion predictor is generated for a block of pixels of the high resolution picture on the basis of the motion data associated with at least one block of pixels of the low resolution picture. Otherwise no motion predictor is generated.
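The three cases above can be read as a small decision rule. The following Python sketch is only an illustration of that reading: the names `CodingMode`, `Source` and `motion_source` are hypothetical, and it assumes that every high resolution picture shares its temporal reference with exactly one field of a low resolution picture (the two sequences having the same temporal frequency).

```python
from enum import Enum
from typing import Optional


class CodingMode(Enum):
    """Coding mode of the low resolution interlaced picture (hypothetical names)."""
    FIELD = "field"
    FRAME = "frame"


class Source(Enum):
    """Low resolution data from which the motion predictor would be derived."""
    TOP_FIELD = "top field"
    BOTTOM_FIELD = "bottom field"
    FRAME = "frame picture"


def motion_source(lr_mode: CodingMode, hr_matches_top_field: bool) -> Optional[Source]:
    """Pick the low resolution motion data for one high resolution picture.

    hr_matches_top_field is True when the high resolution picture has the same
    temporal reference as the top field of the low resolution picture, and
    False when it matches the bottom field (assumption of this sketch).
    """
    if lr_mode is CodingMode.FIELD:
        # Field mode: use the field with the same temporal reference.
        return Source.TOP_FIELD if hr_matches_top_field else Source.BOTTOM_FIELD
    if hr_matches_top_field:
        # Frame mode, picture aligned with the top field: use the frame picture.
        return Source.FRAME
    # Frame mode, picture aligned with the bottom field: no motion predictor.
    return None
```

Under this reading, a frame-coded low resolution picture yields a motion predictor for only one of the two high resolution pictures it overlaps in time.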
Preferably, if the low resolution picture is coded in field mode, a motion predictor is generated by sub-sampling the said motion data associated with at least one block of pixels of the top or bottom field of the low resolution picture of the same temporal reference as the high resolution picture with a horizontal inter-layer ratio in the horizontal direction of the picture and a first vertical inter-layer ratio in the vertical direction of the picture. Advantageously, if the low resolution picture is coded in frame mode and if the high resolution picture is of the same temporal reference as the top field of the low resolution picture, a motion predictor is generated by subsampling the said motion data associated with at least one block of pixels of the low resolution picture with the horizontal inter-layer ratio in the horizontal direction of the picture and a second vertical inter-layer ratio in the vertical direction of the picture.
Preferably, a texture predictor is generated for a block of pixels of the high resolution picture on the basis of the texture data associated with at least one block of pixels of the top or bottom field of a low resolution picture of the same temporal reference as the high resolution picture.
Advantageously, the texture predictor is generated by subsampling the said texture data associated with at least one block of pixels of the top or bottom field of the low resolution picture of the same temporal reference as the high resolution picture with the horizontal inter-layer ratio in the horizontal direction of the picture and the first vertical inter-layer ratio in the vertical direction of the picture.
According to a particular characteristic, the horizontal inter-layer ratio is equal to the width of the high resolution pictures divided by the width of the fields of the low resolution pictures, the first vertical inter-layer ratio is equal to the height of the high resolution pictures divided by the height of the fields of the low resolution pictures and the second vertical inter-layer ratio is equal to the height of the high resolution pictures divided by the height of the low resolution pictures. Preferably, the motion data associated with the low resolution pictures comprises motion vectors, and the motion vectors associated with a low resolution picture coded in frame mode, or with each of the top and bottom fields of a low resolution picture coded in field mode, have the same parity. According to a particular embodiment, the method is used by a method of encoding high resolution pictures from low resolution pictures.
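As a worked illustration of the three ratios defined above, the following Python sketch computes them as exact fractions. The function name `inter_layer_ratios` and its parameter names are hypothetical; the only inputs are the picture dimensions named in the text.

```python
from fractions import Fraction
from typing import Tuple


def inter_layer_ratios(hr_width: int, hr_height: int,
                       lr_width: int, lr_field_height: int
                       ) -> Tuple[Fraction, Fraction, Fraction]:
    """Return (horizontal, first vertical, second vertical) inter-layer ratios.

    lr_field_height is the height of one field; the height of the full low
    resolution interlaced picture is twice that value.
    """
    horizontal = Fraction(hr_width, lr_width)
    # Field mode: the high resolution height is compared with the field height.
    first_vertical = Fraction(hr_height, lr_field_height)
    # Frame mode: the high resolution height is compared with the full picture height.
    second_vertical = Fraction(hr_height, 2 * lr_field_height)
    return horizontal, first_vertical, second_vertical


# Low resolution fields of 1920 by 540 pixels, high resolution pictures of 1920 by 1080.
print(inter_layer_ratios(1920, 1080, 1920, 540))
# Low resolution fields of 720 by 288 pixels, high resolution pictures of 1280 by 720.
print(inter_layer_ratios(1280, 720, 720, 288))
```

For the first call the ratios are 1, 2 and 1; for the second they are 16/9, 5/2 and 5/4. The first vertical ratio is always twice the second, since a field has half the lines of the interlaced picture.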
Advantageously, the low resolution pictures are encoded according to the MPEG-4 AVC standard. According to a particular embodiment, the method is used by a method of decoding high resolution pictures from low resolution pictures.
4. List of figures
The invention will be better understood and illustrated by means of exemplary embodiments and advantageous implementations, by no means limiting, given with reference to the appended figures in which:
- figure 1 represents an interlaced sequence of low resolution pictures and a progressive sequence of high resolution pictures with the same temporal frequency;
- figure 2 illustrates the method of generating texture predictors according to the invention in the case where the sequence of low resolution pictures is interlaced and the sequence of high resolution pictures is progressive;
- figure 3 illustrates the method of generating motion predictors according to the invention in the case where the sequence of low resolution pictures is interlaced and the sequence of high resolution pictures is progressive;
- figure 5 illustrates the sub-sampling by a factor of 2 in the horizontal direction of the picture of two macroblocks MB1 and MB2 of a low resolution picture and the resulting partitioning for the corresponding predictor macroblock MB_pred.
5. Detailed description of the invention
The invention relates to an inter-layer prediction method which consists in generating motion predictors and, where appropriate, texture predictors for pictures of a sequence of high resolution progressive pictures, called high resolution sequence, from pictures of a sequence of low resolution interlaced pictures, called low resolution sequence. The sequences are divided into groups of pictures (GOP). Each low resolution picture comprises a top field interlaced with a bottom field. In figure 1, an interlaced picture of index k is made up of a top field referenced kT and a bottom field referenced kB, and a progressive picture is referenced by its index k. A temporal reference is associated with each picture of the high resolution sequence and with each field of the low resolution sequence. A high resolution picture and a field of a low resolution picture, called low resolution field, having the same temporal reference coincide vertically.
The low resolution pictures, also referenced LR pictures, have a width w (w representing a number of pixels or columns) and a height of 2h (2h representing a number of pixels or lines and meaning 2 multiplied by h). Each field of a low resolution interlaced picture has a width w and a height h. The high resolution pictures, also referenced HR pictures, have a width W (W representing a number of pixels or columns) and a height of 2H (2H representing a number of pixels or lines and meaning 2 multiplied by H). In the embodiment described, the interlaced pictures can be encoded either in field picture mode, i.e. each field is encoded as a separate picture, or in frame picture mode, i.e. the two fields are encoded together. The lines of a picture are numbered from 0 and therefore the first line is an even line and the second line (numbered 1) is an odd line.
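The relationship between picture indices, fields and line parity described above can be sketched as follows. This is an illustrative Python sketch only: `split_fields` and `matching_field` are hypothetical helpers, and it assumes the "top field first" case, with the top field holding the even lines and the high resolution pictures of index 2k and 2k+1 sharing their temporal references with the fields kT and kB respectively.

```python
from typing import List, Sequence, Tuple, TypeVar

Line = TypeVar("Line")


def split_fields(frame_lines: Sequence[Line]) -> Tuple[List[Line], List[Line]]:
    """Split the 2h lines of an interlaced picture into two fields of h lines.

    Lines are numbered from 0, so the top field takes the even lines
    (0, 2, 4, ...) and the bottom field takes the odd lines (1, 3, 5, ...).
    """
    top_field = list(frame_lines[0::2])
    bottom_field = list(frame_lines[1::2])
    return top_field, bottom_field


def matching_field(hr_index: int) -> Tuple[int, str]:
    """Map a high resolution picture index to (low resolution index, field).

    Assumes top field first: picture 2k matches field kT, picture 2k+1 matches kB.
    """
    lr_index = hr_index // 2
    field = "top" if hr_index % 2 == 0 else "bottom"
    return lr_index, field


# A toy picture whose "lines" are just their own line numbers.
top, bottom = split_fields(list(range(6)))
print(top, bottom)        # even lines, then odd lines
print(matching_field(5))  # high resolution picture 5 = 2*2 + 1
```

In the "bottom field first" case the roles of the two fields in `matching_field` would simply be swapped.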
The invention therefore consists in generating, for pictures of the high resolution sequence or for at least one block of pixels of the latter, at least one motion predictor and, where appropriate, at least one texture predictor. A texture predictor associated with a high resolution picture or with at least one block of pixels of a high resolution picture is a picture or a prediction block which associates texture data with each of its pixels (for example, a luminance value and, where appropriate, chrominance values). It is generated from texture data associated with at least one picture (or field) or at least one block of pixels of a low resolution picture (or at least one block of pixels of a field) according to a method of sub-sampling the texture such as the ESS method applied to the texture (ESS standing for Extended Spatial Scalability), which is described in sections S.8.3.6.4 and S.8.5.14.2 of the document ISO/IEC MPEG & ITU-T VCEG entitled "Joint Scalable Video Model JSVM3 Annex-S", referenced JVT-P202, J. Reichel, H. Schwarz, M. Wien. This document is referenced JSVM3 below. A motion predictor associated with a high resolution picture or with at least one block of pixels of a high resolution picture is defined as a prediction picture or a prediction block with which motion data are associated (for example, a type of partitioning, possibly reference picture indices making it possible to identify the reference pictures to which the motion vectors point). The motion predictor is generated from motion data associated with at least one picture (or field) or at least one block of pixels of a low resolution picture (or at least one block of pixels of a field) according to a motion sub-sampling method such as the ESS method applied to the motion which is described in section S.8.4.1.6.3 of JSVM3, or such as the modified ESS method, described below, derived from the ESS method applied to the motion.
The modified ESS method, referenced MESS in figure 3, makes it possible in particular to process high and/or low resolution interlaced sequences. More specifically, it makes it possible to deal with the case where the height or the width of the high resolution picture is less than that of the low resolution picture. Furthermore, it makes it possible advantageously to avoid having the motion predictors include invalid motion vectors, i.e. vectors that point to unavailable reference pictures, when the prediction method according to the invention is used by a hierarchical encoding or decoding method.
According to the modified ESS method, an intermediate motion predictor is generated by sub-sampling by 2 the motion data associated with the low resolution picture, more particularly, the motion data associated with each of the macroblocks of the low resolution picture, in the vertical direction of the picture, in the horizontal direction of the picture or in both directions. The method of sub-sampling by 2 is repeated in the vertical direction of the picture as long as the height of said intermediate predictor is greater than the height of the high resolution picture and it is repeated in the horizontal direction of the picture as long as the width of said intermediate predictor is greater than the width of the high resolution picture. The sub-sampling consists in particular in dividing by two the coordinates of the motion vectors associated with the blocks of pixels. For example, with reference to figure 4, based on two macroblocks MB1 and MB2 of the low resolution picture, possibly divided into blocks of pixels, a macroblock MB of the intermediate motion predictor is generated. The size of the blocks of pixels in a macroblock is indicated above said macroblock. For example, in the second line of figure 4, the macroblock MB1 is not divided, the macroblock MB2 is divided into two blocks measuring 8 by 16 pixels (denoted 8x16) and the macroblock MB generated from these two macroblocks is divided into four 8x8 blocks, two of which are divided into 4x8 blocks. The indices of reference pictures are made uniform between the blocks of 8 by 8 pixel size within a macroblock MB, and isolated intra-type blocks within a macroblock MB are deleted in the same way as in the ESS inter-layer prediction method applied to the motion and described in JSVM3.
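The repeated sub-sampling by 2 described above can be sketched as follows. This is a minimal illustration and not the JSVM3 procedure: the function names are hypothetical, picture dimensions are assumed to remain divisible by 2 at each step, and the vector components are kept as exact fractions because the text does not specify how halved coordinates are rounded. Macroblock merging, reference index harmonisation and intra-block removal are not modelled.

```python
from fractions import Fraction
from typing import Tuple


def halving_steps(size: int, target: int) -> int:
    """Count the sub-samplings by 2 needed until size no longer exceeds target."""
    steps = 0
    while size > target:
        size //= 2  # assumption: the dimension stays divisible by 2
        steps += 1
    return steps


def intermediate_predictor(lr_width: int, lr_height: int,
                           hr_width: int, hr_height: int) -> Tuple[int, int, int, int]:
    """Return (width, height, horizontal steps, vertical steps) of the
    last intermediate motion predictor."""
    h_steps = halving_steps(lr_width, hr_width)
    v_steps = halving_steps(lr_height, hr_height)
    return lr_width >> h_steps, lr_height >> v_steps, h_steps, v_steps


def subsample_vector(mv: Tuple[int, int], h_steps: int,
                     v_steps: int) -> Tuple[Fraction, Fraction]:
    """Divide each motion vector coordinate by two, once per sub-sampling step.
    Rounding is left open here (hypothetical choice: exact fractions)."""
    dx, dy = mv
    return Fraction(dx, 2 ** h_steps), Fraction(dy, 2 ** v_steps)
```

For instance, with a hypothetical 1920 by 1080 low resolution picture and a 1280 by 720 high resolution picture, one halving is applied in each direction and the intermediate predictor measures 960 by 540.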
The motion predictor associated with the high resolution picture is generated from the last intermediate motion predictor generated in this way, by applying the ESS method (section S.8.4.1.6.3 of JSVM3) with an inter-layer ratio equal to W/w_i in the horizontal direction of the picture and 2H/(2h_i) in the vertical direction of the picture, where w_i is the width and 2h_i is the height of the last intermediate motion predictor generated. Furthermore, for each prediction macroblock, the motion vector inheritance method is modified so as not to generate invalid motion vectors, i.e. vectors that point to fields or frame pictures that are not available in the temporal breakdown process. In this case, if all the motion vectors associated with a prediction macroblock MB_pred are invalid then the inter-layer motion prediction is not authorized for this macroblock. Otherwise (i.e. if at least one of the vectors is valid), the ESS prediction method applied to the motion is used.
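The validity rule for a prediction macroblock can be illustrated with a short sketch. The representation is an assumption made for illustration only: each motion vector is reduced to the identifier of the reference picture it points to, and the set of available references is passed in explicitly. The function name is hypothetical.

```python
from typing import Iterable, Set


def inter_layer_motion_prediction_allowed(vector_refs: Iterable[str],
                                          available_refs: Set[str]) -> bool:
    """Return True when at least one motion vector of the prediction
    macroblock points to an available reference field or frame.

    When every vector is invalid, inter-layer motion prediction is not
    authorized for the macroblock; otherwise the ESS motion prediction
    is applied.
    """
    return any(ref in available_refs for ref in vector_refs)
```

With available references {"0T", "1T"}, a macroblock whose vectors point to "0B" and "1T" still qualifies, whereas one whose vectors all point to "0B" does not.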
The method according to the invention, illustrated by figures 1 to 3, is described for a picture but can be applied to a part of a picture and in particular to a block of pixels, for example a macroblock. It makes it possible for example to handle the case of a low resolution interlaced sequence in the SD format, i.e. of dimension 720 by 288 pixels, 60 Hz and of a high resolution progressive sequence in the 720p format, i.e. 1280 by 720 pixels, 60 Hz or else the case of a low resolution interlaced sequence in the 1080i format, i.e. of dimension 1920 by 540 pixels, 60 Hz and of a high resolution progressive sequence in the 1080p format, i.e. 1920 by 1080 pixels, 60 Hz.
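As a worked illustration of the two format pairs mentioned above, the inter-layer ratios can be computed from the picture dimensions, using the definitions given in Claim 6 (high resolution width over field width, high resolution height over field height, and high resolution height over low resolution picture height). The function and key names are hypothetical; the quoted low resolution dimensions (720 by 288 and 1920 by 540) are taken to be field dimensions w by h.

```python
from fractions import Fraction
from typing import Dict


def inter_layer_ratios(W: int, two_H: int, w: int, h: int) -> Dict[str, Fraction]:
    """Inter-layer ratios for a high resolution picture of W by 2H pixels
    and low resolution fields of w by h pixels (frames of w by 2h)."""
    return {
        "horizontal": Fraction(W, w),
        "vertical_field": Fraction(two_H, h),      # predictor built from one field
        "vertical_frame": Fraction(two_H, 2 * h),  # predictor built from the frame
    }


sd_to_720p = inter_layer_ratios(W=1280, two_H=720, w=720, h=288)
hd_to_1080p = inter_layer_ratios(W=1920, two_H=1080, w=1920, h=540)
```

Under these assumptions, SD to 720p gives 16/9 horizontally, 5/2 vertically from a field and 5/4 vertically from a frame; 1080i to 1080p gives 1, 2 and 1 respectively.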
Texture predictors associated with high resolution pictures of index 2k and 2k+1 in figure 1 are generated in the following manner as illustrated by figure 2:
• A frame texture predictor of dimension W by 2H associated with the high resolution picture of index 2k is generated (20) on the basis of the texture data of the top field of the low resolution picture of index k by applying the ESS method with an inter-layer ratio of W/w in the horizontal direction of the picture and 2H/h in the vertical direction of the picture.
• A frame texture predictor of dimension W by 2H associated with the high resolution picture of index 2k+1 is generated (21) on the basis of the texture data of the bottom field of the low resolution picture of index k by applying the ESS method with an inter-layer ratio of W/w in the horizontal direction of the picture and 2H/h in the vertical direction of the picture.
If the low resolution picture of index k is coded in frame mode, no frame motion predictor associated with the high resolution picture of index 2k+1 is generated, and a frame motion predictor of dimension W by 2H associated with the high resolution picture of index 2k in figure 1 is generated (30) on the basis of the motion data associated with the frame low resolution picture of index k, as illustrated by figure 3, by applying the modified ESS method with an inter-layer ratio of W/w in the horizontal direction of the picture and 2H/(2h) in the vertical direction of the picture.
In other cases, i.e. if the low resolution picture of index k is coded in field mode, motion predictors associated with the high resolution pictures of index 2k and 2k+1 in figure 1 are generated in the following manner as illustrated by figure 3:
• A frame motion predictor of dimension W by 2H associated with the high resolution picture of index 2k is generated (31) on the basis of the motion data of the top field of the low resolution picture of index k by applying the modified ESS method with an inter-layer ratio of W/w in the horizontal direction of the picture and 2H/h in the vertical direction of the picture.
• A frame motion predictor of dimension W by 2H associated with the high resolution picture of index 2k+1 is generated (32) on the basis of the motion data of the bottom field of the low resolution picture of index k by applying the modified ESS method with an inter-layer ratio of W/w in the horizontal direction of the picture and 2H/h in the vertical direction of the picture.
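The mapping between a high resolution picture and the low resolution data used to build its predictors can be summarised in a short sketch. It assumes the "top field first" case, and the function names and the string labels ("frame", "field", "top", "bottom") are hypothetical conventions chosen for illustration.

```python
from typing import Optional, Tuple


def motion_predictor_source(hr_index: int, lr_mode: str) -> Optional[Tuple[int, str]]:
    """Return (low resolution picture index k, source) for the motion
    predictor of high resolution picture hr_index, or None if no motion
    predictor is generated."""
    k, parity = divmod(hr_index, 2)
    if lr_mode == "frame":
        # Step 30: only the picture aligned with the top field (index 2k)
        # receives a predictor, derived from the whole frame.
        return (k, "frame") if parity == 0 else None
    if lr_mode == "field":
        # Steps 31 and 32: top field for index 2k, bottom field for 2k+1.
        return (k, "top" if parity == 0 else "bottom")
    raise ValueError(f"unknown coding mode: {lr_mode!r}")


def texture_predictor_source(hr_index: int) -> Tuple[int, str]:
    """Steps 20 and 21: the texture predictor always comes from the field
    that has the same temporal reference."""
    k, parity = divmod(hr_index, 2)
    return (k, "top" if parity == 0 else "bottom")
```

For the high resolution picture of index 7, for example, the texture predictor comes from the bottom field of low resolution picture 3, and a motion predictor exists only if that low resolution picture is coded in field mode.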
According to a particular characteristic, it is possible to permit only motion vectors with the same parity for coding the low resolution pictures. One field out of two can then be deleted from the low resolution sequence and one frame picture out of two from the high resolution sequence, so that the digital data generated by a coding method using the method of prediction according to the invention are temporally scalable.
If the method is used by a coding method, all motion predictors (respectively texture predictors) may be generated in order to select the most appropriate one according to a given criterion, e.g. a rate-distortion criterion. If said method is used by a decoding method, then a single motion predictor (respectively a single texture predictor) is generated, the type of predictor being specified in the bitstream.
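On the encoder side, the selection among candidate predictors might look like the sketch below. The text only names a rate-distortion criterion as an example; the Lagrangian cost D + lambda * R used here, the dictionary layout and the function name are assumptions made for illustration.

```python
from typing import Dict, List, Union

Candidate = Dict[str, Union[str, float]]


def select_predictor(candidates: List[Candidate], lam: float) -> str:
    """Return the name of the candidate with the lowest cost D + lam * R.

    Each candidate carries a hypothetical 'name', a 'distortion' value and
    a 'rate' in bits. The chosen name stands for the predictor type that
    an encoder would signal in the bitstream.
    """
    best = min(candidates, key=lambda c: c["distortion"] + lam * c["rate"])
    return str(best["name"])


candidates = [
    {"name": "field", "distortion": 100.0, "rate": 40.0},
    {"name": "frame", "distortion": 90.0, "rate": 70.0},
]
```

With these illustrative figures, a multiplier of 0.5 favours the "field" candidate (cost 120 against 125), while a multiplier of 0.1 favours the "frame" candidate (cost 97 against 104).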
Of course, the invention is not limited to the abovementioned exemplary embodiments. In particular, those skilled in the art can apply any variant to the embodiments described and combine them to benefit from their different advantages. For example, the method according to the invention can be applied to a part of the high resolution picture. In practice, it is possible to generate motion and/or texture predictors for blocks of pixels (for example, macroblocks measuring 16 by 16 pixels) of the high resolution picture from motion and/or texture data associated with blocks of pixels of the low resolution pictures. Similarly, the invention has been described in the case where the top field of an interlaced picture is displayed first ("top field first" case) and can be extended directly to the case where the bottom field is displayed first ("bottom field first" case) by reversing the top and bottom fields. Moreover, the invention can also be extended to the case of several high resolution sequences (i.e. several enhancement layers). Furthermore, the invention is advantageously used by a method of encoding or decoding a sequence of pictures or video. Preferably, the sequence of low resolution pictures is encoded according to the MPEG-4 AVC encoding standard defined in document ISO/IEC 14496-10 ("Information technology -- Coding of audio-visual objects -- Part 10: Advanced Video Coding").

Claims

1. Method for generating for at least one block of pixels of a picture of a sequence of high resolution progressive pictures, called high resolution sequence, at least one motion predictor from motion data associated with the pictures of a sequence of low resolution interlaced pictures, called low resolution sequence, of the same temporal frequency as the said high resolution sequence, each interlaced picture comprising a top field interlaced with a bottom field and that can be coded in frame mode or in field mode, each progressive picture and each field of an interlaced picture having associated with it a temporal reference, characterized in that the said at least one motion predictor is generated (31, 32) for the said at least one block of pixels of the said high resolution picture on the basis of motion data associated with at least one block of pixels of the top and/or bottom field of a low resolution picture of the same temporal reference as the said high resolution picture if the said low resolution picture is coded in field mode and in that, if the said high resolution picture is of the same temporal reference as the top field of a low resolution picture and if the said low resolution picture is coded in frame mode, the said at least one motion predictor is generated (30) for the said at least one block of pixels of the said high resolution picture on the basis of motion data associated with at least one block of pixels of the said low resolution picture, and otherwise no motion predictor is generated.
2. Method according to Claim 1, wherein, if the said low resolution picture is coded in field mode, the said at least one motion predictor is generated by sub-sampling (31, 32) the said motion data associated with the said at least one block of pixels of the said top and/or bottom field of the said low resolution picture of the same temporal reference as the said high resolution picture with a horizontal inter-layer ratio in the horizontal direction of the picture and a first vertical inter-layer ratio in the vertical direction of the picture.
3. Method according to either of Claims 1 and 2, wherein if the said low resolution picture is coded in frame mode and if the said high resolution picture is of the same temporal reference as the top field of the said low resolution picture, the said at least one motion predictor is generated by subsampling the said motion data associated with the said at least one block of pixels of the said low resolution picture with the said horizontal inter-layer ratio in the horizontal direction of the picture and a second vertical inter-layer ratio in the vertical direction of the picture.
4. Method according to one of Claims 1 to 3, wherein a texture predictor is generated (20, 21) for the said at least one block of pixels of the said high resolution picture on the basis of the texture data associated with the at least one block of pixels of the top or bottom field of a low resolution picture of the same temporal reference as the said high resolution picture.
5. Method according to Claim 4, dependent on Claim 3, wherein the said texture predictor is generated by subsampling (20, 21) the said texture data associated with the said at least one block of pixels of the top or bottom field of the said low resolution picture of the same temporal reference as the said high resolution picture with the said horizontal inter-layer ratio in the horizontal direction of the picture and the said first vertical inter-layer ratio in the vertical direction of the picture.
6. Method according to one of Claims 3 and 5, wherein said horizontal inter-layer ratio is equal to the width of the high resolution pictures divided by the width of the fields of the low resolution pictures, in that said first vertical inter-layer ratio is equal to the height of the high resolution pictures divided by the height of the fields of the low resolution pictures and in that the second vertical inter-layer ratio is equal to the height of the high resolution pictures divided by the height of the low resolution pictures.
7. Method according to one of Claims 1 to 6, wherein the motion data associated with the low resolution pictures comprises motion vectors.
8. Method according to Claim 7, wherein the motion vectors associated with a low resolution picture that is coded in frame mode or with each of the top and bottom fields of a low resolution picture coded in field mode have the same parity.
9. Method according to one of Claims 1 to 8, wherein said method is used by a method of encoding high resolution pictures from low resolution pictures.
10. Method according to Claim 9, wherein the low resolution pictures are encoded according to the MPEG-4 AVC standard.
11. Method according to one of Claims 1 to 8, wherein said method is used by a method of decoding high resolution pictures from low resolution pictures.
12. Computer program product wherein it comprises program code instructions for executing steps of the method according to any one of Claims 1 to 8, when said program is run on a computer.
PCT/EP2006/068776 2005-12-01 2006-11-22 Method of predicting motion and texture data WO2007063017A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN200680045076.XA CN101322414B (en) 2005-12-01 2006-11-22 Method of predicting motion and texture data
EP06807830.2A EP2005760B1 (en) 2005-12-01 2006-11-22 Method of predicting motion and texture data
JP2008542726A JP5037517B2 (en) 2005-12-01 2006-11-22 Method for predicting motion and texture data
US12/085,791 US8520141B2 (en) 2005-12-01 2006-11-22 Method of predicting motion and texture data

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FR0553677 2005-12-01
FR0553677A FR2894422A1 (en) 2005-12-01 2005-12-01 METHOD FOR PREDICTING MOTION DATA AND TEXTURE

Publications (1)

Publication Number Publication Date
WO2007063017A1 (en) 2007-06-07

Family

ID=36808616

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2006/068776 WO2007063017A1 (en) 2005-12-01 2006-11-22 Method of predicting motion and texture data

Country Status (6)

Country Link
US (1) US8520141B2 (en)
EP (1) EP2005760B1 (en)
JP (1) JP5037517B2 (en)
CN (1) CN101322414B (en)
FR (1) FR2894422A1 (en)
WO (1) WO2007063017A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8396124B2 (en) 2005-12-05 2013-03-12 Thomson Licensing Method of predicting motion and texture data
US8520141B2 (en) 2005-12-01 2013-08-27 Thomson Licensing Method of predicting motion and texture data
US8855204B2 (en) 2005-12-05 2014-10-07 Thomson Licensing Method of predicting motion and texture data

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013232775A (en) * 2012-04-27 2013-11-14 Sharp Corp Moving image decoding device and moving image coding device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005064948A1 (en) * 2003-12-22 2005-07-14 Koninklijke Philips Electronics N.V. Compatible interlaced sdtv and progressive hdtv

Family Cites Families (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01118085A (en) 1988-09-27 1989-05-10 Sanyo Electric Co Ltd Box body of refrigerator, etc.
US5270813A (en) 1992-07-02 1993-12-14 At&T Bell Laboratories Spatially scalable video coding facilitating the derivation of variable-resolution images
JP3189258B2 (en) 1993-01-11 2001-07-16 ソニー株式会社 Image signal encoding method and image signal encoding device, image signal decoding method and image signal decoding device
CA2126467A1 (en) * 1993-07-13 1995-01-14 Barin Geoffry Haskell Scalable encoding and decoding of high-resolution progressive video
CA2127151A1 (en) 1993-09-21 1995-03-22 Atul Puri Spatially scalable video encoding and decoding
US6057884A (en) 1997-06-05 2000-05-02 General Instrument Corporation Temporal and spatial scaleable coding for video object planes
US6222944B1 (en) * 1998-05-07 2001-04-24 Sarnoff Corporation Down-sampling MPEG image decoder
JP2000013790A (en) * 1998-06-19 2000-01-14 Sony Corp Image encoding device, image encoding method, image decoding device, image decoding method, and providing medium
JP2000041248A (en) * 1998-07-23 2000-02-08 Sony Corp Image decoder and image decoding method
JP2000059793A (en) * 1998-08-07 2000-02-25 Sony Corp Picture decoding device and method therefor
JP2001045475A (en) 1999-07-27 2001-02-16 Matsushita Electric Ind Co Ltd Video signal hierarchical coder, video signal hierarchical decoder and program recording medium
JP3975629B2 (en) * 1999-12-16 2007-09-12 ソニー株式会社 Image decoding apparatus and image decoding method
US6647061B1 (en) * 2000-06-09 2003-11-11 General Instrument Corporation Video size conversion and transcoding from MPEG-2 to MPEG-4
WO2003036978A1 (en) * 2001-10-26 2003-05-01 Koninklijke Philips Electronics N.V. Method and apparatus for spatial scalable compression
US7715477B2 (en) * 2002-05-29 2010-05-11 Diego Garrido Classifying image areas of a video signal
EP1455534A1 (en) * 2003-03-03 2004-09-08 Thomson Licensing S.A. Scalable encoding and decoding of interlaced digital video data
US7970056B2 (en) * 2003-06-26 2011-06-28 Lsi Corporation Method and/or apparatus for decoding an intra-only MPEG-2 stream composed of two separate fields encoded as a special frame picture
JP4470431B2 (en) * 2003-10-01 2010-06-02 ソニー株式会社 Data processing apparatus and method
US7362809B2 (en) * 2003-12-10 2008-04-22 Lsi Logic Corporation Computational reduction in motion estimation based on lower bound of cost function
US7894526B2 (en) * 2004-02-27 2011-02-22 Panasonic Corporation Motion estimation method and moving picture coding method
EP1574995A1 (en) * 2004-03-12 2005-09-14 Thomson Licensing S.A. Method for encoding interlaced digital video data
FR2894422A1 (en) 2005-12-01 2007-06-08 Thomson Licensing Sas METHOD FOR PREDICTING MOTION DATA AND TEXTURE

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BAYRAKERI S ET AL: "MPEG-2/ECVQ LOOKAHEAD HYBRID QUANTIZATION AND SPATIALLY SCALABLE CODING", PROCEEDINGS OF THE SPIE, SPIE, BELLINGHAM, VA, US, vol. 3024, 1997, pages 129 - 137, XP008042521, ISSN: 0277-786X *
PURI A ET AL: "Spatial domain resolution scalable video coding", PROCEEDINGS OF THE SPIE, SPIE, BELLINGHAM, VA, US, vol. 2094, 1993, pages 718 - 729, XP002316512, ISSN: 0277-786X *
REICHEL J ET AL: "Joint Scalable Video Model JSVM-3", JOINT VIDEO TEAM (JVT) OF ISO/IEC MPEG & ITU-T VCEG (ISO/IEC JTC1/SC29/WG11 AND ITU-T SG16 Q6), XX, XX, 29 July 2005 (2005-07-29), pages 1 - 34, XP002384686 *

Also Published As

Publication number Publication date
US20110170001A1 (en) 2011-07-14
CN101322414A (en) 2008-12-10
CN101322414B (en) 2010-10-20
EP2005760A1 (en) 2008-12-24
EP2005760B1 (en) 2014-12-31
JP2009517941A (en) 2009-04-30
FR2894422A1 (en) 2007-06-08
JP5037517B2 (en) 2012-09-26
US8520141B2 (en) 2013-08-27

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase (Ref document number: 200680045076.X; Country of ref document: CN)
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase (Ref document number: 4004/DELNP/2008; Country of ref document: IN)
WWE Wipo information: entry into national phase (Ref document number: 2008542726; Country of ref document: JP)
NENP Non-entry into the national phase (Ref country code: DE)
WWE Wipo information: entry into national phase (Ref document number: 2006807830; Country of ref document: EP)
WWE Wipo information: entry into national phase (Ref document number: 12085791; Country of ref document: US)