GB2505640A - Compressing / Decompressing Video Sequence of Images Using a Linear Prediction Model


Info

Publication number
GB2505640A
Authority
GB
United Kingdom
Prior art keywords
samples
layer
prediction
component
parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB201214948A
Other versions
GB2505640B (en)
GB201214948D0 (en)
Inventor
Christophe Gisquet
Edouard Francois
Patrice Onno
Guillaume Laroche
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to GB201214948A priority Critical patent/GB2505640B/en
Publication of GB201214948D0 publication Critical patent/GB201214948D0/en
Publication of GB2505640A publication Critical patent/GB2505640A/en
Application granted granted Critical
Publication of GB2505640B publication Critical patent/GB2505640B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/187 Adaptive coding characterised by the coding unit, the unit being a scalable video layer
    • H04N19/105 Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/117 Filters, e.g. for pre-processing or post-processing
    • H04N19/186 Adaptive coding characterised by the coding unit, the unit being a colour or a chrominance component
    • H04N19/30 Methods or arrangements using hierarchical techniques, e.g. scalability
    • H04N19/46 Embedding additional information in the video signal during the compression process
    • H04N19/503 Predictive coding involving temporal prediction
    • H04N19/593 Predictive coding involving spatial prediction techniques
    • H04N19/61 Transform coding in combination with predictive coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Prediction is generated within video coding / decoding. In a scalable encoding prediction scheme (where the image is encoded into layers, e.g. 500, 501 and 502 forming one layer and 503, 504 and 505 forming another, e.g. base and enhancement layers), where each image comprises at least two components (e.g. one luma and two chroma values), an instance of each component belongs to each layer. Data from a first component (e.g. 503) within a first layer (e.g. 503, 504, 505) are used to generate the parameters of a linear prediction model or to predict data of a second component within a second layer (e.g. 500, 501, 502). The second component is different from the first component and the second layer is different from the first layer. The parameters are evaluated based on encoded source samples belonging to at least one first component within at least one first layer and target samples belonging to a second component. These parameters are used to predict original samples belonging to the second component within a second layer from predictor samples belonging to the at least one first component. There is always a diagonal dependency, used either for the prediction or for the evaluation of the model parameters.

Description

Method and apparatus for compressing or decompressing a video sequence of images
The present invention concerns a method and a device for generating a prediction model within a compressing or decompressing process of a video sequence of images. The invention concerns more particularly a method to improve the prediction scheme in the context of scalable encoding of video sequences.
A video sequence is constituted by a sequence of images to be displayed at a given rate to generate the video. Each image is typically constituted by a set of components. For example, it is usual to use three components to represent separately the luminance and the chrominance of the image. A first component codes the luminance and is typically called Y. The chrominance is coded by two components known as U and V. These components are constituted by arrays of values. They do not necessarily have the same size. Typically the luminance component has the size of the image while the chrominance components have a reduced size, usually half in each dimension.
This scheme is often called 4:2:0 to express that the images are coded with three components and that the dimension ratio is two for each dimension between the chrominance and luminance components.
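By way of illustration only, a minimal sketch in Python (assuming NumPy; the 1080p frame size and variable names are illustrative and not part of the present disclosure) of the 4:2:0 component layout just described:
    import numpy as np

    # Illustrative 1080p frame: the luminance array Y has the size of the image.
    H, W = 1080, 1920
    Y = np.zeros((H, W), dtype=np.uint8)
    # In 4:2:0 the chrominance arrays U and V are halved in each dimension,
    # so each carries one quarter of the number of luminance samples.
    U = np.zeros((H // 2, W // 2), dtype=np.uint8)
    V = np.zeros((H // 2, W // 2), dtype=np.uint8)
    assert Y.size == 4 * U.size == 4 * V.size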
High Efficiency Video Coding (HEVC), also known as H.265 and MPEG-H Part 2, is a video compression standard under development. It is a successor of H.264/MPEG-4 AVC (Advanced Video Coding), currently under joint development by the ISO/IEC Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG).
This standard and the like have adopted an encoding mechanism called prediction. This mechanism uses a splitting, possibly hierarchical, of the arrays of image components into sets of samples, generally rectangles and even squares, called blocks. The set of samples may however be of any kind; it is to be understood in the following that, while speaking mainly about blocks, as is generally the case, these blocks may actually be constituted of sets of samples of any kind. These blocks, or sets of samples, are used as the coding unit. It is known that such a coding unit can be further split into other rectangles or blocks, called partitions. A block to be encoded is called an original block herein as it contains the original data. According to this prediction mechanism a predictor block of the original block is used. This predictor block could be created using pixel data of another block of the image which happens to be very close in content to the block to be encoded, or from a block in the same color component of an image coded previously. In the first case, the pixel data are identified thanks to a prediction direction while in the latter, the predictor block is identified thanks to motion data (e.g., the position offset). The process of predicting from data of the same image is called spatial or INTRA prediction while the process of predicting from another image is called temporal or INTER prediction. In each case, since generally the prediction is not perfect, a difference, called the residual, is computed between the predictor block and the original block. Then the original block is encoded in the form of a residual and either a prediction direction or motion information (comprising a motion vector and an identifier representing the image used as reference). It can be understood that the more similar the predictor block is to the original block, the smaller the residual, leading to an efficient encoding. When describing specifically the decoding method, the block corresponding to the original block, namely the block of samples to be decoded, will be called the target block, as it would be inappropriate to call it original since it does not contain original data.
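To make the residual mechanism concrete, here is a minimal sketch in Python/NumPy (the function names are illustrative and the residual is left unquantized for simplicity) of the predict, residual and reconstruct round trip described above:
    import numpy as np

    def compute_residual(original, predictor):
        # Encoder side: the more similar the predictor block is to the
        # original block, the smaller the residual.
        return original.astype(np.int16) - predictor.astype(np.int16)

    def reconstruct(predictor, residual):
        # Decoder side: apply the transmitted residual to the predictor block.
        return np.clip(predictor.astype(np.int16) + residual, 0, 255).astype(np.uint8)

    original = np.full((4, 4), 120, dtype=np.uint8)   # toy original block
    predictor = np.full((4, 4), 118, dtype=np.uint8)  # toy predictor block
    residual = compute_residual(original, predictor)
    assert np.array_equal(reconstruct(predictor, residual), original)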
It may happen that the block used as a predictor is not directly a block issued from the image or a preceding image but the result of the application of a function, called the prediction function, to a set of image samples. In this case we distinguish the image samples used as the source to generate the block used as a predictor, called the predictor samples in the following, from the block used as a predictor generated by applying the prediction function to these predictor samples, called the predictor block in the following. Of course the prediction function may be the identity, the predictor samples then corresponding directly to the predictor block.
To improve the process it is known to use a model as a prediction function to predict a block of one component from blocks of another component. The model is computed by the encoder and is defined by its parameters. By using the same computation, the decoder is able to generate the very same parameters defining the very same model. This model is then used by the decoder to generate the very same predictor block and apply the residual to generate the reconstructed block. It should be noted here that the mechanism implies that the parameter computation should be based only on samples already decoded at the time of decoding the target block. This is called the principle of causal coding. The decoding of a given block must depend only on previously decoded data, so the encoder generally produces what the decoder would output to use as predictor samples for later target blocks.
A popular model, called the Linear-Model (hereafter LM) prediction mode in HEVC, is known. According to this mechanism the predictor samples and the original ones are collocated in different components of the same image. For example, to encode an original block belonging to the chrominance component U, the prediction uses predictor samples belonging to the luminance component Y, typically the block corresponding to the same area in the luminance component as the area covered by the original block. Up-sampling and/or down-sampling may be necessary when the sizes of the set of predictor samples and of the original block to encode do not fit.
To tackle the needs of decoders having different capabilities from a single video, a mechanism known as scalable encoding has been developed. The idea is to provide a base video stream, called the base layer, which is the common lowest denominator for the intended audience. Then, more data is added progressively to match given client targets (bitrate handled, framerate, resolution...) in supplementary streams, called "enhancement layers". A given target defines several streams, and a corresponding client has to decode all such layers. This additional data can be supplementary frames to increase the framerate (called temporal scalability), supplementary residual information to increase visual quality (called SNR scalability) and, finally, increased resolution information (called spatial scalability). To improve coding efficiency, an enhancement layer usually refers to the lower layers.
A level is defined for the different layers. The base layer is the layer of level 0, the first enhancement layer is the layer of level 1 and so on. Any given layer of level i, with i a positive integer, depends on all the layers of lower level, from 0 to i-1, to generate the video sequence. When dealing with a layer of a given level i, the layer of level i-1 is referred to as the reference layer.
In the case where the reference layer contains an image that coincides in time with the current image, this image is called the "base image" of the current image under consideration. The collocated block, that is to say the one that has the same spatial location as the current original block to encode, may serve as predictor samples for predicting the original block. More precisely, the coding mode, the block partitioning, the motion data (if present) and the texture data (the residue in the case of a temporally predicted coding unit in the reference layer, the reconstructed texture in the case of a coding unit coded in INTRA in the reference layer) of the co-located set of predictor samples can be used to predict the current original block. In the case of a spatial enhancement layer, operations (not shown) of up-sampling of the texture and motion data of at least one reference layer may be performed. Apart from this technique of interlayer prediction used in the SVC extension of the H.264/AVC standard, the coding of an SVC scalability layer uses a motion-compensated temporal prediction loop similar to the one used for coding the H.264/AVC or HEVC-compatible base layer.
The present invention aims to improve the encoding mechanism in the context of scalable encoding applied to multi-components images.
According to a first aspect of the invention there is provided a method of generating a prediction within an encoding or decoding process of a video sequence of images, each image comprising at least two components, the encoding process being a scalable encoding scheme where the image is encoded in a base layer and at least one enhancement layer, an instance of each component belonging to each layer, said prediction being made according to a linear prediction model defined by a set of parameters, the method comprising: evaluating the model parameters based on encoded source samples belonging to at least one first component within at least one first layer and target samples belonging to a second component; and predicting, using these model parameters, original samples belonging to the second component within a second layer from predictor samples belonging to the at least one first component; wherein said second component is different from said at least one first component and said second layer is different from said at least one first layer. Accordingly the prediction accuracy may be improved.
In an embodiment said predictor samples belong to the at least one first component within the at least one first layer.
In an embodiment said target samples belong to the second component within the at least one first layer.
In an embodiment different parameters of a given model are generated using different processes. Accordingly, each parameter may be evaluated with the process that fits it best.
In an embodiment evaluating the model parameters based on encoded source samples comprises adapting iteratively the set of samples constituting the encoded source samples based on some criterion. Accordingly, the model is generated using the most relevant samples.
In an embodiment the method further comprises resampling at least one of said set of samples. Accordingly, set of samples with different resolution may be used together.
In an embodiment the method further comprises filtering at least one of said set of samples. Accordingly, some unwanted artifacts, like noise for example, may be attenuated.
In an embodiment the method further comprises storing at least one of said set of samples or the result of a computation based on at least one of said set of samples in a cache memory. Accordingly, the prediction process may be accelerated.
In an embodiment the method further comprises evaluating at least another set of model parameters; and combining the obtained set of model parameters to generate the actual model parameters. Accordingly, the actual model to be used may be improved.
In an embodiment the method further comprises combining at least two sets of predictor samples to generate the actual set of predictor samples used for predicting the original samples. Accordingly, more reliable predictor samples may be used.
In an embodiment the method further comprises determining the original samples or the source samples or the actual combination process, if any, based on some additional parameters of the used instances or layer. Accordingly, the determination may be improved.
In an embodiment the method further comprises selecting the source samples used to determine the model parameters according to additional parameters from the used instances or layers. Accordingly, more relevant samples may be selected.
In an embodiment the additional parameters are encoding parameters of the source samples.
In an embodiment the method further comprises obtaining a non-linear part from intermediate predictor samples and the source samples. Accordingly, image components with a non-linear part may be accurately predicted.
In an embodiment the obtaining of a non-linear part comprises a filtering step.
In an embodiment the non-linear part is added to the predictor samples obtained with the linear prediction model. Accordingly, the linear and the non-linear part are taken into account.
In an embodiment the non-linear part is used to classify the predictor samples according to a type of prediction to be applied to these predictor samples. Accordingly, the prediction is improved.
According to another aspect of the invention there is provided a method of encoding a video sequence of images using at least one prediction method according to the invention.
According to another aspect of the invention there is provided a method of decoding a video sequence of images using at least one prediction method according to the invention.
According to another aspect of the invention there is provided a prediction module for generating a prediction within an encoding or decoding process of a video sequence of images, each image comprising at least two components, the encoding process being a scalable encoding scheme where the image is encoded in a base layer and at least one enhancement layer, an instance of each component belonging to each layer, said prediction being made according to a linear prediction model defined by a set of parameters, the prediction module comprising: an evaluation module for evaluating the model parameters based on encoded source samples belonging to at least one first component within at least one first layer and target samples belonging to a second component; and means for predicting, using these model parameters, original samples belonging to the second component within a second layer from predictor samples belonging to the at least one first component; wherein said second component is different from said at least one first component and said second layer is different from said at least one first layer.
In an embodiment the evaluation module for evaluating the model parameters based on encoded source samples comprises means for adapting iteratively the set of samples constituting the encoded source samples based on some criterion.
In an embodiment the device further comprises means for resampling at least one of said set of samples.
In an embodiment the device further comprises means for filtering at least one of said set of samples.
In an embodiment the device further comprises a cache memory for storing at least one of said set of samples or the result of a computation based on at least one of said set of samples.
In an embodiment the device further comprises means for evaluating at least another set of model parameters; and means for combining the obtained set of model parameters to generate the actual model parameters.
In an embodiment the device further comprises means for combining at least two sets of predictor samples to generate the actual set of predictor samples used for predicting the original samples.
In an embodiment the device further comprises means for determining the original samples or the source samples or the actual combination process if any, based on some additional parameters of the used instances or layer.
In an embodiment the device further comprises means for selecting the source samples used to determine the model parameters according to additional parameters from the used instances or layers.
In an embodiment the device further comprises means for obtaining a non-linear part from intermediate predictor samples and the source samples.
In an embodiment the means for obtaining a non-linear part comprise filtering means.
In an embodiment the device further comprises means for adding the non-linear part to the predictor samples obtained with the linear prediction model.
In an embodiment the device further comprises means for using the non-linear part to classify the predictor samples according to a type of prediction to be applied to these predictor samples.
According to another aspect of the invention there is provided an encoder device comprising a prediction module according to the invention.
According to another aspect of the invention there is provided a decoder device comprising a prediction module according to the invention.
According to another aspect of the invention there is provided a computer program product for a programmable apparatus, the computer program product comprising a sequence of instructions for implementing a prediction method according to the invention, when loaded into and executed by the programmable apparatus.
According to another aspect of the invention there is provided a computer-readable storage medium storing instructions of a computer program for implementing a prediction method, according to the invention.
At least parts of the methods according to the invention may be computer implemented. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit", "module" or "system".
Furthermore, the present invention may take the form of a computer program product embodied in any tangible medium of expression having computer usable program code embodied in the medium.
Since the present invention can be implemented in software, the present invention can be embodied as computer readable code for provision to a programmable apparatus on any suitable carrier medium. A tangible carrier medium may comprise a storage medium such as a floppy disk, a CD-ROM, a hard disk drive, a magnetic tape device or a solid state memory device and the like. A transient carrier medium may include a signal such as an electrical signal, an electronic signal, an optical signal, an acoustic signal, a magnetic signal or an electromagnetic signal, e.g. a microwave or RF signal.
Embodiments of the invention will now be described, by way of example only, and with reference to the following drawings in which:
Figure 1 illustrates a typical block processing order in encoding processes.
Figure 2 illustrates the evaluation of the model parameters in encoding processes.
Figure 3 presents a high-level overview of an embodiment of the invention.
Figure 4 illustrates various kinds of subparts of a block used in different embodiments of the invention.
Figure 5 illustrates different prediction modes that may be used in some embodiments of the invention.
Figure 6a illustrates the general process used to encode or decode a particular instance according to an embodiment of the invention.
Figure 6b illustrates the way a recursive decoding of an instance may be performed according to an embodiment of the invention.
Figure 7 illustrates the use by an encoder of several sets of encoding parameters for a given block according to some embodiments of the invention.
Figure 8 provides a block diagram of a typical scalable video coder generating three scalability layers.
Figures 9a, 9b and 9c illustrate some embodiments where additional parameters from the used instances or layers are used to select the source samples used to determine the model parameters.
Figure 10 illustrates the prediction method in an embodiment taking into account a linear and a non-linear part.
Figure 1 illustrates a typical block processing order (e.g. in HEVC), represented here by the zigzag scan 106, under which blocks 101, 102, 103 and 104 will be processed. Causal coding is the process of using previously coded and decoded data to control coding of new data. In that case, the processing order indicates that block 102 needs to be processed after 101, block 103 after 102 (and thus after 101) and block 104 after 103 (and thus 101 and 102). The use of decoded data is required by the fact that the decoder only has access to such data.
To explain precisely why this is required, let us consider typical intra prediction methods: they rely on the outer borders of the block, situated in previously encoded and decoded blocks. In the case of figure 1, predictive coding of block 104 relies on the data 105, situated in blocks 102 and 103. The consequence is that causal coding forces a serial process, disallowing any parallel processing of data and thus causing a significant delay between the first input block and the last output block of data.
Still on figure 1, we can consider the content of 105. These data are used to infer the content of 104, considering that most natural image data are usually spatially continuous. However, details in the image do exist, for instance edges. If those are located in block 104 near data 105, the continuity no longer holds true. As such, the local properties of the signal in 105 may not match the characteristics of those in 104, leading to reduced prediction accuracy and thus lessened coding efficiency. This is often unavoidable, because data 104 cannot be inferred or predicted by any other means in non-scalable encoding schemes.
Scalable coding however has the advantage that a particular version of 104 can be obtained from the collocated equivalent of 104 in previous layers, e.g. by up-sampling the data of a lower layer to match 104 in the case of spatial scalability. This up-sampled data may have properties matching 104 better than 105 does, which may lead to improved coding efficiency.
A model is defined herein as the prediction function that takes predictor samples to generate a predictor block. This model could be an affine model as described below, but other models could be contemplated. A model is defined by a set of parameters, like the α and β values for the affine one.
HEVC has defined the LM prediction mode to be used for prediction between different colour components of the same image. This prediction mode, described in relation to Figure 2, relies on an affine model:
Cpred[x, y] = α · RecL'[x, y] + β
Here, the values of the predictor block made of the chrominance pixels Cpred[x, y] are linearly derived from the virtual luminance values RecL'[x, y] inferred from the collocated decoded luminance values.
At coding time, the set of predictor samples is located in the luminance block at the same spatial location as the target block in the chrominance component. At decoding time, the decoder first generates the luminance component. The set of predictor samples to be used is therefore a reconstructed block within this generated luminance component. As the luminance block and the chrominance one do not have the same size, this reconstructed block RecL[x, y] has to be down-sampled to fit the size of the target block, giving RecL'[x, y].
Figure 2 illustrates the evaluation of the model parameters. They may be found, for example, by using a Least Mean Square method, according to known methods that are somewhat outside the scope of the present invention. They are determined using encoded chrominance samples on the top border 202 and left border 201 and co-located encoded luminance samples from the top border 205 and left border 204, each border lying on the outside of the block. As they lie within already encoded blocks, this respects the causal coding principle (e.g. not referencing unavailable data). To fit the size of the original samples, filters 207 and 208 are applied to the border samples in these computations.
According to this LM prediction mode, when aiming at encoding a given original block, for example block 203, from a given set of predictor samples, for example block 206, the model has to be defined first. Defining the model consists in calculating its parameters, for example α and β. According to the causal coding principle, the target block cannot be used for this purpose as it is not available at decoding time. Typically the computation of the parameters uses as source samples some samples of the set of predictor samples and/or neighboring samples. It also uses as target samples samples neighboring the original block. It should be noted that the blocks used to do the actual prediction, the set of predictor samples and the original block, usually differ from the samples used to calculate the parameters of the model.
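As a hedged illustration of this parameter evaluation, a minimal sketch in Python/NumPy fitting α and β by least squares on border samples and then applying the affine model (the helper names are illustrative; the toy arrays stand in for the decoded borders 201/202 and 204/205):
    import numpy as np

    def lm_parameters(source_border, target_border):
        # Least-squares fit of target ≈ alpha * source + beta over the
        # already-decoded border samples (source: down-sampled luminance,
        # target: chrominance), respecting the causal coding principle.
        s = source_border.astype(np.float64).ravel()
        t = target_border.astype(np.float64).ravel()
        A = np.stack([s, np.ones_like(s)], axis=1)
        (alpha, beta), *_ = np.linalg.lstsq(A, t, rcond=None)
        return alpha, beta

    def lm_predict(rec_luma, alpha, beta):
        # Predictor block: Cpred[x, y] = alpha * RecL'[x, y] + beta.
        return alpha * rec_luma + beta

    luma_border = np.array([100.0, 120.0, 140.0, 160.0])
    chroma_border = 0.5 * luma_border + 10.0          # exact affine relation
    alpha, beta = lm_parameters(luma_border, chroma_border)
    pred = lm_predict(np.array([[110.0, 130.0], [150.0, 170.0]]), alpha, beta)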
Figure 3 presents a high-level overview of the invention. This figure depicts two layers of a video sequence. This video sequence is an example of a 4:2:0 video sequence where each chrominance component corresponds, in terms of number of samples, to one quarter of the luminance component. This encoding is called here spatial scalability, as Layer 1 is bigger than Layer 0 and relies on it to be coded, as will be explained.
For the sake of clarity, such data corresponding to one given component within one given layer will be called herein an instance of the component within the layer. For instance, data 300 is the instance of component Y within layer 0 and data 311 is the instance of component U within layer 1.
To encode these instances, the following three prediction methods exist. First, classical intra prediction, where the original block is encoded only by referencing previously decoded blocks in the same instance. Secondly, inter-layer prediction, where an original block in one instance of a given component depends on predictor samples of an instance of the same component within another layer (e.g. block 310 depends on block 300 through prediction 330, and 311 depends on 301 through 331). Thirdly, inter-color component prediction, such as the LM mode, where data from one instance of a component in a given layer depends on predictor samples of an instance of another component within the same layer, e.g. block 301 depends on block 300 through prediction 320.
According to a first aspect of the invention, inter-layer prediction and inter-color component prediction may be mixed. For instance, the prediction of 311 is limited by the fact that it relies on a small amount of data spatially close to the block, data which have suffered from lossy compression. As a consequence, residual data is coded to correct the mismatch between that prediction and the original block. If the same method were applied to block 311 through prediction 321, the same defect might be incurred. In particular, in a parametric model, using 301 and 300 to compute the parameters may yield much better values, values that may still apply correctly to prediction 321.
It is proposed to use one set of instances and layers data for the model parameter estimation, and another for the actual prediction process. In particular, the layer to which the predictor samples belong and the layers to which the blocks used in parameter evaluation belong may differ. In addition, when referring to another layer, the classical causal coding requirements for prediction no longer apply and a wide choice of samples is available.
Different coding modes according to different embodiments of the invention will now be described. These modes could be used separately or in combination according to the contemplated embodiment.
Figure 4 illustrates various kinds of patterns for selecting source or target samples. Other patterns may easily be derived. Using only a subpart of a block has the advantage that fewer computations are made to produce the intended result, whether for filtering or for parameter computations. Note that it may be advantageous if the source or target samples are constituted by a number of samples that is a power of 2, because this allows simplified computations, e.g. for divisions or multiplications.
All the examples illustrated on Figure 4 follow the same principle. The area representing the block used in prediction is drawn with a thick border. It is typically a square and more generally a rectangle. The grey samples represent the samples used for parameter evaluation. The white samples are not used for parameter evaluation.
It is to be noted that, thanks to the scalability scheme, it is possible in our context to use inner block samples in parameter evaluation without breaking the causal coding principle. This is because samples from instances of the component in lower level layers are available to be used as target samples in the evaluation. This is not the case when contemplating non-scalable coding schemes.
For instance, pattern 401 is the least complex of those presented: it only accesses the first 4 lines, which reduces memory bandwidth usage.
Pattern 402 instead favors the top and left borders; this is advantageous because the data there may have been better predicted than the rest of the block. Pattern 403, on the other hand, favors the bottom and right borders, which is advantageous because the output data there may be used in causal predictive coding by later blocks. Pattern 404 favors all borders equally: this is useful to achieve a better continuity of the output samples on those parts. This is an important property, as issues caused by discontinuities are usually handled by what is called deblocking. Another example is pattern 405, which uses half of the samples in a homogeneous way, introducing the least bias possible.
Another possibility, illustrated by pattern 406, is a dynamic process. Some criterion is applied to determine adaptively the samples to include, for instance by applying a high-pass filter such as the Sobel filter to compute the edge magnitude (high frequency) in the signal, and then using (or not) those edges, i.e. by applying a threshold to the edge values and selecting, according to their true/false value, the pixels that will be used to compute the parameters. Alternatively, robust determination may be used. In the latter, an initial set of samples is used to compute parameters, then the outliers (those to which the newly computed model parameters do not adapt well) are rejected, building a new set of samples; the process can repeat until some criterion is satisfied. This criterion may be relative to the number of iterations or the number of samples.
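One possible reading of this robust determination, sketched in Python/NumPy under the assumption of the least-squares fit shown earlier (the rejection threshold and iteration limit are illustrative choices, not prescribed by the method):
    import numpy as np

    def robust_lm_parameters(source, target, max_iter=4, reject_factor=2.0):
        # Fit, reject the outliers the model does not adapt well to, refit;
        # stop on an iteration-count or sample-count criterion.
        s = source.astype(np.float64).ravel()
        t = target.astype(np.float64).ravel()
        keep = np.ones(s.size, dtype=bool)
        alpha = beta = 0.0
        for _ in range(max_iter):
            A = np.stack([s[keep], np.ones(int(keep.sum()))], axis=1)
            (alpha, beta), *_ = np.linalg.lstsq(A, t[keep], rcond=None)
            err = np.abs(alpha * s + beta - t)
            good = err <= reject_factor * err[keep].std() + 1e-9
            if good[keep].all():       # no new outlier rejected: converged
                break
            keep &= good
            if keep.sum() < 2:         # too few samples left to fit a line
                break
        return alpha, beta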
Finally, the samples used for determining parameters may not be completely collocated with the set of samples to predict. For instance, block 408 to predict relies on a much bigger set 407 of samples, putting the emphasis on past samples and on higher stability of the produced parameters rather than on strong adaptability of the parameters to the block.
In a similar way, block 410 relies on a somewhat bigger set of samples 409. In those latter two cases, special care must of course be taken for sets situated on the borders of the signal, as some of the samples from the set used for parameter computation may not exist, e.g. on image borders. In such a case, the simplest solution is to fall back to a more traditional set of samples, preventing access to non-existent blocks.
Figure 5 illustrates different prediction modes that may be used in some embodiments of the invention. A configuration comprising a base layer and an enhancement layer, each having three instances of three components, is considered. The base layer comprises instance 503, instance 504 and instance 505. The enhancement layer comprises instance 500, instance 501 and instance 502. The instances 503 and 500 are instances of the same component. Similarly, instances 504 and 501 are instances of the same component, as are instances 505 and 502.
It is said that instance 504 depends on instance 503 to express that a prediction model of blocks from instance 504 based on collocated blocks from instance 503 is used for the encoding of instance 504. These dependencies are illustrated by arrows on the figure.
According to the classical inter-component prediction mode, some dependencies may exist within the base layer. Typically, arrow 520 illustrates that the encoding of instance 504 depends on a prediction model based on instance 503. Similarly, instance 505 depends on instance 504 through model 522. Strictly speaking, it is an abuse of language to say that the encoding of an instance depends on a model based on another instance, as the encoding is performed on individual blocks within the instance, not on the instance as a whole, and the model is likewise based on some samples, not on whole instances. The model used, defined by its parameters, may vary from one block to another. The illustrated arrow does not mean that a unique model is defined for the complete instance; it is defined on a block basis. Similarly, instance 505 may depend on instance 503 through model 521.
According to the classical inter-layer prediction mode, an instance of a given component in the enhancement layer, for example instance 500, may depend on the corresponding instance in the base layer, namely 503, using model 510. Inter-image prediction models, based on preceding images already decoded, may also be used.
It is assumed that classical decoding order is respected. Namely, when decoding, for example, instance 501 of the enhancement layer, the preceding instance 500 is available and also all the instances of the preceding layer, namely instances 503, 504 and 505.
A first prediction mode according to some embodiments of the invention consists in using the models defined to encode an instance of the reference layer, which corresponds to the base layer in the figure, to encode the corresponding instance in the enhancement layer based on the corresponding instance as a source. For example, considering that instance 504 is encoded based on instance 503 using models 520 in the reference layer, instance 501, corresponding to instance 504, is encoded using the same models, namely the same parameters defining the models, based on instance 500, corresponding to instance 503, in the enhancement layer. This corresponds to the use in the enhancement layer of the non-modified models as they were generated in the reference layer. In the figure, that means that models 530 are the models 520 generated in the reference layer. Of course, filtering may be needed to adapt the resolution. This prediction mode is advantageous as new models do not need to be generated for the enhancement layer, leading to fewer computations.
A second prediction mode may be used, very similar to the first one except that the instance of the reference layer is used instead of the one of the enhancement layer as a source instance for the prediction. Namely, in the preceding example, instance 503 is used instead of instance 500 for the prediction of instance 501, using the models 521 as models 540 for the prediction. This prediction mode is advantageous because the decoding of instance 501 no longer depends on instance 500. Therefore it could be contemplated to decode instances 500 and 501 in parallel as soon as the reference layer is decoded. It could also happen that using an instance of the reference layer as the source of the prediction lowers the filtering needed for resolution adaptation. For example, the luminance instance of the reference layer may typically fit the size of a chrominance instance of the enhancement layer. In Figure 5, it can be seen that instance 504 of the reference layer fits the size of instance 502 of the enhancement layer.
In a third embodiment, the source instances may be combined for the prediction. For example, the prediction of instance 501 may be done based on the models 530 generated as described above, applied to a combination of instances 503 and 500. A linear combination may be contemplated, but other ones also. The weight given to each source instance may depend on the coding parameters, like the coding mode and/or the quantization. For instance, Q1 and Q2 being the quantizer steps for instances 500 and 503 respectively, Q2/(Q1+Q2) and Q1/(Q1+Q2) may be used as weights to get a barycenter with weights of 1/Q. Alternatively, Q2²/(Q1²+Q2²) and Q1²/(Q1²+Q2²) may be used as weights to get a barycenter with weights of 1/Q². A simple average of the two instances may also be used. In any case, the choice to use one instance, the other, or any combination of both may be done based on the quality of these instances. This quality may be due to coding modes or quantization. Accordingly, the quality of the source, and therefore the quality of the prediction, may be improved, leading to a better encoding.
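A sketch of this barycentric combination in Python/NumPy (the function name and the example quantizer steps are illustrative):
    import numpy as np

    def combine_sources(inst_500, inst_503, q1, q2, squared=False):
        # Weights Q2/(Q1+Q2) and Q1/(Q1+Q2), or the squared variant, so the
        # more coarsely quantized instance weighs less in the combination.
        a, b = (q1 * q1, q2 * q2) if squared else (q1, q2)
        return (b / (a + b)) * inst_500 + (a / (a + b)) * inst_503

    # Q1 = 10 for instance 500, Q2 = 20 for instance 503: instance 500,
    # quantized more finely, receives the larger weight (2/3).
    blended = combine_sources(np.full((4, 4), 100.0), np.full((4, 4), 90.0), 10.0, 20.0)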
In a fourth embodiment, the models 520 are computed as usual. The models 530 are also computed, the actual models used as 530 being a combination of the generated models. In other words, the models used in the enhancement layer are a combination of classical models generated in that layer and models generated in the reference layer. The combination may be a linear combination or another one. Model parameters from more than two layers may also be combined. Model parameters from some layers may be combined to be used in another layer. Accordingly, the models may be improved.
In some embodiments of the present invention, the different parameters of a given model may be computed from samples belonging to different layers, or from the same samples but using different processes. For example, the α parameter may be computed from samples of instances of a first layer while the β parameter is computed from corresponding instances of a second layer. The model obtained may be used for the prediction of an original block of an instance within the first, the second or even a third layer. The α and β parameters may also be computed using different combinations of the same models.
These prediction modes may be mixed. For example, for the encoding of instance 502, the models 521 may be used with instance 503 or instance 500, or both, as source instances. Resampling is used as needed. Alternatively, the models 521 and 522 may be used with instances 500 and 501, or 503 and 504, or 500, 501 and 503 as source instances.
A wide variety of combinations exists. The combination actually used may be explicitly flagged, fixed (possibly depending on the sampling of the different layers and components) or adaptive depending on the nature of the blocks. In some embodiments, the determination of the set of predictor samples for the prediction, or of the samples used to generate the model parameters, or of the actual combination process if any, is made based on some additional parameters of the used instances or layer. For example, the coding mode or the original block size may be used to select predictor samples, typically having the same size.
It should be noted that in all these embodiments, to generate the prediction of a block belonging to an instance of a first component within a first layer, samples belonging to an instance of a second component within a second layer are used, wherein the second component is different from the first one and the second layer is different from the first one. There is always a "diagonal" dependency which is used either for the prediction or for the model parameters evaluation.
Upsampling may occur for mainly two reasons. The first is the case where one set of samples used as a source in computing the prediction is smaller than the set of original samples. It may happen because of spatial scalability, for example when predicting instance 501 of the enhancement layer from instance 503 of the reference layer. It may also happen because of different sampling, for example when predicting instance 505 from instance 504.
Upsampling may also occur during the model evaluation, meaning the computation of the parameters, when one set of samples is smaller than another one, for example when evaluating models 522 for the prediction of instance 505 from instance 504. Conversely, downsampling may be used instead, for example to match instances 503 and 505.
Another consideration, irrespective of which resampling is performed, is that one or more instances may need to be denoised, for example using a blurring filter, prior to computing the model parameters or the prediction. This additional filtering, in particular in the case of convolution filters, may be cascaded with the resampling to produce a unique filter for both resampling and denoising, as is obvious to the person skilled in the art.
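For illustration, a Python/NumPy sketch of this cascade, using nearest-neighbour up-sampling and a 3x3 box blur as illustrative stand-ins for the actual resampling and denoising filters:
    import numpy as np

    def upsample2x(samples):
        # Nearest-neighbour 2x up-sampling (illustrative resampling filter).
        return samples.repeat(2, axis=0).repeat(2, axis=1)

    def blur3x3(samples):
        # Simple 3x3 box blur standing in for the denoising (blurring) filter.
        p = np.pad(samples.astype(np.float64), 1, mode="edge")
        h, w = samples.shape
        out = np.zeros((h, w))
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                out += p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        return out / 9.0

    def prepare_instance(samples):
        # Cascade resampling then denoising; for convolution filters the two
        # steps could equivalently be merged into a single filter, as noted above.
        return blur3x3(upsample2x(samples))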
Figure 6a illustrates the general process used to encode or decode a particular instance according to an embodiment of the invention. In step 600, an encoder or a decoder produces the instance data needed for the determination of the prediction model parameters. The instances may belong to any layer that will be needed for that determination. This may include additional filtering, as already mentioned. Then in step 601, the produced data is used to determine each set of parameters, for instance following the process described in relation to Figure 4 regarding the choice of the samples to be used. Those parameters may be combined according to some embodiments to produce the final model parameters to be used in the prediction process. This combination occurs, for example, when the model is defined by the combination of corresponding models in two layers, as explained in relation to Figure 5.
Step 602 then produces the instances needed as source instances in the prediction, similarly to step 600. This may again require additional filtering. Now that both the final model parameters and the input instances are ready, step 603 finally produces the prediction.
Figure 6b illustrates the way a recursive decoding of an instance may be performed according to an embodiment of the invention. For each of the steps 600 and 602 in Figure 6a, a list of instances, each defined by the pair of the component and the layer it belongs to, is built in step 610.
It is to be noted that the decoding of one particular instance may need the decoding of one or more complete layers. This is the case, for example, when the prediction process of the first instance uses as input the layers of lower level. Ideally the decoding should be ordered in dependency order, meaning that all the instances needed for decoding one given instance should already have been decoded. This is typically true at the beginning of the decoding when initiating the decoding of the first instance. Later on, as the method illustrated by Figure 6b is recursive, this cannot be guaranteed.
Accordingly, the decoding is initialized by setting the first instance to decode, advantageously one requiring the fewest instances, or even not depending on any instance, in step 611. To avoid repetitive decoding, step 612 verifies whether said instance has already been decoded. This requires a global list of decoded instances shared across all operations. Indeed, if the instance is already decoded, then nothing needs to be done, and step 616 can be undertaken. Advantageously, the global list is updated here.
If the instance has not been previously decoded, then first a check is done in step 613 on whether the current instance depends on a scheme according to the invention. If this is not the case, as for the first instance of the layer of level 0 for example, classical decoding using usual prediction means such as angular, DC, or prediction by upsampling the base layer, is performed in step 614. Otherwise, decoding according to some embodiments of the present invention is undertaken for the current instance in step 615.
As already mentioned, the decoding process is recursive and therefore step 610 is undertaken with the currently needed instance as the instance to decode. When its decoding comes to an end at step 618, the recursive decoding for that instance is finished, and the process for the instance that was being decoded prior to this, which was at step 615, can be resumed at step 616.
Whether classical or proposed decoding is performed, the already mentioned step 616 occurs afterwards. If there are no further needed instances, then decoding of the instance to decode is over, and this decoding process ends at step 618. Otherwise, the next needed instance is selected in step 617, and the process loops back to step 612.
Please note that while a recursive decoding method is presented, the same result may be achieved by a serial process. It is known that recursive algorithms may always be serialized. For example, a serial method may decode the first instance of the base layer, then the second one, and so on for all instances. Then the same process is applied to the next layer, until all layers are decoded. The present recursive decoding has the advantage that a pipelined execution may be performed: only the instances from previous layers and components needed for the instance to decode are decoded.
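The recursion of Figure 6b with its shared list of decoded instances can be sketched as follows in Python (the dependency table and the instance labels are hypothetical, standing in for the lists built in step 610):
    # Each instance is a (layer, component) pair; DEPENDS_ON stands in for
    # the list of needed instances built in step 610.
    DEPENDS_ON = {
        ("L1", "U"): [("L1", "Y"), ("L0", "U")],
        ("L1", "Y"): [("L0", "Y")],
        ("L0", "U"): [("L0", "Y")],
        ("L0", "Y"): [],          # first instance: depends on no other
    }
    decoded = set()               # global list shared across all operations

    def decode_instance(instance):
        if instance in decoded:                # step 612: already decoded
            return
        for needed in DEPENDS_ON[instance]:    # steps 616/617: next needed instance
            decode_instance(needed)            # recursion back to step 610
        # ... classical decoding (step 614) or proposed decoding (step 615) ...
        decoded.add(instance)                  # update the global list

    # Pipelined execution: only what ("L1", "U") actually needs is decoded.
    decode_instance(("L1", "U"))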
While the decoding process has been described on instances, it should be noted that it could be applied to different kinds of data to decode. It may be applied to complete images. It may also be applied to one or more blocks corresponding, for example, to data to be predicted. In the latter case, it may be possible to perform pipelined decoding, e.g. decoding only blocks from any layer that will be used to compute model parameters or as prediction input for the current block. The decoding process may also be applied to only subparts of blocks, for example using sample patterns such as those illustrated in Figure 4. Accordingly, the computational complexity when preprocessing data or determining model parameters is reduced.
The processing may also be adapted to one or more properties of one set of samples, as will be described in more detail in relation to Figure 7. It is to be noted that the needed data may or may not come from the same layer as the data to decode, as illustrated in relation to Figure 5.
Figure 7 illustrates the use by an encoder of several sets of encoding parameters for a given block. This may happen, for example, for subdivisions of the coding unit, their quantizer steps or their coding modes.
A block 700 may be coded as one single big block, or subdivided into partitions, depending on the allowed subdivisions and according to a hierarchical model. Some are illustrated by subdivisions 701, 702, 703, 704, 705 and 706. For instance, 706 is subdivided into four smaller square sub-blocks of equal size, like the one referenced 722, and the bottom-left one is further subdivided into four even smaller square sub-blocks, among which partition 731. Similarly, subdivision 704 is made of two rectangular partitions 710 and 711. Of course, other subdivisions of block 700 are possible.
While the model parameters may be determined for the whole block 700 at once, whatever the actual subdivision of the block, it may be beneficial to take advantage of the subdivision used. Indeed, the point of subdividing a block is to create partitions of somewhat differing properties to improve coding efficiency. Therefore, determining the model parameters for each partition 710 and 711 instead of for the whole block 700 may lead to much better suited model parameters.
Partitions 720, 721 and 722, or partitions 730 and 731, are almost identical. If no additional properties are considered for the partitions other than their sizes and positions, then partitions 720, 721 and 722, or partitions 730 and 731, will produce the exact same parameters. Only a few examples of subdivisions are illustrated: in the case of partition 730 in subdivision 705, there may be quite a lot more subdivisions than 706 that include it. As a consequence, the same model parameter estimation may occur multiple times redundantly when an encoder evaluates the coding efficiency of several subdivisions.
Advantageously, precomputed parameters, filtered sets of samples or even prediction samples may be stored in memory to implement a memory cache. According to an embodiment, all possible versions of the model parameters may be precomputed, whether on the whole image or on the initial block being the coding unit, such as block 700 in our example:
- A version with partitions assumed to be the size of partitions 710 and 711 (i.e. as if 700 was subdivided into 704);
- A version with partitions assumed to be the size of partitions 720, 721 or 722 (i.e. as if 700 was subdivided into 701);
- A version with partitions assumed to be the size of partitions 720 and 721 (i.e. as if 700 was subdivided into 703);
- A version with partitions assumed to be the size of partitions 730 and 731 (i.e. as if 700 was subdivided into 705); and
- A version with partitions assumed to be the size of partitions 740 and 741 (i.e. as if 700 was subdivided into 702).
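A minimal memoization sketch of such a cache in Python (the geometry key and the estimate_parameters callback are hypothetical): parameters are computed once per partition geometry and reused across candidate subdivisions.
    param_cache = {}

    def cached_parameters(x, y, w, h, estimate_parameters):
        # Position and size alone identify a partition here, so identical
        # partitions occurring in several subdivisions are estimated once.
        key = (x, y, w, h)
        if key not in param_cache:
            param_cache[key] = estimate_parameters(x, y, w, h)
        return param_cache[key]

    calls = []
    def fake_estimate(x, y, w, h):
        calls.append((x, y, w, h))
        return (0.5, 10.0)                      # stand-in model parameters

    cached_parameters(0, 16, 16, 16, fake_estimate)  # e.g. partition 730 under 705
    cached_parameters(0, 16, 16, 16, fake_estimate)  # same geometry under 706: cache hit
    assert len(calls) == 1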
Figure 8 provides a block diagram of a typical scalable video coder generating three scalability layers. This diagram is organized in three stages 800, 830 and 860, respectively dedicated to the coding of each of the scalability layers generated. The numerical references of similar functions are incremented by 30 between the successive stages. Each stage takes, as an input, the original sequence of images to be compressed, respectively 802, 832 and 862, possibly subsampled at the spatial resolution of the scalability layer coded by the stage in question. Within each stage a motion-compensated temporal prediction loop is implemented.
The first stage 800 in Figure 8 corresponds to the temporal and spatial prediction diagram of an H.264/AVC or HEVC non-scalable video coder and is known to persons skilled in the art. It successively performs the following steps for coding the base layer.
A current image 802 to be compressed at the input of the coder is divided into coding units by the function 804. Each coding unit first undergoes a motion estimation step, function 816, which attempts to find, among the reference images stored in a buffer 812, the reference prediction units best predicting the current coding unit. This motion estimation function 816 supplies one or more indices of the reference images containing the reference prediction units found, as well as the corresponding motion vectors. A motion compensation function 818 applies the estimated motion vectors to the reference prediction units found and copies the blocks thus obtained into a temporal prediction image. In addition, an INTRA prediction function 820 (where the current invention may be located, conjointly with 850 and 880) determines the spatial prediction mode of the current coding unit that would provide the best performance for coding the current coding unit in INTRA mode. Next, a coding mode selection function 814 determines, among the temporal and spatial predictions, the coding mode that provides the best rate-distortion compromise in the coding of the current coding unit. The difference between the current coding unit and the selected prediction coding unit is calculated by the function 826, so as to provide the (temporal or spatial) residue to be compressed. This residual coding unit then undergoes spatial transform (such as the discrete cosine transform, or DCT) and quantization functions 806 to produce quantized transform coefficients. An entropic coding of these coefficients is then performed by a function not shown in Figure 8, and supplies the compressed texture data of the current coding units. Finally, the current coding unit is reconstructed by means of an inverse quantization and inverse transformation 808, and an addition 810 of the residue after inverse transformation and the prediction coding unit of the current coding unit. Once the current image is thus reconstructed, it is stored in the buffer 812 in order to serve as a reference for the temporal prediction of future images to be coded.
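As an aside, the rate-distortion compromise performed by the coding mode selection function 814 can be summarised by the following Python sketch, which picks the candidate minimising the Lagrangian cost D + lambda*R; the candidate tuples and the value of lambda are illustrative assumptions, not values prescribed by any standard.

```python
def choose_coding_mode(candidates, lam):
    """Return the (mode, distortion, rate) tuple with the lowest
    Lagrangian cost distortion + lam * rate."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])

# e.g. choose_coding_mode([("INTER", 1200.0, 96), ("INTRA", 1100.0, 150)], lam=4.0)
# -> ("INTER", 1200.0, 96), since 1200 + 4*96 = 1584 < 1100 + 4*150 = 1700
```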
Functions 822 and 824 perform a filtering operation known to persons skilled in the art by the term deblocking filter and aimed at reducing the block effects that may appear at the boundary of coding units.
The second stage in Figure 8 illustrates the coding of the first enhancement layer 830 of the scalable stream. The coding scheme of this layer 830 is similar to that of the base layer, except that, for each coding unit of a current image being compressed, an additional prediction mode with respect to the coding of the base layer may be chosen by the coding mode selection function 844. This prediction mode is called "interlayer prediction". It consists of reusing the data coded in a layer below the enhancement layer currently being coded as prediction data for the current coding unit. This lower layer, here the base layer 800, is called the "reference layer" for the interlayer prediction of the enhancement layer 830.
In the case where the reference layer contains an image that coincides in time with the current image, then referred to as the "base image" of the current image, the collocated coding unit, that is to say the one that has the same spatial position as the current coding unit and that was coded in the base layer, may serve as a reference for predicting the current coding unit. More precisely, the coding mode, the coding unit partitioning, the motion data (if present) and the texture data (residue in the case of a temporally predicted coding unit, reconstructed texture in the case of a coding unit coded in INTRA) of the collocated coding unit can be used to predict the current coding unit. In the case of a spatial enhancement layer, operations (not shown) of up-sampling of the texture and motion data of the reference layer are performed. Apart from this technique of interlayer prediction used in the SVC extension of the H.264/AVC standard, the coding of an SVC scalability layer uses a motion-compensated temporal prediction loop similar to the one used for coding the H.264/AVC or HEVC-compatible base layer 800.
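As a rough sketch of the texture up-sampling mentioned above for spatial enhancement layers, the following Python function enlarges the collocated base-layer reconstruction by an integer factor using nearest-neighbour repetition; actual codecs use longer interpolation filters, so this is a simplifying assumption rather than the standard's filter.

```python
import numpy as np

def upsample_texture(base_block, scale=2):
    """Nearest-neighbour up-sampling of a 2D base-layer block so that it
    can serve as an interlayer texture prediction at the enhancement
    resolution (dyadic spatial scalability assumed)."""
    return np.repeat(np.repeat(base_block, scale, axis=0), scale, axis=1)
```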
Finally, as indicated in Figure 8, the coding of a third layer 862 (the second enhancement layer) uses a coding scheme identical to that of the first enhancement layer 830, the reference layer then being the first enhancement layer. It is important to note that, although this scheme is related to SVC, a similar architecture can be used for a scalable extension of the HEVC video coding standard.
Figures 9a, 9b and 9c illustrate some embodiments where additional parameters from the used instances or layers are used to select the source samples used to determine the model parameters.
A scalable decoder may have access to more information than just the sample values from the lower-level layers. A first example of such information is the sub-block partitioning. Figures 9a, 9b and 9c illustrate this with the case of a block 90 which is being encoded with the invention. It references a block in a lower layer, which is subdivided into 2 partitions, illustrated by sub-blocks 91 and 92. In such a case, the samples used in computing the parameters of the model of the upper layer can be adapted to the rectangular partitioning used in the lower layer. In the illustrated selection, it is selection 402 from Figure 4 that has been adapted.
Secondly, another piece of available information is the angular intra prediction. If the reference block from the lower layer was predicted (when coded in the lower layer) with a particular angular direction, then the set of samples used in computing the parameters of the model of the upper layer can again be adapted to the angular direction used in the encoding of the lower layer. For instance, for a strictly vertical prediction direction, this set can be 93, while for a strictly horizontal prediction direction, it may be 94.
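A schematic Python dispatch of this idea: depending on hints read from the lower layer, different neighbouring samples feed the model-parameter estimation. The mode names and the exact rows and columns retained are illustrative assumptions, since the actual sets 93 and 94 are defined by Figures 9b and 9c.

```python
import numpy as np

def select_source_samples(neighbourhood, lower_layer_hint):
    """Pick the source samples used to fit the model parameters.

    neighbourhood: 2D array whose row 0 is the row above the block and
    whose column 0 is the column to its left (illustrative layout).
    """
    if lower_layer_hint == "vertical":    # e.g. set 93: favour the row above
        return neighbourhood[0, 1:]
    if lower_layer_hint == "horizontal":  # e.g. set 94: favour the left column
        return neighbourhood[1:, 0]
    # default: use both the row above and the column to the left
    return np.concatenate((neighbourhood[0, 1:], neighbourhood[1:, 0]))
```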
Last but not least, the description has so far focused on a linear model for the prediction process. But there can be a mismatch between the model and the data, so it can be interesting to consider a block as made of a linear part and a non-linear part. To this end, let us consider a block of component 1 in layer 1: the corresponding blocks in layer 0 can be used to modify the prediction, as illustrated in Figure 10. The process starts at step 1000 by computing the parameters, as already presented, to generate the model used in layer 0. These parameters can then be used during step 1001 to generate the prediction for the lower layer. By, e.g., subtracting the decoded version of the block of the instance of component 1 in layer 0 from the prediction obtained in step 1001, a non-linear part of the prediction can be deduced in step 1002.
Step 1003 can take several forms.
Akin to the robust estimation described with reference to Figure 4, the non-linear part can be used to classify the samples, thus generating at least 2 subsets (e.g. linear/non-linear according to a threshold on the difference between the decoded sample value and its prediction) on which to compute new sets of parameters, and thus potentially several sets of prediction.
On the other hand, it can be a simple process such as thresholding the generated prediction, applying a low-pass filter, scaling it by a factor, or a process known as shrinking, which for a given positive value a is defined by:
-if P(x) < -a, then P(x) becomes P(x) + a;
-if -a ≤ P(x) ≤ a, then P(x) becomes 0;
-if P(x) > a, then P(x) becomes P(x) - a.
Resampling of said non-linear part may be needed afterwards to match the resolution of the block of the instance of component 1 in layer 1; a code sketch of the shrinking operation follows.
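A minimal Python sketch of this shrinking (soft-thresholding) operation, assuming the non-linear part is held in a numpy array:

```python
import numpy as np

def shrink(p, a):
    """Shrinking of the non-linear part P by a positive value a: samples
    in [-a, a] are set to 0, larger magnitudes are moved towards 0 by a."""
    p = np.asarray(p, dtype=np.float64)
    return np.sign(p) * np.maximum(np.abs(p) - a, 0.0)

# e.g. shrink([-5, -1, 0, 2, 7], a=3) -> array([-2., 0., 0., 0., 4.])
```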
Then the already described process resumes at step 1004: the parameters found during step 1000 (or those resulting from a further processing) are used to compute the normal prediction of the block of the instance of component 1 in layer 1. Step 1005 then finishes the prediction process by using the information generated during step 1003, respectively:
-adding the thresholded/scaled/shrunk/low-pass-filtered non-linear part to said prediction (possibly with clamping to the natural range of the samples, e.g. [0;255]); or
-using the derived sample classes and their parameters to produce the correct prediction sample values.
These steps are assembled in the sketch below.
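The following Python sketch strings steps 1000 to 1005 together for the additive variant. All names are illustrative; fit and upsample stand for the parameter estimation and resampling processes described earlier, and the non-linear part is computed here as decoded minus predicted so that adding its shrunk version corrects the layer 1 prediction.

```python
import numpy as np

def shrink(p, a):  # soft-thresholding, as defined above
    return np.sign(p) * np.maximum(np.abs(p) - a, 0.0)

def predict_component1_layer1(fit, train_src, train_tgt,
                              src_l0, dec_l0, src_l1, a, upsample):
    """Sketch of Figure 10 for a block of component 1 in layer 1.

    train_src, train_tgt: samples used to fit the model (step 1000)
    src_l0: predictor samples of the first component in layer 0
    dec_l0: decoded block of the instance of component 1 in layer 0
    src_l1: predictor samples of the first component in layer 1
    """
    alpha, beta = fit(train_src, train_tgt)   # step 1000
    pred_l0 = alpha * src_l0 + beta           # step 1001: layer 0 prediction
    nonlinear = dec_l0 - pred_l0              # step 1002: what the linear model misses
    corr = upsample(shrink(nonlinear, a))     # step 1003, then resampling
    pred_l1 = alpha * src_l1 + beta           # step 1004: layer 1 prediction
    return np.clip(pred_l1 + corr, 0, 255)    # step 1005, clamped to [0;255]
```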
Step 1006 ends the entire process:
-for an encoder, by encoding the difference between the prediction and the original block, for example by spatial transform, quantization and entropy coding of the quantized coefficients;
-for a decoder, by decoding said encoded difference, for example by entropy decoding, dequantization and inverse spatial transform.
Any step of the algorithm shown in Figures 6a and 6b may be implemented in software by execution of a set of instructions or program by a programmable computing machine, such as a PC ("Personal Computer"), a DSP ("Digital Signal Processor") or a microcontroller; or else implemented in hardware by a machine or a dedicated component, such as an FPGA ("Field-Programmable Gate Array") or an ASIC ("Application-Specific Integrated Circuit").
Although the present invention has been described hereinabove with reference to specific embodiments, the present invention is not limited to those specific embodiments, and modifications which lie within the scope of the present invention will be apparent to a person skilled in the art.
Many further modifications and variations will suggest themselves to those versed in the art upon making reference to the foregoing illustrative embodiments, which are given by way of example only and which are not intended to limit the scope of the invention, that being determined solely by the appended claims. In particular the different features from different embodiments may be interchanged, where appropriate.
In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. The mere fact that different features are recited in mutually different dependent claims does not indicate that a combination of these features cannot be advantageously used.

Claims (41)

1. A computer-implemented method of generating a prediction within an encoding or decoding process of a video sequence of images, each image comprising at least two components, the encoding process being a scalable encoding scheme where the image is encoded in a base layer and at least one enhancement layer, an instance of each component belonging to each layer, said prediction being made according to a linear prediction model defined by a set of parameters, the method comprising:
-evaluating the model parameters based on encoded source samples belonging to at least one first component within at least one first layer and target samples belonging to a second component; and
-predicting, using these model parameters, original samples belonging to the second component within a second layer from predictor samples belonging to the at least one first component;
wherein said second component is different from said at least one first component and said second layer is different from said at least one first layer.
2. A method according to claim 1 wherein said predictor samples belong to the at least one first component within the at least one first layer.
3. A method according to claim 1 wherein said target samples belong to the second component within the at least one first layer.
4. A method according to any preceding claim wherein different parameters of a given model are generated using different processes.
5. A method according to any preceding claim wherein evaluating the model parameters based on encoded source samples comprises:
-iteratively adapting the set of samples constituting the encoded source samples based on some criterion.
6. A method according to any preceding claim further comprising:
-resampling at least one of said sets of samples.
7. A method according to any preceding claim further comprising:
-filtering at least one of said sets of samples.
8. A method according to any preceding claim further comprising:
-storing at least one of said sets of samples, or the result of a computation based on at least one of said sets of samples, in a cache memory.
9. A method according to any preceding claim further comprising:
-evaluating at least another set of model parameters; and
-combining the obtained sets of model parameters to generate the actual model parameters.
10. A method according to any preceding claim further comprising:
-combining at least two sets of predictor samples to generate the actual set of predictor samples used for predicting the original samples.
11. A method according to any preceding claim further comprising:
-determining the original samples or the source samples or the actual combination process, if any, based on some additional parameters of the used instances or layers.
12. A method according to any preceding claim further comprising:
-selecting the source samples used to determine the model parameters according to additional parameters from the used instances or layers.
13. A method according to claim 12 wherein the additional parameters are encoding parameters of the source samples.
14. A method according to any preceding claim further comprising:
-obtaining a non-linear part from intermediate predictor samples and the source samples.
15. A method according to claim 14 wherein the obtaining of a non-linear part comprises a filtering step.
16. A method according to claim 14 or 15 wherein the non-linear part is added to the predictor samples obtained with the linear prediction model.
17. A method according to claim 14 or 15 wherein the non-linear part is used to classify the predictor samples according to a type of prediction to be applied to these predictor samples.
18. A method of encoding a video sequence of images using at least one prediction method according to any one of claims 1 to 17.
19. A method of decoding a video sequence of images using at least one prediction method according to any one of claims 1 to 17.
20. A prediction module for generating a prediction within an encoding or decoding process of a video sequence of images, each image comprising at least two components, the encoding process being a scalable encoding scheme where the image is encoded in a base layer and at least one enhancement layer, an instance of each component belonging to each layer, said prediction being made according to a linear prediction model defined by a set of parameters, the prediction module comprising:
-an evaluation module for evaluating the model parameters based on encoded source samples belonging to at least one first component within at least one first layer and target samples belonging to a second component; and
-means for predicting, using these model parameters, original samples belonging to the second component within a second layer from predictor samples belonging to the at least one first component;
wherein said second component is different from said at least one first component and said second layer is different from said at least one first layer.
21. A device according to claim 20 wherein said predictor samples belong to the at least one first component within the at least one first layer.
22. A device according to claim 20 wherein said target samples belong to the second component within the at least one first layer.
23. A device according to claim 20 wherein different parameters of a given model are generated using different processes.
24. A device according to any one of claims 20 to 23 wherein the evaluation module for evaluating the model parameters based on encoded source samples comprises:
-means for iteratively adapting the set of samples constituting the encoded source samples based on some criterion.
25. A device according to any one of claims 20 to 24 further comprising:
-means for resampling at least one of said sets of samples.
26. A device according to any one of claims 20 to 25 further comprising:
-means for filtering at least one of said sets of samples.
27. A device according to any one of claims 20 to 26 further comprising:
-a cache memory for storing at least one of said sets of samples or the result of a computation based on at least one of said sets of samples.
28. A device according to any one of claims 20 to 27 further comprising:
-means for evaluating at least another set of model parameters; and
-means for combining the obtained sets of model parameters to generate the actual model parameters.
29. A device according to any one of claims 20 to 28 further comprising:
-means for combining at least two sets of predictor samples to generate the actual set of predictor samples used for predicting the original samples.
30. A device according to any one of claims 20 to 29 further comprising:
-means for determining the original samples or the source samples or the actual combination process, if any, based on some additional parameters of the used instances or layers.
31. A device according to any one of claims 20 to 30 further comprising:
-means for selecting the source samples used to determine the model parameters according to additional parameters from the used instances or layers.
32. A device according to claim 31 wherein the additional parameters are encoding parameters of the source samples.
33. A device according to any one of claims 20 to 32 further comprising:
-means for obtaining a non-linear part from intermediate predictor samples and the source samples.
34. A device according to claim 33 wherein the means for obtaining a non-linear part comprises filtering means.
35. A device according to claim 33 or 34 comprising means for adding the non-linear part to the predictor samples obtained with the linear prediction model.
36. A device according to claim 33 or 34 comprising means for using the non-linear part to classify the predictor samples according to a type of prediction to be applied to these predictor samples.
37. An encoder device comprising a prediction module according to any one of claims 20 to 36.
38. A decoder device comprising a prediction module according to any one of claims 20 to 36.
39. A computer program product for a programmable apparatus, the computer program product comprising a sequence of instructions for implementing a method according to any one of claims 1 to 17 when loaded into and executed by the programmable apparatus.
40. A computer-readable storage medium storing instructions of a computer program for implementing a method according to any one of claims 1 to 17.
41. A method of generating a prediction substantially as hereinbefore described with reference to, and as shown in, Figures 6 and 10.
GB201214948A 2012-08-22 2012-08-22 Method and apparatus for compressing or decompressing video sequence of images Active GB2505640B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB201214948A GB2505640B (en) 2012-08-22 2012-08-22 Method and apparatus for compressing or decompressing video sequence of images

Publications (3)

Publication Number Publication Date
GB201214948D0 GB201214948D0 (en) 2012-10-03
GB2505640A true GB2505640A (en) 2014-03-12
GB2505640B GB2505640B (en) 2014-12-17

Family

ID=47017133

Family Applications (1)

Application Number Title Priority Date Filing Date
GB201214948A Active GB2505640B (en) 2012-08-22 2012-08-22 Method and apparatus for compressing or decompressing video sequence of images

Country Status (1)

Country Link
GB (1) GB2505640B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120287995A1 (en) * 2011-05-12 2012-11-15 Madhukar Budagavi Luma-Based Chroma Intra-Prediction for Video Coding
US20120328013A1 (en) * 2011-06-24 2012-12-27 Madhukar Budagavi Luma-Based Chroma Intra-Prediction for Video Coding

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3806467A4 (en) * 2018-05-30 2021-04-21 Sony Corporation Reception device, reception method, transmission device, and transmission method
US11418838B2 (en) 2018-05-30 2022-08-16 Saturn Licensing Llc Reception device, reception method, transmission device, and transmission method
US11956496B2 (en) 2018-05-30 2024-04-09 Saturn Licensing Llc Reception device, reception method, transmission device, and transmission method
EP3883245A4 (en) * 2018-12-13 2022-06-08 Huawei Technologies Co., Ltd. Chroma block prediction method and device
US11595669B2 (en) 2018-12-13 2023-02-28 Huawei Technologies Co., Ltd. Chroma block prediction method and apparatus
US12120325B2 (en) 2018-12-13 2024-10-15 Huawei Technologies Co., Ltd. Chroma block prediction method and apparatus

Also Published As

Publication number Publication date
GB2505640B (en) 2014-12-17
GB201214948D0 (en) 2012-10-03
