WO2016061743A1 - Segmental prediction for video coding - Google Patents
- Publication number: WO2016061743A1 (application PCT/CN2014/089040)
- Authority: WIPO (PCT)
Classifications
- H04N19/00 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals, in particular:
- H04N19/593 — predictive coding involving spatial prediction techniques
- H04N19/105 — selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
- H04N19/136 — adaptive coding affected or controlled by incoming video signal characteristics or properties
- H04N19/182 — adaptive coding characterised by the coding unit, the unit being a pixel
- H04N19/46 — embedding additional information in the video signal during the compression process
- H04N19/597 — predictive coding specially adapted for multi-view video sequence encoding
Definitions
- In another embodiment, OffIdxU, the DLT index offset, is coded instead of OffU.
- In another embodiment, if segmental prediction is applied, a flag is signaled to indicate whether Off or OffIdx is zero for all segments in the block. If so, no Off or OffIdx is signaled for any segment in the block and all of them are inferred to be 0.
- If the flag indicates that at least one Off or OffIdx for a segment in the block is not zero, and the Off or OffIdx values for all segments before the last one are signaled as 0, then the Off or OffIdx for the last segment cannot be 0. In that case, Off-1 or OffIdx-1 is coded for the last segment instead of Off or OffIdx, and the decoder assigns the decoded value plus 1 to the Off or OffIdx of the last segment.
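The signaling rule above can be sketched as follows. This is a minimal illustration with assumed helper names, not the patent's actual bitstream syntax: when every earlier offset in the block is signaled as 0 (and the all-zero flag is off), the last offset is known to be non-zero, so its value minus 1 is coded and the decoder adds 1 back.

```python
# Hypothetical helper names; sketch of the "last offset minus 1" rule.
def code_last_offset(off_last, earlier_all_zero):
    # Encoder side: when all earlier offsets were 0, off_last != 0,
    # so off_last - 1 can be coded instead.
    return off_last - 1 if earlier_all_zero else off_last

def decode_last_offset(coded, earlier_all_zero):
    # Decoder side: mirror of the encoder rule.
    return coded + 1 if earlier_all_zero else coded
```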
- In another embodiment, the DLT index offset OffIdxU for segment U is calculated by the encoder by subtracting the DLT index of EU from the DLT index of the average value of all the original pixel values in segment U. In a formula way, OffIdxU = f(AU) - f(EU), where AU is the average value of all the original pixel values in segment U and f represents a function mapping a depth value to a DLT index.
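A minimal sketch of the OffIdxU derivation, assuming an illustrative DLT (the table contents are an example, not from the patent): f maps a depth value to its DLT index, and the index difference is coded instead of the raw value offset.

```python
# Example DLT with five valid depth values (illustrative only).
dlt = [50, 108, 110, 112, 200]

def f(depth):
    # Map a valid depth value to its DLT index.
    return dlt.index(depth)

def dlt_index_offset(a_u, e_u):
    # OffIdxU = f(AU) - f(EU)
    return f(a_u) - f(e_u)
```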
- The segmental prediction method can be enabled or disabled adaptively.
- The encoder can signal to the decoder explicitly whether the segmental prediction method is used, or the decoder can derive the decision implicitly in the same way as the encoder.
- In one embodiment, residues of a block are not signaled and are inferred to be all 0 if segmental prediction is applied in the block.
- the segmental prediction method can be applied on coding tree unit (CTU) , coding unit (CU) , prediction unit (PU) or transform unit (TU) .
- the encoder can send the information of whether to use the segmental prediction method to the decoder in video parameter set (VPS) , sequence parameter set (SPS) , picture parameter set (PPS) , slice header (SH) , CTU, CU, PU, or TU.
- The segmental prediction method can be applied only to CUs of particular sizes. For example, it can be applied only to a CU larger than 8x8; in another example, only to a CU smaller than 64x64.
- The segmental prediction method can be applied only to CUs with a particular PU partition. For example, it can be applied only to a CU with 2Nx2N partition.
- The segmental prediction method can be applied only to CUs with a particular coding mode. For example, it can be applied only to a CU coded in IBC mode.
- The segmental prediction method can be applied only to CUs coded in InterSDC mode.
- The number of segments in the segmental prediction method can be chosen adaptively.
- The encoder can signal the number of segments to the decoder explicitly when the segmental prediction method is used, or the decoder can derive the number implicitly in the same way as the encoder.
- the encoder can send the information of how many segments to the decoder in video parameter set (VPS) , sequence parameter set (SPS) , picture parameter set (PPS) , slice header (SH) , CTU, CU, PU, or TU where the segmental prediction method is used.
- the encoder can send the information of the offsets for each segment to the decoder in video parameter set (VPS) , sequence parameter set (SPS) , picture parameter set (PPS) , slice header (SH) , CTU, CU, PU, or TU where the segmental prediction method is used.
- the encoder can send the information of whether to use the segmental prediction method to the decoder in a CU coded with InterSDC mode.
- the encoder can send the information of how many segments to the decoder in a CU coded with InterSDC mode.
- the encoder can send the information of the offsets or DLT index offsets for each segment to the decoder in a CU coded with InterSDC and in which the segmental prediction method is used.
- the segmental prediction method can be applied to the texture component. It can also be applied to depth components in 3D video coding.
- the segmental prediction method can be applied to the luma component. It can also be applied to chroma components.
- the decision of whether to use segmental prediction method can be made separately for each component, with information signaled separately. Or, the decision of whether to use segmental prediction method can be made together for all components, with a single piece of information signaled.
- The number of segments when the segmental prediction method is used can be controlled separately for each component, with information signaled separately.
- Alternatively, the number of segments can be controlled together for all components, with a single piece of information signaled.
- The offsets for each segment when the segmental prediction method is used can be decided separately for each component, with information signaled separately.
- Alternatively, the offsets can be decided together for all components, with a single piece of information signaled.
- an embodiment of the present invention can be a circuit integrated into a video compression chip or program codes integrated into video compression software to perform the processing described herein.
- An embodiment of the present invention may also be program codes to be executed on a Digital Signal Processor (DSP) to perform the processing described herein.
- the invention may also involve a number of functions to be performed by a computer processor, a digital signal processor, a microprocessor, or field programmable gate array (FPGA) .
- processors can be configured to perform particular tasks according to the invention, by executing machine-readable software code or firmware code that defines the particular methods embodied by the invention.
- The software code or firmware code may be developed in different programming languages and in different formats or styles.
- The software code may also be compiled for different target platforms.
- Different code formats, styles and languages of software code, and other means of configuring code to perform the tasks in accordance with the invention, will not depart from the spirit and scope of the invention.
Abstract
A segmental prediction coding method is proposed. By classifying prediction pixels into different segments and applying different treatments to the segments, prediction coding can be made more efficient.
Description
FIELD OF INVENTION
The invention relates generally to video/image processing.
Prediction plays a critical role in video coding. When coding or decoding a block, a prediction block, generated by intra-prediction or inter-prediction, is obtained first, before the residues are produced at the encoder or the reconstruction samples are reconstructed at the decoder.
Besides inter-prediction and intra-prediction, a new prediction mode named intra-block copy (IBC) has been adopted in the screen content coding (SCC) profile of the high efficiency video coding (HEVC) standard. IBC takes advantage of reduplicated content within a picture. As depicted in Fig. 1, when IBC is applied, a reference block in the current picture is copied to the current block as the prediction. The reference block is located by a block-copying vector (BV). The samples in the reference block must already have been reconstructed before the current block is coded or decoded.
Inter simplified depth coding (InterSDC) is adopted into 3D-HEVC as a special prediction mode for depth coding. When InterSDC is used, a normal inter-prediction is performed for the current block first. Then a coded offset is added to each pixel in the prediction block. Suppose P(i,j) represents the prediction value at pixel position (i, j) after performing the normal inter-prediction and Offset is the offset coded for this block. Then the final prediction value at pixel position (i, j) is P(i,j)+Offset. With InterSDC mode, no residues are coded; thus the final prediction value is output as the reconstructed value.
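The InterSDC prediction described above can be sketched as follows; the helper name is an assumption for illustration, not part of 3D-HEVC:

```python
# Sketch of InterSDC: one coded offset is added to every pixel of the
# inter-predicted block, and the result is output directly as the
# reconstruction, since no residues are coded in InterSDC mode.
def inter_sdc_reconstruct(pred_block, offset):
    # pred_block holds the inter-prediction values P(i,j).
    return [[p + offset for p in row] for row in pred_block]

recon = inter_sdc_reconstruct([[100, 102], [101, 103]], 5)
```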
Depth lookup table (DLT) is adopted into 3D-HEVC. Since often only a few distinct values appear in the depth component, the DLT signals those valid values from the encoder to the decoder. When a CU is coded in intra simplified depth coding (SDC) mode or depth map modeling (DMM) mode, the DLT is used to map a valid depth value to a DLT index, which is much easier to compress. Fig. 2 demonstrates an example of the DLT approach. The DLT is signaled in the picture parameter set (PPS); how to derive the DLT when encoding is left as an encoder issue.
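The DLT mapping can be sketched as follows, with an assumed table of five valid depth values (matching the flavor of Fig. 2, not its actual numbers):

```python
# Only the depth values that actually occur are signaled; each depth
# value is then coded as its (much smaller) index in the table.
valid_depths = [50, 108, 110, 112, 200]   # illustrative DLT
depth_to_index = {v: i for i, v in enumerate(valid_depths)}

def dlt_index(depth):
    return depth_to_index[depth]          # depth value -> DLT index

def dlt_value(index):
    return valid_depths[index]            # DLT index -> depth value
```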
Since prediction values come from reconstructed pixels, there are distortions between the prediction values and the original values, even if the original pixels in the current block and the original pixels in the reference block are exactly the same. Because the reconstructed signal generally loses high-frequency information, the quality of prediction deteriorates more when there are sharp pixel value changes in the reference block. Fig. 3 and Fig. 4 demonstrate two examples where there are two and three segments, respectively, with sharp edges in a block.
SUMMARY OF THE INVENTION
In light of the previously described problems, a segmental prediction method is proposed. The prediction block is processed by a segmental process before it is used to obtain the residues at an encoder or the reconstruction at a decoder. The segmental prediction method comprises classifying the pixels in the prediction block into different categories, each named a ‘segment’, wherein the pixels in a segment can be adjacent or not; treating the pixels in different segments in different ways; and forming a new prediction block after the treatment. The new prediction block is used to obtain the residues at the encoder or the reconstruction at the decoder.
Other aspects and features of the invention will become apparent to those with ordinary skill in the art upon review of the following descriptions of specific embodiments.
The invention can be more fully understood by reading the subsequent detailed description and examples with references made to the accompanying drawings, wherein:
Fig. 1 is a diagram illustrating intra block copying;
Fig. 2 is a diagram illustrating an example of DLT where five valid values appear in depth samples;
Fig. 3 is a diagram illustrating a block with two segments with sharp sample value changes;
Fig. 4 is a diagram illustrating a block with three segments with sharp sample value changes;
Fig. 5 is a diagram illustrating an exemplary segmental prediction architecture at the decoder;
Fig. 6 is a diagram illustrating an exemplary segmental process architecture;
Fig. 7 is a diagram illustrating an exemplary treatment for a segment. After the treatment, pixels in the segment can hold different values;
Fig. 8 is a diagram illustrating an exemplary treatment for a segment. After the treatment, pixels in the segment hold only one value.
Fig. 9 is a diagram illustrating four corners (painted in black) in a block.
Fig. 10 is a diagram illustrating special positions (painted in black) in a block used to calculate EU.
The following description is of the best-contemplated mode of carrying out the invention. This description is made for the purpose of illustrating the general principles of the invention and should not be taken in a limiting sense. The scope of the invention is best determined by reference to the appended claims.
A segmental prediction method is proposed.
In one embodiment, the prediction block is processed by a segmental process before it is used to get the residues at the encoder or the reconstruction at the decoder. In another embodiment, the prediction block is processed by a segmental process, and then the modified prediction block is output as the reconstruction without adding residues. Fig. 5 demonstrates the segmental prediction architecture at the decoder.
The prediction block can be obtained by intra-prediction, inter-prediction, intra-block copy prediction or any combination of them.
For example, a part of the prediction block can be obtained by inter-prediction, and another part of the prediction block can be obtained by intra-block copy prediction.
In the segmental process, there are generally two steps, as depicted in Fig. 6. The first step is called ‘classification’, in which the pixels in a prediction block are classified into different categories, each named a ‘segment’. Pixels in a segment can be adjacent or not. The second step is called ‘treatment’, in which pixels in different segments are treated in different ways. Finally, all the pixels after the treatment form a new prediction block, which is then output. The number of segments can be any positive integer, such as 1, 2, 3, etc.
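The two-step process can be sketched as a minimal pipeline, where the classification and treatment rules are assumptions supplied by the caller:

```python
# 'classification' assigns each prediction pixel a segment index;
# 'treatment' maps each pixel to a new value depending on its segment.
def segmental_process(pred_block, classify, treat):
    seg_idx = [[classify(p) for p in row] for row in pred_block]
    return [[treat(p, s) for p, s in zip(row, srow)]
            for row, srow in zip(pred_block, seg_idx)]

# Example: two segments split at a threshold of 128;
# each segment gets its own offset in the treatment step.
new_block = segmental_process(
    [[10, 200], [20, 210]],
    classify=lambda p: 0 if p > 128 else 1,
    treat=lambda p, s: p + (3 if s == 0 else -3),
)
```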
In one embodiment, the prediction values can be classified according to their values.
In another embodiment, the prediction values can be classified according to their positions.
In still another embodiment, the prediction values can be classified according to their gradients. Gradients can be obtained by applying operators such as Sobel, Roberts, and Prewitt.
In still another embodiment, classification is not applied if the number of segments is one.
In still another embodiment, the prediction block is classified into two segments in the classification step. A pixel is classified according to its relationship with a threshold number T. In one example, a pixel is classified into segment 0 if its value is larger than T; otherwise it is classified into segment 1. In another example, a pixel is classified into segment 0 if its value is larger than or equal to T; otherwise it is classified into segment 1.
In still another embodiment, T is calculated as a function of all the pixel values in the prediction block. In a formula way, T=f (P) , where P represents all the pixel values in the prediction block.
In still another embodiment, T is calculated as the average value of all the pixel values in the prediction block.
In still another embodiment, T is calculated as the middle value of all the pixel values in the prediction block.
In still another embodiment, T is calculated as the average value of four corner values in the prediction block. As an example depicted in Fig. 9, T is calculated as the average value of the four pixels painted in black color.
In still another embodiment, T is calculated as the average value of the minimum and the maximum pixel value in the prediction block. In a formula way, T= (Vmax+Vmin) /2, where Vmax and Vmin are the maximum and the minimum pixel value in the prediction block respectively.
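The threshold choices in the embodiments above can be sketched as follows (each operates on the prediction block as a 2-D list; `statistics` is the Python standard library):

```python
import statistics

def t_average(block):
    # T = average of all prediction values in the block.
    vals = [p for row in block for p in row]
    return sum(vals) / len(vals)

def t_middle(block):
    # T = middle (median) value of all prediction values.
    return statistics.median(p for row in block for p in row)

def t_corners(block):
    # T = average of the four corner pixels (as in Fig. 9).
    return (block[0][0] + block[0][-1] + block[-1][0] + block[-1][-1]) / 4

def t_minmax(block):
    # T = (Vmax + Vmin) / 2.
    vals = [p for row in block for p in row]
    return (max(vals) + min(vals)) / 2
```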
In still another embodiment, the prediction block is classified into M (M>2) segments in the classification step. A pixel is classified according to its relationship with M-1 threshold numbers T1<=T2<=...<=TM-1. In one example where M is equal to 3, a pixel is classified into segment 0 if its value is smaller than T1; it is classified into segment 2 if its value is larger than T2; otherwise it is classified into segment 1. In another example, a pixel is classified into segment 0 if its value is smaller than or equal to T1; it is classified into segment 2 if its value is larger than T2; otherwise it is classified into segment 1.
In still another embodiment, Tk is calculated as a function of all the pixel values in the prediction block, where k is from 1 to M-1. In a formula way, Tk=fk (P), where P represents all the pixel values in the prediction block.
In still another embodiment where M is equal to 3, Tk is calculated as
T1= (T+Vmin) /2
and
T2= (Vmax+T) /2,
where T is the average value of all the pixel values in the prediction block, and Vmax and Vmin are the maximum and the minimum pixel value in the prediction block respectively.
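The M = 3 case above can be sketched as follows (thresholds per the formulas above; the function name is illustrative):

```python
# T is the block average; T1 and T2 bisect [Vmin, T] and [T, Vmax].
def three_segment_index(pixel, vals):
    t = sum(vals) / len(vals)
    t1 = (t + min(vals)) / 2      # T1 = (T + Vmin) / 2
    t2 = (max(vals) + t) / 2      # T2 = (Vmax + T) / 2
    if pixel < t1:
        return 0
    if pixel > t2:
        return 2
    return 1

vals = [0, 100, 200]              # T = 100, T1 = 50, T2 = 150
```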
In one embodiment as depicted in Fig. 7, an offset OffU is added to each pixel in a segment, denoted as segment U, in the treatment process to get the new prediction value. In a formula way, Vnew=Vold+OffU, where Vold and Vnew are the pixel values before and after the treatment respectively.
In another embodiment as depicted in Fig. 8, all pixels in a segment, denoted as segment U, possess the same value VU after the treatment process. An offset OffU is added to an estimated value EU to obtain VU. In a formula way, VU=EU+OffU.
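The two treatments of Fig. 7 and Fig. 8 can be sketched as follows; the estimator passed to the flattening treatment is an assumption supplied by the caller:

```python
# Fig. 7 style: add OffU per pixel, so pixels keep distinct values.
def treat_offset(seg_pixels, off_u):
    return [v + off_u for v in seg_pixels]

# Fig. 8 style: every pixel becomes the single value VU = EU + OffU.
def treat_flatten(seg_pixels, off_u, estimate):
    e_u = estimate(seg_pixels)    # e.g. the segment average
    return [e_u + off_u] * len(seg_pixels)
```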
In another embodiment, EU is calculated as a function of all the pixel values in segment U. In a formula way, EU=f (PU) , where PU represents all the pixel values in segment U.
In still another embodiment, EU is calculated as the average value of all the pixel prediction values in segment U.
In still another embodiment, EU is calculated as the middle value of all the pixel prediction values in segment U.
In still another embodiment, EU is calculated as the average value of the minimum and the maximum pixel prediction value in segment U. In a formula way, EU= (VUmax+VUmin) /2, where VUmax and VUmin are the maximum and the minimum pixel prediction value in segment U respectively.
In still another embodiment, EU is calculated as the mode value of the pixel values in segment U. The mode value is defined as the value that appears most often in segment U.
An exemplary procedure to get the mode value of the pixel values in segment U is as follows. Suppose MinV and MaxV are the minimal and maximal possible pixel values respectively. When the bit depth is 8, MinV is 0 and MaxV is 255. For i from MinV to MaxV, a variable Count[i] is initialized to 0. For each pixel in segment U, Count[v] is incremented, where v is the pixel value. Finally, m is output as the mode value if Count[m] is the largest among Count[i] with i from MinV to MaxV.
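The counting procedure above can be sketched directly:

```python
# Count[i] per possible value; the value with the largest count is the
# mode (ties resolve to the smallest value in this sketch).
def segment_mode(seg_pixels, min_v=0, max_v=255):
    count = [0] * (max_v - min_v + 1)
    for v in seg_pixels:
        count[v - min_v] += 1
    best = max(range(len(count)), key=lambda i: count[i])
    return best + min_v
```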
In another embodiment, EU is calculated based on pixels which are at certain special positions and are in segment U.
In another embodiment, EU is calculated as the mode value of the pixels which are at certain special positions and are in segment U. The mode value is defined as the value that appears most often in segment U. For example, EU is calculated as the mode value of the pixels in segment U which satisfy (x%2==0) && (y%2==0), where (x, y) represents the position. In one example as depicted in Fig. 10, EU is calculated as the mode value of the pixels which are at the positions painted in black and in segment U. An exemplary algorithm can be described as follows.
A variable sampleCount[j][k] is set equal to 0 for all k from 0 to (1 << BitDepthY) - 1, and all j from 0 to nSegNum[xCb][yCb] - 1.
A variable mostCount[j] is set equal to 0 for all j from 0 to nSegNum[xCb][yCb] - 1.
A variable segPred[j] is set equal to 1 << (BitDepthY - 1) for all j from 0 to nSegNum[xCb][yCb] - 1.
For y in the range of 0 to nTbS - 1, inclusive, the following applies:
For x in the range of 0 to nTbS - 1, inclusive, the following applies:
When y % 2 == 0 && x % 2 == 0, the following applies:
j = segIdx[x][y].
sampleCount[j][refSamples[x][y]]++.
When sampleCount[j][refSamples[x][y]] > mostCount[j], mostCount[j] is set equal to sampleCount[j][refSamples[x][y]] and segPred[j] is set equal to refSamples[x][y].
In the algorithm above, refSamples[x][y] represents the sample value at position (x, y) in the block, segIdx[x][y] represents the segment index for position (x, y), and segPred[j] represents the required Ej for segment j.
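The per-segment subsampled-mode algorithm above can be sketched as follows. Names are illustrative; row-major `[y][x]` indexing is used for readability (the text writes refSamples[x][y]), and the block is assumed square of side nTbS.

```python
def segment_mode_subsampled(ref_samples, seg_idx, n_seg, bit_depth=8):
    """For each segment j, set seg_pred[j] to the most frequent sample value
    among positions with x % 2 == 0 and y % 2 == 0 (the 'black' positions).

    ref_samples, seg_idx: nTbS x nTbS arrays, indexed [y][x].
    n_seg: number of segments in the block (nSegNum).
    """
    n = len(ref_samples)                        # nTbS (square block assumed)
    sample_count = [[0] * (1 << bit_depth) for _ in range(n_seg)]
    most_count = [0] * n_seg
    seg_pred = [1 << (bit_depth - 1)] * n_seg   # default: mid-range value
    for y in range(0, n, 2):
        for x in range(0, n, 2):
            j = seg_idx[y][x]
            v = ref_samples[y][x]
            sample_count[j][v] += 1
            if sample_count[j][v] > most_count[j]:
                most_count[j] = sample_count[j][v]
                seg_pred[j] = v
    return seg_pred
```

A segment that contains no sampled position keeps the mid-range default, matching the default-EU embodiment below.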
In another embodiment, a default EU is used if no pixels at the special positions belong to segment U. For example, if no pixel at the black positions depicted in Fig. 10 belongs to segment U, then a default value is assigned to EU. The default value can be 0, 128, 255, 1 << (bit_depth - 1), (1 << bit_depth) - 1, or any other valid integer.
In one embodiment, the offset OffU for segment U can be signaled explicitly from the encoder to the decoder, or it can be derived implicitly by the decoder.
In one embodiment, the offset OffU for segment U is calculated by the encoder according to the pixel original values and the pixel prediction values in segment U. For example, the offset OffU for segment U is calculated by the encoder by subtracting the average value of all the pixel prediction values in segment U from the average value of all the pixel original values in segment U.
In another embodiment, the offset OffU for segment U is calculated by the encoder by subtracting EU from the average value of all the pixel original values in segment U.
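A minimal sketch of the encoder-side offset derivation just described, computing OffU as the difference between the segment's average original value and its average prediction value. Rounding the difference to an integer is an assumption; the text does not specify it.

```python
def segment_offset(orig_vals, pred_vals):
    """OffU = mean(original values in segment U) - mean(prediction values).

    Rounding to the nearest integer is an assumption made here for
    illustration; a real encoder may quantize the offset differently.
    """
    avg_orig = sum(orig_vals) / len(orig_vals)
    avg_pred = sum(pred_vals) / len(pred_vals)
    return round(avg_orig - avg_pred)

print(segment_offset([100, 102, 104], [96, 98, 100]))  # -> 4
```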
In still another embodiment, OffIdxU instead of OffU is coded. OffIdxU is the DLT index offset.
In still another embodiment, a flag is signaled to indicate whether Off or OffIdx for all segments are zero in the block if segmental prediction is applied. If the condition holds, then no Off or OffIdx for segments in the block is signaled and all Off or OffIdx values are implied to be 0.
In still another embodiment, if the flag indicates that at least one Off or OffIdx for a segment in the block is not zero when segmental prediction is applied, and all Off or OffIdx values for segments before the last segment are signaled as 0, then the Off or OffIdx for the last segment cannot be 0. In this case, Off - 1 or OffIdx - 1, instead of Off or OffIdx, should be coded for the last segment, and the decoded value plus 1 is assigned to Off or OffIdx for the last segment.
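The minus-one trick above can be sketched as follows. Both function names are hypothetical; the sketch assumes the not-all-zero flag has already been signaled, so when every earlier offset is zero the last one is known to be nonzero.

```python
def encode_last_offset(offsets):
    """Value actually coded for the last segment's offset.

    If all offsets before the last are zero, the last offset is known to be
    nonzero (the not-all-zero flag was signaled), so its value minus 1 is
    coded; otherwise the offset is coded as-is.
    """
    *head, last = offsets
    if all(o == 0 for o in head):
        assert last != 0, "flag guarantees the last offset is nonzero"
        return last - 1          # decoder adds 1 back
    return last

def decode_last_offset(coded, head):
    """Inverse mapping at the decoder, given the earlier offsets `head`."""
    return coded + 1 if all(o == 0 for o in head) else coded
```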
In still another embodiment, VU is calculated as VU = g(f(EU) + OffIdxU), where f represents a function mapping a depth value to a DLT index, and g represents a function mapping a DLT index to a depth value. f(EU) + OffIdxU should be clipped to a valid DLT index.
In another embodiment, the DLT index offset OffIdxU for segment U is calculated by the encoder by subtracting the DLT index of EU from the DLT index of the average value of all the pixel original values in segment U. Expressed as a formula, OffIdxU = f(AU) - f(EU), where AU is the average value of all the original pixel values in segment U and f represents a function mapping a depth value to a DLT index.
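The DLT-domain computation VU = g(f(EU) + OffIdxU) can be illustrated with a toy depth lookup table. The table contents and the nearest-entry definition of f are invented for this sketch; an actual DLT is built from the depth values occurring in the sequence.

```python
# Toy depth lookup table (values invented for illustration).
DLT = [0, 16, 48, 96, 160, 255]

def f(depth):
    """Map a depth value to a DLT index (nearest entry, an assumption)."""
    return min(range(len(DLT)), key=lambda i: abs(DLT[i] - depth))

def g(idx):
    """Map a DLT index back to a depth value, clipping to a valid index."""
    return DLT[max(0, min(idx, len(DLT) - 1))]

def predict_vu(eu, off_idx_u):
    """VU = g(f(EU) + OffIdxU): apply the offset in the DLT index domain."""
    return g(f(eu) + off_idx_u)

print(predict_vu(50, 1))  # f(50) = 2, 2 + 1 = 3 -> DLT[3] = 96
```

Working in the index domain lets a small coded offset jump between depth values that are far apart numerically but adjacent in the table.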
In one embodiment, the segmental prediction method can be enabled or disabled adaptively. The encoder can explicitly send the information of whether to use the segmental prediction method to the decoder, or the decoder can implicitly derive whether to use the segmental prediction method in the same way as the encoder.
In another embodiment, the residues of a block are not signaled and are implied to be all 0 if segmental prediction is applied to the block.
In still another embodiment, the segmental prediction method can be applied on coding tree unit (CTU) , coding unit (CU) , prediction unit (PU) or transform unit (TU) .
In still another embodiment, the encoder can send the information of whether to use the segmental prediction method to the decoder in video parameter set (VPS) , sequence parameter set (SPS) , picture parameter set (PPS) , slice header (SH) , CTU, CU, PU, or TU.
In still another embodiment, the segmental prediction method can only be applied to CUs with particular sizes. For example, it can only be applied to a CU with size larger than 8x8. In another example, it can only be applied to a CU with size smaller than 64x64.
In still another embodiment, the segmental prediction method can only be applied to CUs with particular PU partitions. For example, it can only be applied to a CU with 2Nx2N partition.
In still another embodiment, the segmental prediction method can only be applied to CUs with particular coding modes. For example, it can only be applied to a CU with IBC mode.
In still another embodiment, the segmental prediction method can only be applied to CUs with InterSDC mode.
In one embodiment, the number of segments in the segmental prediction method is adaptive. The encoder can explicitly send the information of how many segments to the decoder when the segmental prediction method is used, or the decoder can implicitly derive the number in the same way as the encoder.
In still another embodiment, the encoder can send the information of how many segments to the decoder in video parameter set (VPS), sequence parameter set (SPS), picture parameter set (PPS), slice header (SH), CTU, CU, PU, or TU where the segmental prediction method is used.
In still another embodiment, the encoder can send the information of the offsets for each segment to the decoder in video parameter set (VPS) , sequence parameter set (SPS) , picture parameter set (PPS) , slice header (SH) , CTU, CU, PU, or TU where the segmental prediction method is used.
In still another embodiment, the encoder can send the information of whether to use the segmental prediction method to the decoder in a CU coded with InterSDC mode.
In still another embodiment, the encoder can send the information of how many segments to the decoder in a CU coded with InterSDC mode.
In still another embodiment, the encoder can send the information of the offsets or DLT index offsets for each segment to the decoder in a CU coded with InterSDC and in which the segmental prediction method is used.
In still another embodiment, the segmental prediction method can be applied to the texture component. It can also be applied to depth components in 3D video coding.
In still another embodiment, the segmental prediction method can be applied to the luma component. It can also be applied to chroma components.
In still another embodiment, the decision of whether to use segmental prediction method can be made separately for each component, with information signaled separately. Or, the decision of whether to use segmental prediction method can be made together for all components, with a single piece of information signaled.
In still another embodiment, the number of segments when the segmental prediction method is used can be controlled separately for each component, with information signaled separately. Or, the number of segments when the segmental prediction method is used can be controlled together for all components, with a single piece of information signaled.
In still another embodiment, the offsets for each segment when the segmental prediction method is used can be decided separately for each component, with information signaled separately. Or, the offsets for each segment when the segmental prediction method is used can be decided together for all components, with a single piece of information signaled.
The methods described above can be used in a video encoder as well as in a video decoder. Embodiments of the segmental prediction methods according to the present invention as described above may be implemented in various hardware, software codes, or a combination of both. For example, an embodiment of the present invention can be a circuit integrated into a video compression chip or program codes integrated into video compression software to perform the processing described herein. An embodiment of the present invention may also be program codes to be executed on a Digital Signal Processor (DSP) to perform the processing described herein. The invention may also involve a number of functions to be performed by a computer processor, a digital signal processor, a microprocessor, or a field programmable gate array (FPGA). These processors can be configured to perform particular tasks according to the invention by executing machine-readable software code or firmware code that defines the particular methods embodied by the invention. The software code or firmware code may be developed in different programming languages and different formats or styles. The software code may also be compiled for different target platforms. However, different code formats, styles and languages of software codes and other means of configuring code to perform the tasks in accordance with the invention will not depart from the spirit and scope of the invention.
The invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described examples are to be considered in all respects only as illustrative and not restrictive. To the contrary, it is intended to cover various modifications and similar arrangements (as would be apparent to those skilled in the art) . Therefore, the scope of the appended claims should be accorded the broadest interpretation so as to encompass all such modifications and similar arrangements.
Claims (15)
- A method of segmental prediction coding, in which a prediction block is processed by a segmental process before being used to get residues at an encoder or to get a reconstruction at a decoder, comprising: classifying pixels in the prediction block into different categories, named 'segments', wherein the pixels in a segment are either adjacent or not; treating the pixels in different segments in different ways; and forming a new prediction block after treatment of all the pixels, wherein the new prediction block is used to get the residues at the encoder or to get the reconstruction at the decoder.
- The method as claimed in claim 1, wherein prediction values are classified according to values, positions, or gradients.
- The method as claimed in claim 2, wherein the prediction block is classified into two segments, and a pixel is classified according to its relationship with a threshold number T.
- The method as claimed in claim 3, wherein T is calculated as a function of all the pixel values in the prediction block, and the function comprises one of the following calculations: T is calculated as an average value of all the pixel values in the prediction block; T is calculated as a middle value of all the pixel values in the prediction block; T is calculated as the average value of some pixels in the prediction block; T is calculated as the average value of four corner pixels in the prediction block; and T is calculated as an average value of a minimum and a maximum pixel value in the prediction block, T = (Vmax + Vmin) / 2, where Vmax and Vmin are the maximum and the minimum pixel values in the prediction block, respectively.
- The method as claimed in claim 2, wherein the prediction block is classified into M (M>2) segments, and a pixel is classified according to its relationship with M–1 threshold numbers T1<=T2<=...<=TM-1.
- The method as claimed in claim 5, wherein Tk is calculated as a function of all the pixel values in the prediction block, where k is from 1 to M-1, Tk = fk(P), where P represents all the pixel values in the prediction block.
- The method as claimed in claim 5, wherein M is equal to 3, a pixel is classified into segment 0 if its value is smaller than T1; it is classified into segment 2 if its value is larger than T2; otherwise it is classified into segment 1, where T1 = (T + Vmin) / 2 and T2 = (Vmax + T) / 2, T is an average value of all the pixel values in the prediction block, and Vmax and Vmin are the maximum and the minimum pixel values in the prediction block, respectively.
- The method as claimed in claim 1, wherein an offset OffU is added to a pixel in a segment, denoted as segment U, in the treatment process to get the new prediction value, Vnew = Vold + OffU, where Vold and Vnew are the pixel values before and after treatment, respectively.
- The method as claimed in claim 1, wherein all pixels possess a same value VU in a segment, denoted as segment U, after treatment, and an offset OffU is added to an estimated value EU to obtain VU, VU=EU+OffU.
- The method as claimed in claim 9, wherein EU is calculated as a function of all the pixel values in segment U, EU = f(PU), where PU represents all the pixel values in segment U, and the functions include but are not limited to: EU is calculated as an average value of all the pixel prediction values in segment U; EU is calculated as a middle value of all the pixel prediction values in segment U; EU is calculated as an average value of a minimum and a maximum pixel prediction value in segment U, EU = (VUmax + VUmin) / 2, where VUmax and VUmin are the maximum and the minimum pixel prediction values in segment U, respectively; and EU is calculated as the mode value of the pixel values in segment U, wherein the mode value is defined as the value that appears most often in segment U.
- The method as claimed in claim 9, wherein EU is calculated as a function of some pixel values in segment U, EU = f(PU), where PU represents some pixel values at some special positions in segment U.
- The method as claimed in claim 11, wherein EU is calculated based on pixels which are at some special positions as well as are in the segment U.
- The method as claimed in claim 12, wherein EU is calculated as the mode value of the pixels that are at some special positions in segment U; for example, EU is calculated as the mode value of the pixels in segment U that satisfy (x % 2 == 0) && (y % 2 == 0), where (x, y) represents the position.
- The method as claimed in claim 12, wherein a default EU is used if no pixels are at the special positions in segment U.
- The method as claimed in claim 14, wherein the default value can be 0, 128, 255, 1 << (bit_depth - 1), (1 << bit_depth) - 1, or any other valid integer.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2014/089040 WO2016061743A1 (en) | 2014-10-21 | 2014-10-21 | Segmental prediction for video coding |
PCT/CN2015/082074 WO2015196966A1 (en) | 2014-06-23 | 2015-06-23 | Method of segmental prediction for depth and texture data in 3d and multi-view coding systems |
CN201580001847.4A CN105556968B (en) | 2014-06-23 | 2015-06-23 | The device and method of predictive coding in three-dimensional or multi-view video coding system |
US15/032,205 US10244258B2 (en) | 2014-06-23 | 2015-06-23 | Method of segmental prediction for depth and texture data in 3D and multi-view coding systems |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2014/089040 WO2016061743A1 (en) | 2014-10-21 | 2014-10-21 | Segmental prediction for video coding |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2014/087094 Continuation-In-Part WO2016044979A1 (en) | 2014-06-23 | 2014-09-22 | Segmental prediction for video coding |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016061743A1 true WO2016061743A1 (en) | 2016-04-28 |
Family
ID=55760039
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2014/089040 WO2016061743A1 (en) | 2014-06-23 | 2014-10-21 | Segmental prediction for video coding |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2016061743A1 (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080247657A1 (en) * | 2000-01-21 | 2008-10-09 | Nokia Corporation | Method for Encoding Images, and an Image Coder |
US20100303149A1 (en) * | 2008-03-07 | 2010-12-02 | Goki Yasuda | Video encoding/decoding apparatus |
US20130028321A1 (en) * | 2010-04-09 | 2013-01-31 | Sony Corporation | Apparatus and method for image processing |
CN103227921A (en) * | 2013-04-03 | 2013-07-31 | 华为技术有限公司 | HEVC (high efficiency video coding) intra-frame prediction method and device |
CN103780910A (en) * | 2014-01-21 | 2014-05-07 | 华为技术有限公司 | Method and device for determining block segmentation mode and optical prediction mode in video coding |
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 14904232; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 14904232; Country of ref document: EP; Kind code of ref document: A1 |