WO2012081636A1 - Image decoding device, image coding device, and data structure of coded data - Google Patents


Info

Publication number
WO2012081636A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
channel
prediction
decoded
channels
Prior art date
Application number
PCT/JP2011/078953
Other languages
French (fr)
Japanese (ja)
Inventor
将伸 八杉 (Masanobu Yasugi)
Original Assignee
シャープ株式会社 (Sharp Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by シャープ株式会社 (Sharp Corporation)
Publication of WO2012081636A1 publication Critical patent/WO2012081636A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/105 Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/186 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/44 Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/46 Embedding additional information in the video signal during the compression process
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Definitions

  • the present invention relates to an image decoding device that performs intra prediction on image color differences, an image encoding device, and a data structure of encoded data.
  • In order to transmit or record moving images efficiently, a moving image encoding device that generates encoded data by encoding the moving image, and a moving image decoding device that generates a decoded image by decoding the encoded data, are used.
  • Specific examples of the moving image encoding method include the method used in H.264/MPEG-4 AVC (Non-Patent Document 1), the method adopted in the KTA software, a codec for joint development in VCEG (Video Coding Expert Group), and the method adopted in the TMuC (Test Model under Consideration) software, the successor codec to KTA (Non-Patent Document 2).
  • In such encoding methods, a predicted image is usually generated based on a locally decoded image obtained by encoding and decoding an input image, and the prediction residual obtained by subtracting the predicted image from the input image (original image), sometimes referred to as the "difference image" or "residual image", is encoded.
  • examples of the method for generating a predicted image include inter-screen prediction (inter prediction) and intra-screen prediction (intra prediction).
  • TMuC Non-Patent Document 2 proposes to divide a moving image and manage it by the following hierarchical structure.
  • an image (picture) constituting a moving image is divided into slices.
  • The slice is further divided into largest coding units (LCU: Largest Coding Unit), sometimes also called macroblocks.
  • the maximum coding unit can be divided into smaller coding units (Coding Unit) by quadtree division.
  • A coding unit that cannot be further divided (leaf CU) is treated as a transform unit and a prediction unit; these are sometimes called blocks. In addition, a unit called a partition, obtained by using the prediction unit as it is or by further dividing it, is defined.
  • intra prediction is performed in units of partitions.
  • In Non-Patent Document 1, an expression format in which a pixel is represented by a combination of a luminance component Y and color difference components Cb and Cr is employed.
  • Luminance and color difference are inherently independent components. Therefore, in H.264/MPEG-4 AVC (Non-Patent Document 1), horizontal prediction, vertical prediction, DC prediction, and plane prediction are used for the intra prediction of color difference signals.
  • the human eye has a visual characteristic that it is sensitive to pixel luminance changes but insensitive to color changes.
  • If the resolution of the chrominance pixels is lowered, the visual impact is smaller than when the resolution of the luminance pixels is lowered. Therefore, in the encoding of moving images, the resolution of the color difference pixels is made lower than that of the luminance pixels to reduce the data amount.
  • FIG. 27 shows an example of a correspondence relationship between an original image, a luminance pixel, and a color difference pixel in the prior art.
  • FIG. 27(a) shows the image (YUV) to be encoded, FIG. 27(b) shows the luminance pixels, and FIG. 27(c) shows the color difference pixels (illustrated for U). In this example, the resolution of the luminance pixels is 8 × 8, and the resolution of the color difference pixels is 4 × 4.
  • Non-Patent Document 3 discloses predicting a color difference image (UV) from a luminance image (Y) by linear conversion. Specifically, performing linear transformation according to the following formula (A1) is disclosed.
  • PredC[xC, yC] = αC · RecY[xY, yY] + βC … (A1)
  • Here, PredC is the predicted image (color difference), RecY is the decoded image (luminance), [xC, yC] and [xY, yY] are coordinates indicating the position of the same sample, and αC and βC are coefficients derived by the least squares method from the pixel values of the surrounding decoded images (hereinafter referred to as the local decoded image). When the resolution differs between the luminance image and the color difference image, the coordinates [xC, yC] and [xY, yY] need to be appropriately converted.
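  • As a concrete illustration of this linear prediction, the sketch below derives αC and βC by the least squares method from paired luminance/color difference samples of the surrounding local decoded image and then applies formula (A1). The function names and the NumPy-based formulation are illustrative assumptions, not taken from Non-Patent Document 3.

```python
import numpy as np

def fit_linear_params(rec_y_samples, rec_c_samples):
    """Derive (alpha, beta) by least squares from paired luminance /
    color-difference samples of the surrounding local decoded image."""
    y = np.asarray(rec_y_samples, dtype=np.float64)
    c = np.asarray(rec_c_samples, dtype=np.float64)
    # Least-squares solution of c ~= alpha * y + beta.
    a = np.vstack([y, np.ones_like(y)]).T
    alpha, beta = np.linalg.lstsq(a, c, rcond=None)[0]
    return alpha, beta

def predict_chroma_linear(rec_y_block, alpha, beta):
    """Apply formula (A1): PredC[xC, yC] = alpha * RecY[xY, yY] + beta."""
    pred = alpha * np.asarray(rec_y_block, dtype=np.float64) + beta
    return np.clip(np.rint(pred), 0, 255).astype(np.uint8)
```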
  • Non-Patent Document 3 also mentions how to take sample points when the resolution differs between the luminance image (Y) and the color difference images (U, V), as illustrated in FIG. 28.
  • FIG. 28A shows the case where sample points are taken from a 2N ⁇ 2N size luminance image
  • FIG. 28B shows the case where samples are taken from an N ⁇ N color difference image.
  • Non-Patent Document 3 describes that the sample point smpl100 shown in (a) of FIG. 28 is associated with the sample point smpl200 shown in (b) of FIG.
  • the conventional technology as described above performs linear conversion by the least square method in inter-channel prediction. For this reason, when the local decoded image used as a sample is not suitable for linear transformation, the accuracy of inter-channel prediction may not be sufficient.
  • For example, when the samples are distributed across two regions, group Gr101 and group Gr102, and vary within each region, the error of the linear conversion may become large. In such a case, sample points such as Smpl300 lie far from the values approximated by formula (A1).
  • The present invention has been made in view of the above problems, and its object is to realize an image decoding device that can improve the possibility of obtaining higher prediction accuracy even when the components of the pixels included in the local decoded image vary and the accuracy of inter-channel prediction based on a linear correlation is therefore lowered.
  • In order to solve the above problem, an image decoding device according to the present invention decodes encoded image data by generating a predicted image for each of a plurality of channels indicating the components constituting an image and adding a prediction residual to the generated predicted image, and comprises: channel decoding means for decoding one or more of the plurality of channels for a processing target block; correlation derivation means for deriving a non-linear correlation between the one or more channels already decoded by the channel decoding means and another channel to be decoded, with reference to a local decoded image that has been decoded for each of the plurality of channels and is located around the processing target block; and predicted image generation means for generating, for the processing target block, the predicted image of the other channel from the decoded images of the one or more already decoded channels in accordance with the derived correlation.
  • a channel is a generalized component that constitutes an image.
  • the luminance component and the color difference component correspond to channels. That is, in this example, the channel includes a luminance channel and a color difference channel.
  • the color difference channel includes a U channel indicating the U component of the color difference and a V channel indicating the V component of the color difference.
  • the channel may relate to the RGB color space.
  • the image decoding process is performed for each channel.
  • the local decoded image is a decoded image that has been decoded for the plurality of channels and is located around the block to be processed.
  • Processing target block refers to various processing units in the decoding process.
  • Examples include a coding unit, a transform unit, and a prediction unit. The processing unit also includes units obtained by further subdividing the coding unit, the transform unit, or the prediction unit.
  • the periphery of the processing target block includes, for example, a pixel adjacent to the target block, a block adjacent to the left side of the target block, a block adjacent to the upper side of the target block, and the like.
  • the non-linear correlation can be derived, for example, by examining the correspondence of each point composed of the luminance value and the color difference value.
  • For example, in the case of the YUV color space, the non-linear correlation can be derived from the correspondence between the luminance value and the color difference value of each pixel included in the local decoded image.
  • the correlation may be realized as an LUT in which a decoded channel and a decoding target channel are associated with each other.
  • the correlation may be expressed by a function including a relational expression established between the decoded channel and the decoding target channel.
  • the channel to be decoded in the processing target block is predicted from the channel that has been decoded in the processing target block according to the nonlinear correlation derived in this way.
  • such prediction is also referred to as inter-channel prediction.
  • the pixel value of the decoded image of the decoded channel is converted according to the nonlinear correlation, and the pixel value of the predicted image of the channel to be decoded is obtained.
  • the pixel value is a generalized value of a component constituting the image.
  • In order to solve the above problem, an image encoding device according to the present invention encodes a prediction residual obtained by subtracting, from an original image, a predicted image generated for each of a plurality of channels indicating the components constituting the image, and comprises: channel decoding means for decoding one or more of the plurality of channels for a processing target block; correlation derivation means for deriving a non-linear correlation between the one or more channels already decoded by the channel decoding means and another channel to be decoded, with reference to a local decoded image that has been decoded for each of the plurality of channels and is located around the processing target block; and predicted image generation means for generating, for the processing target block, the predicted image of the other channel from the decoded images of the one or more already decoded channels in accordance with the derived correlation.
  • The data structure of the encoded data according to the present invention is a data structure of encoded data containing a prediction residual obtained by subtracting, from an original image, a predicted image generated for each of a plurality of channels indicating the components constituting the image, and includes: channel decoding processing order information indicating the order in which the plurality of channels are to be decoded for the processing target block; and prediction source channel designation information designating from which of the decoded images of the one or more channels already decoded for the processing target block the predicted image of the other channel to be decoded is to be generated, in accordance with the non-linear correlation between the one or more decoded channels and the other channel.
  • the same effects as those of the image decoding device according to the present invention can be obtained.
  • In other words, the image encoding device may include, in the data structure of the encoded data, channel decoding processing order information indicating the order in which the plurality of channels are to be decoded, and prediction source channel designation information designating from which of the decoded channels the channel to be decoded is to be predicted.
  • the image encoding device may encode the information in, for example, side information.
  • As described above, the image decoding device according to the present invention comprises: channel decoding means for decoding one or more of a plurality of channels for a processing target block; correlation derivation means for deriving a non-linear correlation between the one or more channels already decoded by the channel decoding means and another channel to be decoded, with reference to a local decoded image that has been decoded for each of the plurality of channels and is located around the processing target block; and predicted image generation means for generating, for the processing target block, the predicted image of the other channel from the decoded images of the one or more already decoded channels in accordance with the derived correlation.
  • Likewise, the image encoding device according to the present invention comprises: channel decoding means for decoding one or more of a plurality of channels for a processing target block; correlation derivation means for deriving a non-linear correlation between the one or more channels already decoded by the channel decoding means and another channel to be decoded, with reference to a local decoded image that has been decoded for each of the plurality of channels and is located around the processing target block; and predicted image generation means for generating, for the processing target block, the predicted image of the other channel from the decoded images of the one or more already decoded channels in accordance with the derived correlation.
  • The data structure of the encoded data according to the present invention is a data structure referred to by an image decoding device that decodes encoded image data by generating a predicted image for each of the plurality of channels and adding a prediction residual to the generated predicted image, and contains: channel decoding processing order information indicating the order in which the plurality of channels are decoded for the processing target block; and prediction source channel designation information designating whether to generate the predicted image of the other channel from the decoded images of the one or more channels already decoded for the processing target block, in accordance with the non-linear correlation.
  • FIG. 5 is a diagram illustrating the image formats of the YUV format; (a) to (d) show the 4:2:0 format, the 4:4:4 format, the 4:2:2 format, and the 4:1:1 format, respectively.
  • FIG. 7 is a diagram illustrating sample positions of color difference pixels; (a) to (c) show three sample positions. FIG. 8 is a diagram showing the patterns of the processing order of the color difference channels. FIG. 9 is a flowchart illustrating the schematic flow of the color difference predicted image generation processing in the predicted image generation unit.
  • FIG. 10 is a diagram illustrating an example of an image in which the luminance-color difference distributions do not overlap and extend over two regions.
  • FIG. 11 is a graph plotting the luminance (Y)-color difference (U) of the pixels included in the image shown in FIG. 10.
  • A flowchart shows a modification of the flow of the LUT derivation processing by the LUT derivation unit, and a further figure shows an example of the derived LUT.
  • A further figure is a graph plotting the luminance (Y)-color difference (U) of an image; two types of images are shown in (a) and (b).
  • A further figure shows the configuration of a transmitting device equipped with the moving image encoding device and of a receiving device equipped with the moving image decoding device; (a) shows the transmitting device and (b) shows the receiving device. A further figure shows the configuration of a recording device equipped with the moving image encoding device and of a reproduction device equipped with the moving image decoding device; (a) shows the recording device and (b) shows the reproduction device.
  • FIG. 2 is a functional block diagram showing a schematic configuration of the moving picture decoding apparatus 1.
  • the moving image decoding apparatus 1 receives encoded data (data structure of encoded data) # 1 obtained by encoding a moving image by the moving image encoding apparatus 2.
  • the video decoding device 1 decodes the input encoded data # 1 and outputs the video # 2 to the outside.
  • the configuration of the encoded data # 1 will be described below.
  • the configuration of encoded data # 1 that is generated by the video encoding device 2 and decoded by the video decoding device 1 will be described with reference to FIG.
  • the encoded data # 1 has a hierarchical structure including a sequence layer, a GOP (Group Of Pictures) layer, a picture layer, a slice layer, and a maximum coding unit (LCU: Large Coding Unit) layer.
  • FIG. 3 shows the hierarchical structure below the picture layer in the encoded data # 1.
  • FIGS. 3(a) to 3(f) show the picture layer PICT, the slice layer S, the LCU layer LCU, a leaf CU included in the LCU (denoted CUL in FIG. 3(d)), the prediction information PI_Inter for an inter prediction (inter-picture prediction) partition, and the prediction information PI_Intra for an intra prediction (intra-picture prediction) partition, respectively.
  • the picture layer PICT is a set of data referred to by the video decoding device 1 in order to decode a target picture that is a processing target picture.
  • The picture layer PICT includes a picture header PH and slice layers S1 to SNS (NS is the total number of slice layers included in the picture layer PICT). When it is not necessary to distinguish the individual slice layers, the subscripts of the reference numerals may be omitted below; the same applies to other data included in the encoded data #1.
  • the picture header PH includes a coding parameter group referred to by the video decoding device 1 in order to determine a decoding method of the target picture.
  • Encoding mode information (entropy_coding_mode_flag) indicating the variable-length encoding mode used in encoding by the moving image encoding device 2 is an example of an encoding parameter included in the picture header PH. When entropy_coding_mode_flag is 0, the picture is encoded by CAVLC (Context-based Adaptive Variable Length Coding).
  • Each slice layer S included in the picture layer PICT is a set of data referred to by the video decoding device 1 in order to decode a target slice that is a processing target slice.
  • the slice layer S includes a slice header SH and LCU layers LCU 1 to LCU NC (NC is the total number of LCUs included in the slice S).
  • the slice header SH includes a coding parameter group that the moving image decoding apparatus 1 refers to in order to determine a decoding method of the target slice.
  • Slice type designation information (slice_type) for designating a slice type is an example of an encoding parameter included in the slice header SH.
  • the slice header SH includes a filter parameter FP that is referred to by a loop filter included in the video decoding device 1.
  • Slice types that can be specified by the slice type designation information include: (1) an I slice that uses only intra prediction at the time of encoding; (2) a P slice that uses unidirectional prediction or intra prediction at the time of encoding; and (3) a B slice that uses unidirectional prediction, bidirectional prediction, or intra prediction at the time of encoding.
  • Each LCU layer LCU included in the slice layer S is a set of data that the video decoding device 1 refers to in order to decode the target LCU that is the processing target LCU.
  • As shown in FIG. 3(c), the LCU layer LCU contains an LCU header LCUH and a plurality of coding units (CU: Coding Unit) CU1 to CUNL obtained by quadtree division of the LCU.
  • The sizes that each CU can take depend on the LCU size and the maximum hierarchical depth included in the sequence parameter set SPS of the encoded data #1. For example, when the LCU size is 128 × 128 pixels and the maximum hierarchical depth is 5, a CU included in the LCU can take any of five sizes: 128 × 128, 64 × 64, 32 × 32, 16 × 16, and 8 × 8 pixels. A CU that is not further divided is called a leaf CU.
  • The LCU header LCUH includes encoding parameters referred to by the video decoding device 1 in order to determine the decoding method of the target LCU. Specifically, as shown in FIG. 3(c), it includes CU partition information SP_CU that specifies a partition pattern into leaf CUs for the target LCU, and a quantization parameter difference Δqp (mb_qp_delta) that specifies the size of the quantization step.
  • CU division information SP_CU is information that specifies the shape and size of each CU (and leaf CU) included in the target LCU, and the position in the target LCU.
  • the CU partition information SP_CU does not necessarily need to explicitly include the shape and size of the leaf CU.
  • the CU partition information SP_CU may be a set of flags (split_coding_unit_flag) indicating whether or not the entire LCU or a partial region of the LCU is divided into four. In that case, the shape and size of each leaf CU can be specified by using the shape and size of the LCU together.
  • The quantization parameter difference Δqp is the difference qp − qp′ between the quantization parameter qp of the target LCU and the quantization parameter qp′ of the LCU encoded immediately before it.
  • A CU that cannot be further divided (leaf CU) is treated as a prediction unit (PU: Prediction Unit) and a transform unit (TU: Transform Unit).
  • As shown in FIG. 3(d), the leaf CU (denoted CUL) contains (1) PU information PUI that is referred to when the moving image decoding device 1 generates a predicted image, and (2) TU information TUI that is referred to when the moving image decoding device 1 decodes residual data.
  • the PU information PUI may include a skip flag SKIP. When the value of the skip flag SKIP is 1, the TU information is omitted.
  • As shown in FIG. 3(d), the PU information PUI includes prediction type information PT and prediction information PI.
  • the prediction type information PT is information that specifies whether intra prediction or inter prediction is used as a predicted image generation method for the target leaf CU (target PU).
  • the prediction information PI includes intra prediction information PI_Intra or inter prediction information PI_Inter depending on which prediction method is specified by the prediction type information PT.
  • a PU to which intra prediction is applied is also referred to as an intra PU
  • a PU to which inter prediction is applied is also referred to as an inter PU.
  • the PU information PUI includes information specifying the shape and size of each partition included in the target PU and the position in the target PU.
  • the partition is one or a plurality of non-overlapping areas constituting the target leaf CU, and the generation of the predicted image is performed in units of partitions.
  • The TU information TUI includes TU partition information SP_TU that specifies a partition pattern into blocks for the target leaf CU (target TU), and quantized prediction residuals QD1 to QDNT (NT is the total number of blocks included in the target TU).
  • TU partition information SP_TU is information that specifies the shape and size of each block included in the target TU and the position in the target TU.
  • Each TU can be, for example, a size from 64 ⁇ 64 pixels to 2 ⁇ 2 pixels.
  • the block is one or a plurality of non-overlapping areas constituting the target leaf CU, and prediction residual encoding / decoding is performed in units of TUs or blocks obtained by dividing TUs.
  • Each quantized prediction residual QD is encoded data generated by the moving image encoding device 2 applying the following Processes 1 to 3 to the target block, that is, the block being processed (a sketch of these steps follows the list):
  • Process 1: DCT (Discrete Cosine Transform) is applied to the prediction residual obtained by subtracting the predicted image from the encoding target image.
  • Process 2: The DCT coefficients obtained in Process 1 are quantized.
  • Process 3: The DCT coefficients quantized in Process 2 are variable-length encoded.
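  • The following is a minimal sketch of Processes 1 to 3 for a single block, written in Python for illustration. The uniform quantization step and the placeholder entropy stage are assumptions; a real codec defines the transform sizes, quantization, and variable-length code (for example, CAVLC) precisely.

```python
import numpy as np
from scipy.fft import dctn

def encode_block(residual, qstep):
    """Processes 1-3 for one target block (simplified illustration).

    residual: 2-D NumPy array, prediction residual (target image minus
              predicted image).
    qstep:    scalar quantization step (stands in for the real quantizer).
    """
    # Process 1: 2-D DCT of the prediction residual.
    coeffs = dctn(residual.astype(np.float64), norm="ortho")
    # Process 2: quantization of the DCT coefficients.
    qd = np.rint(coeffs / qstep).astype(np.int32)
    # Process 3: variable-length coding (placeholder; a real codec would
    # apply CAVLC or a similar entropy code here).
    bitstream = qd.flatten().tolist()
    return bitstream
```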
  • The inter prediction information PI_Inter includes encoding parameters that are referred to when the video decoding device 1 generates an inter predicted image by inter prediction. As shown in FIG. 3(e), the inter prediction information PI_Inter includes inter PU partition information SP_Inter that specifies a partition pattern into partitions for the target PU, and inter prediction parameters PP_Inter1 to PP_InterNe (Ne is the total number of inter prediction partitions included in the target PU) for each partition.
  • the inter-PU partition information SP_Inter is information for designating the shape and size of each inter prediction partition included in the target PU (inter PU) and the position in the target PU.
  • An inter PU can be divided into a total of eight types of partitions by four symmetric splittings (2N × 2N, 2N × N, N × 2N, and N × N pixels) and four asymmetric splittings (2N × nU, 2N × nD, nL × 2N, and nR × 2N pixels). The specific value of N is defined by the size of the CU to which the PU belongs, and the specific values of nU, nD, nL, and nR are determined according to the value of N. For example, a 128 × 128-pixel inter PU can be divided into inter prediction partitions of 128 × 128, 128 × 64, 64 × 128, 64 × 64, 128 × 32, 128 × 96, 32 × 128, and 96 × 128 pixels.
  • the inter prediction parameter PP_Inter includes a reference image index RI, an estimated motion vector index PMVI, and a motion vector residual MVD.
  • the intra prediction information PI_Intra includes an encoding parameter that is referred to when the video decoding device 1 generates an intra predicted image by intra prediction.
  • As shown in FIG. 3(f), the intra prediction information PI_Intra includes intra PU partition information SP_Intra that specifies a partition pattern into partitions for the target PU (intra PU), and intra prediction parameters PP_Intra1 to PP_IntraNA (NA is the total number of intra prediction partitions included in the target PU) for each partition.
  • the intra-PU partition information SP_Intra is information that specifies the shape and size of each intra-predicted partition included in the target PU, and the position in the target PU.
  • the intra PU split information SP_Intra includes an intra split flag (intra_split_flag) that specifies whether or not the target PU is split into partitions. If the intra partition flag is 1, the target PU is divided symmetrically into four partitions. If the intra partition flag is 0, the target PU is not divided and the target PU itself is one partition.
  • Here, N = 2^n (n is an arbitrary integer greater than or equal to 1).
  • a 128 ⁇ 128 pixel intra PU can be divided into 128 ⁇ 128 pixel and 64 ⁇ 64 pixel intra prediction partitions.
  • the video decoding device 1 generates a predicted image for each partition, generates a decoded image # 2 by adding the generated predicted image and the prediction residual decoded from the encoded data # 1, and generates The decoded image # 2 is output to the outside.
  • the generation of the predicted image is performed with reference to the encoding parameter obtained by decoding the encoded data # 1.
  • The encoding parameters are parameters referred to in order to generate a predicted image. In addition to prediction parameters such as the motion vector referred to in inter-picture prediction and the prediction mode referred to in intra-picture prediction, they include the size and shape of partitions, the size and shape of blocks, and the residual data between the original image and the predicted image. The set of all the information included in the encoding parameters except the residual data is referred to as side information.
  • In the following, a case where the prediction unit is a partition constituting the LCU is described. However, the present embodiment is not limited to this, and the present invention can also be applied to cases where the prediction unit is larger or smaller than the partition.
  • a frame (picture), a slice, an LCU, a block, and a partition to be decoded are referred to as a target frame, a target slice, a target LCU, a target block, and a target partition, respectively.
  • The LCU size is, for example, 64 × 64 pixels, and the partition size is, for example, 64 × 64, 32 × 32, 16 × 16, 8 × 8, or 4 × 4 pixels. These sizes do not limit the present embodiment; the LCU and partitions may have other sizes.
  • FIG. 2 is a functional block diagram showing a schematic configuration of the moving picture decoding apparatus 1.
  • the moving picture decoding apparatus 1 includes a variable length code demultiplexing unit 11, an inverse quantization / inverse conversion unit 12, a predicted image generation unit (channel decoding unit) 13, and an adder (channel decoding unit) 14. And a frame memory 15.
  • The variable-length code demultiplexing unit 11 demultiplexes one frame of encoded data #1 input to the video decoding device 1 and separates it into the various kinds of information included in the hierarchical structure shown in FIG. 3.
  • the variable length code demultiplexing unit 11 refers to information included in various headers, and sequentially separates the encoded data # 1 into slices and LCUs.
  • the various headers include (1) information on the method of dividing the target frame into slices, and (2) information on the size, shape, and position of the LCU belonging to the target slice.
  • The variable-length code demultiplexing unit 11 refers to the CU partition information SP_CU included in the LCU header LCUH and divides the target LCU into leaf CUs. In addition, the variable-length code demultiplexing unit 11 acquires the TU information TUI and the PU information PUI for the target leaf CU CUL.
  • variable length code demultiplexing unit 11 supplies the TU information TUI obtained for the target leaf CU to the dequantization / inverse transform unit 12. Further, the variable length code demultiplexing unit 11 supplies the PU information PUI obtained for the target leaf CU to the predicted image generation unit 13.
  • the inverse quantization / inverse transform unit 12 performs inverse quantization / inverse transform of the quantization prediction residual for each block for the target leaf CU.
  • the inverse quantization / inverse transform unit 12 first decodes the TU partition information SP_TU from the TU information TUI about the target leaf CU supplied from the variable length code demultiplexer 11.
  • the inverse quantization / inverse transform unit 12 divides the target leaf CU into one or a plurality of blocks according to the decoded TU partition information SP_TU.
  • the inverse quantization / inverse transform unit 12 decodes the TU partition information SP_TU and the quantized prediction residual QD from the TU information TUI for each block.
  • The inverse quantization / inverse transform unit 12 restores the prediction residual D for each pixel of each target partition by performing inverse quantization and inverse DCT (Inverse Discrete Cosine Transform).
  • the inverse quantization / inverse transform unit 12 supplies the restored prediction residual D to the adder 14.
  • For each partition included in the target leaf CU, the predicted image generation unit 13 refers to a local decoded image P′, which is a decoded image around the partition, and generates a predicted image Pred by intra prediction or inter prediction.
  • intra prediction and inter prediction include luminance prediction and color difference prediction, respectively.
  • In the color difference prediction by inter-channel prediction described later, the predicted image generation unit 13 also refers to a luminance decoded image. In addition, the video decoding device 1 may generate the predicted image Pred by inter prediction.
  • Intra prediction is sometimes referred to as intra-picture prediction or spatial prediction; in the following, the term intra prediction is used.
  • Here, the local decoded image P′ includes a luminance local decoded image P′Y for the luminance and a color difference local decoded image P′C for the color difference.
  • the predicted image generation unit 13 operates as follows. First, the predicted image generation unit 13 decodes the PU information PUI for the target leaf CU supplied from the variable length code demultiplexing unit 11. Subsequently, the predicted image generation unit 13 determines a division pattern for each partition of the target leaf CU according to the PU information PUI. Further, the predicted image generation unit 13 selects a prediction mode of each partition according to the PU information PUI, and assigns each selected prediction mode to each partition.
  • the predicted image generation unit 13 generates a predicted image Pred for each partition included in the target leaf CU with reference to the selected prediction mode and the pixel values of the local decoded image P ′ around the partition.
  • the predicted image generation unit 13 supplies the predicted image Pred generated for the target leaf CU to the adder 14.
  • the predicted image Pred specifically includes a luminance predicted image PredY related to luminance and a color difference predicted image PredC related to color difference.
  • the color difference prediction image PredC includes a color difference prediction image PredU for the U channel and a color difference prediction image PredV for the V channel. Further, a more specific configuration of the predicted image generation unit 13 will be described later.
  • the adder 14 adds the predicted image Pred supplied from the predicted image generation unit 13 and the prediction residual D supplied from the inverse quantization / inverse transform unit 12, thereby decoding the decoded image P for the target leaf CU. Is generated.
  • The decoded image P includes a luminance decoded image (hereinafter referred to as the luminance decoded image PY) and a color difference decoded image.
  • the decoded image P that has been decoded is sequentially recorded in the frame memory 15.
  • At the time of decoding the target LCU, decoded images corresponding to all the LCUs decoded before the target LCU are recorded in the frame memory 15. When the decoded image generation processing for each LCU has been completed for all the LCUs in the image, the moving image decoding device 1 outputs to the outside the decoded image #2 corresponding to the one frame of encoded data #1 input to the moving image decoding device 1.
  • (Intra prediction parameter PP: data structure of encoded data)
  • the intra prediction parameter PP (data structure of encoded data) illustratively includes an inter-channel prediction flag, a color difference channel processing order flag, and a second channel prediction source channel specifier.
  • The inter-channel prediction flag is a flag indicating whether or not the color difference is predicted by inter-channel prediction. It is 1-bit information: for example, if the inter-channel prediction flag is "1", the color difference is predicted by inter-channel prediction, and if it is "0", the color difference is predicted without using inter-channel prediction.
  • the color difference channel processing order flag is a flag for designating whether the prediction process is performed from the U channel or the V channel.
  • the color difference channel processing order flag is, for example, 1-bit information indicating that processing is performed in the order of U and V if “0” and in the order of V and U if “1”.
  • The second channel prediction source channel specifier is information designating from which channel the second predicted channel is predicted. That is, the second predicted channel can be predicted from the Y channel, from the first predicted channel, or from both. It is 1- or 2-bit information: for example, if the second channel prediction source channel specifier is "0", prediction is performed from the Y channel; if it is "10", from the first channel; and if it is "11", from both the Y channel and the first channel.
  • Here, the information indicating the combination of the processing order and the prediction source channel is encoded as two separate pieces of information, the color difference channel processing order flag and the second channel prediction source channel specifier, as sketched below.
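  • The following is a minimal sketch of how a decoder might read these three syntax elements. The bit-reader interface, the element order, the presence conditions, and the function name are assumptions made for illustration; the actual syntax is defined by the encoded data.

```python
def parse_chroma_intra_params(read_bit):
    """Parse the inter-channel prediction flag, the color difference channel
    processing order flag, and the second channel prediction source channel
    specifier. `read_bit` is assumed to return one bit from the bitstream."""
    params = {"inter_channel_prediction": read_bit() == 1}
    if params["inter_channel_prediction"]:
        # "0": process U then V; "1": process V then U.
        params["order"] = ("U", "V") if read_bit() == 0 else ("V", "U")
        # "0": predict the second channel from Y; "10": from the first
        # predicted channel; "11": from both Y and the first channel.
        if read_bit() == 0:
            params["second_pred_source"] = "Y"
        else:
            params["second_pred_source"] = "first" if read_bit() == 0 else "Y+first"
    return params
```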
  • FIG. 4 is a diagram for explaining the YUV color space, and shows the relationship between the luminance Y and the U component and V component, which are the components of the color difference.
  • FIG. 5 is a diagram showing the image formats of the YUV format; (a) to (d) show the 4:2:0 format, the 4:4:4 format, the 4:2:2 format, and the 4:1:1 format, respectively.
  • the YUV color space will be described with reference to FIG.
  • an image is expressed by luminance Y and U and V components that are color differences.
  • The luminance Y is defined on an independent coordinate axis orthogonal to the U-V plane formed by the U component and the V component.
  • the luminance Y, U component, and V component each take a value from 0 to 255.
  • When the color difference value of the U component is close to 0, the image is generally green, and when it is close to 255, generally red. When the color difference value of the V component is close to 0, the image is generally yellow, and when it is close to 255, generally blue. Within a local region of an image, the variety of pixel values used is limited.
  • the luminance Y has a correlation with each of the U component and the V component. Therefore, locally, it is possible to derive the U component and the V component from the luminance Y using this correlation.
  • the luminance Y is called the Y channel
  • the color difference consisting of the U component and the V component is called the color difference channel.
  • a channel is a generalized concept of luminance Y, U component, and V component.
  • When it is necessary to distinguish the U component and the V component of the color difference channel, they are referred to as the U channel and the V channel, respectively.
  • Prediction of the U channel and the V channel from the Y channel using the correlation between the luminance Y and the U component and the V component is referred to as inter-channel prediction.
  • Next, the image formats of luminance and color difference will be described. Even if the resolution of the color difference is lowered, the visual impact is smaller than when the luminance resolution is lowered, so the data amount can be reduced by reducing the resolution of the color difference.
  • Specifically, the data amount is reduced by the following data structures. Note that the left block in FIG. 5(a) shows the resolution of the luminance Y, and the right block shows the resolution of the U component and the V component; the same applies to (b) to (d) below.
  • In the 4:2:0 format shown in FIG. 5(a), the resolution of the color difference is 1/2 of the luminance resolution in both the horizontal and vertical directions. That is, as a whole, the resolution of the color difference is 1/4 of the resolution of the luminance. The 4:2:0 format is used in television broadcasting and consumer video equipment.
  • In the 4:4:4 format shown in FIG. 5(b), the luminance resolution and the color difference resolution are the same. The 4:4:4 format is used, for example, in specialized equipment for image processing, when high image quality is required rather than a reduction of the data amount.
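  • The reduction in data amount can be checked with simple arithmetic. The sketch below counts the samples per frame for each YUV format; for a 1920 × 1080 frame, 4:2:0 holds 1.5 × W × H samples versus 3 × W × H for 4:4:4, a 50% reduction. The function is illustrative only.

```python
def samples_per_frame(width, height, fmt):
    """Total number of samples (Y + U + V) per frame for each YUV format."""
    subsampling = {
        "4:4:4": (1, 1),   # chroma at full horizontal and vertical resolution
        "4:2:2": (2, 1),   # chroma halved horizontally
        "4:2:0": (2, 2),   # chroma halved horizontally and vertically
        "4:1:1": (4, 1),   # chroma quartered horizontally
    }
    sx, sy = subsampling[fmt]
    luma = width * height
    chroma = (width // sx) * (height // sy)
    return luma + 2 * chroma

# For example, a 1920 x 1080 frame:
# samples_per_frame(1920, 1080, "4:2:0") == 3_110_400  (1.5 x W x H)
# samples_per_frame(1920, 1080, "4:4:4") == 6_220_800  (3.0 x W x H)
```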
  • FIG. 1 is a functional block diagram illustrating an example of the configuration of the predicted image generation unit 13.
  • The predicted image generation unit 13 includes a local image input unit 131, a luminance predicted image generation unit 132, an inter-channel prediction determination unit 133, an LUT derivation unit (correlation derivation means) 134, a color difference predicted image generation unit (predicted image generation means, processing information acquisition means, prediction control means) 135, and a predicted image output unit 136.
  • The local image input unit 131 acquires the luminance local decoded image P′Y and the color difference local decoded image P′C from the local decoded image P′. The local image input unit 131 transfers the luminance local decoded image P′Y to the luminance predicted image generation unit 132 and the color difference local decoded image P′C to the inter-channel prediction determination unit 133.
  • The luminance predicted image generation unit 132 generates a luminance predicted image PredY by prediction based on the PU information PUI, with reference to the luminance local decoded image P′Y, and transmits the generated luminance predicted image PredY to the predicted image output unit 136.
  • The inter-channel prediction determination unit 133 refers to the inter-channel prediction flag included in the intra prediction parameter PP and determines whether the intra prediction of the color difference (hereinafter simply referred to as color difference prediction) is in the inter-channel prediction mode, in which the color difference predicted image is generated by inter-channel prediction.
  • If it is the inter-channel prediction mode, the inter-channel prediction determination unit 133 notifies the LUT derivation unit 134 and the inter-channel prediction unit 351 (described later) of the color difference predicted image generation unit 135 of that fact. If, as a result of the determination, it is not the inter-channel prediction mode, the inter-channel prediction determination unit 133 transfers the color difference local decoded image P′C to the intra-channel prediction unit 352 (described later) of the color difference predicted image generation unit 135.
  • For each target partition, the LUT derivation unit 134 derives an LUT (Look Up Table) for performing inter-channel prediction, based on the local decoded image P′. Illustratively, the LUT derived by the LUT derivation unit is structured as follows: in association with the luminance value at a pixel position [xY, yY] of the luminance local decoded image P′Y, the LUT stores the color difference value at the pixel position [xC, yC] of the color difference local decoded image P′C corresponding to the pixel position [xY, yY].
  • The LUT derivation unit 134 transmits the derived LUT to the inter-channel prediction unit 351 (predicted image generation means; described later) of the color difference predicted image generation unit 135. Details of the operation of the LUT derivation unit 134 will be described later.
  • the color difference predicted image generation unit 135 predicts a color difference image and generates a color difference predicted image PredC. More specifically, the color difference predicted image generation unit 135 includes an inter-channel prediction unit 351 and an intra-channel prediction unit 352.
  • When the color difference prediction is in the inter-channel prediction mode, the inter-channel prediction unit 351 generates the color difference predicted image PredC by performing prediction of the color difference image by inter-channel prediction, with reference to the luminance decoded image PY.
  • When the color difference prediction is not in the inter-channel prediction mode, the intra-channel prediction unit 352 generates the color difference predicted image PredC by performing prediction of the color difference image with reference to the color difference local decoded image P′C. The prediction of the color difference image by the intra-channel prediction unit 352 is performed by, for example, directional prediction or DC prediction. Details of the operation of the inter-channel prediction unit 351 will be described later.
  • the predicted image output unit 136 outputs the luminance predicted image PredY generated by the luminance predicted image generating unit 132 and the color difference predicted image PredC generated by the color difference predicted image generating unit 135 as the predicted image Pred.
  • FIG. 6 is a diagram illustrating a correspondence relationship between luminance and color difference pixel positions in the 4: 2: 0 format.
  • FIG. 6(a) shows the pixel positions of the luminance decoded image PY, and FIG. 6(b) shows the pixel positions of the color difference predicted image PredU to be predicted. The pixel positions of the luminance decoded image PY and the color difference predicted image PredU are both expressed in relative coordinates with the origin at the upper left of the block.
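  • Under this correspondence, the color difference pixel [xC, yC] maps to the luminance pixel [xY, yY] = [2 · xC, 2 · yC]. A minimal sketch, assuming the upper-left sample convention of FIG. 6 (the function name is illustrative):

```python
def chroma_to_luma_pos(x_c, y_c):
    """Map a color difference pixel position to the corresponding luminance
    pixel position in the 4:2:0 format (upper-left sample convention)."""
    return 2 * x_c, 2 * y_c
```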
  • FIGS. 7(a) to 7(c) are diagrams illustrating three sample positions of color difference pixels. FIG. 7(a) shows the case already described with reference to FIG. 6; that is, the sample position of the color difference pixel may be set at the upper left of the block. Alternatively, as shown in FIG. 7(b), the sample position of the color difference pixel may be set at the left of the center of the block. Further, as shown in FIG. 7(c), the sample position of the color difference pixel may be set at the center of the block.
  • The luminance value corresponding to the value of a given color difference pixel is derived by filtering the luminance values in the vicinity of the pixel position obtained according to the correspondence relationships shown in FIGS. 7(a) to 7(c). Here, the vicinity of the pixel position consists of the coordinates obtained by rounding each coordinate of the derived pixel position up and down: in the example shown in FIG. 7(b), [xY, yY] and [xY, yY+1]; in the example shown in FIG. 7(c), [xY, yY], [xY+1, yY], [xY, yY+1], and [xY+1, yY+1]. In the example shown in FIG. 7(a), the luminance value at the pixel position [xY, yY] may be used as the sample luminance value as it is. A smoothing filter is an example of this filtering; a sketch of this derivation follows.
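  • The sketch below derives the sample luminance value for each of the three sample-position conventions of FIG. 7. Treating the smoothing filter as a simple two- or four-pixel rounded average over the neighborhoods listed above is an assumption for illustration.

```python
def sample_luma(rec_y, x_y, y_y, position):
    """Derive the luminance value used as a sample for a color difference pixel.

    rec_y:    luminance decoded image, indexed as rec_y[row][column].
    position: "upper_left" (FIG. 7(a)), "left_center" (FIG. 7(b)),
              or "center" (FIG. 7(c)).
    """
    if position == "upper_left":
        # Use the luminance value at [xY, yY] as it is.
        return rec_y[y_y][x_y]
    if position == "left_center":
        # Rounded average of the two vertically neighboring pixels.
        return (rec_y[y_y][x_y] + rec_y[y_y + 1][x_y] + 1) // 2
    # "center": rounded average of the four surrounding pixels.
    return (rec_y[y_y][x_y] + rec_y[y_y][x_y + 1]
            + rec_y[y_y + 1][x_y] + rec_y[y_y + 1][x_y + 1] + 2) // 4
```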
  • FIG. 8 is a diagram showing a pattern of the color difference channel processing order when there is one prediction source channel.
  • As shown in FIG. 8, the U channel and the V channel can be predicted using the luminance (Y) channel as a base point. Specifically, there are the following three prediction patterns.
  • The first is the pattern indicated by the solid lines in FIG. 8: the U channel is predicted from the Y channel, and the V channel is also predicted from the Y channel. Which of the U channel and the V channel is predicted first can be chosen arbitrarily, and the U channel and V channel prediction processing may be performed in parallel.
  • The second is the pattern indicated by the dotted line in FIG. 8: the U channel is first predicted from the Y channel, and the V channel is then predicted from the predicted U channel.
  • The third is the reverse of the second, the pattern indicated by the broken line in FIG. 8: the V channel is first predicted from the Y channel, and the U channel is then predicted from the predicted V channel.
  • The color difference channel processing order flag indicating which processing order is used, and the second channel prediction source channel specifier specifying the prediction source channel of the second predicted channel, are encoded in the encoding processing in the moving image encoding device 2 and transmitted to the video decoding device 1. The inter-channel prediction unit 351 of the video decoding device 1 then performs inter-channel prediction according to the color difference channel processing order flag and the second channel prediction source channel specifier.
  • The inter-channel prediction unit 351 performs inter-channel prediction of the U channel according to the following equation (1) and generates the color difference predicted image PredU from the luminance decoded image PY.
  • PredU[xU, yU] = LUTU[RecY[xY, yY]] … (1)
  • Likewise, the inter-channel prediction unit 351 performs inter-channel prediction of the V channel according to the following equation (2).
  • PredV[xV, yV] = LUTV[RecY[xY, yY]] … (2)
  • The meaning of each symbol in equation (2) is the same as in equation (1), and its description is omitted.
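  • Equations (1) and (2) amount to a per-pixel table lookup. A minimal sketch, assuming the 4:2:0 upper-left coordinate correspondence described earlier and an already-derived LUT (the function name is illustrative):

```python
def predict_chroma_by_lut(rec_y, lut, width_c, height_c):
    """Generate a color difference predicted image from the luminance decoded
    image by table lookup, per equations (1) and (2).

    rec_y: luminance decoded image, indexed as rec_y[row][column].
    lut:   list of 256 entries mapping a luminance value to a chroma value.
    """
    pred_c = [[0] * width_c for _ in range(height_c)]
    for y_c in range(height_c):
        for x_c in range(width_c):
            # 4:2:0, upper-left sample convention: [xY, yY] = [2*xC, 2*yC].
            pred_c[y_c][x_c] = lut[rec_y[2 * y_c][2 * x_c]]
    return pred_c
```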
  • FIG. 9 is a flowchart illustrating a schematic flow of color difference predicted image generation processing in the predicted image generation unit 13.
  • First, the inter-channel prediction determination unit 133 refers to the inter-channel prediction flag and determines whether or not the inter-channel prediction mode is set (S10).
  • If it is not the inter-channel prediction mode (NO in S10), the intra-channel prediction unit 352 generates the color difference predicted image PredC without using inter-channel prediction (S11), and the processing ends.
  • If it is the inter-channel prediction mode (YES in S10), the LUT derivation unit 134 derives the LUT with reference to the luminance decoded image PY (S12). Then, the inter-channel prediction unit 351 generates the color difference predicted image PredC by inter-channel prediction with reference to the LUT derived by the LUT derivation unit 134 (S13), and the processing ends.
  • FIG. 10 is a diagram showing an example of an image B in which the luminance-color difference distributions do not overlap and extend over two regions.
  • the image B shown in FIG. 10 is composed of six pixel areas.
  • the pixel regions R1 to R3 are regions where the luminance value (Y) is low and the color difference value (U) is high.
  • the pixel regions R4 to R6 are regions having a high luminance value (Y) and a low color difference value (U).
  • the color difference value (U) gradually increases from the pixel region R1 to R3.
  • the color difference value (U) decreases from the pixel region R4 to R6.
  • The luminance (Y)-color difference (U) plot of the pixels included in such an image B is shown in the graph of FIG. 11. The pixels included in the pixel regions R1 to R3 are plotted in the group Gr1 in the graph shown in FIG. 11; that is, they are samples with low luminance (Y) and high color difference values (U). The pixels included in the pixel regions R4 to R6 are plotted in the group Gr2 in the graph shown in FIG. 11; that is, they are samples with high luminance (Y) and low color difference values (U).
  • Image B is not suitable for linear approximation because there are variations in brightness and color difference in the image. Such variations tend to be often seen in images that include boundaries between multiple objects or textures of multiple colors.
  • FIG. 12 is a flowchart showing an example of the flow of LUT derivation processing by the LUT derivation unit 134.
  • First, the LUT derivation unit 134 initializes the LUT (S100); initializing the LUT means setting all of its entries to the unregistered state. Next, the LUT derivation unit 134 enters a loop LP11 of registration processing for each luminance pixel adjacent to the target partition (S101). In the loop, the LUT derivation unit 134 acquires the luminance value n of the pixel and the color difference value m at the corresponding color difference pixel position (S102), and determines whether or not a color difference value is already registered in LUT[n] (S103).
  • If no color difference value is registered (NO in S103), the LUT derivation unit 134 registers the color difference value m acquired in step S102 in LUT[n] as it is (S105) and returns to the beginning of the loop LP11 (S106).
  • If a color difference value is already registered (YES in S103), the LUT derivation unit 134 calculates (m + LUT[n] + 1) / 2, that is, the average of the acquired color difference value m and the registered color difference value, and substitutes this average for the color difference value m (S104). Subsequently, the LUT derivation unit 134 registers the color difference value m obtained in step S104 in LUT[n] (S105) and returns to the beginning of the loop LP11 (S106).
  • the LUT deriving unit 134 determines whether or not LUT [n] is unregistered (S108).
  • the LUT deriving unit 134 searches for the latest registered entries before and after n as shown in FIG. 13 (S109). Specifically, the LUT deriving unit 134 searches for the latest registered entry before and after n as follows.
  • the LUT deriving unit 134 searches for a registered entry for n forward, that is, nL smaller than n, with n as a base point. That is, with reference to FIG. 13, a search is made for a sample point Smpl1 in the n forward direction.
  • the LUT deriving unit 134 searches for an entry registered for n backward, that is, nR larger than n, with n as a base point. That is, with reference to FIG. 13, a search is made for a sample point Smpl2 in the backward direction of n.
• The LUT derivation unit 134 registers a value obtained by linear interpolation between LUT[nL] and LUT[nR] in LUT[n] (S110).
• In step S110, an interpolation process is performed that connects the nearest sample point Smpl1 ahead of n and the nearest sample point Smpl2 behind n by a straight line L1.
• The value of the straight line L1 at n is then registered in LUT[n].
• In step S104, if an entry has already been registered, the average of the acquired color difference value m and the registered value is registered as the new entry, for the following reason.
• If the acquired color difference value m simply overwrote the registered entry, and m happened to be statistical noise, the noise would be registered directly in the entry. Taking the average of the acquired value m and the registered value instead reduces such noise.
• The average acquired here is a weighted average that depends on the order of registration. The present invention is not limited to this; in step S104, a weighted average with arbitrarily set weights may be acquired instead.
• A moving average process may also be performed on the entire table. Performing the moving average over the whole table suppresses rapid changes in color difference between neighbouring entries. A sketch of the overall derivation follows below.
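• Below is a sketch, under stated assumptions, of the FIG. 12 derivation: each adjacent (luminance n, color difference m) pair is registered into LUT[n], a collision is resolved by the rounded average (m + LUT[n] + 1) / 2, and unregistered entries are then filled by linear interpolation between the nearest registered neighbours. The handling of entries with a registered neighbour on only one side (copying that neighbour) mirrors the rule described later for the update process and is an assumption here; all names are illustrative.

```python
def derive_lut(adjacent_samples, size=256):
    lut = [None] * size                        # S100: all entries unregistered
    for n, m in adjacent_samples:              # loop LP11 over adjacent pixels
        if lut[n] is None:                     # S103: no entry yet
            lut[n] = m                         # register m as-is
        else:                                  # S104-S105: rounded average
            lut[n] = (m + lut[n] + 1) // 2
    registered = [n for n in range(size) if lut[n] is not None]
    for n in range(size):                      # loop LP13: fill the gaps
        if lut[n] is not None:
            continue                           # S108: already registered
        left = [r for r in registered if r < n]       # S109: nearest entries
        right = [r for r in registered if r > n]
        nl = max(left) if left else None
        nr = min(right) if right else None
        if nl is not None and nr is not None:  # S110: linear interpolation
            lut[n] = (lut[nl] * (nr - n) + lut[nr] * (n - nl)) // (nr - nl)
        elif nl is not None:                   # assumption: one-sided case
            lut[n] = lut[nl]                   # copies the nearest entry
        elif nr is not None:
            lut[n] = lut[nr]
    return lut

lut = derive_lut([(20, 200), (20, 204), (200, 60)])
print(lut[20], lut[110], lut[200])             # 202 131 60
```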
  • FIG. 14 shows a graph of the LUT values derived from the image B, that is, entries.
  • a graph L11 shown in FIG. 14 shows LUT entry values derived from the image B.
  • the graph L11 is a line passing through each sample point.
• The entries between sample points are created by linear interpolation. For example, a straight line connects the rightmost sample of the group Gr1 and the leftmost sample of the group Gr2.
• An average value is taken for sample points whose n values overlap, and this averaged value is used as the actual sample point for inter-channel prediction.
• In the above example, an unregistered entry, that is, an entry between sample points, is created by linear interpolation, but the present invention is not limited to this. The entries may instead be created by cubic interpolation, which can improve the prediction accuracy of the table, as sketched below.
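• As a sketch of the cubic alternative, under the assumption that a spline through the registered sample points is an acceptable realisation of cubic interpolation: scipy's CubicSpline fills the table smoothly between samples. The sample values are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline

xs = np.array([20, 90, 160, 230])       # registered luminance sample points
ys = np.array([200, 150, 90, 40])       # their registered chroma values
spline = CubicSpline(xs, ys)            # cubic curve through every sample
table = np.clip(spline(np.arange(20, 231)), 0, 255).astype(int)
print(table[0], table[100], table[-1])  # exact at samples, smooth between them
```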
  • FIG. 15 is a flowchart illustrating an example of a flow of color difference prediction image generation processing by inter-channel prediction in the inter-channel prediction unit 351.
• First, the inter-channel prediction unit 351 sets the color difference A and its prediction source channel, and the color difference B and its prediction source channel, according to the intra prediction parameter PP (S120).
• The contents set by the inter-channel prediction unit 351 are specifically as follows. The inter-channel prediction unit 351 sets the color difference A and the color difference B according to the color difference channel processing order flag; in the following, the color difference B is processed after the color difference A. For example, the inter-channel prediction unit 351 may set the color difference A to the U channel and the color difference B to the V channel.
  • the inter-channel prediction unit 351 sets “Y channel” as the color difference A prediction source channel.
• The inter-channel prediction unit 351 then sets the color difference B prediction source channel according to the second channel prediction source channel specifier. For example, it may set the prediction source channel of the color difference A (U channel) to the Y channel, and the prediction source channels of the color difference B (V channel) to the Y channel and the U channel.
  • the inter-channel prediction unit 351 selects a color difference A prediction mode (S121), and generates a color difference prediction image by inter-channel prediction for the color difference A according to the setting in step S120 (S122).
  • the inter-channel prediction unit 351 selects a color difference B prediction mode (S123), and generates a color difference prediction image by inter-channel prediction for the color difference B according to the setting in step S120 (S124). The process ends as described above.
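• A minimal sketch of the lookup itself, assuming a one-dimensional table per target channel: the color difference prediction image is generated by looking up each decoded luminance pixel in the LUT (as in steps S122 and S124). The function name and stand-in table are illustrative.

```python
def predict_chroma(rec_y, lut):
    """rec_y: 2D list of decoded luminance values for the target partition."""
    return [[lut[y] for y in row] for row in rec_y]

lut = [max(0, 255 - y) for y in range(256)]   # stand-in table for illustration
rec_y = [[20, 21], [200, 201]]
print(predict_chroma(rec_y, lut))             # [[235, 234], [55, 54]]
```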
• As described above, the video decoding device 1 generates the prediction image Pred for each of the luminance (Y) channel and the color difference (U, V) channels, and decodes the image by adding the prediction residual D to the generated prediction image Pred.
• The video decoding device 1 includes the prediction image generation unit 13 and the adder 14, which decode the luminance (Y) channel and generate the luminance decoded image PY for the target partition;
• an LUT deriving unit 134, which refers to the local decoded image P′ located around the target partition and derives, as an LUT, a non-linear correlation between the decoded luminance (Y) channel and the color difference (U, V) channels to be decoded; and an inter-channel prediction unit 351, which, for the target partition, generates the color difference prediction image PredC from the luminance decoded image PY by inter-channel prediction according to the LUT.
• Thereby, even when the luminance components of the pixels included in the local decoded image P′ vary, so that prediction between channels according to a linear correlation would reduce prediction accuracy, there is an effect that higher prediction accuracy is more likely to be obtained.
  • FIG. 16 and FIG. 17 are diagrams showing the pattern of the color difference channel processing order when there are two prediction source channels for the second channel to be predicted.
• As shown in FIG. 16, (1) the U channel may be predicted using the Y channel as a base point, and then (2) the V channel may be predicted based on a combination of the Y channel and the U channel. Alternatively, as shown in FIG. 17, (1) the V channel may be predicted using the Y channel as a base point, and then (2) the U channel may be predicted based on a combination of the Y channel and the V channel.
• The LUT may also be extended to two dimensions. Even when the LUT is extended to two dimensions, the same technique as the one-dimensional LUT derivation can be adopted, and known techniques used when creating two-dimensional LUTs can also be adopted.
• In this case, the LUT deriving unit 134 derives the LUT as follows. First, a primary table is derived for the luminance value Y and the color difference value U of the sample points, as described above. Subsequently, the color difference value V of each sample point is registered in the table in association with its luminance value Y and color difference value U. The same applies to the processing order shown in FIG. 17.
  • the LUT can be configured to extend to more than two dimensions.
  • Another example of a configuration that expands the LUT to two or more dimensions includes a configuration that uses luminance values Y in a plurality of pixels. More specifically, in the LUT entry, the color difference value U or the color difference value V may be looked up by a combination of adjacent luminance values.
• Since a two-dimensional or higher LUT allows the color difference to be predicted from a plurality of channels, or from a plurality of pixel values of the same channel, the accuracy of the predicted image can be improved. A sketch of the two-dimensional case follows below.
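• Below is a sketch, under assumptions, of the two-dimensional extension: the V value of each sample point is registered keyed by its (Y, U) pair, so V can be looked up from two decoded channels; a dictionary stands in for the sparse two-dimensional table, and collisions are resolved by the same rounded average as in the one-dimensional case. The same keying would apply to a combination of adjacent luminance values. All names are illustrative.

```python
def derive_2d_lut(samples):
    """samples: iterable of (y, u, v) triples from the local decoded image."""
    lut2d = {}
    for y, u, v in samples:
        key = (y, u)
        if key in lut2d:                             # collision: same rounded
            lut2d[key] = (v + lut2d[key] + 1) // 2   # average as the 1D case
        else:
            lut2d[key] = v
    return lut2d

lut2d = derive_2d_lut([(20, 200, 90), (20, 200, 94), (200, 60, 30)])
print(lut2d[(20, 200)], lut2d[(200, 60)])            # 92 and 30
```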
  • the intra prediction parameter PP is illustratively configured to include the inter-channel prediction flag, the color difference channel processing order flag, and the second channel prediction source channel specifier.
  • the intra prediction parameter PP may be configured to include an inter-channel prediction index indicating a combination of processing order and prediction source channels as follows.
  • the above-mentioned inter-channel prediction index is encoded and transmitted to the video decoding device 1 in the encoding process in the video encoding device 2. Then, the inter-channel prediction unit 351 of the video decoding device 1 performs inter-channel prediction according to the inter-channel prediction index transmitted from the video encoding device 2.
• In the above, the information indicating the combination of the processing order and the prediction source channel is included in the intra prediction parameter PP, but the present invention is not limited to this.
  • information indicating the combination of the processing order and the prediction source channel may be stored in the header of the processing unit other than the slice header, and the processing order or the like may be changed according to the processing unit.
  • information indicating a combination of processing order and prediction source channel may be stored in a sequence header (SPS: Sequence Parameter Set) or a picture header (PPS: Picture Parameter Set).
  • processing order or the like may be changed in units of processing smaller than slices, for example, in units of LCUs.
• The unit in which the LUT derivation unit derives the LUT can be changed within any range in which a correlation between luminance and color difference can be expected.
  • the information indicating the combination of the processing order and the prediction source channel may be encoded in different processing units.
  • the information indicating the processing order may be encoded in LCU units
  • the information indicating the combination of prediction source channels may be encoded in PU units.
• PredU[xU, yU] is calculated, for example, by either of the following:
• PredU[xU, yU] = LUT[RecY[xY, yY] / 2]
• PredU[xU, yU] = (LUT[RecY[xY, yY] / 2] + LUT[RecY[xY, yY] / 2 + 1]) / 2
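• A sketch of the two variants above, under the assumption suggested by the halved index that the table is held at half the luminance index resolution: either a single lookup at RecY/2, or the average of the two neighbouring entries. The names and the stand-in table are illustrative.

```python
def pred_u_single(rec_y_val, half_lut):
    return half_lut[rec_y_val // 2]

def pred_u_averaged(rec_y_val, half_lut):
    i = rec_y_val // 2
    j = min(i + 1, len(half_lut) - 1)          # clamp at the table edge
    return (half_lut[i] + half_lut[j]) // 2

half_lut = [min(255, 2 * i) for i in range(128)]   # stand-in 128-entry table
print(pred_u_single(41, half_lut), pred_u_averaged(41, half_lut))   # 40 41
```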
  • FIG. 18 is a flowchart showing a modified example of the flow of the LUT derivation process by the LUT derivation unit 134.
  • the LUT derivation unit 134 initializes the LUT (S130).
  • the LUT deriving unit 134 enters a registration processing loop LP11A for each luminance pixel adjacent to the target partition (S131).
• The LUT deriving unit 134 substitutes sum[n] + m into sum[n] and count[n] + 1 into count[n] (S133).
• The LUT deriving unit 134 determines whether count[n] is greater than 0 (S136).
• If count[n] = 0, that is, no sample was observed at luminance value n, the LUT deriving unit 134 returns to the top of the loop LP12A (S138) and continues processing for the next entry.
• If count[n] > 0, the LUT deriving unit 134 substitutes sum[n] / count[n] into LUT[n] (S137). That is, in step S137, the arithmetic average of the color difference values observed at luminance value n is acquired and registered in LUT[n].
• Finally, the LUT deriving unit 134 performs interpolation processing for the unregistered entries (S139). Since the processing in step S139 is the same as that in steps S107 to S111 (loop LP13) shown in FIG. 12, its description is omitted here. The LUT derivation process then ends; a sketch follows below.
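• Below is a sketch, under assumptions, of the FIG. 18 variant: the color difference values observed at each luminance value n are accumulated in sum[n] and count[n], and LUT[n] is set to their arithmetic mean once all adjacent pixels have been visited; unlike the running average of FIG. 12, the result does not depend on registration order. Interpolation (S139) is omitted; names are illustrative.

```python
def derive_lut_mean(adjacent_samples, size=256):
    acc = [0] * size                      # sum[n]   (S130: initialise)
    cnt = [0] * size                      # count[n]
    for n, m in adjacent_samples:         # loop LP11A (S131-S133)
        acc[n] += m
        cnt[n] += 1
    lut = [None] * size
    for n in range(size):                 # loop LP12A (S136-S138)
        if cnt[n] > 0:
            lut[n] = acc[n] // cnt[n]     # S137: arithmetic average
    return lut                            # S139 (interpolation) omitted here

lut = derive_lut_mean([(20, 200), (20, 204), (20, 208), (200, 60)])
print(lut[20], lut[200])                  # 204 and 60
```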
  • FIG. 19 is a diagram illustrating an example of the derived LUT.
  • the LUT may hold only a set of samples until it is referred to.
  • the LUT deriving unit 134 does not interpolate unregistered entries other than the entries for 16 sets of samples during the LUT deriving process.
  • the LUT deriving unit 134 derives the referenced unregistered entry.
• As an example, the LUT deriving unit 134 derives a referenced unregistered entry n (nL < n < nR) by linear interpolation according to the following equation (3):
• LUT[n] = (LUT[nL] × (nR − n) + LUT[nR] × (n − nL)) / (nR − nL) … (3)
• Here, nL and nR are the indices of the nearest registered entries before and after n, respectively, as described with reference to FIG. 13.
• In such a configuration, a memory area need only be prepared for the registered entries; that is, only a memory area proportional to the number of samples is consumed in creating the LUT table. A sketch of this deferred interpolation follows below.
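• A sketch, under assumptions, of the deferred-interpolation variant: only the registered sample points are stored, so memory is proportional to the number of samples, and an unregistered index n is interpolated per equation (3) only when it is actually referenced. The edge handling (returning the nearest registered value when a neighbour exists on only one side) follows the rule described for the update process; names are illustrative.

```python
import bisect

class LazyLUT:
    def __init__(self, samples):
        # samples: dict mapping registered luminance n -> chroma value
        self.keys = sorted(samples)
        self.vals = [samples[k] for k in self.keys]

    def __getitem__(self, n):
        i = bisect.bisect_left(self.keys, n)
        if i < len(self.keys) and self.keys[i] == n:
            return self.vals[i]                   # registered entry
        if i == 0:                                # neighbour on one side only:
            return self.vals[0]                   # return the nearest entry
        if i == len(self.keys):
            return self.vals[-1]
        nl, nr = self.keys[i - 1], self.keys[i]   # nearest registered entries
        vl, vr = self.vals[i - 1], self.vals[i]
        return (vl * (nr - n) + vr * (n - nl)) // (nr - nl)   # equation (3)

lut = LazyLUT({20: 202, 200: 60})
print(lut[20], lut[110], lut[250])                # 202 131 60
```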
• The inter-channel prediction based on a conventional linear transformation and the inter-channel prediction based on the LUT may also be switched, with whichever gives the higher accuracy being used for inter-channel prediction. In other words, a configuration that also retains linear conversion can be adopted, as sketched below.
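• Below is a sketch, under stated assumptions, of such switching: both predictors are evaluated against already-decoded color difference samples around the partition, and the one with the lower error on that template is used. The template-based selection criterion is an assumption; the text above only states that the more accurate predictor may be used. Names are illustrative.

```python
def template_error(predict, template):
    """template: list of (luminance, true chroma) pairs already decoded."""
    return sum(abs(predict(y) - c) for y, c in template)

def choose_predictor(lut, a, b, template):
    lut_pred = lambda y: lut[y]            # LUT-based inter-channel prediction
    lin_pred = lambda y: a * y + b         # conventional linear transformation
    if template_error(lut_pred, template) <= template_error(lin_pred, template):
        return lut_pred
    return lin_pred

lut = [max(0, 255 - y) for y in range(256)]
pred = choose_predictor(lut, a=-1.0, b=255.0, template=[(20, 235), (200, 55)])
print(pred(100))
```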
  • the moving image encoding device 2 is a device that generates and outputs encoded data # 1 by encoding the input image # 10.
  • FIG. 21 is a functional block diagram showing the configuration of the moving image encoding device 2.
• The moving image encoding device 2 includes an encoding setting unit 21, an inverse quantization / inverse transform unit 22, a predicted image generation unit 23, an adder 24, a frame memory 25, a subtractor 26, a transform / quantization unit 27, and a variable length coding unit 28.
  • the encoding setting unit 21 generates image data related to encoding and various setting information based on the input image # 10.
• Specifically, the encoding setting unit 21 generates the following image data and setting information.
  • the encoding setting unit 21 generates the leaf CU image # 100 for the target leaf CU by sequentially dividing the input image # 10 into slice units and LCU units.
  • the encoding setting unit 21 generates header setting information H ′ based on the result of the division process.
• The header information H′ includes (1) information about the size, shape, and position of the LCUs belonging to the target slice, and (2) CU information CU′ about the size, shape, and position of the leaf CUs belonging to each LCU.
• The encoding setting unit 21 refers to the leaf CU image #100 and the CU information CU′ to generate PU setting information PUI′.
• The PU setting information PUI′ includes information on all combinations of (1) the possible division patterns of the target leaf CU into partitions and (2) the prediction modes that can be assigned to each partition.
  • the encoding setting unit 21 supplies the leaf CU image # 100 to the subtractor 26.
  • the encoding setting unit 21 supplies the header information H ′ to the variable length encoding unit 28. Also, the encoding setting unit 21 supplies the PU setting information PUI ′ to the predicted image generation unit 23.
  • the inverse quantization / inverse transform unit 22 performs inverse quantization and inverse DCT transform (Inverse Discrete Cosine Transform) on the quantization prediction residual for each block supplied from the transform / quantization unit 27, Restore the prediction residual for each block. Further, the inverse quantization / inverse transform unit 22 integrates the prediction residual for each block according to the division pattern specified by the TU partition information, and generates a prediction residual D for the target leaf CU. The inverse quantization / inverse transform unit 22 supplies the prediction residual D for the generated target leaf CU to the adder 24.
  • the predicted image generation unit 23 refers to the locally decoded image P ′ recorded in the frame memory 25 and the PU setting information PUI ′, and generates a predicted image Pred for the target leaf CU.
• The prediction image generation unit 23 also refers to the luminance decoded image PY.
  • the predicted image generation unit 23 sets the prediction parameter obtained by the predicted image generation process in the PU setting information PUI ′, and transfers the set PU setting information PUI ′ to the variable length encoding unit 28. Note that the predicted image generation process performed by the predicted image generation unit 23 is the same as that performed by the predicted image generation unit 13 included in the video decoding device 1, and thus description thereof is omitted here.
• The adder 24 adds the predicted image Pred supplied from the predicted image generation unit 23 and the prediction residual D supplied from the inverse quantization / inverse transform unit 22, thereby generating a decoded image P for the target leaf CU.
• The generated decoded image P is sequentially recorded in the frame memory 25.
• At the time of decoding the target LCU, decoded images corresponding to all the LCUs decoded before the target LCU (for example, all the LCUs preceding it in the raster scan order) are recorded in the frame memory 25.
  • the subtracter 26 generates a prediction residual D for the target leaf CU by subtracting the prediction image Pred from the leaf CU image # 100.
  • the subtractor 26 supplies the generated prediction residual D to the transform / quantization unit 27.
  • the transform / quantization unit 27 performs a DCT transform (Discrete Cosine Transform) and quantization on the prediction residual D to generate a quantized prediction residual.
• The transform / quantization unit 27 refers to the leaf CU image #100 and the CU information CU′ to determine the division pattern of the target leaf CU into one or a plurality of blocks, and divides the prediction residual D into prediction residuals for each block according to the determined division pattern.
• The transform / quantization unit 27 generates a prediction residual in the frequency domain by performing a DCT transform (Discrete Cosine Transform) on the prediction residual for each block, and then quantizes the prediction residual in the frequency domain, thereby generating a quantized prediction residual for each block.
• The transform / quantization unit 27 generates TU setting information TUI′ that includes the generated quantized prediction residual for each block, TU partition information specifying the division pattern of the target leaf CU, and information about all possible division patterns of the target leaf CU into blocks.
• The transform / quantization unit 27 supplies the generated TU setting information TUI′ to the inverse quantization / inverse transform unit 22 and the variable length coding unit 28.
• The variable length encoding unit 28 generates and outputs the encoded data #1 based on the TU setting information TUI′, the PU setting information PUI′, and the header information H′. A sketch of the block-level data flow described above follows below.
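• Below is a sketch, under stated assumptions, of the block-level data flow described above: the prediction is subtracted (subtractor 26), the residual is transformed and quantized (transform / quantization unit 27), inverse-quantized and inverse-transformed (inverse quantization / inverse transform unit 22), and the prediction is added back (adder 24) to reconstruct the decoded block for the frame memory. A two-point transform stands in for the DCT; all names and values are illustrative.

```python
def forward(residual, q):     # stand-in for DCT + quantization
    s, d = residual[0] + residual[1], residual[0] - residual[1]
    return [s // q, d // q]

def inverse(coeffs, q):       # stand-in for inverse quantization + inverse DCT
    s, d = coeffs[0] * q, coeffs[1] * q
    return [(s + d) // 2, (s - d) // 2]

orig, pred, q = [130, 120], [128, 118], 2
residual = [o - p for o, p in zip(orig, pred)]          # subtractor 26
coeffs = forward(residual, q)                           # transform/quant 27
recon_res = inverse(coeffs, q)                          # inv quant/transform 22
decoded = [p + r for p, r in zip(pred, recon_res)]      # adder 24
print(coeffs, decoded)                                  # lossy reconstruction
```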
  • FIG. 22 is a flowchart illustrating an example of the flow of the color difference channel processing order and the prediction source channel encoding process in the moving image encoding apparatus 2.
  • the predicted image generation unit 23 selects a luminance prediction mode in the luminance predicted image generation process for the target partition (S200).
• The predicted image generation unit 23 generates a luminance predicted image based on the selected prediction mode (S201).
  • the predicted image generation unit 23 enters the color difference predicted image creation processing loop LP21 for each pattern in the color difference channel processing order.
• The predicted image generation unit 23 sets the color difference A and the prediction source channel of the color difference A, and the color difference B and the prediction source channel of the color difference B, according to the pattern of each processing order described using FIGS. 8, 16, and 17 (S203).
• In step S203, the inter-channel prediction unit 351 sets one of the U channel and the V channel as the color difference A, whose prediction is performed first, and the other as the color difference B, whose prediction is performed after the color difference A. It also sets one or more prediction source channels for each of the color difference A and the color difference B.
• The predicted image generation unit 23 selects a prediction mode for the color difference A (S204), and generates a color difference predicted image for the color difference A by inter-channel prediction according to the setting in step S203 (S205).
• The predicted image generation unit 23 then selects a prediction mode for the color difference B (S206), and generates a color difference predicted image for the color difference B by inter-channel prediction according to the setting in step S203 (S207). The color difference predicted image creation process by inter-channel prediction continues, and the process returns to the top of the loop LP21 (S208).
  • the predicted image generation unit 23 selects a color difference channel processing order and a combination of prediction source channels that are most suitable for encoding (S209).
  • the prediction image generation unit 23 performs encoding by including the inter-channel prediction flag, the color difference channel processing order flag, and the second channel prediction source channel specifier in the intra prediction parameter PP (S210).
  • the predicted image generation unit 23 may encode the inter-channel prediction index indicating the combination of the processing order and the prediction source channel in step S210.
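• Below is a sketch, under stated assumptions, of the encoder-side selection of FIG. 22: each pattern of color difference processing order and prediction source channels is tried, its coding cost evaluated, and the best combination is signalled (S209, S210). The cost function is a placeholder for a rate-distortion measure; the pattern list mirrors FIGS. 8, 16, and 17, and all names are illustrative.

```python
PATTERNS = [
    {"order": ("U", "V"), "src": {"U": ("Y",), "V": ("Y", "U")}},   # FIG. 16
    {"order": ("V", "U"), "src": {"V": ("Y",), "U": ("Y", "V")}},   # FIG. 17
]

def coding_cost(pattern):
    # Placeholder: a real encoder would run S203-S207 for this pattern and
    # measure the rate and distortion of the resulting prediction images.
    return {("U", "V"): 10, ("V", "U"): 12}[pattern["order"]]

best = min(PATTERNS, key=coding_cost)          # S209: best combination
print("signal flags/specifier for order:", best["order"])   # S210
```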
• Embodiment 2: Another embodiment of the present invention will be described below with reference to FIGS. 23 to 26. For convenience of explanation, members having the same functions as those in the drawings described in Embodiment 1 are denoted by the same reference numerals, and their description is omitted.
  • FIG. 23 is a functional block diagram illustrating another example of the configuration of the predicted image generation unit 13.
  • the predicted image generation unit 13A initializes the LUT in units of LCU, and updates the LUT in each block (target partition) in the same LCU.
  • the inter-channel prediction unit 351 is changed to an inter-channel prediction unit 351A.
• In the predicted image generation unit 13A, an LUT derivation unit (correlation derivation means) 16 is provided separately from the inter-channel prediction unit 351A.
  • the LUT derivation unit 16 newly derives an LUT for each target LCU.
  • the LUT deriving unit 16 updates the LUT in units of partitions.
• The difference between the LUT deriving unit 16 and the LUT deriving unit 134 is the unit in which the LUT is derived.
  • the inter-channel prediction unit 351A is changed to refer to the LUT deriving unit 16.
  • FIG. 24 is a flowchart illustrating a schematic flow of color difference predicted image generation processing in the predicted image generation unit 13A.
  • the LUT deriving unit 16 determines whether or not the target partition is a block that is first processed by the LCU (S30). When the target partition is a block processed first by the LCU (YES in S30), the LUT is initialized (S31).
• LUT_F[n] = 1 indicates that a sample exists, and LUT_F[n] = 0 indicates that no sample exists.
  • the LUT deriving unit 16 updates the LUT of each channel while referring to the local decoded image P ′ (S32). Details of the LUT update processing will be described later.
• Otherwise (NO in S30), the LUT deriving unit 16 executes the LUT update process without initializing the LUT (S32).
  • the inter-channel prediction determination unit 133 refers to the inter-channel prediction flag and determines whether or not the inter-channel prediction mode is set (S33).
• If the inter-channel prediction mode is not set, the intra-channel prediction unit 352 generates the color difference prediction image PredC without using inter-channel prediction (S36), and the process ends.
• If the inter-channel prediction mode is set, the inter-channel prediction unit 351A refers to the LUT updated by the LUT deriving unit 16 and generates the color difference prediction image PredC by inter-channel prediction (S35). This is the end of the process; a sketch of this per-LCU flow follows below.
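• Below is a sketch, under stated assumptions, of the Embodiment 2 control flow: the table and its sample flags LUT_F are initialized only for the first block of each LCU (S31), then updated for every partition (S32) so that samples accumulate across the whole LCU before each prediction (S35). The update shown is simplified (interpolation omitted); all names are illustrative.

```python
def update_lut(lut, lut_f, samples):
    # Simplified stand-in for the FIG. 25 update (interpolation omitted).
    for n, m in samples:
        lut[n] = (m + lut[n] + 1) // 2 if lut_f[n] else m
        lut_f[n] = 1

def process_lcu(partitions, size=256):
    lut = [0] * size                        # S31: initialise once per LCU
    lut_f = [0] * size                      # LUT_F[n] = 0 means "no sample"
    for part in partitions:                 # every partition in the same LCU
        update_lut(lut, lut_f, part["adjacent_samples"])      # S32
        part["pred_c"] = [lut[y] for y in part["rec_y"]]      # S35
    return lut

parts = [{"adjacent_samples": [(20, 200)], "rec_y": [20]},
         {"adjacent_samples": [(200, 60)], "rec_y": [20, 200]}]
process_lcu(parts)
print(parts[1]["pred_c"])                   # samples accumulate: [200, 60]
```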
  • FIG. 25 is a flowchart showing an example of the flow of LUT update processing by the LUT deriving unit 16.
  • the LUT deriving unit 16 enters a registration process loop LP41 for each luminance pixel adjacent to the target partition (S400).
• The LUT deriving unit 16 determines whether or not a color difference value is registered in LUT[n] (S402). In other words, it determines that a color difference value is registered if LUT_F[n] is 1 (sample exists), and that no color difference value is registered if LUT_F[n] is 0 (no sample).
• If no color difference value is registered (NO in S402), the LUT deriving unit 16 registers the color difference value m acquired in step S401 in LUT[n] as-is, substitutes 1 (sample exists) into LUT_F[n] (S404), and returns to the top of the loop LP41 (S405).
• If a color difference value is already registered (YES in S402), the LUT derivation unit 16 calculates (m + LUT[n] + 1) / 2, the rounded average of the acquired color difference value m and the registered value, and substitutes this average for m (S403). Subsequently, the LUT deriving unit 16 registers the averaged value in LUT[n] and substitutes 1 (sample exists) into LUT_F[n] (S404), then returns to the top of the loop LP41 (S405).
• Next, the LUT deriving unit 16 determines whether or not LUT_F[n] is 0 (no sample) (S407).
• While the same LCU is being processed, the LUT itself is retained, and all entries are filled by interpolation. Therefore, instead of checking whether LUT[n] has been registered, it is checked here whether a sample exists; an entry for which no sample exists becomes subject to interpolation again.
• If LUT_F[n] is not 0 (a sample exists), that is, if a sample has been registered at LUT[n] (NO in S407), the interpolation process continues and returns to the top of the loop LP42 (S410).
• In the example of FIG. 26, the sample points Smpl1 and Smpl2 were registered and the interpolation performed in the partition processed immediately before, and the sample point Smpl3 is newly registered in the current target partition.
• If LUT_F[n] is 0 (YES in S407), the LUT deriving unit 16 searches for the entries with the nearest samples before and after n (S408).
• First, with n as the base point, the LUT deriving unit 16 searches forward of n, that is, among indices nL smaller than n, for an entry with a sample. Here, the sample point Smpl1 shown in FIG. 26 is found.
• Next, with n as the base point, the LUT deriving unit 16 searches backward of n, that is, among indices nR larger than n, for an entry with a sample.
• In the partitions processed up to immediately before, this search found the sample point Smpl2 shown in FIG. 26; once the sample point Smpl3 is registered in the target partition, Smpl3 is found instead.
• The LUT deriving unit 16 registers a value obtained by linear interpolation between LUT[nL] and LUT[nR] in LUT[n] (S409).
• If only one of nL and nR is detected as a result of the search in step S408, the value of the detected registered entry is registered in LUT[n] as-is.
• In the above, in step S403 the LUT deriving unit 16 calculates the average of the acquired color difference value m and the registered value as (m + LUT[n] + 1) / 2, but the calculation is not limited to this. For example, a 1:3 weighted average of the acquired color difference value m and the registered value may be calculated, as sketched below.
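• A sketch of the registration step of FIG. 25 with the weighting left configurable, as an assumption about how the 1:3 alternative would be realised: weights 1:1 reproduce (m + LUT[n] + 1) / 2, while weights 1:3 give the weighted average mentioned above. Names are illustrative.

```python
def register_sample(lut, lut_f, n, m, w_new=1, w_old=1):
    if lut_f[n] == 0:                 # S402: no sample yet at luminance n
        lut[n] = m                    # register m as-is
    else:                             # S403: blend with the registered value
        total = w_new + w_old
        lut[n] = (w_new * m + w_old * lut[n] + total // 2) // total
    lut_f[n] = 1                      # S404: mark "sample exists"

lut, lut_f = [0] * 256, [0] * 256
register_sample(lut, lut_f, 20, 200)
register_sample(lut, lut_f, 20, 204, w_new=1, w_old=3)
print(lut[20])                        # 201 = (204 + 3*200 + 2) // 4
```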
  • the predicted image generation unit 13A is configured to initialize the LUT in units of LCUs and update the LUT in each block (target partition) in the same LCU.
• The LUT tends to become more accurate as the number of samples used for its derivation increases. However, if samples are collected over too wide an area, the correlation may be lost or may become extremely weak.
• In this respect, the LCU is a range that is wider than the target partition yet can still be assumed to have a correlation.
  • the above-described moving image encoding device 2 and moving image decoding device 1 can be used by being mounted on various devices that perform transmission, reception, recording, and reproduction of moving images.
  • the moving image may be a natural moving image captured by a camera or the like, or may be an artificial moving image (including CG and GUI) generated by a computer or the like.
  • moving image encoding device 2 and moving image decoding device 1 can be used for transmission and reception of moving images.
  • FIG. 30 (a) is a block diagram illustrating a configuration of a transmission device PROD_A in which the moving image encoding device 2 is mounted.
• As shown in FIG. 30(a), the transmission device PROD_A includes an encoding unit PROD_A1 that obtains encoded data by encoding a moving image, a modulation unit PROD_A2 that obtains a modulated signal by modulating a carrier wave with the encoded data obtained by the encoding unit PROD_A1, and a transmission unit PROD_A3 that transmits the modulated signal obtained by the modulation unit PROD_A2.
  • the moving image encoding apparatus 2 described above is used as the encoding unit PROD_A1.
• The transmission device PROD_A may further include, as supply sources of the moving image input to the encoding unit PROD_A1, a camera PROD_A4 that captures moving images, a recording medium PROD_A5 on which moving images are recorded, an input terminal PROD_A6 for inputting moving images from the outside, and an image processing unit A7 that generates or processes images.
  • FIG. 30A illustrates a configuration in which the transmission apparatus PROD_A includes all of these, but a part of the configuration may be omitted.
• The recording medium PROD_A5 may hold an unencoded moving image, or a moving image encoded by a recording encoding scheme different from the transmission encoding scheme. In the latter case, a decoding unit (not shown) that decodes the encoded data read from the recording medium PROD_A5 according to the recording encoding scheme may be interposed between the recording medium PROD_A5 and the encoding unit PROD_A1.
• FIG. 30(b) is a block diagram illustrating a configuration of the receiving device PROD_B in which the moving image decoding device 1 is mounted.
• As shown in FIG. 30(b), the receiving device PROD_B includes a receiving unit PROD_B1 that receives a modulated signal, a demodulation unit PROD_B2 that obtains encoded data by demodulating the modulated signal received by the receiving unit PROD_B1, and a decoding unit PROD_B3 that obtains a moving image by decoding the encoded data obtained by the demodulation unit PROD_B2.
  • the moving picture decoding apparatus 1 described above is used as the decoding unit PROD_B3.
• The receiving device PROD_B may further include, as supply destinations of the moving image output by the decoding unit PROD_B3, a display PROD_B4 that displays the moving image, a recording medium PROD_B5 for recording the moving image, and an output terminal PROD_B6 for outputting the moving image to the outside.
  • FIG. 30B illustrates a configuration in which the reception device PROD_B includes all of these, but a part of the configuration may be omitted.
• The recording medium PROD_B5 may be used to record an unencoded moving image, or a moving image encoded by a recording encoding scheme different from the transmission encoding scheme. In the latter case, an encoding unit (not shown) that encodes the moving image acquired from the decoding unit PROD_B3 according to the recording encoding scheme may be interposed between the decoding unit PROD_B3 and the recording medium PROD_B5.
  • the transmission medium for transmitting the modulation signal may be wireless or wired.
• The transmission mode for transmitting the modulated signal may be broadcasting (here, a transmission mode in which the transmission destination is not specified in advance) or communication (here, a transmission mode in which the transmission destination is specified in advance). That is, the transmission of the modulated signal may be realized by any of wireless broadcasting, wired broadcasting, wireless communication, and wired communication.
  • a terrestrial digital broadcast broadcasting station (broadcasting equipment or the like) / receiving station (such as a television receiver) is an example of a transmitting device PROD_A / receiving device PROD_B that transmits and receives a modulated signal by wireless broadcasting.
  • a broadcasting station (such as broadcasting equipment) / receiving station (such as a television receiver) of cable television broadcasting is an example of a transmitting device PROD_A / receiving device PROD_B that transmits and receives a modulated signal by cable broadcasting.
• A server (workstation etc.) / client (television receiver, personal computer, smartphone etc.) of a VOD (Video On Demand) service or a video sharing service using the Internet is an example of a transmission device PROD_A / receiving device PROD_B that transmits and receives a modulated signal by communication (usually, either a wireless or a wired transmission medium is used in a LAN, while a wired transmission medium is used in a WAN).
  • the personal computer includes a desktop PC, a laptop PC, and a tablet PC.
  • the smartphone also includes a multi-function mobile phone terminal.
• The video sharing service client has a function of encoding a moving image captured by a camera and uploading it to the server, in addition to a function of decoding encoded data downloaded from the server and displaying it on a display. That is, the video sharing service client functions as both the transmission device PROD_A and the receiving device PROD_B.
  • moving picture encoding apparatus 2 and moving picture decoding apparatus 1 can be used for recording and reproduction of moving pictures.
  • FIG. 31 (a) is a block diagram showing a configuration of a recording apparatus PROD_C in which the above-described moving picture encoding apparatus 2 is mounted.
• As shown in FIG. 31(a), the recording device PROD_C includes an encoding unit PROD_C1 that obtains encoded data by encoding a moving image, and a writing unit PROD_C2 that writes the encoded data obtained by the encoding unit PROD_C1 to a recording medium PROD_M.
  • the moving image encoding apparatus 2 described above is used as the encoding unit PROD_C1.
• The recording medium PROD_M (1) may be of a type built into the recording device PROD_C, such as an HDD (Hard Disk Drive) or SSD (Solid State Drive), (2) may be of a type connected to the recording device PROD_C, such as an SD memory card or USB (Universal Serial Bus) flash memory, or (3) may be loaded into a drive device (not shown) built into the recording device PROD_C, such as a DVD (Digital Versatile Disc) or BD (Blu-ray Disc: registered trademark).
• The recording device PROD_C may further include, as supply sources of the moving image input to the encoding unit PROD_C1, a camera PROD_C3 that captures moving images, an input terminal PROD_C4 for inputting moving images from the outside, a receiving unit PROD_C5 for receiving moving images, and an image processing unit C6 that generates or processes images.
  • FIG. 31A illustrates a configuration in which the recording apparatus PROD_C includes all of these, but a part of the configuration may be omitted.
• The receiving unit PROD_C5 may receive an unencoded moving image, or encoded data encoded by a transmission encoding scheme different from the recording encoding scheme. In the latter case, a transmission decoding unit (not shown) that decodes the encoded data encoded by the transmission encoding scheme may be interposed between the receiving unit PROD_C5 and the encoding unit PROD_C1.
  • Examples of such a recording device PROD_C include a DVD recorder, a BD recorder, and an HDD (Hard Disk Drive) recorder (in this case, the input terminal PROD_C4 or the receiving unit PROD_C5 is a main supply source of moving images).
• a camcorder (in this case, the camera PROD_C3 is a main supply source of moving images),
• a personal computer (in this case, the receiving unit PROD_C5 or the image processing unit C6 is a main supply source of moving images), and
• a smartphone (in this case, the camera PROD_C3 or the receiving unit PROD_C5 is a main supply source of moving images) are also examples of such a recording device PROD_C.
• FIG. 31(b) is a block diagram showing a configuration of a playback device PROD_D equipped with the above-described moving image decoding device 1.
• As shown in FIG. 31(b), the playback device PROD_D includes a reading unit PROD_D1 that reads the encoded data written to the recording medium PROD_M, and a decoding unit PROD_D2 that obtains a moving image by decoding the encoded data read by the reading unit PROD_D1.
  • the moving picture decoding apparatus 1 described above is used as the decoding unit PROD_D2.
• The recording medium PROD_M (1) may be of a type built into the playback device PROD_D, such as an HDD or SSD, (2) may be of a type connected to the playback device PROD_D, such as an SD memory card or USB flash memory, or (3) may be loaded into a drive device (not shown) built into the playback device PROD_D, such as a DVD or BD.
• The playback device PROD_D may further include, as supply destinations of the moving image output by the decoding unit PROD_D2, a display PROD_D3 that displays the moving image, an output terminal PROD_D4 for outputting the moving image to the outside, and a transmission unit PROD_D5 that transmits the moving image.
  • FIG. 31B illustrates a configuration in which the playback apparatus PROD_D includes all of these, but some of them may be omitted.
• The transmission unit PROD_D5 may transmit an unencoded moving image, or encoded data encoded by a transmission encoding scheme different from the recording encoding scheme. In the latter case, an encoding unit (not shown) that encodes the moving image by the transmission encoding scheme is preferably interposed between the decoding unit PROD_D2 and the transmission unit PROD_D5.
  • Examples of such a playback device PROD_D include a DVD player, a BD player, and an HDD player (in this case, an output terminal PROD_D4 to which a television receiver or the like is connected is a main supply destination of moving images).
• a television receiver (in this case, the display PROD_D3 is a main supply destination of moving images) and
• a digital signage (also referred to as an electronic signboard or an electronic bulletin board; in this case, the display PROD_D3 or the transmission unit PROD_D5 is a main supply destination of moving images) are also examples of such a playback device PROD_D.
• Each block of the moving image decoding device 1 and the moving image encoding device 2 described above may be realized in hardware by a logic circuit formed on an integrated circuit (IC chip), or in software using a CPU (Central Processing Unit).
• In the latter case, each device includes a CPU that executes the instructions of a program realizing each function, a ROM (Read Only Memory) that stores the program, a RAM (Random Access Memory) into which the program is expanded, and a storage device (recording medium) such as a memory that stores the program and various data.
• The object of the present invention can also be achieved by supplying, to each of the above devices, a recording medium on which the program code (an executable program, an intermediate code program, or a source program) of the control program for each device, which is software realizing the above-described functions, is recorded in a computer-readable manner, and by having the computer (or a CPU or MPU) read and execute the program code recorded on the recording medium.
• Examples of the recording medium include tapes such as magnetic tapes and cassette tapes; magnetic disks such as floppy (registered trademark) disks and hard disks; optical disks such as CD-ROM, MO, MD, DVD, CD-R, and Blu-ray disc (registered trademark); cards such as IC cards (including memory cards) and optical cards; semiconductor memories such as mask ROM, EPROM, EEPROM, and flash ROM; and logic circuits such as PLDs (Programmable Logic Devices) and FPGAs (Field Programmable Gate Arrays).
  • each of the above devices may be configured to be connectable to a communication network, and the program code may be supplied via the communication network.
  • the communication network is not particularly limited as long as it can transmit the program code.
• For example, the Internet, an intranet, an extranet, a LAN, an ISDN, a VAN, a CATV communication network, a virtual private network (Virtual Private Network), a telephone line network, a mobile communication network, a satellite communication network, and the like can be used.
  • the transmission medium constituting the communication network may be any medium that can transmit the program code, and is not limited to a specific configuration or type.
• For example, wired lines such as IEEE 1394, USB, power line carrier, cable TV lines, telephone lines, and ADSL (Asymmetric Digital Subscriber Line) lines can be used, as well as wireless connections such as infrared (IrDA, remote control), Bluetooth (registered trademark), IEEE 802.11 wireless, HDR (High Data Rate), NFC (Near Field Communication), DLNA (Digital Living Network Alliance), mobile phone networks, satellite lines, and terrestrial digital networks.
  • the present invention can also be realized in the form of a computer data signal embedded in a carrier wave in which the program code is embodied by electronic transmission.
• In the above description, the image decoding device decodes a moving image from encoded data, but the image decoding device is generally applicable regardless of whether the image is a moving image or a still image. The same applies to the image encoding device.
• Note that the LCU in the above embodiment corresponds to the LCU (Largest Coding Unit) in HEVC (High Efficiency Video Coding), and the leaf CU corresponds to the CU (Coding Unit; also called a leaf of the coding tree) in HEVC.
• The PU and TU in the above embodiment correspond respectively to the prediction tree (Prediction Tree) and the transform tree (Transform Tree) in HEVC.
• A partition of a PU in the above embodiment corresponds to a PU (Prediction Unit) in HEVC.
• A block obtained by dividing a TU corresponds to a TU (Transform Unit) in HEVC.
• This is a configuration that, even when each component of each pixel included in the locally decoded image varies, so that prediction between channels according to a linear correlation would reduce prediction accuracy, improves the possibility that higher prediction accuracy can be obtained.
• Specifically, the prediction image generation unit 13 includes an LUT derivation unit 134 that refers to the local decoded image P′ positioned around the target partition and derives, as an LUT, a non-linear correlation between the decoded luminance (Y) channel and the color difference (U, V) channels to be decoded, and an inter-channel prediction unit 351 that, for the target partition, generates the color difference prediction image PredC from the luminance decoded image PY according to the LUT derived by the LUT derivation unit 134.
• In other words, the image decoding device according to the present invention generates a prediction image for each of a plurality of channels indicating the components constituting an image, and decodes image data encoded by adding a prediction residual to the generated prediction image. For each block to be processed, the device includes: channel decoding means for decoding one or more channels among the plurality of channels; correlation deriving means for deriving, with reference to a local decoded image that is located around the block to be processed and for which each of the plurality of channels has been decoded, a non-linear correlation between the one or more channels already decoded by the channel decoding means and the other channel to be decoded; and predicted image generating means for generating, for the block to be processed, the predicted image of the other channel from the decoded image of the one or more decoded channels according to the derived correlation.
• A channel is a generalized notion of a component constituting an image.
  • the luminance component and the color difference component correspond to channels. That is, in this example, the channel includes a luminance channel and a color difference channel.
  • the color difference channel includes a U channel indicating the U component of the color difference and a V channel indicating the V component of the color difference.
  • the channel may relate to the RGB color space.
  • the image decoding process is performed for each channel.
  • the local decoded image is a decoded image that has been decoded for the plurality of channels and is located around the block to be processed.
  • Processing target block refers to various processing units in the decoding process.
  • a coding unit, a conversion unit, a prediction unit, and the like can be given.
  • the processing unit includes a unit obtained by further subdividing the encoding unit, the conversion unit, and the prediction unit.
  • the periphery of the processing target block includes, for example, a pixel adjacent to the target block, a block adjacent to the left side of the target block, a block adjacent to the upper side of the target block, and the like.
  • the non-linear correlation can be derived, for example, by examining the correspondence of each point composed of the luminance value and the color difference value.
• For example, in the case of the YUV color space, the non-linear correlation can be derived from the correspondence between the luminance value and the color difference value of each pixel included in the locally decoded image.
  • the correlation may be realized as an LUT in which a decoded channel and a decoding target channel are associated with each other.
  • the correlation may be expressed by a function including a relational expression established between the decoded channel and the decoding target channel.
  • the channel to be decoded in the processing target block is predicted from the channel that has been decoded in the processing target block according to the nonlinear correlation derived in this way.
  • such prediction is also referred to as inter-channel prediction.
  • the pixel value of the decoded image of the decoded channel is converted according to the nonlinear correlation, and the pixel value of the predicted image of the channel to be decoded is obtained.
  • the pixel value is a generalized value of a component constituting the image.
• The correlation deriving means preferably derives the non-linear correlation such that, when a pixel value of the decoded image of the decoded channel does not occur among the corresponding pixel values of the pixels included in the local decoded image, interpolation is performed using the pixel values of pixels in the local decoded image whose values lie within a predetermined range of that pixel value.
  • the pixel value of the image is the value of any component that forms the image.
• Here, the case where a pixel value of the decoded image of the decoded channel does not exist among the corresponding pixel values of the pixels included in the local decoded image is, in the example of luminance values, the case where a pixel value of the decoded image of the decoded luminance channel does not appear as the luminance value of any pixel included in the local decoded image serving as the prediction source.
• The non-linear correlation may be derived in advance, or may be derived when it is found that a pixel value of the decoded image of the decoded channel does not exist among the corresponding pixel values of the pixels included in the local decoded image.
• A correlation can be obtained by linear interpolation from the preceding and following samples near the value. For example, for each sample point consisting of a luminance value and a color difference value, the correlation can be derived by linearly interpolating between adjacent points. As another example of a non-linear correlation, it can be derived by approximating the sample points by cubic interpolation.
• According to the above configuration, even for a value of the decoded channel that does not appear as a pixel value of the pixels included in the locally decoded image, the value of the decoding target channel can be predicted with high accuracy.
  • the correlation deriving unit derives a relationship between a plurality of decoded channels and a decoding target channel as a correlation.
  • the V channel to be decoded can be predicted from the decoded luminance channel and U channel by the above configuration.
• Since the prediction between channels is performed using, as the correlation, the relationship between a plurality of decoded channels and the channel to be decoded, the prediction accuracy can be improved.
  • the correlation deriving unit derives a correlation between a plurality of pixel values included in a locally decoded image of a decoded channel and a channel to be decoded.
  • the above configuration derives a correlation between a plurality of luminance values included in the locally decoded image of the decoded luminance channel and the color difference to be decoded.
  • the plurality of luminance values are luminance values within a predetermined range.
• It is preferable that the image decoding device further include processing information acquisition means for acquiring channel decoding processing order information, which indicates in which order the plurality of channels are to be decoded, and prediction source channel information, which specifies from which of the decoded channels the channel to be decoded is to be predicted, and prediction control means for performing control such that the plurality of channels are set as decoding targets in the order indicated by the channel decoding processing order information and each decoding target channel is predicted from the decoded channel specified in the prediction source channel information.
  • inter-channel prediction can be controlled based on designation of channel decoding process order information and prediction source channel information.
  • the channel decoding processing order information and the prediction source channel information are included in encoded data including encoded image data, for example. Therefore, for example, it is possible to cope with an image encoding device that encodes channel decoding process order information and prediction source channel information into encoded data and transmits the encoded data.
• It is also preferable that the processing information acquisition means acquire the channel decoding processing order information and the prediction source channel information encoded for each predetermined processing unit of the decoding process, and that the prediction control means perform the control in accordance with the channel decoding processing order information and the prediction source channel information acquired by the processing information acquisition means.
  • control can be changed according to the processing unit.
  • For each processing unit it is possible to set the actual decoding target order and prediction source channel.
  • the correlation deriving means uses, for the processing target block included in the block group including a plurality of blocks, the local decoded image decoded in the processed block included in the block group. It is preferable to derive the correlation.
• In this case, the local decoded image "located around the processing target block" is the local decoded image decoded in the processed blocks included in the block group containing the processing target block.
• The more sample points there are, the higher the accuracy of prediction based on the above correlation tends to be. Therefore, if as many sample points as possible can be acquired from blocks having a spatial correlation, the prediction accuracy of inter-channel prediction can be improved.
• An image encoding device according to the present invention encodes, for each of a plurality of channels indicating the components constituting an image, the prediction residual obtained by subtracting the generated prediction image from the original image. For each block to be processed, the device includes: channel decoding means for decoding one or more channels among the plurality of channels; correlation deriving means for deriving, with reference to a local decoded image that is located around the block to be processed and for which each of the plurality of channels has been decoded, a non-linear correlation between the one or more decoded channels and the other channel to be decoded; and predicted image generating means for generating, for the block to be processed, the predicted image of the other channel from the decoded image of the one or more decoded channels according to the derived correlation.
• A data structure of encoded data according to the present invention is a data structure of encoded data generated by encoding, for each of a plurality of channels indicating the components constituting an image, the prediction residual obtained by subtracting the generated prediction image from the original image, and decoded by generating a prediction image for each of the plurality of channels and adding the prediction residual to the generated prediction image. The data structure includes channel decoding processing order information, which indicates in which order the plurality of channels are to be decoded for the processing target block, and prediction source channel designation information, which specifies from the decoded images of which one or more decoded channels the predicted image of the other channel is to be generated, according to a non-linear correlation between the one or more decoded channels and the other channel to be decoded, derived with reference to a locally decoded image that is located around the processing target block and for which each of the plurality of channels has been decoded.
  • the same effects as those of the image decoding device according to the present invention can be obtained.
• That is, the image encoding device may include, in the data structure of the encoded data, channel decoding processing order information indicating in which order the plurality of channels are to be decoded, and prediction source channel information specifying from which of the decoded channels the channel to be decoded is to be predicted.
  • the image encoding device may encode the information in, for example, side information.
  • the present invention can be suitably applied to a decoding device that decodes encoded data and an encoding device that generates encoded data. Further, the present invention can be suitably applied to the data structure of encoded data generated by the encoding device and referenced by the decoding device.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Provided is an image decoding device comprising: an LUT derivation unit (134) which queries a local decoded image (P') which is located on the periphery of a subject partition, and derives, in the form of an LUT, a nonlinear correlation between a decoded brightness (Y) channel and a color differential (U, V) channel to be decoded; and an inter-channel prediction unit (351) which generates a color differential prediction image (PredC) from a brightness decoded image (PY) for the subject partition in accordance with the LUT.

Description

Image decoding apparatus, image encoding apparatus, and data structure of encoded data
The present invention relates to an image decoding device that performs intra prediction on the color difference of an image, an image encoding device, and a data structure of encoded data.
In order to transmit or record moving images efficiently, a moving image encoding device that generates encoded data by encoding a moving image, and a moving image decoding device that generates a decoded image by decoding the encoded data, are used.
Specific examples of moving image encoding schemes include H.264/MPEG-4 AVC (Non-Patent Document 1), the scheme adopted in the KTA software, a codec for joint development in VCEG (Video Coding Expert Group), and the scheme adopted in its successor codec, the TMuC (Test Model under Consideration) software (Non-Patent Document 2).
In such encoding schemes, a predicted image is usually generated based on a locally decoded image obtained by encoding/decoding the input image, and the prediction residual obtained by subtracting the predicted image from the input image (original image) is encoded. The prediction residual may also be called a "difference image" or "residual image". Methods for generating the predicted image include inter-picture prediction (inter prediction) and intra-picture prediction (intra prediction).
 なお、TMuC(非特許文献2)では、動画像を分割して次のような階層構造により管理することを提案している。 Note that TMuC (Non-Patent Document 2) proposes to divide a moving image and manage it by the following hierarchical structure.
 まず、動画像を構成する画像(ピクチャ)は、スライスに分割される。また、スライスは、最大符号化単位(マクロブロックまたは最大コーディングユニット(Largest Coding Unit)と呼ばれることもある)に分割される。最大符号化単位は、4分木分割により、さらに細かい符号化単位(Coding Unit)に分割可能である。 First, an image (picture) constituting a moving image is divided into slices. The slice is also divided into maximum coding units (sometimes called macroblocks or maximum coding units (Largest (Coding Units)). The maximum coding unit can be divided into smaller coding units (Coding Unit) by quadtree division.
 そして、それ以上分割することができない符号化単位(リーフCU)は、変換単位および予測単位として取り扱われる。これらは、ブロックと呼ばれることもある。また、予測単位を、そのまま援用した、または、さらに分割したパーティションと呼ばれる単位が定義されている。さらに付言しておくと、このパーティション単位でイントラ予測が行われる。 Further, a coding unit (leaf CU) that cannot be further divided is treated as a conversion unit and a prediction unit. These are sometimes called blocks. In addition, a unit called a partition that uses a prediction unit as it is or is further divided is defined. In addition, intra prediction is performed in units of partitions.
H.264/MPEG-4 AVC (Non-Patent Document 1) and similar schemes adopt a representation in which a pixel is expressed as a combination of a luminance component Y and color difference components Cb and Cr.

Intra prediction is also performed for the color difference. Luminance and color difference are inherently mutually independent components; accordingly, in H.264/MPEG-4 AVC (Non-Patent Document 1) and similar schemes, horizontal prediction, vertical prediction, DC prediction, and plane prediction have been used for intra prediction of color difference signals.

The human eye also has the visual characteristic of being sensitive to changes in pixel luminance but insensitive to changes in color.

Therefore, lowering the resolution of the color difference pixels has less visual impact than lowering the resolution of the luminance pixels. Moving image encoding accordingly reduces the data amount by making the resolution of the color difference pixels lower than that of the luminance pixels. For example, halving the color difference resolution in both dimensions leaves each of U and V with one quarter as many samples as Y, so a picture carries half as many samples in total as a full-resolution (4:4:4) representation.

FIG. 27 shows an example of the correspondence between an original image, luminance pixels, and color difference pixels in the prior art. FIG. 27(a) shows an image (YUV) to be encoded, FIG. 27(b) shows the luminance pixels, and FIG. 27(c) shows the color difference pixels (illustrated for U).

As shown in FIGS. 27(a) to (c), the luminance pixels have, for example, a resolution of 8×8, whereas the color difference pixels have a resolution of 4×4.

Meanwhile, within a local region of space the variety of pixel values is limited. For this reason it is known that, under particular conditions such as within such a local region, there is a correlation between the luminance (Y) and the color difference (UV). In recent years, techniques that exploit this correlation for intra prediction of the color difference have been proposed; this is the so-called inter-channel prediction technique.

For example, Non-Patent Document 3 discloses predicting a color difference image (UV) from a luminance image (Y) by a linear transformation. Specifically, it discloses performing the linear transformation according to the following equation (A1).
  PredC[xC, yC] = αC · RecY[xY, yY] + βC … (A1)

The meaning of each symbol in equation (A1) is as follows.

  PredC: predicted image (color difference)
  [xC, yC], [xY, yY]: coordinates indicating the position of the same sample
  RecY: decoded image (luminance)
  αC, βC: coefficients derived by the least-squares method from the pixel values of the surrounding already-decoded image (hereinafter referred to as the locally decoded image)

As described above, because the luminance image and the color difference image differ in resolution, the coordinates [xC, yC] and [xY, yY] need to be converted appropriately.
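To make the linear scheme concrete, the following is a minimal Python sketch of equation (A1): it derives αC and βC by the least-squares method from co-located (luminance, color difference) sample pairs taken from the locally decoded image, then applies the transformation. The function names and the list-based data layout are illustrative assumptions, not part of Non-Patent Document 3.

```python
def fit_linear(luma_nbr, chroma_nbr):
    """Least-squares fit of C = alpha * Y + beta over neighboring samples.
    Assumes at least one (Y, C) sample pair is available."""
    n = len(luma_nbr)
    sum_y = sum(luma_nbr)
    sum_c = sum(chroma_nbr)
    sum_yy = sum(y * y for y in luma_nbr)
    sum_yc = sum(y * c for y, c in zip(luma_nbr, chroma_nbr))
    denom = n * sum_yy - sum_y * sum_y
    if denom == 0:                      # flat luminance: fall back to the mean
        return 0.0, sum_c / n
    alpha = (n * sum_yc - sum_y * sum_c) / denom
    beta = (sum_c - alpha * sum_y) / n
    return alpha, beta


def predict_chroma_linear(rec_y, alpha, beta):
    """Equation (A1): PredC[x, y] = alpha * RecY[x, y] + beta, per sample."""
    return [[alpha * y + beta for y in row] for row in rec_y]
```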
Non-Patent Document 3 also mentions how to take sample points when the resolution differs between the luminance image (Y) and the color difference images (U, V), as illustrated in FIG. 28. FIG. 28(a) shows sample points taken from a luminance image of size 2N×2N, and FIG. 28(b) shows sample points taken from a color difference image of size N×N. Non-Patent Document 3 describes associating the sample point smpl100 shown in FIG. 28(a) with the sample point smpl200 shown in FIG. 28(b).
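The coordinate conversion between the two resolutions can then be as simple as the following sketch, assuming 2:1 subsampling in both directions as in FIG. 28; choosing the top-left luminance sample of each 2×2 group as the representative is only one of several possibilities and is an assumption here.

```python
def luma_coord_for_chroma(xc, yc):
    """Map a color difference sample position (xc, yc) in an NxN block to a
    representative luminance position in the co-located 2Nx2N block."""
    return 2 * xc, 2 * yc
```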
The conventional technology described above performs a linear transformation derived by the least-squares method for inter-channel prediction. Consequently, when the locally decoded image used as samples is not suited to a linear transformation, the accuracy of the inter-channel prediction can be insufficient.

This problem is explained with reference to FIG. 29. As shown in FIG. 29(a), when the distribution of the samples in the luminance-color difference plane fits within the single region of group Gr100, the prediction of equation (A1) can be expected to be reasonably accurate.

However, as shown in FIG. 29(b), when the samples are spread over the two regions of group Gr101 and group Gr102, and moreover the samples within each region are themselves scattered, a linear transformation can produce a large error. For example, the sample point Smpl300 lies far from the values approximated by equation (A1).

Thus, performing the prediction between channels according to a linear correlation has the problem that the prediction accuracy deteriorates in such cases.

The present invention has been made in view of the above problems, and its object is to realize an image decoding apparatus that can improve the likelihood of obtaining higher prediction accuracy even when the components of the pixels in the locally decoded image are scattered and prediction between the channels according to a linear correlation would lose accuracy.
To solve the above problems, an image decoding apparatus according to the present invention decodes image data that has been encoded by generating a predicted image for each of a plurality of channels representing the components of an image and adding a prediction residual to the generated predicted image. The apparatus comprises: channel decoding means for decoding, for a target block, one or more of the plurality of channels; correlation derivation means for deriving a nonlinear correlation between the one or more channels already decoded by the channel decoding means and another channel to be decoded, by referring to a locally decoded image located around the target block for which all of the plurality of channels have already been decoded; and predicted image generation means for generating, for the target block, the predicted image of the other channel from the decoded images of the already decoded one or more channels in accordance with the derived correlation.
In the above configuration, a channel is a generalization of a component constituting an image. For example, in the YUV color space, the luminance component and the color difference components correspond to channels; that is, in this example the channels include a luminance channel and color difference channels. The color difference channels include a U channel representing the U component of the color difference and a V channel representing its V component. The channels may also relate to the RGB color space. The image decoding process is performed on each of these channels.

The locally decoded image is a decoded image for which all of the plurality of channels have been decoded and which is located around the target block.

The target block refers to any of the various processing units of the decoding process, for example a coding unit, a transform unit, or a prediction unit. The processing units also include units obtained by subdividing a coding unit, transform unit, or prediction unit further.

The periphery of the target block includes, for example, the pixels adjacent to the target block, the block adjacent to its left side, and the block adjacent above it.
As noted, it is known that in a local region of space there is a correlation between the luminance channel and the color difference channels. In the above configuration, therefore, the nonlinear correlation between the one or more channels already decoded by the channel decoding means and another channel to be decoded is derived by referring, as samples, to the locally decoded image located around the target block.

The nonlinear correlation can be derived, for example, by examining the correspondence of points each consisting of a luminance value and a color difference value.

Taking the YUV color space as an example, a nonlinear correlation can be derived from the association between the luminance value and the color difference value of each pixel contained in the locally decoded image. The correlation may be realized as an LUT in which the decoded channel is associated with the channel to be decoded, or it may be expressed by a function consisting of a relational expression that holds between the decoded channel and the channel to be decoded.
According to the above configuration, the channel to be decoded in the target block is predicted from the channel already decoded in the target block in accordance with the nonlinear correlation derived in this way. Such prediction is hereinafter also called inter-channel prediction.

In inter-channel prediction, for example, the pixel values of the decoded image of the already decoded channel are converted according to the nonlinear correlation to obtain the pixel values of the predicted image of the channel to be decoded. Here, a pixel value is a generalization of the value of a component constituting the image.
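As one possible realization of such a nonlinear, LUT-based inter-channel prediction (a sketch under our own assumptions; the document specifies the LUT derivation and its use only at the functional level here, with details given in the embodiments below), the following builds a luminance-to-color-difference table from the neighboring decoded samples, fills unoccupied entries by linear interpolation between occupied ones (cf. FIG. 13), and looks up each decoded luminance value:

```python
def derive_lut(luma_nbr, chroma_nbr, depth=256):
    """Build a Y -> C lookup table from co-located neighboring samples.
    Assumes at least one (Y, C) pair is available. Where several samples
    share a luma value their chroma values are averaged; gaps between
    occupied entries are filled by linear interpolation, and the two ends
    are extended flat."""
    sums = [0.0] * depth
    counts = [0] * depth
    for y, c in zip(luma_nbr, chroma_nbr):
        sums[y] += c
        counts[y] += 1
    lut = [0.0] * depth
    occupied = [y for y in range(depth) if counts[y] > 0]
    for y in occupied:
        lut[y] = sums[y] / counts[y]
    for y0, y1 in zip(occupied, occupied[1:]):       # interpolate the gaps
        for y in range(y0 + 1, y1):
            t = (y - y0) / (y1 - y0)
            lut[y] = (1.0 - t) * lut[y0] + t * lut[y1]
    for y in range(occupied[0]):                     # flat extrapolation
        lut[y] = lut[occupied[0]]
    for y in range(occupied[-1] + 1, depth):
        lut[y] = lut[occupied[-1]]
    return lut


def predict_chroma_lut(rec_y, lut):
    """Inter-channel prediction: PredC = LUT[RecY], applied per sample."""
    return [[lut[y] for y in row] for row in rec_y]
```

Unlike the single (α, β) pair of equation (A1), such a table can follow a sample distribution that splits into several groups, which is the situation of FIG. 29(b).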
This produces the effect that, even when the components of the pixels in the locally decoded image are scattered and prediction between the channels according to a linear correlation would lose accuracy, the likelihood of obtaining higher prediction accuracy can be improved.

To solve the above problems, an image encoding apparatus according to the present invention generates encoded data by encoding the prediction residual obtained by subtracting, from the original image, a predicted image generated for each of a plurality of channels representing the components of an image. The apparatus comprises: channel decoding means for decoding, for a target block, one or more of the plurality of channels; correlation derivation means for deriving a nonlinear correlation between the one or more channels already decoded by the channel decoding means and another channel to be decoded, by referring to a locally decoded image located around the target block for which all of the plurality of channels have already been decoded; and predicted image generation means for generating, for the target block, the predicted image of the other channel from the decoded images of the already decoded one or more channels in accordance with the derived correlation.

To solve the above problems, a data structure of encoded data according to the present invention is a data structure of encoded data generated by encoding the prediction residual obtained by subtracting, from the original image, a predicted image generated for each of a plurality of channels representing the components of an image. The data structure provides, to an image decoding apparatus that decodes the image data by generating a predicted image for each of the plurality of channels and adding the prediction residual to the generated predicted image: channel decoding processing order information indicating the order in which the plurality of channels are to be decoded for a target block; and prediction source channel designation information designating, for the target block, from which decoded images of the already decoded one or more channels the predicted image of the other channel is to be generated in accordance with the nonlinear correlation between the already decoded one or more channels and the other channel to be decoded, derived by referring to a locally decoded image located around the target block for which all of the plurality of channels have already been decoded.

The image encoding apparatus or the data structure of encoded data configured as above achieves the same effects as the image decoding apparatus according to the present invention.

The image encoding apparatus may include, in the data structure of the encoded data, channel decoding processing order information indicating the order in which the plurality of channels are to be decoded and prediction source channel information designating from which of the decoded channels the channel to be decoded is to be predicted; it may encode this information in, for example, the side information.
An image decoding apparatus according to the present invention thus comprises: channel decoding means for decoding, for a target block, one or more of a plurality of channels; correlation derivation means for deriving a nonlinear correlation between the one or more channels already decoded by the channel decoding means and another channel to be decoded, by referring to a locally decoded image located around the target block for which all of the plurality of channels have already been decoded; and predicted image generation means for generating, for the target block, the predicted image of the other channel from the decoded images of the already decoded one or more channels in accordance with the derived correlation.

An image encoding apparatus according to the present invention likewise comprises: channel decoding means for decoding, for a target block, one or more of a plurality of channels; correlation derivation means for deriving a nonlinear correlation between the one or more channels already decoded by the channel decoding means and another channel to be decoded, by referring to a locally decoded image located around the target block for which all of the plurality of channels have already been decoded; and predicted image generation means for generating, for the target block, the predicted image of the other channel from the decoded images of the already decoded one or more channels in accordance with the derived correlation.

A data structure of encoded data according to the present invention provides, to an image decoding apparatus that decodes image data encoded by generating a predicted image for each of a plurality of channels and adding a prediction residual to it: channel decoding processing order information indicating the order in which the plurality of channels are to be decoded for a target block; and prediction source channel information designating, for the target block, from which decoded images of the already decoded one or more channels the predicted image of the other channel is to be generated in accordance with the nonlinear correlation between the already decoded channels and the other channel to be decoded, derived by referring to a locally decoded image located around the target block for which all of the plurality of channels have already been decoded.

Therefore, even when the components of the pixels in the locally decoded image are scattered and prediction between the channels according to a linear correlation would lose accuracy, the likelihood of obtaining more suitable prediction accuracy can be improved.
FIG. 1 is a block diagram showing the configuration of a predicted image generation unit provided in a moving image decoding apparatus according to an embodiment of the present invention.
FIG. 2 is a functional block diagram showing the schematic configuration of the moving image decoding apparatus.
FIG. 3 is a diagram showing the structure of encoded data generated by a moving image encoding apparatus according to an embodiment of the present invention and decoded by the moving image decoding apparatus.
FIG. 4 is a diagram explaining the YUV color space.
FIG. 5 is a diagram showing YUV image formats; (a) to (d) show the 4:2:0, 4:4:4, 4:2:2, and 4:1:1 formats, respectively.
FIG. 6 is a diagram showing the correspondence between luminance and color difference pixel positions in the 4:2:0 format.
FIG. 7 is a diagram illustrating sample positions of color difference pixels; (a) to (c) show three sample positions.
FIG. 8 is a diagram showing patterns of the processing order of the color difference channels.
FIG. 9 is a flowchart illustrating the schematic flow of the color difference predicted image generation processing in the predicted image generation unit.
FIG. 10 is a diagram showing an example of an image in which the luminance-color difference distribution does not overlap and spans two regions.
FIG. 11 is a graph plotting the luminance (Y) against the color difference (U) of the pixels contained in the image shown in FIG. 10.
FIG. 12 is a flowchart showing an example of the flow of the LUT derivation processing by the LUT derivation unit.
FIG. 13 is a diagram explaining linear interpolation between entries.
FIG. 14 is a graph showing the LUT values derived from the image shown in FIG. 10.
FIG. 15 is a flowchart showing an example of the flow of the color difference predicted image generation processing by inter-channel prediction in the inter-channel prediction unit.
FIG. 16 is a diagram showing patterns of the processing order of the color difference channels when the channel predicted second has two prediction source channels.
FIG. 17 is a diagram showing other patterns of the processing order of the color difference channels when the channel predicted second has two prediction source channels.
FIG. 18 is a flowchart showing a modification of the flow of the LUT derivation processing by the LUT derivation unit.
FIG. 19 is a diagram showing an example of a derived LUT.
FIG. 20 is a diagram explaining an example in which a function is used instead of an LUT.
FIG. 21 is a functional block diagram showing the configuration of a moving image encoding apparatus according to an embodiment of the present invention.
FIG. 22 is a flowchart showing an example of the flow of the encoding processing of the color difference channel processing order and the prediction source channel in the moving image encoding apparatus.
FIG. 23 is a functional block diagram showing another example of the configuration of a predicted image generation unit provided in a moving image decoding apparatus according to another embodiment of the present invention.
FIG. 24 is a flowchart illustrating the schematic flow of the color difference predicted image generation processing in that predicted image generation unit.
FIG. 25 is a flowchart showing an example of the flow of the LUT update processing by the LUT derivation unit provided in that moving image decoding apparatus.
FIG. 26 is a diagram explaining re-interpolation using additionally registered entries.
FIG. 27 is a diagram showing an example of the correspondence between an original image, luminance pixels, and color difference pixels.
FIG. 28 is a diagram showing how sample points are taken when the resolutions of Y and of U, V differ.
FIG. 29 shows graphs plotting the luminance (Y) against the color difference (U) of images; (a) and (b) show two types of images.
FIG. 30 is a diagram showing the configurations of a transmitting apparatus equipped with the moving image encoding apparatus and of a receiving apparatus equipped with the moving image decoding apparatus; (a) shows the transmitting apparatus and (b) shows the receiving apparatus.
FIG. 31 is a diagram showing the configurations of a recording apparatus equipped with the moving image encoding apparatus and of a playback apparatus equipped with the moving image decoding apparatus; (a) shows the recording apparatus and (b) shows the playback apparatus.
[1] Embodiment 1

An embodiment of the present invention is described with reference to FIGS. 1 to 22. First, an overview of the moving image decoding apparatus (image decoding apparatus) 1 and the moving image encoding apparatus (image encoding apparatus) 2 is given with reference to FIG. 2, a functional block diagram showing the schematic configuration of the moving image decoding apparatus 1.

The moving image decoding apparatus 1 and the moving image encoding apparatus 2 shown in FIG. 2 implement the technology adopted in the H.264/MPEG-4 AVC standard, the technology adopted in the KTA software, a codec for joint development within VCEG (Video Coding Expert Group), and the technology adopted in its successor codec, the TMuC (Test Model under Consideration) software.

The moving image decoding apparatus 1 receives encoded data #1 (with the data structure of encoded data) obtained by the moving image encoding apparatus 2 encoding a moving image. The moving image decoding apparatus 1 decodes the input encoded data #1 and outputs a moving image #2 to the outside. Before the moving image decoding apparatus 1 is described in detail, the configuration of the encoded data #1 is explained below.
[Configuration of the encoded data]

The configuration of the encoded data #1 generated by the moving image encoding apparatus 2 and decoded by the moving image decoding apparatus 1 is described with reference to FIG. 3. The encoded data #1 has a hierarchical structure consisting of a sequence layer, a GOP (Group Of Pictures) layer, a picture layer, a slice layer, and a largest coding unit (LCU: Largest Coding Unit) layer.

FIG. 3 shows the structure of the layers at and below the picture layer in the encoded data #1. FIGS. 3(a) to (f) show, respectively, the picture layer PICT, the slice layer S, the LCU layer LCU, a leaf CU contained in the LCU (denoted CUL in FIG. 3(d)), the inter prediction information PI_Inter, which is the prediction information PI for an inter prediction (inter-picture prediction) partition, and the intra prediction information PI_Intra, which is the prediction information PI for an intra prediction (intra-picture prediction) partition.
(Picture layer)

The picture layer PICT is the set of data that the moving image decoding apparatus 1 refers to in order to decode the target picture, that is, the picture being processed. As shown in FIG. 3(a), the picture layer PICT contains a picture header PH and slice layers S1 to SNS (NS being the total number of slice layers contained in the picture layer PICT). Hereinafter, when the slice layers S1 to SNS need not be distinguished from one another, the subscripts of the reference signs may be omitted; the same applies to the other subscripted elements of the encoded data #1.

The picture header PH contains the group of encoding parameters that the moving image decoding apparatus 1 refers to in order to determine how to decode the target picture. For example, the encoding mode information (entropy_coding_mode_flag) indicating the variable-length coding mode used by the moving image encoding apparatus 2 at encoding time is one of the encoding parameters contained in the picture header PH. When entropy_coding_mode_flag is 0, the picture has been encoded with CAVLC (Context-based Adaptive Variable Length Coding); when it is 1, the picture has been encoded with CABAC (Context-based Adaptive Binary Arithmetic Coding).
(Slice layer)

Each slice layer S contained in the picture layer PICT is the set of data that the moving image decoding apparatus 1 refers to in order to decode the target slice, that is, the slice being processed. As shown in FIG. 3(b), the slice layer S contains a slice header SH and LCU layers LCU1 to LCUNC (NC being the total number of LCUs contained in the slice S).

The slice header SH contains the group of encoding parameters that the moving image decoding apparatus 1 refers to in order to determine how to decode the target slice. Slice type designation information (slice_type) designating the slice type is one example of an encoding parameter contained in the slice header SH. The slice header SH also contains a filter parameter FP referred to by the loop filter provided in the moving image decoding apparatus 1.

Slice types that can be designated by the slice type designation information include (1) I slices, which use only intra prediction at encoding time, (2) P slices, which use unidirectional prediction or intra prediction at encoding time, and (3) B slices, which use unidirectional prediction, bidirectional prediction, or intra prediction at encoding time.
(LCU layer)

Each LCU layer LCU contained in the slice layer S is the set of data that the moving image decoding apparatus 1 refers to in order to decode the target LCU, that is, the LCU being processed. As shown in FIG. 3(c), the LCU layer LCU contains an LCU header LCUH and a plurality of coding units (CU: Coding Unit) CU1 to CUNL obtained by quadtree partitioning of the LCU.

The sizes that each CU can take depend on the size of the LCU and the hierarchical depth contained in the sequence parameter set SPS of the encoded data #1. For example, when the LCU size is 128×128 pixels and the maximum hierarchical depth is 5, a CU contained in that LCU can take one of five sizes: 128×128, 64×64, 32×32, 16×16, or 8×8 pixels. A CU that is not divided any further is called a leaf CU.
(LCU header)

The LCU header LCUH contains the encoding parameters that the moving image decoding apparatus 1 refers to in order to determine how to decode the target LCU. Specifically, as shown in FIG. 3(c), it contains CU partition information SP_CU designating the partitioning pattern of the target LCU into leaf CUs, and a quantization parameter difference Δqp (mb_qp_delta) designating the size of the quantization step.

Specifically, the CU partition information SP_CU designates the shape, size, and position within the target LCU of each CU (and leaf CU) contained in the target LCU. The CU partition information SP_CU need not explicitly contain the shapes and sizes of the leaf CUs; for example, it may be a set of flags (split_coding_unit_flag) each indicating whether the whole LCU or a subregion of it is divided into four. In that case, the shape and size of each leaf CU can be determined from these flags together with the shape and size of the LCU.
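The following sketch illustrates how such a set of flags alone can determine the leaf-CU layout; read_flag, which stands in for the entropy decoder, and the other names are assumptions for illustration.

```python
def decode_cu_tree(read_flag, x, y, size, min_size, leaves):
    """Interpret split_coding_unit_flag recursively: 1 splits the current
    square region into four quadrants, 0 makes it a leaf CU. The shape,
    size, and position of every leaf CU then follow from the LCU alone."""
    if size > min_size and read_flag():        # split_coding_unit_flag == 1
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                decode_cu_tree(read_flag, x + dx, y + dy, half, min_size, leaves)
    else:
        leaves.append((x, y, size))            # leaf CU at (x, y)
```

Starting from a 128×128 LCU with a minimum CU size of 8×8, exactly the five CU sizes listed above can occur.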
The quantization parameter difference Δqp is the difference qp - qp' between the quantization parameter qp of the target LCU and the quantization parameter qp' of the LCU encoded immediately before it.

(Leaf CU)

A CU that cannot be divided any further (a leaf CU) is treated as a prediction unit (PU: Prediction Unit) and a transform unit (TU: Transform Unit).

As shown in FIG. 3(d), a leaf CU (denoted CUL in FIG. 3(d)) contains (1) PU information PUI, which is referred to when the moving image decoding apparatus 1 generates the predicted image, and (2) TU information TUI, which is referred to when the moving image decoding apparatus 1 decodes the residual data. The PU information PUI may contain a skip flag SKIP; when the value of the skip flag SKIP is 1, the TU information is omitted.

As shown in FIG. 3(d), the PU information PUI contains prediction type information PT and prediction information PI. The prediction type information PT designates whether intra prediction or inter prediction is used as the method of generating the predicted image for the target leaf CU (target PU). The prediction information PI consists of intra prediction information PI_Intra or inter prediction information PI_Inter, depending on which prediction method the prediction type information PT designates. Hereinafter, a PU to which intra prediction is applied is also called an intra PU, and a PU to which inter prediction is applied is also called an inter PU.

The PU information PUI contains information designating the shape, size, and position within the target PU of each partition contained in the target PU. Here, a partition is one of one or more non-overlapping regions constituting the target leaf CU, and the predicted image is generated in units of partitions.

As shown in FIG. 3(d), the TU information TUI contains TU partition information SP_TU designating the partitioning pattern of the target leaf CU (target TU) into blocks, and quantized prediction residuals QD1 to QDNT (NT being the total number of blocks contained in the target TU).

Specifically, the TU partition information SP_TU designates the shape, size, and position within the target TU of each block contained in the target TU. Each TU can take a size from, for example, 64×64 pixels down to 2×2 pixels. Here, a block is one of one or more non-overlapping regions constituting the target leaf CU, and the prediction residual is encoded and decoded in units of TUs or of blocks obtained by dividing a TU.
Each quantized prediction residual QD is encoded data generated by the moving image encoding apparatus 2 applying the following processes 1 to 3 to the target block, that is, the block being processed. Process 1: apply a DCT (Discrete Cosine Transform) to the prediction residual obtained by subtracting the predicted image from the image to be encoded. Process 2: quantize the DCT coefficients obtained in process 1. Process 3: variable-length encode the DCT coefficients quantized in process 2. The quantization parameter qp mentioned above expresses the size of the quantization step QP used when the moving image encoding apparatus 2 quantizes the DCT coefficients (QP = 2^(qp/6)).
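As a numerical illustration of process 2 and of the relation QP = 2^(qp/6) (a sketch; actual codecs use integer approximations, and the rounding convention here is an assumption):

```python
def quant_step(qp):
    """Quantization step QP = 2^(qp/6); e.g. qp = 24 gives QP = 16."""
    return 2.0 ** (qp / 6.0)


def quantize(dct_coeffs, qp):
    """Process 2: quantize the DCT coefficients obtained in process 1."""
    step = quant_step(qp)
    return [round(c / step) for c in dct_coeffs]


def dequantize(levels, qp):
    """Decoder side: reconstruct the coefficients from the quantized levels.
    The per-LCU qp itself is recovered as qp = qp' + delta_qp (mb_qp_delta),
    as described above."""
    step = quant_step(qp)
    return [lv * step for lv in levels]
```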
(Inter prediction information PI_Inter)

The inter prediction information PI_Inter contains the encoding parameters referred to when the moving image decoding apparatus 1 generates an inter predicted image by inter prediction. As shown in FIG. 3(e), the inter prediction information PI_Inter contains inter PU partition information SP_Inter designating the partitioning pattern of the target PU into partitions, and inter prediction parameters PP_Inter1 to PP_InterNe (Ne being the total number of inter prediction partitions contained in the target PU) for the respective partitions.

Specifically, the inter PU partition information SP_Inter designates the shape, size, and position within the target PU of each inter prediction partition contained in the target PU (inter PU).

An inter PU can be divided into a total of eight kinds of partitions by the four symmetric splittings into 2N×2N, 2N×N, N×2N, and N×N pixels and the four asymmetric splittings into 2N×nU, 2N×nD, nL×2N, and nR×2N pixels. Here, the specific value of N is determined by the size of the CU to which the PU belongs, and the specific values of nU, nD, nL, and nR are determined according to the value of N. For example, an inter PU of 128×128 pixels can be divided into inter prediction partitions of 128×128, 128×64, 64×128, 64×64, 128×32, 128×96, 32×128, and 96×128 pixels.
(Inter prediction parameters)

As shown in FIG. 3(e), an inter prediction parameter PP_Inter contains a reference image index RI, an estimated motion vector index PMVI, and a motion vector residual MVD.

(Intra prediction information PI_Intra)

The intra prediction information PI_Intra contains the encoding parameters referred to when the moving image decoding apparatus 1 generates an intra predicted image by intra prediction. As shown in FIG. 3(f), the intra prediction information PI_Intra contains intra PU partition information SP_Intra designating the partitioning pattern of the target PU (intra PU) into partitions, and intra prediction parameters PP_Intra1 to PP_IntraNA (NA being the total number of intra prediction partitions contained in the target PU) for the respective partitions.
Specifically, the intra PU partition information SP_Intra designates the shape, size, and position within the target PU of each intra prediction partition contained in the target PU. SP_Intra contains an intra split flag (intra_split_flag) designating whether the target PU is to be split into partitions. If the intra split flag is 1, the target PU is split symmetrically into four partitions; if it is 0, the target PU is not split and is itself treated as a single partition. Therefore, if the size of the target PU is 2N×2N pixels, an intra prediction partition can take either the size 2N×2N pixels (no split) or N×N pixels (split into four), where N = 2^n and n is an arbitrary integer of 1 or more. For example, an intra PU of 128×128 pixels can be divided into intra prediction partitions of 128×128 pixels or of 64×64 pixels.
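The resulting partition sizes can be summarized by the following sketch (names assumed):

```python
def intra_partition_sizes(pu_size, intra_split_flag):
    """An intra PU of size 2N x 2N yields one 2Nx2N partition when
    intra_split_flag is 0 and four NxN partitions when it is 1."""
    if intra_split_flag == 0:
        return [pu_size]               # e.g. 128 -> [128]
    return [pu_size // 2] * 4          # e.g. 128 -> [64, 64, 64, 64]
```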
Details of the intra prediction parameters PP_Intra are described later.

[Moving image decoding apparatus]

The configuration of the moving image decoding apparatus 1 according to the present embodiment is described below with reference to FIGS. 1 to 22.
(Overview of the moving image decoding apparatus)

The moving image decoding apparatus 1 generates a predicted image for each partition, generates the decoded image #2 by adding the generated predicted image to the prediction residual decoded from the encoded data #1, and outputs the generated decoded image #2 to the outside.

Here, the predicted image is generated by referring to the encoding parameters obtained by decoding the encoded data #1. The encoding parameters are the parameters referred to in order to generate the predicted image; besides prediction parameters such as the motion vectors referred to in inter-picture prediction and the prediction modes referred to in intra-picture prediction, they include the sizes and shapes of the partitions, the sizes and shapes of the blocks, and the residual data between the original image and the predicted image. Hereinafter, the set of all information contained in the encoding parameters except the residual data is called the side information.

In the following, the prediction unit is assumed to be a partition constituting an LCU, but the present embodiment is not limited to this and also applies when the prediction unit is larger or smaller than a partition.

In the following, the frame (picture), slice, LCU, block, and partition to be decoded are called the target frame, target slice, target LCU, target block, and target partition, respectively.

The LCU size is, for example, 64×64 pixels, and the partition size is, for example, 64×64, 32×32, 16×16, 8×8, or 4×4 pixels; these sizes do not limit the present embodiment, however, and the LCU and partition sizes may be other sizes.
(Configuration of the moving image decoding apparatus)

Referring again to FIG. 2, the schematic configuration of the moving image decoding apparatus 1 is as follows. FIG. 2 is a functional block diagram showing the schematic configuration of the moving image decoding apparatus 1.

As shown in FIG. 2, the moving image decoding apparatus 1 comprises a variable-length code demultiplexing unit 11, an inverse quantization/inverse transform unit 12, a predicted image generation unit (channel decoding means) 13, an adder (channel decoding means) 14, and a frame memory 15.
[Variable-length code demultiplexing unit]

The variable-length code demultiplexing unit 11 demultiplexes one frame's worth of encoded data #1 input to the moving image decoding apparatus 1 into the various pieces of information contained in the hierarchical structure shown in FIG. 3. For example, referring to the information contained in the various headers, it successively separates the encoded data #1 into slices and then LCUs.

Here, the various headers contain (1) information about how the target frame is divided into slices and (2) information about the size, shape, and position within the target slice of the LCUs belonging to the target slice.

The variable-length code demultiplexing unit 11 then refers to the CU partition information SP_CU contained in the encoded LCU header LCUH and divides the target LCU into leaf CUs, and obtains the TU information TUI and the PU information PUI for the target leaf CU (CUL).

The variable-length code demultiplexing unit 11 supplies the TU information TUI obtained for the target leaf CU to the inverse quantization/inverse transform unit 12, and supplies the PU information PUI obtained for the target leaf CU to the predicted image generation unit 13.
[Inverse quantization/inverse transform unit]

The inverse quantization/inverse transform unit 12 performs inverse quantization and inverse transform of the quantized prediction residual for each block of the target leaf CU.

Specifically, the inverse quantization/inverse transform unit 12 first decodes the TU partition information SP_TU from the TU information TUI about the target leaf CU supplied from the variable-length code demultiplexing unit 11.

It then divides the target leaf CU into one or more blocks according to the decoded TU partition information SP_TU, and, for each block, decodes the quantized prediction residual QD from the TU information TUI.

The inverse quantization/inverse transform unit 12 then restores the per-pixel prediction residual D for each target partition by inverse quantization and an inverse DCT (Inverse Discrete Cosine Transform), and supplies the restored prediction residual D to the adder 14.
   [予測画像生成部]
 予測画像生成部13は、対象リーフCUに含まれる各パーティションについて、パーティションの周辺の復号済み画像である局所復号画像P’を参照して、イントラ予測またはインター予測により予測画像Predを生成する。
[Predicted image generator]
For each partition included in the target leaf CU, the predicted image generation unit 13 refers to a local decoded image P ′ that is a decoded image around the partition, and generates a predicted image Pred by intra prediction or inter prediction.
 なお、イントラ予測およびインター予測には、それぞれ輝度の予測および色差の予測がある。また、色差のイントラ予測においてチャンネル間予測を行う場合、予測画像生成部13は、輝度復号画像を参照する。 Note that intra prediction and inter prediction include luminance prediction and color difference prediction, respectively. In addition, when performing inter-channel prediction in color difference intra prediction, the predicted image generation unit 13 refers to a luminance decoded image.
 また、以下では、イントラ予測による予測画像Predの生成処理について説明するが、これに限られず、動画像復号装置1は、インター予測により予測画像Predを生成してもよい。 In the following, the generation process of the predicted image Pred by intra prediction will be described, but the present invention is not limited to this, and the video decoding device 1 may generate the predicted image Pred by inter prediction.
 また、イントラ予測は、画面内予測または空間予測と称されることがあるが、以下では、イントラ予測という表現に統一している。 Intra prediction is sometimes referred to as intra prediction or spatial prediction, but in the following, it is unified with the expression intra prediction.
 また、局所復号画像P’は、より具体的には、輝度に関する輝度局所復号画像P’と、色差に関する色差局所復号画像P’を含んでいる。また、色差局所復号画像P’は、Uチャンネルに関する色差局所復号画像P’と、および、Vチャンネルに関する色差局所復号画像P’とを含んでいる。 More specifically, the local decoded image P ′ includes a luminance local decoded image P ′ Y related to luminance and a color difference local decoded image P ′ C related to color difference. The color difference local decoded image P 'C, the color difference local decoded image P about U-channel' and includes a U, and the color difference relates V channel and the local decoded image P 'V.
 予測画像生成部13は、具体的には、次のように動作する。まず、予測画像生成部13は、可変長符号逆多重化部11から供給される、対象リーフCUについてのPU情報PUIを復号する。続いて、予測画像生成部13は、PU情報PUIに従って、対象リーフCUの各パーティションへの分割パターンを決定する。さらに、予測画像生成部13は、PU情報PUIに従って、各パーティションの予測モードを選択して、選択した各予測モードを各パーティションに割り付ける。 Specifically, the predicted image generation unit 13 operates as follows. First, the predicted image generation unit 13 decodes the PU information PUI for the target leaf CU supplied from the variable length code demultiplexing unit 11. Subsequently, the predicted image generation unit 13 determines a division pattern for each partition of the target leaf CU according to the PU information PUI. Further, the predicted image generation unit 13 selects a prediction mode of each partition according to the PU information PUI, and assigns each selected prediction mode to each partition.
 そして、予測画像生成部13は、対象リーフCUに含まれる各パーティションについての予測画像Predを、選択した予測モードと該パーティションの周辺の局所復号画像P’の画素値とを参照して生成する。予測画像生成部13は、対象リーフCUについて生成した予測画像Predを加算器14に供給する。 The predicted image generation unit 13 generates a predicted image Pred for each partition included in the target leaf CU with reference to the selected prediction mode and the pixel values of the local decoded image P ′ around the partition. The predicted image generation unit 13 supplies the predicted image Pred generated for the target leaf CU to the adder 14.
Specifically, the predicted image Pred includes a luminance predicted image PredY for the luminance and a color difference predicted image PredC for the color difference. The color difference predicted image PredC includes a color difference predicted image PredU for the U channel and a color difference predicted image PredV for the V channel. A more specific configuration of the predicted image generation unit 13 is described later.
[Adder]
The adder 14 generates the decoded image P for the target leaf CU by adding the predicted image Pred supplied from the predicted image generation unit 13 and the prediction residual D supplied from the inverse quantization / inverse transform unit 12. The decoded image P includes a decoded luminance image (hereinafter referred to as the decoded luminance image P_Y) and a decoded color difference image.
[Frame memory]
The decoded images P are sequentially recorded in the frame memory 15. At the time the target LCU is decoded, the frame memory 15 holds the decoded images corresponding to all the LCUs decoded before the target LCU (for example, all the LCUs that precede it in raster scan order).
When the LCU-by-LCU decoded image generation processing has finished for all the LCUs in the picture, the decoded image #2 corresponding to one frame of the encoded data #1 input to the video decoding device 1 is output to the outside.
(About the predicted image generation unit)
The predicted image generation unit 13 is described in more detail below.
[Data structure of the intra prediction parameters]
Next, the data structure of the intra prediction parameter PP_Intra (hereinafter referred to as the intra prediction parameter PP) is as follows. The intra prediction parameter PP (part of the data structure of the encoded data) illustratively includes an inter-channel prediction flag, a color difference channel processing order flag, and a second-channel prediction source channel specifier.
The inter-channel prediction flag indicates whether the color difference is predicted by inter-channel prediction. It is, for example, 1-bit information in which "1" indicates that the color difference is predicted by inter-channel prediction and "0" indicates that the color difference is predicted without inter-channel prediction.
The color difference channel processing order flag specifies whether prediction processing starts from the U channel or from the V channel. It is, for example, 1-bit information in which "0" indicates processing in the order U, V and "1" indicates processing in the order V, U.
The second-channel prediction source channel specifier specifies from which channel the second channel to be predicted is predicted. That is, the second channel can be predicted from the Y channel, from the first channel predicted before it, or from both. The specifier is, for example, 1- or 2-bit information in which "0" indicates prediction from the Y channel, "10" indicates prediction from the first channel, and "11" indicates prediction from both the Y channel and the first channel.
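As an illustration of this bit layout, the following sketch (not from the patent; read_bit is an assumed bitstream primitive) decodes the 1- or 2-bit specifier:

    # Hypothetical decoder for the second-channel prediction source channel
    # specifier: "0" -> Y, "10" -> first chroma channel, "11" -> both.
    def parse_second_channel_source(read_bit):
        if read_bit() == 0:           # code "0"
            return ("Y",)
        if read_bit() == 0:           # code "10"
            return ("first",)
        return ("Y", "first")         # code "11"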
Thus, illustratively, the information indicating the combination of processing order and prediction source channel is encoded as two separate pieces of information: the color difference channel processing order flag and the second-channel prediction source channel specifier.
In the following, the case of a single prediction source channel is mainly described. The case of multiple prediction source channels is described in detail in a modification below.
[About the color difference]
Next, the color difference is explained supplementarily with reference to FIG. 4 and FIG. 5. FIG. 4 illustrates the YUV color space and shows the relationship between the luminance Y and the U and V components of the color difference. FIG. 5 shows YUV image formats: (a) to (d) show the 4:2:0, 4:4:4, 4:2:2, and 4:1:1 formats, respectively.
First, the YUV color space is as follows. In the YUV format, an image is expressed by a luminance Y and by U and V components that represent the color difference. As shown in FIG. 4, the luminance Y is originally defined in the YUV color space as an independent coordinate axis orthogonal to the U-V plane spanned by the U and V components.
In FIG. 4, when the coordinates (U, V) of the U and V components are near the origin, the color is achromatic. Conversely, the farther the coordinates (U, V) are from the origin, the deeper the color.
In one implementation example, the luminance Y, the U component, and the V component each take values from 0 to 255. The closer the luminance value is to 0, the darker the image; conversely, the closer it is to 255, the brighter the image. When the U and V components have a color difference value of 128, the image is achromatic; taking this as the reference, the closer the color difference value is to 0 or 255, the deeper the color.
An image appears roughly green when the color difference value of the U component is close to 0 and roughly red when it is close to 255. Likewise, an image appears roughly yellow when the color difference value of the V component is close to 0 and roughly blue when it is close to 255.
When an image is viewed locally, the variety of pixel values in use is limited, and, locally, the luminance Y is correlated with each of the U and V components. Locally, therefore, the U and V components can be derived from the luminance Y by exploiting this correlation.
The following terms are used in describing the process of deriving the U and V components from the luminance Y. The luminance Y is called the Y channel, and the color difference consisting of the U and V components is called the color difference channel. A channel is a concept that generalizes the luminance Y and the U and V components.
When the U and V components need to be distinguished within the color difference channel, they are called the U channel and the V channel, respectively. Predicting the U and V channels from the Y channel by using the correlation between the luminance Y and the U and V components is called inter-channel prediction.
Next, the luminance and color difference image formats are as follows. Lowering the resolution of the color difference has less visual impact than lowering the resolution of the luminance, so the data amount can be reduced by lowering the color difference resolution. For example, the 4:2:0 format shown in FIG. 5(a) reduces the data amount with the following structure. In FIG. 5(a), the left block shows the resolution of the luminance Y, and the right block shows the resolution of the U and V components; the same applies to (b) to (d) below.
As shown in FIG. 5(a), in the 4:2:0 format the color difference resolution is 1/2 of the luminance resolution in both the horizontal and vertical directions; overall, the color difference resolution is therefore 1/4 of the luminance resolution. The 4:2:0 format is commonly used in television broadcasting and consumer video equipment.
Referring further to FIG. 5, other examples of YUV image formats are as follows.
First, there is the 4:4:4 format shown in FIG. 5(b), in which the luminance and color difference resolutions are identical. The 4:4:4 format is used where high image quality matters more than data reduction, for example in professional image processing equipment.
There is also the 4:2:2 format shown in FIG. 5(c), which halves the horizontal color difference resolution of the 4:4:4 format.
Further, there is the 4:1:1 format (FIG. 5(d)), which halves the horizontal color difference resolution of the 4:2:2 format again.
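As a supplementary illustration (not part of the patent), the color difference plane size implied by each format in FIG. 5 can be computed from the luma-plane size as follows:

    # Horizontal / vertical subsampling factors for the formats of FIG. 5.
    def chroma_size(width, height, fmt):
        sx, sy = {"4:4:4": (1, 1), "4:2:2": (2, 1),
                  "4:2:0": (2, 2), "4:1:1": (4, 1)}[fmt]
        return width // sx, height // sy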
[Configuration of the predicted image generation unit]
Next, the configuration of the predicted image generation unit 13 is described in more detail with reference to FIG. 1. FIG. 1 is a functional block diagram showing an example of the configuration of the predicted image generation unit 13.
As shown in FIG. 1, the predicted image generation unit 13 includes a local image input unit 131, a luminance predicted image generation unit 132, an inter-channel prediction determination unit 133, an LUT deriving unit (correlation deriving means) 134, a color difference predicted image generation unit (predicted image generating means, processing information acquiring means, prediction controlling means) 135, and a predicted image output unit 136.
The local image input unit 131 acquires the luminance locally decoded image P'_Y and the color difference locally decoded image P'_C from the locally decoded image P'. It transfers the luminance locally decoded image P'_Y to the luminance predicted image generation unit 132 and the color difference locally decoded image P'_C to the inter-channel prediction determination unit 133.
The luminance predicted image generation unit 132 refers to the luminance locally decoded image P'_Y, performs prediction based on the PU information PUI, and generates the luminance predicted image PredY, which it sends to the predicted image output unit 136.
The inter-channel prediction determination unit 133 refers to the inter-channel prediction flag included in the intra prediction parameter PP and determines whether the intra prediction of the color difference (hereinafter simply called color difference prediction) is in the inter-channel prediction mode, in which the color difference predicted image is generated by inter-channel prediction.
If the determination result is the inter-channel prediction mode, the inter-channel prediction determination unit 133 notifies the LUT deriving unit 134 and the inter-channel prediction unit 351 (described later) of the color difference predicted image generation unit 135 that the inter-channel prediction mode applies. If the determination result is not the inter-channel prediction mode, the inter-channel prediction determination unit 133 transfers the color difference locally decoded image P'_C to the intra-channel prediction unit 352 (described later) of the color difference predicted image generation unit 135.
The LUT deriving unit 134 derives, for each target partition, an LUT (Look Up Table) for performing inter-channel prediction, based on the locally decoded images. Illustratively, the LUT derived by the LUT deriving unit has the following structure: in association with the luminance value at a pixel position [x_Y, y_Y] of the luminance locally decoded image P'_Y, the LUT stores the color difference value at the pixel position [x_C, y_C] of the color difference locally decoded image P'_C corresponding to that pixel position [x_Y, y_Y].
For the correspondence between a pixel position [x_Y, y_Y] in the luminance locally decoded image P'_Y and a pixel position [x_C, y_C] in the color difference predicted image PredC, the correspondence described in Non-Patent Document 3 can be adopted.
The LUT deriving unit 134 transmits the derived LUT to the inter-channel prediction unit 351 (predicted image generating means; described later) of the color difference predicted image generation unit 135. The operation of the LUT deriving unit 134 is detailed later.
The color difference predicted image generation unit 135 predicts the color difference image and generates the color difference predicted image PredC. More specifically, it includes the inter-channel prediction unit 351 and the intra-channel prediction unit 352.
When the color difference prediction is in the inter-channel prediction mode, the inter-channel prediction unit 351 refers to the decoded luminance image P_Y and generates the color difference predicted image PredC by predicting the color difference image through inter-channel prediction.
When the color difference prediction is not in the inter-channel prediction mode, the intra-channel prediction unit 352 refers to the color difference locally decoded image P'_C and generates the color difference predicted image PredC by predicting the color difference image, for example by directional prediction or DC prediction.
The operation of the inter-channel prediction unit 351 is detailed later.
The predicted image output unit 136 outputs, as the predicted image Pred, the luminance predicted image PredY generated by the luminance predicted image generation unit 132 and the color difference predicted image PredC generated by the color difference predicted image generation unit 135.
In the following description of color difference processing, the U channel is mainly described. The U-channel processing and the V-channel processing are substantially the same; unless otherwise noted, replacing the subscript U with V in the description of the U-channel processing yields the description of the V-channel processing.
[Correspondence between pixel positions in the decoded luminance image and in the color difference predicted image]
Next, the correspondence between a pixel position [x_Y, y_Y] in the decoded luminance image P_Y and a pixel position [x_U, y_U] in the color difference predicted image PredU is described with reference to FIG. 6 and FIG. 7. FIG. 6 shows the correspondence between luminance and color difference pixel positions in the 4:2:0 format; FIG. 6(a) shows pixel positions in the decoded luminance image P_Y, and FIG. 6(b) shows pixel positions in the color difference predicted image PredU to be predicted.
The pixel positions of the decoded luminance image P_Y and the color difference predicted image PredU are both expressed as relative coordinates whose origin is the top-left corner of the block.
Accordingly, in the decoded luminance image P_Y shown in FIG. 6(a), the pixel position of the pixel SP_Y1 is [x_Y, y_Y] = [6, 2]. In the color difference predicted image PredU shown in FIG. 6(b), the pixel position of the pixel SP_U1 is [x_U, y_U] = [3, 1].
The correspondence between a pixel position [x_Y, y_Y] in the decoded luminance image P_Y and a pixel position [x_U, y_U] in the color difference predicted image PredU is, for example, x_Y = 2x_U, y_Y = 2y_U.
Under this correspondence, the pixel position [x_Y, y_Y] = [6, 2] in the decoded luminance image P_Y above corresponds to the pixel position [x_U, y_U] = [3, 1] in the color difference predicted image PredU.
The correspondence between the sample positions of the color difference pixels and the pixel positions in the decoded luminance image P_Y is described in more detail with reference to FIG. 7. FIG. 7(a) to (c) illustrate three possible sample positions of the color difference pixels.
FIG. 7(a) shows the case already described with FIG. 6: the sample position of the color difference pixel may be set at the top left of the block. In this case, the pixel position correspondence is x_Y = 2x_U, y_Y = 2y_U.
As shown in FIG. 7(b), the sample position of the color difference pixel may instead be set to the left of the block center. In this case, the pixel position correspondence is x_Y = 2x_U, y_Y = 2y_U + 0.5.
As shown in FIG. 7(c), the sample position of the color difference pixel may also be set at the block center. In this case, the pixel position correspondence is x_Y = 2x_U + 0.5, y_Y = 2y_U + 0.5.
The luminance value corresponding to the value of a given color difference pixel is then derived by filtering the luminance values in the neighborhood of the pixel position obtained according to the correspondences shown in FIG. 7(a) to (c).
Here, the neighborhood of a pixel position consists of the coordinates obtained by rounding each coordinate of that position up and down. In the example of FIG. 7(b), these are [x_Y, y_Y] and [x_Y, y_Y + 1]; in the example of FIG. 7(c), they are [x_Y, y_Y], [x_Y + 1, y_Y], [x_Y, y_Y + 1], and [x_Y + 1, y_Y + 1].
In the example of FIG. 7(a), x_Y and y_Y are integers, so the luminance value at the pixel position [x_Y, y_Y] may be used directly as the sample luminance value. A smoothing filter is one example of the filtering.
To obtain the luminance value corresponding to a given color difference pixel value with higher accuracy, the sample position that yields the most accurate prediction should be selected from among these candidates.
The above relationships also hold for the V channel, so their description is omitted here.
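As a minimal sketch of the filtering just described (an assumption for illustration; boundary handling at the block edges is omitted), the luminance value for a color difference position under the FIG. 7(c) mapping x_Y = 2x_U + 0.5, y_Y = 2y_U + 0.5 can be obtained by averaging the four neighboring integer positions:

    # rec_y: 2-D array of decoded luma samples; (x_u, y_u): chroma position.
    def luma_for_chroma_center(rec_y, x_u, y_u):
        x, y = 2 * x_u, 2 * y_u   # integer neighborhood of (2x_u+0.5, 2y_u+0.5)
        s = (rec_y[y][x] + rec_y[y][x + 1] +
             rec_y[y + 1][x] + rec_y[y + 1][x + 1])
        return (s + 2) // 4       # rounded average as a simple smoothing filter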
[Patterns of the color difference channel processing order]
Next, the patterns of the color difference channel processing order when there is a single prediction source channel are described with reference to FIG. 8, which shows these patterns. With a single prediction source channel, the U channel and the V channel can be predicted with the luminance (Y) channel as the starting point, and, as shown in FIG. 8, there are the following three prediction patterns.
The first is the pattern shown by the solid lines in FIG. 8: the U channel is predicted from the Y channel, and the V channel is also predicted from the Y channel. Which of the U and V channels is predicted first can be chosen arbitrarily, and the U-channel and V-channel predictions may be performed in parallel.
The second is the pattern shown by the dotted line in FIG. 8: the U channel is first predicted from the Y channel, and the V channel is then predicted from the predicted U channel.
The third is the reverse of the second, shown by the dashed line in FIG. 8: the V channel is first predicted from the Y channel, and the U channel is then predicted from the predicted V channel.
The color difference channel processing order flag indicating which of these processing orders is used, and the second-channel prediction source channel specifier specifying the prediction source channel of the second channel, are encoded in the encoding process of the video encoding device 2 and transmitted to the video decoding device 1. The inter-channel prediction unit 351 of the video decoding device 1 then performs inter-channel prediction according to the color difference channel processing order flag and the second-channel prediction source channel specifier.
The case of two prediction source channels is described in a modification below.
The inter-channel prediction unit 351 performs inter-channel prediction of the U channel according to equation (1) below, generating the color difference predicted image PredU from the decoded luminance image P_Y.
PredU[x_U, y_U] = LUT_U[RecY[x_Y, y_Y]] … (1)
The meaning of each symbol in equation (1) is as follows.
[x_Y, y_Y]: a pixel position in the decoded luminance image P_Y (hereinafter referred to as a luminance pixel position)
[x_U, y_U]: the pixel position in the color difference predicted image PredU corresponding to the luminance pixel position (hereinafter referred to as a color difference pixel position)
LUT_U: a function that returns the entry of the U-channel LUT (hereinafter written simply as LUT)
For completeness, inter-channel prediction of the V channel is as follows: as with the U channel, the V channel is predicted according to equation (2) below.
PredV[x_V, y_V] = LUT_V[RecY[x_Y, y_Y]] … (2)
The symbols in equation (2) have the same meanings as in equation (1), so their description is omitted.
[Overview of the color difference predicted image generation processing]
Next, the schematic flow of the color difference predicted image generation processing in the predicted image generation unit 13 is described with reference to FIG. 9, a flowchart illustrating this flow.
When the color difference predicted image generation processing starts, the inter-channel prediction determination unit 133 refers to the inter-channel prediction flag and determines whether the inter-channel prediction mode applies (S10).
If the determination result is not the inter-channel prediction mode (NO in S10), the intra-channel prediction unit 352 generates the color difference predicted image PredC without inter-channel prediction (S11), and the processing ends.
If the determination result is the inter-channel prediction mode (YES in S10), the LUT deriving unit 134 derives the LUT with reference to the decoded luminance image P_Y (S12). The inter-channel prediction unit 351 then generates the color difference predicted image PredC by inter-channel prediction with reference to the LUT derived by the LUT deriving unit 134 (S13). The processing then ends.
In the above color difference predicted image generation processing, it is assumed as an example that the decoded luminance image P_Y has been generated before the processing. However, this is not limiting: the decoded luminance image P_Y only needs to be generated after step S10 and before the start of step S13.
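The control flow of FIG. 9 can be sketched as follows (the helper callables are assumptions passed in as parameters, not names from the patent):

    # S10: dispatch on the inter-channel prediction flag.
    def generate_chroma_prediction(pp, rec_y, local_p,
                                   intra_predict, derive_lut, inter_predict):
        if not pp.inter_channel_flag:
            return intra_predict(local_p)      # S11: e.g. directional or DC
        lut = derive_lut(rec_y, local_p)       # S12: derive the LUT
        return inter_predict(rec_y, lut)       # S13: LUT-based prediction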
[Relationship between the luminance-color difference distribution and inter-channel prediction]
The relationship between the luminance-color difference distribution and inter-channel prediction is described here with reference to FIG. 10, FIG. 11, and FIG. 12.
FIG. 10 shows an example of an image B in which the luminance-color difference distributions do not overlap and instead fall into two separate regions.
The image B shown in FIG. 10 consists of six pixel regions. The pixel regions R1 to R3 have low luminance values (Y) and high color difference values (U), while the pixel regions R4 to R6 have high luminance values (Y) and low color difference values (U).
In the pixel regions R1 to R3, the color difference value (U) gradually increases from R1 to R3. In the pixel regions R4 to R6, the color difference value (U) decreases from R4 to R6.
The graph of FIG. 11 plots the luminance (Y) against the color difference (U) of the pixels contained in such an image B.
The pixels contained in the pixel regions R1 to R3 are plotted in the group Gr1 in the graph of FIG. 11. In this group the samples have low luminance (Y) and high color difference values (U), and within the group the color difference value (U) tends to increase as the luminance (Y) increases.
The pixels contained in the pixel regions R4 to R6 are plotted in the group Gr2. In this group the samples have high luminance (Y) and low color difference values (U), and within the group the color difference value (U) likewise tends to increase as the luminance (Y) increases.
Consequently, the directions in which the distributions of the groups Gr1 and Gr2 spread, together with the positional relationship between the two groups, are not suited to linear approximation.
The image B is not suited to linear approximation because the luminance and color difference vary within the image. Such variation tends to appear frequently in images containing boundaries between multiple objects or textures of multiple colors.
For such an image, creating the LUT by linear approximation tends to produce large errors. The following describes a technique that suppresses such errors by deriving the LUT through a non-linear approximation of the above samples.
[Flow of the LUT derivation processing]
Next, the flow of the LUT derivation processing by the LUT deriving unit 134 is described with reference to FIG. 12, a flowchart showing an example of this flow.
As shown in FIG. 12, when the LUT derivation processing starts, the LUT deriving unit 134 first initializes the LUT (S100). Initializing the LUT means putting all of its entries into the unregistered state.
The LUT deriving unit 134 then enters the registration processing loop LP11 over each luminance pixel adjacent to the target partition (S101).
In the loop LP11, the following registration processing is performed for the luminance pixel being processed. First, the LUT deriving unit 134 acquires the luminance value n of the luminance pixel being processed and the color difference value m at the color difference pixel position corresponding to the luminance pixel position (S102).
Next, the LUT deriving unit 134 determines whether a color difference value is already registered in LUT[n] (S103).
If no color difference value is registered in LUT[n] (NO in S103), the LUT deriving unit 134 registers the color difference value m acquired in step S102 in LUT[n] as-is (S105) and returns to the beginning of the loop LP11 (S106).
If LUT[n] is already registered, the LUT deriving unit 134 computes (m + LUT[n] + 1) / 2, i.e., the average of the acquired color difference value m and the registered color difference value, and assigns this average to the color difference value m (S104). It then registers the color difference value m, now holding the average, in LUT[n] (S105) and returns to the beginning of the loop LP11 (S106).
When the registration processing has finished for every luminance pixel, the loop LP11 ends.
After the loop LP11 ends, the loop LP12 for interpolating unregistered LUT entries is entered (S107).
In the loop LP12, the following interpolation processing is performed for the unregistered entries from n = 0 to n = 255; that is, LUT entries for which there were no samples are filled in by interpolation. First, the LUT deriving unit 134 determines whether LUT[n] is unregistered (S108).
If LUT[n] is not unregistered, that is, LUT[n] is already registered (NO in S108), the interpolation processing continues, returning to the beginning of the loop LP12 (S111).
If LUT[n] is unregistered (YES in S108), the LUT deriving unit 134 searches for the nearest registered entries on either side of n, as shown in FIG. 13 (S109). Specifically, the LUT deriving unit 134 searches as follows.
First, with n as the base point, the LUT deriving unit 134 searches for a registered entry in the downward direction from n, that is, at an index nL smaller than n. In terms of FIG. 13, this searches for the sample point Smpl1 below n.
Likewise, with n as the base point, the LUT deriving unit 134 searches for a registered entry in the upward direction from n, that is, at an index nR larger than n. In terms of FIG. 13, this searches for the sample point Smpl2 above n.
Next, the LUT deriving unit 134 registers in LUT[n] the value obtained by linearly interpolating between LUT[nL] and LUT[nR] (S110).
Referring again to FIG. 13, step S110 performs an interpolation that connects the nearest sample point Smpl1 below n and the sample point Smpl2 above n with a straight line L1; the value of the line L1 at n is registered in LUT[n].
If the search of step S109 finds only one of nL and nR, the value of the registered entry that was found is registered in LUT[n].
The interpolation processing then continues, returning to the beginning of the loop LP12 (S111).
When the interpolation processing has completed for all entries from n = 0 to n = 255, the loop LP12 ends, and the LUT derivation processing ends.
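A compact sketch of the FIG. 12 derivation (an illustration, assuming 8-bit samples supplied as (luma, chroma) pairs taken from the pixels adjacent to the target partition) is:

    def derive_lut(samples):
        lut = [None] * 256
        for n, m in samples:                  # loop LP11: registration
            # S104-S105: average on collision as (m + LUT[n] + 1) / 2
            lut[n] = m if lut[n] is None else (m + lut[n] + 1) // 2
        reg = [n for n in range(256) if lut[n] is not None]
        for n in range(256):                  # loop LP12: interpolation
            if lut[n] is not None:
                continue
            left = max((r for r in reg if r < n), default=None)   # S109: nL
            right = min((r for r in reg if r > n), default=None)  # S109: nR
            if left is None and right is None:
                continue                      # no samples at all
            if left is None or right is None: # only one side found
                lut[n] = lut[right] if left is None else lut[left]
            else:                             # S110: linear interpolation
                w = (n - left) / (right - left)
                lut[n] = round(lut[left] * (1 - w) + lut[right] * w)
        return lut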
In step S104, when an entry was already registered, the average of the acquired color difference value m and the registered color difference value was registered as the new entry. The reason is as follows.
If step S104 simply overwrote the registered entry with the acquired color difference value m, then whenever m is statistical noise, that noise would be registered in the entry unchanged. Taking the average of the acquired color difference value m and the registered value mitigates such noise. The average obtained here is a weighted average in registration order; alternatively, a weighted average with arbitrarily chosen weights may be used in step S104.
A moving-average process may also be applied to the whole table after the LUT is derived; doing so suppresses abrupt changes in the color difference.
The LUT values derived by applying the above processing to the image B shown in FIG. 10 are as shown in FIG. 14, which graphs the derived LUT values, i.e., the entries.
The graph L11 in FIG. 14 shows the values of the LUT entries derived from the image B. The graph L11 is a line passing through each sample point, with the entries between sample points created by linear interpolation. For example, the rightmost sample of the group Gr1 and the leftmost sample of the group Gr2 are connected by a straight line.
For entries with multiple sample points, the values of the sample points sharing the same n are averaged, and that value is used as the actual sample point for inter-channel prediction.
In the above, the unregistered entries, i.e., the entries between sample points, were created by linear interpolation, but this is not limiting. For example, the entries may be created by cubic interpolation, which can improve the prediction accuracy of the table.
[Flow of the color difference predicted image generation processing by inter-channel prediction]
Next, the flow of the color difference predicted image generation processing by inter-channel prediction in the inter-channel prediction unit 351 is described with reference to FIG. 15, a flowchart showing an example of this flow.
When the inter-channel prediction processing starts, the inter-channel prediction unit 351 sets color difference A and its prediction source channel, and color difference B and its prediction source channel, according to the intra prediction parameter PP (S120).
Specifically, the inter-channel prediction unit 351 sets the following. It sets color difference A and color difference B according to the color difference channel processing order flag; in the following, color difference B is processed after color difference A. For example, the inter-channel prediction unit 351 may set color difference A to the U channel and color difference B to the V channel.
The inter-channel prediction unit 351 sets the Y channel as the prediction source channel of color difference A.
The inter-channel prediction unit 351 then sets the prediction source channel of color difference B according to the second-channel prediction source channel specifier. For example, it may set the prediction source channel of the U channel (color difference A) to the Y channel and the prediction source channels of the V channel (color difference B) to the Y channel and the U channel.
Next, the inter-channel prediction unit 351 selects the prediction mode of color difference A (S121) and generates a color difference predicted image for color difference A by inter-channel prediction according to the settings of step S120 (S122).
The inter-channel prediction unit 351 then selects the prediction mode of color difference B (S123) and generates a color difference predicted image for color difference B by inter-channel prediction according to the settings of step S120 (S124). The processing then ends.
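The FIG. 15 flow can be sketched as follows (attribute and callable names are assumptions for illustration):

    # planes: dict of already-available channel images; predict_from is the
    # assumed per-channel predictor taking (source channels, planes).
    def predict_chroma_channels(pp, planes, predict_from):
        # S120: order per the processing order flag; sources per the specifier
        first, second = ("U", "V") if pp.order_flag == 0 else ("V", "U")
        sources = {first: ("Y",), second: pp.second_sources}
        for ch in (first, second):    # S121-S124: predict A, then B
            planes[ch] = predict_from(sources[ch], planes)
        return planes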
(Operation and effects)
As described above, the video decoding device 1 decodes encoded image data by generating a predicted image Pred for each of the luminance (Y) channel and the color difference (U, V) channels and adding the prediction residual D to the generated predicted image Pred. It includes: the predicted image generation unit 13 and the adder 14, which decode the luminance (Y) channel of the target partition to generate the decoded luminance image P_Y; the LUT deriving unit 134, which refers to the locally decoded image P' located around the target partition and derives, as an LUT, the non-linear correlation between the already-decoded luminance (Y) channel and the color difference (U, V) channels to be decoded; and the inter-channel prediction unit 351, which generates the color difference predicted image PredC for the target partition from the decoded luminance image P_Y according to the LUT.
With the above configuration, even when the luminance components of the pixels contained in the locally decoded image P' vary, so that inter-channel prediction based on a linear correlation would lose accuracy, the likelihood of obtaining higher prediction accuracy can be improved.
(Modifications)
Preferred modifications of the video decoding device 1 are described below.
[Modification of the patterns of the color difference channel processing order]
FIG. 16 and FIG. 17 show the patterns of the color difference channel processing order when the second channel to be predicted has two prediction source channels.
As shown in FIG. 16, (1) the U channel may be predicted with the Y channel as the starting point, and then (2) the V channel may be predicted based on the combination of the Y channel and the U channel. Alternatively, as shown in FIG. 17, (1) the V channel may be predicted with the Y channel as the starting point, and then (2) the U channel may be predicted based on the combination of the Y channel and the V channel.
To give the second channel two prediction source channels in this way, the LUT may, for example, be extended to two dimensions. When extending the LUT to two dimensions, the same technique as for deriving a one-dimensional LUT can be used, and known techniques for creating two-dimensional LUTs can also be adopted.
In this case, for the processing order of FIG. 16, the LUT deriving unit 134 derives the LUT as follows: first, a one-dimensional table over the luminance values Y and color difference values U of the sample points is derived as described above; then the color difference value V of each sample point is registered in the table in association with the luminance value Y and the color difference value U of that sample point. The processing order of FIG. 17 is handled in the same way.
The LUT can thus be extended to two or more dimensions. Another example of such an extension is a configuration that uses the luminance values Y of multiple pixels; more specifically, an LUT entry may allow the color difference value U or V to be looked up by a combination of adjacent luminance values.
Using an LUT of two or more dimensions as described above makes it possible to predict the color difference from multiple channels, or from multiple pixel values of the same channel, and can thus improve the accuracy of the predicted image.
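As a rough illustration (an assumed data layout, not the patent's) of the two-dimensional LUT of FIG. 16, the V value can be keyed by the pair (Y, U), averaging collisions as in the one-dimensional case:

    from collections import defaultdict

    # samples: (y, u, v) triples taken from the neighboring pixels.
    def derive_lut_2d(samples):
        acc = defaultdict(list)
        for y, u, v in samples:
            acc[(y, u)].append(v)
        # average collisions; unregistered keys would still need
        # interpolation, as in the one-dimensional derivation
        return {key: sum(vs) // len(vs) for key, vs in acc.items()}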
[Modification of the data structure of the intra prediction parameters]
Above, the intra prediction parameter PP was illustratively configured to include the inter-channel prediction flag, the color difference channel processing order flag, and the second-channel prediction source channel specifier. However, the intra prediction parameter PP may instead include an inter-channel prediction index indicating the combination of processing order and prediction source channel, as follows.
That is, indices index = 0 to 5 are assigned to the combinations of processing order and prediction source channel as follows.
index=0: order U→V, prediction source channel of V = Y
index=1: order U→V, prediction source channel of V = U
index=2: order U→V, prediction source channels of V = Y, U
index=3: order V→U, prediction source channel of U = Y
index=4: order V→U, prediction source channel of U = V
index=5: order V→U, prediction source channels of U = Y, V
This inter-channel prediction index is encoded in the encoding process of the video encoding device 2 and transmitted to the video decoding device 1. The inter-channel prediction unit 351 of the video decoding device 1 then performs inter-channel prediction according to the inter-channel prediction index transmitted from the video encoding device 2.
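Mirroring the table above, the decode-side mapping from the index to the processing order and the source channel(s) of the second channel might look like this (a hypothetical layout):

    INTER_CHANNEL_INDEX = {
        0: (("U", "V"), ("Y",)),      # order U->V, second channel from Y
        1: (("U", "V"), ("U",)),      # order U->V, second channel from U
        2: (("U", "V"), ("Y", "U")),  # order U->V, second channel from Y and U
        3: (("V", "U"), ("Y",)),      # order V->U, second channel from Y
        4: (("V", "U"), ("V",)),      # order V->U, second channel from V
        5: (("V", "U"), ("Y", "V")),  # order V->U, second channel from Y and V
    }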
While the above configuration includes the inter-channel prediction index and the like, indicating the combination of processing order and prediction source channel, in the intra prediction parameter PP, this is not limiting: various changes are possible according to the unit in which the LUT is derived. For example, the information indicating the combination of processing order and prediction source channel may be stored in the header of a processing unit other than the slice header, and the processing order and the like may be changed per that processing unit. For example, this information may be stored in a sequence header (SPS: Sequence Parameter Set) or a picture header (PPS: Picture Parameter Set).
The processing order and the like may also be changed in units smaller than a slice, for example per LCU. In this way, the unit of LUT derivation can be changed within any range over which a correlation between luminance and color difference exists.
The information indicating the processing order and the information indicating the combination of prediction source channels may also be encoded in different processing units. For example, the information indicating the processing order may be encoded per LCU while the information indicating the combination of prediction source channels is encoded per PU.
[Modification of the LUT structure]
Above, the LUT had 256 entries, n = 0 to 255, which corresponds to expressing pixel values with 8 bits. However, this is not limiting, and the number of entries may be smaller than 2^8. For example, when high LUT accuracy is not required, the number of entries can be reduced to 128, n = 0 to 127, as follows.
That is, PredU[x_U, y_U] is calculated as follows.
When the value of RecY[x_Y, y_Y] is even:
  PredU[x_U, y_U] = LUT[RecY[x_Y, y_Y] / 2]
When the value of RecY[x_Y, y_Y] is odd:
  PredU[x_U, y_U] = (LUT[RecY[x_Y, y_Y] / 2] + LUT[RecY[x_Y, y_Y] / 2 + 1]) / 2
This reduces the storage area used for constructing the LUT without significantly reducing the accuracy of the LUT.
The same applies to PredV[x_V, y_V].
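A sketch of the 128-entry lookup above (using integer division, with a clamp at the table edge added as an assumption to keep the odd case in range):

    def predict_u_reduced(lut128, rec_y_val):
        if rec_y_val % 2 == 0:            # even luma value
            return lut128[rec_y_val // 2]
        half = rec_y_val // 2             # odd: average adjacent entries
        hi = min(half + 1, 127)           # clamp: 255 averages entries 127, 127
        return (lut128[half] + lut128[hi]) // 2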
[Modification of the flow of the LUT derivation processing]
Next, a modification of the flow of the LUT derivation processing by the LUT deriving unit 134 is described with reference to FIG. 18. The flowchart of FIG. 12 showed an example that takes a weighted average in registration order; the following modification instead treats all pixels uniformly, without weighting, and takes the plain average.
 図18は、LUT導出部134によるLUT導出処理の流れの一変形例について示したフローチャートである。 FIG. 18 is a flowchart showing a modified example of the flow of the LUT derivation process by the LUT derivation unit 134.
 As shown in FIG. 18, when the LUT derivation process starts, the LUT deriving unit 134 first initializes the LUT (S130).
 In the LUT initialization in step S130, LUT[n] is set to the unregistered state, the cumulative sum of the chrominance values observed at luminance value n is set to sum[n] = 0, and the number of chrominance samples at luminance value n is set to count[n] = 0 (n = 0 to 255 for LUT, sum, and count). This initialization is performed for both the U channel and the V channel.
 Subsequently, the LUT deriving unit 134 enters the registration loop LP11A over the luminance pixels adjacent to the target partition (S131).
 In loop LP11A, the following registration process is performed for each luminance pixel. First, the LUT deriving unit 134 obtains the luminance value n of the luminance pixel being processed and the chrominance value m at the sample position corresponding to that pixel position (S132).
 Next, the LUT deriving unit 134 assigns sum[n] + m to sum[n] and count[n] + 1 to count[n] (S133).
 The process then returns to the top of loop LP11A (S134). When the registration process has been performed for every adjacent luminance pixel, loop LP11A ends.
 After loop LP11A ends, the process enters loop LP12A, which computes the average value for each entry (S135).
 In loop LP12A, the following process is performed for each entry from n = 0 to n = 255. First, the LUT deriving unit 134 determines whether count[n] is greater than 0 (S136).
 If count[n] is not greater than 0 (that is, count[n] = 0), the LUT deriving unit 134 returns to the top of loop LP12A (S138) and continues with the next entry.
 If count[n] is greater than 0, the LUT deriving unit 134 assigns sum[n] / count[n] to LUT[n] (S137). That is, in step S137 the arithmetic mean of the chrominance values observed at luminance value n is obtained and assigned to LUT[n].
 The process then returns to the top of loop LP12A (S138) and continues with the next entry. Loop LP12A ends when this process has been completed for all entries from n = 0 to n = 255.
 Next, the LUT deriving unit 134 performs interpolation for the unregistered entries (S139). The processing in step S139 is the same as that in steps S107 to S111 (loop LP13) shown in FIG. 12, so its description is omitted here. The LUT derivation process then ends. A sketch of this derivation is shown below.
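 The following is a minimal Python sketch of this uniform-average derivation for one chroma channel of an 8-bit image, under stated assumptions: the function names are hypothetical, the adjacent (luminance, chrominance) sample pairs are supplied by the caller, and the interpolation of still-unregistered entries (loop LP13 of FIG. 12) is passed in as a callback.

```python
def derive_lut_uniform(samples, interpolate_unregistered):
    """samples: iterable of (n, m) pairs, n = luma value, m = chroma value,
    taken from the pixels adjacent to the target partition."""
    sum_ = [0] * 256     # cumulative chroma sum per luma value (S130)
    count = [0] * 256    # number of chroma samples per luma value (S130)
    lut = [None] * 256   # None marks an unregistered entry (S130)

    for n, m in samples:             # loop LP11A (S131 to S134)
        sum_[n] += m                 # S133
        count[n] += 1

    for n in range(256):             # loop LP12A (S135 to S138)
        if count[n] > 0:
            lut[n] = sum_[n] // count[n]   # arithmetic mean (S137)

    interpolate_unregistered(lut)    # S139, as in loop LP13 of FIG. 12
    return lut
```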
  [Modification of the Pixels Referenced for LUT Derivation]
 The above description concerned a configuration that references the pixels adjacent to the target partition, but the configuration is not limited to this. For example, the two adjacent columns of pixels may be referenced. Alternatively, the partition adjacent above the target partition, or the partition adjacent to its left, may be referenced. In this way, pixels in any region in which a correlation between luminance and chrominance exists can be referenced as appropriate.
  [Modification of the Interpolation of Unregistered LUT Entries]
 Next, a modification of the interpolation process for unregistered LUT entries will be described with reference to FIG. 19. FIG. 19 is a diagram illustrating an example of a derived LUT.
 The modification described below performs interpolation only at reference time. That is, as shown in FIG. 19, the LUT may hold only the sample pairs until an entry is actually referenced.
 In the example shown in FIG. 19, only the values of 16 sample pairs, beginning with the combination of luminance (Y) 40 and chrominance value (U) 160, are registered as entries (the figure shows only the first four pairs).
 That is, in this modification the LUT deriving unit 134 does not interpolate the unregistered entries during the LUT derivation process; only the entries for the 16 sample pairs exist.
 Instead, in the chrominance predicted image generation process, when an unregistered entry is referenced, the LUT deriving unit 134 derives the referenced entry at that point.
 When an unregistered entry is referenced, the LUT deriving unit 134 derives, as one example, the referenced unregistered entry n (nL < n < nR) by linear interpolation according to the following equation (3).
  LUT[n] = (LUT[nL] × (nR - n) + LUT[nR] × (n - nL)) / (nR - nL) … Equation (3)
 Here, nL and nR are, as described with reference to FIG. 13, the indices of the nearest registered entries below and above n, respectively. Each registered entry is weighted by its distance from the opposite entry, so the nearer entry contributes more.
 The application of equation (3) to the LUT shown in FIG. 19 is now described concretely. With LUT[40] = 160 and LUT[44] = 180 registered, the LUT deriving unit 134 derives the referenced unregistered entry n = 43 as follows.
 That is, from equation (3), with nL = 40 and nR = 44,
  LUT[43] = (LUT[40] × 1 + LUT[44] × 3) / 4 = 175.
 According to this modification, a calculation for interpolation occurs whenever an unregistered entry is referenced; on the other hand, for unregistered entries that are never referenced, no interpolation needs to be performed at LUT derivation time. The memory area may likewise be sized to hold only the registered entries, in which case creating the LUT consumes only a memory area proportional to the number of samples.
 Although this modification has been described using linear interpolation, interpolation by cubic interpolation may be adopted instead. A sketch of the reference-time lookup follows.
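 Below is a minimal Python sketch of this reference-time linear interpolation over a sparse LUT, under stated assumptions: the sparse LUT is held as parallel sorted lists of registered luma values and their chroma values, and the function name is hypothetical.

```python
import bisect

def lookup_sparse(keys, values, n):
    """keys: sorted luma values of the registered entries; values: their
    chroma values. Returns LUT[n], interpolating per equation (3) when n
    itself is unregistered."""
    i = bisect.bisect_left(keys, n)
    if i < len(keys) and keys[i] == n:   # registered entry: return directly
        return values[i]
    if i == 0:                           # no registered entry below n
        return values[0]
    if i == len(keys):                   # no registered entry above n
        return values[-1]
    nl, nr = keys[i - 1], keys[i]        # nearest registered entries around n
    # equation (3): the nearer entry receives the larger weight
    return (values[i - 1] * (nr - n) + values[i] * (n - nl)) // (nr - nl)

# FIG. 19 example: lookup_sparse([40, 44], [160, 180], 43)
# = (160 * 1 + 180 * 3) // 4 = 175
```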
  [Modification Without an LUT]
 Next, an example in which a function is used instead of an LUT will be described with reference to FIG. 20. That is, for each coding target block, a function U = fU(Y) that approximates the sample pairs is derived, and only the parameters specifying fU are held in memory.
 The chrominance predicted image PredU is derived using fU(Y), that is, according to the following equation (4):
  PredU = fU[RecY[xY, yY]] … (4)
 FIG. 20 shows an example of fU. As shown in FIG. 20, fU may be derived as a curve passing through each sample point. fU may also be derived piecewise, between adjacent samples. From another viewpoint, the entire set of samples can itself be regarded as the "parameters specifying fU".
  [Combined Use with Linear Transformation]
 Based on the least-squares error values, conventional inter-channel prediction by linear transformation and the LUT-based inter-channel prediction described above may be switched, so that whichever offers the higher accuracy is used for inter-channel prediction. For example, a configuration using the linear transformation can be selected when the image consists of a single color, or when most of the image consists of monotone gradation regions. One way to realize such a switch is sketched below.
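 The following Python sketch illustrates one possible form of this switch, under stated assumptions: the least-squares line is fitted to the neighbouring (luma, chroma) samples, the LUT-based predictor is passed in as a callable, and the direct comparison of squared fitting errors is an assumption rather than the embodiment's exact criterion.

```python
def choose_predictor(samples, lut_predict):
    """samples: list of (y, c) pairs from the neighbouring decoded pixels.
    Returns the predictor (luma -> chroma) with the smaller fitting error."""
    n = len(samples)
    sy = sum(y for y, _ in samples)
    sc = sum(c for _, c in samples)
    syy = sum(y * y for y, _ in samples)
    syc = sum(y * c for y, c in samples)
    denom = n * syy - sy * sy
    a = (n * syc - sy * sc) / denom if denom else 0.0   # least-squares slope
    b = (sc - a * sy) / n                               # least-squares offset
    linear = lambda y: int(round(a * y + b))
    err_linear = sum((c - linear(y)) ** 2 for y, c in samples)
    err_lut = sum((c - lut_predict(y)) ** 2 for y, c in samples)
    return linear if err_linear <= err_lut else lut_predict
```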
 [Moving Image Encoding Device]
 The configuration of the moving image encoding device 2 according to the present embodiment is described below with reference to FIG. 21. Members identical to those already described are given the same reference signs, and their description is omitted.
  (Overview of the Moving Image Encoding Device)
 Roughly speaking, the moving image encoding device 2 is a device that generates and outputs encoded data #1 by encoding an input image #10.
  (Configuration of the Moving Image Encoding Device)
 FIG. 21 is a functional block diagram showing the configuration of the moving image encoding device 2. As shown in FIG. 21, the moving image encoding device 2 includes an encoding setting unit 21, an inverse quantization / inverse transform unit 22, a predicted image generation unit 23, an adder 24, a frame memory 25, a subtractor 26, a transform / quantization unit 27, and a variable-length coding unit 28.
 The encoding setting unit 21 generates image data for encoding and various setting information based on the input image #10. Specifically, it generates the following image data and setting information.
 First, the encoding setting unit 21 generates a leaf CU image #100 for the target leaf CU by sequentially dividing the input image #10 into slices and then into LCUs.
 The encoding setting unit 21 also generates header information H' based on the result of this division. The header information H' includes (1) information about the size and shape of the LCUs belonging to the target slice and their positions within the target slice, and (2) CU information CU' about the size and shape of the leaf CUs belonging to each LCU and their positions within the target LCU.
 Furthermore, the encoding setting unit 21 generates PU setting information PUI' with reference to the leaf CU image #100 and the CU information CU'. The PU setting information PUI' includes information on all combinations of (1) the possible patterns for partitioning the target leaf CU and (2) the prediction modes that can be assigned to each partition.
 The encoding setting unit 21 supplies the leaf CU image #100 to the subtractor 26, supplies the header information H' to the variable-length coding unit 28, and supplies the PU setting information PUI' to the predicted image generation unit 23.
 The inverse quantization / inverse transform unit 22 restores the per-block prediction residuals by applying inverse quantization and an inverse DCT (Inverse Discrete Cosine Transform) to the per-block quantized prediction residuals supplied from the transform / quantization unit 27. It then integrates the per-block prediction residuals according to the partitioning pattern specified by the TU partition information, generating the prediction residual D for the target leaf CU, and supplies this prediction residual D to the adder 24.
 The predicted image generation unit 23 generates a predicted image Pred for the target leaf CU with reference to the locally decoded image P' recorded in the frame memory 25 and the PU setting information PUI'. When inter-channel prediction is used for chrominance prediction, the predicted image generation unit 23 refers to the decoded luminance image PY. The predicted image generation unit 23 sets the prediction parameters obtained in the predicted image generation process in the PU setting information PUI' and transfers the resulting PU setting information PUI' to the variable-length coding unit 28. The predicted image generation process performed by the predicted image generation unit 23 is the same as that performed by the predicted image generation unit 13 of the moving image decoding device 1, so its description is omitted here.
 The adder 24 generates the decoded image P for the target leaf CU by adding the predicted image Pred supplied from the predicted image generation unit 23 to the prediction residual D supplied from the inverse quantization / inverse transform unit 22.
 The decoded images P are sequentially recorded in the frame memory 25. At the time the target LCU is decoded, the frame memory 25 holds the decoded images corresponding to all LCUs decoded before the target LCU (for example, all LCUs preceding it in raster scan order).
 The subtractor 26 generates the prediction residual D for the target leaf CU by subtracting the predicted image Pred from the leaf CU image #100, and supplies the generated prediction residual D to the transform / quantization unit 27.
 The transform / quantization unit 27 generates quantized prediction residuals by applying a DCT (Discrete Cosine Transform) and quantization to the prediction residual D.
 Specifically, the transform / quantization unit 27 refers to the leaf CU image #100 and the CU information CU' to determine the pattern for partitioning the target leaf CU into one or more blocks, and divides the prediction residual D into per-block prediction residuals according to the determined pattern.
 The transform / quantization unit 27 then generates prediction residuals in the frequency domain by applying the DCT to the per-block prediction residuals, and generates the per-block quantized prediction residuals by quantizing these frequency-domain residuals.
 The transform / quantization unit 27 also generates TU setting information TUI' that includes the generated per-block quantized prediction residuals, TU partition information specifying the partitioning pattern of the target leaf CU, and information on all possible patterns for partitioning the target leaf CU into blocks. It supplies the generated TU setting information TUI' to the inverse quantization / inverse transform unit 22 and the variable-length coding unit 28.
 The variable-length coding unit 28 generates and outputs the encoded data #1 based on the TU setting information TUI', the PU setting information PUI', and the header information H'.
  (Flow of the Process of Encoding the Chrominance Channel Processing Order and Prediction Source Channels)
 Next, the flow of the process by which the moving image encoding device 2 encodes the chrominance channel processing order and the prediction source channels will be described with reference to FIG. 22. FIG. 22 is a flowchart showing an example of this flow.
 When the encoding process starts, the predicted image generation unit 23 selects the luminance prediction mode for the luminance predicted image generation process for the target partition (S200).
 Next, the predicted image generation unit 23 generates the luminance predicted image based on the selected prediction mode (S201).
 The predicted image generation unit 23 then enters loop LP21, which creates a chrominance predicted image for each pattern of the chrominance channel processing order. In loop LP21, the following chrominance predicted image creation process is performed for the target partition.
 First, when the inter-channel prediction process starts, the predicted image generation unit 23 sets chrominance A and its prediction source channels, and chrominance B and its prediction source channels, following each processing order pattern described with reference to FIGS. 8, 16, and 17 (S203).
 That is, in step S203 the inter-channel prediction unit 351 sets one of the U channel and the V channel as chrominance A, for which channel estimation is performed first, and the other as chrominance B, for which estimation is performed after chrominance A. The inter-channel prediction unit 351 also sets one or more prediction source channels for each of chrominance A and chrominance B.
 Next, the predicted image generation unit 23 selects the prediction mode for chrominance A (S204) and, following the settings made in step S203, generates a chrominance predicted image for chrominance A by inter-channel prediction (S205).
 Next, the predicted image generation unit 23 selects the prediction mode for chrominance B (S206) and, following the settings made in step S203, generates a chrominance predicted image for chrominance B by inter-channel prediction (S207). The chrominance predicted image creation then continues with the next pattern, returning to the top of loop LP21 (S208).
 Loop LP21 ends when the chrominance predicted image creation process has been completed for every pattern of the chrominance channel processing order.
 Next, from the results of the inter-channel prediction performed in loop LP21, the predicted image generation unit 23 selects the combination of chrominance channel processing order and prediction source channels best suited to encoding (S209).
 The predicted image generation unit 23 then encodes the inter-channel prediction flag, the chrominance channel processing order flag, and the second-channel prediction source channel specifier by including them in the intra prediction parameters PP (S210).
 Note that, in step S210, the predicted image generation unit 23 may instead encode an inter-channel prediction index indicating the combination of processing order and prediction source channels. A sketch of the search performed in loop LP21 follows.
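 The following Python sketch illustrates the search in loop LP21, under stated assumptions: the candidate prediction source sets and the cost function are hypothetical stand-ins for the encoder's actual rate-distortion decision.

```python
from itertools import permutations

def select_chroma_order(candidate_sources, cost):
    """candidate_sources[ch]: iterable of prediction source choices allowed
    for chroma channel ch; cost(order, src_a, src_b): encoding cost of one
    (processing order, sources) combination. Returns the cheapest setup."""
    best = None
    for chan_a, chan_b in permutations(("U", "V"), 2):   # processing orders
        for src_a in candidate_sources[chan_a]:          # sources for chroma A
            for src_b in candidate_sources[chan_b]:      # sources for chroma B
                c = cost((chan_a, chan_b), src_a, src_b)
                if best is None or c < best[0]:
                    best = (c, (chan_a, chan_b), src_a, src_b)
    return best   # (cost, processing order, source for A, source for B)
```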
 [2] Embodiment 2
 Another embodiment of the present invention is described below with reference to FIGS. 23 to 26. For convenience of description, members having the same functions as those in the drawings described in Embodiment 1 are given the same reference signs, and their description is omitted.
 [Image Decoding Device]
  (Configuration)
 First, the configuration of the predicted image generation unit 13A according to the present embodiment will be described with reference to FIG. 23. FIG. 23 is a functional block diagram showing another example of the configuration of the predicted image generation unit 13.
 The predicted image generation unit 13A initializes the LUT once per LCU and updates the LUT for each block (target partition) within the same LCU.
 As shown in FIG. 23, in the predicted image generation unit 13A the inter-channel prediction unit 351 is replaced with an inter-channel prediction unit 351A.
 In addition, unlike the predicted image generation unit 13 shown in FIG. 1, an LUT deriving unit (correlation deriving means) 16 is provided separately from the predicted image generation unit 13A.
 These differences are described below.
 First, unlike the LUT deriving unit 134, which newly derives an LUT for each target partition, the LUT deriving unit 16 newly derives an LUT for each target LCU and then updates the LUT per partition. The difference between the LUT deriving unit 16 and the LUT deriving unit 134 is thus the unit in which the LUT is derived. The rest of the configuration can be the same, so its detailed description is omitted here.
 The inter-channel prediction unit 351A is modified to refer to the LUT deriving unit 16.
  (Overview of the Chrominance Predicted Image Generation Process)
 Next, the general flow of the chrominance predicted image generation process in the predicted image generation unit 13A will be described with reference to FIG. 24. FIG. 24 is a flowchart illustrating this general flow.
 When the chrominance predicted image generation process starts, the LUT deriving unit 16 determines whether the target partition is the first block processed in the LCU (S30). If it is (YES in S30), the LUT is initialized (S31).
 In the LUT initialization in step S31, LUT[n] = 128 and the sample-present flag LUT_F[n] = 0 are assigned (n = 0 to 255 for LUT and LUT_F). This initialization is performed for both the U channel and the V channel. For the sample-present flag LUT_F, LUT_F[n] = 1 means a sample is present, and LUT_F[n] = 0 means no sample is present.
 Next, the LUT deriving unit 16 updates the LUT of each channel while referring to the locally decoded image P' (S32). The details of this LUT update process are described later.
 If the target partition is not the first block processed in the LCU (NO in S30), the LUT deriving unit 16 executes the LUT update process (S32) without initializing the LUT.
 Next, the inter-channel prediction determination unit 133 refers to the inter-channel prediction flag to determine whether the inter-channel prediction mode applies (S33).
 If the result indicates that the inter-channel prediction mode does not apply (NO in S33), the intra-channel prediction unit 352 generates the chrominance predicted image PredC without inter-channel prediction (S36), and the process ends.
 If the result indicates the inter-channel prediction mode (YES in S33), the inter-channel prediction unit 351A generates the chrominance predicted image PredC by inter-channel prediction with reference to the LUT updated by the LUT deriving unit 16 (S35). The process then ends. This flow is sketched below.
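 The following Python sketch illustrates the FIG. 24 control flow, under stated assumptions: the helper callables and the partition attribute names are hypothetical, and the update step corresponds to the FIG. 25 process sketched later.

```python
def chroma_pred_for_partition(state, partition, first_in_lcu, inter_flag,
                              update_lut, predict_inter, predict_intra):
    """state carries lut and lut_f across the partitions of one LCU."""
    if first_in_lcu:                   # S30 / S31: reset once per LCU
        state.lut = [128] * 256        # default mid-range chroma value
        state.lut_f = [0] * 256        # 0 = no sample registered yet
    update_lut(state.lut, state.lut_f,
               partition.adjacent_samples)              # S32 (see FIG. 25)
    if inter_flag:                                      # S33
        return predict_inter(state.lut, partition)      # S35
    return predict_intra(partition)                     # S36
```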
  (Flow of the LUT Update Process)
 Next, the flow of the LUT update process performed by the LUT deriving unit 16 will be described with reference to FIG. 25. FIG. 25 is a flowchart showing an example of this flow.
 As shown in FIG. 25, when the LUT update process starts, the LUT deriving unit 16 enters the registration loop LP41 over the luminance pixels adjacent to the target partition (S400).
 In loop LP41, the following update process is performed for each luminance pixel. First, the LUT deriving unit 16 obtains the luminance value n of the luminance pixel being processed and the chrominance value m at the chrominance pixel position corresponding to that luminance pixel position (S401).
 Next, the LUT deriving unit 16 determines whether a chrominance value is registered in LUT[n] (S402). That is, the LUT deriving unit 16 determines that a chrominance value is registered if LUT_F[n] is 1 (sample present), and that no chrominance value is registered if LUT_F[n] is 0 (no sample).
 If the determination result indicates that no chrominance value is registered in LUT[n] (NO in S402), the LUT deriving unit 16 registers the chrominance value m obtained in step S401 in LUT[n] as-is, assigns 1 (sample present) to LUT_F[n] (S404), and returns to the top of loop LP41 (S405).
 If the determination result indicates that LUT[n] is already registered, the LUT deriving unit 16 computes (m + LUT[n] + 1) / 2, that is, the average of the obtained chrominance value m and the already-registered chrominance value, and assigns this average to the chrominance value m (S403). The LUT deriving unit 16 then registers this averaged chrominance value m in LUT[n], assigns 1 (sample present) to LUT_F[n] (S404), and returns to the top of loop LP41 (S405).
 Loop LP41 ends when the update process has been performed for every adjacent luminance pixel.
 After loop LP41 ends, the process enters loop LP42, which interpolates the unregistered LUT entries (S406).
 In loop LP42, the following interpolation process is performed for each entry from n = 0 to n = 255. First, the LUT deriving unit 16 determines whether LUT_F[n] is 0 (no sample) (S407).
 The LUT itself is retained while the same LCU is being processed, and in the first block processed in the LCU, every entry becomes registered through interpolation. Therefore, instead of checking whether LUT[n] is registered, the process here checks whether a sample exists; entries for which no sample exists become subject to interpolation again.
 If LUT_F[n] is not 0 (no sample), that is, if a sample has been registered in LUT[n] (NO in S407), the interpolation process continues with the next entry, returning to the top of loop LP42 (S410).
 If LUT_F[n] is 0 (no sample) (YES in S407), the LUT deriving unit 16 searches for the nearest sample-present entries below and above n (S408).
 Here, if no sample point closer in value than the interpolation sample points of the immediately preceding target partition has been registered in the LUT as an entry for the current target partition, interpolation is performed as shown in FIG. 13.
 On the other hand, if a sample point closer in value than the interpolation sample points of the immediately preceding target partition has been registered in the LUT as an entry for the current target partition, re-interpolation is performed as shown in FIG. 26.
 The example in FIG. 26 shows a case in which sample points Smpl1 and Smpl2 were registered and interpolation was performed for the immediately preceding target partition, after which sample point Smpl3 is newly registered for the current target partition.
 In this case the LUT deriving unit 16 searches for the nearest sample-present entries below and above n as follows.
 First, the LUT deriving unit 16 searches downward from n, that is, over nL smaller than n, for a registered entry. In FIG. 26 this finds sample point Smpl1.
 The LUT deriving unit 16 also searches upward from n, that is, over nR larger than n, for a registered entry. Whereas sample point Smpl2 was found for the preceding target partitions, sample point Smpl3 is now found, because Smpl3 has been registered for the current target partition.
 Next, the LUT deriving unit 16 registers in LUT[n] the value obtained by linearly interpolating between LUT[nL] and LUT[nR] (S409).
 If the search in step S408 finds only one of nL and nR, the value of the registered entry that was found is registered in LUT[n].
 The interpolation process then continues, returning to the top of loop LP42 (S410).
 Loop LP42 ends when the interpolation process has been completed for all entries from n = 0 to n = 255. The LUT update process then ends.
 In step S403, the LUT deriving unit 16 computed the average of the obtained chrominance value m and the registered chrominance value as (m + LUT[n] + 1) / 2, but the computation is not limited to this; for example, a 1:3 weighted average of the obtained chrominance value m and the registered chrominance value may be computed instead. A sketch of this update process follows.
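 The following Python sketch illustrates the FIG. 25 update for one chroma channel, under stated assumptions: the function name is hypothetical, lut and lut_f persist across the partitions of one LCU, and the nearest-entry search is written out inline.

```python
def update_lut(lut, lut_f, samples):
    """samples: (n, m) pairs, n = luma value, m = chroma value, taken from
    the pixels adjacent to the target partition."""
    for n, m in samples:                 # loop LP41 (S400 to S405)
        if lut_f[n]:                     # sample already present (S402)
            m = (m + lut[n] + 1) // 2    # average with the old value (S403)
        lut[n] = m                       # register the value (S404)
        lut_f[n] = 1

    for n in range(256):                 # loop LP42 (S406 to S410)
        if lut_f[n]:
            continue                     # sample present: keep the entry
        nl = next((i for i in range(n - 1, -1, -1) if lut_f[i]), None)
        nr = next((i for i in range(n + 1, 256) if lut_f[i]), None)
        if nl is not None and nr is not None:            # S409
            lut[n] = (lut[nl] * (nr - n) + lut[nr] * (n - nl)) // (nr - nl)
        elif nl is not None:
            lut[n] = lut[nl]             # only the lower neighbour was found
        elif nr is not None:
            lut[n] = lut[nr]             # only the upper neighbour was found
```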
  (Effects)
 As described above, the predicted image generation unit 13A initializes the LUT once per LCU and updates the LUT for each block (target partition) within the same LCU.
 An LUT also tends to become more accurate as the number of samples used to derive it increases. If samples are gathered from too wide an area, however, the correlation may be lost or become markedly weaker. The LCU is a range that is wider than the target partition and over which a correlation can still be assumed to exist.
 By deriving the LUT from the correlation of the sample points within such an LCU, more sample points can be obtained over a wider range than the target partition alone, and the prediction accuracy of inter-channel prediction can be improved.
 [Application Examples]
 The moving image encoding device 2 and the moving image decoding device 1 described above can be mounted on and used in various devices that transmit, receive, record, and reproduce moving images. The moving images may be natural moving images captured by a camera or the like, or artificial moving images (including CG and GUI images) generated by a computer or the like.
 First, the use of the moving image encoding device 2 and the moving image decoding device 1 described above for the transmission and reception of moving images is described with reference to FIG. 30.
 FIG. 30(a) is a block diagram showing the configuration of a transmission device PROD_A equipped with the moving image encoding device 2. As shown in FIG. 30(a), the transmission device PROD_A includes an encoding unit PROD_A1 that obtains encoded data by encoding a moving image, a modulation unit PROD_A2 that obtains a modulated signal by modulating a carrier wave with the encoded data obtained by the encoding unit PROD_A1, and a transmission unit PROD_A3 that transmits the modulated signal obtained by the modulation unit PROD_A2. The moving image encoding device 2 described above is used as this encoding unit PROD_A1.
 As sources of the moving image input to the encoding unit PROD_A1, the transmission device PROD_A may further include a camera PROD_A4 that captures moving images, a recording medium PROD_A5 on which moving images are recorded, an input terminal PROD_A6 for inputting moving images from outside, and an image processing unit A7 that generates or processes images. FIG. 30(a) illustrates a configuration in which the transmission device PROD_A includes all of these, but some may be omitted.
 The recording medium PROD_A5 may hold unencoded moving images, or moving images encoded with a recording encoding scheme different from the transmission encoding scheme. In the latter case, a decoding unit (not shown) that decodes the encoded data read from the recording medium PROD_A5 according to the recording encoding scheme may be interposed between the recording medium PROD_A5 and the encoding unit PROD_A1.
 FIG. 30(b) is a block diagram showing the configuration of a reception device PROD_B equipped with the moving image decoding device 1. As shown in FIG. 30(b), the reception device PROD_B includes a reception unit PROD_B1 that receives a modulated signal, a demodulation unit PROD_B2 that obtains encoded data by demodulating the modulated signal received by the reception unit PROD_B1, and a decoding unit PROD_B3 that obtains a moving image by decoding the encoded data obtained by the demodulation unit PROD_B2. The moving image decoding device 1 described above is used as this decoding unit PROD_B3.
 As destinations of the moving image output by the decoding unit PROD_B3, the reception device PROD_B may further include a display PROD_B4 that displays the moving image, a recording medium PROD_B5 for recording the moving image, and an output terminal PROD_B6 for outputting the moving image to the outside. FIG. 30(b) illustrates a configuration in which the reception device PROD_B includes all of these, but some may be omitted.
 The recording medium PROD_B5 may be for recording unencoded moving images, or may hold images encoded with a recording encoding scheme different from the transmission encoding scheme. In the latter case, an encoding unit (not shown) that encodes the moving image obtained from the decoding unit PROD_B3 according to the recording encoding scheme may be interposed between the decoding unit PROD_B3 and the recording medium PROD_B5.
 The transmission medium over which the modulated signal is transmitted may be wireless or wired. The transmission mode may be broadcasting (here, a transmission mode in which the destination is not specified in advance) or communication (here, a transmission mode in which the destination is specified in advance). That is, the transmission of the modulated signal may be realized by any of wireless broadcasting, wired broadcasting, wireless communication, and wired communication.
 For example, a broadcasting station (broadcasting equipment and the like) / receiving station (television receiver and the like) of terrestrial digital broadcasting is an example of a transmission device PROD_A / reception device PROD_B that transmits and receives a modulated signal by wireless broadcasting. A broadcasting station (broadcasting equipment and the like) / receiving station (television receiver and the like) of cable television broadcasting is an example of a transmission device PROD_A / reception device PROD_B that transmits and receives a modulated signal by wired broadcasting.
 A server (workstation and the like) / client (television receiver, personal computer, smartphone, and the like) for a VOD (Video On Demand) service or a video sharing service using the Internet is an example of a transmission device PROD_A / reception device PROD_B that transmits and receives a modulated signal by communication (usually, either a wireless or wired transmission medium is used in a LAN, and a wired transmission medium is used in a WAN). Here, personal computers include desktop PCs, laptop PCs, and tablet PCs, and smartphones include multifunctional mobile phone terminals.
 A client of a video sharing service has, in addition to the function of decoding encoded data downloaded from the server and displaying it on a display, the function of encoding a moving image captured by a camera and uploading it to the server. That is, a client of a video sharing service functions as both the transmission device PROD_A and the reception device PROD_B.
 Next, the use of the moving image encoding device 2 and the moving image decoding device 1 described above for the recording and reproduction of moving images is described with reference to FIG. 31.
 FIG. 31(a) is a block diagram showing the configuration of a recording device PROD_C equipped with the moving image encoding device 2 described above. As shown in FIG. 31(a), the recording device PROD_C includes an encoding unit PROD_C1 that obtains encoded data by encoding a moving image, and a writing unit PROD_C2 that writes the encoded data obtained by the encoding unit PROD_C1 to a recording medium PROD_M. The moving image encoding device 2 described above is used as this encoding unit PROD_C1.
 The recording medium PROD_M may be (1) of a type built into the recording device PROD_C, such as an HDD (Hard Disk Drive) or SSD (Solid State Drive), (2) of a type connected to the recording device PROD_C, such as an SD memory card or USB (Universal Serial Bus) flash memory, or (3) loaded into a drive device (not shown) built into the recording device PROD_C, such as a DVD (Digital Versatile Disc) or BD (Blu-ray Disc: registered trademark).
 As sources of the moving image input to the encoding unit PROD_C1, the recording device PROD_C may further include a camera PROD_C3 that captures moving images, an input terminal PROD_C4 for inputting moving images from outside, a reception unit PROD_C5 for receiving moving images, and an image processing unit C6 that generates or processes images. FIG. 31(a) illustrates a configuration in which the recording device PROD_C includes all of these, but some may be omitted.
 The reception unit PROD_C5 may receive unencoded moving images, or may receive encoded data encoded with a transmission encoding scheme different from the recording encoding scheme. In the latter case, a transmission decoding unit (not shown) that decodes the encoded data encoded with the transmission encoding scheme may be interposed between the reception unit PROD_C5 and the encoding unit PROD_C1.
 Examples of such a recording device PROD_C include a DVD recorder, a BD recorder, and an HDD (Hard Disk Drive) recorder (in these cases, the input terminal PROD_C4 or the reception unit PROD_C5 is the main source of moving images). A camcorder (in which case the camera PROD_C3 is the main source of moving images), a personal computer (in which case the reception unit PROD_C5 or the image processing unit C6 is the main source of moving images), and a smartphone (in which case the camera PROD_C3 or the reception unit PROD_C5 is the main source of moving images) are also examples of such a recording device PROD_C.
 FIG. 31(b) is a block diagram showing the configuration of a playback device PROD_D equipped with the moving image decoding device 1 described above. As shown in FIG. 31(b), the playback device PROD_D includes a reading unit PROD_D1 that reads the encoded data written to the recording medium PROD_M, and a decoding unit PROD_D2 that obtains a moving image by decoding the encoded data read by the reading unit PROD_D1. The moving image decoding device 1 described above is used as this decoding unit PROD_D2.
 The recording medium PROD_M may be (1) of a type built into the playback device PROD_D, such as an HDD or SSD, (2) of a type connected to the playback device PROD_D, such as an SD memory card or USB flash memory, or (3) loaded into a drive device (not shown) built into the playback device PROD_D, such as a DVD or BD.
 As destinations of the moving image output by the decoding unit PROD_D2, the playback device PROD_D may further include a display PROD_D3 that displays the moving image, an output terminal PROD_D4 for outputting the moving image to the outside, and a transmission unit PROD_D5 that transmits the moving image. FIG. 31(b) illustrates a configuration in which the playback device PROD_D includes all of these, but some may be omitted.
 The transmission unit PROD_D5 may transmit unencoded moving images, or may transmit encoded data encoded with a transmission encoding scheme different from the recording encoding scheme. In the latter case, an encoding unit (not shown) that encodes the moving image with the transmission encoding scheme may be interposed between the decoding unit PROD_D2 and the transmission unit PROD_D5.
 Examples of such a playback device PROD_D include a DVD player, a BD player, and an HDD player (in these cases, the output terminal PROD_D4 to which a television receiver or the like is connected is the main destination of moving images). A television receiver (in which case the display PROD_D3 is the main destination of moving images), digital signage (also called an electronic signboard or electronic bulletin board; the display PROD_D3 or the transmission unit PROD_D5 is the main destination of moving images), a desktop PC (in which case the output terminal PROD_D4 or the transmission unit PROD_D5 is the main destination of moving images), a laptop or tablet PC (in which case the display PROD_D3 or the transmission unit PROD_D5 is the main destination of moving images), and a smartphone (in which case the display PROD_D3 or the transmission unit PROD_D5 is the main destination of moving images) are also examples of such a playback device PROD_D.
 [Conclusion]
 Finally, each block of the moving image decoding device 1 and the moving image encoding device 2 described above may be realized in hardware by logic circuits formed on an integrated circuit (IC chip), or in software using a CPU (Central Processing Unit).
 In the latter case, each of the above devices includes a CPU that executes the instructions of the programs realizing each function, a ROM (Read Only Memory) that stores the programs, a RAM (Random Access Memory) into which the programs are loaded, and a storage device (recording medium) such as a memory that stores the programs and various data. The object of the present invention can also be achieved by supplying each of the above devices with a recording medium on which the program code (executable program, intermediate code program, or source program) of the control program of each device, which is software realizing the functions described above, is recorded in a computer-readable manner, and having the computer (or a CPU or MPU) read and execute the program code recorded on the recording medium.
 Examples of the recording medium include tapes such as magnetic tape and cassette tape; disks including magnetic disks such as floppy (registered trademark) disks / hard disks and optical discs such as CD-ROM / MO / MD / DVD / CD-R / Blu-ray Disc (registered trademark); cards such as IC cards (including memory cards) / optical cards; semiconductor memories such as mask ROM / EPROM / EEPROM / flash ROM; and logic circuits such as PLDs (Programmable Logic Devices) and FPGAs (Field Programmable Gate Arrays).
 Each of the above devices may also be configured to be connectable to a communication network, and the program code may be supplied via the communication network. The communication network is not particularly limited as long as it can transmit the program code. For example, the Internet, an intranet, an extranet, a LAN, ISDN, VAN, a CATV communication network, a virtual private network, a telephone network, a mobile communication network, a satellite communication network, and the like can be used. The transmission medium constituting the communication network may likewise be any medium capable of transmitting the program code and is not limited to a specific configuration or type. For example, wired media such as IEEE 1394, USB, power line carrier, cable TV lines, telephone lines, and ADSL (Asymmetric Digital Subscriber Line) lines can be used, as can wireless media such as infrared (IrDA, remote control), Bluetooth (registered trademark), IEEE 802.11 wireless, HDR (High Data Rate), NFC (Near Field Communication), DLNA (Digital Living Network Alliance), mobile phone networks, satellite links, and terrestrial digital networks. The present invention can also be realized in the form of a computer data signal embedded in a carrier wave, in which the program code is embodied by electronic transmission.
 本発明は上述した実施形態に限定されるものではなく、請求項に示した範囲で種々の変更が可能である。すなわち、請求項に示した範囲で適宜変更した技術的手段を組み合わせて得られる実施形態についても本発明の技術的範囲に含まれる。 The present invention is not limited to the above-described embodiment, and various modifications can be made within the scope indicated in the claims. That is, embodiments obtained by combining technical means appropriately modified within the scope of the claims are also included in the technical scope of the present invention.
 例えば、上述の本実施形態に係る画像復号装置は、符号化データから動画像を復号するものとしたが、動画像を対象とするか静止画像を対象とするかを問わず、画像復号装置一般に適用できるものである。画像符号化装置についても同様である。 For example, the image decoding device according to the present embodiment described above decodes a moving image from encoded data. However, regardless of whether the image is a moving image or a still image, the image decoding device generally Applicable. The same applies to the image encoding device.
  (Appendix 1)
 The LCU (Largest Coding Unit) in the above embodiments corresponds to the root of the coding tree in HEVC (High Efficiency Video Coding), which has been proposed as the successor standard to H.264/MPEG-4 AVC, and a leaf CU corresponds to a CU (Coding Unit, also called a leaf of the coding tree) in HEVC. The PU and TU in the above embodiments correspond, respectively, to the prediction tree and the transform tree in HEVC. A partition of a PU in the above embodiments corresponds to a PU (Prediction Unit) in HEVC, and a block obtained by splitting a TU corresponds to a TU (Transform Unit) in HEVC.
  (Appendix 2)
 As described above, the video encoding device 2 and the video decoding device 1 according to one aspect of the present invention are configured to improve the likelihood of obtaining higher prediction accuracy even in cases where the components of the pixels in the locally decoded image vary in such a way that inter-channel prediction according to a linear correlation would lose accuracy.
 Specifically, the predicted image generation unit 13 includes a LUT derivation unit 134 that refers to the locally decoded image P' located around the target partition and derives, as a LUT, the nonlinear correlation between the already decoded luminance (Y) channel and the chrominance (U, V) channels to be decoded, and an inter-channel prediction unit 351 that, for the target partition, generates the chrominance predicted image PredC from the decoded luminance image PY according to the LUT derived by the LUT derivation unit 134.
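 To make this flow concrete, here is a minimal Python sketch of LUT-based inter-channel prediction of this kind. The function names (derive_lut, predict_chroma), the dict-based table, and the per-luma-value averaging are illustrative assumptions, not the normative procedure of the embodiment.

    from collections import defaultdict

    def derive_lut(neighbor_luma, neighbor_chroma):
        # Derive a luma -> chroma LUT from co-located samples of the locally
        # decoded image around the target partition: one chroma average per
        # observed luma value.
        buckets = defaultdict(list)
        for y, c in zip(neighbor_luma, neighbor_chroma):
            buckets[y].append(c)
        return {y: sum(cs) / len(cs) for y, cs in buckets.items()}

    def predict_chroma(target_luma, lut):
        # Generate the chroma predicted image for the target partition by
        # converting each decoded luma pixel through the LUT (assumes every
        # target luma value was observed among the neighboring samples).
        return [lut[y] for y in target_luma]

    ny = [100, 100, 102, 104, 104, 106]   # neighboring decoded Y samples
    nu = [60, 62, 63, 66, 64, 70]         # co-located decoded U samples
    lut = derive_lut(ny, nu)
    print(predict_chroma([100, 102, 104, 106], lut))  # [61.0, 63.0, 65.0, 70.0]

 Averaging all chroma samples observed for a given luma value is one simple way to turn the scatter of (Y, C) points into a single-valued table; handling luma values absent from the table is discussed below.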
  (Appendix 3)
 As described above, the image decoding device according to the present invention is an image decoding device that decodes image data encoded by generating a predicted image for each of a plurality of channels representing the components of an image and adding a prediction residual to the generated predicted image, the device including: channel decoding means for decoding one or more of the plurality of channels for a target block; correlation derivation means for deriving a nonlinear correlation between the one or more channels already decoded by the channel decoding means and another channel to be decoded, by referring to a locally decoded image that is located around the target block and has been decoded for each of the plurality of channels; and predicted image generation means for generating, for the target block, the predicted image of the other channel from the decoded image of the already decoded one or more channels, according to the derived correlation.
 In the above configuration, a channel is a generalization of a component constituting an image. For example, in the YUV color space, the luminance component and the chrominance components correspond to channels; that is, the channels include a luminance channel and chrominance channels. The chrominance channels include a U channel representing the U component of the chrominance and a V channel representing the V component. The channels may instead relate to the RGB color space. The image decoding process is performed for each of these channels.
 A locally decoded image is a decoded image that has been decoded for each of the plurality of channels and is located around the target block.
 A target block is any of the various processing units of the decoding process, such as a coding unit, a transform unit, or a prediction unit. The processing units also include units obtained by further subdividing coding units, transform units, and prediction units.
 The periphery of the target block includes, for example, the pixels adjacent to the target block, the block adjacent to its left, and the block adjacent above it.
 It is also known that, within a local region, a correlation exists between the luminance channel and the chrominance channels. In the above configuration, therefore, the nonlinear correlation between the one or more channels already decoded by the channel decoding means and the other channel to be decoded is derived by referring, as samples, to the locally decoded image located around the target block.
 Here, the nonlinear correlation can be derived, for example, by examining the correspondence between points each consisting of a luminance value and a chrominance value.
 In the YUV color space, for example, the nonlinear correlation can be derived from the association between the luminance value and the chrominance value of each pixel in the locally decoded image. The correlation may be realized as a LUT in which a decoded channel is associated with a channel to be decoded, or it may be expressed by a function consisting of a relational expression that holds between the decoded channel and the channel to be decoded.
 According to the above configuration, the channel to be decoded in the target block is predicted from a channel already decoded in the target block, according to the nonlinear correlation derived in this way. Such prediction is hereinafter also referred to as inter-channel prediction.
 In inter-channel prediction, for example, the pixel values of the decoded image of an already decoded channel are converted according to the nonlinear correlation to obtain the pixel values of the predicted image of the channel to be decoded. Here, a pixel value is a generalization of the value of a component constituting an image.
 Accordingly, even in cases where the components of the pixels in the locally decoded image vary in such a way that inter-channel prediction according to a linear correlation would lose accuracy, the likelihood of obtaining higher prediction accuracy can be improved.
 In the image decoding device according to the present invention, preferably, when a pixel value of the decoded image of the already decoded channel does not occur as the corresponding pixel value of any pixel in the locally decoded image, the correlation derivation means derives the nonlinear correlation by interpolating with the pixel values of those pixels in the locally decoded image whose pixel values lie within a predetermined range of that pixel value.
 In the above configuration, a pixel value of an image is the value of one of the components constituting the image.
 The case where a pixel value of the decoded image of the already decoded channel does not occur as the corresponding pixel value of any pixel in the locally decoded image is, in the luminance example above, the case where a pixel value of the decoded image of the already decoded luminance channel, which is the source of the prediction, does not appear as the luminance value of any pixel in the locally decoded image.
 The nonlinear correlation may be derived in advance, or it may be derived at the point where it is found that a pixel value of the decoded image of the already decoded channel does not occur as the corresponding pixel value of any pixel in the locally decoded image.
 According to the above configuration, for a value that does not appear as a pixel value of any pixel in the locally decoded image, the correlation can be obtained by linear interpolation from the nearby samples on either side of that value. For example, for sample points each consisting of a luminance value and a chrominance value, the correlation can be derived by linearly interpolating between adjacent points. As another example of a nonlinear correlation, the points can be approximated by cubic interpolation.
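 As a sketch of this interpolation step (the helper name lut_lookup is invented for illustration), a luma value absent from the table can be filled from the nearest populated entries on either side:

    def lut_lookup(lut, y):
        # Return the chroma value for luma y; if y never occurred among the
        # neighboring samples, linearly interpolate between the nearest
        # observed luma values below and above it.
        if y in lut:
            return lut[y]
        lower = max((k for k in lut if k < y), default=None)
        upper = min((k for k in lut if k > y), default=None)
        if lower is None:                  # below all observed samples
            return lut[upper]
        if upper is None:                  # above all observed samples
            return lut[lower]
        t = (y - lower) / (upper - lower)
        return lut[lower] + t * (lut[upper] - lut[lower])

    lut = {100: 61.0, 104: 65.0}
    print(lut_lookup(lut, 102))  # 63.0, interpolated between the 100 and 104 entries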
 Since the correlation is thus derived by interpolation using pixel values obtained from samples of the locally decoded image, a value for the channel to be decoded can be predicted accurately from a value for the already decoded channel even when that value does not appear as a pixel value of any pixel in the locally decoded image.
 In the image decoding device according to the present invention, the correlation derivation means preferably derives, as the correlation, the relationship between a plurality of already decoded channels and the channel to be decoded.
 Specifically, in the YUV color space, this configuration makes it possible to predict the V channel to be decoded from the already decoded luminance channel and U channel.
 According to the above configuration, inter-channel prediction is performed using, as the correlation, the relationship between a plurality of already decoded channels and the channel to be decoded, so the prediction accuracy can be improved.
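 A sketch of how such a multi-channel correlation could be tabulated follows; the coarse quantization of the (Y, U) pair into bins of width step is an illustrative assumption made so that sparse neighbor samples still populate the table, not a detail taken from the embodiment.

    from collections import defaultdict

    def derive_lut_2d(neighbor_y, neighbor_u, neighbor_v, step=4):
        # Derive a (Y, U) -> V table from the neighboring decoded samples,
        # quantizing Y and U so nearby sample points share one table entry.
        buckets = defaultdict(list)
        for y, u, v in zip(neighbor_y, neighbor_u, neighbor_v):
            buckets[(y // step, u // step)].append(v)
        return {key: sum(vs) / len(vs) for key, vs in buckets.items()}

    lut2d = derive_lut_2d([100, 101, 120], [60, 61, 80], [30, 32, 50])
    print(lut2d[(25, 15)])  # 31.0: mean V of the two samples near (Y, U) = (100, 60)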
 In the image decoding device according to the present invention, the correlation derivation means preferably derives a correlation between a plurality of pixel values included in the locally decoded image of an already decoded channel and the channel to be decoded.
 Specifically, in the YUV color space, this configuration derives a correlation between a plurality of luminance values included in the locally decoded image of the already decoded luminance channel and the chrominance to be decoded. In this case, the plurality of luminance values are desirably luminance values within a predetermined range; for example, it is desirable to derive a nonlinear correlation over adjacent luminance values. This is because predicting from a plurality of luminance values averages out the influence of noise.
 According to the above configuration, inter-channel prediction is performed using the correlation between a plurality of pixel values of an already decoded channel and the channel to be decoded, so the prediction accuracy can be improved.
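 One illustrative way to realize this, on top of the single-value table sketched earlier, is to smooth each LUT entry over a window of adjacent luminance values; the helper name and the window radius are assumptions for the sketch.

    def smooth_lut(lut, radius=2):
        # Replace each entry with the average of the entries whose luma keys
        # lie within +/- radius, so the prediction for one luma value draws on
        # several neighboring luminance values and sample noise is averaged out.
        smoothed = {}
        for y in lut:
            window = [lut[k] for k in lut if abs(k - y) <= radius]
            smoothed[y] = sum(window) / len(window)
        return smoothed

    print(smooth_lut({100: 61.0, 102: 63.0, 104: 65.0}))
    # {100: 62.0, 102: 63.0, 104: 64.0}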
 The image decoding device according to the present invention preferably includes: processing information acquisition means for acquiring channel decoding processing order information, which indicates the order in which the plurality of channels are to be decoded, and prediction source channel information, which specifies from which of the already decoded channels a channel to be decoded should be predicted; and prediction control means for performing control such that the plurality of channels are decoded in the order indicated by the channel decoding processing order information and such that each channel to be decoded is predicted from the already decoded channel specified by the prediction source channel information.
 According to the above configuration, inter-channel prediction can be controlled based on the channel decoding processing order information and the prediction source channel information. These pieces of information are included, for example, in the encoded data containing the encoded image data. The device can therefore interoperate with, for example, an image encoding device that encodes the channel decoding processing order information and the prediction source channel information into the encoded data and transmits them.
 In the image decoding device according to the present invention, preferably, the processing information acquisition means acquires the channel decoding processing order information and the prediction source channel information encoded for each predetermined processing unit of the decoding process, and the prediction control means performs the above control according to the acquired channel decoding processing order information and prediction source channel information, respectively.
 According to the above configuration, the control can be changed for each processing unit; the actual decoding order and the prediction source channel can be set per processing unit.
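 As a hedged sketch of how a decoder could act on these two pieces of side information (the parameter names order and pred_src, and the callback signatures, are invented for illustration, and the residual addition is omitted for brevity):

    def decode_block(block, order, pred_src, decode_channel, predict_from):
        # Decode the block's channels in the order given by the channel
        # decoding processing order information; a channel with no entry in
        # pred_src is decoded directly, while the others are predicted from
        # the listed, already decoded source channels.
        decoded = {}
        for ch in order:                          # e.g. ("Y", "U", "V")
            sources = pred_src.get(ch)            # e.g. {"U": ("Y",), "V": ("Y", "U")}
            if sources is None:
                decoded[ch] = decode_channel(block, ch)
            else:
                decoded[ch] = predict_from(block, ch,
                                           {s: decoded[s] for s in sources})
        return decoded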
 In the image decoding device according to the present invention, preferably, for a target block included in a block group containing a plurality of blocks, the correlation derivation means derives the correlation using the locally decoded images decoded in the already processed blocks of that block group.
 In the above configuration, a spatial correlation tends to exist between the blocks included in a block group.
 That is, here the locally decoded image "located around the target block" is the locally decoded image decoded in the "already processed blocks included in the block group being processed".
 Also, the more sample points there are, the higher the accuracy of prediction based on the above correlation tends to be. Therefore, if as many sample points as possible can be obtained from blocks between which a spatial correlation exists, the accuracy of inter-channel prediction can be improved.
 In the above configuration, the inter-channel prediction for the target block uses a correlation derived from the locally decoded images decoded in the already processed blocks of the block group, that is, in blocks that are likely to be spatially correlated with the target block. This improves the accuracy of inter-channel prediction.
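 A sketch of gathering correlation samples from the already processed blocks of the same block group (for example, the blocks of one LCU in processing order); the per-block data layout here is an illustrative assumption:

    def gather_group_samples(group_blocks, target_index):
        # Collect (luma, chroma) sample pairs from every block that precedes
        # the target block in its block group, so the LUT is derived from as
        # many spatially correlated samples as possible.
        luma, chroma = [], []
        for blk in group_blocks[:target_index]:   # already processed blocks
            luma.extend(blk["y"])
            chroma.extend(blk["u"])
        return luma, chroma

    group = [{"y": [100, 102], "u": [61, 63]},
             {"y": [104, 106], "u": [65, 70]},
             {"y": [101, 103], "u": [62, 64]}]    # the target block itself
    luma, chroma = gather_group_samples(group, target_index=2)
    print(len(luma))  # 4 samples taken from the two processed blocks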
 As described above, the image encoding device according to the present invention is an image encoding device that generates encoded data by encoding the prediction residual obtained by subtracting, from the original image, a predicted image generated for each of a plurality of channels representing the components of an image, the device including: channel decoding means for decoding one or more of the plurality of channels for a target block; correlation derivation means for deriving a nonlinear correlation between the one or more channels already decoded by the channel decoding means and another channel to be decoded, by referring to a locally decoded image that is located around the target block and has been decoded for each of the plurality of channels; and predicted image generation means for generating, for the target block, the predicted image of the other channel from the decoded image of the already decoded one or more channels, according to the derived correlation.
 As described above, the data structure of encoded data according to the present invention is a data structure of encoded data generated by encoding the prediction residual obtained by subtracting, from the original image, a predicted image generated for each of a plurality of channels representing the components of an image, the data structure containing, for an image decoding device that decodes the encoded image data by generating a predicted image for each of the plurality of channels and adding the prediction residual to the generated predicted image: channel decoding processing order information indicating the order in which the plurality of channels are to be decoded for a target block; and prediction source channel designation information specifying, for the target block, from which decoded image of one or more already decoded channels the predicted image of another channel is to be generated, according to the nonlinear correlation between the already decoded one or more channels and the other channel to be decoded, derived by referring to a locally decoded image that is located around the target block and has been decoded for each of the plurality of channels.
 The image encoding device and the data structure of encoded data configured as described above provide the same effects as the image decoding device according to the present invention.
 The image encoding device may include, in the data structure of the encoded data, the channel decoding processing order information indicating the order in which the plurality of channels are to be decoded and the prediction source channel information specifying from which of the already decoded channels a channel to be decoded should be predicted. The image encoding device may encode this information, for example, as side information.
 (Appendix 4)
 The embodiments of the present invention have been described above in detail with reference to the drawings, but the specific configuration is not limited to these embodiments; designs and the like within a scope that does not depart from the gist of the present invention are also included in the scope of the claims.
 The present invention can be suitably applied to a decoding device that decodes encoded data and to an encoding device that generates encoded data. It can also be suitably applied to the data structure of encoded data that is generated by an encoding device and referenced by a decoding device.
  1 video decoding device (image decoding device)
  2 video encoding device (image encoding device)
 13 predicted image generation unit (channel decoding means)
 14 adder (channel decoding means)
 16 LUT derivation unit (correlation derivation means)
134 LUT derivation unit (correlation derivation means)
135 chrominance predicted image generation unit (predicted image generation means, processing information acquisition means, prediction control means)
351 inter-channel prediction unit (predicted image generation means)
 PP intra prediction parameters (data structure of encoded data)

Claims (9)

  1.  An image decoding device that decodes image data encoded by generating a predicted image for each of a plurality of channels representing the components of an image and adding a prediction residual to the generated predicted image, the device comprising:
     channel decoding means for decoding one or more of the plurality of channels for a target block;
     correlation derivation means for deriving a nonlinear correlation between the one or more channels already decoded by the channel decoding means and another channel to be decoded, by referring to a locally decoded image that is located around the target block and has been decoded for each of the plurality of channels; and
     predicted image generation means for generating, for the target block, the predicted image of the other channel from the decoded image of the already decoded one or more channels, according to the derived correlation.
  2.  The image decoding device according to claim 1, wherein, when a pixel value of the decoded image of the already decoded channel does not occur as the corresponding pixel value of any pixel in the locally decoded image, the correlation derivation means derives the nonlinear correlation by interpolating with the pixel values of those pixels in the locally decoded image whose pixel values lie within a predetermined range of that pixel value.
  3.  The image decoding device according to claim 1 or 2, wherein the correlation derivation means derives, as the correlation, the relationship between a plurality of already decoded channels and the channel to be decoded.
  4.  The image decoding device according to claim 1 or 2, wherein the correlation derivation means derives a correlation between a plurality of pixel values included in the locally decoded image of an already decoded channel and the channel to be decoded.
  5.  The image decoding device according to any one of claims 1 to 4, further comprising:
     processing information acquisition means for acquiring channel decoding processing order information indicating the order in which the plurality of channels are to be decoded, and prediction source channel information specifying from which of the already decoded channels a channel to be decoded should be predicted; and
     prediction control means for performing control such that the plurality of channels are decoded in the order indicated by the channel decoding processing order information and such that each channel to be decoded is predicted from the already decoded channel specified by the prediction source channel information.
  6.  The image decoding device according to claim 5, wherein the processing information acquisition means acquires the channel decoding processing order information and the prediction source channel information encoded for each predetermined processing unit of the decoding process, and
     the prediction control means performs the control according to the acquired channel decoding processing order information and prediction source channel information, respectively.
  7.  The image decoding device according to any one of claims 1 to 6, wherein, for a target block included in a block group containing a plurality of blocks, the correlation derivation means derives the correlation using the locally decoded images decoded in the already processed blocks of that block group.
  8.  An image encoding device that generates encoded data by encoding a prediction residual obtained by subtracting, from an original image, a predicted image generated for each of a plurality of channels representing the components of an image, the device comprising:
     channel decoding means for decoding one or more of the plurality of channels for a target block;
     correlation derivation means for deriving a nonlinear correlation between the one or more channels already decoded by the channel decoding means and another channel to be decoded, by referring to a locally decoded image that is located around the target block and has been decoded for each of the plurality of channels; and
     predicted image generation means for generating, for the target block, the predicted image of the other channel from the decoded image of the already decoded one or more channels, according to the derived correlation.
  9.  A data structure of encoded data generated by encoding a prediction residual obtained by subtracting, from an original image, a predicted image generated for each of a plurality of channels representing the components of an image, the data structure containing, for an image decoding device that decodes the encoded image data by generating a predicted image for each of the plurality of channels and adding the prediction residual to the generated predicted image:
     channel decoding processing order information indicating the order in which the plurality of channels are to be decoded for a target block; and
     prediction source channel information specifying, for the target block, from which decoded image of one or more already decoded channels the predicted image of another channel is to be generated, according to the nonlinear correlation between the already decoded one or more channels and the other channel to be decoded, derived by referring to a locally decoded image that is located around the target block and has been decoded for each of the plurality of channels.
PCT/JP2011/078953 2010-12-15 2011-12-14 Image decoding device, image coding device, and data structure of coded data WO2012081636A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2010279454 2010-12-15
JP2010-279454 2010-12-15

Publications (1)

Publication Number Publication Date
WO2012081636A1 (en)

Family

ID=46244732

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2011/078953 WO2012081636A1 (en) 2010-12-15 2011-12-14 Image decoding device, image coding device, and data structure of coded data

Country Status (1)

Country Link
WO (1) WO2012081636A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH03182171A (en) * 1989-12-11 1991-08-08 Dainippon Printing Co Ltd Coding and reproducing method or color picture information
JP2009534876A (en) * 2006-03-23 2009-09-24 サムスン エレクトロニクス カンパニー リミテッド Image encoding method and apparatus, decoding method and apparatus

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Haruhisa Kato et al., "Adaptive Inter-channel Prediction for Intra Prediction Error in H.264," The Journal of the Institute of Image Information and Television Engineers, vol. 64, no. 11, 1 November 2010, pp. 1711-1717 *


Legal Events

121 Ep: the EPO has been informed by WIPO that EP was designated in this application (ref document number: 11848672; country of ref document: EP; kind code of ref document: A1)
NENP Non-entry into the national phase (ref country code: DE)
122 Ep: PCT application non-entry in European phase (ref document number: 11848672; country of ref document: EP; kind code of ref document: A1)
NENP Non-entry into the national phase (ref country code: JP)