WO2015056735A1 - Image decoding device - Google Patents

Image decoding device

Info

Publication number
WO2015056735A1
Authority
WO
WIPO (PCT)
Prior art keywords
picture
reference picture
prediction
unit
layer
Prior art date
Application number
PCT/JP2014/077530
Other languages
French (fr)
Japanese (ja)
Inventor
Tomohiro Ikai
Takeshi Tsukuba
Tomoyuki Yamamoto
Original Assignee
Sharp Corporation (シャープ株式会社)
Priority date
Filing date
Publication date
Application filed by Sharp Corporation
Priority to JP2015542651A (granted as JP6401707B2)
Publication of WO2015056735A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N 19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/597 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • H04N 19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Definitions

  • the present invention relates to an image decoding apparatus that performs residual prediction between a target layer and a reference layer.
  • HEVC (High-Efficiency Video Coding) is known. There is also known a technique (Non-patent Document 1) for generating encoded data from a plurality of mutually related moving images by encoding them into layers (hierarchies); this is called a hierarchical encoding technique (or scalable encoding technique).
  • MV-HEVC (Multi-View HEVC), based on HEVC, is also known (Non-patent Document 2).
  • MV-HEVC supports view scalability.
  • In view scalability, a moving image corresponding to a plurality of different viewpoints (views) is divided into layers and encoded to generate hierarchical encoded data.
  • a moving image corresponding to a basic viewpoint (base view) is encoded as a lower layer.
  • a moving image corresponding to a different viewpoint is encoded as an upper layer after applying inter-layer prediction.
  • 3D-HEVC (3D High Efficiency Video Coding)
  • ARP (Advanced Residual Prediction)
  • In ARP, pictures that have been decoded at different viewpoints are used to estimate the residual in the target picture. More specifically, in ARP, the residual of the target picture is estimated from the difference between the decoded picture (refIvRefPic) of the reference layer at a time other than the target time and the decoded picture (currIvRefPic) of the reference layer at the target time.
  • Pictures P101, P102, and P103 are pictures in the target view (layer), and pictures P201, P202, and P203 are pictures of the reference view (RefViewIdx). Note that the number assigned to each picture indicates the decoding order.
  • A residual is derived from the picture P203, which corresponds to the current POC and is specified by currIvRefPic, and the picture P202, which is the first picture in RefPicListX and is specified by refIvRefPic.
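The residual derivation described above can be sketched in code. This is an illustrative sketch, not the actual 3D-HEVC decoding process: the function name, the flat sample lists, and the weight argument (standing in for the value signalled by iv_res_pred_weight_idx) are all hypothetical.

```python
# Hypothetical sketch of Advanced Residual Prediction (ARP): the residual of
# the target block is estimated as the weighted difference between the
# reference-layer picture at the current time (currIvRefPic) and the
# reference-layer picture at another time (refIvRefPic).

def arp_predict(pred_samples, curr_iv_ref, ref_iv_ref, weight):
    """Add the estimated inter-layer residual to the motion-compensated
    prediction. Arguments are flat lists of sample values; weight is
    typically 1 or 0.5 (signalled via iv_res_pred_weight_idx)."""
    return [p + weight * (c - r)
            for p, c, r in zip(pred_samples, curr_iv_ref, ref_iv_ref)]
```

With weight 1, the full estimated residual (currIvRefPic minus refIvRefPic) is added to the prediction; with weight 0.5, half of it.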
  • Since the picture P202 specified by refIvRefPic is not referenced from the picture P203 specified by currIvRefPic, it may be deleted from the DPB before the picture P103 is decoded; in that case, the picture P202 specified by refIvRefPic cannot be used.
  • The target picture is decoded while referring to the pictures stored in the reference picture list. Since the reference picture list changes in units of slices, there is also the problem that the position of the picture P102 (arpRefPic) in the reference picture list may change from slice to slice. Furthermore, there is a possibility that the POC of the picture P102 to be referred to matches the POC of the target picture P103 being decoded.
  • The present invention has been made in view of the above problems, and an object of the present invention is to realize an image decoding apparatus or the like that can avoid situations in which there is no reference picture in the reference layer, in which the position of the reference picture in the target layer changes between slices, or in which the picture order of the target picture being decoded during residual prediction matches the picture order of the picture in the target layer to be referred to.
  • An image decoding apparatus includes a reference picture determination unit that determines whether or not a residual prediction reference picture is usable, and a residual prediction application unit that performs residual prediction using the residual prediction reference picture, wherein the reference picture determination unit performs the determination according to whether or not a reference picture of the residual prediction reference layer is stored in a DPB (Decoded Picture Buffer).
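The determination performed by the reference picture determination unit can be sketched as follows. Representing the DPB as a list of (view index, POC) pairs is a simplifying assumption for illustration, not the actual DPB data structure.

```python
def residual_prediction_available(dpb, ref_view_idx, ref_poc):
    """Return True if the reference-layer picture needed for residual
    prediction (refIvRefPic) is still stored in the DPB. dpb is a list of
    (view_idx, poc) pairs for the decoded pictures currently held."""
    return (ref_view_idx, ref_poc) in dpb
```

Residual prediction would only be applied (or its flag decoded) when this check succeeds, avoiding the situation in which refIvRefPic has already been evicted from the DPB.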
  • An image decoding apparatus includes a reference picture deriving unit that derives a reference picture for residual prediction, and a residual prediction applying unit that performs residual prediction using the residual prediction reference picture.
  • the reference picture deriving unit derives, as the residual prediction reference picture, a reference picture having a picture order different from the picture order of the target picture among the reference pictures included in the reference picture list.
  • An image decoding apparatus includes a reference picture selection unit that derives a reference picture for residual prediction, and a residual prediction application unit that performs residual prediction using the residual prediction reference picture when it is available.
  • The reference picture selection unit scans the reference picture list in order from the top, and when the absolute value of the difference between the POC of the reference picture RefPicListX[i] and the POC (PicOrderCntVal) of the target picture currPic is smaller than the smallest POC difference found so far, sets that reference picture as the residual prediction reference picture.
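The scan described above can be sketched as follows. Interpreting "smaller than the POC difference" as tracking the smallest non-zero absolute POC difference is an assumption of this sketch; skipping zero differences guarantees that the derived arpRefPic has a POC different from the target picture's POC.

```python
def derive_arp_ref_pic(ref_pic_list, curr_poc):
    """Scan RefPicListX in order and keep the reference picture whose POC
    has the smallest non-zero absolute difference from the target POC
    (PicOrderCntVal). Pictures with the same POC (inter-view references)
    are skipped, so the result always differs from currPic. Returns the
    selected POC, or None if no usable picture exists."""
    best_poc, best_diff = None, None
    for poc in ref_pic_list:
        diff = abs(poc - curr_poc)
        if diff != 0 and (best_diff is None or diff < best_diff):
            best_poc, best_diff = poc, diff
    return best_poc
```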
  • An image decoding apparatus includes a reference picture deriving unit that derives, as a residual prediction reference picture, a reference picture having a picture order different from the picture order of the target picture among the reference pictures included in a reference picture list, and an inter prediction parameter decoding control unit that decodes the residual prediction flag iv_res_pred_weight_idx when the residual prediction reference picture is available.
  • the image decoding apparatus has an effect that it is possible to avoid a situation in which a decoded picture of the reference layer cannot be used during residual prediction for a target picture.
  • According to the image decoding apparatus of another aspect of the present invention, it is possible to ensure that the POC of arpRefPic is different from the POC of the current picture.
  • (A) is a conceptual diagram showing an example of a reference picture list,
  • (b) is a conceptual diagram showing an example of vector candidates.
  • (A) is a diagram showing the syntax table used when the entropy decoding unit of the image decoding apparatus decodes short-term reference picture set information.
  • (B) is a part of the syntax table used when the entropy decoding unit of the image decoding apparatus decodes a slice header, showing the part related to a reference picture set.
  • (A) is a part of the syntax table referred to when decoding the VPS extension (vps_extension) included in the VPS, showing the part corresponding to IL-RPS information.
  • (B) is a part of the syntax table referred to at the time of slice decoding, showing the part corresponding to IL-RPS information.
  • (A) is a diagram showing the relationship between the dependency types, in the case where inter-layer image prediction and inter-layer motion prediction exist as types of inter-layer prediction, and the availability of each inter-layer prediction.
  • (B) is a diagram showing the relationship between the sub-RPSs (inter-layer pixel RPS and inter-layer motion-limited RPS) included in the inter-layer RPS generated in the image decoding apparatus and the dependency type.
  • (A) is a part of the syntax table used when the entropy decoding unit of the image decoding apparatus decodes a slice header, showing the part related to the reference picture list.
  • (B) is a diagram showing the syntax table used when the entropy decoding unit of the image decoding apparatus decodes reference picture list correction information.
  • FIG. 2 is a schematic diagram showing the configuration of the image transmission system 1 according to the present embodiment.
  • the image transmission system 1 is a system that transmits a code obtained by encoding a plurality of layer images and displays an image obtained by decoding the transmitted code.
  • the image transmission system 1 includes an image encoding device 11, a network 21, an image decoding device 31, and an image display device 41.
  • the signal T indicating a plurality of layer images (also referred to as texture images) is input to the image encoding device 11.
  • a layer image is an image that is viewed or photographed at a certain resolution and a certain viewpoint.
  • each of the plurality of layer images is referred to as a viewpoint image.
  • the viewpoint corresponds to the position or observation point of the photographing apparatus.
  • the plurality of viewpoint images are images taken by the left and right photographing devices toward the subject.
  • the image encoding device 11 encodes each of the signals to generate an encoded stream Te (encoded data). Details of the encoded stream Te will be described later.
  • a viewpoint image is a two-dimensional image (planar image) observed at a certain viewpoint.
  • the viewpoint image is indicated by, for example, a luminance value or a color signal value for each pixel arranged in a two-dimensional plane.
  • one viewpoint image or a signal indicating the viewpoint image is referred to as a picture.
  • When spatial scalable coding is performed, the plurality of layer images include a base layer image having a low resolution and an enhancement layer image having a high resolution.
  • When SNR scalable coding is performed, the plurality of layer images include a base layer image with low image quality and an enhancement layer image with high image quality.
  • view scalable coding, spatial scalable coding, and SNR scalable coding may be arbitrarily combined.
  • In the following, encoding and decoding handle, as the plurality of layer images, images including at least a base layer image and an image other than the base layer image.
  • the image on the referenced side is referred to as a first layer image
  • the image on the referencing side is referred to as a second layer image.
  • the base layer image is treated as a first layer image and the enhancement layer image is treated as a second layer image.
  • the enhancement layer image include an image of a viewpoint other than the base view and a depth image.
  • the network 21 transmits the encoded stream Te generated by the image encoding device 11 to the image decoding device 31.
  • the network 21 can be configured by, for example, the Internet, a wide area network (WAN: Wide Area Network), a local area network (LAN: Local Area Network), or a combination thereof.
  • the network 21 is not necessarily limited to a bidirectional communication network, and may be a unidirectional or bidirectional communication network that transmits broadcast waves such as terrestrial digital broadcasting and satellite broadcasting.
  • the network 21 may be replaced by a storage medium that records the encoded stream Te, such as a DVD (Digital Versatile Disc) or a BD (Blu-ray Disc).
  • the image decoding device 31 decodes each of the encoded streams Te transmitted by the network 21, and generates a plurality of decoded layer images Td (decoded viewpoint images Td).
  • The image display device 41 displays all or part of the plurality of decoded layer images Td generated by the image decoding device 31. For example, in view scalable coding, a 3D image (stereoscopic image) or a free viewpoint image is displayed when all of them are displayed, and a 2D image is displayed when a part of them is displayed.
  • the image display device 41 includes, for example, a display device such as a liquid crystal display or an organic EL (Electro-luminescence) display.
  • In spatial scalable coding and SNR scalable coding, when the image decoding device 31 and the image display device 41 have high processing capability, a high-quality enhancement layer image is displayed; when they have only lower processing capability, a base layer image that does not require the higher processing capability and display capability of the enhancement layer is displayed.
  • FIG. 3 is a diagram showing a hierarchical structure of data in the encoded stream Te.
  • the encoded stream Te illustratively includes a sequence and a plurality of pictures constituting the sequence.
  • (A) to (f) of FIG. 3 respectively show a sequence layer that defines a sequence SEQ, a picture layer that defines a picture PICT, a slice layer that defines a slice S, a slice data layer that defines slice data, a coding tree layer that defines a coding tree unit, and a coding unit layer that defines a coding unit (CU).
  • Sequence layer: In the sequence layer, a set of data referred to by the image decoding device 31 for decoding a sequence SEQ to be processed (hereinafter also referred to as a target sequence) is defined.
  • the sequence SEQ includes a video parameter set VPS (Video Parameter Set), a sequence parameter set SPS (Sequence Parameter Set), a picture parameter set PPS (Picture Parameter Set), a picture PICT, and supplemental enhancement information SEI (Supplemental Enhancement Information).
  • the value indicated after # indicates the layer ID.
  • FIG. 3 shows an example in which encoded data of # 0 and # 1, that is, layer 0 and layer 1, exists, but the types and number of layers are not limited to this.
  • In the video parameter set VPS, for a moving image composed of a plurality of layers, a set of encoding parameters common to a plurality of moving images, and sets of encoding parameters related to the plurality of layers included in the moving image and to the individual layers, are defined.
  • the sequence parameter set SPS defines a set of encoding parameters that the image decoding device 31 refers to in order to decode the target sequence. For example, the width and height of the picture are defined.
  • a set of encoding parameters referred to by the image decoding device 31 in order to decode each picture in the target sequence is defined.
  • a quantization width reference value (pic_init_qp_minus26) used for picture decoding and a flag (weighted_pred_flag) indicating application of weighted prediction are included.
  • A plurality of PPSs may exist. In that case, one of the plurality of PPSs is selected for each picture in the target sequence.
  • Picture layer: In the picture layer, a set of data referred to by the image decoding device 31 for decoding a picture PICT to be processed (hereinafter also referred to as a target picture) is defined. As shown in FIG. 3B, the picture PICT includes slices S0 to SNS-1 (NS is the total number of slices included in the picture PICT).
  • Slice layer: In the slice layer, a set of data referred to by the image decoding device 31 for decoding a slice S to be processed (also referred to as a target slice) is defined. As shown in FIG. 3C, the slice S includes a slice header SH and slice data SDATA.
  • the slice header SH includes a coding parameter group that the image decoding device 31 refers to in order to determine a decoding method of the target slice.
  • the slice type designation information (slice_type) that designates the slice type is an example of an encoding parameter included in the slice header SH.
  • (1) An I slice using only intra prediction at the time of encoding, (2) a P slice using unidirectional prediction or intra prediction at the time of encoding, or (3) a B slice using unidirectional prediction, bidirectional prediction, or intra prediction at the time of encoding may be used.
  • the slice header SH may include a reference (pic_parameter_set_id) to the picture parameter set PPS included in the sequence layer.
  • the slice data SDATA includes a coding tree block (CTB: Coded Tree Block) (coding tree unit CTU) as shown in FIG.
  • The CTB is a block of a fixed size (for example, 64×64) constituting a slice, and may also be called a largest coding unit (LCU: Largest Coding Unit).
  • the encoding tree layer defines a set of data that the image decoding device 31 refers to in order to decode the encoding tree block to be processed.
  • the coding tree unit is divided by recursive quadtree division.
  • a node having a tree structure obtained by recursive quadtree partitioning is called a coding tree.
  • An intermediate node of the quadtree is a coding quadtree (CQT: Coded Quad Tree), and the CTU is defined as including the highest CQT.
  • The CQT includes a split flag (split_flag). When split_flag is 1, the CQT is divided into four CQTs. When split_flag is 0, the CQT includes a coding unit (CU: Coded Unit), which is a terminal node.
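The recursive split described above can be sketched as follows. The read_flag callback, the minimum CU size handling, and the (x, y, size) representation of a CU are illustrative assumptions, not the actual HEVC parsing process.

```python
def parse_cqt(read_flag, x, y, size, min_size, cus):
    """Recursively parse a coding quadtree: read_flag() returns the next
    split_flag; when it is 1 the block splits into four quadrants, and
    when it is 0 (or the minimum CU size is reached) a CU is emitted as
    an (x, y, size) tuple appended to cus."""
    if size > min_size and read_flag():
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                parse_cqt(read_flag, x + dx, y + dy, half, min_size, cus)
    else:
        cus.append((x, y, size))
```

For example, the flag sequence 1, 0, 0, 0, 0 on a 64×64 CTU splits it once into four 32×32 CUs.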
  • the encoding unit CU is a basic unit of the encoding process.
  • the encoding unit layer defines a set of data referred to by the image decoding device 31 in order to decode the encoding unit to be processed.
  • the encoding unit includes a CU header CUH, a prediction tree, and a transform tree.
  • the CU header CUH it is defined whether the coding unit is a unit using intra prediction or a unit using inter prediction.
  • the encoding unit is the root of a prediction tree (prediction tree; PT) and a transformation tree (transform tree; TT).
  • the CU header CUH is included between the prediction tree and the conversion tree or after the conversion tree.
  • the coding unit is divided into one or a plurality of prediction blocks, and the position and size of each prediction block are defined.
  • the prediction block is one or a plurality of non-overlapping areas constituting the coding unit.
  • the prediction tree includes one or a plurality of prediction blocks obtained by the above division.
  • Prediction processing is performed for each prediction block.
  • a prediction block which is a unit of prediction is also referred to as a prediction unit (PU, prediction unit).
  • Intra prediction is prediction within the same picture
  • inter prediction refers to prediction processing performed between different pictures (for example, between display times and between layer images).
  • The division method is encoded by part_mode in the encoded data, and is one of 2N×2N (the same size as the encoding unit), 2N×N, 2N×nU, 2N×nD, N×2N, nL×2N, nR×2N, and N×N.
  • 2N×nU indicates that a 2N×2N encoding unit is divided into two regions of 2N×0.5N and 2N×1.5N in order from the top.
  • 2N×nD indicates that a 2N×2N encoding unit is divided into two regions of 2N×1.5N and 2N×0.5N in order from the top.
  • nL×2N indicates that a 2N×2N encoding unit is divided into two regions of 0.5N×2N and 1.5N×2N in order from the left.
  • nR×2N indicates that a 2N×2N encoding unit is divided into two regions of 1.5N×2N and 0.5N×2N in order from the left. Since the number of divisions is one of 1, 2, and 4, the number of PUs included in the CU is 1 to 4. These PUs are expressed as PU0, PU1, PU2, and PU3 in order.
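The partition modes can be summarized by the PU sizes they produce. This is an illustrative table keyed by hypothetical mode names; n2 is the CU side length 2N, so 0.5N is n2 // 4.

```python
def pu_sizes(part_mode, n2):
    """Return the (width, height) of each PU for a 2Nx2N coding unit of
    side n2 (= 2N), following the symmetric and asymmetric partition
    modes described above. Mode names are illustrative labels."""
    n = n2 // 2   # N
    q = n2 // 4   # 0.5N
    modes = {
        'PART_2Nx2N': [(n2, n2)],
        'PART_2NxN':  [(n2, n), (n2, n)],
        'PART_Nx2N':  [(n, n2), (n, n2)],
        'PART_NxN':   [(n, n)] * 4,
        'PART_2NxnU': [(n2, q), (n2, n2 - q)],       # top 0.5N, bottom 1.5N
        'PART_2NxnD': [(n2, n2 - q), (n2, q)],       # top 1.5N, bottom 0.5N
        'PART_nLx2N': [(q, n2), (n2 - q, n2)],       # left 0.5N, right 1.5N
        'PART_nRx2N': [(n2 - q, n2), (q, n2)],       # left 1.5N, right 0.5N
    }
    return modes[part_mode]
```

For a 32×32 CU, 2N×nU yields a 32×8 PU0 above a 32×24 PU1, matching the 0.5N/1.5N split described above.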
  • the encoding unit is divided into one or a plurality of transform blocks, and the position and size of each transform block are defined.
  • the transform block is one or a plurality of non-overlapping areas constituting the encoding unit.
  • the conversion tree includes one or a plurality of conversion blocks obtained by the above division.
  • Divisions in the transformation tree include one in which an area having the same size as the encoding unit is assigned as a transform block, and one by recursive quadtree division, like the division of the tree block described above.
  • a transform block that is a unit of transformation is also referred to as a transform unit (TU).
  • the prediction image of the prediction unit is derived by the prediction parameter attached to the prediction unit.
  • the prediction parameters include a prediction parameter for intra prediction or a prediction parameter for inter prediction.
  • Prediction parameters for inter prediction (inter prediction parameters) will be described.
  • the inter prediction parameter includes prediction list use flags predFlagL0 and predFlagL1, reference picture indexes refIdxL0 and refIdxL1, and vectors mvL0 and mvL1.
  • the prediction list use flags predFlagL0 and predFlagL1 are flags indicating whether or not reference picture lists called L0 list and L1 list are used, respectively, and a reference picture list corresponding to a value of 1 is used.
  • prediction list use flag information can also be expressed by an inter prediction flag inter_pred_idc described later.
  • A prediction list use flag is used in the prediction image generation unit and prediction parameter memory described later, and the inter prediction flag inter_pred_idc is used when decoding, from the encoded data, information on which reference picture list is used.
  • Syntax elements for deriving inter prediction parameters included in the encoded data include, for example, a partition mode part_mode, a merge flag merge_flag, a merge index merge_idx, an inter prediction flag inter_pred_idc, a reference picture index refIdxLX, a prediction vector index mvp_LX_idx, and a difference vector mvdLX.
  • FIG. 4A is a conceptual diagram illustrating an example of a reference picture list.
  • In the reference picture list 601, the five rectangles arranged in a horizontal row each indicate a reference picture.
  • the codes P1, P2, Q0, P3, and P4 shown in order from the left end to the right are codes indicating respective reference pictures.
  • P such as P1 indicates the viewpoint P
  • Q of Q0 indicates a viewpoint Q different from the viewpoint P.
  • the subscript numbers attached to P and Q indicate the picture order count POC.
  • a downward arrow directly below refIdxLX indicates that the reference picture index refIdxLX is an index that refers to the reference picture Q0 stored in the DPB 3061 of the decoded picture management unit 306.
  • FIG. 5 is a conceptual diagram illustrating an example of a reference picture.
  • the horizontal axis indicates the display time
  • the vertical axis indicates the viewpoint.
  • the rectangles shown in FIG. 5 with 2 rows and 3 columns (6 in total) indicate pictures.
  • the rectangle in the second column from the left in the lower row indicates a picture to be decoded (target picture), and the remaining five rectangles indicate reference pictures.
  • a reference picture Q0 indicated by an upward arrow from the target picture is a picture that has the same display time as the target picture and a different viewpoint. In the displacement prediction based on the target picture, the reference picture Q0 is used.
  • a reference picture P1 indicated by a left-pointing arrow from the target picture is a past picture at the same viewpoint as the target picture.
  • a reference picture P2 indicated by a right-pointing arrow from the target picture is a future picture at the same viewpoint as the target picture. In motion prediction based on the target picture, the reference picture P1 or P2 is used.
  • Inter prediction flag and prediction list use flag: The relationship between the inter prediction flag and the prediction list use flags predFlagL0 and predFlagL1 is mutually convertible as follows. Therefore, either the prediction list use flags or the inter prediction flag may be used as the inter prediction parameter. In the following, a determination using the prediction list use flags may be replaced with the inter prediction flag; conversely, a determination using the inter prediction flag may be replaced with the prediction list use flags.
  • >> is a right shift
  • << is a left shift.
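A packing consistent with the shift operators mentioned above places predFlagL0 in bit 0 and predFlagL1 in bit 1. The exact numeric values of Pred_L0, Pred_L1, and Pred_Bi implied here (1, 2, and 3) are an assumption of this sketch.

```python
# Hypothetical mutual conversion between the inter prediction flag
# inter_pred_idc and the prediction list use flags predFlagL0/predFlagL1:
# bit 0 holds predFlagL0 and bit 1 holds predFlagL1.

def to_inter_pred_idc(pred_flag_l0, pred_flag_l1):
    """Pack the two prediction list use flags into one inter_pred_idc."""
    return (pred_flag_l1 << 1) | pred_flag_l0

def to_pred_flags(inter_pred_idc):
    """Unpack inter_pred_idc back into (predFlagL0, predFlagL1)."""
    return inter_pred_idc & 1, inter_pred_idc >> 1
```

Under this packing, L0-only prediction gives 1, L1-only gives 2, and bi-prediction gives 3, and the two representations round-trip exactly.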
  • the prediction parameter decoding (encoding) methods include a merge prediction (merge) mode and an AMVP (Adaptive Motion Vector Prediction) mode.
  • the merge flag merge_flag is a flag for identifying these.
  • the prediction parameter of the target PU is derived using the prediction parameter of the already processed block.
  • the merge prediction mode is a mode in which the prediction list use flag predFlagLX (inter prediction flag inter_pred_idc), the reference picture index refIdxLX, and the vector mvLX are not included in the encoded data, and prediction parameters already derived are used as they are.
  • In the AMVP mode, the inter prediction flag inter_pred_idc, the reference picture index refIdxLX, and the vector mvLX are included in the encoded data.
  • the vector mvLX is encoded as a prediction vector index mvp_LX_idx indicating a prediction vector and a difference vector (mvdLX).
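The reconstruction of mvLX in AMVP mode can be sketched as follows; the list-of-tuples representation of the prediction vector candidates is an illustrative assumption.

```python
def reconstruct_mv(mvp_candidates, mvp_lx_idx, mvd_lx):
    """In AMVP mode the vector mvLX is not coded directly: the decoder
    picks the prediction vector mvpLX selected by mvp_LX_idx from the
    candidate list and adds the decoded difference vector mvdLX."""
    mvp_x, mvp_y = mvp_candidates[mvp_lx_idx]
    mvd_x, mvd_y = mvd_lx
    return (mvp_x + mvd_x, mvp_y + mvd_y)
```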
  • the inter prediction flag inter_pred_idc is data indicating the type and number of reference pictures, and takes one of the values Pred_L0, Pred_L1, and Pred_Bi.
  • Pred_L0 and Pred_L1 indicate that reference pictures stored in the reference picture lists called the L0 list and the L1 list, respectively, are used, and both indicate that one reference picture is used (uni-prediction). Prediction using the L0 list and prediction using the L1 list are called L0 prediction and L1 prediction, respectively.
  • Pred_Bi indicates that two reference pictures are used (bi-prediction), and indicates that two reference pictures stored in the L0 list and the L1 list are used.
  • the prediction vector index mvp_LX_idx is an index indicating a prediction vector
  • the reference picture index refIdxLX is an index indicating a reference picture stored in the reference picture list.
  • LX is a description method used when L0 prediction and L1 prediction are not distinguished.
  • refIdxL0 is a reference picture index used for L0 prediction
  • refIdxL1 is a reference picture index used for L1 prediction
  • refIdx (refIdxLX) is a notation used when refIdxL0 and refIdxL1 are not distinguished.
  • the merge index merge_idx is an index indicating which one of the prediction parameter candidates (merge candidates) derived from the processed block is used as the prediction parameter of the decoding target block.
  • the vector mvLX includes a motion vector and a displacement vector (disparity vector).
  • A motion vector is a vector indicating the positional shift between the position of a block in a picture of a certain layer at a certain display time and the position of the corresponding block in a picture of the same layer at a different display time (for example, an adjacent discrete time).
  • the displacement vector is a vector indicating a positional shift between the position of a block in a picture at a certain display time of a certain layer and the position of a corresponding block in a picture of a different layer at the same display time.
  • the pictures in different layers may be pictures from different viewpoints or pictures with different resolutions.
  • a displacement vector corresponding to pictures of different viewpoints is called a disparity vector.
  • A prediction vector and a difference vector related to the vector mvLX are referred to as a prediction vector mvpLX and a difference vector mvdLX, respectively.
  • Whether the vector mvLX and the difference vector mvdLX are motion vectors or displacement vectors is determined using a reference picture index refIdxLX attached to the vectors.
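The determination described above can be sketched with a simple POC comparison; treating an equal POC as implying a different-layer (displacement/disparity) reference is the criterion implied by the definitions above, since a reference in the same layer at the same display time would be the current picture itself.

```python
def is_disparity_vector(ref_poc, curr_poc):
    """A vector is treated as a displacement (disparity) vector when the
    reference picture indicated by refIdxLX has the same POC as the
    current picture, i.e. it lies in another layer at the same display
    time; otherwise it is a motion vector."""
    return ref_poc == curr_poc
```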
  • FIG. 6 is a schematic diagram illustrating a configuration of the image decoding device 31 according to the present embodiment.
  • the image decoding device 31 includes an entropy decoding unit 301, a prediction parameter decoding unit 302, a decoded picture management unit (reference image storage unit, frame memory) 306, a predicted image generation unit 308, an inverse quantization / inverse DCT unit 311, and an addition unit 312 and a residual storage unit 313 (residual recording unit).
  • the prediction parameter decoding unit 302 includes an inter prediction parameter decoding unit 303 and an intra prediction parameter decoding unit 304.
  • the predicted image generation unit 308 includes an inter predicted image generation unit 309 and an intra predicted image generation unit 310.
  • the entropy decoding unit 301 performs entropy decoding on the encoded stream Te input from the outside, and separates and decodes individual codes (syntax elements).
  • the separated codes include prediction information for generating a prediction image and residual information for generating a difference image.
  • the entropy decoding unit 301 outputs a part of the separated code to the prediction parameter decoding unit 302.
  • Some of the separated codes are, for example, the prediction mode PredMode, split mode part_mode, merge flag merge_flag, merge index merge_idx, inter prediction flag inter_pred_idc, reference picture index refIdxLX, prediction vector index mvp_LX_idx, difference vector mvdLX, and residual prediction flag iv_res_pred_weight_idx. Control of which codes to decode is performed based on an instruction from the prediction parameter decoding unit 302.
  • the entropy decoding unit 301 outputs the quantization coefficient to the inverse quantization / inverse DCT unit 311.
  • the quantization coefficient is a coefficient obtained by performing quantization by performing DCT (Discrete Cosine Transform) on the residual signal in the encoding process.
  • the inter prediction parameter decoding unit 303 refers to the prediction parameter stored in the prediction parameter memory 3067 of the decoded picture management unit 306 based on the code input from the entropy decoding unit 301 and decodes the inter prediction parameter.
  • the inter prediction parameter decoding unit 303 outputs the decoded inter prediction parameters to the prediction image generation unit 308 and stores them in the prediction parameter memory 3067 of the decoded picture management unit 306. Details of the inter prediction parameter decoding unit 303 will be described later.
  • the intra prediction parameter decoding unit 304 refers to the prediction parameter stored in the prediction parameter memory 3067 of the decoded picture management unit 306 based on the code input from the entropy decoding unit 301 and decodes the intra prediction parameter.
  • the intra prediction parameter is a parameter used in a process of predicting a picture block within one picture, for example, an intra prediction mode IntraPredMode.
  • the intra prediction parameter decoding unit 304 outputs the decoded intra prediction parameter to the prediction image generation unit 308 and stores it in the prediction parameter memory 3067 of the decoded picture management unit 306.
  • the intra prediction parameter decoding unit 304 may derive different intra prediction modes for luminance and for color difference.
  • the intra prediction parameter decoding unit 304 decodes the luminance prediction mode IntraPredModeY as the luminance prediction parameter and the color difference prediction mode IntraPredModeC as the color difference prediction parameter.
  • the luminance prediction mode IntraPredModeY has 35 modes, corresponding to planar prediction (0), DC prediction (1), and direction prediction (2 to 34).
  • the color difference prediction mode IntraPredModeC uses one of planar prediction (0), DC prediction (1), direction prediction (2, 3, 4), and LM mode (5).
  • the decoded picture management unit 306 stores the reference picture block generated by the addition unit 312 in the DPB 3061 at a predetermined position for each decoding target picture and block, and outputs, at a predetermined timing, a decoded viewpoint image Td in which the blocks are integrated for each picture to the outside.
  • the decoded picture management unit 306 stores the prediction parameter in the prediction parameter memory 3067 at a predetermined position for each picture and block to be decoded. Details of the decoded picture management unit 306 will be described later with reference to FIG.
  • the prediction image generation unit 308 receives the prediction mode predMode input from the entropy decoding unit 301 and the prediction parameter from the prediction parameter decoding unit 302. Further, the predicted image generation unit 308 reads the reference picture from the DPB 3061 of the decoded picture management unit 306. The predicted image generation unit 308 generates a predicted picture block P (predicted image) using the input prediction parameter and the read reference picture in the prediction mode indicated by the prediction mode predMode.
  • the inter prediction image generation unit 309 generates the predicted picture block P by inter prediction, using the inter prediction parameters input from the inter prediction parameter decoding unit 303 and the read reference picture.
  • the predicted picture block P corresponds to the prediction unit PU.
  • the PU corresponds to a part of a picture composed of a plurality of pixels, and, as described above, is the unit on which the prediction process is performed, that is, the decoding target block on which the prediction process is performed at one time.
  • For each reference picture list (RPL: Reference Picture List) (L0 list or L1 list) whose prediction list use flag predFlagLX is 1, the inter prediction image generation unit 309 reads, from the DPB 3061 of the decoded picture management unit 306, the reference picture block located at the position indicated by the vector mvLX relative to the decoding target block, within the reference picture indicated by the reference picture index refIdxLX.
  • the inter prediction image generation unit 309 performs prediction on the read reference picture block to generate a prediction picture block P.
  • the inter prediction image generation unit 309 outputs the generated prediction picture block P to the addition unit 312.
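The read-out step above can be sketched in Python. This is a minimal illustration, not the patent's implementation: the function name and the DPB layout (a dict of lists of 2-D sample arrays) are hypothetical, and motion vectors are simplified to integer-pel precision (real codecs interpolate quarter-pel positions).

```python
def fetch_inter_pred_block(dpb, pred_flags, ref_idx, mv, xP, yP, w, h):
    """For each reference picture list LX whose use flag predFlagLX is 1,
    read the reference block at the position indicated by mvLX relative to
    the target block at (xP, yP); average the two blocks for bi-prediction."""
    blocks = []
    for lx in (0, 1):
        if not pred_flags[lx]:
            continue
        ref_pic = dpb[lx][ref_idx[lx]]     # reference picture selected by refIdxLX
        mvx, mvy = mv[lx]
        x0, y0 = xP + mvx, yP + mvy        # integer-pel simplification
        blocks.append([row[x0:x0 + w] for row in ref_pic[y0:y0 + h]])
    if len(blocks) == 2:                   # bi-prediction: rounded average of L0/L1
        return [[(a + b + 1) >> 1 for a, b in zip(r0, r1)]
                for r0, r1 in zip(blocks[0], blocks[1])]
    return blocks[0]
```

When only one list is used (uni-prediction), the single fetched block is returned unchanged.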
  • the intra predicted image generation unit 310 performs intra prediction using the intra prediction parameters input from the intra prediction parameter decoding unit 304 and the read reference picture. Specifically, the intra predicted image generation unit 310 reads out, from the DPB 3061 of the decoded picture management unit 306, a reference picture block that belongs to the decoding target picture and lies within a predetermined range from the decoding target block among the already decoded blocks.
  • the predetermined range is, for example, one of the left, upper-left, upper, and upper-right adjacent blocks when the decoding target blocks move sequentially in so-called raster scan order, and varies depending on the intra prediction mode.
  • the raster scan order is an order that proceeds from the top row to the bottom row of each picture, traversing each row from the left end to the right end.
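The raster scan order just described can be written as a two-line generator; this is an illustrative sketch with a hypothetical function name:

```python
def raster_scan_order(width, height):
    """Yield (x, y) block positions row by row, left to right, top to bottom."""
    for y in range(height):
        for x in range(width):
            yield (x, y)
```

For a 3-wide, 2-high grid this yields (0,0), (1,0), (2,0), (0,1), (1,1), (2,1).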
  • the intra predicted image generation unit 310 performs prediction in the prediction mode indicated by the intra prediction mode IntraPredMode for the read reference picture block, and generates a predicted picture block.
  • the intra predicted image generation unit 310 outputs the generated predicted picture block P to the addition unit 312.
  • the intra predicted image generation unit 310 generates a luminance predicted picture block using one of planar prediction (0), DC prediction (1), and direction prediction (2 to 34) according to the luminance prediction mode IntraPredModeY, and generates a color difference predicted picture block using one of planar prediction (0), DC prediction (1), direction prediction (2, 3, 4), and LM mode (5) according to the color difference prediction mode IntraPredModeC.
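Of the modes above, DC prediction (1) is the simplest to illustrate: the block is filled with the rounded average of the already decoded neighboring samples. The sketch below is a simplified illustration with a hypothetical function name, not the exact standardized filter (real codecs additionally smooth some boundary samples):

```python
def dc_prediction(left, top):
    """DC intra prediction: fill the block with the rounded average of the
    already decoded neighbor samples (left column and top row)."""
    n = len(left) + len(top)
    dc = (sum(left) + sum(top) + (n >> 1)) // n   # average with rounding offset
    return [[dc] * len(top) for _ in range(len(left))]
```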
  • the inverse quantization / inverse DCT unit 311 inversely quantizes the quantization coefficient input from the entropy decoding unit 301 to obtain a DCT coefficient.
  • the inverse quantization / inverse DCT unit 311 performs inverse DCT (Inverse Discrete Cosine Transform) on the obtained DCT coefficient to calculate a decoded residual signal.
  • the inverse quantization / inverse DCT unit 311 outputs the calculated decoded residual signal to the addition unit 312 and the residual storage unit 313.
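The inverse quantization and inverse DCT steps can be sketched as follows. This is a naive illustration under simplifying assumptions (a single uniform quantization step and an orthonormal 1-D DCT; function names are hypothetical), not the standard's integer transform:

```python
import math

def dequantize(levels, qstep):
    """Inverse quantization: scale decoded levels back to DCT coefficients."""
    return [lv * qstep for lv in levels]

def idct_1d(coeffs):
    """Naive inverse of the orthonormal 1-D DCT-II, yielding the residual."""
    n = len(coeffs)
    out = []
    for x in range(n):
        s = coeffs[0] / math.sqrt(n)
        for k in range(1, n):
            s += math.sqrt(2.0 / n) * coeffs[k] * \
                 math.cos(math.pi * (2 * x + 1) * k / (2 * n))
        out.append(s)
    return out
```

A 2-D inverse transform applies `idct_1d` first to the rows and then to the columns of the coefficient block.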
  • the addition unit 312 adds, for each pixel, the signal value of the predicted picture block P input from the inter predicted image generation unit 309 or the intra predicted image generation unit 310 and the signal value of the decoded residual signal input from the inverse quantization / inverse DCT unit 311, to generate a reference picture block.
  • the adding unit 312 stores the generated reference picture block (ie, decoded picture) in the DPB 3061 of the decoded picture management unit 306.
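The per-pixel addition can be sketched directly; the function name and the clipping to the valid sample range (implied by the fixed bit depth of decoded samples) are illustrative assumptions:

```python
def reconstruct_block(pred, resid, bit_depth=8):
    """Add the predicted picture block and the decoded residual signal for
    each pixel, clipping the result into the valid sample range."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, resid)]
```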
  • FIG. 7 is a schematic diagram illustrating a configuration of the inter prediction parameter decoding unit 303 according to the present embodiment.
  • the inter prediction parameter decoding unit 303 includes an inter prediction parameter decoding control unit 3031, an AMVP prediction parameter derivation unit 3032, an addition unit 3035, and a merge prediction parameter derivation unit 3036.
  • the inter prediction parameter decoding control unit 3031 instructs the entropy decoding unit 301 to decode codes (syntax elements) related to inter prediction, and extracts from the encoded data, for example, the partition mode part_mode, merge flag merge_flag, merge index merge_idx, inter prediction flag inter_pred_idc, reference picture index refIdxLX, prediction vector index mvp_LX_idx, difference vector mvdLX, and residual prediction flag iv_res_pred_weight_idx.
  • the inter prediction parameter decoding control unit 3031 first extracts a merge flag from the encoded data.
  • When it is stated that the inter prediction parameter decoding control unit 3031 extracts a certain syntax element, this means that it instructs the entropy decoding unit 301 to decode that syntax element and reads the corresponding syntax element from the encoded data.
  • the inter prediction parameter decoding control unit 3031 extracts the merge index merge_idx as a prediction parameter related to merge prediction.
  • the inter prediction parameter decoding control unit 3031 outputs the extracted merge index merge_idx to the merge prediction parameter derivation unit 3036.
  • the inter prediction parameter decoding control unit 3031 uses the entropy decoding unit 301 to extract the AMVP prediction parameter from the encoded data.
  • AMVP prediction parameters include an inter prediction flag inter_pred_idc, a reference picture index refIdxLX, a vector index mvp_LX_idx, and a difference vector mvdLX.
  • the inter prediction parameter decoding control unit 3031 outputs the prediction list use flag predFlagLX derived from the extracted inter prediction flag inter_pred_idc and the reference picture index refIdxLX to the AMVP prediction parameter derivation unit 3032 and the prediction image generation unit 308 (FIG. 6), and also stores them in the DPB 3061 (FIG. 9) of the decoded picture management unit 306.
  • the inter prediction parameter decoding control unit 3031 outputs the extracted vector index mvp_LX_idx to the AMVP prediction parameter derivation unit 3032.
  • the inter prediction parameter decoding control unit 3031 outputs the extracted difference vector mvdLX to the addition unit 3035.
  • FIG. 23 is a schematic diagram showing the configuration of the merge prediction parameter deriving unit 3036 according to this embodiment.
  • the merge prediction parameter derivation unit 3036 includes a merge candidate derivation unit 30361 and a merge candidate selection unit 30362.
  • the merge candidate derivation unit 30361 includes a merge candidate storage unit 303611, an extended merge candidate derivation unit 303612, and a basic merge candidate derivation unit 303613.
  • the merge candidate storage unit 303611 stores the merge candidates input from the extended merge candidate derivation unit 303612 and the basic merge candidate derivation unit 303613.
  • the merge candidate includes a prediction list use flag predFlagLX, a vector mvLX, and a reference picture index refIdxLX.
  • an index is assigned to the stored merge candidates according to a predetermined rule. For example, “0” is assigned as an index to the merge candidate input from the extended merge candidate derivation unit 303612.
  • the extended merge candidate derivation unit 303612 includes a displacement vector acquisition unit 3036122, an inter-layer merge candidate derivation unit 3036121, a displacement merge candidate derivation unit 3036123, and a BVSP merge candidate derivation unit 3036124 (not shown).
  • the displacement vector acquisition unit 3036122 first acquires displacement vectors in order from a plurality of candidate blocks adjacent to the decoding target block (for example, the blocks adjacent to the left, above, and upper right). Specifically, the displacement vector acquisition unit 3036122 selects one of the candidate blocks and determines, from the reference picture index refIdxLX of the candidate block using a reference layer determination function (described later), whether the vector of the selected candidate block is a displacement vector or a motion vector; if it is a displacement vector, that vector is adopted as the displacement vector. If the candidate block has no displacement vector, the next candidate block is scanned in order.
  • When no adjacent block has a displacement vector, the displacement vector acquisition unit 3036122 attempts to acquire the displacement vector of the block at the position corresponding to the target block in a reference picture of a temporally different display order. When no displacement vector can be acquired, the displacement vector acquisition unit 3036122 sets a zero vector as the displacement vector. The displacement vector acquisition unit 3036122 outputs the displacement vector to the inter-layer merge candidate derivation unit 3036121 and the displacement merge candidate derivation unit.
  • the inter-layer merge candidate derivation unit 3036121 receives the displacement vector from the displacement vector acquisition unit 3036122.
  • the inter-layer merge candidate derivation unit 3036121 selects, from a picture of another layer (for example, the base layer or base view) having the same POC as the decoding target picture, the block indicated by the displacement vector input from the displacement vector acquisition unit 3036122, and reads the prediction parameter of that block, which is a motion vector, from the prediction parameter memory 307. More specifically, the prediction parameter read by the inter-layer merge candidate derivation unit 3036121 is the prediction parameter of the block containing the coordinates obtained by adding the displacement vector to the coordinates of the center point of the target block, taking the center point as the starting point.
  • the coordinates (xRef, yRef) of the reference block are derived by the following equations from the coordinates (xP, yP) of the target block, the displacement vector (mvDisp[0], mvDisp[1]), and the width nPSW and height nPSH of the target block.
  • xRef = Clip3( 0, PicWidthInSamplesL - 1, xP + ( ( nPSW - 1 ) >> 1 ) + ( ( mvDisp[ 0 ] + 2 ) >> 2 ) )
  • yRef = Clip3( 0, PicHeightInSamplesL - 1, yP + ( ( nPSH - 1 ) >> 1 ) + ( ( mvDisp[ 1 ] + 2 ) >> 2 ) )
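The two equations above can be checked with a short sketch. Function names are illustrative; the `>> 2` reflects the quarter-pel precision of the displacement vector, and Python's arithmetic right shift floors negative values, matching the equations' intent:

```python
def clip3(lo, hi, v):
    """Clip3(x, y, z): clamp z into the inclusive range [x, y]."""
    return lo if v < lo else hi if v > hi else v

def reference_block_coords(xP, yP, nPSW, nPSH, mvDisp, pic_w, pic_h):
    """Derive (xRef, yRef): the center of the target block offset by the
    rounded displacement vector, clipped to the picture bounds."""
    xRef = clip3(0, pic_w - 1, xP + ((nPSW - 1) >> 1) + ((mvDisp[0] + 2) >> 2))
    yRef = clip3(0, pic_h - 1, yP + ((nPSH - 1) >> 1) + ((mvDisp[1] + 2) >> 2))
    return xRef, yRef
```

For a 16x16 block at (64, 64) and displacement (-6, 10) in quarter-pel units, this yields (70, 74): the horizontal offset rounds to -1 and the vertical offset to +3, each added to the block center.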
  • the inter-layer merge candidate derivation unit 3036121 judges whether the prediction parameter is a motion vector by the determination method in which the reference layer determination function (described later), included in the inter prediction parameter decoding control unit 3031, returns false (i.e., not a displacement vector).
  • the inter-layer merge candidate derivation unit 3036121 outputs the read prediction parameter as a merge candidate to the merge candidate storage unit 303611. When the prediction parameter cannot be derived, the inter-layer merge candidate derivation unit 3036121 notifies the displacement merge candidate derivation unit of that fact.
  • This merge candidate is a motion prediction inter-layer candidate (inter-view candidate) and is also described as an inter-layer merge candidate (motion prediction).
  • the displacement merge candidate derivation unit 3036123 receives the displacement vector from the displacement vector acquisition unit 3036122.
  • the displacement merge candidate derivation unit 3036123 outputs, as a merge candidate to the merge candidate storage unit 303611, the input displacement vector and the reference picture index refIdxLX of the layer image pointed to by the displacement vector (for example, the index of the base layer image having the same POC as the decoding target picture).
  • This merge candidate is a displacement prediction inter-layer candidate (inter-view candidate) and is also described as an inter-layer merge candidate (displacement prediction).
  • the BVSP merge candidate derivation unit 3036124 derives block view synthesis prediction (Block View Synthesis Prediction) merge candidates.
  • a BVSP merge candidate is a type of displacement merge candidate that generates a predicted image from another viewpoint image; for this merge candidate, the PU is divided into smaller blocks and the predicted image generation process is performed per sub-block.
  • the basic merge candidate derivation unit 303613 includes a spatial merge candidate derivation unit 3036131, a temporal merge candidate derivation unit 3036132, a merge merge candidate derivation unit 3036133, and a zero merge candidate derivation unit 3036134.
  • the spatial merge candidate derivation unit 3036131 reads the prediction parameters (prediction list use flag predFlagLX, vector mvLX, reference picture index refIdxLX) stored in the prediction parameter memory 307 according to a predetermined rule, and uses the read prediction parameters as merge candidates.
  • the prediction parameters to be read are those of blocks within a predetermined range from the decoding target block (for example, all or some of the blocks adjacent to the lower left, upper left, and upper right of the decoding target block, respectively).
  • the derived merge candidates are stored in the merge candidate storage unit 303611.
  • the temporal merge candidate derivation unit 3036132 reads the prediction parameter of the block in the reference image including the lower right coordinate of the decoding target block from the prediction parameter memory 307 and sets it as a merge candidate.
  • the reference picture may be designated, for example, by the reference picture index refIdxLX specified in the slice header, or by the smallest reference picture index refIdxLX among the blocks adjacent to the decoding target block.
  • the derived merge candidates are stored in the merge candidate storage unit 303611.
  • the merge merge candidate derivation unit 3036133 derives merge merge candidates by combining the vectors and reference picture indexes of two different merge candidates already derived and stored in the merge candidate storage unit 303611, using them as the L0 and L1 vectors, respectively.
  • the derived merge candidates are stored in the merge candidate storage unit 303611.
  • the zero merge candidate derivation unit 3036134 derives a merge candidate in which the reference picture index refIdxLX is 0 and both the X component and the Y component of the vector mvLX are 0.
  • the derived merge candidates are stored in the merge candidate storage unit 303611.
  • FIG. 8 shows an example of merge candidates derived by the merge candidate derivation unit 30361. Setting aside the duplicate-removal processing that excludes merge candidates having identical prediction parameters, the merge index order is: inter-layer merge candidate, spatial merge candidate (lower left), spatial merge candidate (upper right), displacement merge candidate, BVSP merge candidate, spatial merge candidate (lower left), spatial merge candidate (upper left), and temporal merge candidate. The merge merge candidate and the zero merge candidate follow thereafter, but are omitted in FIG. 8.
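The candidate storage with duplicate removal and index assignment can be sketched as below. The function name, the candidate tuple layout, and the list-size cap are illustrative assumptions; the point is that candidates are stored in derivation order, identical prediction parameters are pruned, and the position in the list is the merge index:

```python
def build_merge_list(candidates, max_num=6):
    """Store merge candidates in derivation order, skipping any candidate
    whose prediction parameters (predFlagLX, mvLX, refIdxLX) duplicate an
    already stored one; the resulting list index is the merge index."""
    stored = []
    for cand in candidates:
        if cand not in stored:     # duplicate-removal processing
            stored.append(cand)
        if len(stored) == max_num:
            break
    return stored
```

The merge candidate selection unit then simply picks `stored[merge_idx]`.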
  • the merge candidate selection unit 30362 selects, as the inter prediction parameter, the merge candidate to which the index corresponding to the merge index merge_idx input from the inter prediction parameter decoding control unit 3031 is assigned, from among the merge candidates stored in the merge candidate storage unit 303611.
  • the merge candidate selection unit 30362 stores the selected merge candidate in the prediction parameter memory 3067 (FIG. 9) of the decoded picture management unit 306 and outputs it to the prediction image generation unit 308 (FIG. 6).
  • the displacement vector acquisition unit 3036122 extracts a displacement vector by referring to the prediction parameter memory 3067 of the decoded picture management unit 306, reading out the prediction flag predFlagLX, reference picture index refIdxLX, and vector mvLX of the blocks adjacent to the target PU.
  • the displacement vector acquisition unit 3036122 has a reference layer determination function.
  • the displacement vector acquisition unit 3036122 sequentially reads the prediction parameters of the blocks adjacent to the target PU, and determines from each adjacent block's reference picture index, using the reference layer determination function, whether the adjacent block has a displacement vector. When an adjacent block has a displacement vector, the displacement vector acquisition unit 3036122 outputs that displacement vector. If no displacement vector exists in the prediction parameters of the adjacent blocks, a zero vector is output as the displacement vector.
  • Reference layer determination function: the reference layer determination function of the displacement vector acquisition unit 3036122 determines, based on the input reference picture index refIdxLX, the reference layer information reference_layer_info indicating the relationship between the reference picture indicated by refIdxLX and the target picture.
  • Reference layer information reference_layer_info is information indicating whether the vector mvLX to the reference picture is a displacement vector or a motion vector.
  • Prediction when the target picture layer and the reference picture layer are the same layer is called the same layer prediction, and the vector obtained in this case is a motion vector.
  • Prediction when the target picture layer and the reference picture layer are different layers is called inter-layer prediction, and the vector obtained in this case is a displacement vector.
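The determination described in the last three bullets reduces to a layer comparison. The sketch below is illustrative (function name and the per-index layer lookup table are assumptions, not the patent's interface):

```python
def reference_layer_info(target_layer_id, ref_layer_ids, refIdxLX):
    """Return whether the vector to the reference picture indicated by
    refIdxLX is a motion vector (same-layer prediction) or a displacement
    vector (inter-layer prediction)."""
    same_layer = (ref_layer_ids[refIdxLX] == target_layer_id)
    return "motion_vector" if same_layer else "displacement_vector"
```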
  • the AMVP prediction parameter derivation unit 3032 reads vectors (motion vectors or displacement vectors) stored in the prediction parameter memory 3067 (FIG. 9) of the decoded picture management unit 306 based on the reference picture index refIdx, and uses them as vector candidates mvpLX.
  • the vectors read by the AMVP prediction parameter derivation unit 3032 are those of blocks within a predetermined range from the decoding target block (for example, all or some of the blocks adjacent to the lower left, upper left, and upper right of the decoding target block, respectively).
  • the AMVP prediction parameter derivation unit 3032 selects a vector candidate indicated by the vector index mvp_LX_idx input from the inter prediction parameter decoding control unit 3031 among the read vector candidates as a prediction vector mvpLX.
  • the AMVP prediction parameter derivation unit 3032 outputs the selected prediction vector mvpLX to the addition unit 3035.
  • FIG. 4B is a conceptual diagram showing an example of vector candidates.
  • a prediction vector list 602 illustrated in FIG. 4B is a list including a plurality of vector candidates derived by the AMVP prediction parameter deriving unit 3032.
  • the five rectangles arranged in a horizontal row each indicate a region representing a prediction vector.
  • the downward arrow directly below the second mvp_LX_idx from the left end and mvpLX below the mvp_LX_idx indicate that the vector index mvp_LX_idx is an index referring to the vector mvpLX in the prediction parameter memory 3067 of the decoded picture management unit 306.
  • a candidate vector is generated by referring to blocks for which the decoding process has been completed, within a predetermined range from the decoding target block (for example, adjacent blocks), based on the vectors of the referenced blocks.
  • the adjacent blocks include blocks spatially adjacent to the target block, for example, the left block and the upper block, and blocks temporally adjacent to the target block, for example, blocks derived from a block at the same position as the target block but with a different display time.
  • the addition unit 3035 adds the prediction vector mvpLX input from the prediction vector selection unit 3034 and the difference vector mvdLX input from the inter prediction parameter decoding control unit 3031 to calculate the vector mvLX.
  • the adding unit 3035 outputs the calculated vector mvLX to the predicted image generation unit 308 (FIG. 6).
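The addition performed by the addition unit 3035 is a component-wise sum; a minimal sketch (function name illustrative):

```python
def decode_motion_vector(mvpLX, mvdLX):
    """AMVP decoding: the vector mvLX is the sum of the selected prediction
    vector mvpLX and the transmitted difference vector mvdLX."""
    return (mvpLX[0] + mvdLX[0], mvpLX[1] + mvdLX[1])
```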
  • FIG. 9 is a functional block diagram illustrating the configuration of the decoded picture management unit 306.
  • the decoded picture management unit 306 includes a DPB 3061, an RPS derivation unit 3062, a reference picture control unit 3063, a reference layer picture control unit 3064, an RPL derivation unit 3065, an output control unit 3066, and a prediction parameter memory 3067.
  • the DPB 3061 is also called a decoded picture buffer (Decoded Picture Buffer), and records the decoded picture of each picture of the target layer output from the adding unit 312.
  • the decoded pictures corresponding to the pictures in the target layer are recorded in association with the output order (POC: Picture Order Count).
  • a reference mark and an output mark can be set for each DPB picture.
  • the reference mark is information indicating whether or not the picture on the DPB can be used for predicted image generation processing (for example, inter prediction or inter-layer image prediction) in decoding processing after the target picture.
  • the reference mark takes one of the values "used for short-term reference", "used for long-term reference", and "unused for reference" ("not used for reference").
  • a reference mark may also be set to the value "used for inter-layer reference". Further, without distinguishing between short-term reference use and long-term reference use, the union of both may be expressed as "used for reference".
  • the output mark is information indicating whether or not the picture on the DPB needs to be output to the outside. Specifically, the output mark takes one of the values "output required" ("needed for output") and "output unnecessary" ("not needed for output"). Note that the reference mark and the output mark may not be explicitly set at a particular point in the decoding or encoding process; in that case, the reference mark or output mark is treated as "undefined".
  • the RPS deriving unit 3062 derives the RPS (Reference Picture Set) to be used for decoding the target picture based on the input syntax values, and outputs it to the base reference picture control unit 144, the reference picture control unit 143, and the RPL deriving unit 3065.
  • RPS generally represents a set of reference pictures that can be used in decoding processing of a target picture or decoding processing of pictures following the target picture in decoding order.
  • RPS can be divided into multiple sub RPSs based on the nature of the reference picture.
  • the RPS is composed of the following five types of sub-RPSs.
  • Forward short-term RPS: a sub-RPS containing reference pictures that belong to the same layer as the target picture, are specified by a relative position in display order with respect to the target picture, and precede the target picture in display order.
  • Backward short-term RPS: a sub-RPS containing reference pictures that belong to the same layer as the target picture, are specified by a relative position in display order with respect to the target picture, and follow the target picture in display order.
  • Long-term RPS: a sub-RPS containing reference pictures that belong to the same layer as the target picture and are specified by an absolute position in display order.
  • Inter-layer pixel RPS: a sub-RPS containing reference pictures that belong to a layer different from the target picture and whose pixel values are referred to in inter-layer prediction.
  • Inter-layer motion limited RPS: a sub-RPS containing reference pictures that belong to a layer different from the target picture and whose motion information, but not pixel values, is referred to in inter-layer prediction.
  • the union of the forward short-term RPS and the backward short-term RPS is also referred to as the short-term RPS. That is, the short-term RPS contains pictures that belong to the same layer as the target picture and are specified by a relative position in display order with respect to the target picture.
  • the union of the inter-layer pixel RPS and the inter-layer motion limited RPS is also referred to as the inter-layer RPS. That is, the inter-layer RPS contains reference pictures (inter-layer reference pictures) belonging to layers different from the target picture.
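The five-way classification above can be sketched as a small decision function. This is illustrative only: the function name, the dict-based picture description, and the `motion_only` flag are assumptions used to mirror the sub-RPS definitions:

```python
def classify_sub_rps(target_layer, target_poc, ref_pic, long_term_pocs):
    """Assign a reference picture to one of the five sub-RPS categories:
    inter-layer first (different layer), then long-term (absolute POC),
    then forward/backward short-term by display order."""
    if ref_pic["layer"] != target_layer:
        return "inter_layer_motion" if ref_pic.get("motion_only") else "inter_layer_pixel"
    if ref_pic["poc"] in long_term_pocs:
        return "long_term"
    return "forward_short_term" if ref_pic["poc"] < target_poc else "backward_short_term"
```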
  • Short-term RPS: the syntax related to the short-term RPS (forward short-term RPS and backward short-term RPS) includes SPS short-term RPS information, which is the short-term reference picture set information included in the SPS, and SH short-term RPS information, which is the short-term reference picture set information included in the slice header.
  • the SPS short-term RPS information includes information on a plurality of short-term RPS candidates that can be selected as the short-term RPS by each picture referring to the SPS.
  • the short-term RPS is a set of pictures that can be reference pictures (short-term reference pictures) specified by a relative position with respect to the target picture (for example, the POC difference from the target picture).
  • FIG. 16 illustrates a part of the SPS syntax table used at the time of SPS decoding.
  • the part (A) in FIG. 16 corresponds to SPS short-term RPS information.
  • the SPS short-term RPS information includes the number of short-term RPS included in the SPS (num_short_term_ref_pic_sets) and the definition of each short-term RPS (short_term_ref_pic_set (i)).
  • FIG. 17A illustrates a short-term RPS syntax table used at the time of SPS decoding and slice header decoding.
  • the short-term RPS information includes the number of short-term reference pictures whose display order is earlier than the target picture (num_negative_pics) and the number of short-term reference pictures whose display order is later than the target picture (num_positive_pics).
  • Hereinafter, a short-term reference picture whose display order is earlier than the target picture is referred to as a forward short-term reference picture, and a short-term reference picture whose display order is later than the target picture is referred to as a backward short-term reference picture.
  • the short-term RPS information includes, for each forward short-term reference picture, the absolute value of the POC difference from the target picture (delta_poc_s0_minus1[i]) and a flag indicating whether it is used as a reference picture of the target picture (used_by_curr_pic_s0_flag[i]); similarly, for each backward short-term reference picture, it includes the absolute value of the POC difference from the target picture (delta_poc_s1_minus1[i]) and a flag indicating whether it is used as a reference picture of the target picture (used_by_curr_pic_s1_flag[i]).
  • the SH short-term RPS information includes information on a single short-term RPS that can be used by the picture referring to the slice header.
  • FIG. 17B illustrates a part of the slice header syntax table used when decoding the slice header.
  • the part (A) in FIG. 17B corresponds to the SH short-term RPS information.
  • the SH short-term RPS information includes a flag (short_term_ref_pic_set_sps_flag) indicating whether the short-term RPS is selected from among the short-term RPS candidates decoded from the SPS or is explicitly included in the slice header. When the short-term RPS is selected from the candidates, an identifier (short_term_ref_pic_set_idx) indicating the selected candidate is included. The short-term RPS is classified as follows.
  • Forward short-term RPS: contains pictures, specified by the SPS short-term RPS information or the SH short-term RPS information, that can be referred to by the current picture and whose display order is earlier than the target picture.
  • Backward short-term RPS: contains pictures, specified by the SPS short-term RPS information or the SH short-term RPS information, that can be referred to by the current picture and whose display order is later than the target picture.
  • Subsequent reference short-term RPS: contains pictures that are not referred to by the current picture but can be referred to by pictures that follow the current picture in decoding order.
  • the forward short-term RPS (ListStCurrBefore), the backward short-term RPS (ListStCurrAfter), and the subsequent reference short-term RPS (ListStFoll) are derived by the following procedure. Note that the forward short-term RPS, the backward short-term RPS, and the subsequent reference short-term RPS are set to empty before the start of the processing. (S101) Based on the SPS short-term RPS information and the SH short-term RPS information, the short-term RPS used for decoding the target picture is specified.
  • short_term_ref_pic_set_sps included in the SH short-term RPS information when the value of short_term_ref_pic_set_sps included in the SH short-term RPS information is 0, the short-term RPS explicitly transmitted in the slice header included in the SH short-term RPS information is selected. Other than that (when the value of short_term_ref_pic_set_sps is 1, the short-term RPS indicated by short_term_ref_pic_set_idx included in the SH short-term RPS information is selected from a plurality of short-term RPS candidates included in the SPS short-term RPS information. (S102) The POC of each reference picture included in the selected short-term RPS is derived.
• When a reference picture belongs to the forward short-term RPS, its POC is derived by subtracting the value of (delta_poc_s0_minus1[i] + 1) from the POC of the target picture.
• When a reference picture belongs to the backward short-term RPS, its POC is derived by adding the value of (delta_poc_s1_minus1[i] + 1) to the POC of the target picture.
• (S103) The forward reference pictures included in the short-term RPS are checked in transmission order, and when the associated used_by_curr_pic_s0_flag[i] value is 1, the forward reference picture is added to ListStCurrBefore.
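The short-term RPS derivation above (S101 to S103) can be sketched as follows. This is a minimal illustration, assuming the selected short-term RPS is already decoded into parallel lists of syntax values; the ShortTermRps container and function names are hypothetical, each reference picture POC is computed directly from the target picture's POC as the text states, and pictures whose use flag is 0 are placed in the subsequent reference short-term RPS (ListStFoll).

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ShortTermRps:
    # Syntax values of the selected short-term RPS (names follow the text).
    delta_poc_s0_minus1: List[int] = field(default_factory=list)   # forward
    used_by_curr_pic_s0_flag: List[int] = field(default_factory=list)
    delta_poc_s1_minus1: List[int] = field(default_factory=list)   # backward
    used_by_curr_pic_s1_flag: List[int] = field(default_factory=list)

def derive_short_term_lists(curr_poc, rps):
    """Return (ListStCurrBefore, ListStCurrAfter, ListStFoll) as POC lists."""
    st_curr_before, st_curr_after, st_foll = [], [], []
    # (S102)/(S103) forward pictures: POC = target POC - (delta + 1);
    # pictures usable by the current picture go to ListStCurrBefore.
    for delta, used in zip(rps.delta_poc_s0_minus1,
                           rps.used_by_curr_pic_s0_flag):
        poc = curr_poc - (delta + 1)
        (st_curr_before if used else st_foll).append(poc)
    # Backward pictures: POC = target POC + (delta + 1).
    for delta, used in zip(rps.delta_poc_s1_minus1,
                           rps.used_by_curr_pic_s1_flag):
        poc = curr_poc + (delta + 1)
        (st_curr_after if used else st_foll).append(poc)
    return st_curr_before, st_curr_after, st_foll
```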
  • the syntax related to long-term RPS includes SPS long-term RPS information that is long-term reference picture information included in the SPS and SH long-term RPS information that is long-term reference picture information included in the slice header.
  • the SPS long-term RPS information includes information on a plurality of long-term reference pictures that can be used from each picture referring to the SPS.
  • the long-term reference picture is a reference picture specified by an absolute position (for example, POC) in the sequence.
  • the SPS long-term RPS information includes a flag (long_term_ref_pics_present_flag) indicating whether or not long-term reference pictures are transmitted in SPS, the number of long-term reference pictures (num_long_term_ref_pics_sps) transmitted in SPS, and information on each long-term reference picture.
  • the long-term reference picture information includes the POC of the reference picture (lt_ref_pic_poc_lsb_sps [i]) and the presence / absence of the possibility that the long-term reference picture is referenced in the target picture (used_by_curr_pic_lt_sps_flag [i]).
• The POC of the reference picture may be the POC value associated with the reference picture, or it may be the LSB (Least Significant Bit) of the POC, that is, the remainder obtained by dividing the POC by a predetermined power of two.
  • the SH long-term RPS information includes information on a long-term reference picture that can be used from a picture that references a slice header.
• SH long-term RPS information is included in the slice header when use of long-term reference pictures is indicated by the flag (long_term_ref_pics_present_flag).
• When the SPS long-term RPS information includes one or more long-term reference pictures (num_long_term_ref_pics_sps > 0), the number of reference pictures (num_long_term_sps) that can be referred to by the target picture among the long-term reference pictures decoded from the SPS is included in the SH long-term RPS information.
  • the number of long-term reference pictures (num_long_term_pics) explicitly transmitted in the slice header is included in the SH long-term RPS information.
  • information (lt_idx_sps [i]) for selecting the num_long_term_sps number of long-term reference pictures from the long-term reference pictures included in the SPS long-term RPS information is included in the SH long-term RPS information.
• for each long-term reference picture explicitly transmitted in the slice header, the POC of the reference picture (poc_lsb_lt[i]) and a flag indicating whether the picture may be used as a reference picture of the target picture (used_by_curr_pic_lt_flag[i]) are included in the SH long-term RPS information.
  • a long-term RPS is derived from long-term RPS information.
  • a subsequent reference long-term RPS is also derived.
• Long-term RPS: includes the pictures, specified by the SPS long-term RPS information or the SH long-term RPS information, that can be referred to by the current picture.
• Subsequent reference long-term RPS: includes reference pictures that are not referred to by the current picture but can be referred to by pictures that follow the current picture in decoding order.
  • the long-term RPS (ListLtCurr) and subsequent reference long-term RPS (ListLtFoll) are derived by the following procedure. Note that the long-term RPS and the subsequent reference long-term RPS are set to be empty before starting the following processing.
• (S201) Based on the SPS long-term RPS information and the SH long-term RPS information, the long-term reference pictures used for decoding the current picture are specified. Specifically, num_long_term_sps reference pictures are selected from the reference pictures included in the SPS long-term RPS information and added to the long-term RPS. Each selected reference picture is the reference picture indicated by lt_idx_sps[i].
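Step (S201) and the slice-header entries that follow it can be sketched as below, under the assumption that the SPS candidates and slice-header syntax values are already decoded into plain lists; the function and parameter names are illustrative, not taken from the text.

```python
def derive_long_term_lists(sps_poc_lsb, sps_used_flags,
                           num_long_term_sps, lt_idx_sps,
                           poc_lsb_lt, used_by_curr_pic_lt_flag):
    """Return (ListLtCurr, ListLtFoll) as POC lists.

    sps_poc_lsb / sps_used_flags : lt_ref_pic_poc_lsb_sps[i] and
        used_by_curr_pic_lt_sps_flag[i] candidates from the SPS.
    lt_idx_sps : indices selecting num_long_term_sps SPS candidates.
    poc_lsb_lt / used_by_curr_pic_lt_flag : entries explicitly
        transmitted in the slice header.
    """
    lt_curr, lt_foll = [], []
    # (S201) Pictures selected from the SPS candidates via lt_idx_sps[i].
    for i in range(num_long_term_sps):
        idx = lt_idx_sps[i]
        (lt_curr if sps_used_flags[idx] else lt_foll).append(sps_poc_lsb[idx])
    # Pictures explicitly transmitted in the slice header.
    for poc, used in zip(poc_lsb_lt, used_by_curr_pic_lt_flag):
        (lt_curr if used else lt_foll).append(poc)
    return lt_curr, lt_foll
```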
• Inter-layer RPS: the syntax related to the inter-layer pixel RPS and the inter-layer motion limited RPS includes IL-RPS information (inter-layer RPS information, inter-layer reference picture set information).
• The IL-RPS information includes information on the inter-layer reference pictures that can be referred to by inter-layer prediction from the picture containing the slice header.
  • the IL-RPS information will be described with reference to FIGS. 18A and 18B.
  • (A) in FIG. 18 is a part of a syntax table that is referred to when the VPS extension (vps_extension) included in the VPS is decoded, and corresponds to the IL-RPS information.
  • the VPS includes max_one_active_ref_layer_flag, direct_dep_type_len_minus2, and direct_dependency_type [i] [j], which are syntaxes included in the IL-RPS information.
• The syntax max_one_active_ref_layer_flag is a flag indicating whether the maximum number of layers referred to when decoding an arbitrary picture of an arbitrary layer is 1 or less. When the maximum is 1 or less, the flag value is set to 1; otherwise (when the maximum is 2 or more), the flag value is set to 0.
  • the syntax direct_dep_type_len_minus2 is a value representing the number of bits of the syntax direct_dependency_type [i] [j].
  • the number of bits of direct_dependency_type [i] [j] is (direct_dep_type_len_minus2 + 2).
  • the syntax direct_dependency_type [i] [j] is a value indicating the type of inter-layer prediction that can be used when referring to the layer indicated by “j” from the layer indicated by “i”.
  • direct_dependency_type [i] [j] is also referred to as a dependency type when referring to a reference layer (layer j) from a target layer (layer i).
• When layer i and layer j are clear from context, they are omitted and the value is simply referred to as the dependency type (direct_dependency_type).
  • FIG. 19A shows the relationship between the dependency type and inter-layer prediction availability when there are inter-layer image prediction and inter-layer motion prediction as the types of inter-layer prediction.
• When the dependency type is “0”, the reference is pixel-dependent and motion-dependent.
• When the dependency type is “1”, the reference is pixel-dependent but not motion-dependent.
• When the dependency type is “2”, the reference is motion-dependent but not pixel-dependent.
• Pixel dependency means that the target layer i can use the pixels of the reference layer j for prediction, for example, for inter-layer image prediction.
• Motion dependency means that the target layer i can use the motion information (motion vector and reference picture index) of the reference layer j for prediction, for example, for inter-layer motion prediction.
• Inter-layer image prediction is a process of generating a predicted image of the target picture using the pixel values of a decoded image of the reference layer.
  • inter-layer motion prediction is processing for generating a predicted image of a target picture by directly or indirectly using reference layer motion information (motion vector, reference picture index, inter prediction type).
• When the dependency type is “0”, both the decoded pixels of the reference layer (the pixel values of the decoded image) and its motion information may be used in the decoding process of the target picture.
• When the dependency type is “1”, the decoded pixels of the reference layer may be used, but the motion information may not be used.
• When the dependency type is “2”, the motion information of the reference layer may be used, but the decoded pixels may not be used.
• When the dependency type indicates that the decoded pixels of the reference layer are referred to (in the above definition, when the dependency type is “0” or “1”), the dependency type is said to indicate pixel dependency.
• When the dependency type indicates that the decoded pixels of the reference layer are not referred to (in the above definition, when the dependency type is “2”), the dependency type is said to indicate pixel independence.
• When the dependency type indicates that the motion information of the reference layer is referred to (in the above definition, when the dependency type is “0” or “2”), the dependency type is said to indicate motion dependency.
• When the dependency type indicates that the motion information of the reference layer is not referred to (in the above definition, when the dependency type is “1”), the dependency type is said to indicate motion independence.
• The pixel dependency flag SampleEnableFlag[i][j] indicating pixel dependency and the motion dependency flag MotionEnableFlag[i][j] indicating motion dependency are derived from the dependency type direct_dependency_type[i][j] so as to satisfy the relationship shown in FIG. 19A.
• The dependency type may also indicate a dependency other than pixel dependency and motion dependency, for example, dependency on block division information, transform coefficient information (such as the presence or absence of transform coefficients), loop filter information, and the like.
• In that case, in addition to the pixel dependency flag and the motion dependency flag derived as described above, a flag indicating the presence or absence of the additional dependency (for example, an XXX dependency flag XXXEnableFlag, where XXX is block division information, transform coefficient information, loop filter information, or the like) can be derived from the following equation.
• XXXEnableFlag[i][j] = ((direct_dependency_type[i][j] + 1) & 4) >> 2
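The relationship of FIG. 19A can be expressed as a small helper. Since the exact bit-mask equations are not reproduced in the text, this sketch uses plain comparisons over the dependency type values given above (0: pixel and motion, 1: pixel only, 2: motion only); the function name is illustrative.

```python
def dependency_flags(direct_dependency_type):
    """Return (SampleEnableFlag, MotionEnableFlag) for one (i, j) pair,
    following the FIG. 19A mapping of dependency type to availability."""
    sample_enable = 1 if direct_dependency_type in (0, 1) else 0  # pixel dep.
    motion_enable = 1 if direct_dependency_type in (0, 2) else 0  # motion dep.
    return sample_enable, motion_enable
```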
• (B) of FIG. 18 is a part of the syntax table referred to when the slice header is decoded, and corresponds to the IL-RPS information.
• The IL-RPS information includes an inter-layer prediction enabled flag (inter_layer_pred_enabled_flag). Furthermore, when the inter-layer prediction enabled flag is 1 (inter-layer prediction is enabled) and the number of reference layers that can be referred to from the target picture (NumDirectRefLayers[nuh_layer_id]) is greater than 1, a syntax element expressing the number of inter-layer reference pictures (num_inter_layer_ref_pics_minus1) is included in the IL-RPS information.
• The number of active inter-layer reference pictures (NumActiveRefLayerPics) is set to the value of (num_inter_layer_ref_pics_minus1 + 1).
• The number of active inter-layer reference pictures corresponds to the number of inter-layer reference pictures that can be referred to by inter-layer prediction in the target picture.
  • a layer identifier (inter_layer_pred_layer_idc [i]) indicating a layer to which each inter-layer reference picture belongs is included in the IL-RPS information.
  • each syntax included in the IL-RPS information may be omitted if it is obvious. For example, when the number of inter-layer reference pictures that can be referenced from one picture is limited to one, the syntax related to the number of inter-layer reference pictures is not necessary.
• Inter-layer RPS derivation process: the inter-layer RPS, that is, the inter-layer pixel RPS and the inter-layer motion limited RPS, is derived from the IL-RPS information.
  • FIG. 19B is a diagram illustrating the relationship between the sub RPS (inter-layer pixel RPS and inter-layer motion limited RPS) included in the inter-layer RPS and the dependency type.
  • the inter-layer RPS includes two sub-RPSs, an inter-layer pixel RPS and an inter-layer motion limited RPS.
• The inter-layer pixel RPS includes the inter-layer reference pictures whose dependency type is “0” and those whose dependency type is “1”.
  • the inter-layer motion limited RPS includes an inter-layer reference picture whose dependency type is “2”.
  • the inter-layer pixel RPS includes an inter-layer reference picture that can refer to a decoded pixel.
  • the inter-layer motion limited RPS does not include an inter-layer reference picture that may refer to a decoded pixel, but includes an inter-layer reference picture that may refer to motion information.
  • FIG. 20 is a flowchart showing a derivation process of sub RPS (inter-layer pixel RPS and inter-layer motion limited RPS) included in the inter-layer RPS.
  • the list IL-RPS0 representing the inter-layer pixel RPS and the list IL-RPS1 representing the inter-layer motion limited RPS are respectively set to be empty.
  • the inter-layer pixel RPS and inter-layer motion limited RPS having the properties described with reference to FIG. 19B can be derived based on the IL-RPS information.
• Any derivation method may be used as long as the resulting sub-RPSs have the properties described with reference to FIG. 19B, that is, as long as the inter-layer pixel RPS includes the inter-layer reference pictures whose decoded pixels may be referenced, and the inter-layer motion limited RPS does not include an inter-layer reference picture whose decoded pixels may be referenced but does include the inter-layer reference pictures whose motion information may be referenced.
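The partition of FIG. 19B (the loop of FIG. 20) can be sketched as follows, assuming each active inter-layer reference picture is paired with the dependency type between the target layer and the layer the picture belongs to; this pairing representation and the function name are assumptions for illustration.

```python
def derive_inter_layer_rps(ref_pics_with_dep_type):
    """ref_pics_with_dep_type: list of (picture, dependency_type) pairs.

    Returns (IL-RPS0, IL-RPS1): the inter-layer pixel RPS and the
    inter-layer motion limited RPS.
    """
    il_rps0 = []  # inter-layer pixel RPS: decoded pixels may be referenced
    il_rps1 = []  # inter-layer motion limited RPS: motion information only
    for pic, dep_type in ref_pics_with_dep_type:
        if dep_type in (0, 1):      # pixel-dependent (FIG. 19B)
            il_rps0.append(pic)
        elif dep_type == 2:         # motion-dependent but pixel-independent
            il_rps1.append(pic)
    return il_rps0, il_rps1
```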
• In the above, the two sub-RPSs, the inter-layer pixel RPS and the inter-layer motion limited RPS, are derived.
• As a modification, the inter-layer pixel RPS and an inter-layer pixel-independent RPS may be derived instead. In this case, the determinations in S303 and S305 are replaced with the following determinations S303r2 and S305r2, respectively.
  • the reference picture control unit 3063 updates the DPB 3061 based on the input RPS.
• Specifically, the reference picture control unit 3063 sets the reference mark of each picture that may be referred to by inter prediction of the target picture (current picture) indicated in the input RPS to “used for reference” (“used for short-term reference” or “used for long-term reference”).
  • a decoded picture of the target layer recorded in the DPB which is not marked as “reference use” in the above process, is set to “reference nonuse”.
  • the reference mark of the inter-layer reference picture on the DPB is not changed by the reference picture control unit 3063.
  • the reference mark change of the picture on the DPB derived from the base decoded picture is not performed by the reference picture control unit 3063 but by the reference layer picture control unit 3064 described later.
  • the reference layer picture control unit 3064 updates the DPB 3061 based on the input decoded picture and RPS of the reference layer.
• Specifically, the reference layer picture control unit 3064 records in the DPB the decoded picture of the reference layer corresponding to each picture that can be referred to by inter-layer prediction of the target picture (current picture) in the input RPS. On the DPB, the reference mark of the picture is set to “used for reference” (“used for short-term reference” or “used for long-term reference”), and the output mark of the picture is set to “output unnecessary”.
• When recording the decoded picture of the reference layer, it may be recorded in the picture buffer after applying scaling and filtering as necessary. In particular, when the resolutions of the output pictures of the reference layer and the target layer differ (in the case of spatial scalability), the decoded picture of the reference layer needs to be scaled according to the resolution of the output picture of the target layer.
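A sketch of this recording step, assuming a caller-supplied upsample() helper for the spatial-scalability case; the Picture container and the dictionary-based DPB entry are illustrative only, not the text's actual data structures.

```python
from dataclasses import dataclass

@dataclass
class Picture:
    width: int
    height: int

def record_ref_layer_picture(dpb, dec_pic, target_resolution, upsample):
    """Record a reference-layer decoded picture in the DPB.

    upsample: hypothetical scaling function (pic, (w, h)) -> Picture,
    applied only when the reference and target resolutions differ.
    """
    if (dec_pic.width, dec_pic.height) != target_resolution:
        # Spatial scalability: scale the reference-layer picture to the
        # resolution of the target layer's output picture.
        dec_pic = upsample(dec_pic, target_resolution)
    dpb.append({'pic': dec_pic,
                'ref_mark': 'used for reference',
                'output_mark': 'output unnecessary'})
```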
• The RPL deriving unit 3065 derives and outputs the reference picture list to be used for inter prediction or inter-layer prediction of the target slice of the target picture, based on the input RPS and the RPL information included in the input syntax values.
  • the RPL information is a syntax value that is decoded from the SPS or slice header in order to construct the reference picture list RPL.
  • the RPL information includes SPS list correction information and SH list correction information.
  • the SPS list modification information is information included in the SPS, and is information related to restrictions on modification of the reference picture list.
  • the SPS list correction information will be described with reference to FIG. 16 again.
  • the part (C) in FIG. 16 corresponds to SPS list correction information.
• The SPS list modification information includes a flag (restricted_ref_pic_lists_flag) related to restrictions on modification of the reference picture list, and a flag (lists_modification_present_flag) indicating whether information related to list reordering is present in the slice header.
• The SH list correction information is information included in the slice header, and includes update information for the length of the reference picture list (reference list length) applied to the target picture and reordering information for the reference picture list (reference list reordering information).
  • the SH list correction information will be described with reference to FIG. FIG. 21A illustrates a part of a slice header syntax table used when decoding a slice header.
  • the part (C) in FIG. 21A corresponds to the SH list correction information.
• The reference list length update information includes a flag (num_ref_idx_active_override_flag) indicating whether or not the list length is updated. When the list length is updated, information (num_ref_idx_l0_active_minus1) indicating the reference list length after the change of the L0 reference list and information (num_ref_idx_l1_active_minus1) indicating the reference list length after the change of the L1 reference list are also included.
  • FIG. 21B illustrates a syntax table of reference list rearrangement information used at the time of decoding the slice header.
  • the reference list rearrangement information includes an L0 reference list rearrangement presence / absence flag (ref_pic_list_modification_flag_l0).
  • the L0 reference list rearrangement order (list_entry_l0 [i]) is included in the reference list rearrangement information.
• NumPocTotalCurr is a variable representing the number of reference pictures that can be used in the current picture. The L0 reference list reordering order is included in the slice header only when the L0 reference list is reordered and the number of reference pictures available in the current picture is greater than 1.
  • the L1 reference list rearrangement presence / absence flag (ref_pic_list_modification_flag_l1) is included in the reference list rearrangement information.
  • the L1 reference list rearrangement order (list_entry_l1 [i]) is included in the reference list rearrangement information.
• The L1 reference list reordering order is included in the slice header only when the L1 reference list is reordered and the number of reference pictures available in the current picture is greater than 1.
  • the reference picture list deriving unit generates a reference picture list RPL used for decoding the target picture based on the reference picture set RPS and the RPL correction information.
  • FIG. 22 is a diagram showing an outline of the temporary L0 reference list and the temporary L1 reference list generated in the intermediate process of the L0 reference list and L1 reference list derivation in the RPL deriving unit 3065.
• The provisional L0 reference list includes the sub-RPSs in order from the top of the list (in order of priority): the forward short-term RPS (StBef in the figure), the inter-layer pixel RPS (ILSample in the figure), the backward short-term RPS (StAft in the figure), the long-term RPS (Lt in the figure), and the inter-layer motion limited RPS (ILMotion in the figure).
• The provisional L1 reference list includes the sub-RPSs in order from the top of the list (in order of priority): the backward short-term RPS (StAft in the figure), the inter-layer pixel RPS (ILSample in the figure), the forward short-term RPS (StBef in the figure), the long-term RPS (Lt in the figure), and the inter-layer motion limited RPS (ILMotion in the figure).
• As described above, the provisional reference lists (the provisional L0 reference list and the provisional L1 reference list) generated by the RPL deriving unit 3065 include the inter-layer pixel RPS at a position closer to the top of the list than the inter-layer motion limited RPS.
  • the provisional reference list includes the inter-layer pixel RPS at a position corresponding to a higher priority than the inter-layer motion limited RPS.
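With these orderings, the provisional list construction reduces to concatenating the sub-RPSs. The sketch below assumes each sub-RPS is already available as a list of reference pictures; the function names are illustrative, and the L1 order shown mirrors the L0 order with the forward and backward short-term RPS swapped. In both lists the inter-layer pixel RPS precedes the inter-layer motion limited RPS.

```python
def build_temp_list_l0(st_before, il_sample, st_after, lt, il_motion):
    # FIG. 22 order for L0: StBef, ILSample, StAft, Lt, ILMotion.
    return st_before + il_sample + st_after + lt + il_motion

def build_temp_list_l1(st_after, il_sample, st_before, lt, il_motion):
    # L1 order with forward/backward short-term RPS swapped relative to L0.
    return st_after + il_sample + st_before + lt + il_motion
```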
  • the L0 reference list is constructed according to the procedure shown in S401 to S409 below.
• The element RefPicList0[rIdx] of the L0 reference list corresponding to the reference picture index rIdx is derived by the following equation, where RefPicListTemp0[i] represents the i-th element of the provisional L0 reference list.
• RefPicList0[rIdx] = RefPicListTemp0[list_entry_l0[rIdx]]
• That is, the value recorded at the position indicated by the reference picture index rIdx in the reference list reordering order list_entry_l0 is referred to, and the reference picture recorded at that position in the provisional L0 reference list is stored as the reference picture at position rIdx of the L0 reference list.
  • the provisional L0 reference list is set as the L0 reference list.
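The per-index copy above is the whole of the reordering. A sketch covering both the reordered and pass-through cases follows, with truncation to the active list length included per the reference list length update information; the function name and argument shapes are assumptions.

```python
def build_ref_pic_list(temp_list, modification_flag, list_entry, num_active):
    """Derive RefPicListX from the provisional list RefPicListTempX.

    modification_flag: ref_pic_list_modification_flag_lX
    list_entry:        list_entry_lX (reference list reordering order)
    num_active:        num_ref_idx_lX_active_minus1 + 1
    """
    if modification_flag:
        # RefPicListX[rIdx] = RefPicListTempX[list_entry_lX[rIdx]]
        return [temp_list[list_entry[r]] for r in range(num_active)]
    # Otherwise the provisional list is used as the reference list.
    return temp_list[:num_active]
```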
  • L1 reference list construction procedure Next, the L1 reference list construction procedure will be described.
  • the construction of the L1 reference list is executed according to the procedure described in S501 to S509 below.
• The elements of the provisional L1 reference list are reordered based on the values of the reference list reordering order list_entry_l1[i] to obtain the L1 reference list.
• The element RefPicList1[rIdx] of the L1 reference list corresponding to the reference picture index rIdx is derived by the following equation, where RefPicListTemp1[i] represents the i-th element of the provisional L1 reference list.
• RefPicList1[rIdx] = RefPicListTemp1[list_entry_l1[rIdx]]
• That is, the value recorded at the position indicated by the reference picture index rIdx in the reference list reordering order list_entry_l1 is referred to, and the reference picture recorded at that position in the provisional L1 reference list is stored as the reference picture at position rIdx of the L1 reference list.
  • the provisional L1 reference list is set as the L1 reference list.
• As described above, each reference picture list is generated by selecting and reordering the elements of the corresponding provisional reference list (the provisional L0 reference list and the provisional L1 reference list) based on the RPL correction information.
  • the provisional reference list includes the inter-layer pixel RPS at a position corresponding to a higher priority than the inter-layer motion limited RPS.
  • the provisional reference list includes the inter-layer pixel RPS at a position closer to the top of the list than the inter-layer motion limited RPS.
• According to the reference picture list derived by the above procedure, an inter-layer reference picture included in the inter-layer pixel RPS is placed closer to the top of the reference picture list (associated with a smaller reference picture index) than an inter-layer reference picture included in the inter-layer motion limited RPS. Therefore, an inter-layer reference picture included in the inter-layer pixel RPS can be designated by a reference picture index of smaller value than an inter-layer reference picture included in the inter-layer motion limited RPS.
  • the inter-layer reference picture included in the inter-layer pixel RPS is an inter-layer reference picture that may be used for inter-layer pixel prediction.
  • the inter-layer reference picture included in the inter-layer motion limited RPS is an inter-layer reference picture that may not be used for inter-layer pixel prediction but may be used for inter-layer motion prediction.
• In general, encoded data includes more reference picture indexes specifying reference pictures used for inter-layer image prediction than reference picture indexes specifying reference pictures used for inter-layer motion prediction, because the former are included for each prediction unit while the latter are included for each slice.
• Therefore, by assigning a smaller reference picture index to an inter-layer reference picture included in the inter-layer pixel RPS than to one included in the inter-layer motion limited RPS, such pictures can be specified with a smaller code amount, and the code amount of the entire encoded data can be reduced.
  • the position in the provisional reference list is designated by the reference list reordering information (list_entry_l0, list_entry_l1), thereby reordering the provisional reference list to generate a reference list.
• In addition, the code amount of the RPL correction information can be reduced by making it possible to specify, with reference list reordering information of a smaller value, a reference picture that is likely to be moved to the head of the reference picture list.
• Compared to an inter-layer reference picture included in the inter-layer motion limited RPS, an inter-layer reference picture included in the inter-layer pixel RPS is more likely to be moved to the head of the reference picture list, and since it is near the top of the provisional list, it can be specified with reordering information of a smaller value. Therefore, the code amount of the RPL correction information can be reduced by using the provisional reference list derived by the above procedure.
• The above provisional reference list includes the motion limited RPS at the end of the list; in other words, the provisional reference list includes the motion limited RPS at a position closer to the back of the list than the short-term RPS.
• Since the inter-layer reference pictures included in the motion limited RPS are selected less frequently than the reference pictures included in the short-term RPS and the long-term RPS, which are the sub-RPSs related to inter prediction within the same layer, information related to the selection of reference pictures can be decoded with a smaller code amount.
• In a modification, the reference picture list derivation process (RPL derivation) is performed with the inter-layer motion limited RPS read as the inter-layer pixel-independent RPS. More specifically, the processes of S406 and S506 are replaced with the following processes.
  • the provisional reference list includes the inter-layer pixel RPS at a position corresponding to a higher priority than the inter-layer pixel-independent RPS.
  • the temporary reference list includes the inter-layer pixel RPS at a position closer to the top of the list than the inter-layer pixel-independent RPS.
• The provisional reference list includes the inter-layer pixel-independent RPS at the end of the provisional reference list.
• In other words, the provisional reference list includes the inter-layer pixel-independent RPS at a position closer to the back of the list than the short-term RPS.
  • an inter-layer reference picture included in an inter-layer pixel RPS can be designated with a smaller code amount by assigning a smaller reference picture index than an inter-layer reference picture included in an inter-layer pixel-independent RPS.
  • the code amount of the entire encoded data can be reduced.
• The output control unit 3066 outputs pictures in the DPB 3061 to the outside and updates the output marks at predetermined timings. Specifically, the picture output processing by the output control unit 3066 is executed by the following procedure.
  • a picture with a minimum POC is output from the pictures on the DPB whose output mark is “output required”.
  • the output mark of the output picture is set to “output unnecessary”.
  • a picture whose reference mark is “reference not used” and whose output mark is “output unnecessary” is selected from the pictures on the DPB, and the picture is deleted from the DPB.
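The three-step procedure above can be sketched with a dictionary-based DPB entry; this representation is illustrative only (the real DPB stores decoded pictures together with their reference and output marks).

```python
def output_and_prune(dpb):
    """dpb: list of dicts with keys 'poc', 'needed_for_output',
    'used_for_reference' (illustrative stand-ins for the output mark
    "output required" and the reference mark "used for reference")."""
    # (Step 1) Among pictures marked "output required", output the one
    # with the minimum POC.
    candidates = [p for p in dpb if p['needed_for_output']]
    out = min(candidates, key=lambda p: p['poc']) if candidates else None
    # (Step 2) Set the output mark of the output picture to
    # "output unnecessary".
    if out is not None:
        out['needed_for_output'] = False
    # (Step 3) Delete pictures that are neither used for reference nor
    # awaiting output.
    dpb[:] = [p for p in dpb
              if p['used_for_reference'] or p['needed_for_output']]
    return out
```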
• The prediction parameter memory 3067 stores the inter prediction parameters decoded by the inter prediction parameter decoding unit 303, the intra prediction parameters decoded by the intra prediction parameter decoding unit 304, and the prediction mode predMode separated by the entropy decoding unit 301.
• The inter prediction parameters stored in the prediction parameter memory 3067 include, for example, the prediction list use flag predFlagLX (inter prediction flag inter_pred_idc), the reference picture index refIdxLX, and the vector mvLX.
  • FIG. 10 is a schematic diagram illustrating a configuration of the inter predicted image generation unit 309 according to the present embodiment.
  • the inter predicted image generation unit 309 includes a motion displacement compensation unit 3091, a residual prediction unit 3092, an illuminance compensation unit 3093, a weight prediction unit 3094, and a reference image determination unit 3095.
• The motion displacement compensation unit 3091 generates a motion displacement compensation image by reading, from the DPB 3061 of the decoded picture management unit 306, the block at the position shifted by the vector mvLX from the position of the target block in the reference picture designated by the reference picture index refIdxLX.
• When the vector mvLX has decimal precision, the motion displacement compensation image is generated by applying a filter called a motion compensation filter (or displacement compensation filter) for generating pixels at decimal positions.
• The above processing is called motion compensation when the vector mvLX is a motion vector, and displacement compensation when the vector mvLX is a displacement vector; here, the two are collectively referred to as motion displacement compensation.
• The motion displacement compensation image for L0 prediction is referred to as predSamplesL0, and that for L1 prediction as predSamplesL1; when the two are not distinguished, they are referred to as predSamplesLX.
• These output images are also referred to as motion displacement compensation images predSamplesLX.
• When further processing is applied to these images, the input image is expressed as predSamplesLX and the output image as predSamplesLX′.
• The reference image determination unit 3095 determines whether a reference picture (decoded image) refIvRefPic in the reference layer (reference view) used for residual prediction (hereinafter referred to as the reference layer reference picture refIvRefPic for convenience of explanation) is available. When the reference layer reference picture refIvRefPic is available, the reference image determination unit 3095 sets the reference layer reference picture use flag refIvRefPicAvailable (refIvRefPicAvailable2) to 1.
• Otherwise, the reference image determination unit 3095 sets the reference layer reference picture use flag refIvRefPicAvailable (refIvRefPicAvailable2) to 0.
• The reference image determination unit 3095 outputs the reference layer reference picture use flag refIvRefPicAvailable (refIvRefPicAvailable2) set in this way to the residual prediction unit 3092.
  • the reference image determination unit 3095 may determine whether or not the reference picture can be used as follows.
  • the reference picture in the target layer is referred to as an ARP reference picture arpRefPic.
  • an index indicating the view identifier is described as ViewIdx
  • the view identifier of the reference layer (view) in the coordinates of the target block (xP, yP) is described as refViewIdx [xP] [yP].
  • PicOrderCnt (X) indicates the picture order number POC of picture X.
  • the index arpRefIdxLX may be set to 0, or the method shown in option Y5 of the fourth embodiment described later may be used.
  • (1-1) PicOrderCnt(Pic) is equal to PicOrderCnt(arpRefPic), and the ViewIdx of Pic is refViewIdx[xP][yP].
  • (Option 1) Determine whether or not a reference picture exists in the RPS: if a reference picture Pic that satisfies the condition (1-1) exists in the RPS of the picture on the reference layer whose POC is the same as the target picture and whose ViewIdx is refViewIdx[xP][yP], that picture is used as the reference layer reference picture refIvRefPic.
  • since the RPS is invariant among the plurality of slices included in a picture and exists even when the slice type is an I slice, determination by the RPS is appropriate.
  • (Option 2) Determine whether or not a reference picture exists in the RPL: if a reference picture Pic that satisfies the above (1-1) and the following (2-2) exists, that reference picture Pic is used as the reference layer reference picture refIvRefPic and the reference layer reference picture use flag refIvRefPicAvailable is set to 1; otherwise, the reference layer reference picture use flag refIvRefPicAvailable is set to 0.
  • (2-2) the reference picture Pic that satisfies the above condition (1-1) exists in the reference picture list RPL (RefPicListX) of a picture on the reference layer (a picture whose ViewIdx is refViewIdx[xP][yP]) that has the same POC as the target picture and whose slice type (slice_type) is not an I slice.
  • when the target picture includes a plurality of slices, the RefPicListX[] of the first slice can be used for the determination.
  • the present invention is not limited to this; the RefPicListX[] of any predetermined slice among the slices included in the target picture may be used, for example the last slice.
  • "exists in the reference picture list" does not mean that the picture must exist at a specific order element of the reference picture list; it may exist at any order.
  • likewise, when there are the L0 list RefPicList0[] and the L1 list RefPicList1[] as reference picture lists, the picture may exist as an element of either list.
  • (Option 3) Determine whether or not a reference picture exists in the DPB: a reference picture Pic that satisfies the above condition (1-1) is determined to be available if it is present in the decoded picture buffer (DPB) with the same POC as the target picture and a ViewIdx of refViewIdx[xP][yP].
  • since the DPB 3061 stores usable pictures, the DPB may be used for the determination as described above.
  • however, the picture storage state in the DPB may not be reliable. Specifically, the DPB discards a picture that is neither marked as needed for output nor marked as used for reference, but since the output timing may be decoder-dependent, a given picture may already be discarded in some decoders while it is not yet discarded in others. In this case, the determination of whether a reference picture (used for ARP) exists in the DPB may differ depending on the decoder. In contrast, the mark for reference use is set according to the RPS, which is explicitly decoded, so there is no difference between decoders; for this reason, it is preferable to determine whether or not the reference mark of the reference picture is "used for reference", as described above.
  • the following condition (1-1b) may be used instead of the condition (1-1).
  • (1-1b) PicOrderCnt(Pic) is equal to PicOrderCnt(arpRefPic), the ViewIdx of Pic is refViewIdx[xP][yP], and the DepthFlag of Pic is 0.
  • with the condition (1-1b), when a reference picture is a depth picture (when its DepthFlag is 1), it is not selected as the target reference picture Pic.
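  • the availability check of (Option 1) with condition (1-1b) can be sketched as follows; this is a minimal illustration, assuming a hypothetical representation of the RPS as a list of (POC, ViewIdx, DepthFlag) tuples, not the actual decoder data structures:

```python
def derive_ref_iv_ref_pic_available(rps, arp_ref_poc, ref_view_idx):
    """Sketch of the (Option 1) check: refIvRefPicAvailable is 1 when some
    picture Pic in the RPS has PicOrderCnt(Pic) == PicOrderCnt(arpRefPic),
    ViewIdx == refViewIdx[xP][yP], and DepthFlag == 0 (condition (1-1b))."""
    for poc, view_idx, depth_flag in rps:
        if poc == arp_ref_poc and view_idx == ref_view_idx and depth_flag == 0:
            return 1  # reference layer reference picture use flag set to 1
    return 0          # otherwise set to 0
```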
  • the residual prediction unit 3092 performs the residual prediction (ARP: Advanced Residual Prediction) on the input motion displacement compensation image predSamplesLX when the residual prediction execution flag resPredFlag is 1. When the residual prediction execution flag resPredFlag is 0, the input motion displacement compensation image predSamplesLX is output as it is.
  • in the first residual prediction, the residual of a reference layer (first layer image) different from the target layer (second layer image) that is the target of predicted image generation is added to the motion displacement compensation image predSamplesLX (a motion compensated image) predicted from a reference picture of the target layer.
  • in the second residual prediction, a residual at a time (POC) different from the target picture, derived between the target layer (second layer image) that is the target of predicted image generation and the reference layer (first layer image), is added to the motion displacement compensation image predSamplesLX (a displacement compensated image) predicted from a reference picture of the reference layer.
  • in the first residual prediction, the residual between the already derived reference layer picture refIvRefPic at a time (POC) different from the target picture and the reference layer picture currIvRefPic at the same time (POC) as the target picture is used as an estimate of the residual in motion compensation prediction from a picture of the target layer (for example, arpRefPic) at a time (POC) different from the target picture.
  • in the second residual prediction, the residual between the already derived target layer picture arpRefPic at a time (POC) different from the target picture and the reference layer picture refIvRefPic is used as an estimate of the residual in displacement compensation prediction from the reference layer picture currIvRefPic at the same time (POC) as the target picture.
  • in the present embodiment, both the first residual prediction and the second residual prediction are used; however, for simplification, only one of the first residual prediction and the second residual prediction may be used.
  • FIG. 1 is a block diagram showing the configuration of the residual prediction unit 3092.
  • the residual prediction unit 3092 includes a residual prediction execution flag deriving unit 30921, a reference image acquisition unit 30922, and a residual synthesis unit 30923.
  • the residual prediction execution flag deriving unit 30921 sets the residual prediction execution flag resPredFlag to 1, indicating that residual prediction is to be executed, when (1) the residual prediction flag iv_res_pred_weight_idx is not 0 and (2) the reference picture use flag is 1. On the other hand, when the residual prediction flag iv_res_pred_weight_idx is 0, or when the reference picture use flag is not 1 (in the case of disparity compensation), it sets the residual prediction execution flag resPredFlag to 0.
  • the residual prediction execution flag deriving unit 30921 may derive the residual prediction execution flag resPredFlag according to the following conditional expression (R-1).
  • PicOrderCntVal is the picture order number POC of the target picture.
  • the residual prediction execution flag deriving unit 30921 may determine, in addition to (1) and (2) above, (3) whether the target block uses motion compensation (PicOrderCnt(RefPicListX[refIdxLX]) != PicOrderCntVal); that is, the residual prediction execution flag resPredFlag may be derived from the following conditional expression (R-2).
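  • the derivation of resPredFlag under conditions (1), (2), and (3) can be sketched as follows (a minimal illustration; the function and argument names are hypothetical):

```python
def derive_res_pred_flag(iv_res_pred_weight_idx, ref_iv_ref_pic_available,
                         ref_poc, pic_order_cnt_val):
    """resPredFlag = 1 only when (1) iv_res_pred_weight_idx != 0,
    (2) the reference picture use flag is 1, and (3) the target block uses
    motion compensation, i.e. the POC of the reference picture differs
    from PicOrderCntVal of the target picture."""
    return int(iv_res_pred_weight_idx != 0
               and ref_iv_ref_pic_available == 1
               and ref_poc != pic_order_cnt_val)
```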
  • when the residual prediction execution flag resPredFlag is 1, the reference image acquisition unit 30922 reads, from the decoded picture management unit 306, the corresponding block currIvSamplesLX and the reference block refIvSamplesLX of the corresponding block, using the motion vector mvLX and the displacement vector mvDisp input from the inter prediction parameter decoding unit 303.
  • FIG. 11A is a diagram for explaining the corresponding block currIvSamplesLX.
  • as shown in FIG. 11A, the corresponding block Cor1 corresponding to the target block Tar1 on the target layer is located at the position shifted, from the position Tar1′ corresponding to the target block in the image on the reference layer, by the displacement vector mvDisp, which is a vector indicating the positional relationship between the reference layer and the target layer.
  • specifically, the reference image acquisition unit 30922 derives the pixel at the position obtained by shifting the coordinates (x, y) of each pixel of the target block by the displacement vector mvDisp of the target block.
  • since the displacement vector mvDisp has a decimal precision of 1/4 pel, the reference image acquisition unit 30922 derives, for the case where the coordinates of a pixel of the target block are (xP, yP), the X coordinate xInt and Y coordinate yInt of the corresponding integer-precision pixel R0, and the fractional part xFrac of the X component and the fractional part yFrac of the Y component of the displacement vector mvDisp, by the following formulas:
  •   xInt = xPb + (mvLX[0] >> 2)
  •   yInt = yPb + (mvLX[1] >> 2)
  •   xFrac = mvLX[0] & 3
  •   yFrac = mvLX[1] & 3
  • X & 3 is a mathematical expression for extracting only the lower 2 bits of X.
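  • the integer/fraction split above can be sketched as follows (a minimal illustration; `base` stands for xPb or yPb and `v` for the corresponding quarter-pel vector component):

```python
def split_vector_component(base, v):
    """Split a quarter-pel vector component v into an integer position and
    a 2-bit fractional part, as in the formulas above."""
    integer = base + (v >> 2)  # e.g. xInt = xPb + (mvLX[0] >> 2)
    frac = v & 3               # e.g. xFrac = mvLX[0] & 3 (lower 2 bits)
    return integer, frac
```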
  • next, the reference image acquisition unit 30922 generates an interpolation pixel predPartLX[x][y], taking into account that the displacement vector mvDisp has a precision of 1/4 pel, using the following formulas:
  •   xA = Clip3(0, picWidthInSamples - 1, xInt)
  •   xB = Clip3(0, picWidthInSamples - 1, xInt + 1)
  •   xC = Clip3(0, picWidthInSamples - 1, xInt)
  •   xD = Clip3(0, picWidthInSamples - 1, xInt + 1)
  •   yA = Clip3(0, picHeightInSamples - 1, yInt)
  •   yB = Clip3(0, picHeightInSamples - 1, yInt)
  •   yC = Clip3(0, picHeightInSamples - 1, yInt + 1)
  •   yD = Clip3(0, picHeightInSamples - 1, yInt + 1)
  • here, the integer pixel A is the pixel corresponding to the pixel R0, and the integer pixels B, C, and D are the integer-precision pixels adjacent to the right, below, and bottom right of the integer pixel A, respectively.
  • Clip3(x, y, z) is a function that limits (clips) z to be greater than or equal to x and less than or equal to y.
  • the reference image acquisition unit 30922 reads the reference pixels refPicLX[xA][yA], refPicLX[xB][yB], refPicLX[xC][yC], and refPicLX[xD][yD] corresponding to the integer pixels A, B, C, and D, respectively, from the DPB 3061 of the decoded picture management unit 306.
  • the reference image acquisition unit 30922 then derives the interpolation pixel predPartLX[x][y], which is the pixel shifted from the pixel R0 by the fractional part of the displacement vector mvDisp, using the reference pixels refPicLX[xA][yA], refPicLX[xB][yB], refPicLX[xC][yC], and refPicLX[xD][yD] together with the fractional part xFrac of the X component and the fractional part yFrac of the Y component of the displacement vector mvDisp:
  •   predPartLX[x][y] = (refPicLX[xA][yA] * (8 - xFrac) * (8 - yFrac) + refPicLX[xB][yB] * (8 - yFrac) * xFrac + refPicLX[xC][yC] * (8 - xFrac) * yFrac + refPicLX[xD][yD] * xFrac * yFrac) >> 6
  • the reference image acquisition unit 30922 performs the above interpolation pixel derivation process on each pixel in the target block, and sets a set of interpolation pixels as an interpolation block predPartLX.
  • the reference image acquisition unit 30922 outputs the derived interpolation block predPartLX to the residual synthesis unit 30923 as the corresponding block currIvSamplesLX.
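  • the interpolation above can be sketched per pixel as follows; this is a minimal illustration in which `ref` is a hypothetical 2-D sample array, and the final right-shift by 6 is assumed as the normalization since the four weights sum to 64:

```python
def clip3(lo, hi, z):
    """Clip3(x, y, z): limit z to be >= lo and <= hi."""
    return max(lo, min(hi, z))

def interp_pixel(ref, x_int, y_int, x_frac, y_frac, w, h):
    """Bilinear interpolation of one pixel from the four integer pixels
    A (top-left), B (right), C (below), D (bottom-right), with coordinates
    clipped to the picture boundaries as in the formulas above."""
    xA = clip3(0, w - 1, x_int);     yA = clip3(0, h - 1, y_int)
    xB = clip3(0, w - 1, x_int + 1); yB = clip3(0, h - 1, y_int)
    xC = clip3(0, w - 1, x_int);     yC = clip3(0, h - 1, y_int + 1)
    xD = clip3(0, w - 1, x_int + 1); yD = clip3(0, h - 1, y_int + 1)
    acc = (ref[yA][xA] * (8 - x_frac) * (8 - y_frac)
           + ref[yB][xB] * (8 - y_frac) * x_frac
           + ref[yC][xC] * (8 - x_frac) * y_frac
           + ref[yD][xD] * x_frac * y_frac)
    return acc >> 6  # assumed normalization: the weights sum to 64
```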
  • FIG. 11B is a diagram for explaining the reference block refIvSamplesLX. As shown in FIG. 11B, the reference block corresponding to the corresponding block on the reference layer is located at the position shifted, from the position of the corresponding block in the reference image on the reference layer, by the motion vector mvLX of the target block.
  • the reference image acquisition unit 30922 derives the reference block refIvSamplesLX by performing the same processing as the derivation of the corresponding block currIvSamplesLX, except that the displacement vector mvDisp is replaced with the vector (mvDisp[0] + mvLX[0], mvDisp[1] + mvLX[1]).
  • the reference image acquisition unit 30922 outputs the reference block refIvSamplesLX to the residual synthesis unit 30923.
  • when the residual prediction execution flag resPredFlag is 1, the residual synthesis unit 30923 derives a corrected motion displacement compensation image predSamplesLX′ from the motion displacement compensation image predSamplesLX, the corresponding block currIvSamplesLX, the reference block refIvSamplesLX, and the residual prediction flag iv_res_pred_weight_idx.
  • the corrected motion displacement compensation image predSamplesLX′ is calculated using the following formula:
  •   predSamplesLX′ = predSamplesLX + ((currIvSamplesLX - refIvSamplesLX) >> (iv_res_pred_weight_idx - 1))
  • when the residual prediction execution flag resPredFlag is 0, the residual synthesis unit 30923 outputs the motion displacement compensation image predSamplesLX as it is.
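  • the residual synthesis can be sketched per block as follows (a minimal illustration on 2-D lists; the function and argument names are hypothetical):

```python
def residual_synthesis(pred, curr_iv, ref_iv, iv_res_pred_weight_idx,
                       res_pred_flag):
    """Apply predSamplesLX' = predSamplesLX +
    ((currIvSamplesLX - refIvSamplesLX) >> (iv_res_pred_weight_idx - 1))
    per pixel; when resPredFlag is 0, the input is returned unchanged."""
    if res_pred_flag == 0:
        return pred
    shift = iv_res_pred_weight_idx - 1  # weight index 1 -> full, 2 -> half
    return [[pred[y][x] + ((curr_iv[y][x] - ref_iv[y][x]) >> shift)
             for x in range(len(pred[0]))] for y in range(len(pred))]
```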
  • when the illuminance compensation flag ic_enable_flag is 1, the illuminance compensation unit 3093 performs illuminance compensation on the input motion displacement compensation image predSamplesLX; when the illuminance compensation flag ic_enable_flag is 0, the input motion displacement compensation image predSamplesLX is output as it is.
  • the motion displacement compensation image predSamplesLX input to the illuminance compensation unit 3093 is the output image of the motion displacement compensation unit 3091 when residual prediction is off, and the output image of the residual prediction unit 3092 when residual prediction is on.
  • the weight prediction unit 3094 generates a predicted picture block P (predicted image) by multiplying the input motion displacement image predSamplesLX by weight coefficients.
  • when residual prediction and illuminance compensation are performed, the input motion displacement image predSamplesLX is the image on which they have been performed.
  • when one of the prediction list use flags predFlagL0 and predFlagL1 is 1 (in the case of uni-prediction) and weight prediction is not used, the input motion displacement image predSamplesLX (LX is L0 or L1) is adjusted to the number of pixel bits by the following formula:
  •   predSamples[x][y] = Clip3(0, (1 << bitDepth) - 1, (predSamplesLX[x][y] + offset1) >> shift1)
  •   where shift1 = 14 - bitDepth and offset1 = 1 << (shift1 - 1).
  • when both prediction list use flags predFlagL0 and predFlagL1 are 1 (in the case of bi-prediction) and weight prediction is not used, the input motion displacement images predSamplesL0 and predSamplesL1 are averaged and adjusted to the number of pixel bits by the following formula:
  •   predSamples[x][y] = Clip3(0, (1 << bitDepth) - 1, (predSamplesL0[x][y] + predSamplesL1[x][y] + offset2) >> shift2)
  •   where shift2 = 15 - bitDepth and offset2 = 1 << (shift2 - 1).
  • in the case of uni-prediction, when performing weight prediction, the weight prediction unit 3094 derives the weight prediction coefficient w0 and the offset o0 and performs the processing of the following formula:
  •   predSamples[x][y] = Clip3(0, (1 << bitDepth) - 1, ((predSamplesLX[x][y] * w0 + (1 << (log2WD - 1))) >> log2WD) + o0)
  • here, log2WD is a variable indicating a predetermined shift amount.
  • in the case of bi-prediction, when performing weight prediction, the weight prediction unit 3094 derives the weight prediction coefficients w0, w1, o0, and o1 and performs the processing of the following formula:
  •   predSamples[x][y] = Clip3(0, (1 << bitDepth) - 1, (predSamplesL0[x][y] * w0 + predSamplesL1[x][y] * w1 + ((o0 + o1 + 1) << log2WD)) >> (log2WD + 1))
  • (Modification 1)
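  • the rounding in the non-weighted uni-prediction case and the explicit weighted uni-prediction case can be sketched per pixel as follows (a minimal illustration; the bi-prediction cases follow the same pattern):

```python
def clip3(lo, hi, z):
    """Clip3(x, y, z): limit z to be >= lo and <= hi."""
    return max(lo, min(hi, z))

def default_uni_pred(p, bit_depth):
    """Non-weighted uni-prediction: shift the intermediate 14-bit sample p
    down to the pixel bit depth with rounding, then clip."""
    shift1 = 14 - bit_depth
    offset1 = 1 << (shift1 - 1)
    return clip3(0, (1 << bit_depth) - 1, (p + offset1) >> shift1)

def weighted_uni_pred(p, w0, o0, log2_wd, bit_depth):
    """Explicit weighted uni-prediction: 1 << (log2WD - 1) is the rounding
    offset, o0 the additive offset after the shift."""
    return clip3(0, (1 << bit_depth) - 1,
                 ((p * w0 + (1 << (log2_wd - 1))) >> log2_wd) + o0)
```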
  • in the above description, the availability of refIvRefPic is determined using a condition at the motion compensation stage (resPredFlag).
  • the present invention is not limited to this, and conditions in the parsing stage may be used.
  • the reference image determination unit 3095 may determine whether or not the ARP reference picture arpRefPic can be used in the parsing stage, and supply the determination result arpRefPicAvailable to the inter prediction parameter decoding control unit 3031.
  • here, a flag indicating the availability of the ARP reference picture arpRefPic in the parsing stage is distinguished as arpRefPicAvailable, and a flag indicating its availability in the motion compensation stage is distinguished as refIvRefPicAvailable.
  • that is, the reference image determination unit 3095 may determine whether or not the ARP reference picture arpRefPic can be used, and only when it can be used, the inter prediction parameter decoding control unit 3031 may decode iv_res_pred_weight_idx.
  • FIG. 12 is a block diagram illustrating a configuration of the image encoding device 11 according to the present embodiment.
  • the image encoding device 11 includes a predicted image generation unit 101, a subtraction unit 102, a DCT/quantization unit 103, an entropy encoding unit 104, an inverse quantization/inverse DCT unit 105, an addition unit 106, a decoded picture management unit (reference image storage unit, frame memory) 109, an encoding parameter determination unit 110, a prediction parameter encoding unit 111, and a residual storage unit 313 (residual recording unit).
  • the prediction parameter encoding unit 111 includes an inter prediction parameter encoding unit 112 and an intra prediction parameter encoding unit 113.
  • for each picture of each viewpoint of the layer image T input from the outside, the predicted image generation unit 101 generates a predicted picture block P for each block, which is an area obtained by dividing the picture.
  • the predicted image generation unit 101 reads the reference picture block from the decoded picture management unit 109 based on the prediction parameter input from the prediction parameter encoding unit 111.
  • the prediction parameter input from the prediction parameter encoding unit 111 is, for example, a motion vector or a displacement vector.
  • the predicted image generation unit 101 reads the reference picture block located at the position indicated by the motion vector or the displacement vector, starting from the encoding target block.
  • the prediction image generation unit 101 generates a prediction picture block P using one prediction method among a plurality of prediction methods for the read reference picture block.
  • the predicted image generation unit 101 outputs the generated predicted picture block P to the subtraction unit 102. Note that since the predicted image generation unit 101 performs the same operation as the predicted image generation unit 308 already described, details of generation of the predicted picture block P are omitted.
  • for example, the predicted image generation unit 101 calculates an error value based on the difference between the signal value of each pixel of a block included in the layer image and the signal value of each corresponding pixel of the predicted picture block P, and selects the prediction method that minimizes the error value.
  • the method for selecting the prediction method is not limited to this.
  • the plurality of prediction methods are intra prediction, motion prediction, and merge prediction.
  • Motion prediction is prediction between display times among the above-mentioned inter predictions.
  • the merge prediction is a prediction that uses the same reference picture block and prediction parameter as a block that has already been encoded and is within a predetermined range from the encoding target block.
  • the plurality of prediction methods are intra prediction, motion prediction, merge prediction, and displacement prediction.
  • the displacement prediction is prediction between different layer images (different viewpoint images) among the above-described inter predictions. For motion prediction, merge prediction, and displacement prediction (disparity prediction), there are predictions with and without additional prediction (residual prediction and illuminance compensation).
  • the prediction image generation unit 101 outputs a prediction mode predMode indicating the intra prediction mode used when generating the prediction picture block P to the prediction parameter encoding unit 111 when intra prediction is selected.
  • when motion prediction is selected, the prediction image generation unit 101 stores the motion vector mvLX used when generating the prediction picture block P in the decoded picture management unit 109, and outputs it to the inter prediction parameter encoding unit 112.
  • the motion vector mvLX indicates a vector from the position of the encoding target block to the position of the reference picture block when the predicted picture block P is generated.
  • the information indicating the motion vector mvLX may include information indicating a reference picture (for example, a reference picture index refIdxLX, a picture order number POC), and may represent a prediction parameter.
  • the predicted image generation unit 101 outputs a prediction mode predMode indicating the inter prediction mode to the prediction parameter encoding unit 111.
  • when displacement prediction is selected, the prediction image generation unit 101 stores the displacement vector used in generating the prediction picture block P in the decoded picture management unit 109, and outputs it to the inter prediction parameter encoding unit 112.
  • the displacement vector dvLX indicates a vector from the position of the encoding target block to the position of the reference picture block when the predicted picture block P is generated.
  • the information indicating the displacement vector dvLX may include information indicating a reference picture (for example, reference picture index refIdxLX, view IDview_id) and may represent a prediction parameter.
  • the predicted image generation unit 101 outputs a prediction mode predMode indicating the inter prediction mode to the prediction parameter encoding unit 111.
  • when merge prediction is selected, the prediction image generation unit 101 outputs a merge index merge_idx indicating the selected reference picture block to the inter prediction parameter encoding unit 112, and outputs a prediction mode predMode indicating the merge prediction mode to the prediction parameter encoding unit 111.
  • in the above inter prediction modes, when the residual prediction execution flag resPredFlag indicates that residual prediction is to be performed, the residual prediction unit 3092 included in the prediction image generation unit 101 performs the residual prediction as described above.
  • the subtraction unit 102 generates a residual signal by subtracting, for each pixel, the signal value of the predicted picture block P input from the predicted image generation unit 101 from the signal value of the corresponding block of the layer image T input from the outside.
  • the subtraction unit 102 outputs the generated residual signal to the DCT / quantization unit 103 and the encoding parameter determination unit 110.
  • the DCT / quantization unit 103 performs DCT on the residual signal input from the subtraction unit 102 and calculates a DCT coefficient.
  • the DCT / quantization unit 103 quantizes the calculated DCT coefficient to obtain a quantization coefficient.
  • the DCT / quantization unit 103 outputs the obtained quantization coefficient to the entropy encoding unit 104 and the inverse quantization / inverse DCT unit 105.
  • the entropy coding unit 104 receives the quantization coefficient from the DCT / quantization unit 103 and the coding parameter from the coding parameter determination unit 110.
  • Input coding parameters include codes such as a reference picture index refIdxLX, a vector index mvp_LX_idx, a difference vector mvdLX, a prediction mode predMode, a residual prediction flag iv_res_pred_weight_idx, and a merge index merge_idx.
  • the entropy encoding unit 104 generates an encoded stream Te by entropy encoding the input quantization coefficient and encoding parameter, and outputs the generated encoded stream Te to the outside.
  • the inverse quantization / inverse DCT unit 105 inversely quantizes the quantization coefficient input from the DCT / quantization unit 103 to obtain a DCT coefficient.
  • the inverse quantization / inverse DCT unit 105 performs inverse DCT on the obtained DCT coefficient to calculate a decoded residual signal.
  • the inverse quantization / inverse DCT unit 105 outputs the calculated decoded residual signal to the addition unit 106.
  • the addition unit 106 generates a reference picture block by adding, for each pixel, the signal value of the predicted picture block P input from the predicted image generation unit 101 and the signal value of the decoded residual signal input from the inverse quantization/inverse DCT unit 105.
  • the adding unit 106 stores the generated reference picture block in the decoded picture management unit 109.
  • similarly to the decoded picture management unit 306 of the image decoding device 31, the decoded picture management unit 109 has a prediction parameter memory (not shown), and stores the prediction parameters generated by the prediction parameter encoding unit 111 at predetermined positions in the prediction parameter memory for each picture and block to be encoded.
  • the decoded picture management unit 109 also has a DPB (not shown), and stores the reference picture blocks generated by the addition unit 106 at predetermined positions in the DPB for each picture and block to be encoded.
  • the other details of the decoded picture management unit 109 are the same as those described for the decoded picture management unit 306 of the image decoding device 31, and thus their description is omitted here.
  • the encoding parameter determination unit 110 selects one set from among a plurality of sets of encoding parameters.
  • the encoding parameter is a parameter to be encoded that is generated in association with the above-described prediction parameter or the prediction parameter.
  • the predicted image generation unit 101 generates a predicted picture block P using each of these sets of encoding parameters.
  • the encoding parameter determination unit 110 calculates a cost value indicating the amount of information and the encoding error for each of a plurality of sets.
  • the cost value is, for example, the sum of the code amount and the value obtained by multiplying the square error by a coefficient λ.
  • the code amount is the information amount of the encoded stream Te obtained by entropy encoding the quantization coefficients and the encoding parameters.
  • the square error is the sum over pixels of the squared residual values of the residual signal calculated by the subtraction unit 102.
  • the coefficient λ is a preset real number larger than zero.
  • the encoding parameter determination unit 110 selects the set of encoding parameters that minimizes the calculated cost value. The entropy encoding unit 104 thereby outputs the selected set of encoding parameters to the outside as the encoded stream Te, and does not output the unselected sets of encoding parameters.
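  • the cost minimization performed by the encoding parameter determination unit 110 can be sketched as follows (a minimal illustration; the candidate representation as (code amount, square error, parameter set) tuples is hypothetical):

```python
def rd_cost(code_amount, sq_error, lam):
    """Cost value = code amount + lambda * square error, as in the text."""
    return code_amount + lam * sq_error

def select_best(candidates, lam):
    """Return the parameter set of the candidate minimizing the cost value.
    candidates: list of (code_amount, sq_error, params) tuples."""
    return min(candidates, key=lambda c: rd_cost(c[0], c[1], lam))[2]
```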
  • the prediction parameter encoding unit 111 derives the prediction parameters used when generating a predicted picture based on the parameters input from the predicted image generation unit 101, and encodes the derived prediction parameters to generate a set of encoding parameters.
  • the prediction parameter encoding unit 111 outputs the generated set of encoding parameters to the entropy encoding unit 104.
  • the prediction parameter encoding unit 111 stores, in the decoded picture management unit 109, a prediction parameter corresponding to the generated encoding parameter set selected by the encoding parameter determination unit 110.
  • the prediction parameter encoding unit 111 operates the inter prediction parameter encoding unit 112 when the prediction mode predMode input from the prediction image generation unit 101 indicates the inter prediction mode.
  • the prediction parameter encoding unit 111 operates the intra prediction parameter encoding unit 113 when the prediction mode predMode indicates the intra prediction mode.
  • the inter prediction parameter encoding unit 112 derives an inter prediction parameter based on the prediction parameter input from the encoding parameter determination unit 110.
  • the inter prediction parameter encoding unit 112 includes the same configuration as the configuration in which the inter prediction parameter decoding unit 303 (see FIG. 6 and the like) derives the inter prediction parameter as a configuration for deriving the inter prediction parameter.
  • the configuration of the inter prediction parameter encoding unit 112 will be described later.
  • the intra prediction parameter encoding unit 113 determines the intra prediction mode IntraPredMode indicated by the prediction mode predMode input from the encoding parameter determination unit 110 as a set of intra prediction parameters.
  • the inter prediction parameter encoding unit 112 is means corresponding to the inter prediction parameter decoding unit 303.
  • FIG. 13 is a schematic diagram illustrating a configuration of the inter prediction parameter encoding unit 112 according to the present embodiment.
  • the inter prediction parameter encoding unit 112 includes an inter prediction parameter encoding control unit 1031, a merge prediction parameter derivation unit 1121, an AMVP prediction parameter derivation unit 1122, a subtraction unit 1123, and a prediction parameter integration unit 1126.
  • the merge prediction parameter derivation unit 1121 has the same configuration as the merge prediction parameter derivation unit 3036 (see FIG. 7).
  • the inter prediction parameter encoding control unit 1031 instructs the entropy encoding unit 104 to encode the codes (syntax elements) related to the inter prediction included in the encoded data, for example the merge flag merge_flag, the merge index merge_idx, the inter prediction flag inter_pred_idc, the reference picture index refIdxLX, the prediction vector index mvp_LX_idx, and the difference vector mvdLX.
  • the inter prediction parameter encoding control unit 1031 includes an additional prediction flag encoding unit 10311, a merge index encoding unit 10312, a vector candidate index encoding unit 10313, and a split mode encoding unit, a merge flag encoding unit, an inter prediction flag encoding unit, a reference picture index encoding unit, and a vector difference encoding unit, which are not shown.
  • the split mode encoding unit, the merge flag encoding unit, the merge index encoding unit, the inter prediction flag encoding unit, the reference picture index encoding unit, the vector candidate index encoding unit 10313, and the vector difference encoding unit encode the split mode, the merge flag merge_flag, the merge index merge_idx, the inter prediction flag inter_pred_idc, the reference picture index refIdxLX, the prediction vector index mvp_LX_idx, and the difference vector mvdLX, respectively.
  • the additional prediction flag encoding unit 10311 encodes the illuminance compensation flag ic_enable_flag and the residual prediction flag iv_res_pred_weight_idx to indicate whether or not additional prediction is performed.
  • when the prediction mode predMode input from the prediction image generation unit 101 indicates the merge prediction mode, the merge index merge_idx is input from the encoding parameter determination unit 110 to the merge prediction parameter derivation unit 1121, and is also output to the prediction parameter integration unit 1126.
  • the merge prediction parameter deriving unit 1121 reads the reference picture index refIdxLX and vector mvLX of the reference block indicated by the merge index merge_idx from the merge candidates from the decoded picture management unit 109.
  • A merge candidate is a reference block within a predetermined range from the encoding target block (for example, a reference block in contact with the lower left, upper left, or upper right corner of the encoding target block) for which the encoding process has already been completed.
  • the AMVP prediction parameter derivation unit 1122 has the same configuration as the AMVP prediction parameter derivation unit 3032 (see FIG. 7).
  • the AMVP prediction parameter derivation unit 1122 receives the vector mvLX from the encoding parameter determination unit 110 when the prediction mode predMode input from the prediction image generation unit 101 indicates the inter prediction mode.
  • the AMVP prediction parameter derivation unit 1122 derives a prediction vector mvpLX based on the input vector mvLX.
  • The AMVP prediction parameter derivation unit 1122 outputs the derived prediction vector mvpLX to the subtraction unit 1123. The reference picture index refIdxLX and the vector index mvp_LX_idx are output to the prediction parameter integration unit 1126.
  • the subtraction unit 1123 subtracts the prediction vector mvpLX input from the AMVP prediction parameter derivation unit 1122 from the vector mvLX input from the coding parameter determination unit 110 to generate a difference vector mvdLX.
  • the difference vector mvdLX is output to the prediction parameter integration unit 1126.
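The subtraction described above can be sketched as follows. This is a minimal illustration, not the specification's implementation; the function name and the tuple representation of vectors are assumptions.

```python
def encode_amvp_vector(mvLX, mvpLX):
    """Illustrative sketch: the difference vector mvdLX is the component-wise
    difference between the motion vector mvLX and the prediction vector mvpLX."""
    return (mvLX[0] - mvpLX[0], mvLX[1] - mvpLX[1])
```

On the decoder side the operation is reversed: mvLX is reconstructed as mvpLX + mvdLX.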
  • When the prediction mode predMode input from the predicted image generation unit 101 indicates the merge prediction mode, the prediction parameter integration unit 1126 outputs the merge index merge_idx input from the encoding parameter determination unit 110 to the entropy encoding unit 104.
  • When the prediction mode predMode indicates the inter prediction mode, the prediction parameter integration unit 1126 performs the following process.
  • the prediction parameter integration unit 1126 integrates the reference picture index refIdxLX and the vector index mvp_LX_idx input from the encoding parameter determination unit 110 and the difference vector mvdLX input from the subtraction unit 1123.
  • the prediction parameter integration unit 1126 outputs the integrated code to the entropy encoding unit 104.
  • The second embodiment of the present invention will be described below with reference to the drawings.
  • Bitstream conformance is the constraint imposed on the bitstream.
  • the prediction parameter encoding unit 111 of the image encoding device 11 may encode an ARP on / off flag at the slice level.
  • the image encoding device 11 satisfies the following conditions (B1-1) and (B1-2) as bitstream conformance.
  • PicOrderCnt (Pic) is PicOrderCnt (arpRefPic), and ViewIdx is refViewIdx [xP] [yP].
  • the image decoding device 31 decodes the encoded data having the above bitstream conformance. By imposing restrictions on the ARP reference picture as a bit stream to be decoded by the image decoding device 31, it is possible to prevent the decoding process from failing because the ARP reference picture cannot be referred to.
  • The image decoding device 31 may be configured as follows. That is, the image decoding device 31 receives the encoded stream generated according to the bitstream conformance, and the inter prediction parameter decoding unit 303 decodes the ARP on/off flag encoded at the slice level (slice header/segment header). In addition, the residual prediction unit 3092 executes residual prediction processing according to the decoded ARP on/off flag. By encoding the ARP on/off flag in units of slice headers, ARP can be turned off when the bitstream conformance cannot be observed in the reference picture list configuration of the reference layer.
  • the prediction parameter encoding unit 111 of the image encoding device 11 may encode the ARP on / off flag at the slice level.
  • the image encoding device 11 satisfies the following (B2-1) and (B2-2) as bitstream conformance.
  • PicOrderCnt (Pic) is PicOrderCnt (arpRefPic) and ViewIdx is refViewIdx [xP] [yP].
  • the image decoding device 31 decodes the encoded data having the above bitstream conformance. By imposing restrictions on the ARP reference picture as a bit stream to be decoded by the image decoding device 31, it is possible to prevent the decoding process from failing because the ARP reference picture cannot be referred to.
  • the image decoding device 31 may be configured as follows. That is, the image decoding device 31 receives the encoded stream generated according to the bit stream conformance, and the inter prediction parameter decoding unit 303 decodes the ARP on / off flag encoded at the slice level. In addition, the residual prediction unit 3092 executes a residual prediction process according to the decoded ARP on / off flag.
  • a RefPicListX [0] of a predetermined slice among slices that are not I slices of the target picture can be used.
  • RefPicListX [0] of the last slice can be used.
  • not only the 0th picture in the reference picture list but also a reference picture at a predetermined position may be targeted.
  • (Option B3) It is determined whether or not a reference picture exists in the DPB.
  • the prediction parameter encoding unit 111 of the image encoding device 11 encodes an ARP on / off flag at the slice level.
  • the image encoding device 11 satisfies the following (B3-1) and (B3-2) as bitstream conformance.
  • the image decoding device 31 decodes the encoded data having the above bitstream conformance. By imposing restrictions on the ARP reference picture as a bit stream to be decoded by the image decoding device 31, it is possible to prevent the decoding process from failing because the ARP reference picture cannot be referred to.
  • the image decoding device 31 may be configured as follows. That is, the image decoding device 31 receives the encoded stream generated according to the bit stream conformance, and the inter prediction parameter decoding unit 303 decodes the ARP on / off flag encoded at the slice level. In addition, the residual prediction unit 3092 executes a residual prediction process according to the decoded ARP on / off flag.
  • The ARP reference picture arpRefPic is the reference picture RefPicListX[arpRefIdxLX] at index arpRefIdxLX of the reference picture list RefPicListX.
  • the reference picture list differs depending on the picture.
  • It is preferable to introduce the following bitstream conformance, as (Option X1) or (Option X1'), so that the reference picture used for ARP (arpRefPic and the corresponding curIvRefPic) is the same among the slices included in the target picture.
  • The reference picture arpRefPicL1, referenced when performing ARP using the L1 reference picture, is unified so as to be the same for all slices whose slice type (slice_type) is "P".
  • the image decoding device 31 decodes the encoded data having the above bitstream conformance.
  • An increase in the amount of data transfer caused by the ARP reference picture changing within a picture (e.g., cache misses) can be prevented.
  • the image decoding device 31 may be configured as follows. That is, the image decoding apparatus 31 receives the encoded stream generated according to the bit stream conformance, and the inter prediction parameter decoding unit 303 decodes the ARP flag encoded at the slice level. Further, the residual prediction unit 3092 executes a residual prediction process according to the decoded ARP flag.
  • RefPicList0[arpRefIdxL0] is unified for all slices whose slice type (slice_type) is "P".
  • the image decoding device 31 decodes the encoded data having the above bitstream conformance.
  • An increase in the amount of data transfer caused by the ARP reference picture changing within a picture (e.g., cache misses) can be prevented.
  • the image decoding device 31 may be configured as follows. That is, the image decoding apparatus 31 receives the encoded stream generated according to the bit stream conformance, and the inter prediction parameter decoding unit 303 decodes the ARP flag encoded at the slice level. Further, the residual prediction unit 3092 executes a residual prediction process according to the decoded ARP flag.
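The slice-uniformity constraint above might be checked as in the following sketch. The representation of a slice as a (slice_type, ARP reference picture POC) pair is an assumption made for illustration; it is not the specification's data model.

```python
def check_arp_conformance_across_slices(slices):
    """Sketch of the Option X1-style constraint: every non-I slice of a
    picture must use the same ARP reference picture (identified here,
    hypothetically, by its POC). `slices` is a list of
    (slice_type, arp_ref_poc) pairs."""
    pocs = [poc for slice_type, poc in slices if slice_type != "I"]
    # Conformant when all non-I slices agree on the ARP reference picture.
    return len(set(pocs)) <= 1
```

An encoder could run such a check when building each picture's slice headers, and fall back to signalling ARP off for the picture when it fails.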
  • the ARP reference picture arpRefPic is limited to RefPicListX [0] of the first slice whose slice type (slice_type) is not “I” in the picture.
  • The reference image acquisition unit 30922 may acquire the ARP reference picture arpRefPic from RefPicListX[0] of the first slice whose slice type (slice_type) is not "I" in the picture.
  • the present invention is not limited to the above, and a predetermined slice among slices other than the I slice of the target picture can be used.
  • RefPicListX [0] of the last slice between slices included in the current picture may be targeted.
  • A residual is predicted using a reference picture whose picture order (POC) differs from the target picture order, which is the POC of the target picture.
  • this reference picture order is derived as the POC of a specific reference picture in the reference picture list (for example, the POC of the first reference picture RefPicListX [0] of the reference picture list RefPicListX [])
  • However, the reference picture order may be equal to the picture order of the target picture; in this case, ARP cannot be used.
  • A method for identifying the reference picture to be used for ARP is described below, in order to prevent an inter-layer picture having the same POC as the target picture from being selected as the reference picture, which should have a POC different from the target picture.
  • (Option Y1) Bitstream conformance regarding the reference picture used for ARP will be described.
  • The ARP reference picture arpRefPic is the reference picture RefPicListX[arpRefIdxLX] at index arpRefIdxLX of the reference picture list RefPicListX.
  • The image encoding device 11 ensures that the following (Y1-1) and (Y1-2) are satisfied.
  • DiffPicOrderCnt (RefPicList0 [arpRefIdxL0], currPic) is not 0 and the slice type (slice_type) is not I.
  • DiffPicOrderCnt (RefPicList1 [arpRefIdxL1], currPic) is not 0 and the slice type (slice_type) is B.
  • the image decoding device 31 receives the encoded stream generated according to the bit stream conformance and executes a residual prediction process.
  • the image decoding device 31 decodes the encoded data having the above bitstream conformance. By imposing restrictions on the ARP reference picture as a bit stream to be decoded by the image decoding device 31, it is possible to prevent the decoding process from failing because the ARP reference picture cannot be referred to.
  • any one of the inter prediction parameter decoding unit 303 and the entropy decoding unit 301 may include a reference image determination unit 3095.
  • The reference image determination unit 3095 may derive the ARP reference picture use flag arpRefPicAvailable at the parsing stage (CU level). Further, as shown in SYN21 and SYN22 in FIG. 15, at the parsing stage, that is, in either the inter prediction parameter decoding unit 303 or the entropy decoding unit 301, the reference image determination unit 3095 may decode the residual prediction flag (ARP flag) iv_res_pred_weight_idx according to whether or not the ARP reference picture can be used (the ARP reference picture use flag arpRefPicAvailable).
  • iv_res_pred_flag [nuh_layer_id] is not 0
  • arpRefPicAvailable is not 0
  • the reference image determination unit 3095 derives arpRefPicAvailable according to the following equation (Y2-1) in the parsing stage.
  • true (1) is set for arpRefPicAvailable when the following (Y2.1) and (Y2.2) are satisfied.
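The parsing-stage gating described above can be sketched as follows, under a simplified model in which reference pictures are represented only by their POC values. All function names and the exact shape of (Y2-1) are assumptions for illustration; only the two conditions (Y2.1) and (Y2.2) come from the text.

```python
def diff_pic_order_cnt(poc_a, poc_b):
    """DiffPicOrderCnt(picA, picB): POC of picA minus POC of picB."""
    return poc_a - poc_b

def derive_arp_ref_pic_available(ref_list_poc, arp_ref_idx, curr_poc):
    """Hypothetical rendering of (Y2-1): the ARP reference picture is
    available when its POC differs from the current picture's POC."""
    return diff_pic_order_cnt(ref_list_poc[arp_ref_idx], curr_poc) != 0

def should_decode_arp_flag(iv_res_pred_flag_layer, arp_ref_pic_available):
    """iv_res_pred_weight_idx is parsed only when both conditions hold:
    (Y2.1) iv_res_pred_flag[nuh_layer_id] is not 0, and
    (Y2.2) arpRefPicAvailable is not 0."""
    return iv_res_pred_flag_layer != 0 and arp_ref_pic_available
```

When the flag is not parsed, the decoder infers that residual prediction is off, which is what saves the code amount mentioned later in the text.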
  • the inter prediction image generation unit 309 performs determination in the motion compensation stage.
  • the picture (RefPicListX [arpRefIdxLX]) of the index arpRefIdxLX of the reference picture list is referred to as an ARP reference picture arpRefPic.
  • The reference image determination unit 3095 sets a reference layer reference picture use flag (refIvRefPicAvailable2) for the reference picture arpRefPic, depending on whether the picture order of arpRefPic (PicOrderCnt(arpRefPic)) is equal to the picture order PicOrderCntVal of the target picture currPic. Specifically, if DiffPicOrderCnt(arpRefPic, currPic) is not 0, refIvRefPicAvailable2 is set to 1; otherwise, refIvRefPicAvailable2 is set to 0.
  • Since arpRefPic is derived from the element RefPicListX[arpRefIdxLX] at reference index arpRefIdxLX of the reference picture list RefPicListX[], it is derived by the following equation (Y3-2).
  • the residual prediction implementation flag deriving unit 30921 derives resPredFlag according to the following equation (C3-1) so that the implementation flag resPredFlag is non-zero only when the derived refIvRefPicAvailable2 is non-zero.
  • The residual prediction execution flag deriving unit 30921 may derive the execution flag resPredFlag also using the refIvRefPicAvailable derived in Embodiment 1. Specifically, the following formula, combining equation (R-1) of Embodiment 1 and equation (Y3-1), may be used.
  • That is, whether to perform residual prediction on the motion compensated image (resPredFlag) is determined according to whether the POC of the ARP reference picture arpRefPic is equal to the POC of the target picture currPic.
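The combined derivation above might look like the following sketch, treating the flags as plain booleans/integers. The function name and the exact way the Embodiment-1 flag is folded in are assumptions; the text only states that resPredFlag is non-zero when the ARP flag is on and the availability checks pass.

```python
def derive_res_pred_flag(iv_res_pred_weight_idx,
                         ref_iv_ref_pic_available,
                         ref_iv_ref_pic_available2):
    """Hypothetical combination of (R-1) and (Y3-1): residual prediction
    runs only when the decoded ARP weight index is non-zero and both the
    Embodiment-1 availability flag and refIvRefPicAvailable2 are set."""
    return (iv_res_pred_weight_idx != 0
            and ref_iv_ref_pic_available
            and ref_iv_ref_pic_available2)
```

With this gating, a stream that signals ARP against a same-POC reference simply skips the residual synthesis instead of performing an invalid operation.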
  • either the inter prediction parameter decoding unit 303 or the entropy decoding unit 301 is configured to include the reference image determination unit 3095.
  • (Option Y4) The determination at the parsing stage in the inter prediction parameter decoding unit 303 or the entropy decoding unit 301 and the determination at the motion compensation stage in the inter prediction image generation unit 309 are used together.
  • The reference image determination unit 3095 may derive the arpRefPicAvailable shown in FIG. 15 according to the above equation (Y2-1) and its modification.
  • the inter prediction parameter decoding unit 303 and the entropy decoding unit 301 may decode the residual prediction flag (ARP flag) iv_res_pred_weight_idx according to the ARP reference picture use flag arpRefPicAvailable flag. That is, as shown in FIG. 16, iv_res_pred_weight_idx may be decoded only when the arpRefPicAvailable flag is true.
  • the reference image determination unit 3095 is modified to perform the determination as follows.
  • the reference image determination unit 3095 derives refIvRefPicAvailable2 in accordance with the above equation (Y3-1) and its modifications.
  • the residual prediction execution flag deriving unit 30921 derives resPredFlag according to the equation (Y3-1) described in the third embodiment and its modification.
  • Since iv_res_pred_weight_idx, which becomes a useless flag when the POC of the ARP reference picture arpRefPic is equal to the POC of the target picture currPic, is not decoded, the code amount can be reduced.
  • The reference image determination unit 3095 of this embodiment also derives the ARP reference picture arpRefPic.
  • arpRefIdxL0 and arpRefIdxL1 are set to a negative value (here, -1), which is invalid as a reference picture index, and arpRefPicAvailable is set to 0, indicating false.
  • In the initial state, the reference picture index arpRefIdxLX is given a negative value (here, -1), and the above search is performed. When a reference picture that satisfies the condition is found, arpRefIdxLX is set to a value of 0 or more. The search ends when such a reference picture is found, that is, when arpRefIdxLX becomes 0 or more.
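The search loop described above can be sketched as follows. Representing the reference picture list as a list of POC values, and treating "a picture satisfying the condition" as "a picture whose POC differs from the target picture's POC" (i.e., not an inter-layer picture), are illustrative assumptions.

```python
def search_arp_ref_idx(ref_list_poc, curr_poc):
    """Sketch of the Option Y5 search: return the first index whose
    reference picture has a POC different from the target picture's POC,
    or -1 (the invalid initial value) if no such picture exists."""
    arp_ref_idx = -1                  # initial state: invalid index
    for idx, poc in enumerate(ref_list_poc):
        if poc != curr_poc:           # skip inter-layer (same-POC) pictures
            arp_ref_idx = idx
            break                     # search ends once arpRefIdxLX >= 0
    return arp_ref_idx
```

Running this per reference list yields arpRefIdxL0 and arpRefIdxL1, from which the availability flag in (Y5-1) follows.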
  • Option Y5 may also perform determination at the parsing stage in the inter prediction parameter decoding unit 303 and the entropy decoding unit 301, similarly to the option Y2 and the option Y4.
  • either the inter prediction parameter decoding unit 303 or the entropy decoding unit 301 is configured to include the reference image determination unit 3095.
  • As described above, the reference image determination unit 3095 can derive the arpRefPicAvailable shown in FIG. 15 according to the above equation (Y2-1) and its modification (Y2-1'), but it can also be derived using the following equation (Y5-1).
  • arpRefPicAvailable = (arpRefIdxL0 >= 0) || (arpRefIdxL1 >= 0)  (Y5-1)
  • That is, when arpRefIdxL0, the L0 reference picture index, has a valid value (0 or more), or arpRefIdxL1, the L1 reference picture index, has a valid value (0 or more), the reference image determination unit 3095 sets arpRefPicAvailable to true (a value other than 0). In other words, when a picture other than an inter-layer picture is found in either the L0 or the L1 reference picture list, arpRefPicAvailable is set to true (non-zero).
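The availability flag just described reduces to a one-line check over the two indices. This is a minimal sketch; the function name is illustrative.

```python
def arp_ref_pic_available(arp_ref_idx_l0, arp_ref_idx_l1):
    """Sketch of equation (Y5-1): true when either reference picture list
    yielded a valid (non-negative) ARP reference index."""
    return arp_ref_idx_l0 >= 0 or arp_ref_idx_l1 >= 0
```

When this returns false, the parsing stage can skip decoding iv_res_pred_weight_idx, as in Option Y2.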
  • the option Y5 can also make a determination at the motion compensation stage in the inter-predicted image generation unit 309, like the options Y3 and Y4.
  • As described above, the reference image determination unit 3095 can derive refIvRefPicAvailable2 according to the above equation (Y3-1) and its modification, but it can also be derived using the following equation (Y5-2).
  • the residual prediction execution flag deriving unit 30921 derives resPredFlag according to the equation (C3-1) described in the option Y3 and its modification.
  • Since arpRefIdxLX is selected so that an arpRefPic having a POC different from PicOrderCntVal, the POC of the target picture, is chosen, the problem that ARP does not operate is eliminated.
  • As in Option Y2, since iv_res_pred_weight_idx, which is a useless flag when the ARP reference picture arpRefPic has the same POC as the target picture currPic, is not decoded, the code amount can be reduced.
  • ARP is performed as in Option Y3. When this is not possible, that is, when the ARP reference picture arpRefPic has the same POC as the target picture currPic, the ARP operation is not performed, so that an invalid operation is avoided.
  • The reference picture is determined so as to set an arpRefPic having a POC different from PicOrderCntVal, the POC of the target picture.
  • The reference image determination unit 3095 may derive the ARP reference picture arpRefPic from the reference pictures included in the reference picture list RefPicListX according to the following pseudo code.
  • First, minDiffPOC, which indicates the provisional minimum value of the POC difference, is set to a sufficiently large predetermined value (here, 2^16), and arpRefPicAvailable, the flag indicating availability of the ARP reference picture, is set to 0 (false).
  • DiffPicOrderCnt (RefPicListX [rIdx], currPic) (C1-1)
  • DiffPicOrderCnt(PicA, PicB) is a function that returns the value obtained by subtracting the POC of picture PicB from the POC of picture PicA (the same applies to the following configurations).
  • DiffPicOrderCnt (picA, currPic) may be POC (picA) -PicOrderCntVal.
  • Conditional expression (C1-1) is "true" when there is a reference picture having the same POC as the reference picture stored in RefPicListX[rIdx], and the absolute value of the difference between the POC of the rIdx-th reference picture and the POC of the target picture currPic (PicOrderCntVal) is less than minDiffPOC.
  • In that case, minDiffPOC is updated to abs(DiffPicOrderCnt(RefPicListX[rIdx], currPic)), and rIdx becomes the tentative ARP reference picture candidate (rIdxSel).
  • arpRefPicAvailable is set to 1.
  • As a result, the index of the reference picture whose POC is closest to the POC of the target picture is derived as the index indicating the ARP reference picture (rIdxSel).
  • For the rIdxSel obtained in this way, the reference image determination unit 3095 derives the ARP reference picture arpRefPic as the reference picture RefPicListX[rIdxSel] at index rIdxSel of the reference picture list RefPicListX[].
  • Since arpRefIdxLX is selected so that an arpRefPic having a POC different from PicOrderCntVal, the POC of the target picture, is set, the problem that ARP does not operate because the POC of arpRefPic equals the POC PicOrderCntVal of the target picture is eliminated.
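The closest-POC selection described by the pseudo code above might be rendered as the following sketch. Representing the reference list by POC values and skipping same-POC (inter-layer) entries are assumptions consistent with the text; the first condition of (C1-1), the existence of a corresponding reference-layer picture, is omitted here for brevity.

```python
MAX_DIFF = 1 << 16  # sufficiently large initial value (2^16 in the text)

def select_arp_ref_pic(ref_list_poc, curr_poc):
    """Sketch: among reference pictures whose POC differs from the target
    picture's POC, pick the one with the minimal absolute POC difference.
    Returns (rIdxSel, arpRefPicAvailable)."""
    min_diff_poc = MAX_DIFF
    r_idx_sel = -1
    available = False
    for r_idx, poc in enumerate(ref_list_poc):
        diff = abs(poc - curr_poc)     # abs(DiffPicOrderCnt(RefPicListX[rIdx], currPic))
        if diff != 0 and diff < min_diff_poc:   # (C1-1), simplified
            min_diff_poc = diff                 # update provisional minimum
            r_idx_sel = r_idx                   # tentative candidate rIdxSel
            available = True                    # arpRefPicAvailable = 1
    return r_idx_sel, available
```

arpRefPic is then RefPicListX[rIdxSel] whenever the returned availability flag is true.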
  • The determination at the parsing stage may be performed in the same manner as in Option Y5.
  • By using the reference image determination unit 3095 and the entropy decoding unit 301 (inter prediction parameter decoding unit 303) of Option Y5, as in Option Y2, iv_res_pred_weight_idx, which is a useless flag when the ARP reference picture arpRefPic has the same POC as the target picture currPic, is not decoded, so the code amount can be reduced.
  • The determination at the motion compensation stage may be performed in the same manner as in Option Y5.
  • By using the reference image determination unit 3095 and the residual prediction execution flag deriving unit 30921 (inter prediction image generation unit 309) of Option Y5, as in Option Y3, the ARP operation is not performed when the ARP reference picture arpRefPic has the same POC as the target picture currPic, so an invalid operation is avoided.
  • the program for realizing the control function may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read by a computer system and executed.
  • the “computer system” here is a computer system built in the image encoding device 11 and the image decoding device 31, and includes an OS and hardware such as peripheral devices.
  • The "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk incorporated in a computer system.
  • Furthermore, the "computer-readable recording medium" may include a medium that dynamically holds the program for a short time, such as a communication line used when the program is transmitted via a network such as the Internet or a communication line such as a telephone line, and a medium that holds the program for a certain period of time, such as a volatile memory inside a computer system serving as a server or a client in that case.
  • The program may realize only a part of the functions described above, or may realize the functions described above in combination with a program already recorded in the computer system.
  • part or all of the image encoding device 11 and the image decoding device 31 in the above-described embodiment may be realized as an integrated circuit such as an LSI (Large Scale Integration).
  • Each functional block of the image encoding device 11 and the image decoding device 31 may be individually made into a processor, or a part or all of them may be integrated into a processor.
  • the method of circuit integration is not limited to LSI, and may be realized by a dedicated circuit or a general-purpose processor. Further, in the case where an integrated circuit technology that replaces LSI appears due to progress in semiconductor technology, an integrated circuit based on the technology may be used.
  • An image decoding device according to one aspect of the present invention is an image decoding device that generates a predicted image of a target picture by applying, to a motion compensated image, residual prediction using a reference layer different from that of the target picture. It includes: a reference picture determination unit (reference image determination unit 3095) that determines whether or not a reference picture of the reference layer having a picture order different from the picture order of the target picture is available; and a residual prediction application unit (residual synthesis unit 30923) that, at least according to the determination result, applies to the motion compensated image the residual prediction based on that reference picture of the reference layer and the decoded picture of the reference layer having the same picture order as the target picture.
  • the target picture is a picture to be decoded.
  • the reference layer is a layer different from the target layer to which the target picture belongs. When performing residual prediction in the reference layer, reference pictures belonging to the reference layer are referred to.
  • the residual prediction is a technique for estimating the residual in the reference layer as the residual in the target layer.
  • the residual prediction is performed based on the reference picture of the reference layer having a picture order different from the picture order of the target picture and the decoded picture of the reference layer having the same picture order as the picture order of the target picture.
  • the residual prediction can be applied to the motion compensated image depending on whether or not a reference picture of the reference layer is available.
  • The reference picture determination unit may perform the above determination depending on whether or not the reference-layer picture exists in a reference picture set indicating the pictures to be referred to when decoding the decoded picture of the reference layer.
  • the reference picture set is derived for each picture. Therefore, it is commonly used in decoding of slices included in the target picture.
  • The reference picture determination unit may perform the above determination depending on whether or not the reference-layer picture exists at a predetermined position in the reference picture list of a predetermined slice of the target picture that is not an I slice.
  • The determination process can be shared.
  • The reference picture determination unit may perform the above determination according to whether or not the reference picture of the reference layer is stored in the DPB (Decoded Picture Buffer).
  • The reference picture determination unit may perform the above determination depending on whether or not the reference mark, in the DPB (Decoded Picture Buffer), of the reference picture of the reference layer is "used for reference".
  • Some decoders do not support DPB processing as defined by HRD (Hypothetical Reference Decoder), so the DPB state may be inaccurate. Therefore, by checking the reference mark in the DPB (Decoded Picture Buffer) of the reference picture of the reference layer, it is possible to more accurately determine whether or not the reference picture of the reference layer is usable.
  • When the reference mark is "used for reference", it may be determined that the reference picture of the reference layer exists.
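The DPB reference-mark check above can be sketched as follows. Modelling a DPB entry as a (POC, layer id, marking string) tuple is a hypothetical simplification, not the specification's DPB structure.

```python
def ref_layer_pic_available_in_dpb(dpb, target_poc, ref_layer_id):
    """Sketch: the reference-layer picture is considered usable when a DPB
    entry with the matching POC and layer id is marked 'used for reference'."""
    return any(poc == target_poc
               and layer == ref_layer_id
               and marking == "used for reference"
               for poc, layer, marking in dpb)
```

Checking the marking directly, rather than trusting HRD-conformant DPB emptying, is what makes the determination robust on decoders whose DPB state may be inaccurate.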
  • the image decoding device is an image decoding device that generates a predicted image of a target picture by applying residual prediction using a reference layer different from the target picture to the motion compensated image.
  • The image decoding device includes: a reference picture acquisition unit (reference image acquisition unit 30922) that acquires a reference picture, belonging to the target layer to which the target picture belongs and having a picture order different from the picture order of the target picture, from a predetermined position in the reference picture list of a predetermined slice of the target picture that is not an I slice; and a residual prediction application unit (residual synthesis unit 30923) that applies, to the motion compensated image, the residual prediction based on the reference picture of the reference layer having the same picture order as the acquired reference picture and the decoded picture of the reference layer having the same picture order as the target picture.
  • A reference picture that belongs to the target layer to which the target picture belongs and has a picture order different from the picture order of the target picture is obtained from a predetermined position in the reference picture list of a predetermined slice of the target picture that is not an I slice.
  • The predetermined non-I slice of the target picture is preferably the first slice of the target picture that is not an I slice.
  • the predetermined position of the reference picture list may be the 0th position (first position) of the reference picture list. If the slices included in the target picture are common, the slice number and the position of the reference picture list are arbitrary.
  • the determination of the ARP reference picture can be made common among the slices included in the target picture.
  • the image decoding device is an image decoding device that generates a predicted image of a target picture by applying residual prediction using a reference layer different from the target picture to the motion compensated image.
  • The image decoding device includes: a flag decoding unit (inter prediction parameter decoding unit 303) that decodes a residual prediction execution flag instructing execution of residual prediction; a receiving unit that receives a bitstream in which a reference picture of the reference layer having a picture order different from the picture order of the target picture is usable; and a residual prediction execution unit (residual prediction unit 3092) that executes the residual prediction according to the residual prediction execution flag.
  • An image encoding device is an image encoding device that generates a predicted image of a target picture by applying residual prediction using a reference layer different from the target picture to the motion compensated image.
  • The image encoding device includes: a flag encoding unit (prediction parameter encoding unit 111) that encodes a residual prediction execution flag instructing execution of the residual prediction; a bitstream generation unit (prediction parameter encoding unit 111) that generates a bitstream in which the reference picture of the reference layer having a picture order different from the picture order of the target picture is usable; and a bitstream transmission unit (entropy encoding unit 104) that transmits the generated bitstream to the image decoding device.
  • the image decoding device according to aspect 7 or the image encoding device according to aspect 8 includes feature points corresponding to the image decoding device according to aspect 1. Therefore, the image decoding device according to aspect 7 or the image encoding device according to aspect 8 can achieve the same effects as those of the image decoding device according to aspect 1.
  • the image decoding device is an image decoding device that generates a prediction image of a target picture by applying residual prediction using a reference layer of a layer different from the target picture to the motion compensated image.
  • The image decoding device includes: a reference picture selection unit that selects a reference picture that belongs to the target layer to which the target picture belongs and has a picture order different from the picture order of the target picture; and a residual prediction application unit that applies, to the motion compensated image, the residual prediction based on the reference picture of the reference layer having the same picture order as the selected reference picture and the decoded picture of the reference layer having the same picture order as the target picture.
  • The POC of arpRefpic is different from the POC of the current picture. That is, it can be guaranteed that the picture order of the reference picture of the reference layer having the same picture order as the selected reference picture differs from the picture order of the decoded picture of the reference layer having the same picture order as the target picture.
  • the present invention can be suitably applied to an image decoding apparatus that decodes encoded data obtained by encoding image data and an image encoding apparatus that generates encoded data obtained by encoding image data. Further, the present invention can be suitably applied to the data structure of encoded data generated by an image encoding device and referenced by the image decoding device.
  • Displacement compensation unit, 3092 ... Residual prediction unit (residual prediction execution unit), 30921 ... Residual prediction execution flag derivation unit, 30922 ... Reference image acquisition unit (reference picture acquisition unit), 30923 ... Residual synthesis unit (residual prediction application unit), 3093 ... Illuminance compensation unit, 3094 ... Weight prediction unit, 3095 ... Reference image determination unit (reference picture determination unit), 310 ... Intra predicted image generation unit, 311 ... Inverse quantization / inverse DCT unit, 312 ... Adder, 313 ... Residual storage unit, 41 ... Image display device

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present invention prevents a situation in which no reference picture of a reference layer exists during advanced residual prediction (ARP) processing. This image decoding device, which performs residual prediction, comprises: a reference picture determination unit (3095) that determines whether a residual prediction reference picture can be used; and a residual prediction unit (3092) that applies residual prediction to a motion-compensated image if the reference picture of the reference layer can be used. The reference picture determination unit performs the determination according to whether the reference picture of the residual prediction reference layer is stored in a decoded picture buffer (DPB).

Description

Image decoding device
The present invention relates to an image decoding apparatus that performs residual prediction between a target layer and a reference layer.
In recent years, HEVC (High-Efficiency Video Coding) has become known as one of the video coding schemes for encoding, transmitting, and storing moving images (Non-Patent Document 1). A technique is also known that generates encoded data from a plurality of mutually related moving images by dividing them into layers (hierarchies) and encoding them; such a technique is called hierarchical coding (or scalable coding).
MV-HEVC (Multi View HEVC), which is based on HEVC, has been proposed as one such hierarchical coding technology (Non-Patent Document 2).
MV-HEVC supports view scalability. In view scalability, moving images corresponding to a plurality of different viewpoints (views) are divided into layers and encoded to generate hierarchically encoded data. For example, a moving image corresponding to the basic viewpoint (base view) is encoded as a lower layer. Next, moving images corresponding to different viewpoints are encoded as upper layers after applying inter-layer prediction.
Furthermore, a 3D video technology called 3D-HEVC (3D High Efficiency Video Coding), based on HEVC and MV-HEVC, is under study.
In 3D-HEVC, a prediction technique called ARP (Advanced Residual Prediction) has also been proposed. In ARP, pictures already decoded at a different viewpoint are used to estimate the residual of the target picture. More specifically, in ARP, the residual of the target picture is estimated from the difference between the decoded image of the reference layer at a time other than the target time (refIvRefPic) and the decoded image of the reference layer at the target time (currIvRefPic).
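The residual estimation described above can be sketched in a few lines; the following is an illustrative Python fragment, not the normative 3D-HEVC derivation, and the plain list-of-lists block representation and the `weight` parameter are assumptions made only for this example:

```python
# Hedged sketch of ARP residual estimation: the residual of the target
# picture is estimated as the difference between the corresponding block
# of the reference layer at the target time (currIvRefPic) and the block
# of the reference layer at the reference time (refIvRefPic), scaled by
# an assumed weighting factor, and added to the motion-compensated block.

def arp_predict(motion_comp_block, curr_iv_ref_block, ref_iv_ref_block, weight):
    """Apply the estimated inter-layer residual to a motion-compensated block.

    All blocks are lists of lists of equal size; `weight` is an assumed
    ARP weighting factor (e.g. 0, 0.5 or 1).
    """
    h = len(motion_comp_block)
    w = len(motion_comp_block[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            est_residual = curr_iv_ref_block[y][x] - ref_iv_ref_block[y][x]
            out[y][x] = motion_comp_block[y][x] + int(weight * est_residual)
    return out
```

When `weight` is 0 the sketch degenerates to plain motion compensation, which mirrors the role of the weighting index in disabling residual prediction.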
However, the conventional technology described above has the problem that refIvRefPic may have already been discarded from the DPB (Decoded Picture Buffer) and therefore no longer exist.
This is explained concretely with reference to FIG. 24. In FIG. 24, two views are encoded. Pictures P101, P102, and P103 are pictures in the target view (layer), and pictures P201, P202, and P203 are pictures of the reference view (RefViewIdx). The number assigned to each picture indicates the decoding order.
Here, when decoding the picture P103, it is assumed that the picture P103 is predicted from the picture P101.
In this case, ARP derives a residual from the picture corresponding to the current POC, i.e., the picture P203 specified by currIvRefPic, and the picture located first in RefPicListX, i.e., the picture P202 specified by refIvRefPic.
However, in the example shown in FIG. 24, the picture P202 specified by refIvRefPic is not referenced from the picture P203 specified by currIvRefPic; if it is deleted from the DPB before the decoding of the picture P103, the picture P202 specified by refIvRefPic cannot be used. Furthermore, although the target picture is decoded while referring to the pictures stored in the reference picture list, the reference picture list changes in units of slices, so there is also the problem that the position of the picture P102 (arpRefpic) in the reference picture list may change from slice to slice. In addition, the POC of the picture P102 to be referred to may coincide with the POC of the target picture P103 being decoded.
The present invention has been made in view of the above problems, and its object is to realize an image decoding apparatus or the like that can avoid, during residual prediction, the situations in which no reference picture of the reference layer exists, in which the location of the reference picture in the target layer changes between slices, or in which the picture order of the target picture being decoded coincides with the picture order of the picture in the target layer to be referred to.
An image decoding apparatus according to one aspect of the present invention includes a reference picture determination unit that determines whether a residual prediction reference picture is usable, and a residual prediction application unit that performs residual prediction using the residual prediction reference picture, wherein the reference picture determination unit performs the determination according to whether the reference picture of the residual prediction reference layer is stored in a DPB (Decoded Picture Buffer).
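A minimal sketch of such a DPB-based availability check follows; the `Picture` record and its `layer_id`/`poc` fields are assumptions made for illustration, and a normative check would operate on the decoder's actual DPB state:

```python
from collections import namedtuple

# Hypothetical DPB entry for illustration: a picture is identified here
# by its layer and its picture order count (POC).
Picture = namedtuple("Picture", ["layer_id", "poc"])

def arp_ref_pic_available(dpb, ref_layer_id, ref_poc):
    """Determine whether the residual-prediction reference picture of the
    reference layer is still stored in the DPB (Decoded Picture Buffer)."""
    return any(p.layer_id == ref_layer_id and p.poc == ref_poc for p in dpb)
```

Residual prediction would then be applied only when this check succeeds, avoiding the failure case of FIG. 24 where refIvRefPic has already been evicted.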
An image decoding apparatus according to another aspect of the present invention includes a reference picture derivation unit that derives a reference picture for residual prediction, and a residual prediction application unit that performs residual prediction using the residual prediction reference picture, wherein the reference picture derivation unit derives, as the residual prediction reference picture, a reference picture having a picture order different from the picture order of the target picture from among the reference pictures included in the reference picture list.
An image decoding apparatus according to another aspect of the present invention includes a reference picture selection unit that derives a reference picture for residual prediction, and a residual prediction application unit that performs residual prediction using the residual prediction reference picture when it is available, wherein the reference picture list is scanned in order from the head, and when the absolute value of the difference between the POC of a reference picture RefPicListX[i] and the POC (PicOrderCntVal) of the target picture currPic is smaller than the POC difference held so far, that reference picture is set as the residual prediction reference picture.
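The scan described in this aspect can be sketched as follows. This is an illustrative fragment, not the normative derivation: the `Ref` record with a `poc` field is assumed for the example, pictures whose POC equals that of the current picture are skipped, and the candidate with the smallest absolute POC difference found so far is kept:

```python
from collections import namedtuple

Ref = namedtuple("Ref", ["poc"])  # hypothetical reference-list entry

def derive_arp_ref_pic(ref_pic_list, curr_poc):
    """Scan RefPicListX in order from the head and keep the reference
    picture whose absolute POC difference from the current picture is
    smallest, while excluding pictures with the same POC as currPic."""
    best, best_delta = None, None
    for ref in ref_pic_list:
        delta = abs(ref.poc - curr_poc)
        if delta == 0:
            continue  # same picture order as the target picture: not usable
        if best_delta is None or delta < best_delta:
            best, best_delta = ref, delta
    return best
```

Because pictures with a zero POC difference are never selected, the sketch also illustrates the guarantee that the POC of arpRefpic differs from the POC of the current picture.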
An image decoding apparatus according to another aspect of the present invention includes a reference picture derivation unit that derives, as the residual prediction reference picture, a reference picture having a picture order different from the picture order of the target picture from among the reference pictures included in a reference picture list, and an inter prediction parameter decoding control unit that decodes the residual prediction flag iv_res_pred_weight_idx when the residual prediction reference picture is available.
According to the image decoding apparatus of one aspect of the present invention, it is possible to avoid the situation in which the decoded picture of the reference layer cannot be used during residual prediction for the target picture.
Further, according to the image decoding apparatus of another aspect of the present invention, it can be guaranteed that the POC of arpRefpic differs from the POC of the current picture.
  • A functional block diagram showing the configuration of a residual prediction unit included in an image decoding apparatus according to an embodiment of the present invention.
  • A schematic diagram showing the configuration of the image transmission system according to the present embodiment.
  • A diagram showing the hierarchical structure of data in the encoded stream according to the present embodiment.
  • (a) is a conceptual diagram showing an example of a reference picture list; (b) is a conceptual diagram showing an example of vector candidates.
  • A conceptual diagram showing an example of reference pictures.
  • A schematic diagram showing the configuration of the image decoding apparatus according to the present embodiment.
  • A schematic diagram showing the configuration of the inter prediction parameter decoding unit according to the present embodiment.
  • A table showing the relationship between the merge index merge_idx and the number of residual prediction weights.
  • A functional block diagram illustrating the configuration of the decoded picture management unit included in the image decoding apparatus.
  • A schematic diagram showing the configuration of the inter predicted image generation unit according to the present embodiment.
  • (a) is a conceptual diagram (part 1) of residual prediction according to the present embodiment; (b) is a conceptual diagram (part 2) of residual prediction according to the present embodiment.
  • A block diagram showing the configuration of the image encoding apparatus according to the present embodiment.
  • A schematic diagram showing the configuration of the inter prediction parameter encoding unit according to the present embodiment.
  • Syntax showing an example of determining the arpRefPicAvailable flag at the CU level.
  • Syntax showing another example of determining the arpRefPicAvailable flag at the CU level.
  • A diagram showing the part of the syntax table used by the entropy decoding unit of the image decoding apparatus when decoding the SPS that relates to the reference picture set and the reference picture list.
  • (a) is a diagram showing the syntax table used by the entropy decoding unit of the image decoding apparatus when decoding short-term reference picture set information; (b) is a diagram showing the part of the syntax table used when decoding the slice header that relates to the reference picture set.
  • (a) is a diagram showing the part of the syntax table referred to when decoding the VPS extension (vps_extension) included in the VPS that corresponds to IL-RPS information; (b) is a diagram showing the part of the syntax table referred to at slice decoding that corresponds to IL-RPS information.
  • (a) is a diagram showing the relationship between the dependency type and the availability of each type of inter-layer prediction when the types of inter-layer prediction include inter-layer image prediction and inter-layer motion prediction; (b) is a diagram illustrating the relationship between the sub-RPSs (inter-layer pixel RPS and inter-layer motion-limited RPS) included in the inter-layer RPS generated in the image decoding apparatus and the dependency type.
  • A flowchart showing the derivation process of the sub-RPSs (inter-layer pixel RPS and inter-layer motion-limited RPS) included in the inter-layer RPS.
  • (a) is a diagram showing the part of the syntax table used by the entropy decoding unit of the image decoding apparatus when decoding the slice header that relates to the reference picture list; (b) is a diagram showing the syntax table used when decoding reference picture list modification information.
  • A schematic diagram showing the provisional L0 reference list and the provisional L1 reference list generated in the intermediate process of deriving the L0 and L1 reference lists in the RPL derivation unit of the image decoding apparatus.
  • A schematic diagram showing the configuration of the merge prediction parameter derivation unit included in the image decoding apparatus.
  • A diagram explaining the relationship between the pictures of the target layer and the reference layer in ARP.
[First Embodiment]
 Hereinafter, embodiments of the present invention will be described with reference to the drawings.
First, a schematic configuration of the image transmission system 1 according to the present embodiment will be described with reference to FIG. 2. FIG. 2 is a schematic diagram showing the configuration of the image transmission system 1 according to the present embodiment.
The image transmission system 1 is a system that transmits codes obtained by encoding a plurality of layer images and displays images obtained by decoding the transmitted codes. The image transmission system 1 includes an image encoding device 11, a network 21, an image decoding device 31, and an image display device 41.
A signal T indicating a plurality of layer images (also referred to as texture images) is input to the image encoding device 11. A layer image is an image that is viewed or captured at a certain resolution and from a certain viewpoint. When view scalable coding is performed, in which a three-dimensional image is encoded using a plurality of layer images, each of the layer images is referred to as a viewpoint image. Here, a viewpoint corresponds to the position of an imaging device or an observation point. For example, a plurality of viewpoint images are images captured by left and right imaging devices facing the subject. The image encoding device 11 encodes each of these signals to generate an encoded stream Te (encoded data). Details of the encoded stream Te will be described later. A viewpoint image is a two-dimensional image (planar image) observed from a certain viewpoint, and is represented by, for example, a luminance value or color signal value for each pixel arranged in a two-dimensional plane.
Hereinafter, a single viewpoint image, or a signal representing it, is referred to as a picture. When spatial scalable coding is performed using a plurality of layer images, the layer images consist of a low-resolution base layer image and high-resolution enhancement layer images. When SNR scalable coding is performed using a plurality of layer images, the layer images consist of a low-quality base layer image and high-quality enhancement layer images. Note that view scalable coding, spatial scalable coding, and SNR scalable coding may be combined arbitrarily. The present embodiment deals with encoding and decoding of images that include, as the plurality of layer images, at least a base layer image and images other than the base layer image (enhancement layer images). For two layers among the plurality of layers that have a reference relationship (dependency) in terms of images or coding parameters, the image on the referenced side is called the first layer image, and the image on the referencing side is called the second layer image. For example, when there is an enhancement layer image (other than the base layer) that is encoded with reference to the base layer, the base layer image is treated as the first layer image and the enhancement layer image as the second layer image. Examples of enhancement layer images include images of viewpoints other than the base view and depth images.
The network 21 transmits the encoded stream Te generated by the image encoding device 11 to the image decoding device 31. The network 21 can be configured by, for example, the Internet, a wide area network (WAN: Wide Area Network), a local area network (LAN: Local Area Network), or a combination thereof. The network 21 is not necessarily limited to a bidirectional communication network, and may be a unidirectional or bidirectional communication network that transmits broadcast waves such as terrestrial digital broadcasting or satellite broadcasting. The network 21 may also be replaced by a storage medium on which the encoded stream Te is recorded, such as a DVD (Digital Versatile Disc) or a BD (Blu-ray Disc).
The image decoding device 31 decodes each of the encoded streams Te transmitted over the network 21 and generates a plurality of decoded layer images Td (decoded viewpoint images Td).
The image display device 41 displays all or part of the plurality of decoded layer images Td generated by the image decoding device 31. For example, in view scalable coding, a three-dimensional image (stereoscopic image) or a free-viewpoint image is displayed when all of them are displayed, and a two-dimensional image is displayed when only part of them is displayed. The image display device 41 includes a display device such as a liquid crystal display or an organic EL (Electro-luminescence) display. In spatial scalable coding and SNR scalable coding, when the image decoding device 31 and the image display device 41 have high processing capability, high-quality enhancement layer images are displayed; when they have only lower processing capability, base layer images, which do not require processing and display capability as high as the enhancement layers, are displayed.
Prior to a detailed description of the image encoding device 11 and the image decoding device 31 according to the present embodiment, the data structure of the encoded stream Te generated by the image encoding device 11 and decoded by the image decoding device 31 will be described.
FIG. 3 is a diagram showing the hierarchical structure of data in the encoded stream Te. The encoded stream Te illustratively includes a sequence and a plurality of pictures constituting the sequence. (a) to (f) of FIG. 3 respectively show a sequence layer that defines a sequence SEQ, a picture layer that defines a picture PICT, a slice layer that defines a slice S, a slice data layer that defines slice data, a coding tree layer that defines the coding tree units included in the slice data, and a coding unit layer that defines the coding units (Coding Units; CUs) included in a coding tree.
 (Sequence layer) The sequence layer defines a set of data that the image decoding device 31 refers to in order to decode a sequence SEQ to be processed (hereinafter also referred to as a target sequence). As shown in (a) of FIG. 3, the sequence SEQ includes a video parameter set VPS (Video Parameter Set), a sequence parameter set SPS (Sequence Parameter Set), a picture parameter set PPS (Picture Parameter Set), pictures PICT, and supplemental enhancement information SEI (Supplemental Enhancement Information). Here, the value shown after # indicates the layer ID. FIG. 3 shows an example in which encoded data of #0 and #1, that is, layer 0 and layer 1, exist, but the types and number of layers do not depend on this.
The video parameter set VPS defines, for a moving image composed of a plurality of layers, a set of coding parameters common to the plurality of moving images, as well as the plurality of layers included in the moving image and the sets of coding parameters related to the individual layers.
The sequence parameter set SPS defines a set of coding parameters that the image decoding device 31 refers to in order to decode the target sequence. For example, the width and height of the pictures are defined.
The picture parameter set PPS defines a set of coding parameters that the image decoding device 31 refers to in order to decode each picture in the target sequence. For example, it includes a reference value of the quantization width used for decoding pictures (pic_init_qp_minus26) and a flag indicating the application of weighted prediction (weighted_pred_flag). A plurality of PPSs may exist; in that case, one of the plurality of PPSs is selected for each picture in the target sequence.
 (Picture layer) The picture layer defines a set of data that the image decoding device 31 refers to in order to decode a picture PICT to be processed (hereinafter also referred to as a target picture). As shown in (b) of FIG. 3, the picture PICT includes slices S0 through S(NS-1), where NS is the total number of slices included in the picture PICT.
In the following, when there is no need to distinguish the individual slices S0 through S(NS-1), the subscripts of the reference signs may be omitted. The same applies to other subscripted data included in the encoded stream Te described below.
 (Slice layer) The slice layer defines a set of data that the image decoding device 31 refers to in order to decode a slice S to be processed (also referred to as a target slice). As shown in (c) of FIG. 3, the slice S includes a slice header SH and slice data SDATA.
The slice header SH includes a group of coding parameters that the image decoding device 31 refers to in order to determine the decoding method for the target slice. The slice type designation information (slice_type), which designates the slice type, is an example of a coding parameter included in the slice header SH.
Slice types that can be designated by the slice type designation information include (1) I slices, which use only intra prediction for encoding, (2) P slices, which use unidirectional prediction or intra prediction for encoding, and (3) B slices, which use unidirectional prediction, bidirectional prediction, or intra prediction for encoding.
The slice header SH may also include a reference (pic_parameter_set_id) to a picture parameter set PPS contained in the sequence layer.
 (Slice data layer) The slice data layer defines a set of data that the image decoding device 31 refers to in order to decode the slice data SDATA to be processed. As shown in (d) of FIG. 3, the slice data SDATA includes coded tree blocks (CTBs: Coded Tree Blocks), also called coding tree units (CTUs). A CTB is a fixed-size block (for example, 64×64) constituting a slice, and is also called a largest coding unit (LCU: Largest Coding Unit).
 (Coding tree layer) As shown in (e) of FIG. 3, the coding tree layer defines a set of data that the image decoding device 31 refers to in order to decode a coded tree block to be processed. The coding tree unit is divided by recursive quadtree partitioning. The nodes of the tree structure obtained by the recursive quadtree partitioning are called a coding tree. The intermediate nodes of the quadtree are coded quadtrees (CQTs: Coded Quad Trees), and the CTU is defined as containing the topmost CQT. A CQT contains a split flag (split_flag); when split_flag is 1, the CQT is divided into four CQTs (contains four CQTs). When split_flag is 0, the CQT contains a coding unit (CU: Coded Unit), which is a leaf node. The coding unit CU is the basic unit of the coding process.
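The recursive division driven by split_flag can be sketched as follows; this is an illustrative parser skeleton, where `read_split_flag` stands in for the actual entropy decoding of split_flag and `min_cu_size` is an assumed minimum CU size below which no flag is read:

```python
def parse_cqt(read_split_flag, x, y, size, min_cu_size, cus):
    """Recursively parse a coded quadtree (CQT): when split_flag is 1 the
    block is divided into four CQTs; otherwise the leaf is a coding unit
    (CU), recorded here as (x, y, size)."""
    if size > min_cu_size and read_split_flag():
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                parse_cqt(read_split_flag, x + dx, y + dy, half, min_cu_size, cus)
    else:
        cus.append((x, y, size))
```

For a 64×64 CTB whose first split_flag is 1 and whose four children are not split further, this yields four 32×32 CUs.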
  (Coding unit layer) As shown in FIG. 3(f), the coding unit layer defines a set of data that the image decoding device 31 refers to in order to decode the coding unit to be processed. Specifically, the coding unit consists of a CU header CUH, a prediction tree, and a transform tree. The CU header CUH defines, among other things, whether the coding unit is a unit using intra prediction or a unit using inter prediction. The coding unit is the root of a prediction tree (PT) and a transform tree (TT). The CU header CUH is included between the prediction tree and the transform tree, or after the transform tree.
 In the prediction tree, the coding unit is split into one or more prediction blocks, and the position and size of each prediction block are defined. In other words, a prediction block is one or more non-overlapping regions constituting the coding unit. The prediction tree contains the one or more prediction blocks obtained by this splitting.
 Prediction processing is performed for each prediction block. Hereinafter, the prediction block, which is the unit of prediction, is also referred to as a prediction unit (PU).
 Roughly speaking, there are two types of splitting in the prediction tree: the intra prediction case and the inter prediction case. Intra prediction is prediction within the same picture, while inter prediction refers to prediction processing performed between mutually different pictures (for example, between display times or between layer images).
 In the case of intra prediction, the split methods are 2N×2N (the same size as the coding unit) and N×N.
 In the case of inter prediction, the split method is signaled by part_mode in the coded data, and includes 2N×2N (the same size as the coding unit), 2N×N, 2N×nU, 2N×nD, N×2N, nL×2N, nR×2N, and N×N. Here, 2N×nU indicates that the 2N×2N coding unit is split, from top to bottom, into two regions of 2N×0.5N and 2N×1.5N. 2N×nD indicates that the 2N×2N coding unit is split, from top to bottom, into two regions of 2N×1.5N and 2N×0.5N. nL×2N indicates that the 2N×2N coding unit is split, from left to right, into two regions of 0.5N×2N and 1.5N×2N. nR×2N indicates that the 2N×2N coding unit is split, from left to right, into two regions of 1.5N×2N and 0.5N×2N. Since the number of partitions is 1, 2, or 4, a CU contains between one and four PUs. These PUs are denoted, in order, PU0, PU1, PU2, and PU3.
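As a sketch, the eight inter split methods above can be mapped to concrete PU rectangles. The string mode names and the (x, y, w, h) tuple layout below are illustrative conventions, not the syntax of the embodiment:

```python
def pu_partitions(part_mode, size):
    """Return the PU rectangles (x, y, w, h), in PU0..PU3 order, for a
    2Nx2N coding unit of side length `size` (= 2N), following the
    splits described above."""
    s = size
    h = s // 2   # N:    half the CU side
    q = s // 4   # 0.5N: used by the asymmetric modes
    table = {
        "2Nx2N": [(0, 0, s, s)],
        "2NxN":  [(0, 0, s, h), (0, h, s, h)],
        "Nx2N":  [(0, 0, h, s), (h, 0, h, s)],
        "NxN":   [(0, 0, h, h), (h, 0, h, h), (0, h, h, h), (h, h, h, h)],
        "2NxnU": [(0, 0, s, q), (0, q, s, s - q)],      # 2Nx0.5N over 2Nx1.5N
        "2NxnD": [(0, 0, s, s - q), (0, s - q, s, q)],  # 2Nx1.5N over 2Nx0.5N
        "nLx2N": [(0, 0, q, s), (q, 0, s - q, s)],      # 0.5Nx2N then 1.5Nx2N
        "nRx2N": [(0, 0, s - q, s), (s - q, 0, q, s)],  # 1.5Nx2N then 0.5Nx2N
    }
    return table[part_mode]

# PU0 of 2NxnU in a 64x64 CU is the top 64x16 region.
```

In every mode, the PU rectangles tile the CU exactly, which is what "non-overlapping regions constituting the coding unit" requires.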
 In the transform tree, the coding unit is split into one or more transform blocks, and the position and size of each transform block are defined. In other words, a transform block is one or more non-overlapping regions constituting the coding unit. The transform tree contains the one or more transform blocks obtained by this splitting.
 Splitting in the transform tree is performed either by assigning a region of the same size as the coding unit as a transform block, or by recursive quadtree partitioning, as in the tree block splitting described above.
 Transform processing is performed for each transform block. Hereinafter, the transform block, which is the unit of transformation, is also referred to as a transform unit (TU).
  (Prediction parameters) The predicted image of a prediction unit is derived from the prediction parameters attached to the prediction unit. The prediction parameters are either intra prediction parameters or inter prediction parameters. The prediction parameters for inter prediction (inter prediction parameters) are described below. An inter prediction parameter consists of prediction list use flags predFlagL0 and predFlagL1, reference picture indexes refIdxL0 and refIdxL1, and vectors mvL0 and mvL1. The prediction list use flags predFlagL0 and predFlagL1 indicate whether the reference picture lists called the L0 list and the L1 list, respectively, are used; the corresponding reference picture list is used when the value is 1. In this specification, when a "flag indicating whether or not XX" is mentioned, 1 means XX and 0 means not XX, and in logical negation, logical AND, and the like, 1 is treated as true and 0 as false (the same applies hereinafter). However, other values may be used as the true and false values in an actual apparatus or method. The case where two reference picture lists are used, that is, predFlagL0 = 1 and predFlagL1 = 1, corresponds to bi-prediction, and the case where one reference picture list is used, that is, (predFlagL0, predFlagL1) = (1, 0) or (predFlagL0, predFlagL1) = (0, 1), corresponds to uni-prediction. The prediction list use flag information can also be expressed by the inter prediction flag inter_pred_idc described later. Normally, the prediction list use flags are used in the predicted image generation unit and the prediction parameter memory described later, and the inter prediction flag inter_pred_idc is used when decoding, from the coded data, information on which reference picture lists are used.
 Syntax elements for deriving the inter prediction parameters included in the coded data include, for example, the partition mode part_mode, the merge flag merge_flag, the merge index merge_idx, the inter prediction flag inter_pred_idc, the reference picture index refIdxLX, the prediction vector index mvp_LX_idx, and the difference vector mvdLX.
  (Example of a reference picture list) Next, an example of a reference picture list is described. A reference picture list is a sequence of reference pictures stored in the DPB 3061 (FIG. 9; details described later) of the decoded picture management unit 306. FIG. 4(a) is a conceptual diagram showing an example of a reference picture list. In the reference picture list 601, the five rectangles arranged horizontally in a row each represent a reference picture. The codes P1, P2, Q0, P3, and P4, shown in order from the left end to the right, denote the respective reference pictures. The P in P1 and so on denotes viewpoint P, and the Q in Q0 denotes a viewpoint Q different from viewpoint P. The subscripts of P and Q indicate the picture order count POC. The downward arrow directly below refIdxLX indicates that the reference picture index refIdxLX is an index referring to the reference picture Q0 stored in the DPB 3061 of the decoded picture management unit 306.
  (Examples of reference pictures) Next, examples of reference pictures used when deriving vectors are described. FIG. 5 is a conceptual diagram showing examples of reference pictures. In FIG. 5, the horizontal axis indicates display time and the vertical axis indicates viewpoint. The rectangles in FIG. 5, in two rows and three columns (six in total), each represent a picture. Of the six rectangles, the rectangle in the second column from the left in the lower row represents the picture to be decoded (target picture), and the remaining five rectangles represent reference pictures. The reference picture Q0, indicated by the upward arrow from the target picture, is a picture with the same display time as the target picture but a different viewpoint. The reference picture Q0 is used in displacement prediction based on the target picture. The reference picture P1, indicated by the leftward arrow from the target picture, is a past picture at the same viewpoint as the target picture. The reference picture P2, indicated by the rightward arrow from the target picture, is a future picture at the same viewpoint as the target picture. The reference picture P1 or P2 is used in motion prediction based on the target picture.
  (Inter prediction flag and prediction list use flags) The inter prediction flag and the prediction list use flags predFlagL0 and predFlagL1 can be mutually converted as follows. Therefore, either the prediction list use flags or the inter prediction flag may be used as the inter prediction parameter. In the following, determinations using the prediction list use flags may also be made with the inter prediction flag instead. Conversely, determinations using the inter prediction flag may be made with the prediction list use flags instead.
 inter_pred_idc = (predFlagL1 << 1) + predFlagL0
 predFlagL0 = inter_pred_idc & 1
 predFlagL1 = inter_pred_idc >> 1
 Here, >> is a right shift, and << is a left shift.
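The bit operations above can be checked directly; this sketch simply restates the conversion in code to show that the mapping is lossless in both directions:

```python
def to_inter_pred_idc(predFlagL0, predFlagL1):
    # inter_pred_idc = (predFlagL1 << 1) + predFlagL0
    return (predFlagL1 << 1) + predFlagL0

def to_pred_flags(inter_pred_idc):
    # predFlagL0 = inter_pred_idc & 1, predFlagL1 = inter_pred_idc >> 1
    return inter_pred_idc & 1, inter_pred_idc >> 1

# (1, 0) and (0, 1) are the two uni-prediction cases, (1, 1) is
# bi-prediction; the round trip recovers the flags in every case.
for l0, l1 in [(1, 0), (0, 1), (1, 1)]:
    assert to_pred_flags(to_inter_pred_idc(l0, l1)) == (l0, l1)
```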
  (Merge prediction and AMVP prediction) Prediction parameter decoding (encoding) methods include a merge prediction (merge) mode and an AMVP (Adaptive Motion Vector Prediction) mode; the merge flag merge_flag is a flag for distinguishing between them. In both the merge prediction mode and the AMVP mode, the prediction parameters of the target PU are derived using the prediction parameters of already processed blocks. The merge prediction mode is a mode in which already derived prediction parameters are used as they are, without including the prediction list use flag predFlagLX (inter prediction flag inter_pred_idc), the reference picture index refIdxLX, or the vector mvLX in the coded data, while the AMVP mode is a mode in which the inter prediction flag inter_pred_idc, the reference picture index refIdxLX, and the vector mvLX are included in the coded data. The vector mvLX is coded as a prediction vector index mvp_LX_idx indicating a prediction vector, together with a difference vector (mvdLX).
 The inter prediction flag inter_pred_idc is data indicating the types and number of reference pictures, and takes one of the values Pred_L0, Pred_L1, and Pred_Bi. Pred_L0 and Pred_L1 indicate that reference pictures stored in the reference picture lists called the L0 list and the L1 list, respectively, are used, and both indicate that one reference picture is used (uni-prediction). Prediction using the L0 list and prediction using the L1 list are called L0 prediction and L1 prediction, respectively. Pred_Bi indicates that two reference pictures are used (bi-prediction), one stored in the L0 list and one stored in the L1 list. The prediction vector index mvp_LX_idx is an index indicating a prediction vector, and the reference picture index refIdxLX is an index indicating a reference picture stored in a reference picture list. Note that LX is a notation used when L0 prediction and L1 prediction are not distinguished; replacing LX with L0 or L1 distinguishes parameters for the L0 list from parameters for the L1 list. For example, refIdxL0 is the reference picture index used for L0 prediction, refIdxL1 is the reference picture index used for L1 prediction, and refIdx (refIdxLX) is the notation used when refIdxL0 and refIdxL1 are not distinguished.
 The merge index merge_idx is an index indicating which prediction parameter, among the prediction parameter candidates (merge candidates) derived from blocks whose processing has been completed, is used as the prediction parameter of the decoding target block.
  (Motion vectors and displacement vectors) The vector mvLX may be a motion vector or a displacement vector (disparity vector). A motion vector is a vector indicating the positional shift between the position of a block in a picture of a certain layer at a certain display time and the position of the corresponding block in a picture of the same layer at a different display time (for example, an adjacent discrete time). A displacement vector is a vector indicating the positional shift between the position of a block in a picture of a certain layer at a certain display time and the position of the corresponding block in a picture of a different layer at the same display time. Pictures of different layers may be, for example, pictures of different viewpoints or pictures of different resolutions. In particular, a displacement vector between pictures of different viewpoints is called a disparity vector. In the following description, when motion vectors and displacement vectors are not distinguished, they are simply called vectors mvLX. The prediction vector and the difference vector related to the vector mvLX are called the prediction vector mvpLX and the difference vector mvdLX, respectively. Whether the vector mvLX and the difference vector mvdLX are motion vectors or displacement vectors is determined using the reference picture index refIdxLX attached to the vector.
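Whether mvLX is a motion vector or a displacement vector thus follows from the picture that refIdxLX selects. The following sketch assumes a hypothetical reference-picture record carrying a picture order count and a view (layer) identifier; it illustrates the rule stated above, not the actual decision logic of the embodiment:

```python
from collections import namedtuple

# Hypothetical reference-picture record: picture order count and view id.
RefPic = namedtuple("RefPic", ["poc", "view_id"])

def is_displacement_vector(cur_poc, cur_view, ref_list, refIdxLX):
    """A vector whose reference picture has the same display time (POC)
    but a different layer/view than the target picture is a displacement
    (disparity) vector; otherwise it is treated as a motion vector."""
    ref = ref_list[refIdxLX]
    return ref.poc == cur_poc and ref.view_id != cur_view

# Target picture: POC 2, view P. Q0 has the same POC but a different view,
# while P1 is a past picture of the same view.
ref_list = [RefPic(poc=1, view_id="P"), RefPic(poc=2, view_id="Q")]
assert is_displacement_vector(2, "P", ref_list, 1)      # disparity vector
assert not is_displacement_vector(2, "P", ref_list, 0)  # motion vector
```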
 Next, the configuration of the image decoding device 31 according to this embodiment is described. FIG. 6 is a schematic diagram showing the configuration of the image decoding device 31 according to this embodiment. The image decoding device 31 comprises an entropy decoding unit 301, a prediction parameter decoding unit 302, a decoded picture management unit (reference image storage unit, frame memory) 306, a predicted image generation unit 308, an inverse quantization / inverse DCT unit 311, an addition unit 312, and a residual storage unit 313 (residual recording unit).
 The prediction parameter decoding unit 302 comprises an inter prediction parameter decoding unit 303 and an intra prediction parameter decoding unit 304. The predicted image generation unit 308 comprises an inter predicted image generation unit 309 and an intra predicted image generation unit 310.
 The entropy decoding unit 301 performs entropy decoding on the coded stream Te input from outside, separating and decoding the individual codes (syntax elements). The separated codes include prediction information for generating a predicted image and residual information for generating a difference image.
 The entropy decoding unit 301 outputs some of the separated codes to the prediction parameter decoding unit 302. The separated codes include, for example, the prediction mode PredMode, the partition mode part_mode, the merge flag merge_flag, the merge index merge_idx, the inter prediction flag inter_pred_idc, the reference picture index refIdxLX, the prediction vector index mvp_LX_idx, the difference vector mvdLX, and the residual prediction flag iv_res_pred_weight_idx. Which codes are decoded is controlled based on instructions from the prediction parameter decoding unit 302. The entropy decoding unit 301 outputs the quantized coefficients to the inverse quantization / inverse DCT unit 311. These quantized coefficients are coefficients obtained in the encoding process by applying a DCT (Discrete Cosine Transform) to the residual signal and quantizing the result.
 The inter prediction parameter decoding unit 303 decodes the inter prediction parameters with reference to the prediction parameters stored in the prediction parameter memory 3067 of the decoded picture management unit 306, based on the codes input from the entropy decoding unit 301.
 The inter prediction parameter decoding unit 303 outputs the decoded inter prediction parameters to the predicted image generation unit 308, and also stores them in the prediction parameter memory 3067 of the decoded picture management unit 306. Details of the inter prediction parameter decoding unit 303 are described later.
 The intra prediction parameter decoding unit 304 decodes the intra prediction parameters with reference to the prediction parameters stored in the prediction parameter memory 3067 of the decoded picture management unit 306, based on the codes input from the entropy decoding unit 301. An intra prediction parameter is a parameter used in the process of predicting a picture block within a single picture, for example, the intra prediction mode IntraPredMode. The intra prediction parameter decoding unit 304 outputs the decoded intra prediction parameters to the predicted image generation unit 308, and also stores them in the prediction parameter memory 3067 of the decoded picture management unit 306.
 The intra prediction parameter decoding unit 304 may derive different intra prediction modes for luminance and chrominance. In this case, the intra prediction parameter decoding unit 304 decodes the luminance prediction mode IntraPredModeY as the luminance prediction parameter and the chrominance prediction mode IntraPredModeC as the chrominance prediction parameter. The luminance prediction mode IntraPredModeY has 35 modes, corresponding to planar prediction (0), DC prediction (1), and directional prediction (2 to 34). The chrominance prediction mode IntraPredModeC uses one of planar prediction (0), DC prediction (1), directional prediction (2, 3, 4), and LM mode (5).
 The decoded picture management unit 306 stores the reference picture blocks generated by the addition unit 312 in the DPB 3061 at positions predetermined for each picture to be decoded and each block, and outputs to the outside, at predetermined timing, a decoded viewpoint image Td in which the reference picture blocks are integrated for each picture.
 The decoded picture management unit 306 also stores the prediction parameters in the prediction parameter memory 3067 at positions predetermined for each picture to be decoded and each block. Details of the decoded picture management unit 306 are described later with reference to FIG. 9.
 The predicted image generation unit 308 receives the prediction mode predMode input from the entropy decoding unit 301 and the prediction parameters input from the prediction parameter decoding unit 302. The predicted image generation unit 308 also reads reference pictures from the DPB 3061 of the decoded picture management unit 306. The predicted image generation unit 308 generates a predicted picture block P (predicted image) in the prediction mode indicated by the prediction mode predMode, using the input prediction parameters and the read reference pictures.
 Here, when the prediction mode predMode indicates an inter prediction mode, the inter predicted image generation unit 309 generates a predicted picture block P by inter prediction, using the inter prediction parameters input from the inter prediction parameter decoding unit 303 and the read reference pictures. The predicted picture block P corresponds to a prediction unit PU. As described above, a PU corresponds to a portion of a picture consisting of a plurality of pixels that forms the unit of prediction processing, that is, the decoding target block on which prediction processing is performed at one time.
 For a reference picture list (the L0 list or the L1 list) whose prediction list use flag predFlagLX is 1, the inter predicted image generation unit 309 reads, from the DPB 3061 of the decoded picture management unit 306, the reference picture block located at the position indicated by the vector mvLX, relative to the decoding target block, in the reference picture indicated by the reference picture index refIdxLX. The inter predicted image generation unit 309 performs prediction on the read reference picture block to generate the predicted picture block P, and outputs the generated predicted picture block P to the addition unit 312.
 When the prediction mode predMode indicates an intra prediction mode, the intra predicted image generation unit 310 performs intra prediction using the intra prediction parameters input from the intra prediction parameter decoding unit 304 and the read reference pictures. Specifically, the intra predicted image generation unit 310 reads, from the DPB 3061 of the decoded picture management unit 306, reference picture blocks of the picture being decoded that lie within a predetermined range from the decoding target block among the already decoded blocks. When the decoding target blocks move sequentially in so-called raster scan order, the predetermined range is, for example, one of the left, upper-left, upper, and upper-right adjacent blocks, and differs depending on the intra prediction mode. Raster scan order is the order in which each row of a picture is traversed from left end to right end, proceeding from the top to the bottom of the picture.
 The intra predicted image generation unit 310 performs prediction on the read reference picture blocks in the prediction mode indicated by the intra prediction mode IntraPredMode to generate a predicted picture block, and outputs the generated predicted picture block P to the addition unit 312.
 When the intra prediction parameter decoding unit 304 derives different intra prediction modes for luminance and chrominance, the intra predicted image generation unit 310 generates a luminance predicted picture block by one of planar prediction (0), DC prediction (1), and directional prediction (2 to 34) according to the luminance prediction mode IntraPredModeY, and generates a chrominance predicted picture block by one of planar prediction (0), DC prediction (1), directional prediction (2, 3, 4), and LM mode (5) according to the chrominance prediction mode IntraPredModeC.
 The inverse quantization / inverse DCT unit 311 inversely quantizes the quantized coefficients input from the entropy decoding unit 301 to obtain DCT coefficients, performs an inverse DCT (Inverse Discrete Cosine Transform) on the obtained DCT coefficients, and computes a decoded residual signal. The inverse quantization / inverse DCT unit 311 outputs the computed decoded residual signal to the addition unit 312 and the residual storage unit 313.
 The addition unit 312 adds, pixel by pixel, the predicted picture block P input from the inter predicted image generation unit 309 or the intra predicted image generation unit 310 and the signal values of the decoded residual signal input from the inverse quantization / inverse DCT unit 311, to generate a reference picture block. The addition unit 312 stores the generated reference picture block (that is, the decoded picture) in the DPB 3061 of the decoded picture management unit 306.
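The pixel-wise addition performed by the addition unit 312 can be sketched as follows. The 8-bit clipping range is a typical assumption for illustration, not something stated in the text above:

```python
def reconstruct_block(pred_block, residual_block, bit_depth=8):
    """Add the predicted picture block P and the decoded residual signal
    pixel by pixel, clipping to the valid sample range (assumed 8-bit)."""
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(prow, rrow)]
        for prow, rrow in zip(pred_block, residual_block)
    ]

pred = [[100, 200], [50, 255]]
resid = [[10, 80], [-60, 5]]
recon = reconstruct_block(pred, resid)
# Sums that leave the 8-bit range (e.g. 200 + 80, 50 - 60) are clipped.
```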
  [Configuration of the inter prediction parameter decoding unit] Next, the configuration of the inter prediction parameter decoding unit 303 is described. FIG. 7 is a schematic diagram showing the configuration of the inter prediction parameter decoding unit 303 according to this embodiment. The inter prediction parameter decoding unit 303 comprises an inter prediction parameter decoding control unit 3031, an AMVP prediction parameter derivation unit 3032, an addition unit 3035, and a merge prediction parameter derivation unit 3036.
  [[Inter prediction parameter decoding control unit]] The inter prediction parameter decoding control unit 3031 instructs the entropy decoding unit 301 to decode the codes (syntax elements) related to inter prediction, and extracts from the coded data codes (syntax elements) such as the partition mode part_mode, the merge flag merge_flag, the merge index merge_idx, the inter prediction flag inter_pred_idc, the reference picture index refIdxLX, the prediction vector index mvp_LX_idx, the difference vector mvdLX, and the residual prediction flag iv_res_pred_weight_idx.
 The inter prediction parameter decoding control unit 3031 first extracts the merge flag from the encoded data. When the inter prediction parameter decoding control unit 3031 is described as extracting a certain syntax element, this means that it instructs the entropy decoding unit 301 to decode that syntax element and reads the syntax element from the encoded data. Here, when the value indicated by the merge flag is 1, that is, when it indicates the merge prediction mode, the inter prediction parameter decoding control unit 3031 extracts the merge index merge_idx as a prediction parameter related to merge prediction, and outputs the extracted merge index merge_idx to the merge prediction parameter derivation unit 3036.
 When the merge flag merge_flag is 0, that is, when it indicates the AMVP prediction mode, the inter prediction parameter decoding control unit 3031 extracts the AMVP prediction parameters from the encoded data using the entropy decoding unit 301. The AMVP prediction parameters include, for example, the inter prediction flag inter_pred_idc, the reference picture index refIdxLX, the vector index mvp_LX_idx, and the difference vector mvdLX. The inter prediction parameter decoding control unit 3031 outputs the prediction list utilization flag predFlagLX derived from the extracted inter prediction flag inter_pred_idc and the reference picture index refIdxLX to the AMVP prediction parameter derivation unit 3032 and the prediction image generation unit 308 (FIG. 6), and also stores them in the DPB 3061 (FIG. 9) of the decoded picture management unit 306. The inter prediction parameter decoding control unit 3031 outputs the extracted vector index mvp_LX_idx to the AMVP prediction parameter derivation unit 3032, and outputs the extracted difference vector mvdLX to the addition unit 3035.
  [[Merge Prediction Parameter Derivation Unit]] FIG. 23 is a schematic diagram illustrating the configuration of the merge prediction parameter derivation unit 3036 according to the present embodiment. The merge prediction parameter derivation unit 3036 includes a merge candidate derivation unit 30361 and a merge candidate selection unit 30362. The merge candidate derivation unit 30361 includes a merge candidate storage unit 303611, an extended merge candidate derivation unit 303612, and a basic merge candidate derivation unit 303613.
 The merge candidate storage unit 303611 stores the merge candidates input from the extended merge candidate derivation unit 303612 and the basic merge candidate derivation unit 303613. A merge candidate includes the prediction list utilization flag predFlagLX, the vector mvLX, and the reference picture index refIdxLX. In the merge candidate storage unit 303611, indices are assigned to the stored merge candidates according to a predetermined rule. For example, the index "0" is assigned to the merge candidate input from the extended merge candidate derivation unit 303612.
 The extended merge candidate derivation unit 303612 includes a displacement vector acquisition unit 3036122, an inter-layer merge candidate derivation unit 3036121, a displacement merge candidate derivation unit 3036123, and a BVSP merge candidate derivation unit 3036124 (not shown).
 The displacement vector acquisition unit 3036122 first acquires a displacement vector, in order, from a plurality of candidate blocks adjacent to the decoding target block (for example, the blocks adjacent on the left, above, and upper right). Specifically, the displacement vector acquisition unit 3036122 selects one of the candidate blocks and determines, using the reference picture index refIdxLX of the selected candidate block and the reference layer determination function described later, whether the vector of that candidate block is a displacement vector or a motion vector; if a displacement vector is found, it is adopted as the displacement vector. If the candidate block has no displacement vector, the next candidate block is scanned in order. If none of the adjacent blocks has a displacement vector, the displacement vector acquisition unit 3036122 attempts to acquire the displacement vector of the block located at the position corresponding to the target block in a reference picture of a temporally different display order. If no displacement vector can be acquired, the displacement vector acquisition unit 3036122 sets a zero vector as the displacement vector. The displacement vector acquisition unit 3036122 outputs the displacement vector to the inter-layer merge candidate derivation unit 3036121 and the displacement merge candidate derivation unit 3036123.
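The scan order above can be sketched as follows. This is an illustrative simplification, not the normative procedure: the function name acquire_displacement_vector and the representation of a candidate as a (mvLX, refIdxLX) pair are assumptions of the example.

```python
def acquire_displacement_vector(spatial_candidates, temporal_candidate,
                                is_displacement_vector):
    """Scan candidate blocks in order and return the first displacement vector.

    Each candidate is a (mvLX, refIdxLX) pair or None when unavailable;
    is_displacement_vector stands in for the reference layer determination
    function applied to the candidate's reference picture index.
    """
    # Scan the spatially adjacent candidate blocks (e.g. left, above, upper right).
    for cand in spatial_candidates:
        if cand is not None:
            mv, ref_idx = cand
            if is_displacement_vector(ref_idx):
                return mv
    # Fall back to the corresponding block in a picture of a different display order.
    if temporal_candidate is not None:
        mv, ref_idx = temporal_candidate
        if is_displacement_vector(ref_idx):
            return mv
    # Nothing found: the zero vector is used as the displacement vector.
    return (0, 0)
```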
 The inter-layer merge candidate derivation unit 3036121 receives the displacement vector from the displacement vector acquisition unit 3036122. From a picture having the same POC as the decoding target picture in another layer (for example, the base layer or base view), the inter-layer merge candidate derivation unit 3036121 selects the block indicated by the displacement vector input from the displacement vector acquisition unit 3036122, and reads from the prediction parameter memory 307 the prediction parameter, that is, the motion vector, held by that block. More specifically, the prediction parameter read by the inter-layer merge candidate derivation unit 3036121 is the prediction parameter of the block containing the coordinates obtained by adding the displacement vector to the coordinates of the center point of the target block. The coordinates (xRef, yRef) of the reference block are derived by the following equations, where (xP, yP) are the coordinates of the target block, (mvDisp[0], mvDisp[1]) is the displacement vector, and nPSW and nPSH are the width and height of the target block:
xRef = Clip3( 0, PicWidthInSamplesL - 1, xP + ( ( nPSW - 1 ) >> 1 ) + ( ( mvDisp[0] + 2 ) >> 2 ) )
yRef = Clip3( 0, PicHeightInSamplesL - 1, yP + ( ( nPSH - 1 ) >> 1 ) + ( ( mvDisp[1] + 2 ) >> 2 ) )
 Note that the inter-layer merge candidate derivation unit 3036121 determines whether the prediction parameter is a motion vector by the method that yields false (not a displacement vector) in the determination by the reference layer determination function, described later, included in the inter prediction parameter decoding control unit 3031. The inter-layer merge candidate derivation unit 3036121 outputs the read prediction parameter as a merge candidate to the merge candidate storage unit 303611. When the prediction parameter cannot be derived, the inter-layer merge candidate derivation unit 3036121 notifies the displacement merge candidate derivation unit to that effect. This merge candidate is an inter-layer candidate (inter-view candidate) for motion prediction, and is also referred to as an inter-layer merge candidate (motion prediction).
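Assuming integer sample coordinates and quarter-pel displacement vector components (hence the rounding term ( mvDisp + 2 ) >> 2 in the equations above), the coordinate derivation can be sketched as:

```python
def clip3(lo, hi, x):
    """Clip3(lo, hi, x) as used in the coordinate equations."""
    return max(lo, min(hi, x))

def reference_block_coords(xP, yP, nPSW, nPSH, mvDisp,
                           pic_width, pic_height):
    """Derive (xRef, yRef): the target block's center point shifted by the
    displacement vector, rounded from quarter-pel to integer precision and
    clipped to the picture bounds."""
    xRef = clip3(0, pic_width - 1,
                 xP + ((nPSW - 1) >> 1) + ((mvDisp[0] + 2) >> 2))
    yRef = clip3(0, pic_height - 1,
                 yP + ((nPSH - 1) >> 1) + ((mvDisp[1] + 2) >> 2))
    return xRef, yRef
```

Note that the arithmetic right shift on a negative displacement component floors toward minus infinity, matching the specification-style `>>` operator.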
 The displacement merge candidate derivation unit 3036123 receives the displacement vector from the displacement vector acquisition unit 3036122. The displacement merge candidate derivation unit 3036123 outputs, as a merge candidate, the input displacement vector together with the reference picture index refIdxLX of the layer image pointed to by the displacement vector (for example, the index of the base layer image having the same POC as the decoding target picture) to the merge candidate storage unit 303611. This merge candidate is an inter-layer candidate (inter-view candidate) for displacement prediction, and is also referred to as an inter-layer merge candidate (displacement prediction).
 The BVSP merge candidate derivation unit 3036124 derives a block view synthesis prediction (BVSP) merge candidate. A BVSP merge candidate is a type of displacement merge candidate that generates a prediction image from an image of another viewpoint, but one for which the PU is further partitioned into smaller blocks when the prediction image generation process is performed.
 The basic merge candidate derivation unit 303613 includes a spatial merge candidate derivation unit 3036131, a temporal merge candidate derivation unit 3036132, a combined merge candidate derivation unit 3036133, and a zero merge candidate derivation unit 3036134.
 The spatial merge candidate derivation unit 3036131 reads, according to a predetermined rule, the prediction parameters (prediction list utilization flag predFlagLX, vector mvLX, reference picture index refIdxLX) stored in the prediction parameter memory 307, and derives the read prediction parameters as merge candidates. The prediction parameters to be read are those of blocks within a predetermined range from the decoding target block (for example, all or some of the blocks adjacent to the lower left, upper left, and upper right corners of the decoding target block). The derived merge candidates are stored in the merge candidate storage unit 303611.
 The temporal merge candidate derivation unit 3036132 reads from the prediction parameter memory 307 the prediction parameter of the block, within a reference image, that contains the lower right coordinates of the decoding target block, and uses it as a merge candidate. The reference image may be specified, for example, by the reference picture index refIdxLX signaled in the slice header, or by the smallest of the reference picture indices refIdxLX of the blocks adjacent to the decoding target block. The derived merge candidate is stored in the merge candidate storage unit 303611.
 The combined merge candidate derivation unit 3036133 derives combined merge candidates by combining the vectors and reference picture indices of two different derived merge candidates already stored in the merge candidate storage unit 303611 as the L0 and L1 vectors, respectively. The derived merge candidates are stored in the merge candidate storage unit 303611.
 The zero merge candidate derivation unit 3036134 derives a merge candidate whose reference picture index refIdxLX is 0 and whose vector mvLX has both X and Y components equal to 0. The derived merge candidate is stored in the merge candidate storage unit 303611.
 FIG. 8 shows an example of the merge candidates derived by the merge candidate derivation unit 30361. Leaving aside the process that closes up the order when two merge candidates have the same prediction parameters, the merge index order is: inter-layer merge candidate, spatial merge candidate (left), spatial merge candidate (above), spatial merge candidate (upper right), displacement merge candidate, BVSP merge candidate, spatial merge candidate (lower left), spatial merge candidate (upper left), and temporal merge candidate. Combined merge candidates and zero merge candidates follow after these, but they are omitted in FIG. 8.
 The merge candidate selection unit 30362 selects, from among the merge candidates stored in the merge candidate storage unit 303611, the merge candidate to which the index corresponding to the merge index merge_idx input from the inter prediction parameter decoding control unit 3031 is assigned, as the inter prediction parameter of the target PU. The merge candidate selection unit 30362 stores the selected merge candidate in the prediction parameter memory 3067 (FIG. 9) of the decoded picture management unit 306 and outputs it to the prediction image generation unit 308 (FIG. 6).
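The index assignment and the selection by merge_idx can be sketched roughly as follows; the duplicate-skipping step stands in for the order-closing process mentioned above, and all names here are illustrative, not the patent's normative procedure:

```python
def build_merge_list(candidates, max_num_merge_cand):
    """Assemble the merge candidate list in derivation order, skipping
    candidates whose prediction parameters duplicate an earlier entry
    (the 'closing up' of the order for identical candidates).

    Each candidate is modeled as a tuple (predFlagLX, mvLX, refIdxLX)."""
    merge_list = []
    for cand in candidates:
        if cand not in merge_list:
            merge_list.append(cand)
        if len(merge_list) == max_num_merge_cand:
            break
    return merge_list

def select_merge_candidate(merge_list, merge_idx):
    """Pick the candidate addressed by the decoded merge index merge_idx."""
    return merge_list[merge_idx]
```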
  [[[Displacement Vector Acquisition Unit]]] The displacement vector acquisition unit 3036122 is described in detail below. When a block adjacent to the target PU has a displacement vector, the displacement vector acquisition unit 3036122 extracts that displacement vector from the prediction parameter memory 3067 of the decoded picture management unit 306; it refers to the prediction parameter memory 3067 and reads the prediction flag predFlagLX, the reference picture index refIdxLX, and the vector mvLX of the blocks adjacent to the target PU. The displacement vector acquisition unit 3036122 has a reference layer determination function. The displacement vector acquisition unit 3036122 reads the prediction parameters of the blocks adjacent to the target PU in order, and determines, by the reference layer determination function, from the reference picture index of each adjacent block whether that block has a displacement vector. When an adjacent block has a displacement vector, the displacement vector acquisition unit 3036122 outputs that displacement vector. When the prediction parameters of the adjacent blocks contain no displacement vector, it outputs a zero vector as the displacement vector.
 (Reference Layer Determination Function) The reference layer determination function of the displacement vector acquisition unit 3036122 determines, based on an input reference picture index refIdxLX, the reference layer information reference_layer_info, which indicates the relationship between the reference picture pointed to by the reference picture index refIdxLX and the target picture. The reference layer information reference_layer_info is information indicating whether the vector mvLX to the reference picture is a displacement vector or a motion vector.
 Prediction in which the layer of the target picture and the layer of the reference picture are the same layer is called same-layer prediction, and the vector obtained in this case is a motion vector. Prediction in which the layer of the target picture and the layer of the reference picture are different layers is called inter-layer prediction, and the vector obtained in this case is a displacement vector.
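A minimal sketch of the reference layer determination function, under the assumption that each reference picture in the reference picture list carries a layer identifier (the function and parameter names below are illustrative):

```python
def reference_layer_info(target_layer_id, ref_layer_id):
    """Same-layer prediction yields a motion vector; inter-layer prediction
    (different layers) yields a displacement vector."""
    return "displacement" if ref_layer_id != target_layer_id else "motion"

def classify_vector(refIdxLX, ref_pic_list_layer_ids, target_layer_id):
    """reference_layer_info for the reference picture addressed by refIdxLX,
    given the layer id of each picture in the reference picture list."""
    return reference_layer_info(target_layer_id,
                                ref_pic_list_layer_ids[refIdxLX])
```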
  [[AMVP Prediction Parameter Derivation Unit]] The AMVP prediction parameter derivation unit 3032 reads, based on the reference picture index refIdx, vectors (motion vectors or displacement vectors) stored in the prediction parameter memory 3067 (FIG. 9) of the decoded picture management unit 306 as vector candidates mvpLX. The vectors read by the AMVP prediction parameter derivation unit 3032 are those of blocks within a predetermined range from the decoding target block (for example, all or some of the blocks adjacent to the lower left, upper left, and upper right corners of the decoding target block).
 The AMVP prediction parameter derivation unit 3032 selects, from among the read vector candidates, the vector candidate indicated by the vector index mvp_LX_idx input from the inter prediction parameter decoding control unit 3031 as the prediction vector mvpLX, and outputs the selected prediction vector mvpLX to the addition unit 3035.
 (b) of FIG. 4 is a conceptual diagram showing an example of vector candidates. The prediction vector list 602 shown in (b) of FIG. 4 is a list of a plurality of vector candidates derived by the AMVP prediction parameter derivation unit 3032. In the prediction vector list 602, the five rectangles arranged in a row from left to right each indicate a region holding a prediction vector. The downward arrow immediately below the second mvp_LX_idx from the left end, and the mvpLX below it, indicate that the vector index mvp_LX_idx is an index that refers to the vector mvpLX in the prediction parameter memory 3067 of the decoded picture management unit 306.
 A candidate vector is generated by referring to a block for which the decoding process has been completed and which lies within a predetermined range from the decoding target block (for example, an adjacent block), based on the vector of the referenced block. The adjacent blocks include blocks spatially adjacent to the target block, for example the left block and the upper block, as well as blocks temporally adjacent to the target block, for example blocks obtained from a block at the same position as the target block but with a different display time.
 The addition unit 3035 adds the prediction vector mvpLX input from the prediction vector selection unit 3034 and the difference vector mvdLX input from the inter prediction parameter decoding control unit to calculate the vector mvLX. The addition unit 3035 outputs the calculated vector mvLX to the prediction image generation unit 308 (FIG. 6).
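The AMVP reconstruction of mvLX from the candidate list, the vector index mvp_LX_idx, and the difference vector mvdLX can be sketched as follows; the function name is illustrative, and vectors are assumed to be two-component integer tuples:

```python
def amvp_reconstruct_mv(mvp_candidates, mvp_LX_idx, mvdLX):
    """Select the prediction vector mvpLX addressed by mvp_LX_idx from the
    candidate list and add the decoded difference vector mvdLX
    component-wise to obtain mvLX."""
    mvpLX = mvp_candidates[mvp_LX_idx]
    return (mvpLX[0] + mvdLX[0], mvpLX[1] + mvdLX[1])
```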
  [Decoded Picture Management Unit] The detailed configuration of the decoded picture management unit 306 will be described with reference to FIG. 9. FIG. 9 is a functional block diagram illustrating the configuration of the decoded picture management unit 306.
 As shown in FIG. 9, the decoded picture management unit 306 includes a DPB 3061, an RPS derivation unit 3062, a reference picture control unit 3063, a reference layer picture control unit 3064, an RPL derivation unit 3065, an output control unit 3066, and a prediction parameter memory 3067.
  [[DPB 3061]] The DPB 3061, also called the decoded picture buffer (Decoded Picture Buffer), records the decoded picture of each picture of the target layer output from the addition unit 312. In the DPB, the decoded picture corresponding to each picture of the target layer is recorded in association with its output order (POC: Picture Order Count). In addition, a reference mark and an output mark can be set for each picture in the DPB.
 The reference mark is information indicating whether a picture in the DPB may be used for the prediction image generation processes (for example, inter prediction and inter-layer image prediction) in the decoding of the target picture and subsequent pictures. Specifically, the reference mark takes one of the values "used for short-term reference", "used for long-term reference", and "not used for reference".
 Note that the values the reference mark can take are not limited to the above. For example, it may be possible to set the reference mark to the value "used for inter-layer reference". Alternatively, without distinguishing between "used for short-term reference" and "used for long-term reference", the union of the two may be defined as "used for reference".
 The output mark is information indicating whether the picture in the DPB needs to be output to the outside. Specifically, the output mark takes one of the values "needed for output" and "not needed for output". Note that the reference mark and the output mark need not be explicitly set at a particular point in the decoding or encoding process; in that case, the reference mark or the output mark is determined to be "undefined".
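As an illustrative data-structure sketch (not the patent's normative representation), a DPB entry carrying the two marks might look like the following, with "undefined" standing for a mark that has not been explicitly set:

```python
from dataclasses import dataclass

@dataclass
class DpbEntry:
    """A decoded picture recorded in the DPB, keyed by its output order (POC),
    together with its reference mark and output mark."""
    poc: int
    reference_mark: str = "undefined"   # e.g. "used for short-term reference"
    output_mark: str = "undefined"      # "needed for output" / "not needed for output"

def usable_for_prediction(entry: DpbEntry) -> bool:
    """In this sketch, a picture is usable for prediction image generation
    only while it carries one of the two 'used for ... reference' marks."""
    return entry.reference_mark in ("used for short-term reference",
                                    "used for long-term reference")
```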
  [[RPS Derivation Unit 3062]] The RPS derivation unit 3062 derives the RPS (Reference Picture Set) used for the decoding process of the target picture based on the input syntax values, and outputs it to the base reference picture control unit 144, the reference picture control unit 143, and the RPL derivation unit 3065.
 Roughly speaking, the RPS represents the set of reference pictures that can be used in the decoding process of the target picture, or in the decoding processes of pictures that follow the target picture in decoding order.
 (Definition of Sub-RPSs) The RPS can be divided into a plurality of sub-RPSs based on the properties of the reference pictures. In the present embodiment, the RPS is composed of the following five types of sub-RPSs.
 (1) Forward short-term RPS: a sub-RPS containing reference pictures that are specified by a relative position in display order with respect to the target picture, belong to the same layer as the target picture, and precede the target picture in display order.
 (2) Backward short-term RPS: a sub-RPS containing reference pictures that are specified by a relative position in display order with respect to the target picture, belong to the same layer as the target picture, and follow the target picture in display order.
 (3) Long-term RPS: a sub-RPS containing reference pictures that are specified by an absolute position in display order and belong to the same layer as the target picture.
 (4) Inter-layer pixel RPS: a sub-RPS containing reference pictures that belong to a layer different from the target picture and whose pixel values are referred to in inter-layer prediction.
 (5) Inter-layer motion-limited RPS: a sub-RPS containing reference pictures that belong to a layer different from the target picture and whose motion information, but not pixel values, is referred to in inter-layer prediction.
 In the following, the union of the forward short-term RPS and the backward short-term RPS is also referred to as the short-term RPS. That is, the short-term RPS contains reference pictures that are specified by a relative position in display order with respect to the target picture and belong to the same layer as the target picture. The union of the inter-layer pixel RPS and the inter-layer motion-limited RPS is also referred to as the inter-layer RPS. That is, the inter-layer RPS contains reference pictures belonging to a layer different from the target picture (inter-layer reference pictures).
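The five-way split described above can be sketched as a classification function; the function and parameter names are assumptions of the example, and long-term membership is passed in as a flag since it is signaled separately from the short-term sets:

```python
def classify_sub_rps(ref_layer_id, target_layer_id, ref_poc, target_poc,
                     is_long_term=False, pixels_referenced=True):
    """Assign a reference picture to one of the five sub-RPSs."""
    if ref_layer_id != target_layer_id:
        # Inter-layer reference picture: split by whether pixel values
        # (and not only motion information) are referred to.
        return ("inter-layer pixel" if pixels_referenced
                else "inter-layer motion-limited")
    if is_long_term:
        return "long-term"
    # Same layer, short-term: split by display order relative to the target.
    return "forward short-term" if ref_poc < target_poc else "backward short-term"
```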
 (Derivation of Sub-RPSs) The RPS derivation in the RPS derivation unit 3062 is performed for each sub-RPS. In the following, the related syntax is described for each of the sub-RPSs described above, together with the process of deriving the sub-RPS from that syntax.
 (Short-term RPS) The syntax related to the short-term RPS (the forward short-term RPS and the backward short-term RPS) includes the SPS short-term RPS information, which is the short-term reference picture set information included in the SPS, and the SH short-term RPS information, which is the short-term reference picture set information included in the slice header.
 (SPS short-term RPS information) The SPS short-term RPS information contains information on a plurality of short-term RPS candidates that can be selected as the short-term RPS in each picture referring to the SPS. A short-term RPS is a set of pictures that can serve as reference pictures (short-term reference pictures) specified by a position relative to the target picture (for example, the POC difference from the target picture).
 The SPS short-term RPS information is described with reference to FIG. 16. FIG. 16 illustrates part of the SPS syntax table used when decoding the SPS. The part (A) of FIG. 16 corresponds to the SPS short-term RPS information. The SPS short-term RPS information includes the number of short-term RPSs included in the SPS (num_short_term_ref_pic_sets) and the definition of each short-term RPS (short_term_ref_pic_set(i)).
 The short-term RPS information is described with reference to (a) of FIG. 17. (a) of FIG. 17 illustrates the short-term RPS syntax table used when decoding the SPS and when decoding a slice header.
 The short-term RPS information includes the number of short-term reference pictures earlier than the target picture in display order (num_negative_pics) and the number of short-term reference pictures later than the target picture in display order (num_positive_pics). In the following, a short-term reference picture earlier than the target picture in display order is called a forward short-term reference picture, and a short-term reference picture later than the target picture in display order is called a backward short-term reference picture.
 The short-term RPS information further includes, for each forward short-term reference picture, the absolute value of its POC difference from the target picture (delta_poc_s0_minus1[i]) and a flag indicating whether it may be used as a reference picture of the target picture (used_by_curr_pic_s0_flag[i]). Likewise, for each backward short-term reference picture, it includes the absolute value of the POC difference from the target picture (delta_poc_s1_minus1[i]) and a flag indicating whether it may be used as a reference picture of the target picture (used_by_curr_pic_s1_flag[i]).
 (SH short-term RPS information) The SH short-term RPS information contains the single short-term RPS that can be used by the pictures referring to the slice header.
 Decoding of the SH short-term RPS information is described with reference to FIG. 17(b), which illustrates the part of the slice header syntax table used when decoding a slice header. Part (A) of FIG. 17(b) corresponds to the SH short-term RPS information. The SH short-term RPS information contains a flag (short_term_ref_pic_set_sps_flag) indicating whether the short-term RPS is selected from the short-term RPS candidates already decoded from the SPS or is included explicitly in the slice header. When it is selected from the candidates, an identifier (short_term_ref_pic_set_idx) selecting one of the short-term RPS candidates is included. When it is included explicitly in the slice header, information corresponding to the syntax table described above with reference to FIG. 17(a) (short_term_ref_pic_set(idx)) is included in the SH short-term RPS information.
 (Short-term RPS derivation process) The short-term RPSs, namely the forward short-term RPS and the backward short-term RPS, are derived from the short-term RPS information, and the subsequent-reference short-term RPS is derived as well.
Forward short-term RPS: contains the pictures referable from the current picture, as specified by the SPS short-term RPS information or the SH short-term RPS information, whose display order is earlier than the target picture.
Backward short-term RPS: contains the pictures referable from the current picture, as specified by the SPS short-term RPS information or the SH short-term RPS information, whose display order is later than the target picture.
Subsequent-reference short-term RPS: contains the pictures that are not referenced by the current picture but may be referenced by pictures following the current picture in decoding order.
 The forward short-term RPS (ListStCurrBefore), the backward short-term RPS (ListStCurrAfter), and the subsequent-reference short-term RPS (ListStFoll) are derived by the following procedure. All three lists are set to empty before the procedure starts.
(S101) Based on the SPS short-term RPS information and the SH short-term RPS information, the short-term RPS used for decoding the target picture is identified. Specifically, when the value of short_term_ref_pic_set_sps_flag in the SH short-term RPS information is 0, the short-term RPS explicitly transmitted in the slice header is selected. Otherwise (when the value of short_term_ref_pic_set_sps_flag is 1), the short-term RPS indicated by short_term_ref_pic_set_idx in the SH short-term RPS information is selected from the short-term RPS candidates in the SPS short-term RPS information.
(S102) The POC of each reference picture in the selected short-term RPS is derived. When the reference picture belongs to the forward short-term RPS, its POC is derived by subtracting the value "delta_poc_s0_minus1[i] + 1" from the POC of the target picture. When the reference picture belongs to the backward short-term RPS, its POC is derived by adding the value "delta_poc_s1_minus1[i] + 1" to the POC of the target picture.
(S103) The forward reference pictures in the short-term RPS are examined in transmission order; when the associated used_by_curr_pic_s0_flag[i] is 1, the forward reference picture is added to ListStCurrBefore, and otherwise (used_by_curr_pic_s0_flag[i] is 0) it is added to ListStFoll.
(S104) The backward reference pictures in the short-term RPS are examined in transmission order; when the associated used_by_curr_pic_s1_flag[i] is 1, the backward reference picture is added to ListStCurrAfter, and otherwise (used_by_curr_pic_s1_flag[i] is 0) it is added to ListStFoll.
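Steps S101 to S104 can be sketched as follows. This is an illustrative Python sketch, not the normative decoding process: the dictionary keys mirror the syntax-element names from the text, the input containers (`sh`, `sps_candidates`) are hypothetical stand-ins for the decoded syntax elements, and each POC is derived directly from the target picture's POC as stated in S102.

```python
def derive_short_term_rps(cur_poc, sh, sps_candidates):
    """Derive ListStCurrBefore, ListStCurrAfter and ListStFoll (as POC lists)
    from the SH/SPS short-term RPS information (steps S101-S104)."""
    # S101: identify the short-term RPS used for the target picture.
    if sh["short_term_ref_pic_set_sps_flag"] == 0:
        rps = sh["explicit_rps"]  # transmitted explicitly in the slice header
    else:
        rps = sps_candidates[sh["short_term_ref_pic_set_idx"]]

    before, after, foll = [], [], []
    # S102 + S103: forward pictures, POC = cur_poc - (delta_poc_s0_minus1[i] + 1),
    # classified by used_by_curr_pic_s0_flag[i] in transmission order.
    for d, used in zip(rps["delta_poc_s0_minus1"], rps["used_by_curr_pic_s0_flag"]):
        (before if used else foll).append(cur_poc - (d + 1))
    # S102 + S104: backward pictures, POC = cur_poc + (delta_poc_s1_minus1[i] + 1),
    # classified by used_by_curr_pic_s1_flag[i] in transmission order.
    for d, used in zip(rps["delta_poc_s1_minus1"], rps["used_by_curr_pic_s1_flag"]):
        (after if used else foll).append(cur_poc + (d + 1))
    return before, after, foll
```

For a target picture with POC 8 and an explicitly transmitted RPS, a forward picture one POC back that is used, a forward picture two POC back that is not used, and a used backward picture one POC ahead land in ListStCurrBefore, ListStFoll, and ListStCurrAfter respectively.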
 (Long-term RPS) The syntax related to the long-term RPS comprises the SPS long-term RPS information, which is the long-term reference picture information included in the SPS, and the SH long-term RPS information, which is the long-term reference picture information included in the slice header.
 (SPS long-term RPS information) The SPS long-term RPS information contains a plurality of long-term reference pictures usable by the pictures referring to the SPS. A long-term reference picture is a reference picture specified by an absolute position within the sequence (for example, a POC value).
 Decoding of the SPS long-term RPS information is described with reference again to FIG. 16. Part (B) of FIG. 16 corresponds to the SPS long-term RPS information, which consists of a flag indicating whether long-term reference pictures are transmitted in the SPS (long_term_ref_pics_present_flag), the number of long-term reference pictures transmitted in the SPS (num_long_term_ref_pics_sps), and the information on each long-term reference picture. The information on a long-term reference picture consists of its POC (lt_ref_pic_poc_lsb_sps[i]) and a flag indicating whether the long-term reference picture may be referenced by the target picture (used_by_curr_pic_lt_sps_flag[i]).
 The POC of the reference picture above may be the POC value itself associated with the reference picture, or the LSBs (least significant bits) of the POC, that is, the remainder of the POC divided by a predefined power of two.
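For instance, taking the POC LSB as the remainder modulo a power of two (the modulus 2^8 here is only an illustrative choice, not a value fixed by the text):

```python
def poc_lsb(poc, log2_max_poc_lsb=8):
    # Remainder of the POC divided by a predefined power of two (here 2^8 = 256).
    return poc % (1 << log2_max_poc_lsb)
```

With this modulus, a picture with POC 260 would be signalled with LSB value 4.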
 (SH long-term RPS information) The SH long-term RPS information contains the long-term reference pictures usable by the pictures referring to the slice header.
 Decoding of the SH long-term RPS information is described with reference again to FIG. 17(b). Part (B) of FIG. 17(b) corresponds to the SH long-term RPS information, which is included in the slice header when the flag (long_term_ref_pics_present_flag) indicates that long-term reference pictures are used. When the SPS long-term RPS information contains one or more long-term reference pictures (num_long_term_ref_pics_sps > 0), the SH long-term RPS information includes the number of pictures, among the long-term reference pictures already decoded from the SPS, that may be referenced by the target picture (num_long_term_sps). It also includes the number of long-term reference pictures transmitted explicitly in the slice header (num_long_term_pics), and the information (lt_idx_sps[i]) selecting num_long_term_sps long-term reference pictures from those in the SPS long-term RPS information. Furthermore, as the information on the long-term reference pictures explicitly included in the slice header, it contains, for each of the num_long_term_pics pictures, the POC of the reference picture (poc_lsb_lt[i]) and a flag indicating whether it may be used as a reference picture of the target picture (used_by_curr_pic_lt_flag[i]).
 (Long-term RPS derivation process) The long-term RPS is derived from the long-term RPS information, and the subsequent-reference long-term RPS is derived as well.
Long-term RPS: contains the pictures referable from the current picture, as specified by the SPS long-term RPS information or the SH long-term RPS information.
Subsequent-reference long-term RPS: contains the reference pictures that are not referenced by the current picture but may be referenced by pictures following the current picture in decoding order.
 The long-term RPS (ListLtCurr) and the subsequent-reference long-term RPS (ListLtFoll) are derived by the following procedure. Both lists are set to empty before the procedure starts.
(S201) Based on the SPS long-term RPS information and the SH long-term RPS information, the long-term reference pictures used for decoding the target picture are identified. Specifically, num_long_term_sps reference pictures are selected from the reference pictures in the SPS long-term RPS information and added to the long-term RPS; the selected reference pictures are those indicated by lt_idx_sps[i].
(S202) Subsequently, the num_long_term_pics reference pictures in the SH long-term RPS information are added to the long-term RPS in order.
(S203) The POC of each reference picture in the long-term RPS is derived. The POC of a long-term reference picture is derived directly from the value of the associated decoded poc_lsb_lt[i] or lt_ref_pic_poc_lsb_sps[i].
(S204) The reference pictures in the long-term RPS are examined in order; when the associated used_by_curr_pic_lt_flag[i] or used_by_curr_pic_lt_sps_flag[i] is 1, the long-term reference picture is added to ListLtCurr, and otherwise (the value of used_by_curr_pic_lt_flag[i] or used_by_curr_pic_lt_sps_flag[i] is 0) it is added to the subsequent-reference long-term RPS (ListLtFoll).
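Steps S201 to S204 can be sketched as follows (illustrative Python, not the normative process; the dictionary keys mirror the syntax-element names, and the containers are hypothetical stand-ins for the decoded SPS and slice-header information):

```python
def derive_long_term_rps(sps_lt, sh_lt):
    """Derive ListLtCurr and ListLtFoll (as POC / POC-LSB lists) from the
    SPS/SH long-term RPS information (steps S201-S204)."""
    lt_curr, lt_foll = [], []
    # S201: select num_long_term_sps pictures from the SPS candidates via lt_idx_sps[i].
    for k in range(sh_lt["num_long_term_sps"]):
        idx = sh_lt["lt_idx_sps"][k]
        poc = sps_lt["lt_ref_pic_poc_lsb_sps"][idx]           # S203
        used = sps_lt["used_by_curr_pic_lt_sps_flag"][idx]    # S204
        (lt_curr if used else lt_foll).append(poc)
    # S202: then append the pictures transmitted explicitly in the slice header.
    for k in range(sh_lt["num_long_term_pics"]):
        poc = sh_lt["poc_lsb_lt"][k]                          # S203
        used = sh_lt["used_by_curr_pic_lt_flag"][k]           # S204
        (lt_curr if used else lt_foll).append(poc)
    return lt_curr, lt_foll
```

An SPS candidate whose used_by_curr_pic_lt_sps_flag is 0 thus ends up in ListLtFoll, while an explicitly transmitted picture with used_by_curr_pic_lt_flag equal to 1 ends up in ListLtCurr.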
 (Inter-layer RPS) The syntax related to the inter-layer pixel RPS and the inter-layer motion limited RPS comprises the IL-RPS information (inter-layer RPS information, inter-layer reference picture set information).
 (IL-RPS information) The IL-RPS information contains the inter-layer reference pictures that may be referenced by inter-layer prediction from the picture containing the slice header.
 The IL-RPS information is described with reference to FIG. 18(a) and FIG. 18(b).
 FIG. 18(a) shows the part of the syntax table, referred to when decoding the VPS extension (vps_extension) included in the VPS, that corresponds to the IL-RPS information. As shown in FIG. 18(a), the VPS carries the syntax elements max_one_active_ref_layer_flag, direct_dep_type_len_minus2, and direct_dependency_type[i][j], which belong to the IL-RPS information.
 The syntax element max_one_active_ref_layer_flag is a flag indicating whether the maximum number of layers referenced when decoding any picture of any layer is 1 or less. The flag is set to 1 when the maximum is 1 or less, and to 0 otherwise (when the maximum is 2 or more).
 The syntax element direct_dep_type_len_minus2 indicates the number of bits of the syntax element direct_dependency_type[i][j], which is (direct_dep_type_len_minus2 + 2).
 The syntax element direct_dependency_type[i][j] is a value indicating the types of inter-layer prediction that can be used when the layer indicated by "i" references the layer indicated by "j". In the following description, direct_dependency_type[i][j] is also called the dependency type for reference from the target layer (layer i) to the reference layer (layer j), or, omitting layer i and layer j, simply the dependency type (direct_dependency_type).
 FIG. 19(a) shows the relationship between the dependency type and the availability of each kind of inter-layer prediction when the types of inter-layer prediction are inter-layer image prediction and inter-layer motion prediction. When the dependency type is "0", the dependency is both pixel-dependent and motion-dependent. When the dependency type is "1", it is pixel-dependent but not motion-dependent. When the dependency type is "2", it is motion-dependent but not pixel-dependent.
 Here, when the dependency type indicates pixel dependency, the target layer i can use the pixels of the reference layer j for prediction, for example for inter-layer image prediction. When the dependency type indicates motion dependency, the target layer i can use the motion information (motion vectors and reference picture indices) of the reference layer j for prediction, for example for inter-layer motion prediction. Inter-layer image prediction is a process of generating a predicted image of the target picture using the pixel values of the decoded image of the reference layer. Inter-layer motion prediction is a process of generating a predicted image of the target picture using, directly or indirectly, the motion information of the reference layer (motion vectors, reference picture indices, inter prediction types).
 Therefore, a dependency type of "0" means that both the decoded pixels of the reference layer (the pixel values of its decoded image) and its motion information may be used in the decoding process of the target picture. A dependency type of "1" means that the decoded pixels of the reference layer may be used but its motion information may not. A dependency type of "2" means that the motion information of the reference layer may be used but its decoded pixels may not.
 When the dependency type indicates that the decoded pixels of the reference layer are referenced (in the definition above, dependency type "0" or "1"), the dependency type can be said to indicate pixel dependency. Conversely, when the dependency type indicates that the decoded pixels of the reference layer are not referenced (in the definition above, dependency type "2"), the dependency type can be said to indicate pixel independence.
 Similarly, when the dependency type indicates that the motion information of the reference layer is referenced (in the definition above, dependency type "0" or "2"), the dependency type can be said to indicate motion dependency. Conversely, when the dependency type indicates that the motion information of the reference layer is not referenced (in the definition above, dependency type "1"), the dependency type can be said to indicate motion independence.
 For example, the pixel dependency flag SampleEnableFlag[i][j], indicating whether the target layer i uses the decoded pixels of the reference layer j (pixel dependency), and the motion dependency flag MotionEnableFlag[i][j], indicating whether the target layer i uses the motion information of the reference layer j (motion dependency), can be derived by the following expressions.
 SampleEnableFlag[i][j] = ((3 - direct_dependency_type[i][j]) & 2) >> 1
 MotionEnableFlag[i][j] = (3 - direct_dependency_type[i][j]) & 1
 The meaning of the dependency type values is not necessarily limited to the above. For example, a dependency type of "0" may indicate pixel-dependent and motion-independent, "1" may indicate motion-dependent and pixel-independent, and "2" may indicate pixel-dependent and motion-dependent.
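As a quick check, the two expressions for the first mapping reproduce the table of FIG. 19(a) (an illustrative Python sketch, with `t` standing for direct_dependency_type[i][j]):

```python
def sample_enable_flag(t):
    # 1 when the dependency type ("0" or "1") allows referencing decoded pixels.
    return ((3 - t) & 2) >> 1

def motion_enable_flag(t):
    # 1 when the dependency type ("0" or "2") allows referencing motion information.
    return (3 - t) & 1
```

Evaluating both flags for t = 0, 1, 2 gives (1, 1), (1, 0) and (0, 1), matching the pixel/motion availability of FIG. 19(a).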
 In this example, the pixel dependency flag SampleEnableFlag[i][j] and the motion dependency flag MotionEnableFlag[i][j] are derived by the following expressions.
 SampleEnableFlag[i][j] = (direct_dependency_type[i][j] + 1) & 1
 MotionEnableFlag[i][j] = ((direct_dependency_type[i][j] + 1) & 2) >> 1
 The dependency type may also signal kinds of dependency other than pixel dependency and motion dependency, in order to indicate that the target layer references information of the reference layer other than its decoded pixels and motion information. Possible kinds of dependency include block partitioning information, transform coefficient information (such as the presence or absence of transform coefficients), and loop filter information. In this case as well, the pixel dependency flag and the motion dependency flag can be derived by expressions such as the above, and a flag indicating whether an additional dependency is present (for example, an XXX dependency flag XXXEnableFlag, where XXX is block partitioning information, transform coefficient information, loop filter information, or the like) can be derived by the following expression.
 XXXEnableFlag[i][j] = ((direct_dependency_type[i][j] + 1) & 4) >> 2
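Under this alternative mapping, (direct_dependency_type + 1) acts as a bit mask, so all three flags fall out of one value. A sketch (illustrative only; bit 2 stands for the hypothetical additional XXX dependency named above):

```python
def dependency_flags(t):
    """Read (direct_dependency_type + 1) as a bit mask:
    bit 0 = pixel, bit 1 = motion, bit 2 = additional (XXX) dependency."""
    m = t + 1
    return m & 1, (m & 2) >> 1, (m & 4) >> 2  # (Sample, Motion, XXX)EnableFlag
```

For example, t = 0 yields pixel only, t = 1 motion only, t = 2 both, and t = 6 all three, consistent with the alternative mapping and the XXXEnableFlag expression.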
 FIG. 18(b) shows the part of the syntax table, referred to when decoding a slice header, that corresponds to the IL-RPS information.
 The IL-RPS information includes an inter-layer prediction enabled flag (inter_layer_pred_enabled_flag). When the inter-layer prediction enabled flag is 1 (inter-layer prediction is enabled) and the number of reference layers referable from the target picture (NumDirectRefLayers[nuh_layer_id]) is greater than 1, the IL-RPS information further includes a syntax element expressing the number of inter-layer reference pictures (num_inter_layer_ref_pics_minus1). The number of active inter-layer reference pictures (NumActiveRefLayerPics) is set to the value "num_inter_layer_ref_pics_minus1 + 1" and corresponds to the number of inter-layer reference pictures that may be referenced by inter-layer prediction in the target picture. In addition, the IL-RPS information includes, for each inter-layer reference picture, a layer identifier (inter_layer_pred_layer_idc[i]) indicating the layer to which it belongs.
 Each of the above syntax elements in the IL-RPS information may be omitted when its value is self-evident. For example, when the number of inter-layer reference pictures referable from one picture is limited to one, the syntax element for the number of inter-layer reference pictures is unnecessary.
 (Inter-layer RPS derivation process) The inter-layer RPSs, namely the inter-layer pixel RPS and the inter-layer motion limited RPS, are derived from the IL-RPS information.
 Before describing the derivation process, the relationship between the dependency type and the inter-layer pixel RPS and inter-layer motion limited RPS derived by the RPS deriving unit 3062 of this embodiment is described with reference to FIG. 19(b), which illustrates the relationship between the sub-RPSs contained in the inter-layer RPS (the inter-layer pixel RPS and the inter-layer motion limited RPS) and the dependency type.
 As shown in FIG. 19(b), the inter-layer RPS contains two sub-RPSs, the inter-layer pixel RPS and the inter-layer motion limited RPS. In the following, an inter-layer reference picture whose layer identifier is x and whose dependency type is y is written "LID=x, DT=y". In the example of FIG. 19(b), the inter-layer pixel RPS contains three inter-layer reference pictures ("LID=0, DT=0", "LID=1, DT=1", "LID=3, DT=0"), and the inter-layer motion limited RPS contains two inter-layer reference pictures ("LID=2, DT=2", "LID=4, DT=2").
 That is, the inter-layer pixel RPS contains the inter-layer reference pictures whose dependency type is "0" or "1", while the inter-layer motion limited RPS contains the inter-layer reference pictures whose dependency type is "2". In other words, the inter-layer pixel RPS contains the inter-layer reference pictures whose decoded pixels may be referenced, whereas the inter-layer motion limited RPS contains no inter-layer reference picture whose decoded pixels may be referenced, but contains the inter-layer reference pictures whose motion information may be referenced.
 The procedure for deriving the inter-layer pixel RPS (IL-RPS0) and the inter-layer motion limited RPS (IL-RPS1) is described with reference to FIG. 20, which is a flowchart of the derivation process for the sub-RPSs contained in the inter-layer RPS (the inter-layer pixel RPS and the inter-layer motion limited RPS).
 (S301) The list IL-RPS0 representing the inter-layer pixel RPS and the list IL-RPS1 representing the inter-layer motion limited RPS are each set to empty.
 (S302) The variable i is set to "0". Proceed to S303.
 (S303) When the dependency type of the i-th active inter-layer reference picture is "0" or "1" (YES), proceed to S304; otherwise (NO), proceed to S305.
 (S304) The i-th active inter-layer reference picture is appended to the end of IL-RPS0 (the inter-layer pixel RPS). Proceed to S307.
 (S305) When the dependency type of the i-th active inter-layer reference picture is "2" (YES), proceed to S306; otherwise (NO), proceed to S307.
 (S306) The i-th active inter-layer reference picture is appended to the end of IL-RPS1 (the inter-layer motion limited RPS). Proceed to S307.
 (S307) When the value of i is smaller than the number of active inter-layer reference pictures minus 1 (NumActiveRefLayerPics − 1) (YES), proceed to S308; otherwise (NO), the process ends.
 (S308) The value of the variable i is incremented by 1. Proceed to S303.
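The loop of FIG. 20 amounts to partitioning the active inter-layer reference pictures by dependency type. A sketch (illustrative Python; each picture is modeled as a hypothetical (layer identifier, dependency type) pair, using the first mapping of the dependency type):

```python
def derive_inter_layer_sub_rps(active_ref_pics):
    """Partition the NumActiveRefLayerPics active inter-layer reference
    pictures into IL-RPS0 (inter-layer pixel RPS) and IL-RPS1
    (inter-layer motion limited RPS), following steps S301-S308."""
    il_rps0, il_rps1 = [], []                # S301
    for lid, dep in active_ref_pics:         # S302 / S307 / S308
        if dep in (0, 1):                    # S303: decoded pixels may be referenced
            il_rps0.append((lid, dep))       # S304
        elif dep == 2:                       # S305: only motion info may be referenced
            il_rps1.append((lid, dep))       # S306
    return il_rps0, il_rps1
```

With the example of FIG. 19(b), the pixel RPS receives the pictures with LID 0, 1 and 3, and the motion limited RPS those with LID 2 and 4.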
 Through the above processing, the inter-layer pixel RPS and the inter-layer motion limited RPS having the properties described with reference to FIG. 19(b) can be derived from the IL-RPS information.
 Note that the determinations in S303 and S305 can be expressed more generally as follows.
 (S303r1) When the dependency type of the i-th active inter-layer reference picture indicates that the decoded pixels of the inter-layer reference picture may be referenced (when SampleEnableFlag of the reference picture corresponding to the i-th active inter-layer reference picture is 1), proceed to S304; otherwise, proceed to S305.
 (S305r1) When the dependency type of the i-th active inter-layer reference picture indicates that the motion information of the inter-layer reference picture may be referenced (when SampleEnableFlag of the reference picture corresponding to the i-th active inter-layer reference picture is 0 and its MotionEnableFlag is 1), proceed to S306; otherwise, proceed to S307.
 The derivation method for the inter-layer pixel RPS and the inter-layer motion limited RPS described with reference to FIG. 20 is merely an example. They may be derived by a different method, as long as they satisfy the specific properties above: the inter-layer pixel RPS contains the inter-layer reference pictures whose decoded pixels may be referenced, and the inter-layer motion limited RPS contains no inter-layer reference picture whose decoded pixels may be referenced but contains the inter-layer reference pictures whose motion information may be referenced. Although the sub-RPSs derived above are the inter-layer pixel RPS and the inter-layer motion limited RPS, an inter-layer pixel RPS and an inter-layer pixel-independent RPS may be derived as the sub-RPSs instead. In that case, the determinations in S303 and S305 are replaced with the following determinations S303r2 and S305r2, respectively.
 (S303r2)i番目のアクティブレイヤ間参照ピクチャの依存タイプが、レイヤ間参照ピクチャの復号画素が参照される可能性のあることを示す場合(i番目のアクティブレイヤ間参照ピクチャに対応する参照ピクチャのSampleEnableFlagが1の場合)、S304に進む。それ以外の場合、S305に進む。 (S303r2) When the dependency type of the i-th active layer reference picture indicates that the decoded pixel of the inter-layer reference picture may be referred to (the reference picture corresponding to the i-th active layer reference picture) When SampleEnableFlag is 1, the process proceeds to S304. Otherwise, the process proceeds to S305.
 (S305r2)i番目のアクティブレイヤ間参照ピクチャの依存タイプが、レイヤ間参照ピクチャの復号画素が参照されることを示さない場合(i番目のアクティブレイヤ間参照ピクチャに対応する参照ピクチャのSampleEnableFlagが0の場合)、S306に進む。それ以外の場合、S307に進む。なお、依存タイプの種類が、画素依存と動き依存の2つのみの場合は、レイヤ間動き限定RPSとレイヤ間画素非依存RPSは等しくなる。 (S305r2) When the dependency type of the i-th active layer reference picture does not indicate that the decoded pixel of the inter-layer reference picture is referred to (SampleEnableFlag of the reference picture corresponding to the i-th active layer reference picture is 0) ), The process proceeds to S306. Otherwise, the process proceeds to S307. Note that when there are only two types of dependency types, pixel-dependent and motion-dependent, the inter-layer motion limited RPS and the inter-layer pixel-independent RPS are equal.
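The branching in S303 to S307 can be illustrated with a minimal Python sketch. The dictionary keys `sample_enable` and `motion_enable`, standing in for SampleEnableFlag and MotionEnableFlag, are hypothetical names introduced here for illustration only.

```python
def derive_interlayer_sub_rps(active_il_ref_pics):
    """Partition the active inter-layer reference pictures into the
    inter-layer pixel RPS and the inter-layer motion-limited RPS
    (sketch of S303r1-S307, under assumed field names)."""
    il_sample_rps = []  # decoded pixels may be referenced
    il_motion_rps = []  # only motion information may be referenced
    for pic in active_il_ref_pics:
        if pic["sample_enable"]:       # S303r1: SampleEnableFlag == 1
            il_sample_rps.append(pic)  # S304: add to inter-layer pixel RPS
        elif pic["motion_enable"]:     # S305r1: MotionEnableFlag == 1
            il_motion_rps.append(pic)  # S306: add to inter-layer motion-limited RPS
        # otherwise (S307): the picture joins neither sub-RPS
    return il_sample_rps, il_motion_rps
```

Replacing the `elif pic["motion_enable"]` condition with a plain `else` yields the pixel-independent variant of S305r2; with only the two dependency types above, the two variants produce the same second sub-RPS, as noted in the text.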
  [[Reference picture control unit 3063]] The reference picture control unit 3063 updates the DPB 3061 based on the input RPS. Roughly, the reference picture control unit 3063 sets the reference mark of each picture that the input RPS indicates as referenceable in inter prediction of the target picture (current picture) to "used for reference" ("used for short-term reference" or "used for long-term reference"). In addition, any decoded picture of the target layer recorded in the DPB that is not marked "used for reference" by this process is set to "unused for reference". Note that the reference picture control unit 3063 does not change the reference marks of inter-layer reference pictures in the DPB. In other words, changing the reference marks of pictures in the DPB that originate from base decoded pictures is performed not by the reference picture control unit 3063 but by the reference layer picture control unit 3064 described later.
  [[Reference layer picture control unit 3064]] The reference layer picture control unit 3064 updates the DPB 3061 based on the input decoded pictures of the reference layer and the RPS. Roughly, the reference layer picture control unit 3064 records in the DPB the decoded picture of the reference layer corresponding to each picture that the input RPS indicates as referenceable in inter-layer inter prediction of the target picture (current picture). In addition, it sets the reference mark of that picture in the DPB to "used for reference" ("used for short-term reference" or "used for long-term reference"), and sets the output mark of that picture in the DPB to "not needed for output". Note that when a decoded picture of the reference layer is recorded, it may be recorded in the picture buffer after scaling or filtering is applied as necessary. In particular, when the output pictures of the reference layer and the target layer differ in resolution (the case of spatial scalability), the decoded picture of the reference layer needs to be scaled to match the resolution of the output picture of the target layer.
  [[RPL deriving unit 3065]] The RPL deriving unit 3065 derives and outputs the reference picture list used for inter prediction or inter-layer prediction of the target slice of the target picture, based on the input RPS and the RPL information included in the input syntax values.
 The RPL information consists of syntax values decoded from the SPS or the slice header in order to construct the reference picture list RPL. The RPL information consists of SPS list modification information and SH list modification information.
 The SPS list modification information is information included in the SPS and relates to constraints on reference picture list modification. The SPS list modification information is described with reference again to FIG. 16; part (C) of FIG. 16 corresponds to the SPS list modification information. The SPS list modification information includes a flag (restricted_ref_pic_lists_flag) indicating whether the reference picture list is common to all slices included in a picture, and a flag (lists_modification_present_flag) indicating whether information on list reordering is present in the slice header.
 The SH list modification information is information included in the slice header, and contains update information for the length of the reference picture list applied to the target picture (the reference list length) and reordering information for the reference picture list (reference list reordering information). The SH list modification information is described with reference to FIG. 21(a), which illustrates part of the slice header syntax table used when decoding a slice header. Part (C) of FIG. 21(a) corresponds to the SH list modification information.
 The reference list length update information includes a flag (num_ref_idx_active_override_flag) indicating whether the list lengths are updated, information (num_ref_idx_l0_active_minus1) representing the updated reference list length of the L0 reference list, and information (num_ref_idx_l1_active_minus1) representing the updated reference list length of the L1 reference list.
 The information included in the slice header as reference list reordering information is described with reference to FIG. 21(b), which illustrates the syntax table of the reference list reordering information used when decoding a slice header.
 The reference list reordering information includes an L0 reference list reordering flag (ref_pic_list_modification_flag_l0). When the value of this flag is 1 (the L0 reference list is reordered) and NumPocTotalCurr is greater than 2, the L0 reference list reordering order (list_entry_l0[i]) is included in the reference list reordering information. Here, NumPocTotalCurr is a variable representing the number of reference pictures available for the current picture. Therefore, the L0 reference list reordering order is included in the slice header only when the L0 reference list is reordered and the number of reference pictures available for the current picture is greater than 2.
 Similarly, when the target slice is a B slice, that is, when the L1 reference list is available for the target picture, an L1 reference list reordering flag (ref_pic_list_modification_flag_l1) is included in the reference list reordering information. When the value of this flag is 1 and NumPocTotalCurr is greater than 2, the L1 reference list reordering order (list_entry_l1[i]) is included in the reference list reordering information. In other words, the L1 reference list reordering order is included in the slice header only when the L1 reference list is reordered and the number of reference pictures available for the current picture is greater than 2.
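The presence condition for the reordering order just described is the same for both lists and can be summarized in a small sketch; the function name is illustrative and not taken from the source.

```python
def list_entry_present(ref_pic_list_modification_flag, num_poc_total_curr):
    """list_entry_lX[i] is signalled in the slice header only when the
    corresponding list is reordered (flag == 1) and more than two
    reference pictures are available to the current picture."""
    return ref_pic_list_modification_flag == 1 and num_poc_total_curr > 2
```

When the flag is 1 but NumPocTotalCurr is 2 or less, no explicit reordering order is coded, since the order can be determined without it.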
   (RPL derivation process) The reference picture list derivation process (RPL derivation process) is described in detail. The reference picture list deriving unit generates the reference picture list RPL used for decoding the target picture, based on the reference picture set RPS and the RPL modification information.
 There are two reference picture lists: the L0 reference list and the L1 reference list. Before describing the derivation of each reference picture list, the characteristics of the provisional L0 reference list and the provisional L1 reference list generated in the course of the derivation process are described with reference to FIG. 22. FIG. 22 outlines the provisional L0 reference list and the provisional L1 reference list generated in the intermediate stage of deriving the L0 and L1 reference lists in the RPL deriving unit 3065.
 As shown in FIG. 22(a), the provisional L0 reference list contains the sub-RPSs in the following order from the head of the list (in descending order of priority): the forward short-term RPS (StBef in the figure), the inter-layer pixel RPS (ILSample), the backward short-term RPS (StAft), the long-term RPS (Lt), and the inter-layer motion-limited RPS (ILMotion).
 As shown in FIG. 22(b), the provisional L1 reference list contains the sub-RPSs in the following order from the head of the list (in descending order of priority): the backward short-term RPS (StAft), the forward short-term RPS (StBef), the long-term RPS (Lt), the inter-layer pixel RPS (ILSample), and the inter-layer motion-limited RPS (ILMotion).
 That is, the provisional reference lists (the provisional L0 reference list and the provisional L1 reference list) generated by the RPL deriving unit 3065 place the inter-layer pixel RPS closer to the head of the list than the inter-layer motion-limited RPS. In other words, the provisional reference lists place the inter-layer pixel RPS at a position corresponding to a higher priority than the inter-layer motion-limited RPS.
 Next, the procedures for deriving the L0 and L1 reference lists are described. The L0 reference list is constructed by the procedure shown in S401 to S409 below.
 (S401) Generate the provisional L0 reference list and initialize it to an empty list.
 (S402) Append the reference pictures included in the forward short-term RPS, in order, to the provisional L0 reference list.
 (S403) Append the reference pictures included in the inter-layer pixel RPS, in order, to the provisional L0 reference list.
 (S404) Append the reference pictures included in the backward short-term RPS, in order, to the provisional L0 reference list.
 (S405) Append the reference pictures included in the long-term RPS, in order, to the provisional L0 reference list.
 (S406) Append the reference pictures included in the inter-layer motion-limited RPS, in order, to the provisional L0 reference list.
 (S407) If the reference picture list is to be modified (the value of lists_modification_present_flag included in the RPL modification information is 1), proceed to S408 below. Otherwise (the value of lists_modification_present_flag is 0), proceed to S409.
 (S408) Reorder the elements of the provisional L0 reference list based on the values of the reference list reordering order list_entry_l0[i] to obtain the L0 reference list. The element RefPicList0[rIdx] of the L0 reference list corresponding to the reference picture index rIdx is derived by the following equation, where RefPicListTemp0[i] denotes the i-th element of the provisional L0 reference list.
 RefPicList0[ rIdx ] = RefPicListTemp0[ list_entry_l0[ rIdx ] ]
According to this equation, the value recorded at the position indicated by the reference picture index rIdx in the reference list reordering order list_entry_l0[i] is read, and the reference picture recorded at that position in the provisional L0 reference list is stored as the reference picture at position rIdx in the L0 reference list.
 (S409) Set the provisional L0 reference list as the L0 reference list.
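The steps S401 to S409 above can be sketched in Python as follows. The sub-RPSs are represented simply as lists of picture identifiers, and all names are illustrative; this is a sketch of the procedure, not a normative implementation.

```python
def build_l0_reference_list(st_bef, il_sample, st_aft, lt, il_motion,
                            lists_modification_present_flag=0,
                            list_entry_l0=None):
    """Construct the L0 reference list from the five sub-RPSs (S401-S409)."""
    # S401-S406: provisional L0 list, in priority order
    # StBef, ILSample, StAft, Lt, ILMotion (cf. FIG. 22(a))
    temp = list(st_bef) + list(il_sample) + list(st_aft) + list(lt) + list(il_motion)
    if lists_modification_present_flag:
        # S408: RefPicList0[rIdx] = RefPicListTemp0[list_entry_l0[rIdx]]
        return [temp[list_entry_l0[r_idx]] for r_idx in range(len(temp))]
    # S409: the provisional list becomes the L0 reference list as-is
    return temp
```

For example, with st_bef = ["P0"], il_sample = ["IL0"], st_aft = ["P2"] and no reordering, the resulting list is ["P0", "IL0", "P2"], so the inter-layer pixel reference picture receives reference picture index 1, ahead of any motion-limited picture. The L1 procedure (S501 to S509) is identical except for the concatenation order of the sub-RPSs.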
 Next, the construction procedure for the L1 reference list is described. The L1 reference list is constructed by the procedure described in S501 to S509 below.
 (S501) Generate the provisional L1 reference list and initialize it to an empty list.
 (S502) Append the reference pictures included in the backward short-term RPS, in order, to the provisional L1 reference list.
 (S503) Append the reference pictures included in the forward short-term RPS, in order, to the provisional L1 reference list.
 (S504) Append the reference pictures included in the long-term RPS, in order, to the provisional L1 reference list.
 (S505) Append the reference pictures included in the inter-layer pixel RPS, in order, to the provisional L1 reference list.
 (S506) Append the reference pictures included in the inter-layer motion-limited RPS, in order, to the provisional L1 reference list.
 (S507) If the reference picture list is to be modified (the value of lists_modification_present_flag included in the RPL modification information is 1), proceed to S508 below. Otherwise (the value of lists_modification_present_flag is 0), proceed to S509.
 (S508) Reorder the elements of the provisional L1 reference list based on the values of the reference list reordering order list_entry_l1[i] to obtain the L1 reference list. The element RefPicList1[rIdx] of the L1 reference list corresponding to the reference picture index rIdx is derived by the following equation, where RefPicListTemp1[i] denotes the i-th element of the provisional L1 reference list.
 RefPicList1[ rIdx ] = RefPicListTemp1[ list_entry_l1[ rIdx ] ]
According to this equation, the value recorded at the position indicated by the reference picture index rIdx in the reference list reordering order list_entry_l1[i] is read, and the reference picture recorded at that position in the provisional L1 reference list is stored as the reference picture at position rIdx in the L1 reference list.
 (S509) Set the provisional L1 reference list as the L1 reference list.
 According to the above reference picture list derivation procedure, the reference picture lists (the L0 reference list and the L1 reference list) are generated by selecting and reordering the elements of the corresponding provisional reference lists (the provisional L0 reference list and the provisional L1 reference list) based on the RPL modification information. In this procedure, the provisional reference lists place the inter-layer pixel RPS at a position corresponding to a higher priority than the inter-layer motion-limited RPS; in other words, they place the inter-layer pixel RPS closer to the head of the list than the inter-layer motion-limited RPS.
 When the RPL modification information indicates that no reordering is performed, the inter-layer reference pictures included in the inter-layer pixel RPS are associated with positions closer to the head of the reference picture list (positions corresponding to smaller reference picture indices) than the inter-layer reference pictures included in the inter-layer motion-limited RPS. Therefore, the inter-layer reference pictures included in the inter-layer pixel RPS can be designated with smaller reference picture index values than those included in the inter-layer motion-limited RPS. The inter-layer reference pictures included in the inter-layer pixel RPS are inter-layer reference pictures that may be used for inter-layer pixel prediction, whereas those included in the inter-layer motion-limited RPS cannot be used for inter-layer pixel prediction but may be used for inter-layer motion prediction. In general, the coded data contains more reference picture indices designating reference pictures used for inter-layer image prediction than reference picture indices designating reference pictures used for inter-layer motion prediction; for example, the former are included per prediction unit, whereas the latter are included per slice. Therefore, by assigning smaller reference picture indices to the inter-layer reference pictures included in the inter-layer pixel RPS than to those included in the inter-layer motion-limited RPS, so that they can be designated with a smaller amount of code, the code amount of the coded data as a whole can be reduced.
 When the RPL modification information indicates that reordering is performed, the provisional reference list is reordered into the reference list by designating positions in the provisional reference list with the reference list reordering information (list_entry_l0, list_entry_l1). In this case, the code amount of the RPL modification information can be reduced by making the reference pictures that are likely to be moved to the head of the reference picture list designatable with smaller values of the reference list reordering information. In the provisional reference lists, the inter-layer reference pictures included in the inter-layer pixel RPS are located closer to the head of the list than those included in the inter-layer motion-limited RPS, which are less likely to be moved to the head of the reference picture list, and can therefore be designated with smaller values of the reference list reordering information. Accordingly, using the provisional reference lists derived by the above procedure reduces the code amount of the RPL modification information.
 In addition, the above provisional reference lists contain the inter-layer motion-limited RPS at the end of the list, that is, at a position closer to the tail of the list than the short-term RPSs. The inter-layer reference pictures included in the inter-layer motion-limited RPS are selected less frequently than the reference pictures included in the short-term and long-term RPSs, which are the sub-RPSs related to inter prediction within the same layer; therefore, constructing the provisional reference lists as described above allows the information for selecting reference pictures to be decoded with a smaller amount of code.
 Note that when the inter-layer pixel RPS and the inter-layer pixel-independent RPS are derived as the sub-RPSs instead of the inter-layer pixel RPS and the inter-layer motion-limited RPS, the above reference picture list derivation process (RPL derivation process) is performed with the inter-layer motion-limited RPS read as the inter-layer pixel-independent RPS. More specifically, the processes S406 and S506 are replaced with the following processes.
 (S406r2) Append the reference pictures included in the inter-layer pixel-independent RPS, in order, to the provisional L0 reference list.
 (S506r2) Append the reference pictures included in the inter-layer pixel-independent RPS, in order, to the provisional L1 reference list.
 As a result, the provisional reference lists contain the inter-layer pixel RPS at a position corresponding to a higher priority than the inter-layer pixel-independent RPS; in other words, they place the inter-layer pixel RPS closer to the head of the list than the inter-layer pixel-independent RPS. The above provisional reference lists also contain the inter-layer pixel-independent RPS at the end of the list, that is, at a position closer to the tail of the list than the short-term RPSs.
 As described above, a reference picture index designating a reference picture used for inter-layer image prediction is included per prediction unit; therefore, even in the general case, more such indices are expected to be included than reference picture indices designating reference pictures used for predictions other than inter-layer image prediction (for example, inter-layer motion prediction in the motion-information-dependent case). Accordingly, by assigning smaller reference picture indices to the inter-layer reference pictures included in the inter-layer pixel RPS than to those included in the inter-layer pixel-independent RPS, so that they can be designated with a smaller amount of code, the code amount of the coded data as a whole can be reduced.
  [[Output control unit 3066]] Roughly, the output control unit 3066 outputs pictures in the DPB 3061 to the outside at predetermined timings and updates their output marks. Specifically, the picture output process by the output control unit 3066 is executed in the following procedure.
 First, among the pictures in the DPB whose output mark is "needed for output", the picture with the smallest POC is output. Next, the output mark of the output picture is set to "not needed for output". Finally, any picture in the DPB whose reference mark is "unused for reference" and whose output mark is "not needed for output" is selected and removed from the DPB.
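The three-step output process above can be sketched as follows. A DPB entry is modeled as a dictionary with `poc`, `needed_for_output`, and `used_for_reference` fields; these field names are assumptions made for the sketch, not names from the source.

```python
def output_and_clean_dpb(dpb):
    """One picture-output step of the output control unit (sketch).

    Returns the POC of the picture that was output, or None if no
    picture in the DPB is marked as needed for output."""
    out_poc = None
    candidates = [p for p in dpb if p["needed_for_output"]]
    if candidates:
        # Step 1: output the needed-for-output picture with the smallest POC
        out = min(candidates, key=lambda p: p["poc"])
        out_poc = out["poc"]
        # Step 2: mark the output picture as "not needed for output"
        out["needed_for_output"] = False
    # Step 3: remove pictures that are neither used for reference
    # nor needed for output
    dpb[:] = [p for p in dpb
              if p["used_for_reference"] or p["needed_for_output"]]
    return out_poc
```

Note that a just-output picture is removed immediately only if it is also unused for reference; otherwise it stays in the DPB until a later RPS marks it "unused for reference".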
  [[Prediction parameter memory 3067]] The prediction parameter memory 3067 stores the inter prediction parameters decoded by the inter prediction parameter decoding unit 303, the intra prediction parameters decoded by the intra prediction parameter decoding unit 304, and the prediction mode predMode separated by the entropy decoding unit 301. The inter prediction parameters stored in the prediction parameter memory 3067 include, for example, the prediction list utilization flag predFlagLX (inter prediction flag inter_pred_idc), the reference picture index refIdxLX, and the vector mvLX.
  [Inter predicted image generation unit 309] FIG. 10 is a schematic diagram showing the configuration of the inter predicted image generation unit 309 according to this embodiment. The inter predicted image generation unit 309 comprises a motion displacement compensation unit 3091, a residual prediction unit 3092, an illuminance compensation unit 3093, a weight prediction unit 3094, and a reference image determination unit 3095.
 Based on the prediction list utilization flag predFlagLX, the reference picture index refIdxLX, and the motion vector mvLX input from the inter prediction parameter decoding unit 303, the motion displacement compensation unit 3091 generates a motion displacement compensated image by reading, from the DPB 3061 of the decoded picture management unit 306, the block located at the position shifted by the vector mvLX from the position of the target block in the reference picture designated by the reference picture index refIdxLX. Here, when the vector mvLX is not an integer vector, the motion displacement compensated image is generated by applying a filter for generating pixels at fractional positions, called a motion compensation filter (or displacement compensation filter). In general, when the vector mvLX is a motion vector, the above process is called motion compensation, and when it is a displacement vector, displacement compensation; here the two are collectively referred to as motion displacement compensation. Hereinafter, the motion displacement compensated image of L0 prediction is called predSamplesL0 and that of L1 prediction predSamplesL1; when the two are not distinguished, they are called predSamplesLX. In the following, an example is described in which residual prediction and illuminance compensation are further applied to the motion displacement compensated image predSamplesLX obtained by the motion displacement compensation unit 3091; these output images are also called motion displacement compensated images predSamplesLX. In the residual prediction and illuminance compensation below, when the input image and the output image are to be distinguished, the input image is denoted predSamplesLX and the output image predSamplesLX'.
The reference image determination unit 3095 determines whether the reference picture (decoded image) in the reference layer (reference view) used for residual prediction (hereinafter referred to as the reference layer reference picture refIvRefPic for convenience of explanation) is available. When the reference layer reference picture refIvRefPic is available, the reference image determination unit 3095 sets the reference layer reference picture availability flag refIvRefPicAvailable (refIvRefPicAvailable2) to 1. On the other hand, when the reference layer reference picture refIvRefPic is not available, the reference image determination unit 3095 sets the reference layer reference picture availability flag refIvRefPicAvailable (refIvRefPicAvailable2) to 0. The reference image determination unit 3095 outputs the reference layer reference picture availability flag refIvRefPicAvailable (refIvRefPicAvailable2) set in this way to the residual prediction unit 3092.
The reference image determination unit 3095 may determine the availability of the reference picture as follows. In the following, the reference picture in the target layer is referred to as the ARP reference picture arpRefPic. The ARP reference picture arpRefPic has a picture order different from that of the target picture. It is assumed that the picture referenced by the index arpRefIdxLX of the reference picture list RefPicListX (X = 0, 1), that is, RefPicListX[ arpRefIdxLX ], is the ARP reference picture arpRefPic. The index indicating a view identifier is written as ViewIdx, and the view identifier of the reference layer (view) at the target block coordinates (xP, yP) is written as refViewIdx[ xP ][ yP ]. PicOrderCnt(X) denotes the picture order count POC of picture X. The index arpRefIdxLX may be set to 0, or may be derived by the method shown in option Y5 of the fourth embodiment described later.
(Option 1): Determine whether the reference picture exists in the RPS
 If a reference picture Pic satisfying the following (1-1) and (1-2) exists, the reference picture Pic is set as the reference layer reference picture refIvRefPic and the reference layer reference picture availability flag refIvRefPicAvailable is set to 1. Otherwise, the reference layer reference picture availability flag refIvRefPicAvailable is set to 0.
 (1-1) PicOrderCnt(Pic) is equal to PicOrderCnt(arpRefPic), and the ViewIdx of Pic is refViewIdx[ xP ][ yP ].
 (1-2) The reference picture Pic satisfying condition (1-1) exists in the RPS of a picture on the reference layer that has the same POC as the target picture and whose ViewIdx is refViewIdx[ xP ][ yP ].
Since the RPS is invariant among the slices included in a picture and exists even when the slice type is an I slice, determination by the RPS is appropriate.
(Option 2): Determine whether the reference picture exists in the RPL
 If a reference picture Pic satisfying the above (1-1) and the following (2-2) exists, the reference picture Pic is adopted as the reference layer reference picture refIvRefPic and the reference layer reference picture availability flag refIvRefPicAvailable is set to 1. Otherwise, the reference layer reference picture availability flag refIvRefPicAvailable is set to 0.
 (2-2) The reference picture Pic satisfying condition (1-1) exists in the reference picture list RPL (RefPicListX) of a picture on the reference layer that has the same POC as the target picture, whose ViewIdx is refViewIdx[ xP ][ yP ], and whose slice type (slice_type) is not an I slice.
The RPL can be derived easily in many cases. Since the length and/or contents of the RPL differ from slice to slice, RefPicListX[] of the first slice can be used for the determination as described above. However, the determination is not limited to this; RefPicListX[] of a predetermined slice among the slices included in the target picture may be used across the slices of the target picture. For example, the last slice may be used. In the above determination, "exists in the reference picture list" does not mean existing at a specific position in the list; the picture may exist at any position. When there are an L0 list RefPicList0[] and an L1 list RefPicList1[] as reference picture lists, it suffices for the picture to exist as an element of either list.
(Option 3a): Determine whether the reference picture exists in the DPB
 If a reference picture Pic satisfying the above (1-1) and the following (3-1) exists, the reference picture Pic is adopted as the reference layer reference picture refIvRefPic and the reference layer reference picture availability flag refIvRefPicAvailable is set to 1. Otherwise, the reference layer reference picture availability flag refIvRefPicAvailable is set to 0.
 (3-1) The reference picture Pic satisfying condition (1-1) exists in the decoded picture buffer (DPB).
Since the DPB 3061 stores available pictures, the DPB may be used for the determination as described above.
(Option 3b): Determine whether the reference picture exists in the DPB and, in addition, is marked as "used for reference"
 If a reference picture Pic satisfying the above (1-1) and the following (3b-2) exists, the reference picture Pic is adopted as the reference layer reference picture refIvRefPic and the reference layer reference picture availability flag refIvRefPicAvailable is set to 1. Otherwise, the reference layer reference picture availability flag refIvRefPicAvailable is set to 0.
 (3b-2) The reference picture Pic satisfying condition (1-1) exists in the decoded picture buffer (DPB) and is marked as "used for reference".
Since the operation of the DPB is decoder dependent, the storage state of pictures in the DPB may not be reliable. Specifically, the DPB discards a picture that is not marked as "needed for output" and not marked as "used for reference", but because the output timing may be decoder dependent, a picture may already have been discarded by one decoder while it has not yet been discarded by another. In this case, the determination of whether a given reference picture (used for ARP) exists in the DPB may differ between decoders. The "used for reference" mark, on the other hand, is set according to the explicitly decoded RPS and therefore does not differ between decoders. For this reason, as shown above, it is preferable to determine whether the reference mark of the reference picture is "used for reference".
In the above options 1 to 3b, the following (1-1b) may be used instead of condition (1-1).
 (1-1b) PicOrderCnt(Pic) is equal to PicOrderCnt(arpRefPic), the ViewIdx of Pic is refViewIdx[ xP ][ yP ], and the DepthFlag of Pic is 0.
According to (1-1b), when a reference picture is a depth picture (DepthFlag is 1), it is never set as the target reference picture Pic. ARP functions when a texture picture (DepthFlag = 0) is used as the reference picture and does not function when a depth picture (DepthFlag = 1) is used (residual prediction cannot generate an appropriate predicted image); therefore, the determination of (1-1b) has the effect of ensuring that residual prediction is effective. That is, when the reference picture Pic is set from the RPS (option 1), the RPL (option 2), or the DPB (options 3a and 3b), this prevents the problem that a depth picture is erroneously set and the effect of residual prediction cannot be obtained.
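Options 1 to 3b above share a common shape: search a candidate picture set (the RPS, the RPL, or the DPB, depending on the option) for a picture whose POC equals PicOrderCnt(arpRefPic), whose ViewIdx equals refViewIdx[ xP ][ yP ], and, under condition (1-1b), whose DepthFlag is 0. A minimal sketch of this search in Python (illustrative only; the Picture record and function names are assumptions, not part of the disclosed embodiments):

```python
from dataclasses import dataclass

@dataclass
class Picture:
    poc: int         # PicOrderCnt of the picture
    view_idx: int    # ViewIdx of the picture
    depth_flag: int  # DepthFlag: 1 for a depth picture, 0 for texture

def find_ref_iv_ref_pic(candidates, arp_ref_poc, ref_view_idx):
    """Search a candidate set (standing in for the RPS, RPL, or DPB) for a
    picture satisfying condition (1-1b) and return
    (refIvRefPicAvailable, refIvRefPic)."""
    for pic in candidates:
        if (pic.poc == arp_ref_poc
                and pic.view_idx == ref_view_idx
                and pic.depth_flag == 0):  # (1-1b): exclude depth pictures
            return 1, pic
    return 0, None
```

Under option 3b, the candidate set would additionally be restricted to pictures marked as "used for reference".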
The residual prediction unit 3092 performs residual prediction (ARP: Advanced Residual Prediction) on the input motion displacement compensation image predSamplesLX when the residual prediction execution flag resPredFlag is 1. When the residual prediction execution flag resPredFlag is 0, the input motion displacement compensation image predSamplesLX is output as it is.
Residual prediction is performed by adding the residual of a reference layer (first layer image) different from the target layer (second layer image) for which a predicted image is to be generated to the motion displacement compensation image predSamplesLX, which is an image (motion compensation image) predicted from a reference picture of the target layer. This is called the first residual prediction. Residual prediction can also be performed by adding the residual between the target layer (second layer image) and the reference layer (first layer image) at a time (POC) different from that of the target picture to the motion displacement compensation image predSamplesLX, which is an image (displacement compensation image) predicted from a reference picture of the reference layer. This is called the second residual prediction.
In the first residual prediction, on the assumption that a residual similar to that of the reference layer also occurs in the target layer, the residual between the already derived picture refIvRefPic of the reference layer at a time (POC) different from the target picture and the picture currIvRefPic of the reference layer at the same time (POC) as the target picture is used as an estimate of the residual in motion compensation prediction from a picture of the target layer at a time (POC) different from the target picture (for example, arpRefPic).
In the second residual prediction, the residual between the already derived reference layer picture refIvRefPic at a time (POC) different from the target picture and the target layer picture arpRefPic at that time is used as an estimate of the residual in displacement compensation prediction from the reference layer picture currIvRefPic at the same time (POC) as the target picture.
In the embodiments below, both the first residual prediction and the second residual prediction are used; however, for simplification, a configuration using only one of the first residual prediction and the second residual prediction may be employed.
FIG. 1 is a block diagram showing the configuration of the residual prediction unit 3092. The residual prediction unit 3092 includes a residual prediction execution flag derivation unit 30921, a reference image acquisition unit 30922, and a residual synthesis unit 30923.
The residual prediction execution flag derivation unit 30921 sets the residual prediction execution flag resPredFlag to 1, indicating that residual prediction is to be executed, when (1) the residual prediction flag iv_res_pred_weight_idx is not 0 and (2) the reference picture availability flag is 1. On the other hand, when the residual prediction flag iv_res_pred_weight_idx is 0, or when the reference picture availability flag is not 1 (in the case of disparity compensation), the unit sets the residual prediction execution flag resPredFlag to 0.
That is, the residual prediction execution flag derivation unit 30921 may derive the residual prediction execution flag resPredFlag by the following conditional expression (R-1).

 resPredFlag = ( iv_res_pred_weight_idx != 0 ) && refIvRefPicAvailable ・・・(R-1)

Here, PicOrderCntVal is the picture order count POC of the target picture.
In the case of limiting to motion prediction, the residual prediction execution flag derivation unit 30921 may, in addition to (1) and (2) above, (3) determine whether the target block uses motion compensation ( PicOrderCnt( RefPicListX[ refIdxLX ] ) != PicOrderCntVal ). That is, the residual prediction execution flag resPredFlag may be derived by the following conditional expression (R-2).

 resPredFlag = ( iv_res_pred_weight_idx != 0 ) && ( PicOrderCnt( RefPicListX[ refIdxLX ] ) != PicOrderCntVal ) && refIvRefPicAvailable ・・・(R-2)

When the residual prediction execution flag resPredFlag is 1, the reference image acquisition unit 30922 reads the motion vector mvLX and the residual prediction displacement vector mvDisp input from the inter prediction parameter decoding unit 303, as well as the corresponding block currIvSamplesLX and the reference block refIvSamplesLX of the corresponding block stored in the decoded picture management unit 306.
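Conditional expressions (R-1) and (R-2) can be written directly as boolean expressions. A sketch in Python (illustrative; the argument names are assumptions standing in for the syntax elements and derived variables of the text):

```python
def res_pred_flag_r1(iv_res_pred_weight_idx, ref_iv_ref_pic_available):
    # (R-1): ARP runs when the weight index is non-zero and the
    # reference layer reference picture is available.
    return int(iv_res_pred_weight_idx != 0 and bool(ref_iv_ref_pic_available))

def res_pred_flag_r2(iv_res_pred_weight_idx, ref_pic_poc,
                     pic_order_cnt_val, ref_iv_ref_pic_available):
    # (R-2): additionally require motion compensation, i.e. the POC of
    # RefPicListX[refIdxLX] differs from PicOrderCntVal.
    return int(iv_res_pred_weight_idx != 0
               and ref_pic_poc != pic_order_cnt_val
               and bool(ref_iv_ref_pic_available))
```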
FIG. 11(a) is a diagram for explaining the corresponding block currIvSamplesLX. As shown in FIG. 11(a), the corresponding block Cor1 corresponding to the target block Tar1 on the target layer is located at the position shifted by the displacement vector mvDisp, which is a vector indicating the positional relationship between the reference layer and the target layer, starting from the position Tar1′ of the target block in the image on the reference layer.
Specifically, the reference image acquisition unit 30922 derives the pixel at the position obtained by shifting the pixel coordinates (x, y) of the target block by the displacement vector mvDisp of the target block. Taking into account that the displacement vector mvDisp has quarter-pel fractional precision, the reference image acquisition unit 30922 derives, for a target block pixel at coordinates (xP, yP), the X coordinate xInt and Y coordinate yInt of the corresponding integer-precision pixel R0 and the fractional parts xFrac and yFrac of the X and Y components of the displacement vector by the following expressions:

 xInt = xPb + ( mvLX[ 0 ] >> 2 )
 yInt = yPb + ( mvLX[ 1 ] >> 2 )
 xFrac = mvLX[ 0 ] & 3
 yFrac = mvLX[ 1 ] & 3

Here, X & 3 is an expression that extracts only the lower 2 bits of X.
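The split of a quarter-pel vector into an integer-pel position and a fractional part, following the expressions above, can be sketched as follows (Python, illustrative; `>>` is an arithmetic shift, matching the floor behavior assumed for negative components):

```python
def split_quarter_pel(x_pb, y_pb, mv):
    """Split a quarter-pel vector mv = (mv0, mv1) into the integer-pel
    position (xInt, yInt) relative to (xPb, yPb) and the fractional
    parts (xFrac, yFrac); `& 3` extracts the lower 2 bits."""
    x_int = x_pb + (mv[0] >> 2)
    y_int = y_pb + (mv[1] >> 2)
    x_frac = mv[0] & 3
    y_frac = mv[1] & 3
    return x_int, y_int, x_frac, y_frac
```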
Next, the reference image acquisition unit 30922 generates the interpolated pixel predPartLX[ x ][ y ], taking into account that the displacement vector mvDisp has quarter-pel fractional precision. First, the coordinates of the integer pixels A (xA, yA), B (xB, yB), C (xC, yC), and D (xD, yD) are derived by the following expressions:

 xA = Clip3( 0, picWidthInSamples - 1, xInt )
 xB = Clip3( 0, picWidthInSamples - 1, xInt + 1 )
 xC = Clip3( 0, picWidthInSamples - 1, xInt )
 xD = Clip3( 0, picWidthInSamples - 1, xInt + 1 )
 yA = Clip3( 0, picHeightInSamples - 1, yInt )
 yB = Clip3( 0, picHeightInSamples - 1, yInt )
 yC = Clip3( 0, picHeightInSamples - 1, yInt + 1 )
 yD = Clip3( 0, picHeightInSamples - 1, yInt + 1 )

Here, integer pixel A is the pixel corresponding to pixel R0, and integer pixels B, C, and D are the integer-precision pixels adjacent to integer pixel A on the right, below, and below right, respectively. Clip3( x, y, z ) is a function that limits (clips) z to be greater than or equal to x and less than or equal to y. The reference image acquisition unit 30922 reads the reference pixels refPicLX[ xA ][ yA ], refPicLX[ xB ][ yB ], refPicLX[ xC ][ yC ], and refPicLX[ xD ][ yD ] corresponding to the integer pixels A, B, C, and D from the DPB 3061 of the decoded picture management unit 306.
Then, using the reference pixels refPicLX[ xA ][ yA ], refPicLX[ xB ][ yB ], refPicLX[ xC ][ yC ], and refPicLX[ xD ][ yD ] and the fractional parts xFrac and yFrac of the X and Y components of the displacement vector mvDisp, the reference image acquisition unit 30922 derives the interpolated pixel predPartLX[ x ][ y ], which is the pixel at the position shifted from pixel R0 by the fractional part of the displacement vector mvDisp. Specifically, it is derived by the following expression:

 predPartLX[ x ][ y ] = ( refPicLX[ xA ][ yA ] * ( 8 - xFrac ) * ( 8 - yFrac )
            + refPicLX[ xB ][ yB ] * ( 8 - yFrac ) * xFrac
            + refPicLX[ xC ][ yC ] * ( 8 - xFrac ) * yFrac
            + refPicLX[ xD ][ yD ] * xFrac * yFrac ) >> 6
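The neighbor clipping and the weighted combination above amount to a bilinear interpolation whose four weights sum to 64 (hence the final `>> 6`). A sketch in Python mirroring the expressions of the text (illustrative; `ref` is assumed to be a row-major list of rows rather than the DPB access of the embodiments):

```python
def clip3(lo, hi, v):
    # Clip3(x, y, z): limit v to the range [lo, hi].
    return max(lo, min(hi, v))

def interp_pixel(ref, width, height, x_int, y_int, x_frac, y_frac):
    """Bilinear interpolation at fractional offset (xFrac, yFrac) from the
    integer position (xInt, yInt). A = (xa, ya), B = (xb, ya),
    C = (xa, yc), D = (xb, yc), with coordinates clipped to the picture
    (xB = xD and yC = yD in the clipping expressions above)."""
    xa = clip3(0, width - 1, x_int)
    xb = clip3(0, width - 1, x_int + 1)
    ya = clip3(0, height - 1, y_int)
    yc = clip3(0, height - 1, y_int + 1)
    return (ref[ya][xa] * (8 - x_frac) * (8 - y_frac)
            + ref[ya][xb] * (8 - y_frac) * x_frac
            + ref[yc][xa] * (8 - x_frac) * y_frac
            + ref[yc][xb] * x_frac * y_frac) >> 6
```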
The reference image acquisition unit 30922 performs the above interpolated pixel derivation process for each pixel in the target block, and takes the set of interpolated pixels as the interpolation block predPartLX. The reference image acquisition unit 30922 outputs the derived interpolation block predPartLX to the residual synthesis unit 30923 as the corresponding block currIvSamplesLX.
FIG. 11(b) is a diagram for explaining the reference block refIvSamplesLX. As shown in FIG. 11(b), the reference block corresponding to the corresponding block on the reference layer is located at the position shifted by the motion vector mvLX of the target block, starting from the position of the corresponding block in the reference image on the reference layer.
The reference image acquisition unit 30922 derives the reference block refIvSamplesLX by performing the same processing as the derivation of the corresponding block currIvSamplesLX, except that the displacement vector mvDisp is replaced by the vector ( mvDisp[ 0 ] + mvLX[ 0 ], mvDisp[ 1 ] + mvLX[ 1 ] ). The reference image acquisition unit 30922 outputs the reference block refIvSamplesLX to the residual synthesis unit 30923.
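The only change relative to the currIvSamplesLX derivation is the vector used as the shift; a trivial sketch of the composition (Python, illustrative):

```python
def composed_vector(mv_disp, mv_lx):
    # refIvSamplesLX is fetched at mvDisp + mvLX (component-wise,
    # quarter-pel units) instead of at mvDisp.
    return (mv_disp[0] + mv_lx[0], mv_disp[1] + mv_lx[1])
```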
When the residual prediction execution flag resPredFlag is 1, the residual synthesis unit 30923 derives the corrected motion displacement compensation image predSamplesLX′ from the motion displacement compensation image predSamplesLX, the corresponding block currIvSamplesLX, the reference block refIvSamplesLX, and the residual prediction flag iv_res_pred_weight_idx. The corrected motion displacement compensation image predSamplesLX′ is obtained by the following expression:

 predSamplesLX′ = predSamplesLX + ( ( currIvSamplesLX - refIvSamplesLX ) >> ( iv_res_pred_weight_idx - 1 ) )

When the residual prediction execution flag resPredFlag is 0, the residual synthesis unit 30923 outputs the motion displacement compensation image predSamplesLX as it is.
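The synthesis expression above, applied per pixel, can be sketched as follows (Python, illustrative; blocks are lists of rows). With iv_res_pred_weight_idx equal to 1 the full inter-layer residual is added; with 2 it is halved:

```python
def residual_synthesis(pred, curr_iv, ref_iv, iv_res_pred_weight_idx):
    """predSamplesLX' = predSamplesLX +
    ((currIvSamplesLX - refIvSamplesLX) >> (iv_res_pred_weight_idx - 1)),
    applied element-wise to same-sized blocks."""
    shift = iv_res_pred_weight_idx - 1
    return [[p + ((c - r) >> shift) for p, c, r in zip(pr, cr, rr)]
            for pr, cr, rr in zip(pred, curr_iv, ref_iv)]
```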
When the illuminance compensation flag ic_enable_flag is 1, the illuminance compensation unit 3093 performs illuminance compensation on the input motion displacement compensation image predSamplesLX. When the illuminance compensation flag ic_enable_flag is 0, the input motion displacement compensation image predSamplesLX is output as it is. The motion displacement compensation image predSamplesLX input to the illuminance compensation unit 3093 is the output image of the motion displacement compensation unit 3091 when residual prediction is off, and the output image of the residual prediction unit 3092 when residual prediction is on.
The weight prediction unit 3094 generates the predicted picture block P (predicted image) by multiplying the input motion displacement image predSamplesLX by weight coefficients. When residual prediction and illuminance compensation are performed, the input motion displacement image predSamplesLX is the image to which they have been applied. When one of the reference list use flags (predFlagL0 or predFlagL1) is 1 (uni-prediction) and weight prediction is not used, the following processing, which matches the input motion displacement image predSamplesLX (LX is L0 or L1) to the pixel bit depth, is performed:

predSamples[ x ][ y ] = Clip3( 0, ( 1 << bitDepth ) - 1, ( predSamplesLX[ x ][ y ] + offset1 ) >> shift1 )

Here, shift1 = 14 - bitDepth and offset1 = 1 << ( shift1 - 1 ).
When both reference list use flags (predFlagL0 and predFlagL1) are 1 (bi-prediction) and weight prediction is not used, the following processing, which averages the input motion displacement images predSamplesL0 and predSamplesL1 and matches the result to the pixel bit depth, is performed:

predSamples[ x ][ y ] = Clip3( 0, ( 1 << bitDepth ) - 1, ( predSamplesL0[ x ][ y ] + predSamplesL1[ x ][ y ] + offset2 ) >> shift2 )

Here, shift2 = 15 - bitDepth and offset2 = 1 << ( shift2 - 1 ).
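The two default (non-weighted) cases above reduce the 14-bit intermediate samples to the output bit depth with round-to-nearest. A per-sample sketch in Python (illustrative):

```python
def clip3(lo, hi, v):
    return max(lo, min(hi, v))

def default_uni_pred(sample_lx, bit_depth):
    # Uni-prediction without weighting: shift1 = 14 - bitDepth,
    # offset1 = 1 << (shift1 - 1).
    shift1 = 14 - bit_depth
    offset1 = 1 << (shift1 - 1)
    return clip3(0, (1 << bit_depth) - 1, (sample_lx + offset1) >> shift1)

def default_bi_pred(sample_l0, sample_l1, bit_depth):
    # Bi-prediction without weighting: average the L0 and L1 intermediates
    # with shift2 = 15 - bitDepth, offset2 = 1 << (shift2 - 1).
    shift2 = 15 - bit_depth
    offset2 = 1 << (shift2 - 1)
    return clip3(0, (1 << bit_depth) - 1,
                 (sample_l0 + sample_l1 + offset2) >> shift2)
```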
Furthermore, in the case of uni-prediction, when weight prediction is performed, the weight prediction unit 3094 derives the weight prediction coefficient w0 and the offset o0, and performs the processing of the following expression:

predSamples[ x ][ y ] = Clip3( 0, ( 1 << bitDepth ) - 1, ( ( predSamplesLX[ x ][ y ] * w0 + ( 1 << ( log2WD - 1 ) ) ) >> log2WD ) + o0 )

Here, log2WD is a variable indicating a predetermined shift amount.
Furthermore, in the case of bi-prediction, when weight prediction is performed, the weight prediction unit 3094 derives the weight prediction coefficients w0, w1, o0, and o1, and performs the processing of the following expression:

predSamples[ x ][ y ] = Clip3( 0, ( 1 << bitDepth ) - 1, ( predSamplesL0[ x ][ y ] * w0 + predSamplesL1[ x ][ y ] * w1 + ( ( o0 + o1 + 1 ) << log2WD ) ) >> ( log2WD + 1 ) )
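The explicit weighted cases (the uni-prediction expression given earlier and the bi-prediction expression here) can be sketched per sample as follows (Python, illustrative; the rounding term 2^(log2WD - 1) is written as a shift):

```python
def clip3(lo, hi, v):
    return max(lo, min(hi, v))

def weighted_uni_pred(sample_lx, w0, o0, log2_wd, bit_depth):
    # ((s * w0 + 2^(log2WD - 1)) >> log2WD) + o0, clipped to bit depth.
    rounding = 1 << (log2_wd - 1)
    return clip3(0, (1 << bit_depth) - 1,
                 ((sample_lx * w0 + rounding) >> log2_wd) + o0)

def weighted_bi_pred(s0, s1, w0, w1, o0, o1, log2_wd, bit_depth):
    # (s0*w0 + s1*w1 + ((o0 + o1 + 1) << log2WD)) >> (log2WD + 1),
    # clipped to bit depth.
    return clip3(0, (1 << bit_depth) - 1,
                 (s0 * w0 + s1 * w1 + ((o0 + o1 + 1) << log2_wd))
                 >> (log2_wd + 1))
```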
 (Modification 1) In the above (reference image determination), determining the availability of the reference layer reference picture refIvRefPic using a motion compensation level condition (resPredFlag) was described. However, the determination is not limited to this; a condition at the parsing stage may be used.
That is, the reference image determination unit 3095 may determine the availability of the ARP reference picture arpRefPic at the parsing stage and supply the determination result arpRefPicAvailable to the inter prediction parameter decoding control unit 3031. Hereinafter, the flag indicating the availability of the ARP reference picture arpRefPic at the parsing stage is distinguished as arpRefPicAvailable, and the flag indicating its availability at the motion compensation stage as refIvRefPicAvailable.
A more specific description with reference to FIG. 14 is as follows. As shown in SYN11 and SYN12 of FIG. 14, when deriving the CU-level residual prediction flag (ARP flag) iv_res_pred_weight_idx, the reference image determination unit 3095 may determine the availability of the ARP reference picture arpRefPic.
Then, in CU-level parameter decoding, when the layer is one for which inter-view residual prediction is available (iv_res_pred_flag[ nuh_layer_id ] is not 0) and the ARP reference picture arpRefPic is available (arpRefPicAvailable is not 0), the inter prediction parameter decoding control unit 3031 may decode iv_res_pred_weight_idx.
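The parse-stage gating of Modification 1 can be sketched as follows (Python, illustrative; `read_idx` stands in for the entropy decoder, and inferring 0 when the syntax element is not decoded is an assumption consistent with resPredFlag being 0 in that case):

```python
def decode_arp_weight_idx(iv_res_pred_flag_layer, arp_ref_pic_available,
                          read_idx):
    """Decode iv_res_pred_weight_idx only when inter-view residual
    prediction is enabled for the layer and the ARP reference picture is
    available at the parsing stage; otherwise infer 0."""
    if iv_res_pred_flag_layer != 0 and arp_ref_pic_available != 0:
        return read_idx()
    return 0
```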
This makes it possible to prevent a situation in which the reference picture is unavailable from arising in the residual prediction process.
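Although the embodiment does not specify an implementation, the parse-stage gate described above can be sketched as follows (Python; the function and argument names are illustrative and not part of the embodiment):

```python
def should_decode_arp_weight_idx(iv_res_pred_flag, nuh_layer_id,
                                 arp_ref_pic_available):
    # iv_res_pred_weight_idx is decoded only when the layer allows
    # inter-view residual prediction (iv_res_pred_flag[nuh_layer_id] != 0)
    # AND the ARP reference picture was judged available at the parsing
    # stage (arpRefPicAvailable != 0).
    return iv_res_pred_flag[nuh_layer_id] != 0 and arp_ref_pic_available != 0
```

When the function returns False, the decoder skips the syntax element and residual prediction is not applied, which is exactly how the unavailable-reference-picture situation is avoided.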
Next, the configuration of the image encoding device 11 according to the present embodiment will be described. FIG. 12 is a block diagram illustrating the configuration of the image encoding device 11 according to the present embodiment. The image encoding device 11 includes a predicted image generation unit 101, a subtraction unit 102, a DCT/quantization unit 103, an entropy encoding unit 104, an inverse quantization/inverse DCT unit 105, an addition unit 106, a decoded picture management unit (reference image storage unit, frame memory) 109, an encoding parameter determination unit 110, a prediction parameter encoding unit 111, and a residual storage unit 313 (residual recording unit). The prediction parameter encoding unit 111 includes an inter prediction parameter encoding unit 112 and an intra prediction parameter encoding unit 113.
The predicted image generation unit 101 generates, for each picture of each viewpoint of the layer image T input from the outside, a predicted picture block P for each block, a block being a region obtained by dividing that picture. Here, the predicted image generation unit 101 reads a reference picture block from the decoded picture management unit 109 based on the prediction parameter input from the prediction parameter encoding unit 111. The prediction parameter input from the prediction parameter encoding unit 111 is, for example, a motion vector or a displacement vector. The predicted image generation unit 101 reads the reference picture block located at the position indicated by the motion vector or displacement vector predicted with the encoding target block as the starting point. The predicted image generation unit 101 generates the predicted picture block P from the read reference picture block using one of a plurality of prediction methods, and outputs the generated predicted picture block P to the subtraction unit 102. Since the predicted image generation unit 101 performs the same operation as the predicted image generation unit 308 already described, details of the generation of the predicted picture block P are omitted.
To select a prediction method, the predicted image generation unit 101 selects, for example, the prediction method that minimizes an error value based on the difference between the signal value of each pixel of the block included in the layer image and the signal value of the corresponding pixel of the predicted picture block P. The method of selecting the prediction method is not limited to this.
When the picture to be encoded is a base view picture, the plurality of prediction methods are intra prediction, motion prediction, and merge prediction. Motion prediction is, among the inter predictions described above, prediction between display times. Merge prediction is prediction that uses the same reference picture block and prediction parameters as an already-encoded block within a predetermined range from the encoding target block. When the picture to be encoded is a non-base view picture, the plurality of prediction methods are intra prediction, motion prediction, merge prediction, and displacement prediction. Displacement prediction (disparity prediction) is, among the inter predictions described above, prediction between images of different layers (different viewpoints). Furthermore, for motion prediction, merge prediction, and displacement prediction (disparity prediction), there are predictions in which additional prediction (residual prediction and illumination compensation) is performed and predictions in which it is not.
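The mapping from picture type to the set of candidate prediction methods described in the preceding paragraph can be summarized in a minimal sketch (Python; the function name and string labels are illustrative only):

```python
def available_prediction_methods(is_base_view):
    # Base view pictures: intra, motion, and merge prediction.
    methods = ["intra", "motion", "merge"]
    # Non-base view pictures additionally allow displacement
    # (disparity) prediction between different viewpoint images.
    if not is_base_view:
        methods.append("displacement")
    return methods
```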
When intra prediction is selected, the predicted image generation unit 101 outputs the prediction mode predMode indicating the intra prediction mode used to generate the predicted picture block P to the prediction parameter encoding unit 111.
When motion prediction is selected, the predicted image generation unit 101 stores the motion vector mvLX used to generate the predicted picture block P in the decoded picture management unit 109 and outputs it to the inter prediction parameter encoding unit 112. The motion vector mvLX indicates the vector from the position of the encoding target block to the position of the reference picture block used to generate the predicted picture block P. The information indicating the motion vector mvLX may include information indicating the reference picture (for example, the reference picture index refIdxLX and the picture order count POC) and may represent prediction parameters. The predicted image generation unit 101 also outputs the prediction mode predMode indicating the inter prediction mode to the prediction parameter encoding unit 111.
When displacement prediction is selected, the predicted image generation unit 101 stores the displacement vector used to generate the predicted picture block P in the decoded picture management unit 109 and outputs it to the inter prediction parameter encoding unit 112. The displacement vector dvLX indicates the vector from the position of the encoding target block to the position of the reference picture block used to generate the predicted picture block P. The information indicating the displacement vector dvLX may include information indicating the reference picture (for example, the reference picture index refIdxLX and the view ID view_id) and may represent prediction parameters. The predicted image generation unit 101 also outputs the prediction mode predMode indicating the inter prediction mode to the prediction parameter encoding unit 111.
When merge prediction is selected, the predicted image generation unit 101 outputs the merge index merge_idx indicating the selected reference picture block to the inter prediction parameter encoding unit 112. The predicted image generation unit 101 also outputs the prediction mode predMode indicating the merge prediction mode to the prediction parameter encoding unit 111.
In the motion prediction, displacement prediction, and merge prediction described above, when the residual prediction execution flag resPredFlag indicates that residual prediction is to be performed, the predicted image generation unit 101 performs residual prediction in the residual prediction unit 3092 included in the predicted image generation unit 101, as already described.
The subtraction unit 102 subtracts, pixel by pixel, the signal value of the predicted picture block P input from the predicted image generation unit 101 from the signal value of the corresponding block of the layer image T input from the outside, to generate a residual signal. The subtraction unit 102 outputs the generated residual signal to the DCT/quantization unit 103 and the encoding parameter determination unit 110.
The DCT/quantization unit 103 performs a DCT on the residual signal input from the subtraction unit 102 to calculate DCT coefficients. The DCT/quantization unit 103 quantizes the calculated DCT coefficients to obtain quantized coefficients, and outputs the obtained quantized coefficients to the entropy encoding unit 104 and the inverse quantization/inverse DCT unit 105.
The entropy encoding unit 104 receives the quantized coefficients from the DCT/quantization unit 103 and the encoding parameters from the encoding parameter determination unit 110. The input encoding parameters include, for example, codes such as the reference picture index refIdxLX, the vector index mvp_LX_idx, the difference vector mvdLX, the prediction mode predMode, the residual prediction flag iv_res_pred_weight_idx, and the merge index merge_idx.
The entropy encoding unit 104 entropy-encodes the input quantized coefficients and encoding parameters to generate an encoded stream Te, and outputs the generated encoded stream Te to the outside.
The inverse quantization/inverse DCT unit 105 inversely quantizes the quantized coefficients input from the DCT/quantization unit 103 to obtain DCT coefficients, performs an inverse DCT on the obtained DCT coefficients to calculate a decoded residual signal, and outputs the calculated decoded residual signal to the addition unit 106.
The addition unit 106 adds, pixel by pixel, the signal value of the predicted picture block P input from the predicted image generation unit 101 and the signal value of the decoded residual signal input from the inverse quantization/inverse DCT unit 105 to generate a reference picture block. The addition unit 106 stores the generated reference picture block in the decoded picture management unit 109.
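The local decoding loop formed by units 102, 103, 105, and 106 can be illustrated with the following simplified sketch (Python). For brevity the DCT and inverse DCT are omitted and only uniform scalar quantization is shown, so this is not a faithful model of units 103 and 105, merely an illustration of the data flow:

```python
def quantize(coeffs, qstep):
    # Stand-in for the quantization half of unit 103 (the DCT is omitted).
    return [round(c / qstep) for c in coeffs]

def dequantize(levels, qstep):
    # Stand-in for unit 105 (the inverse DCT is omitted).
    return [lv * qstep for lv in levels]

def local_decode(src_block, pred_block, qstep):
    # Subtraction unit 102: residual = source - prediction, per pixel.
    residual = [s - p for s, p in zip(src_block, pred_block)]
    levels = quantize(residual, qstep)
    decoded_residual = dequantize(levels, qstep)
    # Addition unit 106: reconstruction = prediction + decoded residual;
    # the reconstructed block is stored as a reference picture block.
    recon = [p + r for p, r in zip(pred_block, decoded_residual)]
    return levels, recon
```

The reconstructed block matches what a decoder would produce, which is why it can safely be stored in the decoded picture management unit 109 as a reference.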
Like the decoded picture management unit 306 of the image decoding device 31, the decoded picture management unit 109 has a prediction parameter memory (not shown), in which the prediction parameters generated by the prediction parameter encoding unit 111 are stored at predetermined positions for each picture and block to be encoded.
Also like the decoded picture management unit 306 of the image decoding device 31, the decoded picture management unit 109 has a DPB (not shown), in which the reference picture blocks generated by the addition unit 106 are stored at predetermined positions for each picture and block to be encoded.
The details of the decoded picture management unit 109 are the same as those described for the decoded picture management unit 306 of the image decoding device 31, and their description is therefore omitted here.
The encoding parameter determination unit 110 selects one set from among a plurality of sets of encoding parameters. The encoding parameters are the prediction parameters described above and the parameters to be encoded that are generated in association with these prediction parameters. The predicted image generation unit 101 generates the predicted picture block P using each of these sets of encoding parameters.
The encoding parameter determination unit 110 calculates, for each of the plurality of sets, a cost value indicating the amount of information and the encoding error. The cost value is, for example, the sum of the code amount and the value obtained by multiplying the squared error by a coefficient λ. The code amount is the amount of information of the encoded stream Te obtained by entropy-encoding the quantization error and the encoding parameters. The squared error is the sum over pixels of the squared residual values of the residual signal calculated by the subtraction unit 102. The coefficient λ is a preset real number greater than zero. The encoding parameter determination unit 110 selects the set of encoding parameters that minimizes the calculated cost value. As a result, the entropy encoding unit 104 outputs the selected set of encoding parameters to the outside as the encoded stream Te, and does not output the unselected sets of encoding parameters.
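The cost computation above can be sketched as follows (Python; the data layout of a candidate set is an assumption made for illustration, as the embodiment only defines the cost formula itself):

```python
def rd_cost(code_amount, squared_error, lam):
    # Cost value = code amount + lambda * squared error, with lam > 0.
    return code_amount + lam * squared_error

def select_parameter_set(candidates, lam):
    # candidates: list of (parameter_set, code_amount, squared_error)
    # tuples; the set with the minimum cost value is selected.
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))[0]
```

Note that the choice of λ shifts the trade-off: a larger λ penalizes distortion more heavily, while a smaller λ favors sets with a lower code amount.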
The prediction parameter encoding unit 111 derives the prediction parameters used to generate a predicted picture based on the parameters input from the predicted image generation unit 101, and encodes the derived prediction parameters to generate a set of encoding parameters. The prediction parameter encoding unit 111 outputs the generated set of encoding parameters to the entropy encoding unit 104.
The prediction parameter encoding unit 111 stores, in the decoded picture management unit 109, the prediction parameters corresponding to the set selected by the encoding parameter determination unit 110 from among the generated sets of encoding parameters.
The prediction parameter encoding unit 111 operates the inter prediction parameter encoding unit 112 when the prediction mode predMode input from the predicted image generation unit 101 indicates an inter prediction mode, and operates the intra prediction parameter encoding unit 113 when the prediction mode predMode indicates an intra prediction mode.
The inter prediction parameter encoding unit 112 derives inter prediction parameters based on the prediction parameters input from the encoding parameter determination unit 110. As a configuration for deriving the inter prediction parameters, the inter prediction parameter encoding unit 112 includes the same configuration as that with which the inter prediction parameter decoding unit 303 (see FIG. 6 and elsewhere) derives inter prediction parameters. The configuration of the inter prediction parameter encoding unit 112 will be described later.
The intra prediction parameter encoding unit 113 determines the intra prediction mode IntraPredMode indicated by the prediction mode predMode input from the encoding parameter determination unit 110 as the set of intra prediction parameters.
Next, the configuration of the inter prediction parameter encoding unit 112 will be described. The inter prediction parameter encoding unit 112 is a means corresponding to the inter prediction parameter decoding unit 303.
FIG. 13 is a schematic diagram illustrating the configuration of the inter prediction parameter encoding unit 112 according to the present embodiment.
The inter prediction parameter encoding unit 112 includes an inter prediction parameter encoding control unit 1031, a merge prediction parameter derivation unit 1121, an AMVP prediction parameter derivation unit 1122, a subtraction unit 1123, and a prediction parameter integration unit 1126.
The merge prediction parameter derivation unit 1121 has the same configuration as the merge prediction parameter derivation unit 3036 described above (see FIG. 7).
The inter prediction parameter encoding control unit 1031 instructs the entropy encoding unit 104 to encode the codes (syntax elements) related to inter prediction, and encodes the codes (syntax elements) included in the encoded data, for example, the partition mode part_mode, the merge flag merge_flag, the merge index merge_idx, the inter prediction flag inter_pred_idc, the reference picture index refIdxLX, the prediction vector index mvp_LX_idx, and the difference vector mvdLX.
The inter prediction parameter encoding control unit 1031 includes an additional prediction flag encoding unit 10311, a merge index encoding unit 10312, a vector candidate index encoding unit 10313, and, not shown, a partition mode encoding unit, a merge flag encoding unit, an inter prediction flag encoding unit, a reference picture index encoding unit, and a vector difference encoding unit. The partition mode encoding unit, merge flag encoding unit, merge index encoding unit, inter prediction flag encoding unit, reference picture index encoding unit, vector candidate index encoding unit 10313, and vector difference encoding unit encode the partition mode part_mode, merge flag merge_flag, merge index merge_idx, inter prediction flag inter_pred_idc, reference picture index refIdxLX, prediction vector index mvp_LX_idx, and difference vector mvdLX, respectively.
The additional prediction flag encoding unit 10311 encodes the illumination compensation flag ic_enable_flag and the residual prediction flag iv_res_pred_weight_idx to indicate whether additional prediction is performed.
When the prediction mode predMode input from the predicted image generation unit 101 indicates the merge prediction mode, the merge index merge_idx is input from the encoding parameter determination unit 110 to the merge prediction parameter derivation unit 1121. The merge index merge_idx is output to the prediction parameter integration unit 1126. The merge prediction parameter derivation unit 1121 reads, from the decoded picture management unit 109, the reference picture index refIdxLX and vector mvLX of the reference block indicated by the merge index merge_idx among the merge candidates. A merge candidate is a reference block within a predetermined range from the encoding target block (for example, among the reference blocks adjoining the lower-left, upper-left, and upper-right corners of the encoding target block) for which the encoding process has been completed.
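The role of merge_idx as a selector over an ordered candidate list can be shown with a minimal sketch (Python; the candidate data layout is an assumption for illustration only):

```python
def merge_candidate_params(candidates, merge_idx):
    # candidates: ordered list of (refIdxLX, mvLX) pairs taken from
    # already-encoded reference blocks within the predetermined range
    # (e.g. blocks adjoining the lower-left, upper-left and upper-right
    # corners of the target block).  merge_idx selects one entry, so the
    # target block inherits that block's prediction parameters directly.
    ref_idx, mv = candidates[merge_idx]
    return ref_idx, mv
```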
The AMVP prediction parameter derivation unit 1122 has the same configuration as the AMVP prediction parameter derivation unit 3032 described above (see FIG. 7).
When the prediction mode predMode input from the predicted image generation unit 101 indicates an inter prediction mode, the vector mvLX is input from the encoding parameter determination unit 110 to the AMVP prediction parameter derivation unit 1122. The AMVP prediction parameter derivation unit 1122 derives the prediction vector mvpLX based on the input vector mvLX and outputs the derived prediction vector mvpLX to the subtraction unit 1123. The reference picture index refIdx and the vector index mvp_LX_idx are output to the prediction parameter integration unit 1126.
The subtraction unit 1123 subtracts the prediction vector mvpLX input from the AMVP prediction parameter derivation unit 1122 from the vector mvLX input from the encoding parameter determination unit 110 to generate the difference vector mvdLX. The difference vector mvdLX is output to the prediction parameter integration unit 1126.
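The operation of the subtraction unit 1123 is a simple per-component vector subtraction, sketched below (Python; vectors are represented as (x, y) tuples purely for illustration):

```python
def motion_vector_difference(mv, mvp):
    # mvdLX = mvLX - mvpLX, computed per component (x, y) by the
    # subtraction unit 1123; only mvdLX needs to be entropy-encoded,
    # since the decoder re-derives mvpLX and adds it back.
    return (mv[0] - mvp[0], mv[1] - mvp[1])
```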
When the prediction mode predMode input from the predicted image generation unit 101 indicates the merge prediction mode, the prediction parameter integration unit 1126 outputs the merge index merge_idx input from the encoding parameter determination unit 110 to the entropy encoding unit 104.
When the prediction mode predMode input from the predicted image generation unit 101 indicates an inter prediction mode, the prediction parameter integration unit 1126 performs the following processing.
The prediction parameter integration unit 1126 integrates the reference picture index refIdxLX and vector index mvp_LX_idx input from the encoding parameter determination unit 110 and the difference vector mvdLX input from the subtraction unit 1123, and outputs the integrated codes to the entropy encoding unit 104.
[Second Embodiment]
The second embodiment of the present invention will be described below with reference to the drawings. In the second embodiment, an example is described in which a constraint is imposed on the bitstream so that the availability of the reference layer reference picture refIvRefPic can be determined. Hereinafter, a constraint to be imposed on the bitstream is referred to as bitstream conformance.
(Option B1): Bitstream conformance based on an RPS constraint
The prediction parameter encoding unit 111 of the image encoding device 11 may encode an ARP on/off flag at the slice level. In addition, the image encoding device 11 takes as bitstream conformance that the following conditions (B1-1) and (B1-2) are satisfied when inter prediction processing is executed.
(B1-1) In the RPS of the target picture, there exists a reference picture Pic with ViewIdx = refViewIdx[xP][yP].
(B1-2) PicOrderCnt(Pic) equals PicOrderCnt(arpRefPic), and ViewIdx equals refViewIdx[xP][yP].
The image decoding device 31 decodes encoded data that has the above as bitstream conformance. By imposing this constraint concerning the ARP reference picture on the bitstream to be decoded by the image decoding device 31, it is possible to prevent the decoding process from failing because the ARP reference picture cannot be referred to.
The image decoding device 31 may also be configured as follows. That is, the image decoding device 31 receives an encoded stream generated in accordance with the above bitstream conformance, and decodes, in the inter prediction parameter decoding unit 303, the ARP on/off flag encoded at the slice level (slice header/segment header). In addition, the residual prediction unit 3092 executes residual prediction processing according to the decoded ARP on/off flag. By encoding the ARP on/off flag in units of slice headers, ARP can be turned off when the above bitstream conformance cannot be observed in the configuration of the reference picture list of the reference layer.
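The conformance check of conditions (B1-1) and (B1-2) can be sketched as follows (Python; modeling the RPS as a list of (POC, ViewIdx) pairs is an assumption made for illustration, not the representation used by the embodiment):

```python
def rps_satisfies_b1(rps, ref_view_idx, arp_ref_poc):
    # rps: iterable of (poc, view_idx) pairs describing the target
    # picture's reference picture set.  (B1-1) and (B1-2) hold when
    # some entry matches both PicOrderCnt(arpRefPic) and
    # refViewIdx[xP][yP].
    return any(poc == arp_ref_poc and view_idx == ref_view_idx
               for poc, view_idx in rps)
```

An encoder could use such a check to decide, per slice, whether the constraint can be met and, if not, signal the ARP on/off flag as off.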
(Option B2): Bitstream conformance based on an RPL constraint
The prediction parameter encoding unit 111 of the image encoding device 11 may encode an ARP on/off flag at the slice level. In addition, the image encoding device 11 takes as bitstream conformance that the following conditions (B2-1) and (B2-2) are satisfied when inter prediction processing is executed.
(B2-1) A reference picture Pic with ViewIdx = refViewIdx[xP][yP] exists in RefPicListX[0] of the first slice of the target picture whose slice type (slice_type) is not an I slice.
(B2-2) PicOrderCnt(Pic) equals PicOrderCnt(arpRefPic), and ViewIdx equals refViewIdx[xP][yP].
The image decoding device 31 decodes encoded data that has the above as bitstream conformance. By imposing this constraint concerning the ARP reference picture on the bitstream to be decoded by the image decoding device 31, it is possible to prevent the decoding process from failing because the ARP reference picture cannot be referred to.
The image decoding device 31 may also be configured as follows. That is, the image decoding device 31 receives an encoded stream generated in accordance with the above bitstream conformance, and decodes, in the inter prediction parameter decoding unit 303, the ARP on/off flag encoded at the slice level. In addition, the residual prediction unit 3092 executes residual prediction processing according to the decoded ARP on/off flag.
The configuration is not limited to the above: in (B2-1), RefPicListX[0] of a predetermined slice among the slices of the target picture that are not I slices may be used instead. For example, RefPicListX[0] of the last slice may be used. Furthermore, in the above determination, not only the 0th entry of the reference picture list but also a reference picture at a predetermined position may be targeted.
(Option B3): Determining whether the reference picture exists in the DPB
The prediction parameter encoding unit 111 of the image encoding device 11 encodes an ARP on/off flag at the slice level. In addition, the image encoding device 11 takes as bitstream conformance that the following conditions (B3-1) and (B3-2) are satisfied when inter prediction processing is executed.
(B3-1) A reference picture Pic with ViewIdx = refViewIdx[xP][yP] and PicOrderCnt(Pic) equal to PicOrderCnt(arpRefPic) exists in the DPB.
(B3-2) The reference mark of the reference picture Pic is "used for reference".
The image decoding device 31 decodes encoded data that has the above as bitstream conformance. By imposing this constraint concerning the ARP reference picture on the bitstream to be decoded by the image decoding device 31, it is possible to prevent the decoding process from failing because the ARP reference picture cannot be referred to.
The image decoding device 31 may also be configured as follows: the image decoding device 31 receives an encoded stream generated in accordance with the above bitstream conformance, and the inter prediction parameter decoding unit 303 decodes the ARP on/off flag encoded at the slice level. The residual prediction unit 3092 then executes residual prediction according to the decoded ARP on/off flag.
[Third Embodiment]
The third embodiment is described below. When the ARP reference picture arpRefPic is the reference picture RefPicListX[arpRefIdxLX] at index arpRefIdxLX of the reference picture list RefPicListX, the reference picture list differs from picture to picture. The third embodiment describes a configuration for preventing the position of the reference picture arpRefPic in the reference picture list of the target layer from changing on a per-slice basis. Note that arpRefIdxLX (X = 0 or 1) is an index for identifying the ARP reference picture arpRefPic.
So that the reference pictures used for ARP (arpRefPic and the corresponding curIvRefPic) are the same across the slices included in the target picture, it is preferable to introduce bitstream conformance such as the following (Option X1) or (Option X1´).
(Option X1)
When the ARP flag (residual prediction flag) of the picture to be encoded indicates that "ARP residual prediction is possible", the image encoding device 11 treats satisfying the following (X1-1) and (X1-2) as bitstream conformance.
(X1-1) In the picture to be encoded, the reference picture referenced by arpRefPicL0 when ARP is performed using an L0 reference image is the same for all slices whose slice type (slice_type) is "P".
(X1-2) In the picture to be encoded, the reference picture referenced by arpRefPicL1 when ARP is performed using an L1 reference image is the same for all slices whose slice type (slice_type) is "B".
The image decoding device 31 decodes encoded data that satisfies the above bitstream conformance. By imposing this constraint on the ARP reference picture in the bitstream to be decoded by the image decoding device 31, the increase in data-transfer load caused by the ARP reference picture changing within a picture (cache misses and the like) can be prevented.
The image decoding device 31 may also be configured as follows: the image decoding device 31 receives an encoded stream generated in accordance with the above bitstream conformance, and the inter prediction parameter decoding unit 303 decodes the ARP flag encoded at the slice level. The residual prediction unit 3092 then executes residual prediction according to the decoded ARP flag.
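A minimal sketch of how (X1-1)/(X1-2) could be checked over the slices of one picture; the slice records (slice_type plus the ARP reference picture each slice would use per list) are hypothetical stand-ins for data the encoder already holds:

```python
def arp_ref_unified(slices, list_x, slice_type):
    # (X1-1)/(X1-2): among all slices of the picture with the given
    # slice_type, the reference picture used for ARP via list LX
    # (arpRefPicL0 or arpRefPicL1) must be one and the same picture.
    refs = {s["arp_ref_pic"][list_x]
            for s in slices if s["slice_type"] == slice_type}
    return len(refs) <= 1

slices = [{"slice_type": "B", "arp_ref_pic": {0: "poc8", 1: "poc16"}},
          {"slice_type": "B", "arp_ref_pic": {0: "poc8", 1: "poc16"}}]
print(arp_ref_unified(slices, 0, "B"))  # True: same arpRefPicL0 everywhere

slices[1]["arp_ref_pic"][0] = "poc4"    # one slice deviates
print(arp_ref_unified(slices, 0, "B"))  # False: conformance violated
```

Keeping the ARP reference picture constant within a picture is what makes the data-transfer behavior (cache reuse) predictable for the decoder.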
(Option X1´)
When the ARP flag (residual prediction flag) of the picture to be encoded indicates that "ARP residual prediction is possible", the image encoding device 11 treats satisfying the following (X1-1´) and (X1-2´) as bitstream conformance.
(X1-1´) In the picture to be encoded, the reference picture referenced by RefPicList0[arpRefIdxL0] is the same for all slices whose slice type (slice_type) is "P".
(X1-2´) In the picture to be encoded, the reference picture referenced by RefPicList1[arpRefIdxL1] is the same for all slices whose slice type (slice_type) is "B".
The image decoding device 31 decodes encoded data that satisfies the above bitstream conformance. By imposing this constraint on the ARP reference picture in the bitstream to be decoded by the image decoding device 31, the increase in data-transfer load caused by the ARP reference picture changing within a picture (cache misses and the like) can be prevented.
The image decoding device 31 may also be configured as follows: the image decoding device 31 receives an encoded stream generated in accordance with the above bitstream conformance, and the inter prediction parameter decoding unit 303 decodes the ARP flag encoded at the slice level. The residual prediction unit 3092 then executes residual prediction according to the decoded ARP flag.
Note that when the ARP reference picture arpRefPic is derived as the first element of the reference picture list RefPicListX, it suffices to set arpRefIdxL0 = 0 and arpRefIdxL1 = 0 in the above bitstream conformance.
(Option X2)
The ARP reference picture arpRefPic is restricted to RefPicListX[0] of the first slice in the picture whose slice type (slice_type) is not "I".
For example, at the time of residual prediction, the reference image acquisition unit 30922 may acquire the ARP reference picture arpRefPic from RefPicListX[0] of the first slice in the picture whose slice type (slice_type) is not "I".
This is not limiting; a predetermined slice among the non-I slices of the target picture may be used instead. For example, RefPicListX[0] of the last slice among the slices included in the target picture can be used. Moreover, the above determination may target not only the 0th entry of the reference picture list but also the reference picture at a predetermined position arpRefIdxLX (that is, RefPicListX[arpRefIdxLX]).
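For illustration, the selection rule of option X2 might look like the following; the slice list and per-slice reference picture lists are hypothetical stand-ins for the decoder's actual structures:

```python
def derive_arp_ref_pic(slices, x, arp_ref_idx=0):
    # Option X2: take arpRefPic from RefPicListX[arpRefIdxLX] of the first
    # slice in the picture whose slice_type is not "I".
    for s in slices:
        if s["slice_type"] != "I":
            return s["ref_pic_list"][x][arp_ref_idx]
    return None  # picture has only I slices: no ARP reference picture

slices = [{"slice_type": "I", "ref_pic_list": {0: [], 1: []}},
          {"slice_type": "B", "ref_pic_list": {0: ["picA", "picB"], 1: ["picC"]}},
          {"slice_type": "B", "ref_pic_list": {0: ["picB", "picA"], 1: ["picC"]}}]
print(derive_arp_ref_pic(slices, 0))  # picA, from the first non-I slice
```

Because every block of the picture resolves arpRefPic through the same anchor slice, the result cannot vary slice by slice even when later slices reorder their lists.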
[Fourth Embodiment]
The fourth embodiment is described below. ARP predicts the residual using a reference picture whose picture order (POC) differs from the POC of the target picture. When this reference picture order is derived as the POC of a specific reference picture in the reference picture list (for example, the POC of the first reference picture RefPicListX[0] of the reference picture list RefPicListX[]), the reference picture order can turn out to be equal to that of the target picture. In that case, ARP cannot be used. The fourth embodiment describes techniques for identifying the reference picture to be used for ARP so that an inter-layer picture having the same POC as the target picture is not referenced in place of a reference picture whose POC differs from the target picture.
(Option Y1):
Option Y1 describes bitstream conformance concerning the reference picture used for ARP. When the ARP reference picture arpRefPic is the reference picture RefPicListX[arpRefIdxLX] at index arpRefIdxLX of the reference picture list RefPicListX, the image encoding device 11 treats satisfying the following (Y1-1) and (Y1-2) as bitstream conformance.
(Y1-1) In a picture to which ARP is applied, DiffPicOrderCnt(RefPicList0[arpRefIdxL0], currPic) is not 0 and the slice type (slice_type) is not I.
(Y1-2) In a picture to which ARP is applied, DiffPicOrderCnt(RefPicList1[arpRefIdxL1], currPic) is not 0 and the slice type (slice_type) is B.
In the above, the expression "DiffPicOrderCnt(PicA, currPic) is not 0" can be replaced with "PicOrderCnt(PicA) != PicOrderCntVal" (and likewise hereinafter).
The image decoding device 31 receives an encoded stream generated in accordance with the above bitstream conformance and executes residual prediction.
The image decoding device 31 decodes encoded data that satisfies the above bitstream conformance. By imposing this constraint on the ARP reference picture in the bitstream to be decoded by the image decoding device 31, decoding failures caused by an unavailable ARP reference picture can be prevented.
Note that when the ARP reference picture arpRefPic is derived as the first element of the reference picture list RefPicListX, it suffices to set arpRefIdxL0 = 0 and arpRefIdxL1 = 0 in the above bitstream conformance.
(Option Y2):
Either the inter prediction parameter decoding unit 303 or the entropy decoding unit 301 may be configured to include a reference image determination unit 3095.
Then, as shown in FIG. 15, the reference image determination unit 3095 may derive the ARP reference picture availability flag arpRefPicAvailable at the parse stage (CU level). Furthermore, as shown in SYN21 and SYN22 of FIG. 15, at the parse stage, that is, in either the inter prediction parameter decoding unit 303 or the entropy decoding unit 301, the reference image determination unit 3095 may decode the residual prediction flag (ARP flag) iv_res_pred_weight_idx according to the availability of the ARP reference picture (the ARP reference picture availability flag arpRefPicAvailable). That is, in the parameter decoding of the parse stage, iv_res_pred_weight_idx may be decoded when the layer allows inter-view residual prediction (iv_res_pred_flag[nuh_layer_id] is not 0) and the ARP reference picture arpRefPic is available (arpRefPicAvailable is not 0).
At the parse stage, the reference image determination unit 3095 derives arpRefPicAvailable according to the following equation (Y2-1).
arpRefPicAvailable = !(DiffPicOrderCnt(RefPicList0[arpRefIdxL0], currPic) == 0)
|| (!(DiffPicOrderCnt(RefPicList1[arpRefIdxL1], currPic) == 0) && slice_type == B) ・・・ (Y2-1)
In equation (Y2-1), arpRefPicAvailable is set to true (1) when either of the following (Y2.1) or (Y2.2) is satisfied.
(Y2.1) The difference between the POC of the reference picture at index arpRefIdxL0 of the L0 reference picture list and the POC (PicOrderCntVal) of the target picture currPic is not 0.
(Y2.2) The difference between the POC of the reference picture at index arpRefIdxL1 of the L1 reference picture list and the POC (PicOrderCntVal) of the target picture currPic is not 0, and the slice type (slice_type) is B.
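The parse-stage derivation of (Y2-1) can be sketched as follows, passing POC values in directly for brevity and following the reading in which the L1 condition contributes only for B slices; the function and argument names are illustrative, not from the embodiment:

```python
def arp_ref_pic_available(cur_poc, ref_l0_poc, ref_l1_poc, slice_type):
    # (Y2.1): the L0 ARP reference differs in POC from the current picture,
    # i.e. DiffPicOrderCnt(RefPicList0[arpRefIdxL0], currPic) != 0.
    y2_1 = ref_l0_poc - cur_poc != 0
    # (Y2.2): for B slices, the L1 ARP reference differs in POC.
    y2_2 = slice_type == "B" and ref_l1_poc - cur_poc != 0
    return 1 if (y2_1 or y2_2) else 0

print(arp_ref_pic_available(8, 8, 4, "B"))  # 1: the L1 reference is usable
print(arp_ref_pic_available(8, 8, 4, "P"))  # 0: P slice, L1 list not used
print(arp_ref_pic_available(8, 8, 8, "B"))  # 0: both lists point at POC 8
```

In the parse stage, iv_res_pred_weight_idx would then be decoded only when both iv_res_pred_flag[nuh_layer_id] and this flag are non-zero.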
That is, equation (Y2-1) is true if either the POC of the L0-list reference picture arpRefPicL0 (RefPicList0[arpRefIdxL0] in the equation above) or the POC of the L1-list reference picture arpRefPicL1 (RefPicList1[arpRefIdxL1] in the equation above) differs from the POC of the target picture currPic. Since the L1 list is used only when the slice type slice_type is B, which indicates that bidirectional prediction is possible, the equation lets the condition on the L1 list contribute to the determination only when slice_type == B. Rearranged, (Y2-1) can also be written as (Y2-1´).
arpRefPicAvailable = !(PicOrderCnt(RefPicList0[arpRefIdxL0]) == PicOrderCntVal) || (!(PicOrderCnt(RefPicList1[arpRefIdxL1]) == PicOrderCntVal) && slice_type == B) ・・・ (Y2-1´)
Here, PicOrderCnt(Pic) is a function that derives the POC of Pic.
Note that when the ARP reference picture arpRefPic is derived as the first element of the reference picture list RefPicListX, arpRefPicAvailable may be derived using the following equation (Y2-1´´), obtained by setting arpRefIdxL0 = 0 and arpRefIdxL1 = 0 in equation (Y2-1).
arpRefPicAvailable = !(PicOrderCnt(RefPicList0[0]) == PicOrderCntVal) || (!(PicOrderCnt(RefPicList1[0]) == PicOrderCntVal) && slice_type == B) ・・・ (Y2-1´´)
According to the reference image determination unit 3095 and the entropy decoding unit 301 (inter prediction parameter decoding unit 303) of option Y2, the residual prediction flag (ARP flag) iv_res_pred_weight_idx is decoded depending on whether the POC of the ARP reference picture arpRefPic is equal to the POC of the target picture currPic. When ARP cannot be performed, that is, when the POC of the ARP reference picture arpRefPic is equal to the POC of the target picture currPic, the then-useless flag iv_res_pred_weight_idx is not decoded, which reduces the code amount.
(Option Y3)
In option Y3, the inter prediction image generation unit 309 performs the determination at the motion compensation stage. In the following, the picture at index arpRefIdxLX of the reference picture list (RefPicListX[arpRefIdxLX]) is referred to as the ARP reference picture arpRefPic.
First, in the inter prediction image generation unit 309, the reference image determination unit 3095 sets the reference layer reference picture availability flag (refIvRefPicAvailable2) for the reference picture arpRefPic according to whether the picture order of arpRefPic (PicOrderCnt(arpRefPic)) is equal to the picture order PicOrderCntVal of the target picture currPic. Specifically, if DiffPicOrderCnt(arpRefPic, currPic) is not 0, refIvRefPicAvailable2 is set to non-zero; otherwise, refIvRefPicAvailable2 is set to 0.
refIvRefPicAvailable2 = !DiffPicOrderCnt(arpRefPic, currPic) ・・・ (Y3-1)
Here, equation (Y3-1) can also be expressed as the following equation (Y3-1´).
refIvRefPicAvailable2 = PicOrderCnt(arpRefPic) != PicOrderCntVal ・・・ (Y3-1´)
When the reference picture arpRefPic is derived from the element RefPicListX[arpRefIdxLX] at reference index arpRefIdxLX of the reference picture list RefPicListX[], it is derived by the following equation (Y3-2).
refIvRefPicAvailable2 = !DiffPicOrderCnt(RefPicListX[arpRefIdxLX], currPic) ・・・ (Y3-2)
Here, when residual prediction is performed on a motion-compensated image derived from the LX (X = 0, 1) list, it is determined whether the reference picture index arpRefIdxLX of that list LX is 0 or greater (that is, whether a valid reference picture arpRefPic exists in the LX list). Equation (Y3-2) can also be expressed as the following equation (Y3-2´).
refIvRefPicAvailable2 = PicOrderCnt(RefPicListX[arpRefIdxLX]) != PicOrderCntVal ・・・ (Y3-2´)
The residual prediction execution flag derivation unit 30921 derives resPredFlag according to the following equation (C3-1) so that the execution flag resPredFlag is non-zero only when the derived refIvRefPicAvailable2 is non-zero.
resPredFlag = (iv_res_pred_weight_idx != 0) && refIvRefPicAvailable2 ・・・ (C3-1)
When residual prediction is limited to the case of motion prediction, resPredFlag may be derived using the following equation (C3-2), a variant of equation (C3-1) (the same applies to the following embodiments).
resPredFlag = (iv_res_pred_weight_idx != 0) &&
(PicOrderCnt(RefPicListX[refIdxLX]) != PicOrderCntVal) && refIvRefPicAvailable2
・・・ (C3-2)
The residual prediction execution flag derivation unit 30921 may also derive the execution flag resPredFlag using the refIvRefPicAvailable derived in the first embodiment in combination. Specifically, the following equation, which combines equation (R-1) of the first embodiment with equation (Y3-1), may be used.
resPredFlag = (iv_res_pred_weight_idx != 0) && refIvRefPicAvailable && refIvRefPicAvailable2 ・・・ (C3-3)
When residual prediction is limited to the case of motion prediction, resPredFlag may be derived using the following equation (C3-4), a variant of equation (C3-3) (the same applies to the following embodiments).
resPredFlag = (iv_res_pred_weight_idx != 0) &&
(PicOrderCnt(RefPicListX[refIdxLX]) != PicOrderCntVal) && refIvRefPicAvailable && refIvRefPicAvailable2 ・・・ (C3-4)
According to the reference image determination unit 3095 and the residual prediction execution flag derivation unit 30921 (inter prediction image generation unit 309) of option Y3, whether residual prediction is performed on the motion-compensated image (resPredFlag) is decided according to whether the POC of the ARP reference picture arpRefPic is equal to the POC of the target picture currPic. When ARP cannot be performed, that is, when the POC of the ARP reference picture arpRefPic is equal to the POC of the target picture currPic, the ARP operation is not carried out, which avoids an invalid operation.
(Option Y4)
In option Y4, either the inter prediction parameter decoding unit 303 or the entropy decoding unit 301 is configured to include the reference image determination unit 3095.
Option Y4 combines the determination at the parse stage in the inter prediction parameter decoding unit 303 or the entropy decoding unit 301 with the determination at the motion compensation stage in the inter prediction image generation unit 309.
At the parse stage, in either the inter prediction parameter decoding unit 303 or the entropy decoding unit 301, the reference image determination unit 3095 may derive the arpRefPicAvailable shown in FIG. 15 according to the above equation (Y2-1) or its variants.
The inter prediction parameter decoding unit 303 and the entropy decoding unit 301 may then decode the residual prediction flag (ARP flag) iv_res_pred_weight_idx according to the ARP reference picture availability flag arpRefPicAvailable. That is, as shown in FIG. 16, iv_res_pred_weight_idx may be decoded only when the arpRefPicAvailable flag is true.
At the motion compensation stage, on the other hand, the reference image determination unit 3095 is modified to perform the determination as follows.
The reference image determination unit 3095 derives refIvRefPicAvailable2 according to the above equation (Y3-1) or its variants.
The residual prediction execution flag derivation unit 30921 also derives resPredFlag according to the equations described in option Y3 and their variants.
According to the reference image determination unit 3095 and the entropy decoding unit 301 (inter prediction parameter decoding unit 303) of option Y4, as in option Y2, the then-useless flag iv_res_pred_weight_idx is not decoded when the POC of the ARP reference picture arpRefPic is equal to the POC of the target picture currPic, which reduces the code amount.
According to the reference image determination unit 3095 and the residual prediction execution flag derivation unit 30921 (inter prediction image generation unit 309) of option Y4, as in option Y3, when ARP cannot be performed, that is, when the POC of the ARP reference picture arpRefPic is equal to the POC of the target picture currPic, the ARP operation is not carried out, which avoids an invalid operation.
(Option Y5)
In option Y5, when the ARP reference picture arpRefPic is the reference picture RefPicListX[arpRefIdxLX] at index arpRefIdxLX of the reference picture list RefPicListX, arpRefIdxLX is selected so that arpRefPic is not an inter-layer picture, that is, so that, where possible, arpRefPic is set to a picture whose POC differs from PicOrderCntVal, the POC of the target picture.
The reference image determination unit 3095 of this option also derives the reference picture arpRefPic.
The reference image determination unit 3095 may, in accordance with the following pseudocode, search the elements of the reference picture list RefPicListX[] for X = 0 or 1 for one satisfying the condition of not being an inter-layer picture, and thereby derive the ARP reference picture index arpRefIdxLX and arpRefPicAvailable.
In the following pseudocode, X = 0 or 1. As initialization before executing the pseudocode, arpRefIdxL0 and arpRefIdxL1 are set to a negative value (here, -1), which is invalid as a reference picture index, and arpRefPicAvailable is set to 0, indicating false.
================================================================================
for(i = 0; i <= num_ref_idx_lX_active_minus1; i++) {
 if(DiffPicOrderCnt(RefPicListX[i], currPic)) {
  arpRefIdxLX = i
  arpRefPicAvailable = 1
 }
}
================================================================================
The above pseudocode scans, in order, the reference pictures RefPicListX[i] at indices i from 0 to num_ref_idx_lX_active_minus1 in the reference picture list RefPicListX[], searching for a reference picture arpRefPicLX that satisfies the condition that the difference between the POC of the reference picture RefPicListX[i] and the POC (PicOrderCntVal) of the target picture currPic is not 0. For the arpRefIdxLX obtained by the above pseudocode, the reference image determination unit 3095 may adopt RefPicListX[arpRefIdxLX] as the ARP reference picture arpRefPic.
The above processing can also be carried out using the following pseudocode.
arpRefIdxL0 = -1
for(i = 0; i <= num_ref_idx_l0_active_minus1 && arpRefIdxL0 < 0; i++)
 if (PicOrderCnt(RefPicList0[i]) != PicOrderCntVal)
  arpRefIdxL0 = i
arpRefIdxL1 = -1
if (slice_type == B)
 for(i = 0; i <= num_ref_idx_l1_active_minus1 && arpRefIdxL1 < 0; i++)
  if (PicOrderCnt(RefPicList1[i]) != PicOrderCntVal)
   arpRefIdxL1 = i
In the above pseudocode, for each of the reference picture lists LX with X = 0 and X = 1, the reference picture index arpRefIdxLX is first initialized to a value below 0 (here, -1); the reference pictures RefPicListX[i] at indices i from 0 to num_ref_idx_lX_active_minus1 in the reference picture list RefPicListX[] are then scanned in order, searching for a reference picture whose POC differs from the POC (PicOrderCntVal) of the target picture (that is, a reference picture other than an inter-layer picture), and the reference picture index i of a reference picture satisfying the condition is held as arpRefIdxLX. In the initial state, the reference picture index arpRefIdxLX is given a negative value (here, -1), and the above search is performed. When a reference picture satisfying the condition is found, arpRefIdxLX is set to a value of 0 or greater. The search ends as soon as a reference picture satisfying the condition is found, that is, as soon as arpRefIdxLX becomes 0 or greater.
 As with options Y2 and Y4, option Y5 may also perform the determination at the parsing stage in the inter prediction parameter decoding unit 303 or the entropy decoding unit 301. In this case, either the inter prediction parameter decoding unit 303 or the entropy decoding unit 301 is configured to include the reference image determination unit 3095.
 At the parsing stage, in either the inter prediction parameter decoding unit 303 or the entropy decoding unit 301, the reference image determination unit 3095 can derive arpRefPicAvailable shown in FIG. 15 according to the above equation (Y2-1) and its modification (Y2-1′) as already described, but it can also derive it using the following equation (Y5-1).
 arpRefPicAvailable = arpRefIdxL0 >= 0 || arpRefIdxL1 >= 0  Equation (Y5-1)
 Here, at the parsing stage, the reference image determination unit 3095 sets arpRefPicAvailable to true (a non-zero value) when either arpRefIdxL0, the L0 reference picture index, holds a valid value (0 or more) or arpRefIdxL1, the L1 reference picture index, holds a valid value (0 or more). That is, arpRefPicAvailable is set to true (non-zero) when a picture other than an inter-layer picture is found in either the L0 reference picture list or the L1 reference picture list.
 Like options Y3 and Y4, option Y5 can also perform the determination at the motion compensation stage in the inter predicted image generation unit 309.
 At the motion compensation stage, in the inter predicted image generation unit 309, the reference image determination unit 3095 can derive refIvRefPicAvailable2 according to the above equation (Y3-1) and its modification as already described, but it can also derive it using the following equation (Y5-2).
 refIvRefPicAvailable2 = arpRefIdxLX >= 0  Equation (Y5-2)
 Here, at the motion compensation stage, when residual prediction is applied to a motion compensated image derived from the LX (X = 0, 1) list, the determination is made based on whether arpRefIdxLX, the reference picture index of that list LX, is 0 or more (that is, whether a valid reference picture arpRefPic exists in the LX list).
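A minimal sketch of the two availability checks (Y5-1) and (Y5-2), with illustrative function names; the indices are those produced by the search described earlier, -1 meaning "not found":

```python
# Parse-stage check (Y5-1): true if either reference picture list holds a
# picture other than an inter-layer picture.
def arp_ref_pic_available(arp_ref_idx_l0, arp_ref_idx_l1):
    return arp_ref_idx_l0 >= 0 or arp_ref_idx_l1 >= 0

# Motion-compensation-stage check (Y5-2): per-list check for list LX.
def ref_iv_ref_pic_available2(arp_ref_idx_lx):
    return arp_ref_idx_lx >= 0
```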
 The residual prediction execution flag derivation unit 30921 derives resPredFlag according to the equation (C3-1) described for option Y3 and its modification.
 According to the reference image determination unit 3095 of option Y5, arpRefIdxLX is selected so that arpRefPic is set to a picture whose POC differs from PictureOrderCntVal, the POC of the target picture. This resolves the problem that ARP does not operate because the POC of arpRefPic is equal to PictureOrderCntVal, the POC of the target picture.
 Furthermore, when the determination is performed at the parsing stage, the reference image determination unit 3095 and the entropy decoding unit 301 (inter prediction parameter decoding unit 303) of option Y5 do not decode iv_res_pred_weight_idx, which would be a wasted flag when the ARP reference picture arpRefPic has a POC equal to that of the target picture currPic, thereby reducing the code amount, as in option Y2.
 Furthermore, when the determination is performed at the motion compensation stage, the reference image determination unit 3095 and the residual prediction execution flag derivation unit 30921 (inter predicted image generation unit 309) of option Y5 do not perform the ARP operation when ARP cannot be performed, that is, when the POC of the ARP reference picture arpRefPic is equal to the POC of the target picture currPic, thereby avoiding an invalid operation, as in option Y3.
 (Modification of option Y5)
 A modification of option Y5 is described below. Like option Y5, this modification determines the reference picture so that arpRefPic is set to a picture having a POC different from PictureOrderCntVal, the POC of the target picture.
 The reference image determination unit 3095 may derive the ARP reference picture arpRefPic from the reference pictures included in the reference picture list RefPicListX according to the pseudocode shown below. As initialization before executing this pseudocode, minDiffPOC, which indicates the provisional minimum POC difference, is set to a sufficiently large predetermined value (here, for example, 2^16), and arpRefPicAvailable, the flag indicating whether an ARP reference picture is available, is set to 0 (false).
================================================================================
for (rIdx = 0; rIdx <= num_ref_idx_lX_active_minus1; rIdx++) {
 if (isNearerPOC(rIdx, minDiffPOC)) {
  minDiffPOC = abs(DiffPicOrderCnt(RefPicListX[rIdx], currPic))
  rIdxSel = rIdx
  arpRefPicAvailable = 1
 }
}
================================================================================
 In the above pseudocode, the body of the for loop is executed for each reference picture included in the reference picture list.
 Here, isNearerPOC(rIdx, minDiffPOC) is a function that determines whether the rIdx-th reference picture of the reference picture list is included in the RPS of the target picture, has ViewIdx = RefViewIdx[xP][yP], and has a POC that satisfies the following conditional expression (C1-1).
 (a reference picture with the same POC as the reference picture stored in RefPicListX[rIdx] exists) && minDiffPOC > abs(DiffPicOrderCnt(RefPicListX[rIdx], currPic)) ... (C1-1)
 In conditional expression (C1-1), DiffPicOrderCnt(PicA, PicB) is a function that returns the value obtained by subtracting the POC of picture PicB from the POC of picture PicA (the same applies to the following configurations). Since the POC of the target picture currPic is PicOrderCntVal, DiffPicOrderCnt(picA, currPic) may be computed as POC(picA) - PicOrderCntVal. Conditional expression (C1-1) is true when a reference picture having the same POC as the reference picture stored in RefPicListX[rIdx] exists, and the absolute value of the difference between the POC of the rIdx-th reference picture and the POC of the target picture currPic (PicOrderCntVal) is less than minDiffPOC.
 When conditional expression (C1-1) is true, minDiffPOC is updated to abs(DiffPicOrderCnt(RefPicListX[rIdx], currPic)), rIdx becomes the provisional ARP reference picture candidate (rIdxSel), and arpRefPicAvailable is set to 1.
 By executing the for loop over each reference picture in this way, the index of the reference picture whose POC is closest to the POC of the target picture is derived as the index indicating the ARP reference picture (rIdxSel).
 That is, using the rIdxSel obtained in this way, the reference image determination unit 3095 derives the ARP reference picture arpRefPic as the reference picture RefPicListX[rIdxSel] at index rIdxSel of the reference picture list RefPicListX[].
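The selection loop of this modification can be sketched as follows. This is a simplified illustration only: the RPS-membership and view-index conditions of isNearerPOC are omitted, and same-POC entries (inter-layer pictures) are skipped, in line with the modification's stated goal of choosing a POC different from the target picture's.

```python
# Sketch: among the entries of list LX (modeled as a list of POC values),
# pick the one whose POC is closest to the current picture's POC, skipping
# entries with an identical POC. Mirrors the minDiffPOC / rIdxSel /
# arpRefPicAvailable variables of the pseudocode above.

def select_arp_ref_idx(ref_pic_list_poc, pic_order_cnt_val):
    min_diff_poc = 2 ** 16      # sufficiently large initial value
    r_idx_sel = -1
    arp_ref_pic_available = 0   # 0 (false) until a candidate is found
    for r_idx, poc in enumerate(ref_pic_list_poc):
        diff = abs(poc - pic_order_cnt_val)  # abs(DiffPicOrderCnt(...))
        if diff != 0 and diff < min_diff_poc:
            min_diff_poc = diff
            r_idx_sel = r_idx
            arp_ref_pic_available = 1
    return r_idx_sel, arp_ref_pic_available
```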
 According to the reference image determination unit 3095 of this modification of option Y5, arpRefIdxLX is selected so that arpRefPic is set to a picture whose POC differs from PictureOrderCntVal, the POC of the target picture. This resolves the problem that ARP does not operate because the POC of arpRefPic is equal to PictureOrderCntVal, the POC of the target picture.
 Furthermore, the determination may be performed at the parsing stage, as in option Y5. In this case, the reference image determination unit 3095 and the entropy decoding unit 301 (inter prediction parameter decoding unit 303) do not decode iv_res_pred_weight_idx, which would be a wasted flag when the ARP reference picture arpRefPic has a POC equal to that of the target picture currPic, thereby reducing the code amount, as in option Y2.
 Furthermore, the determination may be performed at the motion compensation stage, as in option Y5. In this case, the reference image determination unit 3095 and the residual prediction execution flag derivation unit 30921 (inter predicted image generation unit 309) do not perform the ARP operation when ARP cannot be performed, that is, when the POC of the ARP reference picture arpRefPic is equal to the POC of the target picture currPic, thereby avoiding an invalid operation, as in option Y3.
 Each unit of the image encoding device 11 and the image decoding device 31 in the embodiments described above may be realized by a computer. In that case, a program for realizing this control function may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read into and executed by a computer system. The "computer system" here is a computer system built into the image encoding device 11 or the image decoding device 31, and includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built into a computer system. The "computer-readable recording medium" may further include a medium that holds the program dynamically for a short time, such as a communication line used when transmitting the program over a network such as the Internet or a communication line such as a telephone line, and a medium that holds the program for a certain period of time, such as volatile memory inside a computer system serving as the server or client in that case. The program may realize only part of the functions described above, or may realize the functions described above in combination with a program already recorded in the computer system.
 Part or all of the image encoding device 11 and the image decoding device 31 in the embodiments described above may be realized as an integrated circuit such as an LSI (Large Scale Integration). Each functional block of the image encoding device 11 and the image decoding device 31 may be made into an individual processor, or some or all of them may be integrated into a single processor. The method of circuit integration is not limited to LSI and may be realized by a dedicated circuit or a general-purpose processor. If integrated circuit technology that replaces LSI emerges as semiconductor technology advances, an integrated circuit based on that technology may be used.
 Although an embodiment of the present invention has been described in detail above with reference to the drawings, the specific configuration is not limited to the above, and various design changes and the like are possible without departing from the scope of the present invention.
 [Summary]
 An image decoding device according to aspect 1 of the present invention is an image decoding device that generates a predicted image of a target picture by applying, to a motion compensated image, residual prediction using a reference layer different from the layer of the target picture, the device including: a reference picture determination unit (reference image determination unit 3095) that determines whether a reference picture of the reference layer having a picture order different from the picture order of the target picture is available; and a residual prediction application unit (residual synthesis unit 30923) that, at least according to the determination result, applies to the motion compensated image the residual prediction based on the reference picture of the reference layer and a decoded picture of the reference layer having the same picture order as the picture order of the target picture.
 The target picture is the picture to be decoded. The reference layer is a layer different from the target layer, which is the layer to which the target picture belongs. When residual prediction is performed using the reference layer, reference pictures belonging to the reference layer are referred to.
 The residual prediction above is a technique that estimates the residual in the reference layer as the residual in the target layer. The residual prediction is performed based on a reference picture of the reference layer having a picture order different from the picture order of the target picture and a decoded picture of the reference layer having the same picture order as the picture order of the target picture.
 According to the above configuration, the residual prediction can be applied to the motion compensated image depending on whether a reference picture of the reference layer is available.
 This has the effect of avoiding a situation in which a decoded picture of the reference layer is unavailable at the time of residual prediction for the target picture.
 In the image decoding device according to aspect 2 of the present invention, in aspect 1, the reference picture determination unit may perform the determination according to whether the reference picture of the reference layer exists in a reference picture set, which indicates the pictures referred to when decoding the decoded picture of the reference layer.
 The reference picture set is derived per picture. It is therefore used in common when decoding the slices included in the target picture.
 By determining the presence or absence of the reference picture of the reference layer based on the reference picture set as in the above configuration, the presence or absence of the reference picture of the reference layer can be determined based on a criterion common to all slices.
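A minimal sketch of an RPS-based availability check, under the assumption that RPS entries can be modeled as (POC, layer) pairs; the real RPS carries more state than this:

```python
# Aspect 2 sketch: availability is decided by membership in the reference
# picture set (RPS), which is derived once per picture and therefore gives
# every slice of the picture the same answer.

def ref_pic_in_rps(rps, poc, layer_id):
    return (poc, layer_id) in rps

rps = {(8, 1), (6, 1), (8, 0)}        # hypothetical (POC, layer) entries
available = ref_pic_in_rps(rps, 6, 1)  # the layer-1 picture with POC 6
```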
 In the image decoding device according to aspect 3 of the present invention, in aspect 1, the reference picture determination unit may perform the determination according to whether the reference picture of the reference layer exists at a predetermined position in the reference picture list of a predetermined-numbered non-I slice of the target picture.
 According to the above configuration, since the predetermined position in the reference picture list of the predetermined-numbered non-I slice of the target picture is used for the determination among the slices included in the target picture, the determination processing can be shared.
 In the image decoding device according to aspect 4 of the present invention, in aspect 1, the reference picture determination unit may perform the determination according to whether the reference picture of the reference layer is stored in the DPB (Decoded Picture Buffer).
 In this way, it is also possible to make the determination according to whether the reference picture of the reference layer is held in the DPB, which stores decoded pictures.
 In the image decoding device according to aspect 5 of the present invention, in aspect 4, the reference picture determination unit may perform the determination according to whether the reference mark of the reference picture of the reference layer in the DPB (Decoded Picture Buffer) is "used for reference".
 Some decoders do not support DPB processing as defined by the HRD (Hypothetical Reference Decoder), so the state of the DPB may be inaccurate. Therefore, by checking the reference mark of the reference picture of the reference layer in the DPB (Decoded Picture Buffer), the determination of whether the reference picture of the reference layer is available can be made more accurately.
 That is, when the reference mark is "used for reference", it may be determined that the reference picture of the reference layer exists.
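A hedged sketch combining aspects 4 and 5: availability decided from the DPB, counting a picture as available only if it is both present and marked "used for reference". The entry fields and mark string are illustrative, not the actual DPB layout:

```python
USED_FOR_REFERENCE = "used for reference"

def ref_pic_available_in_dpb(dpb, poc, layer_id):
    # Scan DPB slots for the picture; require the reference mark as well,
    # since mere presence in the DPB may be inaccurate (see aspect 5).
    for entry in dpb:
        if entry["poc"] == poc and entry["layer_id"] == layer_id:
            return entry["mark"] == USED_FOR_REFERENCE
    return False
```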
 An image decoding device according to aspect 6 of the present invention is an image decoding device that generates a predicted image of a target picture by applying, to a motion compensated image, residual prediction using a reference layer different from the layer of the target picture, the device including: a reference picture acquisition unit (reference image acquisition unit 30922) that acquires, from a predetermined position in the reference picture list of a predetermined-numbered non-I slice of the target picture, a reference picture that belongs to the target layer to which the target picture belongs and has a picture order different from the picture order of the target picture; and a residual prediction application unit (residual synthesis unit 30923) that applies, to the motion compensated image, the residual prediction based on a reference picture of the reference layer having the same picture order as the picture order of the acquired reference picture and a decoded picture of the reference layer having the same picture order as the picture order of the target picture.
 According to the above configuration, a reference picture (a so-called ARP reference picture) that belongs to the target layer to which the target picture belongs and has a picture order different from the picture order of the target picture is acquired from the predetermined position in the reference picture list of the predetermined-numbered non-I slice of the target picture.
 The predetermined-numbered non-I slice of the target picture is preferably the first non-I slice of the target picture. The predetermined position in the reference picture list may be the 0th position (the head) of the reference picture list. The slice number and the position in the reference picture list are arbitrary as long as they are common to the slices included in the target picture.
 According to the above configuration, the determination of the ARP reference picture can be shared among the slices included in the target picture.
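A hypothetical sketch of aspect 6's acquisition rule, assuming simplified slice and list structures (the dictionary fields are stand-ins for the decoder's state, and the preferred choices — first non-I slice, 0th list position — are used):

```python
# Take the ARP reference picture from a fixed, slice-independent location:
# position 0 of the L0 reference picture list of the first non-I slice.

def get_arp_ref_pic(slices):
    for s in slices:
        if s["type"] != "I":              # first non-I slice of the picture
            return s["ref_pic_list0"][0]  # predetermined position: 0th entry
    return None                           # no non-I slice in this picture
```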
 An image decoding device according to aspect 7 of the present invention is an image decoding device that generates a predicted image of a target picture by applying, to a motion compensated image, residual prediction using a reference layer different from the layer of the target picture, the device including: a flag decoding unit (inter prediction parameter decoding unit 303) that decodes a residual prediction execution flag instructing execution of the residual prediction; a receiving unit that receives a bitstream in which a reference picture of the reference layer having a picture order different from the picture order of the target picture is available; and a residual prediction execution unit (residual prediction unit 3092) that executes the residual prediction according to the residual prediction execution flag.
 An image encoding device according to aspect 8 of the present invention is an image encoding device that generates a predicted image of a target picture by applying, to a motion compensated image, residual prediction using a reference layer different from the layer of the target picture, the device including: a flag encoding unit (prediction parameter encoding unit 111) that encodes a residual prediction execution flag instructing execution of the residual prediction; a bitstream generation unit (prediction parameter encoding unit 111) that generates a bitstream in which a reference picture of the reference layer having a picture order different from the picture order of the target picture is available; and a bitstream transmission unit (entropy encoding unit 104) that transmits the generated bitstream to an image decoding device.
 The image decoding device according to aspect 7 and the image encoding device according to aspect 8 include features corresponding to the image decoding device according to aspect 1. Therefore, the image decoding device according to aspect 7 and the image encoding device according to aspect 8 can achieve the same effects as the image decoding device according to aspect 1.
 An image decoding device according to aspect 9 of the present invention is an image decoding device that generates a predicted image of a target picture by applying, to a motion compensated image, residual prediction using a reference layer different from the layer of the target picture, the device including: a reference picture selection unit that selects, from among the reference pictures included in a reference picture list, a reference picture that belongs to the target layer to which the target picture belongs and has a picture order different from the picture order of the target picture; and a residual prediction application unit that applies, to the motion compensated image, the residual prediction based on a reference picture of the reference layer having the same picture order as the picture order of the selected reference picture and a decoded picture of the reference layer having the same picture order as the picture order of the target picture.
 According to the above configuration, it can be guaranteed that the POC of arpRefPic differs from the POC of the current picture. That is, it can be guaranteed that the picture order of the reference picture of the reference layer having the same picture order as the selected reference picture differs from the picture order of the decoded picture of the reference layer having the same picture order as the target picture.
 The present invention is not limited to the embodiments described above, and various modifications are possible within the scope of the claims; embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention. Furthermore, new technical features can be formed by combining the technical means disclosed in the respective embodiments.
 The present invention can be suitably applied to an image decoding device that decodes encoded data in which image data is encoded, and to an image encoding device that generates encoded data in which image data is encoded. It can also be suitably applied to the data structure of encoded data generated by the image encoding device and referred to by the image decoding device.
1…Image transmission system
11…Image encoding device
101…Predicted image generation unit
102…Subtraction unit
103…DCT/quantization unit
104…Entropy encoding unit
105…Inverse quantization/inverse DCT unit
106…Addition unit
109…Decoded picture management unit
110…Encoding parameter determination unit
111…Prediction parameter encoding unit (flag encoding unit, bitstream generation unit)
112…Inter prediction parameter encoding unit
1121…Merge prediction parameter derivation unit
1122…AMVP prediction parameter derivation unit
1123…Subtraction unit
1126…Prediction parameter integration unit
113…Intra prediction parameter encoding unit
21…Network
31…Image decoding device
301…Entropy decoding unit
302…Prediction parameter decoding unit
303…Inter prediction parameter decoding unit (flag decoding unit)
3031…Inter prediction parameter decoding control unit
3032…AMVP prediction parameter derivation unit
3035…Addition unit
304…Intra prediction parameter decoding unit
306…Decoded picture management unit
3061…DPB
3062…RPS derivation unit
3063…Reference picture control unit
3064…Reference layer picture control unit
3065…RPL derivation unit
3066…Output control unit
3067…Prediction parameter memory
308…Predicted image generation unit
309…Inter predicted image generation unit
3091…Displacement compensation unit
3092…Residual prediction unit (residual prediction execution unit)
30921…Residual prediction execution flag derivation unit
30922…Reference image acquisition unit (reference picture acquisition unit)
30923…Residual synthesis unit (residual prediction application unit)
3093…Illuminance compensation unit
3094…Weight prediction unit
3095…Reference image determination unit (reference picture determination unit)
310…Intra predicted image generation unit
311…Inverse quantization/inverse DCT unit
312…Addition unit
313…Residual storage unit
41…Image display device

Claims (6)

  1.  An image decoding device comprising:
    a reference picture determination unit that determines whether a residual prediction reference picture is available; and
    a residual prediction application unit that performs residual prediction using the residual prediction reference picture,
    wherein the reference picture determination unit performs the determination according to whether the reference picture of the residual prediction reference layer is stored in a DPB (Decoded Picture Buffer).
  2.  The image decoding device according to claim 1, wherein the reference picture determination unit performs the determination based on information indicating whether the reference picture of the reference layer can be referenced in the DPB (Decoded Picture Buffer).
  3.  An image decoding device comprising:
    a reference picture derivation unit that derives a reference picture for residual prediction; and
    a residual prediction application unit that performs residual prediction using the residual prediction reference picture,
    wherein the reference picture derivation unit derives, as the reference picture for residual prediction, a reference picture having a picture order different from the picture order of the target picture from among the reference pictures included in a reference picture list.
  4.  The image decoding device according to claim 3, wherein the reference picture derivation unit scans the reference picture list in order from the beginning and derives, as the reference picture for residual prediction, a reference picture RefPicListX[i] whose POC differs from the POC (PicOrderCntVal) of the target picture currPic.
  5.  An image decoding device comprising:
    a reference picture selection unit that derives a reference picture for residual prediction; and
    a residual prediction application unit that performs residual prediction using the residual prediction reference picture when the residual prediction reference picture is available,
    wherein the reference picture list is scanned in order from the beginning, and when the absolute value of the difference between the POC of a reference picture RefPicListX[i] and the POC (PicOrderCntVal) of the target picture currPic is smaller than the POC difference, that reference picture is set as the reference picture for residual prediction.
  6.  An image decoding device comprising:
    a reference picture derivation unit that derives, as a residual prediction reference picture, a reference picture having a picture order different from the picture order of the target picture from among the reference pictures included in a reference picture list; and
    an inter prediction parameter decoding control unit that decodes a residual prediction flag iv_res_pred_weight_idx when the residual prediction reference picture is available.
PCT/JP2014/077530 2013-10-16 2014-10-16 Image decoding device WO2015056735A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2015542651A JP6401707B2 (en) 2013-10-16 2014-10-16 Image decoding apparatus, image decoding method, and recording medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2013215881 2013-10-16
JP2013-215881 2013-10-16

Publications (1)

Publication Number Publication Date
WO2015056735A1 true WO2015056735A1 (en) 2015-04-23

Family

ID=52828176

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2014/077530 WO2015056735A1 (en) 2013-10-16 2014-10-16 Image decoding device

Country Status (2)

Country Link
JP (1) JP6401707B2 (en)
WO (1) WO2015056735A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10034018B2 (en) * 2011-09-23 2018-07-24 Velos Media, Llc Decoded picture buffer management
US9615090B2 (en) * 2012-12-28 2017-04-04 Qualcomm Incorporated Parsing syntax elements in three-dimensional video coding
WO2014104242A1 (en) * 2012-12-28 2014-07-03 シャープ株式会社 Image decoding device and image encoding device
US9380305B2 (en) * 2013-04-05 2016-06-28 Qualcomm Incorporated Generalized residual prediction in high-level syntax only SHVC and signaling and management thereof

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
GERHARD TECH ET AL.: "MV-HEVC Working Draft 1", JOINT COLLABORATIVE TEAM ON 3D VIDEO CODING EXTENSION DEVELOPMENT OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11, JCT3V-A1004_D0, 1ST MEETING, August 2012 (2012-08-01), STOCKHOLM, SE, pages I - III, 1-16 *
LI ZHANG ET AL.: "CE4: Advanced residual prediction for multiview coding", JOINT COLLABORATIVE TEAM ON 3D VIDEO CODING EXTENSIONS OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11, JCT3V-D0177, 4TH MEETING, April 2013 (2013-04-01), INCHEON, KR, pages 1 - 9 *
LI ZHANG ET AL.: "Further improvements on advanced residual prediction", JOINT COLLABORATIVE TEAM ON 3D VIDEO CODING EXTENSIONS OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11, JCT3V-E0124, 5TH MEETING, July 2013 (2013-07-01), VIENNA, AT, pages 1 - 6 *
TOMOHIRO IKAI: "CE4-related: ARP reference picture selection and its availability check", JOINT COLLABORATIVE TEAM ON 3D VIDEO CODING EXTENSIONS OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11, JCT3V-F0105_R1, 6TH MEETING, October 2013 (2013-10-01), GENEVA, CH, pages 1 - 6 *

Also Published As

Publication number Publication date
JP6401707B2 (en) 2018-10-10
JPWO2015056735A1 (en) 2017-03-09

Similar Documents

Publication Publication Date Title
US10200712B2 (en) Merge candidate derivation device, image decoding device, and image coding device
US10306235B2 (en) Image decoding apparatus, image coding apparatus, and prediction-vector deriving device
US9967592B2 (en) Block-based advanced residual prediction for 3D video coding
JP6469588B2 (en) Residual prediction device, image decoding device, image coding device, residual prediction method, image decoding method, and image coding method
KR102060857B1 (en) Simplified advanced motion prediction for 3d-hevc
JP6397421B2 (en) Image decoding apparatus and image encoding apparatus
US20150326866A1 (en) Image decoding device and data structure
KR20140147102A (en) Motion vector coding and bi-prediction in hevc and its extensions
US20160191933A1 (en) Image decoding device and image coding device
WO2015056719A1 (en) Image decoding device and image coding device
JP6473078B2 (en) Image decoding device
WO2015056620A1 (en) Image decoding device and image coding device
JP6118199B2 (en) Image decoding apparatus, image encoding apparatus, image decoding method, image encoding method, and computer-readable recording medium
WO2014103600A1 (en) Encoded data structure and image decoding device
JP6401707B2 (en) Image decoding apparatus, image decoding method, and recording medium
JPWO2015141696A1 (en) Image decoding apparatus, image encoding apparatus, and prediction apparatus
JP2016066864A (en) Image decoding device, image encoding device, and merge mode parameter derivation device
JP2017212477A (en) Image decoding apparatus, image encoding apparatus, displacement array deriving apparatus, displacement vector deriving apparatus, default reference view index deriving apparatus, and depth lookup table deriving apparatus
JP2015015626A (en) Image decoder and image encoder
JP2014204327A (en) Image decoding apparatus and image encoding apparatus
JP2015080053A (en) Image decoder and image encoder

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14854489

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2015542651

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14854489

Country of ref document: EP

Kind code of ref document: A1