WO2018230493A1

WO2018230493A1 - Video decoding device, video encoding device, prediction image generation device and motion vector derivation device

Info

Publication number: WO2018230493A1
Application number: PCT/JP2018/022196
Authority: WO
Inventors: 知宏猪飼
Original assignee: シャープ株式会社
Priority date: 2017-06-14
Filing date: 2018-06-11
Publication date: 2018-12-20

Abstract

According to the present invention, a video decoding device (31) includes a predicted image generation unit (308) that generates a predicted image using at least one of a single prediction mode, a bi-prediction mode, and a BIO mode. The predicted image generation unit (308) prohibits generation of a predicted image using the BIO mode when a reference block in at least one of a first reference image and a second reference image lies outside the screen of the reference image.

Description

Video decoding device, video encoding device, predicted image generation device, and motion vector derivation device

The present invention relates to a video decoding device, a video encoding device, a predicted image generation device, and a motion vector derivation device.

To transmit or record moving images efficiently, a moving image encoding device that generates encoded data by encoding a moving image, and a moving image decoding device that generates a decoded image by decoding the encoded data, are used.

Specific examples of moving image encoding methods include those proposed in H.264/AVC and HEVC (High-Efficiency Video Coding).

In such a moving image encoding method, the images (pictures) constituting a moving image are managed in a hierarchical structure composed of slices obtained by dividing an image, coding tree units (CTU: Coding Tree Unit) obtained by dividing a slice, coding units (CU: Coding Unit) obtained by dividing a coding tree unit, and prediction units (PU) and transform units (TU), which are blocks obtained by dividing a coding unit, and are encoded/decoded for each CU.

In such a moving image encoding method, a predicted image is usually generated based on a locally decoded image obtained by encoding/decoding an input image, and the prediction residual (sometimes referred to as a “difference image” or “residual image”) obtained by subtracting the predicted image from the input image (original image) is encoded. Methods for generating a predicted image include inter-screen prediction (inter prediction) and intra-screen prediction (intra prediction).

Non-Patent Documents 1 to 3 can be cited as recent moving image encoding and decoding techniques. Non-Patent Document 1 describes a technique related to BIO (bi-directional optical flow; bi-prediction gradient change) prediction, in which the motion compensation processing performed when generating a predicted image applies a correction using gradient information. Non-Patent Document 2 describes a technique using affine prediction. Non-Patent Document 3 describes a technique in which the encoding device and the decoding device each search for a motion vector by matching.

When a predicted image is generated using the above-described technique, the memory bandwidth becomes large, so a technique for reducing the memory bandwidth is required.

The present invention aims to reduce the memory bandwidth.

In order to solve the above problem, a predicted image generation device according to one aspect of the present invention is a predicted image generation device that generates a predicted image by performing motion compensation on one or more reference images, and includes a predicted image generation unit that generates a predicted image using at least one of a single prediction mode, which generates a predicted image with reference to a first reference image, a bi-prediction mode, which generates a predicted image with reference to the first reference image and a second reference image, and a BIO mode, which generates a predicted image with reference to the first reference image, the second reference image, and a gradient correction term. The predicted image generation unit prohibits generation of a predicted image using the BIO mode when a reference block in at least one of the first reference image and the second reference image is outside the screen of the reference image.

In order to solve the above problem, a predicted image generation device according to one aspect of the present invention is a predicted image generation device that generates a predicted image by performing motion compensation on a plurality of reference images, and includes a predicted image generation unit that generates a predicted image using at least one of a plurality of modes including a BIO mode, which generates a predicted image with reference to a first reference image, a second reference image, and a gradient correction term. The predicted image generation unit generates out-of-read-region pixel values, which are pixel values outside the read region associated with the corresponding block in at least one of the first reference image and the second reference image. When a predicted image is generated using the BIO mode, the generation of out-of-read-region pixel values by the predicted image generation unit along the vertical direction or the horizontal direction is prohibited.

In order to solve the above problem, a motion vector derivation device according to one aspect of the present invention is a motion vector derivation device that derives a motion vector for each of the sub-blocks constituting a target block, and includes a motion vector derivation unit that calculates the motion vector of each sub-block with reference to motion vectors at a plurality of control points set in a reference block sharing a vertex with the target block. The motion vector derivation unit restricts the range of the differences between the motion vectors at the plurality of control points.

In order to solve the above problem, a motion vector derivation device according to one aspect of the present invention is a motion vector derivation device that derives a motion vector referred to in generating a predicted image used for encoding or decoding a moving image, and includes a first motion vector search unit that searches for a motion vector for each prediction block by matching processing, and a second motion vector search unit that, with reference to the motion vector selected by the first motion vector search unit, searches for a motion vector by matching processing for each of a plurality of sub-blocks included in the prediction block. The first motion vector search unit searches for a motion vector by performing a local search after performing an initial vector search for the prediction block, and the second motion vector search unit searches for a motion vector by performing a local search after performing an initial vector search for the sub-block. In at least one of the local search by the first motion vector search unit and the local search by the second motion vector search unit, a search in an oblique direction is prohibited.

According to the above configuration, the memory bandwidth can be reduced.

A diagram showing the hierarchical structure of the data in the encoded stream.
A diagram showing the patterns of the PU partition modes. (a) to (h) show the partition shapes when the PU partition mode is 2Nx2N, 2NxN, 2NxnU, 2NxnD, Nx2N, nLx2N, nRx2N, and NxN, respectively.
A conceptual diagram showing an example of reference pictures and reference picture lists.
A block diagram showing the configuration of the image encoding device according to Embodiment 1.
A block diagram showing the configuration of the image decoding device according to Embodiment 1.
A block diagram showing the configuration of the inter predicted image generation unit of the image encoding device according to Embodiment 1.
A block diagram showing the configuration of the merge prediction parameter derivation unit according to Embodiment 1.
A block diagram showing the configuration of the AMVP prediction parameter derivation unit according to Embodiment 1.
A block diagram showing the configuration of the inter prediction parameter encoding unit of the image encoding device according to Embodiment 1.
A block diagram showing the configuration of the inter predicted image generation unit of the image decoding device according to Embodiment 1.
A block diagram showing the configuration of the inter prediction parameter decoding unit according to Embodiment 1.
A flowchart explaining the flow of processing in which the motion compensation unit 3091, which has a motion compensation function using BIO prediction, derives a predicted image.
A diagram showing an example of the flow for deriving the concept of gradient change.
A diagram explaining a method of deriving a correction weight vector that depends on a target image and a reference image.
(a) is a specific example of the motion compensation filter mcFilter, and (b) is a specific example of the gradient filter.
A diagram explaining a method of deriving a correction weight vector using the least squares method.
A block diagram showing a configuration example of the motion compensation unit.
A flowchart showing an example of the flow of the BIO prediction processing performed by the motion compensation unit.
A diagram showing the region in which the motion compensation unit in the image decoding device according to Embodiment 1 performs BIO padding.
A diagram showing an example in which the motion compensation unit in the image decoding device according to Embodiment 1 performs off-screen padding.
A flowchart showing an example of the flow of processing in which the motion compensation unit in the image decoding device according to Embodiment 1 generates a predicted image.
A flowchart showing another example of the flow of processing in which the motion compensation unit in the image decoding device according to Embodiment 1 generates a predicted image.
A block diagram showing the configuration of the image decoding device according to Embodiment 2.
A schematic diagram showing the configuration of the inter predicted image generation unit of the image encoding device according to Embodiment 2.
A diagram showing the region in which the motion compensation unit in the image decoding device according to Embodiment 2 performs BIO padding.
A diagram showing the region in which the motion compensation unit in the image decoding device according to Embodiment 2 performs BIO padding.
A diagram showing motion vectors in adjacent blocks.
A flowchart showing an example of the flow of processing in which the motion compensation unit in the image decoding device according to Embodiment 2 determines whether to prohibit BIO padding in the horizontal direction.
A flowchart showing an example of the flow of processing in which the motion compensation unit in the image decoding device according to Embodiment 2 determines whether to prohibit BIO padding in the vertical direction.
A flowchart showing an example of the flow of processing in which the motion compensation unit in the image decoding device according to Embodiment 2 determines whether to prohibit BIO padding in the horizontal direction or the vertical direction.
A schematic diagram of the corresponding block referred to when the motion compensation unit in the image decoding device according to Embodiment 2 uses the BIO mode.
A flowchart showing the operation of the motion vector decoding processing of the image decoding device according to Embodiment 3.
A block diagram showing the configuration of the inter prediction parameter encoding unit of the image encoding device according to Embodiment 3.
A block diagram showing the configuration of the inter prediction parameter decoding unit according to Embodiment 3.
A diagram showing an example of deriving the motion vector spMvLX[xi][yj] of each sub-block constituting the PU (width nPbW) whose motion vector is to be predicted.
(a) is a diagram for explaining bilateral matching. (b) is a diagram for explaining template matching.
A diagram showing an example of the positions of the prediction units used for deriving the motion vectors of the control points in the AMVP mode and the merge mode.
A diagram showing an example of square sub-blocks with side BW in a target block (width W, height H) whose motion vector is to be predicted and whose control point V0 is located at the upper-left vertex.
A flowchart showing an outline of the sub-block size determination flow.
A block diagram showing the configuration of the image encoding device according to Embodiment 4.
A block diagram showing the configuration of the image decoding device according to Embodiment 4.
A block diagram showing the configuration of the inter prediction parameter encoding unit of the image encoding device according to Embodiment 4.
A block diagram showing the configuration of the inter prediction parameter decoding unit according to Embodiment 4.
A flowchart showing an outline of the motion prediction mode determination flow.
A sequence diagram showing the flow of the motion prediction mode determination.
A flowchart showing the flow of the pattern match vector derivation process.
(a) and (b) are diagrams for explaining motion search patterns.
A flowchart showing an outline of the flow for determining whether a predicted image is generated.
A flowchart showing an outline of the flow for determining whether a predicted image is generated.
A diagram showing the configurations of a transmission device equipped with the image encoding device and a reception device equipped with the image decoding device. (a) shows the transmission device equipped with the image encoding device, and (b) shows the reception device equipped with the image decoding device.
A diagram showing the configurations of a recording device equipped with the image encoding device and a playback device equipped with the image decoding device. (a) shows the recording device equipped with the image encoding device, and (b) shows the playback device equipped with the image decoding device.
A schematic diagram showing the configuration of the image transmission system.

Embodiment 1
Hereinafter, embodiments of the present invention will be described with reference to the drawings.

FIG. 52 is a schematic diagram illustrating a configuration of the image transmission system 1 according to the first embodiment.

The image transmission system 1 is a system that transmits a code obtained by encoding an encoding target image, decodes the transmitted code, and displays an image. The image transmission system 1 includes an image encoding device (moving image encoding device) 11, a network 21, an image decoding device (moving image decoding device) 31, and an image display device 41.

The image encoding device 11 receives an image T representing a single-layer image or images of a plurality of layers. A layer is a concept used to distinguish a plurality of pictures when one or more pictures exist for a given time. For example, encoding the same picture in a plurality of layers with different image quality or resolution is scalable coding, and encoding pictures of different viewpoints in a plurality of layers is view scalable coding. When prediction is performed between pictures of a plurality of layers (inter-layer prediction, inter-view prediction), coding efficiency is greatly improved. Even when prediction is not performed (simulcast), the encoded data can be combined.

The network 21 transmits the encoded stream Te generated by the image encoding device 11 to the image decoding device 31. The network 21 is the Internet, a wide area network (WAN: Wide Area Network), a local area network (LAN: Local Area Network), or a combination thereof. The network 21 is not necessarily limited to a bidirectional communication network, and may be a one-way communication network that transmits broadcast waves such as terrestrial digital broadcasting and satellite broadcasting. The network 21 may also be replaced by a storage medium on which the encoded stream Te is recorded, such as a DVD (Digital Versatile Disc) or a BD (Blu-ray Disc).

The image decoding device 31 decodes each encoded stream Te transmitted over the network 21 and generates one or more decoded images Td.

The image display device 41 displays all or some of the one or more decoded images Td generated by the image decoding device 31. The image display device 41 includes a display device such as a liquid crystal display or an organic EL (electroluminescence) display. In spatial scalable coding and SNR scalable coding, when the image decoding device 31 and the image display device 41 have high processing capability, a high-quality enhancement layer image is displayed; when they have only lower processing capability, a base layer image, which does not require processing and display capability as high as the enhancement layer, is displayed.

<Operator>
The operators used in this specification are described below.

>> is a right bit shift, << is a left bit shift, & is a bitwise AND, | is a bitwise OR, and |= is the OR assignment operator.

x ? y : z is a ternary operator that takes y when x is true (non-zero) and z when x is false (0).

Clip3(a, b, c) is a function that clips c to a value between a and b inclusive: it returns a if c < a, returns b if c > b, and returns c otherwise (where a <= b).
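
The clipping function defined above can be written directly in C. The following is a minimal sketch that assumes integer arguments; the specification does not state the operand type, so `int` is an illustrative choice.

```c
/* Clip3(a, b, c): clamp c into the closed range [a, b]; assumes a <= b.
 * The int operand type is an assumption made for this sketch. */
int Clip3(int a, int b, int c)
{
    if (c < a) {
        return a;   /* below the range: saturate at the lower bound */
    }
    if (c > b) {
        return b;   /* above the range: saturate at the upper bound */
    }
    return c;       /* already inside the range */
}
```

For example, clamping to an 8-bit sample range would be written as Clip3(0, 255, value).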

<Structure of encoded stream Te>
Prior to a detailed description of the image encoding device 11 and the image decoding device 31 according to the first embodiment, the data structure of the encoded stream Te generated by the image encoding device 11 and decoded by the image decoding device 31 will be described.

FIG. 1 is a diagram showing the hierarchical structure of the data in the encoded stream Te. The encoded stream Te illustratively includes a sequence and a plurality of pictures constituting the sequence. (a) to (f) of FIG. 1 respectively show an encoded video sequence that defines a sequence SEQ, an encoded picture that defines a picture PICT, an encoded slice that defines a slice S, encoded slice data that defines slice data, a coding tree unit included in the encoded slice data, and a coding unit (CU: Coding Unit) included in the coding tree unit.

(Encoded video sequence)
In the encoded video sequence, a set of data referred to by the image decoding device 31 in order to decode the sequence SEQ to be processed is defined. As shown in FIG. 1A, the sequence SEQ includes a video parameter set VPS (Video Parameter Set), a sequence parameter set SPS (Sequence Parameter Set), a picture parameter set PPS (Picture Parameter Set), a picture PICT, and supplemental enhancement information SEI (Supplemental Enhancement Information). Here, the value shown after # indicates the layer ID. FIG. 1 shows an example in which encoded data of #0 and #1, that is, layer 0 and layer 1, exist, but the types and number of layers are not limited to this.

In the video parameter set VPS, for a moving image composed of a plurality of layers, a set of encoding parameters common to a plurality of moving images, and a set of encoding parameters related to the plurality of layers included in the moving image and to the individual layers, are defined.

The sequence parameter set SPS defines a set of encoding parameters that the image decoding device 31 refers to in order to decode the target sequence. For example, the width and height of the picture are defined. A plurality of SPSs may exist. In that case, one of a plurality of SPSs is selected from the PPS.

In the picture parameter set PPS, a set of encoding parameters referred to by the image decoding device 31 in order to decode each picture in the target sequence is defined. For example, a reference value (pic_init_qp_minus26) of a quantization width used for picture decoding and a flag (weighted_pred_flag) indicating application of weighted prediction are included. There may be a plurality of PPSs. In that case, one of a plurality of PPSs is selected from each picture in the target sequence.

(Encoded picture)
In the coded picture, a set of data referred to by the image decoding device 31 in order to decode the picture PICT to be processed is defined. As shown in FIG. 1B, the picture PICT includes slices S_0 to S_{NS-1} (NS is the total number of slices included in the picture PICT).

In the following description, when it is not necessary to distinguish the slices S_0 to S_{NS-1} from one another, the subscripts may be omitted. The same applies to other subscripted data included in the encoded stream Te described below.

(Encoded slice)
In the coded slice, a set of data referred to by the image decoding device 31 for decoding the slice S to be processed is defined. As shown in FIG. 1C, the slice S includes a slice header SH and slice data SDATA.

The slice header SH includes an encoding parameter group that is referred to by the image decoding device 31 in order to determine a decoding method of the target slice. Slice type designation information (slice_type) for designating a slice type is an example of an encoding parameter included in the slice header SH.

Slice types that can be designated by the slice type designation information include (1) I slices, which use only intra prediction at the time of encoding, (2) P slices, which use unidirectional prediction or intra prediction at the time of encoding, and (3) B slices, which use unidirectional prediction, bidirectional prediction, or intra prediction at the time of encoding.

Note that the slice header SH may include a reference (pic_parameter_set_id) to the picture parameter set PPS included in the encoded video sequence.

(Encoded slice data)
In the encoded slice data, a set of data referred to by the image decoding device 31 in order to decode the slice data SDATA to be processed is defined. The slice data SDATA includes coding tree units (CTU: Coding Tree Unit), as shown in (d) of FIG. 1. A CTU is a block of a fixed size (for example, 64x64) constituting a slice, and is sometimes called a largest coding unit (LCU: Largest Coding Unit).

(Encoding tree unit)
As shown in (e) of FIG. 1, the coding tree unit defines a set of data referred to by the image decoding device 31 in order to decode the coding tree unit to be processed. The coding tree unit is divided by recursive quadtree partitioning. A node of the tree structure obtained by recursive quadtree partitioning is referred to as a coding node (CN). An intermediate node of the quadtree is a coding node, and the coding tree unit itself is defined as the highest-level coding node. The CTU includes a split flag (cu_split_flag); when cu_split_flag is 1, the node is split into four coding nodes CN. When cu_split_flag is 0, the coding node CN is not split and has one coding unit (CU: Coding Unit) as a node. The coding unit CU is a terminal node of the coding nodes and is not divided further. The coding unit CU is the basic unit of the encoding process.

In addition, when the size of the coding tree unit CTU is 64 × 64 pixels, the size of the coding unit can be any of 64 × 64 pixels, 32 × 32 pixels, 16 × 16 pixels, and 8 × 8 pixels.
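
The recursive quadtree partitioning described above can be sketched as follows. This is an illustrative sketch, not the decoder of the embodiment: `FlagReader` is a hypothetical stand-in for the entropy decoder, the four children are visited in upper-left, upper-right, lower-left, lower-right order (an assumption), and no cu_split_flag is read once the node reaches the minimum size (also an assumption, chosen so that a 64x64 CTU yields only the CU sizes 64, 32, 16, and 8).

```c
#include <stddef.h>

/* Hypothetical stand-in for the entropy decoder: returns successive
 * cu_split_flag values from a caller-supplied array, then 0. */
typedef struct {
    const int *flags;
    size_t count;
    size_t pos;
} FlagReader;

/* One leaf of the quadtree: upper-left position and side length in pixels. */
typedef struct {
    int x, y, size;
} CodingUnit;

static int next_split_flag(FlagReader *r)
{
    return (r->pos < r->count) ? r->flags[r->pos++] : 0;
}

/* Recursively partition the coding node at (x, y) with side `size`.
 * Leaf CUs are appended to out[n...]; the new leaf count is returned.
 * The caller must provide room for (ctu_size / min_size)^2 entries.
 * Assumption: no flag is read once `size` reaches `min_size`. */
int split_coding_node(FlagReader *r, int x, int y, int size, int min_size,
                      CodingUnit *out, int n)
{
    int split = (size > min_size) ? next_split_flag(r) : 0;
    if (!split) {
        /* cu_split_flag == 0: this coding node holds exactly one CU. */
        out[n].x = x;
        out[n].y = y;
        out[n].size = size;
        return n + 1;
    }
    /* cu_split_flag == 1: split into four coding nodes of half the side. */
    int half = size / 2;
    n = split_coding_node(r, x,        y,        half, min_size, out, n);
    n = split_coding_node(r, x + half, y,        half, min_size, out, n);
    n = split_coding_node(r, x,        y + half, half, min_size, out, n);
    n = split_coding_node(r, x + half, y + half, half, min_size, out, n);
    return n;
}
```

Calling split_coding_node with a 64x64 root, a minimum size of 8, and the flag sequence 1, 0, 1, 0, 0, 0, 0, 0, 0 produces one 32x32 CU, four 16x16 CUs, and two further 32x32 CUs.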

(Encoding unit)
As shown in (f) of FIG. 1, the coding unit defines a set of data referred to by the image decoding device 31 in order to decode the coding unit to be processed. Specifically, the coding unit includes a prediction tree, a transform tree, and a CU header CUH. The CU header defines the prediction mode, the division method (PU partition mode), and the like.

In the prediction tree, the prediction information (reference picture index, motion vector, and the like) of each prediction unit (PU) obtained by dividing the coding unit into one or more parts is defined. In other words, a prediction unit is one of one or more non-overlapping regions constituting the coding unit. The prediction tree includes the one or more prediction units obtained by the above division. Hereinafter, a prediction block obtained by further dividing a prediction unit is referred to as a “sub-block”. A sub-block is composed of a plurality of pixels. When the prediction unit and the sub-block are the same size, the prediction unit contains one sub-block. When the prediction unit is larger than the sub-block, the prediction unit is divided into sub-blocks. For example, when the prediction unit is 8x8 and the sub-block is 4x4, the prediction unit is divided into four sub-blocks, two horizontally by two vertically.
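
The sub-block count in the example above follows from simple tiling. The sketch below assumes that the prediction unit dimensions are exact multiples of the sub-block dimensions; the function name and parameters are illustrative and do not appear in the specification.

```c
/* Number of sub-blocks obtained by tiling a pu_w x pu_h prediction unit
 * with sb_w x sb_h sub-blocks.
 * Assumption: pu_w and pu_h are exact multiples of sb_w and sb_h. */
int num_sub_blocks(int pu_w, int pu_h, int sb_w, int sb_h)
{
    int across = pu_w / sb_w;   /* sub-blocks per row */
    int down   = pu_h / sb_h;   /* sub-blocks per column */
    return across * down;
}
```

With an 8x8 prediction unit and 4x4 sub-blocks this gives four sub-blocks, and with equal sizes it gives one.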

The prediction process may be performed for each prediction unit (sub block).

Broadly speaking, there are two types of division in the prediction tree: the case of intra prediction and the case of inter prediction. Intra prediction is prediction within the same picture, and inter prediction is prediction processing performed between different pictures (for example, between display times or between layer images).

In the case of intra prediction, there are 2Nx2N (the same size as the encoding unit) and NxN division methods.

In the case of inter prediction, the division method is encoded by the PU partition mode (part_mode) of the encoded data, and includes 2Nx2N (the same size as the coding unit), 2NxN, 2NxnU, 2NxnD, Nx2N, nLx2N, nRx2N, and NxN. 2NxN and Nx2N indicate 1:1 symmetric division, and 2NxnU, 2NxnD, nLx2N, and nRx2N indicate 1:3 or 3:1 asymmetric division. The PUs included in the CU are denoted PU0, PU1, PU2, and PU3 in order.

(a) to (h) of FIG. 2 specifically illustrate the partition shapes (the positions of the PU partition boundaries) in each PU partition mode. (a) of FIG. 2 shows the 2Nx2N partition, and (b), (c), and (d) show the 2NxN, 2NxnU, and 2NxnD partitions (horizontally long partitions), respectively. (e), (f), and (g) show the Nx2N, nLx2N, and nRx2N partitions (vertically long partitions), respectively, and (h) shows the NxN partition. The horizontally long and vertically long partitions are collectively referred to as rectangular partitions, and 2Nx2N and NxN are collectively referred to as square partitions.
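
The partition shapes can be made concrete by computing the PU rectangles for a given mode. The following is a sketch under stated assumptions: the `PartMode` enumerators are illustrative labels listed in the order (a) to (h) and are not the coded values of part_mode; the placement of the short side (upper for 2NxnU, lower for 2NxnD, left for nLx2N, right for nRx2N) and the PU0 to PU3 ordering for NxN follow the usual HEVC convention, which this section does not spell out.

```c
/* Illustrative labels for the PU partition modes, in the order (a) to (h);
 * the numeric values are NOT the bitstream coding of part_mode. */
typedef enum {
    PART_2Nx2N, PART_2NxN, PART_2NxnU, PART_2NxnD,
    PART_Nx2N, PART_nLx2N, PART_nRx2N, PART_NxN
} PartMode;

typedef struct {
    int x, y, w, h;   /* position inside the CU and size, in pixels */
} Rect;

/* Fill pu[] with PU0, PU1, ... for a square CU of side s (= 2N).
 * Returns the number of PUs, or 0 for an unknown mode. */
int pu_partitions(PartMode mode, int s, Rect pu[4])
{
    int h = s / 2;   /* N: half of the CU side (1:1 split) */
    int q = s / 4;   /* short side of an asymmetric 1:3 or 3:1 split */

    switch (mode) {
    case PART_2Nx2N:
        pu[0] = (Rect){0, 0, s, s};
        return 1;
    case PART_2NxN:
        pu[0] = (Rect){0, 0, s, h};
        pu[1] = (Rect){0, h, s, h};
        return 2;
    case PART_2NxnU:   /* boundary in the upper part: 1:3 */
        pu[0] = (Rect){0, 0, s, q};
        pu[1] = (Rect){0, q, s, s - q};
        return 2;
    case PART_2NxnD:   /* boundary in the lower part: 3:1 */
        pu[0] = (Rect){0, 0, s, s - q};
        pu[1] = (Rect){0, s - q, s, q};
        return 2;
    case PART_Nx2N:
        pu[0] = (Rect){0, 0, h, s};
        pu[1] = (Rect){h, 0, h, s};
        return 2;
    case PART_nLx2N:   /* boundary in the left part: 1:3 */
        pu[0] = (Rect){0, 0, q, s};
        pu[1] = (Rect){q, 0, s - q, s};
        return 2;
    case PART_nRx2N:   /* boundary in the right part: 3:1 */
        pu[0] = (Rect){0, 0, s - q, s};
        pu[1] = (Rect){s - q, 0, q, s};
        return 2;
    case PART_NxN:
        pu[0] = (Rect){0, 0, h, h};
        pu[1] = (Rect){h, 0, h, h};
        pu[2] = (Rect){0, h, h, h};
        pu[3] = (Rect){h, h, h, h};
        return 4;
    }
    return 0;
}
```

For a 32x32 CU, 2NxnU yields a 32x8 PU0 above a 32x24 PU1, which is the 1:3 asymmetric division mentioned above.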

In the transform tree, the coding unit is divided into one or more transform units (TU), and the position and size of each transform unit are defined. In other words, a transform unit is one of one or more non-overlapping regions constituting the coding unit. The transform tree includes the one or more transform units obtained by the above division.

Division in the transform tree includes the case where a region of the same size as the coding unit is assigned as a transform unit, and the case where recursive quadtree partitioning is used, as in the CU division described above.

Transform processing is performed for each transform unit.

(Prediction parameter)
The predicted image of a prediction unit (PU: Prediction Unit) is derived from the prediction parameters associated with the PU. The prediction parameters are either prediction parameters for intra prediction or prediction parameters for inter prediction. The prediction parameters for inter prediction (inter prediction parameters) are described below. The inter prediction parameters consist of the prediction list use flags predFlagL0 and predFlagL1, the reference picture indexes refIdxL0 and refIdxL1, and the motion vectors mvL0 and mvL1. The prediction list use flags predFlagL0 and predFlagL1 indicate whether the reference picture lists called the L0 list and the L1 list, respectively, are used; the corresponding reference picture list is used when the value is 1. In this specification, a “flag indicating whether or not XX” means XX when the flag is other than 0 (for example, 1) and not XX when it is 0, and in logical negation, logical product, and the like, 1 is treated as true and 0 as false (the same applies hereinafter). In actual devices and methods, however, other values may be used as the true and false values.

Syntax elements for deriving the inter prediction parameters included in the encoded data include, for example, the PU partition mode part_mode, the merge flag merge_flag, the merge index merge_idx, the inter prediction identifier inter_pred_idc, the reference picture index refIdxLX, the prediction vector index mvp_LX_idx, and the difference vector mvdLX.

(Reference picture list)
A reference picture list is a list of reference pictures stored in the reference picture memory 306. FIG. 3 is a conceptual diagram showing an example of reference pictures and reference picture lists. In (a) of FIG. 3, each rectangle is a picture, each arrow is a picture reference relationship, the horizontal axis is time, the letters I, P, and B in the rectangles denote an intra picture, a uni-prediction picture, and a bi-prediction picture, respectively, and the numbers in the rectangles indicate the decoding order. As shown in the figure, the decoding order of the pictures is I0, P1, B2, B3, B4, and the display order is I0, B3, B2, B4, P1. (b) of FIG. 3 shows an example of the reference picture lists. A reference picture list is a list of candidate reference pictures, and one picture (slice) may have one or more reference picture lists. In the illustrated example, the target picture B3 has two reference picture lists, the L0 list RefPicList0 and the L1 list RefPicList1. When the target picture is B3, the reference pictures are I0, P1, and B2, and the reference picture lists have these pictures as elements. For each prediction unit, the reference picture index refIdxLX specifies which picture in the reference picture list RefPicListX is actually referred to. The figure shows an example in which the reference pictures P1 and B2 are referred to by refIdxL0 and refIdxL1.

(Merge prediction and AMVP prediction)
The prediction parameter decoding (encoding) method includes a merge prediction (merge) mode and an AMVP (Adaptive Motion Vector Prediction) mode. The merge flag merge_flag is a flag for identifying these. The merge prediction mode is a mode in which the prediction list usage flag predFlagLX (or inter prediction identifier inter_pred_idc), the reference picture index refIdxLX, and the motion vector mvLX are not included in the encoded data and are derived from the prediction parameters of already processed neighboring PUs. The AMVP mode is a mode in which the inter prediction identifier inter_pred_idc, the reference picture index refIdxLX, and the motion vector mvLX are included in the encoded data. The motion vector mvLX is encoded as a prediction vector index mvp_LX_idx for identifying the prediction vector mvpLX and a difference vector mvdLX.

The inter prediction identifier inter_pred_idc is a value indicating the type and number of reference pictures, and takes one of PRED_L0, PRED_L1, and PRED_BI. PRED_L0 and PRED_L1 indicate that reference pictures managed in the reference picture lists of the L0 list and the L1 list are used, respectively, and that one reference picture (first reference picture) is used (single prediction mode). PRED_BI indicates that two reference pictures (first reference picture and second reference picture) are used (bi-prediction BiPred mode), and reference pictures managed in the L0 list and the L1 list are used. The prediction vector index mvp_LX_idx is an index indicating a prediction vector, and the reference picture index refIdxLX is an index indicating a reference picture managed in the reference picture list. Note that LX is a description method used when L0 prediction and L1 prediction are not distinguished from each other. By replacing LX with L0 or L1, parameters for the L0 list and parameters for the L1 list are distinguished.

The merge index merge_idx is an index indicating which of the prediction parameter candidates (merge candidates) derived from PUs for which processing has been completed is used as the prediction parameter of the decoding target PU.

(Motion vector)
The motion vector mvLX indicates a shift amount between blocks on two different pictures. A prediction vector and a difference vector related to the motion vector mvLX are referred to as a prediction vector mvpLX and a difference vector mvdLX, respectively.

(Inter prediction identifier inter_pred_idc and prediction list use flag predFlagLX)
The relationship between the inter prediction identifier inter_pred_idc and the prediction list use flags predFlagL0 and predFlagL1 is as follows and can be converted into each other.

inter_pred_idc = (predFlagL1 << 1) + predFlagL0
predFlagL0 = inter_pred_idc & 1
predFlagL1 = inter_pred_idc >> 1
Note that a prediction list use flag or an inter prediction identifier may be used as the inter prediction parameter. Further, the determination using the prediction list use flag may be replaced with the determination using the inter prediction identifier. Conversely, the determination using the inter prediction identifier may be replaced with the determination using the prediction list use flag.
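The mutual conversion above can be sketched as follows. This is a minimal illustration only; the function names are hypothetical, and the numeric values PRED_L0 = 1, PRED_L1 = 2 and PRED_BI = 3 are assumed because they are the values consistent with the conversion formulas.

```python
# Assumed numeric values, chosen to be consistent with the conversion formulas.
PRED_L0, PRED_L1, PRED_BI = 1, 2, 3


def to_inter_pred_idc(pred_flag_l0: int, pred_flag_l1: int) -> int:
    """inter_pred_idc = (predFlagL1 << 1) + predFlagL0."""
    return (pred_flag_l1 << 1) + pred_flag_l0


def to_pred_flags(inter_pred_idc: int):
    """Return (predFlagL0, predFlagL1) recovered from inter_pred_idc."""
    return inter_pred_idc & 1, inter_pred_idc >> 1


# L0 only, L1 only, and both lists in use.
idc_values = [to_inter_pred_idc(f0, f1) for f0, f1 in [(1, 0), (0, 1), (1, 1)]]
```

Under these assumed values, converting flags to the identifier and back returns the original flags for every valid combination.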

(Determination of bi-prediction biPred mode)
The flag biPred indicating whether the mode is the bi-prediction BiPred mode can be derived based on whether both of the two prediction list use flags are 1. For example, it can be derived by the following equation.

biPred = (predFlagL0 == 1 && predFlagL1 == 1)
The flag biPred can also be derived depending on whether or not the inter prediction identifier is a value indicating that two prediction lists (reference pictures) are used. For example, it can be derived by the following equation.

biPred = (inter_pred_idc == PRED_BI)? 1: 0
The above formula can also be expressed by the following formula.

biPred = (inter_pred_idc == PRED_BI)
For example, a value of 3 can be used for PRED_BI.
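As a small check, the two derivations of biPred can be compared for every valid flag combination. The sketch below assumes PRED_BI = 3 as stated above; the function names are hypothetical.

```python
PRED_BI = 3  # value given in the text as an example


def bi_pred_from_flags(pred_flag_l0: int, pred_flag_l1: int) -> int:
    """biPred = (predFlagL0 == 1 && predFlagL1 == 1)."""
    return int(pred_flag_l0 == 1 and pred_flag_l1 == 1)


def bi_pred_from_idc(inter_pred_idc: int) -> int:
    """biPred = (inter_pred_idc == PRED_BI)."""
    return int(inter_pred_idc == PRED_BI)


# Compare both derivations; the identifier is rebuilt from the flags
# with inter_pred_idc = (predFlagL1 << 1) + predFlagL0.
agreement = [
    bi_pred_from_flags(f0, f1) == bi_pred_from_idc((f1 << 1) + f0)
    for f0, f1 in [(1, 0), (0, 1), (1, 1)]
]
```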

(Configuration of image decoding device)
Next, the configuration of the image decoding device 31 according to the first embodiment will be described. FIG. 5 is a block diagram illustrating a configuration of the image decoding device 31 according to the first embodiment. The image decoding device 31 includes an entropy decoding unit 301, a prediction parameter decoding unit 302, a loop filter 305, a reference picture memory 306, a prediction parameter memory 307, a prediction image generation unit (prediction image generation device) 308, an inverse quantization / inverse transform unit 311, and an addition unit 312.

The prediction parameter decoding unit 302 includes an inter prediction parameter decoding unit 303 and an intra prediction parameter decoding unit 304. The predicted image generation unit 308 includes an inter predicted image generation unit 309 and an intra predicted image generation unit 310.

The entropy decoding unit 301 performs entropy decoding on the coded stream Te input from the outside, and separates and decodes individual codes (syntax elements). The separated codes include prediction information for generating a prediction image, residual information for generating a difference image, and the like.

The entropy decoding unit 301 outputs a part of the separated codes to the prediction parameter decoding unit 302. The part of the separated codes is, for example, the prediction mode predMode, the PU partition mode part_mode, the merge flag merge_flag, the merge index merge_idx, the inter prediction identifier inter_pred_idc, the reference picture index refIdxLX, the prediction vector index mvp_LX_idx, and the difference vector mvdLX. Control of which code is decoded is performed based on an instruction from the prediction parameter decoding unit 302. The entropy decoding unit 301 outputs the quantized coefficient to the inverse quantization / inverse transform unit 311. This quantized coefficient is a coefficient obtained in the encoding process by performing a frequency transform such as DCT (Discrete Cosine Transform), DST (Discrete Sine Transform), or KLT (Karhunen-Loeve Transform) and quantizing the result.

The inter prediction parameter decoding unit 303 decodes the inter prediction parameter with reference to the prediction parameter stored in the prediction parameter memory 307 based on the code input from the entropy decoding unit 301.

The inter prediction parameter decoding unit 303 outputs the decoded inter prediction parameter to the prediction image generation unit 308 and stores it in the prediction parameter memory 307. Details of the inter prediction parameter decoding unit 303 will be described later.

The intra prediction parameter decoding unit 304 refers to the prediction parameter stored in the prediction parameter memory 307 on the basis of the code input from the entropy decoding unit 301 and decodes the intra prediction parameter. The intra prediction parameter is a parameter used in the process of predicting a CU within one picture, for example, an intra prediction mode IntraPredMode. The intra prediction parameter decoding unit 304 outputs the decoded intra prediction parameter to the prediction image generation unit 308 and stores it in the prediction parameter memory 307.

The intra prediction parameter decoding unit 304 may derive different intra prediction modes for luminance and color difference. In this case, the intra prediction parameter decoding unit 304 decodes the luminance prediction mode IntraPredModeY as the luminance prediction parameter and the color difference prediction mode IntraPredModeC as the color difference prediction parameter. The luminance prediction mode IntraPredModeY has 35 modes, which correspond to planar prediction (0), DC prediction (1), and direction prediction (2 to 34). The color difference prediction mode IntraPredModeC uses one of planar prediction (0), DC prediction (1), direction prediction (2 to 34), and LM mode (35). The intra prediction parameter decoding unit 304 decodes a flag indicating whether IntraPredModeC is the same mode as the luminance mode. If the flag indicates the same mode as the luminance mode, the intra prediction parameter decoding unit 304 assigns IntraPredModeY to IntraPredModeC. If the flag indicates a mode different from the luminance mode, the intra prediction parameter decoding unit 304 may decode planar prediction (0), DC prediction (1), direction prediction (2 to 34), or LM mode (35) as IntraPredModeC.

The loop filter 305 applies filters such as a deblocking filter, a sample adaptive offset (SAO), and an adaptive loop filter (ALF) to the decoded image of the CU generated by the adding unit 312.

The reference picture memory 306 stores the decoded image of the CU generated by the adding unit 312 at a predetermined position for each decoding target picture and CU.

The prediction parameter memory 307 stores the prediction parameter in a predetermined position for each decoding target picture and prediction unit (or sub-block, fixed-size block or pixel). Specifically, the prediction parameter memory 307 stores the inter prediction parameter decoded by the inter prediction parameter decoding unit 303, the intra prediction parameter decoded by the intra prediction parameter decoding unit 304, and the prediction mode predMode separated by the entropy decoding unit 301. The stored inter prediction parameters include, for example, a prediction list use flag predFlagLX (inter prediction identifier inter_pred_idc), a reference picture index refIdxLX, and a motion vector mvLX.

The prediction image generation unit 308 receives the prediction mode predMode input from the entropy decoding unit 301 and the prediction parameter from the prediction parameter decoding unit 302. Further, the predicted image generation unit 308 reads a reference picture from the reference picture memory 306. The prediction image generation unit 308 generates a prediction image of a PU or sub-block using the input prediction parameter and the read reference picture (reference picture block) according to the prediction mode indicated by the prediction mode predMode.

Here, when the prediction mode predMode indicates the inter prediction mode, the inter prediction image generation unit 309 generates a prediction image of the PU or sub-block using the inter prediction parameter input from the inter prediction parameter decoding unit 303 and the read reference picture (reference picture block).

For the reference picture list (L0 list or L1 list) in which the prediction list use flag predFlagLX is 1, the inter predicted image generation unit 309 reads from the reference picture memory 306 the reference picture block located, in the reference picture indicated by the reference picture index refIdxLX, at the position indicated by the motion vector mvLX relative to the decoding target PU. The inter prediction image generation unit 309 performs prediction based on the read reference picture block to generate a prediction image of the PU. The inter prediction image generation unit 309 outputs the generated prediction image of the PU to the addition unit 312. Here, the reference picture block is a set of pixels on the reference picture (usually called a block because it is a rectangle), and is an area that is referred to in order to generate a predicted image of a PU or sub-block.

When the prediction mode predMode indicates the intra prediction mode, the intra predicted image generation unit 310 performs intra prediction using the intra prediction parameter input from the intra prediction parameter decoding unit 304 and the read reference picture. Specifically, the intra predicted image generation unit 310 reads, from the reference picture memory 306, neighboring PUs that belong to the decoding target picture and lie within a predetermined range from the decoding target PU among the PUs that have already been decoded. The predetermined range is, for example, one of the left, upper left, upper and upper right adjacent PUs when the decoding target PU sequentially moves in the so-called raster scan order, and varies depending on the intra prediction mode. The raster scan order is the order in which, in each picture, the rows from the upper end to the lower end are traversed in turn, each from the left end to the right end.

The intra prediction image generation unit 310 generates a prediction image of the PU by performing prediction based on the prediction mode indicated by the intra prediction mode IntraPredMode based on the read adjacent PU. The intra predicted image generation unit 310 outputs the generated predicted image of the PU to the adding unit 312.

When the intra prediction parameter decoding unit 304 derives different intra prediction modes for luminance and color difference, the intra prediction image generation unit 310 generates a prediction image of the luminance PU by any one of planar prediction (0), DC prediction (1), and direction prediction (2 to 34) according to the luminance prediction mode IntraPredModeY. The intra prediction image generation unit 310 generates a prediction image of the color difference PU by any one of planar prediction (0), DC prediction (1), direction prediction (2 to 34), and LM mode (35) according to the color difference prediction mode IntraPredModeC.

The inverse quantization / inverse transform unit 311 inversely quantizes the quantized coefficient input from the entropy decoding unit 301 to obtain a transform coefficient. The inverse quantization / inverse transform unit 311 performs inverse frequency transform such as inverse DCT, inverse DST, and inverse KLT on the obtained transform coefficient, and calculates a residual signal. The inverse quantization / inverse transform unit 311 outputs the calculated residual signal to the adder 312.

The addition unit 312 adds, for each pixel, the prediction image of the PU input from the inter prediction image generation unit 309 or the intra prediction image generation unit 310 and the residual signal input from the inverse quantization / inverse transform unit 311 to generate a decoded image of the PU. The addition unit 312 stores the generated decoded image of the PU in the reference picture memory 306, and outputs to the outside a decoded image Td in which the generated decoded images of the PUs are integrated for each picture.

(Configuration of inter prediction parameter decoding unit)
Next, the configuration of the inter prediction parameter decoding unit 303 will be described.

FIG. 11 is a block diagram illustrating a configuration of the inter prediction parameter decoding unit 303 according to the first embodiment. The inter prediction parameter decoding unit 303 includes an inter prediction parameter decoding control unit 3031, an AMVP prediction parameter derivation unit 3032, an addition unit 3038, a merge prediction parameter derivation unit 3036, and a sub-block prediction parameter derivation unit 3037.

The inter prediction parameter decoding control unit 3031 instructs the entropy decoding unit 301 to decode codes (syntax elements) related to inter prediction. The inter prediction parameter decoding control unit 3031 also extracts codes (syntax elements) included in the encoded data, for example, the PU partition mode part_mode, the merge flag merge_flag, the merge index merge_idx, the inter prediction identifier inter_pred_idc, the reference picture index refIdxLX, the prediction vector index mvp_LX_idx, and the difference vector mvdLX.

The inter prediction parameter decoding control unit 3031 first extracts the merge flag merge_flag. When it is stated that the inter prediction parameter decoding control unit 3031 extracts a certain syntax element, this means that it instructs the entropy decoding unit 301 to decode that syntax element and reads the corresponding syntax element from the encoded data.

When the merge flag merge_flag is 0, that is, indicates the AMVP prediction mode, the inter prediction parameter decoding control unit 3031 uses the entropy decoding unit 301 to extract the AMVP prediction parameter from the encoded data. Examples of AMVP prediction parameters include an inter prediction identifier inter_pred_idc, a reference picture index refIdxLX, a prediction vector index mvp_LX_idx, and a difference vector mvdLX. The AMVP prediction parameter derivation unit 3032 derives a prediction vector mvpLX from the prediction vector index mvp_LX_idx. Details will be described later. The inter prediction parameter decoding control unit 3031 outputs the difference vector mvdLX to the addition unit 3038. The adding unit 3038 adds the prediction vector mvpLX and the difference vector mvdLX to derive a motion vector.

When the merge flag merge_flag is 1, that is, indicates the merge prediction mode, the inter prediction parameter decoding control unit 3031 extracts the merge index merge_idx as a prediction parameter related to merge prediction. The inter prediction parameter decoding control unit 3031 outputs the extracted merge index merge_idx to the merge prediction parameter derivation unit 3036 (details will be described later), and outputs the sub-block prediction mode flag subPbMotionFlag to the sub-block prediction parameter derivation unit 3037. The sub-block prediction parameter derivation unit 3037 divides the PU into a plurality of sub-blocks according to the value of the sub-block prediction mode flag subPbMotionFlag, and derives a motion vector in units of sub-blocks. That is, in the sub-block prediction mode, the prediction block is predicted in small block units of 4 × 4 or 8 × 8. In the image encoding device 11 described later, in contrast to the method in which a CU is divided into a plurality of partitions (PUs such as 2NxN, Nx2N, and NxN) and the syntax of the prediction parameters is encoded for each partition, in the sub-block prediction mode a plurality of sub-blocks are collected into a set and the syntax of the prediction parameters is encoded for each set. Therefore, motion information of a large number of sub-blocks can be encoded with a small code amount.

FIG. 7 is a block diagram illustrating a configuration of the merge prediction parameter deriving unit 3036 according to the first embodiment. The merge prediction parameter derivation unit 3036 includes a merge candidate derivation unit 30361, a merge candidate selection unit 30362, and a merge candidate storage unit 30363. The merge candidate storage unit 30363 stores the merge candidates input from the merge candidate derivation unit 30361. The merge candidate includes a prediction list use flag predFlagLX, a motion vector mvLX, and a reference picture index refIdxLX. In the merge candidate storage unit 30363, an index is assigned to the stored merge candidate according to a predetermined rule.

The merge candidate derivation unit 30361 derives a merge candidate using the motion vector of the adjacent PU that has already been decoded and the reference picture index refIdxLX as they are. In addition, merge candidates may be derived using affine prediction. This method will be described in detail below. The merge candidate derivation unit 30361 may use affine prediction for a spatial merge candidate derivation process, a temporal merge candidate derivation process, a combined merge candidate derivation process, and a zero merge candidate derivation process described later. Note that affine prediction is performed for each subblock, and the prediction parameters are stored in the prediction parameter memory 307 for each subblock. Alternatively, the affine prediction may be performed for each pixel unit.

(Spatial merge candidate derivation process)
As the spatial merge candidate derivation process, the merge candidate derivation unit 30361 reads the prediction parameters (prediction list use flag predFlagLX, motion vector mvLX, reference picture index refIdxLX) stored in the prediction parameter memory 307 according to a predetermined rule, and derives the read prediction parameters as merge candidates. The prediction parameters to be read are those related to each of the PUs within a predetermined range from the decoding target PU (for example, all or part of the PUs in contact with the lower left end, the upper left end, and the upper right end of the decoding target PU, respectively).

(Temporal merge candidate derivation process)
As the temporal merge candidate derivation process, the merge candidate derivation unit 30361 reads from the prediction parameter memory 307 the prediction parameters of the PU in the reference image that contains the lower-right coordinates of the decoding target PU, and sets them as a merge candidate. The reference image may be designated, for example, by the reference picture index refIdxLX specified in the slice header, or by the smallest of the reference picture indexes refIdxLX of the PUs adjacent to the decoding target PU.

(Combined merge candidate derivation process)
As the combined merge candidate derivation process, the merge candidate derivation unit 30361 derives a combined merge candidate by combining the motion vectors and reference picture indexes of two different merge candidates that have already been derived and stored in the merge candidate storage unit 30363, using them as the L0 and L1 motion vectors, respectively.

(Zero merge candidate derivation process)
As the zero merge candidate derivation process, the merge candidate derivation unit 30361 derives a merge candidate in which the reference picture index refIdxLX is 0 and both the X component and the Y component of the motion vector mvLX are 0.

The merge candidates derived by the merge candidate deriving unit 30361 are stored in the merge candidate storage unit 30363.

The merge candidate selection unit 30362 selects, as an inter prediction parameter, the merge candidate to which an index corresponding to the merge index merge_idx input from the inter prediction parameter decoding control unit 3031 is assigned, from among the merge candidates stored in the merge candidate storage unit 30363. The merge candidate selection unit 30362 stores the selected merge candidate in the prediction parameter memory 307 and outputs it to the prediction image generation unit 308.
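The storage and selection of merge candidates can be illustrated with the following minimal sketch. The MergeCandidate type, the function names, the sample values, the use of -1 for an unused reference picture index, and the choice of setting both prediction list use flags to 1 in the zero merge candidate are all assumptions made for illustration, not details given in the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MergeCandidate:
    pred_flag: tuple  # (predFlagL0, predFlagL1)
    mv: tuple         # ((mvL0 x, mvL0 y), (mvL1 x, mvL1 y))
    ref_idx: tuple    # (refIdxL0, refIdxL1); -1 marks an unused list (assumption)


def zero_merge_candidate() -> MergeCandidate:
    """Zero merge candidate: refIdxLX = 0 and both mvLX components are 0."""
    return MergeCandidate((1, 1), ((0, 0), (0, 0)), (0, 0))


def select_merge_candidate(stored, merge_idx: int) -> MergeCandidate:
    """Pick the stored candidate whose assigned index equals merge_idx."""
    return stored[merge_idx]


# Hypothetical candidate list: one spatial candidate followed by a zero candidate.
stored = [
    MergeCandidate((1, 0), ((4, -2), (0, 0)), (0, -1)),
    zero_merge_candidate(),
]
selected = select_merge_candidate(stored, merge_idx=1)
```

Here the list position plays the role of the index assigned by the merge candidate storage unit 30363.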

FIG. 8 is a block diagram illustrating a configuration of the AMVP prediction parameter derivation unit 3032 according to the first embodiment. The AMVP prediction parameter derivation unit 3032 includes a vector candidate derivation unit 3033, a vector candidate selection unit 3034, and a vector candidate storage unit 3035. The vector candidate derivation unit 3033 derives prediction vector candidates from the motion vectors mvLX of already processed PUs stored in the prediction parameter memory 307, based on the reference picture index refIdx. The vector candidate derivation unit 3033 stores the derived prediction vector candidates in the prediction vector candidate list mvpListLX [] of the vector candidate storage unit 3035.

The vector candidate selection unit 3034 selects the motion vector mvpListLX [mvp_LX_idx] indicated by the prediction vector index mvp_LX_idx from the prediction vector candidates in the prediction vector candidate list mvpListLX [] as the prediction vector mvpLX. The vector candidate selection unit 3034 outputs the selected prediction vector mvpLX to the addition unit 3038.

Note that a prediction vector candidate is derived by scaling the motion vector of a PU for which decoding processing has been completed and which lies within a predetermined range from the decoding target PU (for example, an adjacent PU). The adjacent PUs include PUs that are spatially adjacent to the decoding target PU, for example, the left PU and the upper PU, as well as areas that are temporally adjacent to the decoding target PU, for example, areas obtained from the prediction parameters of a PU at the same position as the decoding target PU but at a different time.

The addition unit 3038 adds the prediction vector mvpLX input from the AMVP prediction parameter derivation unit 3032 and the difference vector mvdLX input from the inter prediction parameter decoding control unit 3031 to calculate a motion vector mvLX. The adding unit 3038 outputs the calculated motion vector mvLX to the predicted image generation unit 308 and the prediction parameter memory 307.
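The selection by mvp_LX_idx and the addition performed by the addition unit 3038 can be sketched as follows. The function name and the sample vectors are hypothetical; motion vectors are represented as (x, y) integer pairs for illustration.

```python
def derive_motion_vector(mvp_list, mvp_idx, mvd):
    """mvLX = mvpListLX[mvp_LX_idx] + mvdLX, computed per component."""
    mvp = mvp_list[mvp_idx]  # prediction vector mvpLX chosen by the index
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])


# Hypothetical prediction vector candidate list and decoded difference vector.
mvp_list_l0 = [(8, -4), (6, 0)]
mv_l0 = derive_motion_vector(mvp_list_l0, mvp_idx=1, mvd=(-2, 3))
```

With these sample values, the candidate (6, 0) is selected and the resulting motion vector is (4, 3).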

(Inter prediction image generation unit 309)
FIG. 10 is a block diagram illustrating a configuration of the inter predicted image generation unit 309 included in the predicted image generation unit 308 according to the first embodiment. The inter prediction image generation unit 309 includes a motion compensation unit 3091 and a weight prediction unit 3094.

(Motion compensation unit 3091)
Based on the inter prediction parameters (prediction list use flag predFlagLX, reference picture index refIdxLX, and motion vector mvLX) input from the inter prediction parameter decoding unit 303, the motion compensation unit 3091 reads from the reference picture memory 306 the reference block located, in the reference picture RefX specified by the reference picture index refIdxLX (any one of the reference pictures included in the reference picture list), at the position shifted by the motion vector mvLX from the position of the decoding target PU, and thereby generates an interpolation image (motion compensated image predSamplesLX). Here, when the accuracy of the motion vector mvLX is not integer accuracy, the motion compensation image is generated by applying a filter, called a motion compensation filter, that generates pixels at fractional positions.

Here, the motion compensation unit 3091 generates a predicted image using at least one of the single prediction mode, the bi-prediction mode, and the BIO mode, in which a predicted image is generated by referring to two reference images (a first reference image and a second reference image) and a gradient correction term. The motion compensation unit 3091 also prohibits generation of a predicted image using the BIO mode when the reference block in at least one of the first reference image and the second reference image is outside the screen of the reference image (in other words, when the reference block in the first reference image is outside the screen of the first reference image and/or the reference block in the second reference image is outside the screen of the second reference image). That is, in the above case, the motion compensation unit 3091 does not perform prediction image generation using the BIO mode. Details of processing executed by the motion compensation unit 3091 will be described later.
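The prohibition condition can be sketched as follows. This is a minimal illustration and not the determination defined by the embodiment: the function names, the integer-pel motion vectors, the common picture size for both reference images, and the interpretation that a reference block is outside the screen when any of its samples falls outside the picture are all assumptions.

```python
def block_outside_picture(x, y, w, h, mv, pic_w, pic_h):
    """True if any sample of the reference block lies outside the picture (assumed rule)."""
    rx, ry = x + mv[0], y + mv[1]  # top-left corner of the reference block
    return rx < 0 or ry < 0 or rx + w > pic_w or ry + h > pic_h


def bio_allowed(x, y, w, h, mv_l0, mv_l1, pic_w, pic_h):
    """BIO is not used when the L0 and/or the L1 reference block is outside."""
    outside_l0 = block_outside_picture(x, y, w, h, mv_l0, pic_w, pic_h)
    outside_l1 = block_outside_picture(x, y, w, h, mv_l1, pic_w, pic_h)
    return not (outside_l0 or outside_l1)


# Hypothetical 8x8 PUs in a 64x64 picture.
inside = bio_allowed(16, 16, 8, 8, (2, -3), (-2, 3), 64, 64)
crosses_left_edge = bio_allowed(0, 0, 8, 8, (-1, 0), (0, 0), 64, 64)
```

A stricter or looser rule (for example, one that also accounts for the samples read by the motion compensation filter) would change only block_outside_picture.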

Note that the reference images referred to by the single prediction mode, the bi-prediction mode, and the BIO mode may be the same or different.

(Weight predictor 3094)
The weight prediction unit 3094 generates a prediction image of the PU by multiplying the input motion compensation image predSamplesLX by a weight coefficient.

(Switching prediction mode in motion compensation unit with gradient change prediction)
FIG. 12 is a flowchart for explaining the flow of processing in which the motion compensation unit 3091 having a motion compensation function using gradient change (BIO) prediction derives a predicted image.

When the inter prediction parameter decoding unit 303 determines that the mode is not the bi-prediction mode (No in step S101, that is, uni-prediction UniPred), the process proceeds to step S105, and the motion compensation unit 3091 performs unidirectional motion compensation. On the other hand, when the inter prediction parameter decoding unit 303 determines that the mode is the bi-prediction mode (Yes in step S101, BiPred), the inter prediction parameter decoding unit 303 performs the determination shown in step S102.

If the L0 reference image refImgL0 and the L1 reference image refImgL1 (described as “two prediction blocks” in FIG. 12) acquired from the reference picture memory 306 are different reference images (Yes in step S102), the process proceeds to step S103. The motion compensation unit 3091 performs motion compensation using BIO prediction described later. On the other hand, if the L0 reference image refImgL0 and the L1 reference image refImgL1 acquired from the reference picture memory 306 are the same reference image (No in step S102), the process proceeds to step S104, and the motion compensation unit 3091 does not apply BIO prediction. Perform motion compensation.
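The branching of steps S101 to S105 can be summarized in the following sketch. The function name, the string labels, and the use of picture identifiers to compare the two reference images are assumptions for illustration.

```python
def select_motion_compensation(pred_flag_l0, pred_flag_l1, ref_img_l0, ref_img_l1):
    """Return which motion compensation the flow of FIG. 12 would choose."""
    bi_pred = pred_flag_l0 == 1 and pred_flag_l1 == 1
    if not bi_pred:                # No in S101: uni-prediction
        return "uni"               # S105: unidirectional motion compensation
    if ref_img_l0 != ref_img_l1:   # Yes in S102: different reference images
        return "bio"               # S103: motion compensation using BIO prediction
    return "bi"                    # S104: bi-prediction without BIO


# Hypothetical picture identifiers standing in for refImgL0 and refImgL1.
modes = [
    select_motion_compensation(1, 0, "P1", None),
    select_motion_compensation(1, 1, "P1", "B2"),
    select_motion_compensation(1, 1, "P1", "P1"),
]
```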

(Basic concept of gradient change)
In gradient change (optical flow), it is assumed that the pixel value of each point does not change and only its position changes. This can be expressed as follows, using the change in the pixel value I in the horizontal direction (horizontal gradient value lx) and the change in position Vx, the change in the pixel value I in the vertical direction (vertical gradient value ly) and the change in position Vy, and the temporal change lt of the pixel value I.

lx * Vx + ly * Vy + lt = 0
Hereinafter, the change in position (Vx, Vy) is referred to as a correction weight vector (u, v).

(About BIO prediction)
Here, motion compensation using BIO prediction will be described with reference to the drawings. FIG. 17 is a block diagram illustrating an example of the configuration of the motion compensation unit 3091. The motion compensation unit 3091 includes a motion compensation gradient unit 30911 and a gradient correction coefficient unit 30912. The motion compensation gradient unit 30911 includes a motion compensation derivation unit 309111 and a gradient derivation unit 309112, while the gradient correction coefficient unit 30912 includes a gradient product derivation unit 309121 and a gradient correction coefficient derivation unit 309122. Using these, the motion compensation unit 3091 performs bi-directional optical flow (bi-predictive gradient change: BIO) prediction, which performs motion correction by applying a gradient correction term to a bi-prediction (BiPred) image.

That is, when BIO is not applied, the motion compensation unit 3091 derives each pixel value Pred of the predicted image using the following prediction formula:
Pred = {(P0 + P1) + shiftOffset} >> shiftPred (Formula A1)
Note that P0 is the pixel value of the motion compensated image P0, and P1 is the pixel value of the motion compensated image P1.
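Formula A1 can be evaluated per pixel as in the following sketch. The values of shiftPred and shiftOffset are not given in this passage; the sketch assumes motion compensated samples carrying 6 extra bits of precision, so that shiftPred = 7 and shiftOffset = 1 << (shiftPred - 1) act as a rounded average. These values and the function name are hypothetical.

```python
def bi_pred_pixel(p0: int, p1: int, shift_pred: int, shift_offset: int) -> int:
    """Formula A1: Pred = {(P0 + P1) + shiftOffset} >> shiftPred."""
    return ((p0 + p1) + shift_offset) >> shift_pred


# Assumed parameters: a 7-bit shift with a rounding offset of half the divisor.
SHIFT_PRED = 7
SHIFT_OFFSET = 1 << (SHIFT_PRED - 1)

# 6400 and 6528 correspond to sample values 100 and 102 scaled by 64.
pred = bi_pred_pixel(6400, 6528, SHIFT_PRED, SHIFT_OFFSET)
```

With these assumed parameters the result is 101, the rounded average of the two underlying sample values.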

When BIO is applied, the motion compensation unit 3091 derives the pixel value Pred of the predicted image using the following prediction formula:
Pred = {(P0 + P1) + modBIO + shiftOffset} >> shiftPred (Formula A2)
Here, modBIO is the gradient correction term derived by
modBIO = {((lx0-lx1) * u + (ly0-ly1) * v) >> 1} << shiftPred (Formula A3)
In Formula A3, lx0 (first gradient image), ly0 (second gradient image), lx1 (third gradient image), and ly1 (fourth gradient image) are gradient images. The gradient images lx0 and lx1 show the gradient along the horizontal direction (x direction or first direction), and the gradient images ly0 and ly1 show the gradient along the vertical direction (y direction or second direction). u and v are the components of the correction weight vector.

(Outline of BIO prediction processing)
First, the flow of processing in which the motion compensation unit 3091 generates a prediction image (inter prediction image) with reference to the motion compensation image P0 (first reference image), the motion compensation image P1 (second reference image), and the gradient correction term will be described with reference to FIG. 18. FIG. 18 is a flowchart showing the flow of processing in motion compensation using BIO prediction.

As shown in FIG. 18, in motion compensation using BIO prediction, the motion compensation unit 3091 performs the following three steps of STEP 111 to 113 to derive a predicted image.

In STEP 111, the motion compensation deriving unit 309111 reads out the L0 reference image refImgL0 (first reference image) and the L1 reference image refImgL1 (second reference image) to be used as the reference images from the reference picture memory 306, and derives the motion compensation images P0 and P1 (S111a in FIG. 18).

Next, the gradient deriving unit 309112 derives horizontal gradient images lx0 and lx1 and vertical gradient images ly0 and ly1 for the motion compensated images P0 and P1 derived by the motion compensation deriving unit 309111 (S111b in FIG. 18).

Note that refImgL0, refImgL1, P0, P1, lx0, lx1, ly0, and ly1 may denote the two-dimensional reference images refImgL0 [x] [y] and refImgL1 [x] [y], the motion compensation images P0 [x] [y] and P1 [x] [y], and the gradient images lx0 [x] [y], lx1 [x] [y], ly0 [x] [y] and ly1 [x] [y] (x and y are integers in a predetermined range), or may denote the reference image values refImgL0 and refImgL1, the motion compensation image values P0 and P1, and the gradient values lx0, lx1, ly0, and ly1.

Subsequently, in STEP 112, the gradient product deriving unit 309121 derives the gradient product (S112a in FIG. 18). Next, the gradient correction coefficient deriving unit 309122 uses the gradient product derived by the gradient product deriving unit 309121 to derive a correction weight vector (u, v) (gradient correction coefficient; uh and vh in integer arithmetic) (S112b in FIG. 18).

The gradient correction coefficient deriving unit 309122 may derive the correction weight vector (u, v) using Equation 5 in FIG. 13 or may be derived using the least square method as described later.

When the correction weight vector (u, v) is derived using Equation 5 in FIG. 13, the motion compensation unit 3091 may use the gradient changes dI / dx (that is, lx) and dI / dy (that is, ly) and the time variation dI / dt at a plurality of points to derive the u and v that minimize the square of the left side of Equation 5 in FIG. 13.

As shown in FIG. 14, the pixel value I of the point I on the target image Cur corresponds to the pixel value I0 of the point l0 (lx0, ly0) on the L0 reference image refImgL0 and the pixel value I1 of the point l1 (lx1, ly1) on the L1 reference image refImgL1. FIG. 14 shows only the x component at the point l0 and the x component at the point l1, but the same applies to the y component at the point l0 and the y component at the point l1. If the assumption of gradient change (the pixel value of each point does not change and only its position changes) holds in the target region, I = I0 = I1 holds. Focusing on the gradient, which is the amount of spatial change in the pixel value, the pixel value I0 of a certain point on the L0 reference image refImgL0 can be predicted from the pixel value P0 of another point, the gradient at P0, and the distance between I0 and P0 as I0 = P0 + gradient * distance. Here, the gradient is (lx0, ly0), and the distance between I0 and P0 is (Vx, Vy). Similarly, the pixel value I1 of a certain point on the L1 reference image refImgL1 can be derived from the pixel value P1 of another point, the gradient at P1, and the distance between I1 and P1 as I1 = P1 + gradient (lx1, ly1) * distance (-Vx, -Vy). Then, from the assumption that the pixel value I0 and the pixel value I1 are equal, deriving the parameter (Vx, Vy) that minimizes the difference between I0 and I1 yields the distance (Vx, Vy), that is, the correction weight vector (u, v).

The motion compensation unit 3091 predicts the pixel value I of the point l on the target image Cur, which is the prediction target image, as the average of the pixel value I0 of the point l0 on the L0 reference image refImgL0 and the pixel value I1 of the point l1 on the L1 reference image refImgL1:
I = (I0 + I1) >> 1 (Formula A4-1)
= {P0 + P1 + (lx0 - lx1) * u + (ly0 - ly1) * v} >> 1 (Formula A4-2)
Here,
I0 = P0 + (lx0 * u + ly0 * v)
I1 = P1 - (lx1 * u + ly1 * v)

The correction weight vector is the (u, v) that minimizes Formula A5, which follows from the assumption that the pixel value does not change.

Σ|I0 - I1|² = Σ|(P0 - P1) + (lx0 + lx1) * u + (ly0 + ly1) * v|² (Formula A5)
Here, Σ corresponds to the operation of calculating P0, P1, lx0, lx1, ly0, and ly1 at the target pixel (x, y) and its surrounding points (x + dx, y + dy) (for example, dx = -2..2, dy = -2..2) and summing the values composed of them.
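As an illustration of the minimization in Formula A5, the following is a minimal sketch that solves the 2x2 normal equations in closed form with exact rational arithmetic. It is not the integer procedure of this description (see the details of STEP 112); the function name and the flat-list layout of the window samples are assumptions made for the example.

```python
from fractions import Fraction

def solve_correction_vector(p0, p1, lx0, lx1, ly0, ly1):
    """Return (u, v) minimizing sum(((P0-P1) + (lx0+lx1)*u + (ly0+ly1)*v)**2).

    Each argument is a flat list holding one value per sample of the window
    (for example the 5x5 points dx = -2..2, dy = -2..2).
    """
    s_aa = s_ab = s_bb = s_ad = s_bd = 0
    for i in range(len(p0)):
        d = p0[i] - p1[i]        # P0 - P1
        a = lx0[i] + lx1[i]      # coefficient of u
        b = ly0[i] + ly1[i]      # coefficient of v
        s_aa += a * a
        s_ab += a * b
        s_bb += b * b
        s_ad += a * d
        s_bd += b * d
    det = s_aa * s_bb - s_ab * s_ab
    if det == 0:
        # Assumption: a degenerate window yields no correction.
        return Fraction(0), Fraction(0)
    # Cramer's rule on  s_aa*u + s_ab*v = -s_ad,  s_ab*u + s_bb*v = -s_bd
    u = Fraction(-s_ad * s_bb + s_bd * s_ab, det)
    v = Fraction(-s_bd * s_aa + s_ad * s_ab, det)
    return u, v
```

For samples constructed so that P0 - P1 = -((lx0 + lx1) * 2 - (ly0 + ly1)), the residual vanishes at (u, v) = (2, -1), and the function returns exactly that pair.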

Finally, in STEP 113, the gradient correction bi-prediction derivation unit 30913 derives the gradient correction term modBIO[][] using the correction weight vector (u, v) derived in STEP 112 (see Equation A3 and Equation A19 described later) (S113a in FIG. 18). Then, the gradient correction bi-prediction derivation unit 30913 derives the pixel value Pred of the gradient correction bi-prediction image (prediction image, corrected prediction image) using Expression A2 (S113b in FIG. 18).

The motion compensation unit 3091 derives the gradient correction term modBIO by substituting the correction weight vector (u, v) derived in STEP 112 into Equation A3, and derives the pixel value Pred of the predicted image using
Pred = {P0 + P1 + (lx0 - lx1) * u + (ly0 - ly1) * v} >> 1 (Formula A6)
Alternatively, the gradient correction term modBIO may be weakened to 1/2 and used as follows:
Pred = {P0 + P1 + ((lx0 - lx1) * u + (ly0 - ly1) * v) >> 1} >> 1 (Formula A7)
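The difference between Formula A6 and Formula A7 can be checked with a small worked example. The sketch below assumes, purely for illustration, that u and v are already integers so that the shift operators apply directly; the function names are hypothetical.

```python
def pred_a6(p0, p1, lx0, lx1, ly0, ly1, u, v):
    """Formula A6: bi-prediction average with the full gradient correction term."""
    return (p0 + p1 + (lx0 - lx1) * u + (ly0 - ly1) * v) >> 1

def pred_a7(p0, p1, lx0, lx1, ly0, ly1, u, v):
    """Formula A7: same as A6, but the correction term is halved first."""
    mod_bio = (lx0 - lx1) * u + (ly0 - ly1) * v
    return (p0 + p1 + (mod_bio >> 1)) >> 1
```

With P0 = 100, P1 = 104, lx0 = 4, lx1 = 2, ly0 = 1, ly1 = 3, u = 3, and v = 1, the correction term is 2 * 3 + (-2) * 1 = 4, so Formula A6 gives (204 + 4) >> 1 = 104 and Formula A7 gives (204 + 2) >> 1 = 103.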

Next, the processing of each of the above STEPs will be described in detail. Here, a case where the motion compensation unit 3091 derives (generates) a motion compensation image and a gradient image will be described as an example, but the motion compensation unit 3091 may be configured to derive the pixel values contained in an image rather than the image itself. That is, the motion compensation unit 3091 may derive the pixel values of the motion compensation image and the pixel values of the gradient image.

Hereinafter, details of each STEP will be described with reference to FIG. 18 showing the configuration of the motion compensation unit 3091.

(Details of STEP111)
The motion compensation derivation unit 309111 derives motion compensation images P0 and P1 (also referred to as basic motion compensation images) based on the L0 reference image refImgL0 and the L1 reference image refImgL1 (S111a in FIG. 18). Further, the gradient deriving unit 309112 derives horizontal gradient images lx0 and lx1 and vertical gradient images ly0 and ly1 for the derived motion compensation images P0 and P1 (S111b in FIG. 18).

(Details of S111a in STEP111)
The motion compensation deriving unit 309111 applies a vertical motion compensation filter (mcFilterVer) to the reference image. Then, the motion compensation images P0 and P1 are derived by further applying a horizontal motion compensation filter (mcFilterHor) to the reference image to which the vertical motion compensation filter is applied.

For the in-block coordinates (x, y) of a block whose upper-left coordinates are (xPb, yPb), the integer position (xInt, yInt) and the phase (xFrac, yFrac) in the reference image are
xInt = xPb + (mvLX[0] >> 2) + x
xFrac = mvLX[0] & 3
yInt = yPb + (mvLX[1] >> 2) + y
yFrac = mvLX[1] & 3
Here, the motion vector accuracy is assumed to be 1/4 pel, but it is not limited to this and may be 1/8 pel, 1/16 pel, or the like. When the motion vector accuracy is 1/M pel, the following Formula A8 may be used, in which the shift value for deriving the integer positions xInt and yInt is log2(M) and the value used in the bitwise AND (&) for deriving the phases xFrac and yFrac is M-1.

xInt = xPb + (mvLX[0] >> log2(M)) + x
xFrac = mvLX[0] & (M-1)
yInt = yPb + (mvLX[1] >> log2(M)) + y
yFrac = mvLX[1] & (M-1) (Formula A8)
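Formula A8 can be sketched as follows. The function name and the tuple layout of the return value are assumptions for illustration; M is assumed to be a power of two so that log2(M) is an integer. Python's `>>` on negative integers is an arithmetic (floor) shift, which is what the integer/phase split requires.

```python
def split_motion_vector(x_pb, y_pb, mv, x, y, m=4):
    """Split a 1/M-pel motion vector into integer position and phase (Formula A8)."""
    log2_m = m.bit_length() - 1
    if m <= 0 or (1 << log2_m) != m:
        raise ValueError("M must be a power of two")
    x_int = x_pb + (mv[0] >> log2_m) + x
    x_frac = mv[0] & (m - 1)
    y_int = y_pb + (mv[1] >> log2_m) + y
    y_frac = mv[1] & (m - 1)
    return (x_int, y_int), (x_frac, y_frac)
```

For example, with (xPb, yPb) = (16, 8), mvLX = (-5, 6), (x, y) = (0, 0), and M = 4, the horizontal component -5/4 = -1.25 is represented as integer offset -2 plus phase 3/4, so the result is position (14, 9) and phase (3, 2).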
When the bit depth of the filter coefficients is, for example, MC_ACCU, the bit depth of the filtered image is bitDepthY + MC_ACCU, that is, the bit depth bitDepthY of the reference image plus MC_ACCU. To return the bit depth of the filtered image to that of the reference image, a right shift by MC_ACCU is required to adjust the dynamic range. When the two filters are applied in series, it is appropriate for the output of the first filter operation to use an intermediate bit depth INTERNAL_BIT_DEPTH higher than bitDepth, and for the output of the second filter operation to return from INTERNAL_BIT_DEPTH to bitDepth. In this case, the shift values shift1 and shift2 for adjusting the dynamic range of the first and second filters may be set as follows.

shift1 = bitDepthY-(INTERNAL_BIT_DEPTH-MC_ACCU)
shift2 = MC_ACCU (= 6) (Formula A9)
Here, bitDepthY represents the bit depth of the reference image, INTERNAL_BIT_DEPTH represents the intermediate bit depth, and MC_ACCU represents the accuracy of the motion compensation filter mcFilter. Note that MC_ACCU is not limited to 6, and a value such as 3 to 10 can be used.

The vertical motion compensation filter mcFilterVer computes, for x = 0..BLKW-1, y = 0..BLKH-1, k = 0..NTAPS-1, and offset1 = 1 << (shift1 - 1), the sum of products of the motion compensation filter coefficients mcFilter[][] and the reference image refImg[][], and adjusts its range by the shift value shift1:
temp[x][y] = (Σ mcFilterVer[yFrac][k] * refImg[xInt][yInt + k - NTAPS/2 + 1] + offset1) >> shift1
bitDepth(temp[][]) = bitDepthY + MC_ACCU - shift1 = INTERNAL_BIT_DEPTH (= 14) (Formula A10)
A temporary image temp[][] used as intermediate data is thereby derived. Here, bitDepth(temp[][]) indicates the bit depth of the temporary image temp[][]. The bit depth of the temporary image temp[][] is the value obtained by subtracting the right shift value shift1 of the filter processing from the sum of the bit depth bitDepthY of the reference image and the accuracy MC_ACCU of the motion compensation filter. This value is referred to herein as the intermediate bit depth INTERNAL_BIT_DEPTH.

The horizontal motion compensation filter mcFilterHor, in turn, computes, for x = 0..BLKW-1, y = 0..BLKH-1, k = 0..NTAPS-1, and offset2 = 1 << (shift2 - 1), the sum of products of its coefficients and the temporary image temp[][] derived by the vertical motion compensation filter mcFilterVer, and adjusts its range by the shift value shift2:
PX[x][y] = (Σ mcFilterHor[xFrac][k] * temp[x + k - NTAPS/2 + 1][y] + offset2) >> shift2
bitDepth(PX[][]) = INTERNAL_BIT_DEPTH + MC_ACCU - shift2 = INTERNAL_BIT_DEPTH (Formula A11)
The motion compensated image PX[][] (PX is P0 or P1) is thereby derived. A specific example of the 8-tap motion compensation filters mcFilterVer and mcFilterHor used in this processing, mcFilter[nFrac][pos] (nFrac = 0..NPHASES-1, pos = 0..NTAPS-1), is shown in the drawings. NPHASES indicates the number of phases, and NTAPS indicates the number of taps.
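The two-stage filtering of Formulas A9 to A11 can be sketched as below. This is an illustration under stated assumptions, not the implementation of this description: the coefficient table is the HEVC 8-tap quarter-pel luma filter (the table of this description is in a drawing not reproduced here), the image is stored row-major as ref[y][x] rather than refImg[x][y], off-screen samples are obtained by edge replication, and bitDepthY is assumed to be at least 8 so that shift1 is non-negative.

```python
# Assumed coefficients: HEVC 8-tap quarter-pel luma filter; each row sums to 1 << MC_ACCU.
MC_FILTER = [
    [0, 0, 0, 64, 0, 0, 0, 0],
    [-1, 4, -10, 58, 17, -5, 1, 0],
    [-1, 4, -11, 40, 40, -11, 4, -1],
    [0, 1, -5, 17, 58, -10, 4, -1],
]
NTAPS = 8
MC_ACCU = 6
INTERNAL_BIT_DEPTH = 14

def rounding_offset(shift):
    # offset = 1 << (shift - 1); a zero shift needs no rounding (assumption).
    return (1 << (shift - 1)) if shift > 0 else 0

def motion_compensate(ref, x_pb, y_pb, mv, blk_w, blk_h, bit_depth_y=10):
    """Return the blk_h x blk_w motion compensated block at INTERNAL_BIT_DEPTH."""
    pic_h, pic_w = len(ref), len(ref[0])
    shift1 = bit_depth_y - (INTERNAL_BIT_DEPTH - MC_ACCU)  # Formula A9
    shift2 = MC_ACCU
    off1, off2 = rounding_offset(shift1), rounding_offset(shift2)
    x0, x_frac = x_pb + (mv[0] >> 2), mv[0] & 3  # 1/4-pel motion vector
    y0, y_frac = y_pb + (mv[1] >> 2), mv[1] & 3
    half = NTAPS // 2 - 1

    def pix(xx, yy):
        # Edge replication stands in for off-screen padding.
        return ref[min(max(yy, 0), pic_h - 1)][min(max(xx, 0), pic_w - 1)]

    # Vertical pass (Formula A10), widened so the horizontal taps have input.
    temp = {}
    for x in range(-half, blk_w + NTAPS // 2):
        for y in range(blk_h):
            acc = sum(MC_FILTER[y_frac][k] * pix(x0 + x, y0 + y + k - half)
                      for k in range(NTAPS))
            temp[x, y] = (acc + off1) >> shift1

    # Horizontal pass (Formula A11).
    out = [[0] * blk_w for _ in range(blk_h)]
    for y in range(blk_h):
        for x in range(blk_w):
            acc = sum(MC_FILTER[x_frac][k] * temp[x + k - half, y]
                      for k in range(NTAPS))
            out[y][x] = (acc + off2) >> shift2
    return out
```

On a flat 10-bit image of value 100, every output sample equals 100 << 4 = 1600 for any phase, which reflects that the output stays at the 14-bit intermediate depth.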

(Details of S111b in STEP 111)
Next, the case where the gradient deriving unit 309112 derives the horizontal gradient images lx0 and lx1 will be described.

The horizontal gradient image lxX is obtained by the gradient derivation unit 309112 applying the horizontal gradient filter gradFilterHor after applying the vertical motion compensation filter mcFilterVer. The gradient deriving unit 309112 applies the horizontal gradient filter gradFilterHor to the temporary image temp [] [] derived by the above equation A10 to derive the horizontal gradient image lxX as in the following equation A12.

lxX[x][y] = (Σ gradFilterHor[xFrac][k] * temp[x + k - NTAPS/2 + 1][y] + offset2) >> shift2
bitDepth(lxX[][]) = INTERNAL_BIT_DEPTH + GRAD_ACCU - shift2 = INTERNAL_BIT_DEPTH (Formula A12)
Here, x = 0..BLKW-1, y = 0..BLKH-1, k = 0..NTAPS-1, offset2 = 1 << (shift2 - 1), and lxX is lx0 or lx1. bitDepth(lxX[][]) indicates the bit depth of the filtered image lxX[][]. GRAD_ACCU represents the accuracy of the gradient filter gradFilter (gradFilterHor or gradFilterVer). A specific example of the 8-tap gradient filter gradFilter set in this way, gradFilter[nFrac][pos] (nFrac = 0..NPHASES-1, pos = 0..NTAPS-1), is shown in part (b) of the drawings.

Next, the case where the gradient deriving unit 309112 derives the vertical gradient images ly0 and ly1 will be described.

The gradient derivation unit 309112 derives the vertical gradient images ly0 and ly1 by applying the vertical gradient filter (gradFilterVer) to the L0 reference image refImgL0 and the L1 reference image refImgL1, and then applying the horizontal motion compensation filter (mcFilterHor).

temp[x][y] = (Σ gradFilterVer[yFrac][i] * refImg[xInt][yInt + i - NTAPS/2 + 1] + offset1) >> shift1
bitDepth(temp[][]) = bitDepthY + GRAD_ACCU - shift1 = INTERNAL_BIT_DEPTH (= 14) (Formula A13)
A temporary image temp[][] is thereby derived.

The vertical gradient image lyX is then derived by applying the horizontal motion compensation filter mcFilterHor to this temp[x][y].

lyX[x][y] = (Σ mcFilterHor[xFrac][i] * temp[x + i - NTAPS/2 + 1][y] + offset2) >> shift2
bitDepth(lyX[][]) = INTERNAL_BIT_DEPTH + MC_ACCU - shift2 = INTERNAL_BIT_DEPTH (Formula A14)
(Details of STEP112)
The gradient product deriving unit 309121 may derive the correction weight vector (u, v) that minimizes the value obtained by the above equation A5 using the least square method.

(Details of S112b in STEP112)
Instead of the correction weight vector (u, v), which actually requires calculation with fractional precision, the gradient correction coefficient deriving unit 309122 may derive, for example, uh, an integer value obtained by shifting u to the left by shiftBIO bits (that is, uh = u << shiftBIO), and vh, an integer value obtained by shifting v to the left by shiftBIO bits (that is, vh = v << shiftBIO). When shiftBIO = 5, the precision is 1/32. For example, coordinate descent, which is one optimization method, repeats the steps of deriving u (uh) under an assumed v and then deriving v (vh) using the derived u (uh). Assuming v = 0 first and applying coordinate descent to Formula A5 gives
uh = (s3 << 5) / s1
vh = ((s6 << 5) - s2 * uh) / s5 (Formula A15)
from which the correction weight vector (here, uh and vh) is obtained.

Since s2 = s4, s4 may be used instead of s2 in the above formula A15.

The gradient correction coefficient deriving unit 309122 clips uh and vh to a predetermined range (-rangeBIO to rangeBIO). That is, the gradient correction coefficient deriving unit 309122 performs the clip processing expressed by
rangeBIO = (1 << shiftBIO) * MVTH
uh = clip3(-rangeBIO, rangeBIO, uh)
vh = clip3(-rangeBIO, rangeBIO, vh) (Formula A16)
MVTH is, for example, 2/3 pel.
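Formulas A15 and A16 can be sketched together as follows. The gradient product sums s1, s2, s3, s5, and s6 are taken as given inputs. Several details are assumptions made only for the example: "/" is treated as integer division truncating toward zero, a zero denominator yields 0, and rangeBIO is truncated to an integer (21 for shiftBIO = 5 and MVTH = 2/3).

```python
from fractions import Fraction

def div_trunc(a, b):
    """Integer division truncating toward zero (C-style); b must be non-zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

def clip3(lo, hi, x):
    """Clamp x to the closed range [lo, hi]."""
    return max(lo, min(hi, x))

def derive_uh_vh(s1, s2, s3, s5, s6, shift_bio=5, mvth=Fraction(2, 3)):
    """One coordinate-descent pass (Formula A15) followed by clipping (Formula A16)."""
    uh = div_trunc(s3 << shift_bio, s1) if s1 != 0 else 0
    vh = div_trunc((s6 << shift_bio) - s2 * uh, s5) if s5 != 0 else 0
    range_bio = int((1 << shift_bio) * mvth)  # truncated to an integer (assumption)
    return clip3(-range_bio, range_bio, uh), clip3(-range_bio, range_bio, vh)
```

For instance, s1 = 64, s2 = 0, s3 = 10, s5 = 32, s6 = 6 gives uh = 320 / 64 = 5 and vh = 192 / 32 = 6, both inside the clip range, whereas very large ratios saturate at ±21.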

(Details of STEP113)
The gradient correction bi-prediction derivation unit 30913 derives the gradient correction term modBIO by substituting the correction weight vector (u, v) derived in STEP 112 into Equation A3, and derives the pixel value Pred of the predicted image using Equation A6. The gradient correction term modBIO may also be weakened to 1/2 and used as in Equation A7 above.

In the integer calculation by the gradient correction bi-prediction derivation unit 30913, the pixel value Pred of the predicted image is derived using the prediction formula shown in Formula A2 above. The gradient correction bi-prediction derivation unit 30913 derives the gradient correction term modBIO using
modBIO = {(lx0 - lx1) * uh + (ly0 - ly1) * vh} >> shiftBIO2 (Formula A17)
where
shiftBIO2 = shiftBIO + bitDepth(mcImg) - bitDepth(gradImg) + 1 (Formula A18)

Here, bitDepth (mcImg) is the bit depth of the motion compensated image, and bitDepth (gradImg) is the bit depth of the gradient image.

When the accuracy MC_ACCU of the motion compensation filter is equal to the accuracy GRAD_ACCU of the gradient filter, bitDepth(mcImg) = bitDepth(gradImg). Formula A18 then simplifies to
shiftBIO2 = shiftBIO + 1 (Formula A19)
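The integer form of the gradient correction term (Formulas A17 to A19) can be sketched as below. The function name and default arguments are illustrative assumptions; with equal bit depths for the motion compensated image and the gradient image, the shift reduces to shiftBIO + 1 as in Formula A19.

```python
def mod_bio(lx0, lx1, ly0, ly1, uh, vh,
            shift_bio=5, bit_depth_mc=14, bit_depth_grad=14):
    """Gradient correction term of Formula A17 with the shift of Formula A18."""
    shift_bio2 = shift_bio + bit_depth_mc - bit_depth_grad + 1
    return ((lx0 - lx1) * uh + (ly0 - ly1) * vh) >> shift_bio2
```

With lx0 - lx1 = 64, ly0 - ly1 = -32, uh = 5, and vh = 6, the weighted sum is 320 - 192 = 128 and the result is 128 >> 6 = 2.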

(Specific example of function of motion compensation unit 3091)
The function of the motion compensation unit 3091 described above will be described in more detail below with reference to FIGS. 19 and 20.

FIG. 19 is a diagram illustrating the area in which the motion compensation unit 3091 performs BIO padding. FIG. 20 is a diagram illustrating an example in which the motion compensation unit 3091 performs off-screen padding. Note that “padding” refers to a process of generating pixels in an area that has no pixels. In this embodiment, as an example of padding, pixels at the boundary of the reading area or of the reference image are copied vertically and horizontally to generate the pixels outside that boundary. Hereinafter, generating pixels outside the boundary of the reference image is referred to as “off-screen padding”, and generating pixels outside the boundary of the reading area necessary for motion compensation processing (the MC reading area) is referred to as “BIO padding”.
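Padding by copying boundary pixels can be sketched as follows. The function name and the row-major list-of-lists image layout are assumptions for illustration; the same clamping of coordinates applies whether the boundary is that of the reference image (off-screen padding) or that of the MC reading area (BIO padding).

```python
def pad_by_edge_copy(img, pad):
    """Return img surrounded by `pad` pixels copied from its nearest boundary pixel."""
    h, w = len(img), len(img[0])
    return [
        [img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]
         for x in range(-pad, w + pad)]
        for y in range(-pad, h + pad)
    ]
```

Padding the 2x2 image [[1, 2], [3, 4]] by one pixel yields [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]].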

N in FIG. 19 indicates the width and height of a corresponding block that is a block at the same position as the encoding target block in the reference image. T is the number of filter taps of the motion compensation filter, and is the number of pixels on which the motion compensation filter is applied to the reference image. The pixel area of (N + T-1) * (N + T-1) indicates an MC reading area necessary for motion compensation. TB is the number of filter taps of the gradient filter used in the BIO mode. For example, values such as T = 8 and TB = 6 are used.

The above-mentioned “outside the screen of the reference image” means the area outside the reference image, specifically, the area outside the width pic_width and the height pic_height of the reference image (the vertically hatched area in FIG. 20). The case where the above-described reference block is outside the screen of the reference image can also be expressed as the case where at least a part of the reference block indicated by the motion vector protrudes beyond the boundary of the reference image, as shown in FIG. 20. Here, an example in which the width and height of the corresponding block are both N will be described, but the width and height may be different values.

In motion compensation using a prediction mode other than the BIO mode, such as the uni-prediction mode or the bi-prediction mode, when the reference block lies entirely within the screen, the motion compensation unit 3091 reads the pixels of the MC reading area (N + T - 1) * (N + T - 1) in FIG. 19.

On the other hand, when generating a gradient image in the BIO mode, the motion compensation unit 3091 reads the pixels of the MC reading area (N + T - 1) * (N + T - 1) in order to obtain the weight coefficient (gradient correction coefficient) of the corresponding block. Strictly speaking, generating a gradient image in the BIO mode requires, in addition to the pixels of the MC reading area (N + T - 1) * (N + T - 1) in FIG. 19, the pixels of the hatched area in FIG. 19. For this reason, as shown in FIG. 19, for the portion of the reference block that protrudes outside the MC reading area (the hatched area), the motion compensation unit 3091 generates padding pixels from the pixels at the boundary of the MC reading area (that is, performs BIO padding) and then generates the gradient image.

As described above, in the BIO mode, even if the reference block does not protrude from the screen of the reference image, the motion compensation unit 3091 performs the BIO padding on the region of the reference block that protrudes outside the reading region. I do.

Therefore, as shown in FIG. 20, when the reference block protrudes outside the screen of the reference image, both off-screen padding, which generates pixels outside the screen, and BIO padding, which generates pixels outside the MC reading area, are required. Two padding modules, one for off-screen padding and one for BIO padding, are then needed, and the implementation scale becomes large. In addition, when both off-screen padding and BIO padding are required, the amount of padding processing increases.

On the other hand, the motion compensation unit 3091 prohibits the BIO mode when a part of the reference block is outside the screen of the reference image. That is, when a part of the reference block is outside the screen of the reference image, the BIO mode is not performed, and thus the motion compensation unit 3091 does not need to perform both the off-screen padding and the BIO padding.

Hereinafter, a specific example of generation of a predicted image by the motion compensation unit 3091 will be described.

(Specific example 1 of generation of predicted image by motion compensation unit 3091)
First, a specific example 1 of generation of a predicted image by the motion compensation unit 3091 will be described with reference to FIGS. FIG. 21 is a flowchart illustrating an example of a flow of processing for determining whether or not the motion compensation unit 3091 executes (permits) generation of a predicted image using the BIO mode.

In STEP 121, the motion compensation unit 3091 determines whether at least a part of the reference block is outside the screen of the reference image. When the motion compensation unit 3091 determines that at least a part of the reference block is outside the screen (YES in STEP 121), the motion compensation unit 3091 prohibits generation of a predicted image using the BIO mode (STEP 123). In this case, in the processing following FIG. 21, the motion compensation unit 3091 reads the pixels of the MC reading area (N + T - 1) * (N + T - 1) as shown in FIG. 19, generates a predicted image in the uni-prediction mode or the bi-prediction mode, and ends the process.

When the motion compensation unit 3091 determines in STEP 121 that the reference block is not outside the screen of the reference image (NO in STEP 121), that is, that the reference block is within the screen of the reference image, the process proceeds to STEP 122 in FIG. 21.

In STEP 122, the motion compensation unit 3091 permits the BIO mode. In this case, in the processing subsequent to FIG. 21 (for example, the processing in FIG. 18), the motion compensation unit 3091 reads the pixels of the MC reading area (N + T - 1) * (N + T - 1) as shown in FIG. 19 and generates a motion compensated image (S111a in FIG. 18). Subsequently, the motion compensation unit 3091 reads the pixels in the padding area (hatched area) outside the MC reading area, executes S111b to S113b in FIG. 18, generates a predicted image in the BIO mode, and ends the process.

Note that, as shown in Formula A20 described later, the motion compensation unit 3091 may determine whether at least a part of the reference block is outside the screen by determining the following (1) to (4).
(1) Whether the x coordinates of the upper-left and lower-left corners of the reference block are less than the x coordinate of the pixel at the left boundary of the reference image
(2) Whether the x coordinates of the upper-right and lower-right corners of the reference block are larger than the x coordinate of the pixel at the right boundary of the reference image
(3) Whether the y coordinates of the upper-left and upper-right corners of the reference block are less than the y coordinate of the pixel at the upper boundary of the reference image
(4) Whether the y coordinates of the lower-left and lower-right corners of the reference block are larger than the y coordinate of the pixel at the lower boundary of the reference image

(Specific example 2 of generation of predicted image by motion compensation unit 3091)
In the above example, the motion compensation unit 3091 prohibits the BIO mode if at least a part of the reference block indicated by the motion vector protrudes from the boundary of the reference image, but the present embodiment is not limited to this.

Hereinafter, specific example 2 of generation of a prediction image by the motion compensation unit 3091 will be described with reference to FIG. FIG. 22 is a flowchart illustrating another example of a flow of processing for determining whether or not the motion compensation unit 3091 executes (permits) prediction image generation using the BIO mode.

In STEP 131, the motion compensation unit 3091 determines whether or not at least a part of the reference block is outside a certain range from the boundary of the reference image.

For example, as shown in FIG. 20, when a part of the reference block indicated by the motion vector protrudes from the boundary of the reference image, the motion compensation unit 3091 has a length outW that the reference block protrudes from the reference image in the horizontal direction, The length outH protruding in the vertical direction is measured. Subsequently, the motion compensation unit 3091 compares the length outW that protrudes in the horizontal direction with padW that indicates the length in the horizontal direction within a predetermined range. In addition, the motion compensation unit 3091 compares the length outH protruding in the vertical direction with padH indicating the length in a predetermined range in the vertical direction.

padW and padH are the horizontal and vertical distances of the certain range from the boundary of the reference image, as shown in the drawings. For example, padW = T/2 - 1 and padH = T/2 - 1 may be set. The upper, lower, left, and right distances of the certain range from the boundary of the reference image may also differ from one another. For example, let these distances be padH_upper, padH_lower, padW_left, and padW_right, respectively. In this case, values such as padH_upper = 64, padH_lower = 63, padW_left = 128, and padW_right = 127 may be used.

When the length protruding in the horizontal direction is longer than padW (outW > padW) and/or the length protruding in the vertical direction is longer than padH (outH > padH), the motion compensation unit 3091 determines that at least a part of the reference block is outside the certain range from the boundary of the reference image.

Note that, as shown in Formula A21 described later, the motion compensation unit 3091 may determine whether at least a part of the reference block is outside the certain range from the boundary of the reference image by determining the following (1) to (4).
(1) Whether the x coordinates of the upper-left and lower-left corners of the reference block are less than the x coordinate of the left edge of the certain range from the boundary of the reference image
(2) Whether the x coordinates of the upper-right and lower-right corners of the reference block are larger than the x coordinate of the right edge of the certain range from the boundary of the reference image
(3) Whether the y coordinates of the upper-left and upper-right corners of the reference block are less than the y coordinate of the upper edge of the certain range from the boundary of the reference image
(4) Whether the y coordinates of the lower-left and lower-right corners of the reference block are larger than the y coordinate of the lower edge of the certain range from the boundary of the reference image

When the motion compensation unit 3091 determines that at least a part of the reference block is outside the certain range from the boundary of the reference image (YES in STEP 131), the motion compensation unit 3091 prohibits the BIO mode. That is, the motion compensation unit 3091 does not execute the BIO mode (STEP 133). In this case, the motion compensation unit 3091 generates a predicted image in the uni-prediction mode or the bi-prediction mode, as in specific example 1, and ends the process.

In STEP 131, when the motion compensation unit 3091 determines that at least a part of the reference block is not out of a certain range from the boundary of the reference image (No in STEP 131), the process proceeds to STEP 132.

In STEP132, the motion compensation unit 3091 generates a predicted image using the BIO mode in the same manner as in the first specific example, and ends the process.

Thus, the motion compensation unit 3091 may prohibit the BIO mode when at least a part of the reference block is outside a certain range from the boundary of the reference image. That is, the motion compensation unit 3091 may not execute the BIO mode.

As a result, when at least a part of the reference block protrudes from the boundary of the reference image but does not protrude beyond the certain range, the motion compensation unit 3091 can generate a highly accurate predicted image using the BIO mode. Further, when at least a part of the reference block is outside the certain range from the boundary of the reference image, the motion compensation unit 3091 prohibits the BIO mode. For this reason, if the motion compensation unit 3091 generates in advance a padding area for the off-screen region up to the certain range from the boundary of the reference image, the padding can be reduced to one type (only BIO padding when the BIO mode is on). In other words, the motion compensation unit 3091 derives in advance an enlarged reference image to which off-screen padding has been applied, and the BIO mode is prohibited whenever the reference block is outside the certain range from the boundary of the reference image, so BIO processing can be simplified.

As described above, when the BIO mode is not performed, the motion compensation unit 3091 can refer to an off-screen region that is outside the certain range from the boundary of the reference image. In this case as well, the padding can be limited to one type (only off-screen padding when the BIO mode is off).

(Determining whether the reference block in STEP 121 is outside the screen of the reference image)
Hereinafter, the determination by the motion compensation unit 3091 whether or not the reference block is outside the screen of the reference image will be specifically described.

In the determination of whether “at least a part of the reference block is outside the screen” in STEP 121 of FIG. 21, the motion compensation unit 3091 determines that the reference block is outside the screen of the reference image when at least one of the following Formulas A20-1 to A20-4 is satisfied.

The following equation A20-1 is an equation used when determining whether or not the left end of the reference block indicated by the motion vector is outside the screen of the reference image. Expression A20-2 is an expression used when determining whether the right end of the reference block is outside the screen of the reference image. Expression A20-3 is an expression used when determining whether or not the upper end of the reference block is outside the screen of the reference image. Expression A20-4 is an expression used when determining whether or not the lower end of the reference block is outside the screen of the reference image.

xInt-NTAPS / 2 + 1 <0 (Formula A20-1)
xInt + BLKW + NTAPS / 2-1> pic_width-1 (Formula A20-2)
yInt-NTAPS / 2 + 1 <0 (Formula A20-3)
yInt + BLKH + NTAPS / 2-1> pic_height-1 (Formula A20-4)
Here, xInt and yInt in Formulas A20-1 to A20-4 denote the integer position (xInt, yInt) in the reference image of the in-block coordinates (x, y) of the block whose upper-left coordinates are (xPb, yPb), for a motion vector accuracy of 1/M pel, as shown in Formula A8 above. NTAPS in Formulas A20-1 to A20-4 indicates the number of filter taps of the motion compensation filter. BLKW and BLKH indicate the width and height of the corresponding block, respectively. Further, pic_width and pic_height indicate the width and height of the reference image, respectively.

For example, in the case of FIG. 19, NTAPS corresponds to T, and BLKW and BLKH each correspond to N.
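The determination of Formulas A20-1 to A20-4 and the resulting permission of the BIO mode (STEP 121 to STEP 123) can be sketched as below. The function names are hypothetical, and xInt and yInt are assumed to be the integer position of the upper-left sample of the block (x = 0, y = 0).

```python
def reference_block_outside_screen(x_int, y_int, blk_w, blk_h,
                                   pic_width, pic_height, ntaps=8):
    """True if any sample needed by the NTAPS-tap filter lies outside the picture."""
    return (
        x_int - ntaps // 2 + 1 < 0                            # Formula A20-1 (left)
        or x_int + blk_w + ntaps // 2 - 1 > pic_width - 1     # Formula A20-2 (right)
        or y_int - ntaps // 2 + 1 < 0                         # Formula A20-3 (top)
        or y_int + blk_h + ntaps // 2 - 1 > pic_height - 1    # Formula A20-4 (bottom)
    )

def bio_permitted(x_int, y_int, blk_w, blk_h, pic_width, pic_height, ntaps=8):
    """BIO is permitted only when the whole reference block is inside the screen."""
    return not reference_block_outside_screen(
        x_int, y_int, blk_w, blk_h, pic_width, pic_height, ntaps)
```

For a 64x64 reference image, an 8x8 block, and NTAPS = 8, the block is treated as inside the screen for xInt = 3 to 52 and as outside for xInt = 2 or xInt = 53, because the filter needs 3 extra samples on the left and 4 on the right.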

(Determining whether the reference block in STEP 131 is outside a certain range from the screen boundary)
The motion compensation unit 3091 may determine whether the reference block is outside a certain range from the boundary of the reference image, instead of determining whether the reference block is outside the screen of the reference image.

In this case, in the determination of whether “at least a part of the reference block is outside a certain range from the screen boundary” in STEP 131 of FIG. 22, the motion compensation unit 3091 may determine that at least a part of the reference block is outside the certain range from the boundary of the reference image when at least one of the following Formulas A21-1 to A21-4 is satisfied.

The following expression A21-1 is an expression used when determining whether or not the left end of the reference block indicated by the motion vector is outside a certain range from the boundary of the reference image. Expression A21-2 is an expression used when determining whether or not the right end of the reference block is outside a certain range from the boundary of the reference image. Expression A21-3 is an expression used when determining whether or not the upper end of the reference block is outside a certain range from the boundary of the reference image. Expression A21-4 is an expression used when determining whether or not the lower end of the reference block is outside a certain range from the boundary of the reference image.

xInt-NTAPS / 2 + 1 <-padW (Formula A21-1)
xInt + BLKW + NTAPS / 2-1> pic_width + padW-1 (Formula A21-2)
yInt-NTAPS / 2 + 1 <-padH (Formula A21-3)
yInt + BLKH + NTAPS / 2-1> pic_height + padH-1 (Formula A21-4)
Here, padW and padH in Formulas A21-1 to A21-4 denote the extent of the predetermined range in the horizontal direction and in the vertical direction, respectively.
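Formulas A21-1 to A21-4 relax the screen boundary by padW and padH. A minimal sketch follows; the function name is hypothetical, the defaults use the example value T/2 - 1 = 3 for T = 8, and xInt and yInt are assumed to be the integer position of the upper-left sample of the block.

```python
def outside_allowed_range(x_int, y_int, blk_w, blk_h, pic_width, pic_height,
                          pad_w=3, pad_h=3, ntaps=8):
    """True if the filter support extends beyond the picture enlarged by pad_w/pad_h."""
    return (
        x_int - ntaps // 2 + 1 < -pad_w                               # Formula A21-1
        or x_int + blk_w + ntaps // 2 - 1 > pic_width + pad_w - 1     # Formula A21-2
        or y_int - ntaps // 2 + 1 < -pad_h                            # Formula A21-3
        or y_int + blk_h + ntaps // 2 - 1 > pic_height + pad_h - 1    # Formula A21-4
    )
```

With a 64x64 reference image and an 8x8 block, xInt = 0 is still within the range (the leftmost needed sample is at -3), while xInt = -1 is not. Setting pad_w = pad_h = 0 gives the same conditions as Formulas A20-1 to A20-4.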

(Configuration of image encoding device)
Next, the configuration of the image encoding device 11 according to the first embodiment will be described. FIG. 4 is a block diagram illustrating a configuration of the image encoding device 11 according to the first embodiment. The image encoding device 11 includes a predicted image generation unit 101, a subtraction unit 102, a transform / quantization unit 103, an entropy encoding unit 104, an inverse quantization / inverse transform unit 105, an addition unit 106, a loop filter 107, and a prediction parameter memory. 108, a reference picture memory 109, an encoding parameter determination unit 110, and a prediction parameter encoding unit 111. The prediction parameter encoding unit 111 includes an inter prediction parameter encoding unit 112 and an intra prediction parameter encoding unit 113.

The predicted image generation unit 101 generates, for each picture of the image T, a predicted image P of the prediction unit PU for each coding unit CU that is an area obtained by dividing the picture. Here, the predicted image generation unit 101 reads a decoded block from the reference picture memory 109 based on the prediction parameter input from the prediction parameter encoding unit 111. The prediction parameter input from the prediction parameter encoding unit 111 is, for example, a motion vector in the case of inter prediction. The predicted image generation unit 101 reads a block at a position on the reference image indicated by the motion vector with the target PU as a starting point. In the case of intra prediction, the prediction parameter is, for example, an intra prediction mode. A pixel value of an adjacent PU used in the intra prediction mode is read from the reference picture memory 109, and a predicted image P of the PU is generated. The predicted image generation unit 101 generates a predicted image P of the PU using one prediction method among a plurality of prediction methods for the read reference picture block. The predicted image generation unit 101 outputs the generated predicted image P of the PU to the subtraction unit 102.

Note that the predicted image generation unit 101 performs the same operation as the predicted image generation unit 308 already described. For example, FIG. 6 is a block diagram illustrating a configuration of the inter predicted image generation unit 1011 included in the predicted image generation unit 101. The inter prediction image generation unit 1011 includes a motion compensation unit 10111 and a weight prediction unit 10112. Since the motion compensation unit 10111 and the weight prediction unit 10112 have the same configurations as the motion compensation unit 3091 and the weight prediction unit 3094 described above, description thereof is omitted here.

The prediction image generation unit 101 generates a prediction image P of the PU based on the pixel value of the reference block read from the reference picture memory, using the parameter input from the prediction parameter encoding unit. The predicted image generated by the predicted image generation unit 101 is output to the subtraction unit 102 and the addition unit 106.

The subtraction unit 102 subtracts the signal value of the predicted image P of the PU input from the predicted image generation unit 101 from the pixel value of the corresponding PU of the image T, and generates a residual signal. The subtraction unit 102 outputs the generated residual signal to the transform / quantization unit 103.

The transform / quantization unit 103 performs frequency transform on the residual signal input from the subtraction unit 102 and calculates a transform coefficient. The transform / quantization unit 103 quantizes the calculated transform coefficient to obtain a quantized coefficient. The transform / quantization unit 103 outputs the obtained quantization coefficient to the entropy coding unit 104 and the inverse quantization / inverse transform unit 105.

The entropy encoding unit 104 receives the quantization coefficient from the transform / quantization unit 103 and receives the encoding parameter from the prediction parameter encoding unit 111. Examples of input encoding parameters include codes such as a reference picture index refIdxLX, a prediction vector index mvp_LX_idx, a difference vector mvdLX, a prediction mode predMode, and a merge index merge_idx.

The entropy encoding unit 104 generates an encoded stream Te by entropy encoding the input quantization coefficient and encoding parameter, and outputs the generated encoded stream Te to the outside.

The inverse quantization / inverse transform unit 105 inversely quantizes the quantization coefficient input from the transform / quantization unit 103 to obtain a transform coefficient. The inverse quantization / inverse transform unit 105 performs inverse frequency transform on the obtained transform coefficient to calculate a residual signal. The inverse quantization / inverse transform unit 105 outputs the calculated residual signal to the addition unit 106.

The addition unit 106 adds, for each pixel, the signal value of the prediction image P of the PU input from the prediction image generation unit 101 and the signal value of the residual signal input from the inverse quantization / inverse transform unit 105 to generate a decoded image. The addition unit 106 stores the generated decoded image in the reference picture memory 109.

The loop filter 107 applies a deblocking filter, a sample adaptive offset (SAO), and an adaptive loop filter (ALF) to the decoded image generated by the addition unit 106.

The prediction parameter memory 108 stores the prediction parameter generated by the encoding parameter determination unit 110 at a predetermined position for each encoding target picture and CU.

The reference picture memory 109 stores the decoded image generated by the loop filter 107 at a predetermined position for each picture and CU to be encoded.

The encoding parameter determination unit 110 selects one set from among a plurality of sets of encoding parameters. The encoding parameters are the above-described prediction parameters and the parameters to be encoded that are generated in association with those prediction parameters. The predicted image generation unit 101 generates a predicted image P of the PU using each of these encoding parameter sets.

The encoding parameter determination unit 110 calculates, for each of the plurality of sets, a cost value indicating the amount of information and the encoding error. The cost value is, for example, the sum of the code amount and the value obtained by multiplying the square error by a coefficient λ. The code amount is the information amount of the encoded stream Te obtained by entropy encoding the quantization error and the encoding parameter. The square error is the sum over pixels of the squared residual values of the residual signal calculated by the subtraction unit 102. The coefficient λ is a preset real number greater than zero. The encoding parameter determination unit 110 selects the set of encoding parameters that minimizes the calculated cost value. As a result, the entropy encoding unit 104 outputs the selected set of encoding parameters to the outside as the encoded stream Te, and does not output the unselected sets of encoding parameters. The encoding parameter determination unit 110 stores the determined encoding parameters in the prediction parameter memory 108.
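The cost-based selection described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the function names, the dictionary keys (`name`, `bits`, `residual`) and the sample numbers are hypothetical and are not taken from the specification; only the cost formula (code amount plus λ times the square error) follows the text.

```python
def rd_cost(code_amount, residual, lam):
    """Cost value = code amount + lam * (sum of squared residual values)."""
    if lam <= 0:
        raise ValueError("lam must be a real number greater than zero")
    square_error = sum(r * r for r in residual)
    return code_amount + lam * square_error


def select_parameter_set(candidates, lam):
    """Return the candidate set with the smallest cost value.

    Each candidate is a dict with hypothetical keys:
    'name' (label), 'bits' (code amount), 'residual' (list of residual values).
    """
    return min(candidates, key=lambda c: rd_cost(c["bits"], c["residual"], lam))


# Hypothetical candidate sets, for illustration only.
candidates = [
    {"name": "set_a", "bits": 120, "residual": [1, -2, 0, 3]},
    {"name": "set_b", "bits": 90, "residual": [4, -4, 2, 1]},
]
best = select_parameter_set(candidates, lam=2.0)
print(best["name"])
```

With these illustrative numbers, the set with the larger code amount is still selected because its square error is much smaller, which is the trade-off that λ controls.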

The prediction parameter encoding unit 111 derives a format for encoding from the parameters input from the encoding parameter determination unit 110 and outputs the format to the entropy encoding unit 104. Deriving the format for encoding is, for example, deriving a difference vector from a motion vector and a prediction vector. Also, the prediction parameter encoding unit 111 derives parameters necessary for generating a prediction image from the parameters input from the encoding parameter determination unit 110 and outputs the parameters to the prediction image generation unit 101. The parameter necessary for generating the predicted image is, for example, a motion vector in units of sub-blocks.

The inter prediction parameter encoding unit 112 derives an inter prediction parameter such as a difference vector based on the prediction parameter input from the encoding parameter determination unit 110. As a configuration for deriving the parameters necessary for generating the prediction image to be output to the prediction image generating unit 101, the inter prediction parameter encoding unit 112 includes a configuration that is partly the same as the configuration with which the inter prediction parameter decoding unit 303 (see FIG. 5 and the like) derives inter prediction parameters. The configuration of the inter prediction parameter encoding unit 112 will be described later.

The intra prediction parameter encoding unit 113 derives a format (for example, MPM_idx, rem_intra_luma_pred_mode) for encoding from the intra prediction mode IntraPredMode input from the encoding parameter determination unit 110.

(Configuration of inter prediction parameter encoding unit)
Next, the configuration of the inter prediction parameter encoding unit 112 will be described. The inter prediction parameter encoding unit 112 is a unit corresponding to the inter prediction parameter decoding unit 303 in FIG. 11, and the configuration is shown in FIG. 9.

The inter prediction parameter encoding unit 112 includes an inter prediction parameter encoding control unit 1121, an AMVP prediction parameter derivation unit 1122, a subtraction unit 1123, a sub-block prediction parameter derivation unit 1125, and, although not shown, a partition mode derivation unit, a merge flag derivation unit, an inter prediction identifier derivation unit, a reference picture index derivation unit, a vector difference derivation unit, and the like. The partition mode derivation unit, the merge flag derivation unit, the inter prediction identifier derivation unit, the reference picture index derivation unit, and the vector difference derivation unit derive the PU partition mode part_mode, the merge flag merge_flag, the inter prediction identifier inter_pred_idc, the reference picture index refIdxLX, and the difference vector mvdLX, respectively. The inter prediction parameter encoding unit 112 outputs the motion vector (mvLX, subMvLX), the reference picture index refIdxLX, the PU partition mode part_mode, the inter prediction identifier inter_pred_idc, or information indicating these to the predicted image generation unit 101. The inter prediction parameter encoding unit 112 also outputs the PU partition mode part_mode, the merge flag merge_flag, the merge index merge_idx, the inter prediction identifier inter_pred_idc, the reference picture index refIdxLX, the prediction vector index mvp_LX_idx, the difference vector mvdLX, and the sub-block prediction mode flag subPbMotionFlag to the entropy encoding unit 104.

The inter prediction parameter encoding control unit 1121 includes a merge index deriving unit 11211 and a vector candidate index deriving unit 11212. The merge index derivation unit 11211 compares the motion vector and reference picture index input from the encoding parameter determination unit 110 with the motion vector and reference picture index of the merge candidate PU read from the prediction parameter memory 108, derives the merge index merge_idx, and outputs it to the entropy encoding unit 104. A merge candidate is a reference PU within a predetermined range from the encoding target CU (for example, a reference PU in contact with the lower left end, upper left end, or upper right end of the encoding target block) for which the encoding process has been completed. The vector candidate index deriving unit 11212 derives a prediction vector index mvp_LX_idx.

When the encoding parameter determination unit 110 determines to use the sub-block prediction mode, the sub-block prediction parameter derivation unit 1125 derives, according to the value of subPbMotionFlag, a motion vector and a reference picture index for one of the following sub-block predictions: spatial sub-block prediction, temporal sub-block prediction, affine prediction, or matching motion derivation. As described for the image decoding apparatus, the motion vector and the reference picture index are derived by reading the motion vectors and reference picture indices of adjacent PUs, reference picture blocks, and the like from the prediction parameter memory 108.

The AMVP prediction parameter derivation unit 1122 has the same configuration as the AMVP prediction parameter derivation unit 3032 (see FIG. 11).

That is, when the prediction mode predMode indicates the inter prediction mode, the motion vector mvLX is input from the encoding parameter determination unit 110 to the AMVP prediction parameter derivation unit 1122. The AMVP prediction parameter derivation unit 1122 derives a prediction vector mvpLX based on the input motion vector mvLX. The AMVP prediction parameter derivation unit 1122 outputs the derived prediction vector mvpLX to the subtraction unit 1123. Note that the reference picture index refIdx and the prediction vector index mvp_LX_idx are output to the entropy encoding unit 104.

The subtraction unit 1123 subtracts the prediction vector mvpLX input from the AMVP prediction parameter derivation unit 1122 from the motion vector mvLX input from the coding parameter determination unit 110 to generate a difference vector mvdLX. The difference vector mvdLX is output to the entropy encoding unit 104.
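The subtraction performed by the subtraction unit 1123 amounts to a component-wise difference. The sketch below assumes motion vectors are represented as (x, y) integer tuples; the function names are hypothetical, and the reconstruction function is included only to show that the difference vector together with the prediction vector recovers the motion vector.

```python
def derive_difference_vector(mv_lx, mvp_lx):
    """mvdLX = mvLX - mvpLX, computed per component."""
    return (mv_lx[0] - mvp_lx[0], mv_lx[1] - mvp_lx[1])


def reconstruct_motion_vector(mvp_lx, mvd_lx):
    """Inverse operation: mvLX = mvpLX + mvdLX."""
    return (mvp_lx[0] + mvd_lx[0], mvp_lx[1] + mvd_lx[1])


mv_lx = (18, -7)    # illustrative motion vector
mvp_lx = (16, -4)   # illustrative prediction vector
mvd_lx = derive_difference_vector(mv_lx, mvp_lx)
print(mvd_lx)
```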

[Embodiment 2]
The function of the inter prediction image generation unit in the prediction image generation unit of the image decoding device is not limited to the function of the inter prediction image generation unit 309 in the prediction image generation unit 308 of the image decoding device 31 of the image transmission system 1 according to the first embodiment. Instead of or in addition to that function, the inter prediction image generation unit may have the function of the inter predicted image generation unit 309a in the predicted image generation unit 308a of the image decoding device 31a of the image transmission system 1a (not shown).

Embodiment 2 will be described with reference to FIGS. For convenience of explanation, members having the same functions as those described in the first embodiment are denoted by the same reference numerals and description thereof is omitted.

(Configuration of image transmission system)
The image transmission system 1a includes an image encoding device 11a (not shown) and an image decoding device 31a instead of the image encoding device 11 and the image decoding device 31 in the first embodiment.

(Configuration of image decoding device)
FIG. 23 is a block diagram showing a main configuration of the image decoding device 31a according to the present embodiment. As illustrated in FIG. 23, the image decoding device 31a according to the present embodiment includes a predicted image generation unit 308a instead of the predicted image generation unit 308 in the first embodiment.

As illustrated in FIG. 23, the predicted image generation unit 308a includes an inter predicted image generation unit 309a instead of the inter predicted image generation unit 309 in the first embodiment. Except for this point, the predicted image generation unit 308a has the same configuration as the predicted image generation unit 308 in the first embodiment.

(Inter prediction image generation unit 309a)
FIG. 24 is a block diagram illustrating a configuration of an inter predicted image generation unit 309a included in the predicted image generation unit 308a according to the second embodiment.

As illustrated in FIG. 24, the inter prediction image generation unit 309a includes a motion compensation unit 3091a instead of the motion compensation unit 3091 in the first embodiment.

(Motion compensation unit 3091a)
The motion compensation unit 3091a generates a pixel value outside the reading region, which is a pixel value outside the MC reading region regarding the corresponding block in at least one of the first reference image and the second reference image. Here, when generating a predicted image using the BIO mode, the motion compensation unit 3091a prohibits the generation process of the pixel value outside the MC reading area along the vertical direction or the horizontal direction. That is, the motion compensation unit 3091a prohibits BIO padding in the vertical direction or the horizontal direction when generating a predicted image using the BIO mode.

(Specific example of function of motion compensation unit 3091a)
The function of the motion compensation unit 3091a described above will be described in more detail below with reference to FIGS.

FIGS. 25 and 26 are diagrams illustrating regions in which the motion compensation unit 3091a executes BIO padding. FIG. 27 is a diagram illustrating motion vectors in adjacent blocks.

For example, when the motion compensation unit 3091a prohibits the generation process of the pixel value outside the MC reading area along the horizontal direction, the motion compensation unit 3091a performs BIO padding only in the vertical direction in the padding region indicated by the diagonal lines in FIG. .

According to this configuration, since BIO padding in the horizontal direction is not performed, when blocks adjacent in the horizontal direction have the same motion vector ((a) in FIG. 27), these blocks can be merged and regarded as one corresponding block. Therefore, when the horizontally adjacent blocks have the same motion vector, the motion compensation unit 3091a may merge these blocks, regard them as one corresponding block, and generate a predicted image with reference to the reference block of that corresponding block.

As described above, when configured to prohibit horizontal BIO padding, the motion compensation unit 3091a may merge and process the corresponding block and a block that is adjacent to the corresponding block in the horizontal direction and has the same motion vector as the corresponding block. This reduces the overall processing amount.

Also, according to the above-described configuration, the memory bandwidth can be reduced as compared with the case where the motion compensation unit 3091a performs BIO padding separately for each block boundary. That is, the memory bandwidth can be reduced as compared with the case where the motion compensation unit 3091a performs BIO padding on two blocks separately.

Further, the motion compensation unit 3091a may prohibit the BIO padding in the vertical direction instead of the configuration for prohibiting the BIO padding in the horizontal direction. In this case, the motion compensation unit 3091a performs BIO padding in the padding area indicated by hatching in FIG.

In this configuration, since BIO padding in the vertical direction is not performed, when adjacent blocks in the vertical direction have the same motion vector ((b) in FIG. 27), these blocks can be merged and regarded as one corresponding block. Therefore, when the adjacent blocks in the vertical direction have the same motion vector, the motion compensation unit 3091a may merge these blocks, regard them as one corresponding block, and generate a predicted image with reference to the reference block of that corresponding block.

As described above, when configured to prohibit vertical BIO padding, the motion compensation unit 3091a may merge and process the corresponding block and a neighboring block that is adjacent to the corresponding block in the vertical direction and has the same motion vector as the corresponding block. This reduces the overall processing amount.

Note that the motion compensation unit 3091a may include a selection unit 4000 (not shown) that selects whether the direction in which BIO padding is prohibited is the horizontal direction or the vertical direction. For example, as shown in the following specific examples, the selection unit 4000 may select the horizontal direction or the vertical direction as the direction in which BIO padding is prohibited, depending on whether the block having the same motion vector is adjacent in the horizontal direction or in the vertical direction.

(Specific Example 1 of Generation of Pixel Value Outside Reading Area by Motion Compensation Unit 3091a)
First, specific example 1 of generation of a pixel value outside the reading area by the motion compensation unit 3091a will be described with reference to FIG. 28. FIG. 28 is a flowchart illustrating an example of a processing flow for determining whether to prohibit BIO padding in the horizontal direction when the motion compensation unit 3091a generates a predicted image using the BIO mode.

In STEP 211, the motion compensation unit 3091a is allowed to execute (permit) both horizontal and vertical BIO padding.

In STEP 212, the motion compensation unit 3091a determines whether or not the motion vector of the corresponding block is the same as the motion vector of the block adjacent to the corresponding block in the horizontal direction.

When the motion compensation unit 3091a determines that the motion vector of the corresponding block and the motion vector of the block adjacent to the corresponding block in the horizontal direction are the same (YES in STEP 212), the process proceeds to STEP 213. When the motion compensation unit 3091a determines that the motion vector of the corresponding block is not the same as the motion vector of the block adjacent to the corresponding block in the horizontal direction (NO in STEP 212), the motion compensation unit 3091a performs BIO padding in both directions, and the process ends.

In STEP 213, the motion compensation unit 3091a prohibits BIO padding in the horizontal direction, executes only BIO padding in the vertical direction, and ends the process.

(Specific Example 2 of Generation of Pixel Value Outside Reading Area by Motion Compensation Unit 3091a)
Next, specific example 2 of generation of a pixel value outside the reading area by the motion compensation unit 3091a will be described with reference to FIG. 29. FIG. 29 is a flowchart illustrating an example of a process flow for determining whether to prohibit BIO padding in the vertical direction when the motion compensation unit 3091a generates a predicted image using the BIO mode. Since STEP 221 in FIG. 29 is the same as STEP 211 in FIG. 28, description thereof is omitted.

In STEP 222, the motion compensation unit 3091a determines whether or not the motion vector of the corresponding block is the same as the motion vector of the block adjacent to the corresponding block in the vertical direction.

When the motion compensation unit 3091a determines that the motion vector of the corresponding block and the motion vector of the block adjacent to the corresponding block in the vertical direction are the same (YES in STEP 222), the process proceeds to STEP 223. When the motion compensation unit 3091a determines that the motion vector of the corresponding block and the motion vector of the block adjacent to the corresponding block in the vertical direction are not the same (NO in STEP 222), the motion compensation unit 3091a performs BIO padding in both directions, and the process ends.

In STEP 223, the motion compensation unit 3091a prohibits the BIO padding in the vertical direction, executes only the BIO padding in the horizontal direction, and ends the process.

(Specific Example 3 of Generation of Pixel Value Outside Reading Area by Motion Compensation Unit 3091a)
In specific examples 1 and 2 described above, the motion compensation unit 3091a determines whether the motion vector of the corresponding block is the same as the motion vector of the block adjacent to the corresponding block in one of the horizontal and vertical directions. However, the present embodiment is not limited to this. The motion compensation unit 3091a may determine both whether the motion vectors of blocks adjacent in the horizontal direction are the same and whether the motion vectors of blocks adjacent in the vertical direction are the same.

Hereinafter, specific example 3 of generation of a pixel value outside the reading area by the motion compensation unit 3091a will be described with reference to FIG. 30. FIG. 30 is a flowchart illustrating an example of a processing flow for determining whether to prohibit BIO padding in the horizontal direction or the vertical direction when the motion compensation unit 3091a generates a predicted image using the BIO mode.

STEP 231 and STEP 233 in FIG. 30 are the same as STEP 211 and STEP 213 in FIG. 28, and thus description thereof is omitted. Likewise, STEP 234 and STEP 235 in FIG. 30 are the same as STEP 222 and STEP 223 in FIG. 29.

In STEP232, the motion compensation unit 3091a determines whether or not the motion vector of the corresponding block is the same as the motion vector of the block adjacent to the corresponding block in the horizontal direction.

When the motion compensation unit 3091a determines that the motion vector of the corresponding block and the motion vector of the block adjacent to the corresponding block in the horizontal direction are the same (Yes in STEP 232), the process proceeds to STEP 233. When the motion compensation unit 3091a determines that the motion vector of the corresponding block and the motion vector of the block adjacent to the corresponding block in the horizontal direction are not the same (No in STEP 232), the process proceeds to STEP 234.

As shown in specific example 3 above, the motion compensation unit 3091a may determine, in the order of the horizontal direction and then the vertical direction, whether the motion vector of the block adjacent to the corresponding block is the same as that of the corresponding block, and thereby determine whether to prohibit BIO padding in the horizontal direction or the vertical direction. Here, when blocks adjacent in the horizontal direction or the vertical direction have the same motion vector, the motion compensation unit 3091a prohibits BIO padding in that direction. In this case, the motion compensation unit 3091a may merge the blocks adjacent in the horizontal direction or the vertical direction, regard them as one corresponding block, and generate a predicted image of that corresponding block.

In FIG. 30, the motion compensation unit 3091a determines whether or not the motion vectors of the blocks adjacent to the corresponding block are the same in the order of the horizontal direction and the vertical direction, but the order is not limited to this. The motion compensation unit 3091a may determine whether or not the motion vectors of the blocks adjacent to the corresponding block are the same in the order of the vertical direction and the horizontal direction.
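The decision flow of specific examples 1 to 3 can be sketched as below. This is a hedged illustration, not the procedure of the flowcharts themselves: the function name, the tuple representation of motion vectors, the use of `None` for a missing neighbor, and the `order` parameter are all assumptions introduced here. The sketch also assumes that, as in STEP 213 and STEP 223, the process ends once one direction has been prohibited.

```python
def select_bio_padding(mv_cur, mv_horizontal_neighbor, mv_vertical_neighbor,
                       order=("horizontal", "vertical")):
    """Return which BIO padding directions remain permitted.

    Both directions are permitted first (cf. STEP 231). The neighbors are
    then checked in the given order; the first direction whose neighbor has
    the same motion vector is prohibited and the process ends.
    """
    permitted = {"horizontal": True, "vertical": True}
    neighbors = {
        "horizontal": mv_horizontal_neighbor,
        "vertical": mv_vertical_neighbor,
    }
    for direction in order:
        mv_neighbor = neighbors[direction]
        # None stands for "no neighbor available" (an assumption of this sketch).
        if mv_neighbor is not None and mv_neighbor == mv_cur:
            permitted[direction] = False
            break
    return permitted


# Horizontal neighbor shares the motion vector: only vertical padding remains.
print(select_bio_padding((3, -1), (3, -1), (0, 2)))
```

Passing `order=("vertical", "horizontal")` corresponds to the alternative ordering mentioned above.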

(Modification of motion compensation unit 3091a)
The motion compensation unit 3091a may prohibit the BIO padding in the horizontal direction or the vertical direction described above when referring to the reference block of the corresponding block illustrated in FIG.

FIG. 31 is a schematic diagram of a corresponding block used when the motion compensation unit 3091a uses the BIO mode and a block adjacent to the corresponding block in the horizontal direction or the vertical direction.

For example, the motion compensation unit 3091a may prohibit horizontal BIO padding when using the BIO mode for a vertically long corresponding block. In this case, if the vertically long corresponding block and the block adjacent to it in the horizontal direction have the same motion vector, the motion compensation unit 3091a may merge these blocks, regard them as one corresponding block, and generate a predicted image with reference to the reference block of that corresponding block. Thus, even when the BIO mode is used for the vertically long corresponding block and the block horizontally adjacent to it, the motion compensation unit 3091a can reduce the overall processing amount in the same manner as in the above example. The memory bandwidth can also be reduced.

Here, 4 × 8 pixels can be cited as an example of the size of the vertically long corresponding block shown in FIG. 31A and of the block adjacent to it in the horizontal direction. Even when the motion compensation unit 3091a uses 4x8 pixels as the minimum unit block size, merging these blocks when the BIO mode is used means that processing is performed in units of 8x8 pixels, so the overall processing amount does not need to increase. As a result, an increase in memory bandwidth can be suppressed.

Also, the motion compensation unit 3091a may prohibit BIO padding in the vertical direction when the BIO mode is used for a horizontally long corresponding block. In this case, if the horizontally long corresponding block and the block adjacent to it in the vertical direction have the same motion vector, the motion compensation unit 3091a may merge these blocks, regard them as one corresponding block, and generate a predicted image with reference to the reference block of that corresponding block.

Here, 8 × 4 pixels can be cited as an example of the size of the horizontally long corresponding block shown in FIG. 31B and of the block adjacent to it in the vertical direction. Even when the motion compensation unit 3091a uses 8x4 pixels as the minimum unit block size, merging these blocks when the BIO mode is used means that processing is performed in units of 8x8 pixels, so the overall processing amount does not need to increase. As a result, an increase in memory bandwidth can be suppressed.
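To give a rough feel for why merging helps the memory bandwidth, the following sketch counts reference samples read for motion compensation. It rests on assumptions that are not stated in this section: a separable interpolation filter with 8 taps, so that a W x H block reads (W + 7) x (H + 7) samples, and no additional margin for the BIO gradient calculation. The function name is hypothetical.

```python
def reference_samples(width, height, taps=8):
    """Samples read for one block with an N-tap separable interpolation filter.

    Assumption: the read window extends the block by (taps - 1) samples in
    each dimension; any extra BIO margin is ignored in this sketch.
    """
    return (width + taps - 1) * (height + taps - 1)


# Two 4x8 blocks processed separately versus one merged 8x8 block.
separate_4x8 = 2 * reference_samples(4, 8)
merged_8x8 = reference_samples(8, 8)
print(separate_4x8, merged_8x8)

# Two 8x4 blocks stacked vertically give the same counts by symmetry.
separate_8x4 = 2 * reference_samples(8, 4)
```

Under these assumptions the merged 8x8 block reads fewer samples than the two minimum-size blocks read separately, because the filter margin is paid once instead of twice.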

Note that the motion compensation unit 3091a may refer to a syntax indicating a block division method when determining whether to prohibit horizontal padding or vertical padding. For example, the motion compensation unit 3091a may be configured to refer to the syntax indicating the block division method, and to prohibit horizontal padding when the division finally results in a block of 4x8 pixels and prohibit vertical padding when it results in a block of 8x4 pixels.
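The block-size rule above can be written as a small lookup. The sketch assumes the final block width and height have already been obtained from the division syntax, which is not modeled here; the function name and the string return values are hypothetical.

```python
def prohibited_padding_direction(width, height):
    """Map the final block size to the BIO padding direction to prohibit.

    4x8 (vertically long)   -> prohibit horizontal padding
    8x4 (horizontally long) -> prohibit vertical padding
    Any other size          -> None (no rule is given in the text)
    """
    if (width, height) == (4, 8):
        return "horizontal"
    if (width, height) == (8, 4):
        return "vertical"
    return None


print(prohibited_padding_direction(4, 8))
```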

(Configuration of image encoding device)
The image encoding device 11a includes a predicted image generation unit 101a (not shown) instead of the predicted image generation unit 101 in the first embodiment. Moreover, the predicted image generation unit 101a includes an inter predicted image generation unit 1011a (not shown) instead of the inter predicted image generation unit 1011 in the first embodiment.

The inter predicted image generation unit 1011a includes a motion compensation unit 10111a instead of the motion compensation unit 10111 in the first embodiment. Except for this point, the inter prediction image generation unit 1011a has the same configuration as the inter prediction image generation unit 1011 in the first embodiment.

Since the motion compensation unit 10111a has the same configuration as the motion compensation unit 3091a described above, description thereof is omitted here.

[Embodiment 3]
The function of the prediction parameter decoding unit of the image decoding device is not limited to the function of the inter prediction parameter decoding unit 303 in the prediction parameter decoding unit 302 of the image decoding device 31 of the image transmission system 1 according to the first embodiment. Instead of or in addition to the function of the inter prediction parameter decoding unit 303, the function of the inter prediction parameter decoding unit 303b of the image decoding device 31b (not shown) of the image transmission system 1b (not shown) may be provided.

Embodiment 3 will be described with reference to FIGS. For convenience of explanation, members having the same functions as those described in the first embodiment are denoted by the same reference numerals and description thereof is omitted.

(Configuration of image transmission system)
The image transmission system 1b includes an image decoding device 31b (not shown) and an image encoding device 11b (not shown) instead of the image decoding device 31 and the image encoding device 11 in the first embodiment. The image decoding device 31b according to the present embodiment includes a prediction parameter decoding unit 302b (not shown) instead of the prediction parameter decoding unit 302 in the first embodiment. Further, the prediction parameter decoding unit 302b includes an inter prediction parameter decoding unit 303b instead of the inter prediction parameter decoding unit 303 in the first embodiment. Except for this point, the image decoding device 31b has the same configuration as the image decoding device 31 according to the first embodiment. The image encoding device 11b will be described later.

(Inter prediction parameter decoding unit 303b)
FIG. 34 is a block diagram illustrating a configuration of an inter prediction parameter decoding unit 303b included in the prediction parameter decoding unit 302b. As illustrated in FIG. 34, the inter prediction parameter decoding unit 303b includes a sub-block prediction parameter deriving unit 3037b instead of the sub-block prediction parameter deriving unit 3037 in the first embodiment. Except for this point, the inter prediction parameter decoding unit 303b has the same configuration as the inter prediction parameter decoding unit 303 in the first embodiment. As shown in FIG. 34, the sub-block prediction parameter derivation unit 3037b includes at least one of a spatio-temporal sub-block prediction unit 30371, an affine prediction unit 30372, and a matching motion derivation unit 30373, each of which performs sub-block prediction in the sub-block prediction mode.
(Subblock prediction mode flag)
Here, a method of deriving a sub-block prediction mode flag subPbMotionFlag indicating whether or not a prediction mode of a certain PU is a sub-block prediction mode in the image decoding device 31b and the image encoding device 11 will be described. The image decoding device 31b and the image encoding device 11 derive the sub-block prediction mode flag subPbMotionFlag based on which of spatial sub-block prediction SSUB, temporal sub-block prediction TSUB, affine prediction AFFINE, and matching motion derivation MAT, described later, is used. For example, when the prediction mode selected in a certain PU is N (for example, N is a label indicating the selected merge candidate), the sub-block prediction mode flag subPbMotionFlag may be derived by the following equation.

subPbMotionFlag = (N == TSUB) || (N == SSUB) || (N == AFFINE) || (N == MAT)
Here, || represents a logical sum (the same applies hereinafter).

Alternatively, the image decoding device 31b and the image encoding device 11 may be configured to perform only some of the spatial sub-block prediction SSUB, temporal sub-block prediction TSUB, affine prediction AFFINE, and matching motion derivation MAT. For example, when the image decoding device 31b and the image encoding device 11 are configured to perform spatial sub-block prediction SSUB and affine prediction AFFINE, the sub-block prediction mode flag subPbMotionFlag may be derived as follows.

subPbMotionFlag = (N == SSUB) || (N == AFFINE)
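
As an illustration, the flag derivation above could be sketched in Python as follows. The enum MergeCand, the label OTHER, and the function name sub_pb_motion_flag are hypothetical names introduced only for this sketch; the supported argument models a device that is configured to perform only some of the sub-block predictions.

```python
from enum import Enum, auto


class MergeCand(Enum):
    """Hypothetical labels for the selected prediction mode N."""
    TSUB = auto()    # temporal sub-block prediction
    SSUB = auto()    # spatial sub-block prediction
    AFFINE = auto()  # affine prediction
    MAT = auto()     # matching motion derivation
    OTHER = auto()   # any mode that is not a sub-block mode (assumption)


ALL_SUBBLOCK_MODES = frozenset(
    {MergeCand.TSUB, MergeCand.SSUB, MergeCand.AFFINE, MergeCand.MAT}
)


def sub_pb_motion_flag(n, supported=ALL_SUBBLOCK_MODES):
    """Return True when mode n is one of the supported sub-block modes.

    Equivalent to (N == TSUB) || (N == SSUB) || ... over the supported set.
    """
    return n in supported
```

Passing supported=frozenset({MergeCand.SSUB, MergeCand.AFFINE}) corresponds to the second equation above.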
(Sub-block prediction unit)
Next, the sub-block prediction unit will be described.

(Spatio-temporal sub-block prediction unit 30371)
The spatio-temporal sub-block prediction unit 30371 derives the motion vector of each sub-block obtained by dividing the target PU from the motion vector of a PU on a reference image temporally adjacent to the target PU (for example, the immediately preceding picture) or from the motion vector of a PU spatially adjacent to the target PU. Specifically, the motion vector spMvLX [xi] [yj] (xi = xPb + nSbW * i, yj = yPb + nSbH * j, i = 0, 1, 2, ..., nPbW / nSbW-1, j = 0, 1, 2, ..., nPbH / nSbH-1) of each sub-block in the target PU is derived by scaling the motion vector of the PU on the reference image according to the reference picture referenced by the target PU (temporal sub-block prediction). Here, (xPb, yPb) is the upper left coordinate of the target PU, nPbW and nPbH are the width and height of the target PU, and nSbW and nSbH are the width and height of the sub-blocks.
Alternatively, the motion vector spMvLX [xi] [yj] (xi = xPb + nSbW * i, yj = yPb + nSbH * j, i = 0, 1, 2, ..., nPbW / nSbW-1, j = 0, 1, 2, ..., nPbH / nSbH-1) of each sub-block in the target PU may be derived by calculating a weighted average of the motion vectors of the PUs adjacent to the target PU, weighted according to the distance between each adjacent PU and the sub-block obtained by dividing the target PU (spatial sub-block prediction).

The above-described temporal sub-block prediction candidate TSUB and spatial sub-block prediction candidate SSUB are selected as one mode (merge candidate) of the merge mode.
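
The index ranges used for spMvLX [xi] [yj] above can be made concrete with a short sketch. The function name subblock_origins is hypothetical, and the sketch only enumerates the upper left coordinates (xi, yj) of the sub-blocks; it does not perform the motion vector scaling or the weighted averaging, whose details are not given here.

```python
def subblock_origins(xPb, yPb, nPbW, nPbH, nSbW, nSbH):
    """List the upper left coordinates (xi, yj) of every sub-block in a PU.

    xi = xPb + nSbW * i for i = 0 .. nPbW / nSbW - 1
    yj = yPb + nSbH * j for j = 0 .. nPbH / nSbH - 1
    The list is produced row by row (j outer, i inner).
    """
    return [
        (xPb + nSbW * i, yPb + nSbH * j)
        for j in range(nPbH // nSbH)
        for i in range(nPbW // nSbW)
    ]
```

For an 8 × 8 PU at (16, 32) with 4 × 4 sub-blocks, this yields the four positions (16, 32), (20, 32), (16, 36), and (20, 36).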

(Affine prediction unit 30372)
The affine prediction unit 30372 derives the affine prediction parameters of the target PU. In this embodiment, motion vectors of two or three control points of the target PU are derived as affine prediction parameters. For example, in the case of three control points (V0, V1, and V2), motion vectors (mv0_x, mv0_y), (mv1_x, mv1_y), and (mv2_x, mv2_y) are derived.

Specifically, the motion vector of each control point may be derived by predicting from the motion vector of the adjacent PU of the target PU. Alternatively, the motion vector of each control point may be derived from the sum of the control point prediction vector mvpLX and the difference vector mvdLX derived from the encoded data.

FIG. 35 is a diagram showing an example in which the motion vector spMvLX of each sub-block constituting the target PU (nPbW × nPbH) is derived from the motion vector (mv0_x, mv0_y) of the control point V0 and the motion vector (mv1_x, mv1_y) of the control point V1. As shown in the figure, the motion vector spMvLX of each sub-block is derived as the motion vector of the point located at the center of that sub-block.

Based on the affine prediction parameters of the target PU, the affine prediction unit 30372 derives the motion vector spMvLX [xi] [yj] (xi = nSbW * i, yj = nSbH * j, i = 0, 1, 2, ..., nPbW / nSbW-1, j = 0, 1, 2, ..., nPbH / nSbH-1) of each sub-block of the target PU using the following equations.

spMvLX [xi] [yj] [0] = mv0_x + (mv1_x-mv0_x) / nPbW * (xi + nSbW / 2)-(mv1_y-mv0_y) / nPbH * (yj + nSbH / 2)
spMvLX [xi] [yj] [1] = mv0_y + (mv1_y-mv0_y) / nPbW * (xi + nSbW / 2) + (mv1_x-mv0_x) / nPbH * (yj + nSbH / 2)
Here, xPb and yPb are the upper left coordinates of the target PU, nPbW and nPbH are the width and height of the target PU, and nSbW and nSbH are the width and height of the sub-block.
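
The two equations above can be evaluated for every sub-block as in the following sketch. The function name affine_subblock_mvs and the dictionary return type are assumptions made for illustration, and floating-point arithmetic is used for readability, whereas an actual codec would normally use fixed-point integers.

```python
def affine_subblock_mvs(mv0, mv1, nPbW, nPbH, nSbW, nSbH):
    """Derive spMvLX[xi][yj] for every sub-block from control points V0 and V1.

    mv0 and mv1 are (x, y) motion vectors of V0 and V1.
    The result maps (xi, yj) to the motion vector at the sub-block center.
    """
    mv0_x, mv0_y = mv0
    mv1_x, mv1_y = mv1
    field = {}
    for j in range(nPbH // nSbH):
        for i in range(nPbW // nSbW):
            xi, yj = nSbW * i, nSbH * j
            # Center of the sub-block, as used in the equations above.
            cx, cy = xi + nSbW / 2, yj + nSbH / 2
            mv_x = mv0_x + (mv1_x - mv0_x) / nPbW * cx - (mv1_y - mv0_y) / nPbH * cy
            mv_y = mv0_y + (mv1_y - mv0_y) / nPbW * cx + (mv1_x - mv0_x) / nPbH * cy
            field[(xi, yj)] = (mv_x, mv_y)
    return field
```

When mv0 and mv1 are equal, every sub-block receives the same vector, which corresponds to pure translation.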

[Sub-block motion vector derivation processing]
Hereinafter, as an example of a more specific implementation, the process flow in which the affine prediction unit 30372 or the AMVP prediction parameter derivation unit 3032 derives the motion vector mvLX of each sub-block using affine prediction is described step by step. This process includes the following four steps (STEP 1) to (STEP 4).

(STEP 1) Derivation of control point vectors: In this step, the affine prediction unit 30372 or the AMVP prediction parameter derivation unit 3032 derives the motion vector of each representative point of the target block (here, V0 and V1), which serve as the two or more control points used in affine prediction for deriving candidates. Note that a point on the target block or a point near the target block is used as the representative point of the block. In this specification, a block representative point used as a control point for affine prediction is referred to as a “block control point”.

(STEP 2) Derivation of sub-block vectors: In this step, the affine prediction unit 30372 or the AMVP prediction parameter derivation unit 3032 derives the motion vector of each sub-block included in the target block from the motion vectors of the block control points (control points V0 and V1), which are the representative points of the target block derived in (STEP 1). Through (STEP 1) and (STEP 2), the motion vector mvLX of each sub-block is derived.

(STEP 3) Sub-block motion compensation: In this step, the motion compensation unit 3091 performs motion compensation in units of sub-blocks based on the prediction list use flag predFlagLX, the reference picture index refIdxLX, and the motion vector mvLX input from the inter prediction parameter decoding unit 303. Specifically, it generates the motion compensated image predSamplesLX by reading, from the reference picture memory 306, the block at the position shifted by the motion vector mvLX from the position of the target block on the reference picture specified by the reference picture index refIdxLX, and filtering that block.

(STEP 4) Storage of motion vectors of sub-blocks: In the case of the AMVP mode, the motion vector mvLX of each sub-block derived by the AMVP prediction parameter derivation unit 3032 in (STEP 2) is stored in the prediction parameter memory 307. Similarly, in the merge mode, the motion vector mvLX of each sub-block derived by the affine prediction unit 30372 in (STEP 2) is stored in the prediction parameter memory 307.

Note that the derivation of the sub-block motion vector mvLX using affine prediction can be performed in both the AMVP mode and the merge mode. In the following, some processes of (STEP 1) to (STEP 4) will be described for the AMVP mode and the merge mode, respectively.

(STEP1 details)
First, regarding the processing of (STEP 1), the AMVP mode and the merge mode will be described below with reference to FIG. FIG. 37 is a diagram illustrating an example of the position of a prediction unit (reference block) used for deriving a motion vector of a control point in the AMVP mode and the merge mode.

(Derivation of motion vector of control point in AMVP mode)
The AMVP prediction parameter derivation unit 3032 decodes the motion vector mvLX of two or more control points (for example, V0 and V1, or V0, V1, and V2). Specifically, the prediction vector mvpLX and the difference vector are added for each control point.

More specifically, the AMVP prediction parameter derivation unit 3032 derives prediction vector candidates of the control point VN (N = 0..2) and stores them in the prediction vector candidate list mvpListVNLX []. Further, the AMVP prediction parameter derivation unit 3032 derives the motion vector (mvi_x, mvi_y) of the control point VN from the prediction vector index mvpVN_LX_idx and the difference vector mvdVNLX of the control point VN, both obtained from the encoded data, by the following equations.

mvi_x = mvLX [0] = mvpListVNLX [mvpVN_LX_idx] [0] + mvdVNLX [0]
mvi_y = mvLX [1] = mvpListVNLX [mvpVN_LX_idx] [1] + mvdVNLX [1]
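
A minimal sketch of this reconstruction for one control point is shown below. The function name decode_control_point_mv and the use of Python tuples for vectors are assumptions for illustration; the candidate list contents in the usage example are arbitrary values, not data from any bitstream.

```python
def decode_control_point_mv(mvp_list, mvp_idx, mvd):
    """Reconstruct the motion vector of one control point VN.

    mvp_list: prediction vector candidate list (mvpListVNLX), list of (x, y)
    mvp_idx:  prediction vector index (mvpVN_LX_idx)
    mvd:      difference vector (mvdVNLX), (x, y)
    """
    mvp = mvp_list[mvp_idx]
    # Component-wise sum of the selected prediction vector and the difference.
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])
```

For example, with the candidate list [(4, -2), (8, 0)], index 1, and difference vector (-3, 5), the reconstructed motion vector is (5, 5).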
Details of the prediction vector derivation of each control point are as follows. As shown in FIG. 37 (a), the AMVP prediction parameter derivation unit 3032 selects, with reference to mvp_LX_idx, one of the motion vectors of the blocks A, B, and C (reference blocks), which are adjacent to one of the representative points and share that representative point (vertex) with the target block, and sets it as the prediction vector mvpLX of the representative point V0. Further, the AMVP prediction parameter derivation unit 3032 sets one of the motion vectors of the blocks D and E as the prediction vector mvpLX of the representative point V1.

Similarly, as shown in FIG. 37 (b), the AMVP prediction parameter derivation unit 3032 may derive the prediction vector mvpLX of the representative point V2 from one of the motion vectors of the blocks F and G (reference blocks).

In addition, the position of the control point in STEP1 is not limited to the above. A vertex at the lower right of the target block or points around the target block as described later may be used.

(General formula for affine prediction)
A general equation for the 4-parameter affine is described below. The motion vector (mvi_x, mvi_y) at the sub-block coordinates (xi, yj) can be obtained by the following general formula (eq1) using four parameters (mv_x, mv_y, ev, rv), that is, the motion vector of the center of expansion and rotation (translation vector (mv_x, mv_y)), the expansion parameter ev, and the rotation parameter rv.

mvi_x = mv_x + ev * xi-rv * yj
mvi_y = mv_y + rv * xi + ev * yj (eq1)
Note that (mv_x, mv_y) may be (mv0_x, mv0_y). Using the motion vector (mv0_x, mv0_y) of the control point V0 and the motion vector (mv1_x, mv1_y) of the control point V1, (ev, rv) is expressed by the following equation.

ev = (mv1_x-mv0_x) / W
rv = (mv1_y-mv0_y) / W
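
The relationship between the two control points and (eq1) can be checked with a small sketch. The function names four_param_from_control_points and eq1 are hypothetical, and the sketch assumes that V1 is located at (W, 0) relative to V0 and that (mv0_x, mv0_y) is used as the translation vector.

```python
def four_param_from_control_points(mv0, mv1, W):
    """Return (ev, rv) from the motion vectors of V0 and V1 and the width W."""
    ev = (mv1[0] - mv0[0]) / W
    rv = (mv1[1] - mv0[1]) / W
    return ev, rv


def eq1(mv, ev, rv, xi, yj):
    """Evaluate the 4-parameter general formula (eq1) at (xi, yj)."""
    mvi_x = mv[0] + ev * xi - rv * yj
    mvi_y = mv[1] + rv * xi + ev * yj
    return mvi_x, mvi_y
```

Evaluating eq1 at (0, 0) returns the motion vector of V0, and evaluating it at (W, 0) returns the motion vector of V1, which confirms that the parameters are consistent with both control points.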
Next, a general formula for the 6-parameter affine is described below. The motion vector (mvi_x, mvi_y) of the point Vi at the position (xi, yj), taking as the origin the point V0 at the position (0, 0) with the motion vector (mv_x, mv_y), can be obtained by the following general formula (eq2) using six parameters (mv_x, mv_y, ev1, rv1, ev2, rv2).

mvi_x = mv_x + ev1 * xi + rv2 * yj
mvi_y = mv_y + rv1 * xi + ev2 * yj (eq2)
(Derivation of control point motion vectors in merge mode)
The affine prediction unit 30372 refers to the prediction parameter memory 307 for a prediction unit including blocks A to E as shown in (c) of FIG. 37, and confirms whether or not affine prediction is used. A prediction unit that is first found using affine prediction (here, referred to as reference block A in FIG. 37C) is selected as an adjacent block (merge reference block), and a motion vector is derived.

The affine prediction unit 30372 derives the motion vectors of the control points (for example, V0 and V1) from the motion vectors of two or three points (point v0, point v1, and point v2 in (d) of FIG. 37) of the selected merge reference block. In the example shown in (d) of FIG. 37, the width of the block whose motion vector is to be predicted is W and its height is H, and the width of the adjacent block (in the example in the figure, the adjacent block including the block A) is w and its height is h.

Specifically, the affine parameters (ev, rv), (evBW, rvBW), (evBH, rvBH), (ev1, rv1, ev2, rv2), and (ev1BW, rv1BW, ev2BH, rv2BH) are derived, and the sub-block motion vectors are derived from the derived affine parameters. The derivation method will be described later in (STEP2 details).

(Limit of motion vector difference)
The image encoding device 11 (more specifically, the encoding parameter determination unit 110) or the affine prediction unit 30372 limits the range of the difference between the motion vectors at the plurality of derived control points.

Hereinafter, the limitation on the range of the motion vector difference will be described in more detail, taking as an example the case where the motion vector (mv0_x, mv0_y) of the control point V0 and the motion vector (mv1_x, mv1_y) of the control point V1 are derived.

In the case of the AMVP mode, the encoding parameter determination unit 110 derives the motion vectors so that the difference value (dmv_x, dmv_y) between (mv0_x, mv0_y) and (mv1_x, mv1_y) satisfies (Formula XX1), and the prediction parameter encoding unit 111 encodes the difference vector. For example, the encoding parameter determination unit 110 sets the motion vector search range so that the motion vectors of the control point V0 and the control point V1 satisfy (Formula XX1) in the motion vector search.

dmv_x <= THMVX
dmv_y <= THMVY (Formula XX1)
Here, THMVX and THMVY are predetermined threshold values. Hereinafter, the restricted motion vector at the control point V1 is denoted as (mv1_x ′, mv1_y ′).

The AMVP prediction parameter derivation unit 3032 derives a prediction vector of the motion vector selected by the image encoding device 11. A motion vector is derived by the addition unit 3038 adding the derived prediction vector and the difference vector.

In the merge mode, the affine prediction unit 30372 limits the motion vector of the other control point according to the magnitude of the motion vector of one control point of the control point V0 and the control point V1.

For example, a case will be described in which the affine prediction unit 30372 restricts the motion vector (mv1_x, mv1_y) of the control point V1 before the restriction to the range from the motion vector (mv0_x, mv0_y) of the control point V0 to (rangeW, rangeH). In this case, the motion vector (mv1_x ′, mv1_y ′) of the control point V1 after the restriction is expressed as the following equation.

mv1_x' = clip3 (mv0_x-rangeW, mv0_x + rangeW, mv1_x)
mv1_y' = clip3 (mv0_y-rangeH, mv0_y + rangeH, mv1_y)
If the encoding parameter determination unit 110 or the affine prediction unit 30372 does not limit the range of the motion vector difference in this way, then when the motion vector difference between the control points is large, the reference blocks indicated by the motion vectors of the sub-blocks described later are spread over a wide range. In this case, the predicted image generation unit 101 needs to read each reference block indicated by the motion vector of each sub-block separately to generate a predicted image, which increases the amount of data transferred, that is, the memory bandwidth.

On the other hand, when the range of motion vector difference is limited as described above, the reference block indicated by the motion vector of each sub-block falls within a specific range. Therefore, the predicted image generation unit 101 can read a region including the reference block group at a time and generate a predicted image. As a result, the memory bandwidth can be reduced.
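
The merge-mode restriction of the control point V1 described above can be sketched as follows. The argument order of clip3 (lower bound, upper bound, value) follows the equations above; the function name restrict_mv1 and the numeric values in the usage note are assumptions for illustration.

```python
def clip3(lo, hi, v):
    """Clamp v to the closed interval [lo, hi]."""
    return max(lo, min(hi, v))


def restrict_mv1(mv0, mv1, range_w, range_h):
    """Restrict the motion vector of V1 to within (range_w, range_h) of V0."""
    mv1_x = clip3(mv0[0] - range_w, mv0[0] + range_w, mv1[0])
    mv1_y = clip3(mv0[1] - range_h, mv0[1] + range_h, mv1[1])
    return mv1_x, mv1_y
```

For example, with mv0 = (10, 5), mv1 = (40, -30), rangeW = 16, and rangeH = 8, the restricted vector is (26, -3). A vector already within the range is returned unchanged.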

(STEP2 details)
In (STEP 2), the affine prediction unit 30372 or the AMVP prediction parameter derivation unit 3032 calculates an affine parameter, and derives a motion vector of each sub-block using the derived affine parameter.

(STEP2: Derivation of affine parameters)
In (STEP 2), the affine prediction unit 30372 or the AMVP prediction parameter derivation unit 3032 first calculates the affine parameters with reference to the motion vectors of two or three of the control points V0, V1, and V2, which are the representative points on the block derived in (STEP 1). That is, the affine prediction unit 30372 or the AMVP prediction parameter derivation unit 3032 performs one of the following 4-parameter affine and 6-parameter affine derivations.
- 4-parameter affine: The affine prediction unit 30372 or the AMVP prediction parameter derivation unit 3032 derives the affine parameters (ev, rv) from the motion vectors of two control points among the control points V0, V1, and V2.

For example, as described above, the affine prediction unit 30372 derives the affine parameters (ev, rv) from two motion vectors: the motion vector (mv0_x, mv0_y) of the control point V0 and the restricted motion vector (mv1_x ′, mv1_y ′) of the control point V1.
- 6-parameter affine: The affine prediction unit 30372 or the AMVP prediction parameter derivation unit 3032 derives the affine parameters (ev1, rv1, ev2, rv2) from the three motion vectors of the control points V0, V1, and V2.

Here, when the affine prediction unit 30372 derives the affine parameters (ev1, rv1, ev2, rv2), it limits the magnitudes of two of the three motion vectors according to the magnitude of the motion vector of a certain control point. Since the restriction uses a clip function in the same manner as described above, its description is omitted.

For example, the affine prediction unit 30372 derives the affine parameters (ev1, rv1, ev2, rv2) from three motion vectors: the motion vector (mv0_x, mv0_y) of the control point V0, the restricted motion vector (mv1_x ′, mv1_y ′) of the control point V1, and the restricted motion vector (mv2_x ′, mv2_y ′) of the control point V2.

In the following description, the conversion process consisting of the translation vector (mv_x, mv_y) and the affine parameters (ev, rv) is referred to as “4-parameter affine”, and the conversion process consisting of the translation vector (mv_x, mv_y) and the affine parameters (ev1, rv1, ev2, rv2) is referred to as “6-parameter affine”.

(Integer precision affine parameters)
Affine parameters can be handled as integers instead of decimal numbers. Hereinafter, the decimal affine parameters are denoted as (ev, rv) and (ev1, rv1, ev2, rv2), and the integer affine parameters are denoted as (evBW, rvBW), (evBH, rvBH), and (ev1BW, rv1BW, ev2BH, rv2BH).

The integer affine parameter is obtained by multiplying the decimal affine parameter by a constant (constant for integerization) according to the sub-block size (BW, BH). Specifically, the following relationship exists between the affine parameters for decimal numbers and the affine parameters for integers.

evBW = ev * BW = ev << log2 (BW)
rvBW = rv * BW = rv << log2 (BW)
evBH = ev * BH = ev << log2 (BH)
rvBH = rv * BH = rv << log2 (BH)
ev1BW = ev1 * BW = ev1 << log2 (BW)
rv1BW = rv1 * BW = rv1 << log2 (BW)
ev2BH = ev2 * BH = ev2 << log2 (BH)
rv2BH = rv2 * BH = rv2 << log2 (BH)
When the width BW and the height BH of the sub-block are equal, (evBW, rvBW) = (evBH, rvBH), so it is not necessary to distinguish (evBW, rvBW) from (evBH, rvBH). That is, in derivation of the following affine parameters, it is sufficient to derive only (evBW, rvBW) or only (evBH, rvBH).

(Decimal point precision affine parameter derivation)
Hereinafter, derivation of the decimal precision affine parameter by the merge mode will be described.

The affine prediction unit 30372 or the AMVP prediction parameter derivation unit 3032 may derive the affine parameters (ev, rv) from the motion vector (mv0_x, mv0_y) of the control point V0 and the motion vector (mv1_x', mv1_y') of the control point V1 restricted according to the magnitude of the motion vector of the control point V0, as follows.
ev = (mv1_x' - mv0_x) / W
rv = (mv1_y' - mv0_y) / W

Further, the affine prediction unit 30372 or the AMVP prediction parameter derivation unit 3032 may derive the affine parameters (ev, rv) from the motion vector (mv0_x, mv0_y) of the control point V0 and the motion vector (mv2_x', mv2_y') of the control point V2 restricted according to the magnitude of the motion vector of the control point V0, as follows.
ev = (mv2_y' - mv0_y) / H
rv = -(mv2_x' - mv0_x) / H

Also, the affine prediction unit 30372 or the AMVP prediction parameter derivation unit 3032 may derive the affine parameters (ev, rv) from the motion vector (mv1_x, mv1_y) of the control point V1 and the motion vector (mv2_x', mv2_y') of the control point V2 restricted according to the magnitude of the motion vector of the control point V1, as follows.
ev = {(mv1_x - mv2_x') - (mv1_y - mv2_y')} / 2W
rv = {(mv1_x - mv2_x') + (mv1_y - mv2_y')} / 2W

Also, the affine prediction unit 30372 may derive the affine parameters (ev, rv) from the motion vector (mv0_x, mv0_y) of the control point V0 and the motion vector (mv3_x', mv3_y') of the control point V3 restricted according to the magnitude of the motion vector of the control point V0, as follows.
ev = (mv3_x' - mv0_x + mv3_y' - mv0_y) / 2W
rv = (-mv3_x' + mv0_x + mv3_y' - mv0_y) / 2W

Also, the affine prediction unit 30372 or the AMVP prediction parameter derivation unit 3032 may derive the affine parameters (ev1, rv1, ev2, rv2) from the motion vector (mv0_x, mv0_y) of the control point V0 and the motion vectors (mv1_x', mv1_y') and (mv2_x', mv2_y') of the control points V1 and V2 restricted according to the magnitude of the motion vector of the control point V0, as follows.
ev1 = (mv1_x' - mv0_x) / W
rv2 = (mv2_x' - mv0_x) / H
rv1 = (mv1_y' - mv0_y) / W
ev2 = (mv2_y' - mv0_y) / H

Note that the affine parameters when the motion vectors of the control points are not restricted are given by the expressions obtained by removing the prime symbol (′) from the above expressions.

(Derivation of integer precision affine parameters)
Hereinafter, derivation of integer precision affine parameters by the merge mode will be described.

The affine prediction unit 30372 or the AMVP prediction parameter derivation unit 3032 may derive the affine parameters (evBW, rvBW) and (evBH, rvBH) from the motion vector (mv0_x, mv0_y) of the control point V0 and the motion vector (mv1_x', mv1_y') of the control point V1 restricted according to the magnitude of the motion vector of the control point V0, as follows.
evBW = (mv1_x' - mv0_x) >> shiftWBW
rvBW = (mv1_y' - mv0_y) >> shiftWBW
evBH = (mv1_x' - mv0_x) >> shiftWBH
rvBH = (mv1_y' - mv0_y) >> shiftWBH
shiftWBW = log2 (W) - log2 (BW)
shiftWBH = log2 (W) - log2 (BH)

Note that evBW, rvBW, evBH, and rvBH satisfy the following expressions, respectively.
evBW = ev * BW, rvBW = rv * BW, evBH = ev * BH, rvBH = rv * BH
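
The shift-based derivation from the control points V0 and V1 can be sketched as follows. The helper log2_int and the function name integer_affine_params_v0_v1 are hypothetical, and the sketch assumes that W, BW, and BH are powers of two with W not smaller than BW or BH, and that motion vector components are integers. Python's >> on negative integers is an arithmetic (floor) shift, which is assumed here to match the intended behavior.

```python
def log2_int(n):
    """Integer log2 of a power of two (assumption: n is a power of two)."""
    return n.bit_length() - 1


def integer_affine_params_v0_v1(mv0, mv1, W, BW, BH):
    """Derive (evBW, rvBW, evBH, rvBH) from V0 and the (restricted) V1."""
    shiftWBW = log2_int(W) - log2_int(BW)
    shiftWBH = log2_int(W) - log2_int(BH)
    dx = mv1[0] - mv0[0]
    dy = mv1[1] - mv0[1]
    return {
        "evBW": dx >> shiftWBW,
        "rvBW": dy >> shiftWBW,
        "evBH": dx >> shiftWBH,
        "rvBH": dy >> shiftWBH,
    }
```

For example, with mv0 = (0, 0), mv1 = (32, -16), W = 16, and BW = BH = 4, the decimal parameters are ev = 2 and rv = -1, and the sketch returns evBW = 8 and rvBW = -4, in agreement with evBW = ev * BW and rvBW = rv * BW.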
Further, the affine prediction unit 30372 or the AMVP prediction parameter derivation unit 3032 may derive the affine parameters (evBW, rvBW) and (evBH, rvBH) from the motion vector (mv0_x, mv0_y) of the control point V0 and the motion vector (mv2_x', mv2_y') of the control point V2 restricted according to the magnitude of the motion vector of the control point V0, as follows.
evBW = (mv2_y' - mv0_y) >> shiftHBW
rvBW = -(mv2_x' - mv0_x) >> shiftHBW
evBH = (mv2_y' - mv0_y) >> shiftHBH
rvBH = -(mv2_x' - mv0_x) >> shiftHBH
shiftHBW = log2 (H) - log2 (BW)
shiftHBH = log2 (H) - log2 (BH)

Further, the affine prediction unit 30372 or the AMVP prediction parameter derivation unit 3032 may derive the affine parameters (evBW, rvBW) and (evBH, rvBH) from the motion vector (mv1_x, mv1_y) of the control point V1 and the motion vector (mv2_x', mv2_y') of the control point V2 restricted according to the magnitude of the motion vector of the control point V1, as follows.
evBW = {(mv1_x - mv2_x') - (mv1_y - mv2_y')} >> (shiftWBW + 1)
rvBW = {(mv1_x - mv2_x') + (mv1_y - mv2_y')} >> (shiftWBW + 1)
evBH = {(mv1_x - mv2_x') - (mv1_y - mv2_y')} >> (shiftWBH + 1)
rvBH = {(mv1_x - mv2_x') + (mv1_y - mv2_y')} >> (shiftWBH + 1)

Further, the affine prediction unit 30372 or the AMVP prediction parameter derivation unit 3032 may derive the affine parameters (evBW, rvBW) and (evBH, rvBH) from the motion vector (mv0_x, mv0_y) of the control point V0 and the motion vector (mv3_x', mv3_y') of the control point V3 restricted according to the magnitude of the motion vector of the control point V0, as follows.
evBW = (mv3_x' - mv0_x + mv3_y' - mv0_y) >> (shiftWBW + 1)
rvBW = (-mv3_x' + mv0_x + mv3_y' - mv0_y) >> (shiftWBW + 1)
evBH = (mv3_x' - mv0_x + mv3_y' - mv0_y) >> (shiftHBH + 1)
rvBH = (-mv3_x' + mv0_x + mv3_y' - mv0_y) >> (shiftHBH + 1)

Also, the affine prediction unit 30372 or the AMVP prediction parameter derivation unit 3032 may derive the affine parameters (ev1BW, rv1BW, ev2BH, rv2BH) from the motion vector (mv0_x, mv0_y) of the control point V0 and the motion vectors (mv1_x', mv1_y') and (mv2_x', mv2_y') of the control points V1 and V2 restricted according to the magnitude of the motion vector of the control point V0, as follows.
ev1BW = (mv1_x' - mv0_x) >> shiftWBW
rv2BH = (mv2_x' - mv0_x) >> shiftHBH
rv1BW = (mv1_y' - mv0_y) >> shiftWBW
ev2BH = (mv2_y' - mv0_y) >> shiftHBH

If the motion vectors of the control points are not restricted, the above-described integer precision affine parameters derived by the AMVP prediction parameter derivation unit 3032 are given by the expressions obtained by removing the prime symbol (′) from the above expressions.

(STEP2: Derivation of motion vector of sub-block)
The affine prediction unit 30372 or the AMVP prediction parameter derivation unit 3032 derives the motion vector of the sub-block using the affine parameter obtained by the above formula.

In the case of the 4-parameter affine, the sub-block motion vector (mvij_x, mvij_y) at the sub-block coordinates (xi, yj) is derived from the affine parameters (ev, rv) calculated in the AMVP mode or the merge mode by the following formula AF4P_float.

mvij_x = mv_x + ev * xi-rv * yj
mvij_y = mv_y + rv * xi + ev * yj (Formula AF4P_float)
Here, the relationship between the sub-block coordinates (xi, yj), which is a pixel unit position, and the sub-block position (i, j), which is a sub-block unit position, is as shown in the following formula XYIJ.

xi = BW / 2 + BW * i
yj = BH / 2 + BH * j (formula XYIJ)
i = 0, 1, 2, ... (W / BW) -1, j = 0, 1, 2, ... (H / BH) -1
i = (xi-BW / 2) / BW
j = (yj-BH / 2) / BH
Although the motion vector (mv0_x, mv0_y) of the control point V0 may be used as the affine parameters mv_x, mv_y of the translation vector, the present invention is not limited to this. For example, as a motion vector other than the control point V0, a motion vector at the control point V1, a motion vector at the control point V2, and a motion vector at the control point V3 may be used as translation vectors.

If the affine parameters are the integer affine parameters (evBW, rvBW) and (evBH, rvBH), the motion vector (mvij_x, mvij_y) of the sub-block at the sub-block position (i, j) is derived by the following formula AF4P_integer.

mvij_x = mv_x + of_x + evBW * i-rvBH * j
mvij_y = mv_y + of_y + rvBW * i + evBH * j (expression AF4P_integer)
Here, of_x = ((evBW-rvBH) >> 1) and of_y = ((rvBW + evBH) >> 1).
(mv0_x, mv0_y) can also be used as the translation vector (mv_x, mv_y).

Note that, as shown below, the expression AF4P_float can be transformed into the expression AF4P_integer.

mvij_x = mv_x + ev * xi-rv * yj
mvij_y = mv_y + rv * xi + ev * yj (Formula AF4P_float)
From the formula XYIJ, xi = BW / 2 + BW * i, yj = BH / 2 + BH * j, so these values are substituted into the formula AF4P_float,
mvij_x = mv_x + ev * (BW / 2 + BW * i)-rv * (BH / 2 + BH * j)
mvij_y = mv_y + rv * (BW / 2 + BW * i) + ev * (BH / 2 + BH * j)
Rearranging the above formulas gives
mvij_x = mv_x + ((BW * ev-BH * rv) >> 1) + BW * ev * i-BH * rv * j
mvij_y = mv_y + ((BW * rv + BH * ev) >> 1) + BW * rv * i + BH * ev * j
Further, since BW * ev = evBW, BH * rv = rvBH, BW * rv = rvBW, and BH * ev = evBH, the above formulas can be rewritten as
mvij_x = mv_x + ((evBW-rvBH) >> 1) + evBW * i-rvBH * j
mvij_y = mv_y + ((rvBW + evBH) >> 1) + rvBW * i + evBH * j (expression AF4P_integer)
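
The equivalence of the expression AF4P_float and the expression AF4P_integer can be checked numerically with the following sketch. The function names af4p_float and af4p_integer are hypothetical. The two forms agree exactly only when (evBW - rvBH) and (rvBW + evBH) are even; otherwise the right shift by 1 in the integer form rounds down.

```python
def af4p_float(mv, ev, rv, BW, BH, i, j):
    """Formula AF4P_float evaluated at the center of sub-block (i, j)."""
    xi = BW / 2 + BW * i
    yj = BH / 2 + BH * j
    return (mv[0] + ev * xi - rv * yj, mv[1] + rv * xi + ev * yj)


def af4p_integer(mv, evBW, rvBW, evBH, rvBH, i, j):
    """Expression AF4P_integer for the sub-block at position (i, j)."""
    of_x = (evBW - rvBH) >> 1  # half-sub-block offset, x component
    of_y = (rvBW + evBH) >> 1  # half-sub-block offset, y component
    return (
        mv[0] + of_x + evBW * i - rvBH * j,
        mv[1] + of_y + rvBW * i + evBH * j,
    )
```

For example, with ev = 2, rv = -1, and BW = BH = 4, the integer parameters are evBW = evBH = 8 and rvBW = rvBH = -4, and both functions return the same motion vector for every sub-block position (i, j).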

If the width BW and height BH of the sub block are equal,
(evBW, rvBW) = (evBH, rvBH)
Therefore, it is not necessary to distinguish (evBW, rvBW) from (evBH, rvBH) in the above derivation formula. That is, it may be derived only from (evBW, rvBW).

mvij_x = mv_x + ((evBW-rvBW) >> 1) + evBW * i-rvBW * j
mvij_y = mv_y + ((rvBW + evBW) >> 1) + rvBW * i + evBW * j
Alternatively, it may be derived only from (evBH, rvBH).

mvij_x = mv_x + ((evBH-rvBH) >> 1) + evBH * i-rvBH * j
mvij_y = mv_y + ((rvBH + evBH) >> 1) + rvBH * i + evBH * j
FIG. 38 is a diagram illustrating an example in which the control point V0 of the target block (horizontal width W, height H) is located at the upper left vertex and is divided into sub-blocks having a width BW and a height BH. (W, H) and (BW, BH) correspond to the aforementioned (nPbW, nPbH) and (nSbW, nSbH).

The point at the sub-block position (i, j), that is, at the sub-block coordinates (xi, yj), is the intersection of a broken line parallel to the x axis and a broken line parallel to the y axis in FIG. 38. As an example, FIG. 38 shows the point at the sub-block position (i, j) = (1, 1), whose sub-block coordinates are (xi, yj) = (x1, y1) = (1 * BW + BW / 2, 1 * BH + BH / 2).

In the case of the 6-parameter affine, the motion vector (mvij_x, mvij_y) at the sub-block coordinates (xi, yj) is derived from the affine parameters (ev1, rv1, ev2, rv2) by the following formula AF6P_float.
mvij_x = mv_x + ev1 * xi + rv2 * yj
mvij_y = mv_y + rv1 * xi + ev2 * yj (Formula AF6P_float)

When the affine parameters (ev1, rv1, ev2, rv2) are the integer affine parameters (ev1BW, rv1BW, ev2BH, rv2BH), the motion vector (mvij_x, mvij_y) of the sub-block at the sub-block position (i, j) is derived by the following expression AF6P_integer.

mvij_x = mv_x + ((ev1BW + rv2BH) >> 1) + ev1BW * i + rv2BH * j
mvij_y = mv_y + ((rv1BW + ev2BH) >> 1) + rv1BW * i + ev2BH * j (expression AF6P_integer)
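
The 6-parameter integer path, from three control points to a sub-block motion vector, can be sketched as follows. The function names are hypothetical; the sketch assumes that W, H, BW, and BH are powers of two, that motion vector components are integers, that the control point motion vectors have already been restricted where required, and that (mv0_x, mv0_y) is used as the translation vector.

```python
def log2_int(n):
    """Integer log2 of a power of two (assumption: n is a power of two)."""
    return n.bit_length() - 1


def integer_affine_params_6p(mv0, mv1, mv2, W, H, BW, BH):
    """Derive (ev1BW, rv1BW, ev2BH, rv2BH) from control points V0, V1, V2."""
    shiftWBW = log2_int(W) - log2_int(BW)
    shiftHBH = log2_int(H) - log2_int(BH)
    ev1BW = (mv1[0] - mv0[0]) >> shiftWBW
    rv1BW = (mv1[1] - mv0[1]) >> shiftWBW
    rv2BH = (mv2[0] - mv0[0]) >> shiftHBH
    ev2BH = (mv2[1] - mv0[1]) >> shiftHBH
    return ev1BW, rv1BW, ev2BH, rv2BH


def af6p_integer(mv, ev1BW, rv1BW, ev2BH, rv2BH, i, j):
    """Expression AF6P_integer for the sub-block at position (i, j)."""
    mvij_x = mv[0] + ((ev1BW + rv2BH) >> 1) + ev1BW * i + rv2BH * j
    mvij_y = mv[1] + ((rv1BW + ev2BH) >> 1) + rv1BW * i + ev2BH * j
    return mvij_x, mvij_y
```

For example, with W = H = 16, BW = BH = 4, mv0 = (0, 0), mv1 = (32, 16), and mv2 = (-16, 48), the parameters are (ev1BW, rv1BW, ev2BH, rv2BH) = (8, 4, 12, -4), and the sub-block at (i, j) = (1, 2) receives the motion vector (2, 36), which matches the formula AF6P_float evaluated at (xi, yj) = (6, 10) with (ev1, rv1, ev2, rv2) = (2, 1, 3, -1).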
[Modification Example 1 of Subblock Motion Vector Derivation Processing]
In the above example, when the motion vector of the sub-block in the merge mode is derived, the affine prediction unit 30372 or the AMVP prediction parameter derivation unit 3032 limits the range of the motion vector difference between the control points. The present embodiment is not limited to this.

In this embodiment, when the affine prediction unit 30372 or the AMVP prediction parameter derivation unit 3032 derives the motion vector of the subblock, the size of the subblock may be limited. In the first modification and the second modification, the case where the affine prediction unit 30372 executes the process will be described. However, instead of the affine prediction unit 30372, the AMVP prediction parameter derivation unit 3032 may execute the process.

For example, when the difference between the motion vectors at the plurality of control points is larger than a predetermined value, the affine prediction unit 30372 may set a sub-block of a larger size than when the difference is equal to or smaller than the predetermined value, and derive the motion vectors.

Hereinafter, the process in which the affine prediction unit 30372 determines the size of the sub-block will be described with reference to FIG. FIG. 39 is a flowchart showing an outline of the sub-block size determination flow. As shown in FIG. 39, the sub-block size determination process by the affine prediction unit 30372 includes the following three steps (Step S321) to (Step S323).

(Step S321) Determination of Motion Vector Difference: The affine prediction unit 30372 determines whether or not the vector difference between control points is large. For example, when the vector difference between the control point V0 and the control point V1 in FIG. 37 satisfies the following equation (eq3), the affine prediction unit 30372 may determine that the vector difference between the control point V0 and the control point V1 is large. Similarly, the affine prediction unit 30372 may determine that the vector difference between the control point V0 and the control point V1 is not large when the following equation (eq3) is not satisfied.
| mv1_x-mv0_x | + | mv1_y-mv0_y |> M (eq3)
Note that M in the above expression represents a predetermined threshold value. The magnitude of the difference may be obtained as the sum of the absolute values of the differences as described above, or may be obtained as the sum of squares of the differences.

In addition, the determination of the vector difference between control points is not limited to a determination based on the vector difference between the control point V0 and the control point V1. The vector difference between the control points may be determined based on the vector difference between the control point V0 and the control point V2, as shown in the following equation (eq3 ′), or based on the vector difference between the control point V1 and the control point V2, as shown in the following equation (eq3 ″).
| mv2_x-mv0_x | + | mv2_y-mv0_y |> M (eq3 ')
| mv2_x-mv1_x | + | mv2_y-mv1_y |> M (eq3 '')
If the affine prediction unit 30372 determines that the vector difference between the control points is large (Y in step S321), the process proceeds to step S322. If the affine prediction unit 30372 determines that the vector difference between the control points is not large (N in step S321), the process proceeds to step S323.

(Step S322) Determination of Large Sub-block Size
When deriving the motion vector of a sub-block, the affine prediction unit 30372 sets the size of the sub-block to the large sub-block size. For example, the affine prediction unit 30372 may set the size of the sub-block to a large sub-block size of 8 × 8. The large sub-block size only needs to be larger than the small sub-block size described later. For example, when the small sub-block size is set to 4 × 4, the affine prediction unit 30372 may set the large sub-block size to 8 × 4 or 4 × 8.

(Step S323) Determination of Small Sub-block Size
When deriving the motion vector of a sub-block, the affine prediction unit 30372 sets the size of the sub-block to the small sub-block size. For example, the affine prediction unit 30372 may set the size of the sub-block to a small sub-block size of 4 × 4. The small sub-block size only needs to be smaller than the large sub-block size. For example, when the large sub-block size is set to 8 × 8, the affine prediction unit 30372 may set the small sub-block size to 8 × 4 or 4 × 8.

Note that the affine prediction unit 30372 may set the combination of the large sub-block size and the small sub-block size (large sub-block size, small sub-block size) to, for example, (8 × 8, 4 × 4), (8 × 8, 8 × 4), or (8 × 4, 4 × 4).

After step S322 or step S323, the affine prediction unit 30372 ends the sub-block size determination process.
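
The flow of steps S321 to S323 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function names, the tuple representation of motion vectors, and the default sizes (8 × 8 and 4 × 4, taken from the examples above) are assumptions made for this sketch.

```python
def control_point_difference_is_large(mv_a, mv_b, threshold_m):
    """Evaluate (eq3): sum of absolute component differences > M."""
    return abs(mv_b[0] - mv_a[0]) + abs(mv_b[1] - mv_a[1]) > threshold_m


def determine_subblock_size(mv0, mv1, threshold_m,
                            large_size=(8, 8), small_size=(4, 4)):
    """Return (BW, BH) following steps S321 to S323.

    mv0 and mv1 are the (x, y) motion vectors of control points V0 and V1.
    The default sizes are illustrative values only.
    """
    if control_point_difference_is_large(mv0, mv1, threshold_m):  # Y in S321
        return large_size   # step S322
    return small_size       # step S323
```

The same helper could be called with the vectors of V0 and V2, or V1 and V2, to evaluate (eq3') or (eq3'').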

In the above example, the affine prediction unit 30372 determines that the vector difference between the control points is large when expression (eq3) is satisfied, but the present embodiment is not limited to this. In the present embodiment, the affine prediction unit 30372 may determine that the vector difference between the control points is large when the following expression (eq4) is satisfied, or when the following expression (eq4') is satisfied.

| evBW | + | rvBW |> M (eq4)
| evBH | + | rvBH |> M (eq4 ')
That is, when the affine parameter is large, the vector difference between the control points is large. That is, the expression (eq3) and the expression (eq4) are equivalent. Therefore, in step S321, the affine prediction unit 30372 may use equation (eq4) instead of equation (eq3) as an equation for determining whether or not the affine parameter is large.

Note that the AMVP prediction parameter derivation unit 3032 may determine the size of the sub-block in the same manner as the affine prediction unit 30372.

[Modification Example 2 of Subblock Motion Vector Derivation Processing]
Instead of the above example, the affine prediction unit 30372 may determine the aspect ratio of the sub-block used for deriving the motion vector according to the aspect ratio of the target block that is the target of motion vector prediction. That is, the affine prediction unit 30372 may derive a motion vector of a sub-block having an aspect ratio corresponding to the aspect ratio of the target block.

(Determination of the aspect ratio of sub-blocks when the target block is horizontally long or vertically long)
For example, the affine prediction unit 30372 may determine the aspect ratio of the sub-block to be horizontally long when the target block is horizontally long, and to be vertically long when the target block is vertically long.

As an example, in the target block having the width W and the height H shown in FIG. 37, when the width W of the target block is larger than the height H (W > H), the affine prediction unit 30372 may determine BW and BH of the sub-block as BW = 8 and BH = 4, respectively.

Further, when the height H of the target block is larger than the width W (H > W), the affine prediction unit 30372 may determine BW and BH of the sub-block as BW = 4 and BH = 8, respectively.

By making the sub-block horizontally long when the target block is horizontally long, and vertically long when the target block is vertically long, a sub-block having an optimal aspect ratio can be obtained.

(Determination of the aspect ratio of the sub-block when the width and height of the target block are equal: 1)
Further, when the width and the height of the target block are equal, the affine prediction unit 30372 may determine the aspect ratio of the sub-block to be horizontally long.

As an example, in the target block having the width W and the height H shown in FIG. 37, when the width W of the target block is equal to the height H (W = H), the affine prediction unit 30372 may determine BW and BH of the sub-block as BW = 8 and BH = 4, respectively.

If the change in the horizontal motion vector between the sub-blocks is larger than the change in the vertical motion vector, the cache is more likely to hit when the sub-block is horizontally long than when it is vertically long. Therefore, according to the above configuration, when the width and height of the target block are equal, the memory bandwidth can be made smaller than when the width and height of the sub-block are set equal.

(Determination of the aspect ratio of the sub-block when the width and height of the target block are equal: 2)
Conversely, when the width and height of the target block are equal, the affine prediction unit 30372 may determine the aspect ratio of the sub-block to be vertically long.

As an example, in the target block having the width W and the height H shown in FIG. 37, when the width W of the target block is equal to the height H (W = H), the affine prediction unit 30372 may determine BW and BH of the sub-block as BW = 4 and BH = 8, respectively.

From the viewpoint of reducing the memory bandwidth, it is better to make the sub-block horizontally long when the width and height of the target block are equal. In general, however, vertically long boundaries (edges) are detected more frequently than horizontally long edges. Therefore, when the width and height of the target block are equal, a vertically long sub-block matches the shape of an object more easily than a sub-block whose width and height are equal.

Note that the AMVP prediction parameter derivation unit 3032 may also determine the aspect ratio of the sub-block in the same manner as the affine prediction unit 30372.
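
The aspect-ratio rules of this modification can be summarized in a short sketch. The function name and the `square_tie_break` parameter are hypothetical and introduced only for illustration; the returned (BW, BH) pairs are the example values given above.

```python
def determine_subblock_shape(w, h, square_tie_break="horizontal"):
    """Return (BW, BH) for a target block of width w and height h.

    square_tie_break selects between the two variants described for
    W == H: "horizontal" (variant 1, favors memory bandwidth) or
    "vertical" (variant 2, favors vertically long edges).
    """
    if w > h:                      # horizontally long target block
        return (8, 4)
    if h > w:                      # vertically long target block
        return (4, 8)
    if square_tie_break == "horizontal":
        return (8, 4)              # variant 1
    return (4, 8)                  # variant 2
```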

(Matching motion deriving unit 30373)
The matching motion deriving unit 30373 derives a motion vector spMvLX of a block or sub-block constituting the PU by performing a matching process of bilateral matching or template matching. FIG. 36 is a diagram for explaining (a) bilateral matching and (b) template matching. The matching motion derivation mode is selected as one merge candidate (matching candidate) in the merge mode.

The matching motion deriving unit 30373 derives a motion vector by matching regions in a plurality of reference images, assuming that the object moves at a constant velocity. In bilateral matching, it is assumed that a certain object passes through a certain region of the reference image A, the target PU of the target picture Cur_Pic, and a certain region of the reference image B with constant velocity motion, and the motion vector of the target PU is derived from the matching between the reference images A and B. In template matching, it is assumed that the motion vector of the region adjacent to the target PU and the motion vector of the target PU are equal, and the motion vector is derived by matching motion compensated images between the target PU adjacent region Temp_Cur (template) and the reference block adjacent region Temp_L0 on the reference picture. The matching motion derivation unit divides the target PU into a plurality of sub-blocks and performs bilateral matching or template matching (to be described later) in units of the divided sub-blocks, whereby the sub-block motion vectors spMvLX[xi][yj] (xi = xPb + nSbW * i, yj = yPb + nSbH * j, i = 0, 1, 2, ..., nPbW / nSbW - 1, j = 0, 1, 2, ..., nPbH / nSbH - 1) are derived.
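
As a small illustration of the index ranges above, the upper left coordinates (xi, yj) of the sub-blocks can be enumerated as follows. The function name is hypothetical, and the sketch assumes that nPbW and nPbH are integer multiples of nSbW and nSbH.

```python
def subblock_positions(x_pb, y_pb, n_pb_w, n_pb_h, n_sb_w, n_sb_h):
    """List (xi, yj) = (xPb + nSbW * i, yPb + nSbH * j) in raster order."""
    return [(x_pb + n_sb_w * i, y_pb + n_sb_h * j)
            for j in range(n_pb_h // n_sb_h)    # j = 0 .. nPbH / nSbH - 1
            for i in range(n_pb_w // n_sb_w)]   # i = 0 .. nPbW / nSbW - 1
```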

As shown in FIG. 36(a), in bilateral matching, two reference images are referred to in order to derive a motion vector of the sub-block Cur_block in the target picture Cur_Pic. More specifically, first, when the coordinates of the sub-block Cur_block are expressed as (xCur, yCur), Block_A and Block_B are set as follows. Block_A is an area in the reference image Ref0 (referred to as reference picture A) specified by the reference picture index refIdxL0, and has the upper left coordinates (xPos0, yPos0) specified by
(xPos0, yPos0) = (xCur + mv0_x, yCur + mv0_y)
Block_B is an area in the reference image Ref1 (referred to as reference picture B) specified by the reference picture index refIdxL1, and has the upper left coordinates (xPos1, yPos1) specified by, for example,
(xPos1, yPos1) = (xCur + mv1_x, yCur + mv1_y) = (xCur - mv0_x * TD1 / TD0, yCur - mv0_y * TD1 / TD0)
Here, TD0 and TD1 represent the inter-picture distance between the target picture Cur_Pic and the reference picture A and the inter-picture distance between the target picture Cur_Pic and the reference picture B, respectively, as shown in FIG. 36(a).

Next, (mv0_x, mv0_y) is determined so that the matching cost between Block_A and Block_B is minimized. The (mv0_x, mv0_y) derived in this way is a motion vector assigned to the sub-block.
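
A minimal sketch of this minimization is shown below. It makes several assumptions that are not part of the description: pictures are plain 2-D lists of integer samples, motion vectors are in integer-pel units, the matching cost is SAD, the candidate list is supplied by the caller, and every candidate keeps both blocks inside the pictures. Floor division stands in for the TD1 / TD0 scaling; a real codec would use its own fixed-point scaling and rounding.

```python
def block_sad(pic_a, pic_b, pos_a, pos_b, size):
    """Sum of absolute differences between two blocks of the given (w, h)."""
    (xa, ya), (xb, yb) = pos_a, pos_b
    w, h = size
    return sum(abs(pic_a[ya + y][xa + x] - pic_b[yb + y][xb + x])
               for y in range(h) for x in range(w))


def bilateral_match(ref_a, ref_b, cur_pos, size, td0, td1, candidates):
    """Return (mv0, cost) minimizing the cost between Block_A and Block_B."""
    x_cur, y_cur = cur_pos
    best_mv, best_cost = None, None
    for mv0_x, mv0_y in candidates:
        # Block_B lies on the opposite side of the motion trajectory.
        mv1_x = -mv0_x * td1 // td0
        mv1_y = -mv0_y * td1 // td0
        cost = block_sad(ref_a, ref_b,
                         (x_cur + mv0_x, y_cur + mv0_y),
                         (x_cur + mv1_x, y_cur + mv1_y), size)
        if best_cost is None or cost < best_cost:
            best_mv, best_cost = (mv0_x, mv0_y), cost
    return best_mv, best_cost
```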

On the other hand, FIG. 36(b) is a diagram for explaining template matching among the above matching processes.

As shown in FIG. 36(b), in template matching, one reference picture is referred to at a time in order to derive a motion vector of the sub-block Cur_block in the target picture Cur_Pic.

More specifically, first, the reference block Block_A is specified. Block_A is, for example, an area in the reference image Ref0 (referred to as reference picture A) designated by the reference picture index refIdxL0, and has the upper left coordinates (xPos0, yPos0) specified by
(xPos0, yPos0) = (xCur + mv0_x, yCur + mv0_y)
Here, (xCur, yCur) is the upper left coordinate of the sub-block Cur_block.

Next, a template region Temp_Cur adjacent to the sub-block Cur_block in the target picture Cur_Pic and a template region Temp_L0 adjacent to the Block_A in the reference picture A are set. In the example shown in (b) of FIG. 36, the template region Temp_Cur is composed of a region adjacent to the upper side of the sub-block Cur_block and a region adjacent to the left side of the sub-block Cur_block. The template area Temp_L0 is composed of an area adjacent to the upper side of Block_A and an area adjacent to the left side of Block_A.

Next, the (mv0_x, mv0_y) that minimizes the matching cost between Temp_Cur and Temp_L0 is determined, and it becomes the motion vector spMvL0 assigned to the sub-block.
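
The template comparison can be sketched as follows. The assumptions are the same kind as in any simplified model: pictures are 2-D lists of integer samples, vectors are integer-pel, the cost is SAD, the template thickness `t` and the candidate list are parameters invented for this sketch, and all accessed samples lie inside the pictures.

```python
def template_pixels(pic, pos, size, t=1):
    """Collect the samples above and to the left of the block at pos."""
    x0, y0 = pos
    w, h = size
    above = [pic[y][x] for y in range(y0 - t, y0) for x in range(x0, x0 + w)]
    left = [pic[y][x] for y in range(y0, y0 + h) for x in range(x0 - t, x0)]
    return above + left


def template_match(cur_pic, ref_pic, cur_pos, size, candidates, t=1):
    """Return (mv, cost) minimizing the cost between Temp_Cur and Temp_L0."""
    temp_cur = template_pixels(cur_pic, cur_pos, size, t)
    best_mv, best_cost = None, None
    for mv_x, mv_y in candidates:
        block_a_pos = (cur_pos[0] + mv_x, cur_pos[1] + mv_y)
        temp_l0 = template_pixels(ref_pic, block_a_pos, size, t)
        cost = sum(abs(a - b) for a, b in zip(temp_cur, temp_l0))
        if best_cost is None or cost < best_cost:
            best_mv, best_cost = (mv_x, mv_y), cost
    return best_mv, best_cost
```

Unlike bilateral matching, only samples that are already decoded in the target picture are compared, which is why the template is taken from the upper and left neighbors.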

In the template matching, the same processing may be performed on the reference image Ref1 different from the reference image Ref0. In this case, the reference block Block_A is specified as an area in the reference image Ref1 (referred to as reference picture A) designated by the reference picture index refIdxL1, having the upper left coordinates (xPos1, yPos1) specified by
(xPos1, yPos1) = (xCur + mv1_x, yCur + mv1_y)
and the template region Temp_L1 adjacent to Block_A in the reference picture A is set. Finally, the (mv1_x, mv1_y) that minimizes the matching cost between Temp_Cur and Temp_L1 is determined, and it becomes the motion vector spMvL1 assigned to the sub-block.

Also, template matching may be performed on two reference images Ref0 and Ref1. In this case, the matching of one reference image Ref0 and the matching of one reference image Ref1 described above are sequentially performed.

<Motion vector decoding process>
The motion vector decoding process according to the present embodiment will be described specifically below with reference to FIG. 32.

As is clear from the above description, the motion vector decoding process according to the present embodiment includes a process of decoding syntax elements related to inter prediction (also referred to as motion syntax decoding process) and a process of deriving a motion vector (motion vector derivation process).

(Motion syntax decoding process)
FIG. 32 is a flowchart showing the flow of inter prediction syntax decoding processing performed by the inter prediction parameter decoding control unit 3031. In the following description of FIG. 32, each process is performed by the inter prediction parameter decoding control unit 3031 unless otherwise specified.

First, in step S301, the merge flag merge_flag is decoded, and in step S302, it is determined whether merge_flag != 0 (that is, whether merge_flag is not 0).

If merge_flag != 0 is true (Y in step S302), the merge index merge_idx is decoded in step S303, and the motion vector derivation process (step S311) in the merge mode is executed.

If merge_flag != 0 is false (N in step S302), the inter prediction identifier inter_pred_idc is decoded in step S304.

When inter_pred_idc is other than PRED_L1 (PRED_L0 or PRED_BI), the reference picture index refIdxL0, the difference vector parameter mvdL0, and the prediction vector index mvp_L0_idx are decoded in steps S305, S306, and S307, respectively.

When inter_pred_idc is other than PRED_L0 (PRED_L1 or PRED_BI), the reference picture index refIdxL1, the difference vector parameter mvdL1, and the prediction vector index mvp_L1_idx are decoded in steps S308, S309, and S310, respectively. Subsequently, a motion vector derivation process (step S312) in the AMVP mode is executed.
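
The branching of steps S301 to S312 can be sketched as follows. This is an illustration under stated assumptions: `read` is a hypothetical callable that returns the decoded value of the named syntax element (standing in for the entropy decoder), the numeric values of PRED_L0, PRED_L1, and PRED_BI are placeholders, and the returned dictionary with its "mode" key is an invention of this sketch.

```python
PRED_L0, PRED_L1, PRED_BI = 0, 1, 2  # placeholder values for illustration


def decode_motion_syntax(read):
    """Decode inter prediction syntax elements in the order of FIG. 32."""
    syntax = {"merge_flag": read("merge_flag")}               # S301
    if syntax["merge_flag"] != 0:                             # S302
        syntax["merge_idx"] = read("merge_idx")               # S303
        syntax["mode"] = "merge"                              # then S311
        return syntax
    syntax["inter_pred_idc"] = read("inter_pred_idc")         # S304
    if syntax["inter_pred_idc"] != PRED_L1:                   # L0 or BI
        for name in ("refIdxL0", "mvdL0", "mvp_L0_idx"):      # S305 to S307
            syntax[name] = read(name)
    if syntax["inter_pred_idc"] != PRED_L0:                   # L1 or BI
        for name in ("refIdxL1", "mvdL1", "mvp_L1_idx"):      # S308 to S310
            syntax[name] = read(name)
    syntax["mode"] = "AMVP"                                   # then S312
    return syntax
```

For a quick experiment, `read` can simply be the `__getitem__` method of a dictionary holding the element values.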

(Configuration of image encoding device)
Next, the configuration of the image encoding device 11b according to the present embodiment will be described. The image encoding device 11b includes a prediction parameter encoding unit 111b (not shown) instead of the prediction parameter encoding unit 111 in the first embodiment. Further, the prediction parameter encoding unit 111b includes an inter prediction parameter encoding unit (motion vector deriving unit) 112b instead of the inter prediction parameter encoding unit 112 in the first embodiment. Except for this point, the image encoding device 11b has the same configuration as the image encoding device 11 according to the first embodiment.

(Inter prediction parameter encoding unit 112b)
FIG. 33 is a block diagram illustrating a configuration of the inter prediction parameter encoding unit 112b in the prediction parameter encoding unit 111b of the image encoding device 11b according to the third embodiment. As illustrated in FIG. 33, the inter prediction parameter encoding unit 112b includes a sub-block prediction parameter deriving unit 1125b instead of the sub-block prediction parameter deriving unit 1125 in the first embodiment. Except for this point, the inter prediction parameter encoding unit 112b has the same configuration as the inter prediction parameter encoding unit 112 in the first embodiment. As illustrated in FIG. 33, the inter prediction parameter encoding unit 112b includes at least one of a spatio-temporal sub-block prediction unit 11251, an affine prediction unit 11252, and a matching motion derivation unit 11253. The spatio-temporal sub-block prediction unit 11251, the affine prediction unit 11252, and the matching motion derivation unit 11253 have the same configurations as the spatio-temporal sub-block prediction unit 30371, the affine prediction unit 30372, and the matching motion derivation unit 30373, respectively, and thus description thereof is omitted.

[Embodiment 4]
The function of the inter prediction parameter decoding unit in the prediction parameter decoding unit of the image decoding device is not limited to that of the inter prediction parameter decoding unit 303b in the prediction parameter decoding unit 302b of the image decoding device 31b of the image transmission system 1b according to Embodiment 3. Instead of or in addition to the function of the inter prediction parameter decoding unit 303b, the prediction parameter decoding unit 302c of the image decoding device 31c of the image transmission system 1c (not shown) according to Embodiment 4 may provide the function of the inter prediction parameter decoding unit (motion vector deriving unit) 303c.

Embodiment 4 will be described with reference to FIGS. For convenience of explanation, members having the same functions as those described in the third embodiment are denoted by the same reference numerals and description thereof is omitted.

(Configuration of image transmission system)
The image transmission system 1c includes an image encoding device (motion vector deriving device) 11c and an image decoding device (motion vector deriving device) 31c instead of the image encoding device 11b and the image decoding device 31b in the third embodiment.

(Configuration of image decoding device)
FIG. 41 is a block diagram showing a main configuration of the image decoding device 31c according to the present embodiment. As illustrated in FIG. 41, the image decoding device 31c according to the present embodiment includes a prediction parameter decoding unit 302c instead of the prediction parameter decoding unit 302b according to the third embodiment.

Also, as shown in FIG. 41, the prediction parameter decoding unit 302c includes an inter prediction parameter decoding unit 303c instead of the inter prediction parameter decoding unit 303b in the third embodiment. Except for this point, the prediction parameter decoding unit 302c has the same configuration as the prediction parameter decoding unit 302b in the third embodiment.

(Inter prediction parameter decoding unit 303c)
FIG. 43 is a block diagram illustrating a configuration of an inter prediction parameter decoding unit 303c included in the prediction parameter decoding unit 302c according to the present embodiment.

As shown in FIG. 43, the inter prediction parameter decoding unit 303c includes a sub-block prediction parameter derivation unit 3037c instead of the sub-block prediction parameter derivation unit 3037b in the third embodiment.

The sub-block prediction parameter derivation unit 3037c includes a matching motion derivation unit (first motion vector search unit, second motion vector search unit) 30373c instead of the matching motion derivation unit 30373 in the third embodiment.

(Matching motion deriving unit 30373c)
The matching motion derivation unit 30373c includes a first motion vector search unit 303731 and a second motion vector search unit 303732.

The first motion vector search unit 303731 searches for a motion vector for each prediction block by matching processing. The first motion vector search unit 303731 searches for a motion vector by performing a local search after performing an initial vector search for a prediction block.

The second motion vector search unit 303732 refers to the motion vector selected by the first motion vector search unit 303731 and searches for a motion vector by a matching process for each of a plurality of sub-blocks included in the prediction block. The second motion vector search unit 303732 searches for a motion vector by performing a local search after performing an initial vector search for a sub-block.

The matching motion deriving unit 30373c prohibits a search in an oblique direction in at least one of the local search by the first motion vector search unit 303731 and the local search by the second motion vector search unit 303732. That is, in at least one of the first motion vector search unit 303731 and the second motion vector search unit 303732, the matching motion deriving unit 30373c limits the search direction of candidate vectors to the horizontal direction or the vertical direction centered on the initial vector.

When the search direction of the motion vector is limited to the horizontal direction or the vertical direction, the image decoding device 31a can reuse an already read image when reading an image. Therefore, according to the above-described configuration, it is possible to generate a predicted image while reducing the memory bandwidth, compared to a case where the search for a motion vector in an oblique direction is not prohibited.

Details of the motion vector search by the matching motion deriving unit 30373c will be described later.

(Determination of motion prediction mode)
FIG. 44 is a flowchart showing an outline of the motion prediction mode determination flow. The motion prediction mode determination flow is executed by the inter prediction parameter decoding unit 303c. The motion prediction mode is a mode for determining a method for deriving a motion vector used for motion compensation prediction.

As shown in FIG. 44, in the motion prediction mode determination flow, first, the inter prediction parameter decoding control unit 3031 determines whether or not the mode is the merge mode (step S1501). If the mode is not the merge mode (N in step S1501), the mode is the AMVP mode. On the other hand, if it is determined that the mode is the merge mode (Y in step S1501), it is determined whether the mode is the matching mode (step S1502). If it is determined that the mode is the matching mode (Y in step S1502), the mode is the matching mode. If it is determined that the mode is not the matching mode (N in step S1502), the mode is the merge mode.

Next, the details of the motion prediction mode determination flow will be described with reference to FIG. 45. FIG. 45 is a sequence diagram showing the motion prediction mode determination flow.

First, the inter prediction parameter decoding control unit 3031 decodes the merge flag merge_flag in step S401, and determines merge_flag == 1 in step S402.

If merge_flag == 1 is true (Y in step S402), the parameter fruc_mode_idx indicating the matching mode is decoded in step S403, and fruc_mode_idx != 0 is determined in step S404.

When fruc_mode_idx != 0 is true (Y in step S404), the matching mode (also called FRUC (Frame Rate Up Conversion) merge mode) is selected as the motion vector derivation method. In step S405, the matching motion deriving unit 30373c derives a pattern match vector by bilateral matching when fruc_mode_idx is MODE_BM (for example, 1). In step S405, if fruc_mode_idx is MODE_TM (for example, 2), a pattern match vector is derived by template matching.

In the above, the parameter fruc_mode_idx indicating the matching mode serves as both a flag indicating whether the matching mode is used and a parameter indicating the matching method in the matching mode, but the present invention is not limited to this. That is, instead of the parameter fruc_mode_idx indicating the matching mode, a flag fruc_merge_flag indicating whether to use the matching mode and a parameter fruc_merge_param indicating the matching method may be used. In this case, the determination of fruc_mode_idx != 0 is equivalent to fruc_merge_flag != 0, and the determination of fruc_mode_idx == 1 is equivalent to fruc_merge_param == 0. Note that fruc_merge_param is decoded when fruc_merge_flag is 1.

In step S404, if fruc_mode_idx != 0 is false, the merge prediction parameter deriving unit 3036 decodes the merge index merge_idx in step S411. Subsequently, in step S412, merge candidates mergeCand are derived, and in step S413, a motion vector mvLX is derived by the following equation.

mvLX = mergeCand[merge_idx]
On the other hand, if merge_flag == 1 is false (N in step S402), the AMVP mode is selected. More specifically, the AMVP prediction parameter derivation unit 3032 decodes the difference vector mvdLX in step S421 and decodes the prediction vector index mvp_LX_idx in step S422. Further, in step S423, a prediction vector candidate pmvCand is derived. Subsequently, in step S424, a motion vector mvLX is derived from the following equation.

mvLX = pmvCand [mvp_LX_idx] + mvdLX
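
The three-way branch of FIG. 45 and the two equations above can be sketched together as follows. The function name, the dictionary of already decoded syntax values, and the returned (mode, vector) pair are assumptions of this sketch; MODE_BM = 1 and MODE_TM = 2 are the example values mentioned in the text. In the matching mode the vector is left as None because it is derived later by the pattern match vector derivation of step S405.

```python
MODE_BM, MODE_TM = 1, 2  # example values from the text


def derive_motion_vector(syntax, merge_cand, pmv_cand):
    """Return (mode, mvLX) following steps S402 to S424."""
    if syntax["merge_flag"] == 1:                              # S402
        fruc_mode_idx = syntax["fruc_mode_idx"]                # S403
        if fruc_mode_idx != 0:                                 # S404
            method = "bilateral" if fruc_mode_idx == MODE_BM else "template"
            return ("matching:" + method, None)                # vector from S405
        return ("merge", merge_cand[syntax["merge_idx"]])      # S411 to S413
    mvd = syntax["mvdLX"]                                      # S421
    pmv = pmv_cand[syntax["mvp_LX_idx"]]                       # S422, S423
    return ("AMVP", (pmv[0] + mvd[0], pmv[1] + mvd[1]))        # S424
```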
(Motion vector derivation process by matching process)
Hereinafter, the flow of the motion vector derivation (pattern match vector derivation) process in the matching mode will be described with reference to FIG. 46. FIG. 46 is a flowchart showing the flow of the pattern match vector derivation process.

FIG. 46 shows the details of the process in step S405 in the sequence diagram shown in FIG. 45. The processing shown in FIG. 46 is executed by the matching motion derivation unit 30373c.

Of the steps shown in FIG. 46, steps S4051 to S4054 are block searches executed at the block level. That is, a motion vector is derived for the entire block (CU or PU) using pattern matching.

Steps S4055 to S4060 are sub-block searches executed at the sub-block level constituting the block, and a motion vector is derived for each sub-block constituting the block using pattern matching. In the sub-block search, for example, a motion vector is derived every 8 × 8 or 4 × 4 unit.

First, in step S4051, fruc_mode_idx == MODE_TM is determined. If fruc_mode_idx == MODE_TM is true (Y in S4051), a template for performing template matching is acquired in step S4052. More specifically, a template for template matching is acquired from the peripheral area of the block, and the process proceeds to step S4053. Moreover, also when fruc_mode_idx == MODE_TM is false (N in step S4051), the process proceeds to step S4053.

In step S4053, the first motion vector search unit 303731 in the matching motion derivation unit 30373c derives a block-level initial vector in the target block (initial vector search). The initial vector is a motion vector that serves as a search base. From a limited set of motion vector candidates (such as a spatial merge candidate, a temporal merge candidate, a combined merge candidate, a zero vector, and an ATMVP vector of the target block), the vector that minimizes the matching cost is derived as the initial vector. As described above, each initial vector candidate is a motion vector derived based on the motion vectors of processed reference points. That is, the candidates are motion vectors of already processed points (processed motion vectors) as with the various merge candidates, scaled versions of processed motion vectors, or representative values of multiple processed motion vectors as with the ATMVP vector. The ATMVP vector is a vector derived from the average (or weighted average, or median) of the motion vectors around the target block and the motion vector of the reference image.

In step S4054, the first motion vector search unit 303731 in the matching motion derivation unit 30373c performs a block-level local search in the target block. In the local search, a local region centered on the initial vector derived in step S4053 is further searched, the vector having the lowest matching cost is found, and it becomes the final block-level motion vector of the target block. The local search may be a step search, raster search, or spiral search. Details of the local search will be described later.

Subsequently, the following processing is performed for each sub-block included in the target block (steps S4055 to S4060).

First, in step S4056, fruc_mode_idx == MODE_TM is determined. If fruc_mode_idx == MODE_TM is true (Y in step S4056), a template for performing template matching is acquired in step S4057. More specifically, a template for template matching is acquired from the peripheral area of the sub-block, namely from the upper adjacent region or the left adjacent region of the target sub-block. Then, the process proceeds to step S4058. When fruc_mode_idx == MODE_TM is false (N in step S4056), the process also proceeds to step S4058.

In step S4058, the second motion vector search unit 303732 in the matching motion derivation unit 30373c derives an initial vector of the sub-block in the target block (initial vector search). Specifically, among the vector candidates (the motion vector derived in step S4054, the zero vector, the center collocated vector of the sub-block, the lower right collocated vector of the sub-block, the ATMVP vector of the sub-block, the upper adjacent vector of the sub-block, the left adjacent vector of the sub-block, and the like), the vector having the lowest matching cost is set as the initial vector of the sub-block. Note that the vector candidates used for the initial vector search of the sub-block are not limited to the vectors described above.

Next, in step S4059, a local search centered on the initial vector of the sub-block selected in step S4058 is performed. The matching costs of vector candidates near the initial vector of the sub-block are derived, and the vector with the minimum cost is derived as the sub-block motion vector.

Then, when the process is completed for all the sub-blocks included in the target block, the pattern match vector derivation process ends.
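
The two-level structure of FIG. 46 (block search followed by sub-block search) can be sketched as follows. This is a simplified model under stated assumptions: the matching cost is passed in as a callable (it would be bilateral or template matching depending on fruc_mode_idx), the local search is a plain cross-shaped step search with an illustrative round limit, and all function and parameter names are hypothetical.

```python
CROSS_OFFSETS = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # illustrative pattern


def local_search(start_mv, cost, offsets=CROSS_OFFSETS, max_rounds=4):
    """Repeatedly move to the cheapest neighbor until no improvement."""
    best = start_mv
    for _ in range(max_rounds):
        cands = [best] + [(best[0] + dx, best[1] + dy) for dx, dy in offsets]
        new_best = min(cands, key=cost)
        if new_best == best:
            break
        best = new_best
    return best


def derive_pattern_match_vectors(block_cands, block_cost,
                                 subblocks, subblock_cands, subblock_cost):
    """Return (block_mv, {subblock: mv}) following S4053 to S4060."""
    block_init = min(block_cands, key=block_cost)          # S4053
    block_mv = local_search(block_init, block_cost)        # S4054
    sub_mvs = {}
    for sb in subblocks:                                   # S4055 to S4060
        def cost(mv, sb=sb):
            return subblock_cost(sb, mv)
        # The block-level vector is one of the sub-block candidates.
        init = min([block_mv] + subblock_cands(sb), key=cost)   # S4058
        sub_mvs[sb] = local_search(init, cost)             # S4059
    return block_mv, sub_mvs
```

Passing the cost as a callable keeps the search logic independent of whether bilateral matching or template matching supplies the matching cost.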

In both initial vector search and local search, when fruc_mode_idx is MODE_BM, the matching cost is derived by bilateral matching. When fruc_mode_idx is MODE_TM, a matching cost is derived by template matching.

The matching motion derivation unit 30373c prohibits a search in an oblique direction in at least one of the local search in step S4054 and the local search in step S4059. Details will be described later.

(Specific example of local search)
Next, a specific example of local search will be described with reference to FIG. 47. FIG. 47 is a diagram for explaining a motion search pattern. Note that the number of steps (stepIter, maximum number of rounds) indicating how many times the method used for motion search (stepMethod) is repeated is set to a predetermined value. As will be described later, the maximum round number stepIterSubPU at the sub-block level may be less than the maximum round number stepIterPU at the block level.

The matching motion deriving unit 30373c regards the search candidate point that gives the smallest matching cost among the search candidate points evaluated in the motion search as the optimal search point, and selects the motion vector bestMV of that search candidate point. Examples of functions used for deriving the matching cost include SAD (Sum of Absolute Difference, sum of absolute value errors), SATD (Hadamard transform absolute value error sum), and SSD (Sum of Square Difference).

The local search for motion vectors performed by the matching motion deriving unit 30373c is not limited to this. For example, the local search includes motion search algorithms such as diamond search (stepMethod = DIAMOND), cross search (stepMethod = CROSS), and raster search (raster type search: stepMethod = RASTER).

<Specific example of step search>
First, diamond search and cross search will be described as an example of step search. FIG. 47A is a diagram showing a motion search pattern when diamond search is applied. FIG. 47B shows a motion search pattern when a cross search is applied.

In step search, search candidate points are set around an initial vector (search start point), the matching cost of each set search candidate point is derived and evaluated, and the search candidate point that provides the optimal matching cost is selected. This process is called "step round process". In the step search, this "step round process" is repeatedly executed. In each step round process, the search round number numIter is incremented by one.

In FIG. 47, the initial vector startMV at each number of searches is indicated by a white diamond. Also, the optimal vector bestMV in each search round is indicated by a black diamond. In addition, search candidate points at each number of searches are indicated by black circles. In addition, points that have already been searched at each number of searches are indicated by white circles.

The matching motion deriving unit 30373c initializes the search round numIter to 0 before starting the search. Then, at the start time of each search round, the matching cost of the search start point is set to the minimum cost minCost, and the initial value (−1) is set to the optimal candidate index bestIdx.

minCost = mcost (startMV)
bestIdx = -1
Here, mcost (X) is a function for deriving a matching cost in the search vector X.

The matching motion derivation unit 30373c selects and evaluates search candidate points centered on the search start point in each search round. Specifically, for each search candidate index Idx, the matching motion derivation unit 30373c adds the value (offsetCand [Idx]) of the offset candidate (offsetCand) to the coordinates (position) startMV of the search start point to select the coordinates of the search candidate point.

Hereinafter, an example in which the first motion vector search unit 303731 in the matching motion deriving unit 30373c performs a diamond search in the search range of 7 × 5 pixels illustrated in FIG. 47A will be described.

In the first search (numIter = 0), the first motion vector search unit 303731 sequentially selects, as search candidate points, eight points arranged in a diamond shape around the search start point P0 (here, points 0 to 7 in the first row of FIG. 47A). Subsequently, the first motion vector search unit 303731 evaluates the matching cost of each search candidate point. Specifically, for Idx = nDirectStart..nDirectEnd (here nDirectStart = 0, nDirectEnd = 7), motion vector candidates candMV are sequentially derived by the following formula, and the matching cost of each candMV is evaluated.

candMV = startMV + offsetCand [Idx]
Here, offsetCand [x] is an offset candidate that is added to the coordinates of the search start point in order to set the search candidate point.

In the diamond search, when the search in the oblique direction is not prohibited, the first motion vector search unit 303731 uses the following offset candidates.
offsetCand [8] = {(0, 2), (1, 1), (2, 0), (1, -1), (0, -2), (-1, -1), (-2, 0), (-1, 1)}

On the other hand, when the search in the oblique direction is prohibited, the first motion vector search unit 303731 may use the following offset candidates.
offsetCand [4] = {(0, 2), (2, 0), (0, -2), (-2, 0)}
In this case, the first motion vector search unit 303731 excludes points 1, 3, 5, and 7 from the search candidate points among the points 0 to 7 in FIG. 47A, and sets points 0, 2, 4, and 6 as the search candidate points.

When the matching cost candCost (candCost = mcost (candMV [Idx])) of the search candidate point candMV [Idx] of a search candidate index Idx is less than the minimum cost minCost (candCost < minCost), the first motion vector search unit 303731 updates the optimal search candidate index bestIdx to Idx, and updates the minimum cost minCost and the optimal vector bestMV as follows.

bestIdx = Idx
minCost = candCost
bestMV = candMV [Idx]
When the processing of all search candidate points is completed, it is determined whether or not the optimal vector bestMV has been updated. If it has been updated, the next step round process is performed. If it has not been updated, the optimal vector bestMV at this point is selected as the motion vector derived by the step search.

When performing the next step round process, the search candidate point indicated by the optimal vector bestMV is used as the search start point of the next round.

For example, in FIG. 47A, the first motion vector search unit 303731 selects point 2 in the first round (numIter = 0) and sets it as the search start point (initial vector startMV) of the next round.

startMV = bestMV (here, P1)
Whether or not the optimal vector bestMV has been updated can be determined based on whether the optimal vector bestMV differs from the search start point, whether bestIdx has been updated to a value other than the initial value (−1), or whether minCost has been updated to a value other than the initial cost of the search start point. Note that the search start index nDirectStart and the search end index nDirectEnd used in the next round may be determined by the following formulas depending on the position of the optimal vector bestMV (optimal candidate index bestIdx). This makes it possible to search efficiently without searching again for search points that have already been searched.

nStep = 2 - (bestIdx & 1)
nDirectStart = bestIdx - nStep
nDirectEnd = bestIdx + nStep
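The three formulas above can be tried out with the following Python sketch. The function names are hypothetical, and the modulo wrap-around in candidate_indices is an assumption: the text does not state how indices outside 0..7 are handled.

```python
def next_round_range(best_idx):
    """Index range for the next round, following the three formulas above."""
    # nStep is 2 for even indices (axis-aligned offsets)
    # and 1 for odd indices (oblique offsets).
    n_step = 2 - (best_idx & 1)
    return best_idx - n_step, best_idx + n_step


def candidate_indices(best_idx, num_cand=8):
    """Indices visited in the next round.

    Assumption: indices outside 0..num_cand-1 wrap around modulo num_cand.
    """
    start, end = next_round_range(best_idx)
    return [idx % num_cand for idx in range(start, end + 1)]
```

With bestIdx = 2 this gives nDirectStart = 0 and nDirectEnd = 4, and with bestIdx = 1 it gives nDirectStart = 0 and nDirectEnd = 2, which are the ranges used in the second and third searches of the diamond search example.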
Next, as shown in the second row of FIG. 47A, in the second search (numIter = 1), the first motion vector search unit 303731 sets search candidate point 2, which was optimal in the first search (numIter = 0), as the initial vector startMV (search start point P1) of the current search. It then sets, as search candidate points, those points that are arranged in a diamond shape around the search start point P1 and have not yet been selected as search candidate points.

Hereinafter, as an example, a case where the first motion vector search unit 303731 does not prohibit a search in an oblique direction will be described. In this case, the first motion vector search unit 303731 uses points 0 to 4 as search candidate points, as shown in the second row of FIG. 47A. The first motion vector search unit 303731 sequentially selects these points and evaluates the matching cost. That is, the search candidate points indicated by Idx = nDirectStart..nDirectEnd (here nDirectStart = 0, nDirectEnd = 4) are evaluated.

Subsequently, as shown in the third row of FIG. 47A, in the third search (numIter = 2), the first motion vector search unit 303731 sets point 1, which was the optimal search candidate point in the second search (numIter = 1), as the initial vector startMV (search start point P2) of the current search. The first motion vector search unit 303731 then sets, as search candidate points, those points that are arranged in a diamond shape around the search start point P2, have not yet been selected as search candidate points, and lie within the search range.

Hereinafter, as an example, a case where the first motion vector search unit 303731 does not prohibit a search in an oblique direction will be described. In this case, the first motion vector search unit 303731 selects points 0 to 2 in the third row of FIG. 47A as search candidate points (that is, it evaluates the search candidate points indicated by nDirectStart = 0 and nDirectEnd = 2).

As shown in the third row of FIG. 47A, when the matching costs of the search candidate points evaluated in the third search (numIter = 2) are all equal to or higher than the cost of the search start point P2, the optimal vector bestMV is not updated. If the optimal vector bestMV has not been updated, one step search process (diamond search) ends here.
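The step round process described above can be sketched in Python as follows. This is only an illustrative sketch, not the embodiment's implementation: the function name step_search, the tuple representation of vectors, the caller-supplied cost function mcost, and the default limit of 16 rounds are assumptions. For simplicity it evaluates all offset candidates in every round instead of restricting them with nDirectStart and nDirectEnd.

```python
DIAMOND_OFFSETS = [(0, 2), (1, 1), (2, 0), (1, -1),
                   (0, -2), (-1, -1), (-2, 0), (-1, 1)]


def step_search(start_mv, mcost, offsets=DIAMOND_OFFSETS, step_iter=16):
    """Repeat the step round process until bestMV stops changing.

    mcost is a caller-supplied function mapping a vector (x, y) to its cost.
    Every offset is evaluated in every round, so points that were already
    visited may be evaluated again.
    """
    best_mv = start_mv
    for _ in range(step_iter):           # numIter = 0, 1, ...
        min_cost = mcost(start_mv)       # cost of the search start point
        best_idx = -1                    # initial value: nothing better yet
        for idx, (dx, dy) in enumerate(offsets):
            cand_mv = (start_mv[0] + dx, start_mv[1] + dy)
            cand_cost = mcost(cand_mv)
            if cand_cost < min_cost:     # only a strictly smaller cost updates
                best_idx, min_cost, best_mv = idx, cand_cost, cand_mv
        if best_idx == -1:               # bestMV not updated: search ends
            break
        start_mv = best_mv               # optimum becomes the next start point
    return best_mv
```

Because every diamond offset changes x + y by an even amount, this search alone can only reach half of the grid positions, which is one reason a finer search may follow it.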

Note that, similarly to the first search, the first motion vector search unit 303731 may also prohibit a search in an oblique direction when selecting a search candidate point in the second search and the third search.

Also, the first motion vector search unit 303731 may prohibit a search in an oblique direction for one or both of the second search and the third search. In this case, the first motion vector search unit 303731 may determine whether or not to prohibit the search in the oblique direction for each search.

When prohibiting the search in the oblique direction, the first motion vector search unit 303731 limits the offset candidates, and thereby the search candidate points, by the same method as that used when the search in the oblique direction is prohibited in the first search described above.

The first motion vector search unit 303731 may perform another step search. In the cross search, the following values are used as offset candidates (offsetCand).

offsetCand [4] = {(0, 1), (1, 0), (0, -1), (-1, 0)}
FIG. 47B shows an example in which the cross search is performed after the diamond search. Here, the first motion vector search unit 303731 sequentially selects, as search candidate points, the points located above, below, to the left of, and to the right of the search start point (the search start point P2 in the third row of FIG. 47A), that is, in a cross shape around it. For example, assume that, among the search start point P2 and the search candidate points 0 to 3 above, below, left, and right of it, the point that gives the smallest matching cost is the point to the right of P2. In this case, the first motion vector search unit 303731 selects the point to the right of P2 as the end point of the optimal vector bestMV for the prediction block PU.
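A single round of the cross search can be sketched as follows. The function name cross_round and the caller-supplied cost function mcost are assumptions for illustration; the offsets are the four cross-search offset candidates listed above.

```python
CROSS_OFFSETS = [(0, 1), (1, 0), (0, -1), (-1, 0)]


def cross_round(start_mv, mcost, offsets=CROSS_OFFSETS):
    """Evaluate the four neighbours of start_mv and return the best vector.

    The start point itself is kept when no neighbour has a smaller cost.
    """
    best_mv, min_cost = start_mv, mcost(start_mv)
    for dx, dy in offsets:
        cand_mv = (start_mv[0] + dx, start_mv[1] + dy)
        cand_cost = mcost(cand_mv)
        if cand_cost < min_cost:
            best_mv, min_cost = cand_mv, cand_cost
    return best_mv
```

For example, if the cost is lowest one position to the right of the start point, cross_round returns that right-hand neighbour, which corresponds to the case described for FIG. 47B.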

Note that the second motion vector search unit 303732 may also prohibit the search in the oblique direction, like the first motion vector search unit 303731.

<Step search change example 1>
In the above-described example, the case where the search in the oblique direction is prohibited in the local search by both the first motion vector search unit 303731 and the second motion vector search unit 303732 has been described. However, the present embodiment is not limited to this. In the present embodiment, the search in the oblique direction need not be prohibited in the local search (block search) by the first motion vector search unit 303731, while it may be prohibited in the local search (sub-block search) by the second motion vector search unit 303732.

For example, when performing a diamond search as shown in FIG. 47A, the first motion vector search unit 303731 sets, as search candidate points, all points that are arranged in a diamond shape around the search start point and have not yet been selected as search candidate points.

On the other hand, among the points that are arranged in a diamond shape around the search start point and have not yet been selected as search candidate points, the second motion vector search unit 303732 excludes the points located obliquely with respect to the search start point from the search candidate points.

For example, when searching for a motion vector centered on P0, the second motion vector search unit 303732 limits the search candidate points by excluding, from the points arranged in a diamond shape, the points located obliquely with respect to P0. For example, the second motion vector search unit 303732 may use the following offset candidates.
offsetCand [4] = {(0, 2), (2, 0), (0, -2), (-2, 0)}

The sub-block search is more detailed than the block search, so the amount of processing tends to increase. On the other hand, by providing the above configuration, the second motion vector search unit 303732 can reduce the processing amount of the sub-block search.

<Step search change example 2>
Also, the first motion vector search unit 303731 may prohibit a search in an oblique direction in the step round process after a predetermined number of times among these local searches.

For example, the first motion vector search unit 303731 may allow the search in the oblique direction in the step round processes up to the M-th round (M is a natural number), and prohibit the search in the oblique direction in the (M + 1)-th and subsequent step round processes.

In this way, the search in the oblique direction is allowed up to the M-th round, while a suitable search point with a low matching cost is being sought, and is prohibited in the step round processes performed after the search has approached the optimal search point, so that the processing amount can be reduced.
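One way to express this round-dependent restriction is to choose the offset candidate list from the round number, as in the following sketch. The function name offsets_for_round is hypothetical, and the mapping of the M-th round to numIter = M − 1 assumes that numIter starts at 0, as in the diamond search example.

```python
DIAMOND_OFFSETS = [(0, 2), (1, 1), (2, 0), (1, -1),
                   (0, -2), (-1, -1), (-2, 0), (-1, 1)]

# Even indices of the diamond pattern are the axis-aligned offsets
# {(0, 2), (2, 0), (0, -2), (-2, 0)} used when oblique search is prohibited.
AXIS_OFFSETS = DIAMOND_OFFSETS[0::2]


def offsets_for_round(num_iter, m):
    """Full diamond for rounds 1..m (num_iter 0..m-1), axis-only afterwards."""
    return DIAMOND_OFFSETS if num_iter < m else AXIS_OFFSETS
```

From round M + 1 onward each round evaluates at most four candidates instead of eight.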

Note that, similarly to the first motion vector search unit 303731, the second motion vector search unit 303732 may also prohibit a search in an oblique direction in a step round process after a predetermined number of times.

<Step search change example 3>
Further, the first motion vector search unit 303731 may prohibit a search in an oblique direction according to a search candidate point in a block search.

For example, the first motion vector search unit 303731 need not prohibit the search in the oblique direction when the search candidate point is an integer pixel position (full-pel position). On the other hand, when the search candidate point is a fractional pixel position, the first motion vector search unit 303731 may prohibit the search in the oblique direction and limit the search direction to up, down, left, and right. A specific example is shown below.

The first motion vector search unit 303731 uses an offset candidate offsetCand [Idx] including all search candidate points as an offset candidate.

Also, the first motion vector search unit 303731 uses the offset candidate offsetCand [Idx] to derive a motion vector candidate from the following equation.

candMV = startMV + offsetCand [Idx]
Here, as shown in the first row of FIG. 47A, when the search candidate point is a full-pel position, the first motion vector search unit 303731 generates the motion compensated image (predicted image) of the motion vector candidate candMV and calculates the matching cost.

On the other hand, when the search candidate point is a fractional pixel position (for example, a position that is not on an intersection of the pixel grid in the first row of FIG. 47A), the first motion vector search unit 303731 calculates the matching cost only when the search candidate point lies in the vertical or horizontal direction with respect to the search start point startMV.

When the search candidate point is oblique to the search start point startMV, the first motion vector search unit 303731 does not generate a motion compensated image (predicted image) of the motion vector candidate candMV. In this case, the first motion vector search unit 303731 may derive a predetermined large value (for example, 2¹⁵ − 1) as the matching cost, or may not derive the matching cost. Thereby, the processing amount can be reduced.
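The rule above can be sketched as follows. Several details are assumptions made for illustration: vectors are held in 1/m-pel units with m = 4, a full-pel position is one whose components are both multiples of m, and the names is_full_pel, is_diagonal, and candidate_cost are hypothetical. The large cost 2¹⁵ − 1 is the example value given in the text.

```python
LARGE_COST = (1 << 15) - 1   # the example "predetermined large value"


def is_full_pel(mv, m=4):
    """True if both components of mv (in 1/m-pel units) are whole pixels."""
    return mv[0] % m == 0 and mv[1] % m == 0


def is_diagonal(offset):
    """True if the offset moves both horizontally and vertically."""
    return offset[0] != 0 and offset[1] != 0


def candidate_cost(start_mv, offset, mcost, m=4):
    """Matching cost of startMV + offset under the rule described above."""
    cand_mv = (start_mv[0] + offset[0], start_mv[1] + offset[1])
    if not is_full_pel(cand_mv, m) and is_diagonal(offset):
        return LARGE_COST    # no predicted image is generated, mcost is skipped
    return mcost(cand_mv)
```

Returning a large cost lets the usual candCost < minCost comparison reject the candidate without a separate code path.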

Note that, with respect to the second motion vector search unit 303732, the matching cost may be calculated in the sub-block search as in the first motion vector search unit 303731.

<Step search change example 4>
In addition, in the block search, the first motion vector search unit 303731 may prohibit the search for the motion vector when at least a part of the reference block indicated by the searched motion vector is outside the screen of the reference image. In other words, the first motion vector search unit 303731 may not generate a predicted image when at least a part of the reference block indicated by the motion vector is outside the screen of the reference image.

Hereinafter, the predicted image generation determination process performed by the first motion vector search unit 303731 will be described with reference to FIG. 48. FIG. 48 is a flowchart showing an outline of a flow for determining whether or not to generate a predicted image. As shown in FIG. 48, the predicted image generation determination process by the first motion vector search unit 303731 includes the following three steps (Step S431) to (Step S433).

(Step S431)
In step S431, the first motion vector search unit 303731 determines whether at least a part of the reference block is outside the screen. If the first motion vector search unit 303731 determines that at least a part of the reference block is outside the screen (Y in step S431), the process proceeds to step S432. If the first motion vector search unit 303731 determines that no part of the reference block is outside the screen (N in step S431), the process proceeds to step S433.

Here, the first motion vector search unit 303731 may determine that the reference block is outside the screen of the reference image when at least one of the following formulas B-1 to B-4 is satisfied.

The following equation B-1 is an equation used when determining whether or not the left end of the reference block indicated by the motion vector is outside the screen of the reference image. Expression B-2 is an expression used when determining whether the right end of the reference block is outside the screen of the reference image. Expression B-3 is an expression used when determining whether or not the upper end of the reference block is outside the screen of the reference image. Expression B-4 is an expression used when determining whether or not the lower end of the reference block is outside the screen of the reference image.

xInt - NTAPS / 2 + 1 < 0 (Formula B-1)
xInt + BLKW + NTAPS / 2 - 1 > pic_width - 1 (Formula B-2)
yInt - NTAPS / 2 + 1 < 0 (Formula B-3)
yInt + BLKH + NTAPS / 2 - 1 > pic_height - 1 (Formula B-4)
Here, xInt and yInt in Formulas B-1 to B-4 indicate the integer position (xInt, yInt) in the reference image that corresponds to the in-block coordinates (x, y) when the upper left block coordinates (xPb, yPb) are shifted by the motion vector (mv_x, mv_y), where the motion vector accuracy is 1 / M pel.

Also, NTAPS in Equations B-1 to B-4 represents the number of filter taps of the motion compensation filter. BLKW and BLKH in the formulas B-1 to B-4 indicate the horizontal width and height of the corresponding block, respectively. In addition, pic_width and pic_height in Expressions B-1 to B-4 indicate the horizontal width and height of the reference image, respectively.
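Formulas B-1 to B-4 translate directly into a boolean check, sketched below. The function name is hypothetical, NTAPS / 2 is assumed to be integer division, and the default of 8 filter taps is an assumption chosen for the example, not a value stated in this section.

```python
def ref_block_outside_picture(x_int, y_int, blk_w, blk_h,
                              pic_width, pic_height, ntaps=8):
    """True if at least one of Formulas B-1 to B-4 holds."""
    half = ntaps // 2
    return (x_int - half + 1 < 0                            # B-1: left end
            or x_int + blk_w + half - 1 > pic_width - 1     # B-2: right end
            or y_int - half + 1 < 0                         # B-3: upper end
            or y_int + blk_h + half - 1 > pic_height - 1)   # B-4: lower end
```

For an 8 × 8 block in a 64 × 64 reference image with 8 taps, xInt = 3 is the leftmost position that still passes Formula B-1, because the filter reads three samples to the left of xInt.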

(Step S432)
In step S432, the first motion vector search unit 303731 determines not to generate a motion compensated image (predicted image) of the motion vector candMV. In this case, the first motion vector search unit 303731 may derive a predetermined large value (for example, 2¹⁵ − 1) as the matching cost, or may not derive the matching cost. This eliminates the need for the padding process that occurs when the reference block is outside the screen of the reference image, thereby reducing the amount of processing.

(Step S433)
In step S433, the first motion vector search unit 303731 determines to generate a motion compensated image (predicted image) and calculate a matching cost.

Note that the second motion vector search unit 303732 may also perform the above-described predicted image generation determination process in the sub-block search, similarly to the first motion vector search unit 303731.

<Step search change example 5>
Instead of determining whether or not the reference block is outside the screen of the reference image as in step S431 of change example 4 described above, the first motion vector search unit 303731 may determine whether or not the reference block is outside a certain range from the boundary of the reference image.

Hereinafter, the predicted image generation determination process by the first motion vector search unit 303731 will be described with reference to FIG. 49. FIG. 49 is a flowchart showing an outline of a flow for determining whether or not to generate a predicted image. As shown in FIG. 49, the predicted image generation determination process by the first motion vector search unit 303731 includes three steps (step S441) to (step S443). Steps S442 and S443 are the same as steps S432 and S433 described above, respectively, and thus description thereof is omitted.

(Step S441)
In step S441, the first motion vector search unit 303731 determines whether at least a part of the reference block is outside a certain range from the boundary of the reference image.

If the first motion vector search unit 303731 determines that at least a part of the reference block is outside the certain range (Y in step S441), the process proceeds to step S442. If the first motion vector search unit 303731 determines that no part of the reference block is outside the certain range (N in step S441), the process proceeds to step S443.

Here, in the determination in step S441 in FIG. 49 of whether at least a part of the reference block is outside a certain range from the boundary of the reference image, the first motion vector search unit 303731 may determine that at least a part of the reference block is outside the certain range when at least one of the following formulas C-1 to C-4 is satisfied.

The following expression C-1 is an expression used when determining whether or not the left end of the reference block indicated by the motion vector is outside a certain range from the boundary of the reference image. Expression C-2 is an expression used when determining whether the right end of the reference block is outside a certain range from the boundary of the reference image. Expression C-3 is an expression used when determining whether or not the upper end of the reference block is outside a certain range from the boundary of the reference image. Expression C-4 is an expression used when determining whether or not the lower end of the reference block is outside a certain range from the boundary of the reference image.

xInt - NTAPS / 2 + 1 < -padW (Formula C-1)
xInt + BLKW + NTAPS / 2 - 1 > pic_width + padW - 1 (Formula C-2)
yInt - NTAPS / 2 + 1 < -padH (Formula C-3)
yInt + BLKH + NTAPS / 2 - 1 > pic_height + padH - 1 (Formula C-4)
Here, padW and padH in the formulas C-1 to C-4 indicate the predetermined range in the horizontal direction and the predetermined range in the vertical direction, respectively.

Accordingly, when at least a part of the reference block protrudes beyond the boundary of the reference image but not beyond the certain range, the first motion vector search unit 303731 can generate a motion compensated image (predicted image). In addition, because no predicted image is generated when at least a part of the reference block is outside the certain range from the boundary of the reference image, the area in which the padding process is performed can be limited, and thus the processing amount can be reduced.
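Formulas C-1 to C-4 differ from the screen-boundary check only by the margins padW and padH, as the following sketch shows. The function name is hypothetical, NTAPS / 2 is assumed to be integer division, and the default of 8 filter taps is an assumption for the example.

```python
def ref_block_outside_range(x_int, y_int, blk_w, blk_h,
                            pic_width, pic_height, pad_w, pad_h, ntaps=8):
    """True if at least one of Formulas C-1 to C-4 holds."""
    half = ntaps // 2
    return (x_int - half + 1 < -pad_w                              # C-1: left
            or x_int + blk_w + half - 1 > pic_width + pad_w - 1    # C-2: right
            or y_int - half + 1 < -pad_h                           # C-3: upper
            or y_int + blk_h + half - 1 > pic_height + pad_h - 1)  # C-4: lower
```

With padW = padH = 0 the check reduces to the case where the certain range coincides with the boundary of the reference image; a larger margin admits more candidates at the cost of a larger padded area.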

<Specific example of raster search>
Next, raster search will be described. When performing a motion search to which the raster search is applied, the matching motion deriving unit 30373c comprehensively selects search points within the search range at regular intervals and evaluates their matching costs in raster scan order. Here, the raster scan is an exhaustive search method that starts from the upper left of the search range, examines the pixels from left to right until the right end is reached, then moves down one row and again examines the pixels sequentially from the left end to the right.

The matching motion deriving unit 30373c selects a search vector that gives the smallest matching cost among the matching costs calculated for each search vector from the start point to the end point set in the raster scan order.

With the raster scan, for an area of size blkW × blkH, the Y coordinate y and the X coordinate x are first set to their initial values, and x is scanned from its initial value to its final value. When x reaches the final value, x is returned to the initial value, y is incremented, and x is again scanned from the initial value to the final value at the updated y; this process is repeated. In pseudo code, this is done by the following double loop, in which the x loop is inside the y loop.

for (y = 0; y < blkH; y++) { // loop for y
  for (x = 0; x < blkW; x++) { // loop for x
    // processing in raster scan
  }
}
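Combining the double loop above with the selection of the smallest matching cost gives the following sketch of a raster search. The function name raster_search, the caller-supplied cost function mcost, and the use of a step of one position in each direction are assumptions made for illustration.

```python
def raster_search(blk_w, blk_h, mcost):
    """Evaluate every point of a blk_w x blk_h range in raster scan order.

    Returns the point with the smallest cost; on ties the point that comes
    first in raster scan order is kept.
    """
    best_mv, min_cost = None, None
    for y in range(blk_h):               # outer loop over rows
        for x in range(blk_w):           # inner loop over columns
            cost = mcost((x, y))
            if min_cost is None or cost < min_cost:
                best_mv, min_cost = (x, y), cost
    return best_mv
```

Unlike the step search, the number of cost evaluations here is fixed at blkW × blkH regardless of where the minimum lies.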
Note that an extended raster scan may be used instead of the raster scan. In the extended raster scan, each point in the block is scanned in a predetermined scan order like the raster scan. For example, a spiral scan that scans spirally from the center toward the periphery is used.

Also in the raster search, at least one of the first motion vector search unit 303731 and the second motion vector search unit 303732 may prohibit the search in the oblique direction as in the above-described step search.

(Configuration of image encoding device)
As illustrated in FIG. 40, the image encoding device 11c includes a prediction parameter encoding unit 111c instead of the prediction parameter encoding unit 111b in the third embodiment. The prediction parameter encoding unit 111c includes an inter prediction parameter encoding unit (motion vector deriving unit) 112c instead of the inter prediction parameter encoding unit 112b in the third embodiment.

FIG. 42 is a block diagram illustrating a configuration of the inter prediction parameter encoding unit 112c of the image encoding device 11c according to the fourth embodiment. As illustrated in FIG. 42, the inter prediction parameter encoding unit 112c includes a subblock prediction parameter deriving unit 1125c instead of the subblock prediction parameter deriving unit 1125b in the third embodiment. The sub-block prediction parameter deriving unit 1125c includes a matching motion deriving unit 11253c instead of the matching motion deriving unit 11253 in the third embodiment. Except for this point, the inter prediction parameter encoding unit 112c has the same configuration as the inter prediction parameter encoding unit 112b in the third embodiment.

Since the matching motion deriving unit 11253c has the same configuration as the above-described matching motion deriving unit 30373c, description thereof is omitted here.

Note that part of the image encoding devices 11, 11a, 11b, and 11c and the image decoding devices 31, 31a, 31b, and 31c in the above-described embodiment, for example, the entropy decoding unit 301, the prediction parameter decoding units 302, 302b, and 302c, the loop filter 305, the predicted image generation units 308 and 308a, the inverse quantization / inverse transformation unit 311, the addition unit 312, the predicted image generation units 101 and 101a, the subtraction unit 102, the transformation / quantization unit 103, the entropy encoding unit 104, the inverse quantization / inverse transform unit 105, the loop filter 107, the encoding parameter determination unit 110, and the prediction parameter encoding units 111, 111b, and 111c, may be realized by a computer. In that case, a program for realizing this control function may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read into and executed by a computer system. Here, the “computer system” is a computer system built into any of the image encoding devices 11, 11a, 11b, and 11c and the image decoding devices 31, 31a, 31b, and 31c, and includes hardware such as peripheral equipment. The “computer-readable recording medium” refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built into a computer system. Further, the “computer-readable recording medium” may also include a medium that holds a program dynamically for a short time, such as a communication line used when a program is transmitted via a network such as the Internet or a communication line such as a telephone line, and a medium that holds a program for a certain period of time, such as a volatile memory inside a computer system serving as a server or a client in that case. The program may be a program for realizing a part of the above-described functions, or may be a program that realizes the above-described functions in combination with a program already recorded in the computer system.

In addition, part or all of the image encoding devices 11, 11a, 11b, and 11c and the image decoding devices 31, 31a, 31b, and 31c in the above-described embodiment may be realized as an integrated circuit such as an LSI (Large Scale Integration). Each functional block of the image encoding devices 11, 11a, 11b, and 11c and the image decoding devices 31, 31a, 31b, and 31c may be individually implemented as a processor, or a part or all of them may be integrated into a processor. Further, the method of circuit integration is not limited to LSI, and may be realized by a dedicated circuit or a general-purpose processor. In addition, if an integrated circuit technology that replaces LSI emerges as semiconductor technology advances, an integrated circuit based on that technology may be used.

As described above, the embodiment of the present invention has been described in detail with reference to the drawings. However, the specific configuration is not limited to the above, and various design changes and the like can be made without departing from the scope of the present invention.

[Application example]
The above-described image encoding devices 11, 11a, 11b, and 11c and the image decoding devices 31, 31a, 31b, and 31c can be mounted and used in various devices that perform transmission, reception, recording, and reproduction of moving images. The moving image may be a natural moving image captured by a camera or the like, or may be an artificial moving image (including CG and GUI) generated by a computer or the like.

First, the use of the above-described image encoding devices 11, 11a, 11b, and 11c and the image decoding devices 31, 31a, 31b, and 31c for transmission and reception of moving images will be described with reference to FIG. 50.

FIG. 50A is a block diagram showing a configuration of a transmission device PROD_A equipped with the image encoding devices 11, 11a, 11b, and 11c. As shown in FIG. 50A, the transmission device PROD_A includes an encoding unit PROD_A1 that obtains encoded data by encoding a moving image, a modulation unit PROD_A2 that obtains a modulated signal by modulating a carrier wave with the encoded data obtained by the encoding unit PROD_A1, and a transmission unit PROD_A3 that transmits the modulated signal obtained by the modulation unit PROD_A2. The above-described image encoding devices 11, 11a, 11b, and 11c are used as the encoding unit PROD_A1.

As a source of the moving image to be input to the encoding unit PROD_A1, the transmission device PROD_A may further include a camera PROD_A4 that captures moving images, a recording medium PROD_A5 on which moving images are recorded, an input terminal PROD_A6 for inputting moving images from the outside, and an image processing unit A7 that generates or processes images. FIG. 50A illustrates a configuration in which the transmission device PROD_A includes all of these, but some of them may be omitted.

Note that the recording medium PROD_A5 may be one on which a non-encoded moving image is recorded, or one on which a moving image encoded by a recording encoding scheme different from the transmission encoding scheme is recorded. In the latter case, a decoding unit (not shown) for decoding the encoded data read from the recording medium PROD_A5 in accordance with the recording encoding scheme may be interposed between the recording medium PROD_A5 and the encoding unit PROD_A1.

FIG. 50B is a block diagram showing a configuration of a receiving device PROD_B equipped with the image decoding devices 31, 31a, 31b, and 31c. As shown in FIG. 50B, the receiving device PROD_B includes a receiving unit PROD_B1 that receives a modulated signal, a demodulating unit PROD_B2 that obtains encoded data by demodulating the modulated signal received by the receiving unit PROD_B1, and a decoding unit PROD_B3 that obtains a moving image by decoding the encoded data obtained by the demodulating unit PROD_B2. The above-described image decoding devices 31, 31a, 31b, and 31c are used as the decoding unit PROD_B3.

As a supply destination of the moving image output by the decoding unit PROD_B3, the receiving device PROD_B may further include a display PROD_B4 that displays the moving image, a recording medium PROD_B5 for recording the moving image, and an output terminal PROD_B6 for outputting the moving image to the outside. FIG. 50B illustrates a configuration in which the receiving device PROD_B includes all of these, but some of them may be omitted.

Note that the recording medium PROD_B5 may be for recording a non-encoded moving image, or may be for recording a moving image encoded by a recording encoding scheme different from the transmission encoding scheme. In the latter case, an encoding unit (not shown) for encoding the moving image acquired from the decoding unit PROD_B3 in accordance with the recording encoding scheme may be interposed between the decoding unit PROD_B3 and the recording medium PROD_B5.

Note that the transmission medium for transmitting the modulated signal may be wireless or wired. Further, the transmission mode for transmitting the modulated signal may be broadcasting (here, a transmission mode in which the transmission destination is not specified in advance) or communication (here, a transmission mode in which the transmission destination is specified in advance). That is, the transmission of the modulated signal may be realized by any of wireless broadcasting, wired broadcasting, wireless communication, and wired communication.

For example, a broadcasting station (broadcasting equipment, etc.) / receiving station (television receiver, etc.) of terrestrial digital broadcasting is an example of a transmitting device PROD_A / receiving device PROD_B that transmits and receives a modulated signal by wireless broadcasting. A broadcasting station (broadcasting equipment, etc.) / receiving station (television receiver, etc.) of cable television broadcasting is an example of a transmitting device PROD_A / receiving device PROD_B that transmits and receives a modulated signal by wired broadcasting.

Also, a server (workstation, etc.) / client (television receiver, personal computer, smartphone, etc.) of a VOD (Video On Demand) service or a video sharing service using the Internet is an example of a transmitting device PROD_A / receiving device PROD_B that transmits and receives a modulated signal by communication (usually, either a wireless or wired transmission medium is used in a LAN, and a wired transmission medium is used in a WAN). Here, the personal computer includes a desktop PC, a laptop PC, and a tablet PC. The smartphone also includes a multi-function mobile phone terminal.

Note that a client of the video sharing service has, in addition to the function of decoding encoded data downloaded from the server and displaying it on a display, a function of encoding a moving image captured by a camera and uploading it to the server. That is, the client of the video sharing service functions as both the transmitting device PROD_A and the receiving device PROD_B.

Next, the use of the above-described image encoding devices 11, 11a, 11b, and 11c and image decoding devices 31, 31a, 31b, and 31c for recording and reproduction of moving images will be described with reference to FIG. 51.

FIG. 51(a) is a block diagram showing the configuration of a recording device PROD_C equipped with the above-described image encoding devices 11, 11a, 11b, and 11c. As shown in FIG. 51(a), the recording device PROD_C includes an encoding unit PROD_C1 that obtains encoded data by encoding a moving image, and a writing unit PROD_C2 that writes the encoded data obtained by the encoding unit PROD_C1 to a recording medium PROD_M. The above-described image encoding devices 11, 11a, 11b, and 11c are used as the encoding unit PROD_C1.

Note that the recording medium PROD_M may be (1) of a type built into the recording device PROD_C, such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive), (2) of a type connected to the recording device PROD_C, such as an SD memory card or a USB (Universal Serial Bus) flash memory, or (3) of a type loaded into a drive device (not shown) built into the recording device PROD_C, such as a DVD (Digital Versatile Disc) or a BD (Blu-ray Disc: registered trademark).

In addition, as sources of the moving image input to the encoding unit PROD_C1, the recording device PROD_C may further include a camera PROD_C3 that captures a moving image, an input terminal PROD_C4 for inputting a moving image from the outside, a receiving unit PROD_C5 for receiving a moving image, and an image processing unit PROD_C6 that generates or processes an image. FIG. 51(a) illustrates a configuration in which the recording device PROD_C includes all of these, but some of them may be omitted.

The receiving unit PROD_C5 may receive a moving image that is not encoded, or may receive encoded data encoded by a transmission encoding scheme different from the recording encoding scheme. In the latter case, a transmission decoding unit (not shown) that decodes the encoded data encoded by the transmission encoding scheme may be interposed between the receiving unit PROD_C5 and the encoding unit PROD_C1.

Examples of such a recording device PROD_C include a DVD recorder, a BD recorder, and an HDD (Hard Disk Drive) recorder (in these cases, the input terminal PROD_C4 or the receiving unit PROD_C5 is the main source of moving images). In addition, a camcorder (in this case, the camera PROD_C3 is the main source of moving images), a personal computer (in this case, the receiving unit PROD_C5 or the image processing unit PROD_C6 is the main source of moving images), a smartphone (in this case, the camera PROD_C3 or the receiving unit PROD_C5 is the main source of moving images), and the like are also examples of such a recording device PROD_C.

FIG. 51(b) is a block diagram showing the configuration of a playback device PROD_D equipped with the above-described image decoding devices 31 and 31a. As shown in FIG. 51(b), the playback device PROD_D includes a reading unit PROD_D1 that reads encoded data written on the recording medium PROD_M, and a decoding unit PROD_D2 that obtains a moving image by decoding the encoded data read by the reading unit PROD_D1. The above-described image decoding devices 31, 31a, 31b, and 31c are used as the decoding unit PROD_D2.

Note that the recording medium PROD_M may be (1) of a type built into the playback device PROD_D, such as an HDD or an SSD, (2) of a type connected to the playback device PROD_D, such as an SD memory card or a USB flash memory, or (3) of a type loaded into a drive device (not shown) built into the playback device PROD_D, such as a DVD or a BD.

In addition, as supply destinations of the moving image output by the decoding unit PROD_D2, the playback device PROD_D may further include a display PROD_D3 that displays the moving image, an output terminal PROD_D4 for outputting the moving image to the outside, and a transmission unit PROD_D5 that transmits the moving image. FIG. 51(b) illustrates a configuration in which the playback device PROD_D includes all of these, but some of them may be omitted.

Note that the transmission unit PROD_D5 may transmit a moving image that is not encoded, or may transmit encoded data encoded by a transmission encoding scheme different from the recording encoding scheme. In the latter case, an encoding unit (not shown) that encodes the moving image by the transmission encoding scheme may be interposed between the decoding unit PROD_D2 and the transmission unit PROD_D5.

Examples of such a playback device PROD_D include a DVD player, a BD player, and an HDD player (in these cases, the output terminal PROD_D4 to which a television receiver or the like is connected is the main supply destination of moving images). In addition, a television receiver (in this case, the display PROD_D3 is the main supply destination of moving images), digital signage (also referred to as an electronic signboard or an electronic bulletin board; the display PROD_D3 or the transmission unit PROD_D5 is the main supply destination of moving images), a desktop PC (in this case, the output terminal PROD_D4 or the transmission unit PROD_D5 is the main supply destination of moving images), a laptop or tablet PC (in this case, the display PROD_D3 or the transmission unit PROD_D5 is the main supply destination of moving images), a smartphone (in this case, the display PROD_D3 or the transmission unit PROD_D5 is the main supply destination of moving images), and the like are also examples of such a playback device PROD_D.

(Hardware implementation and software implementation)
Each block of the above-described image decoding devices 31, 31a, 31b, and 31c and image encoding devices 11, 11a, 11b, and 11c may be realized in hardware by a logic circuit formed on an integrated circuit (IC chip), or may be realized in software using a CPU (Central Processing Unit).

In the latter case, each of the above devices includes a CPU that executes the instructions of a program that realizes each function, a ROM (Read Only Memory) that stores the program, a RAM (Random Access Memory) into which the program is loaded, and a storage device (recording medium) such as a memory that stores the program and various data. The object of the embodiments of the present invention can also be achieved by supplying, to each of the above devices, a recording medium on which the program code (executable program, intermediate code program, or source program) of the control program for each of the above devices, which is software that realizes the above-described functions, is recorded in a computer-readable manner, and by having the computer (or a CPU or MPU) read and execute the program code recorded on the recording medium.

Examples of the recording medium include tapes such as magnetic tapes and cassette tapes; disks including magnetic disks such as floppy (registered trademark) disks and hard disks, and optical discs such as CD-ROM (Compact Disc Read-Only Memory), MO disc (Magneto-Optical disc), MD (Mini Disc), DVD (Digital Versatile Disc), CD-R (CD Recordable), and Blu-ray Disc (registered trademark); cards such as IC cards (including memory cards) and optical cards; semiconductor memories such as mask ROM, EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable and Programmable Read-Only Memory: registered trademark), and flash ROM; and logic circuits such as PLD (Programmable Logic Device) and FPGA (Field Programmable Gate Array).

Further, each of the above devices may be configured to be connectable to a communication network, and the program code may be supplied via the communication network. The communication network is not particularly limited as long as it can transmit the program code. For example, the Internet, an intranet, an extranet, a LAN (Local Area Network), an ISDN (Integrated Services Digital Network), a VAN (Value-Added Network), a CATV (Community Antenna Television / Cable Television) communication network, a virtual private network (Virtual Private Network), a telephone line network, a mobile communication network, a satellite communication network, and the like can be used. The transmission medium constituting the communication network may be any medium that can transmit the program code, and is not limited to a specific configuration or type. For example, wired media such as IEEE (Institute of Electrical and Electronics Engineers) 1394, USB, power line carrier, cable TV line, telephone line, and ADSL (Asymmetric Digital Subscriber Line) line can be used, as can wireless media such as infrared (for example, IrDA (Infrared Data Association) and remote control), Bluetooth (registered trademark), IEEE 802.11 wireless, HDR (High Data Rate), NFC (Near Field Communication), DLNA (Digital Living Network Alliance: registered trademark), a mobile phone network, a satellite line, and a terrestrial digital broadcasting network. The embodiments of the present invention can also be realized in the form of a computer data signal embedded in a carrier wave, in which the program code is embodied by electronic transmission.

The embodiments of the present invention are not limited to the above-described embodiments, and various modifications are possible within the scope of the claims. That is, embodiments obtained by combining technical means appropriately modified within the scope of the claims are also included in the technical scope of the present invention.

(Cross-reference of related applications)
This application claims the benefit of priority to Japanese Patent Application No. 2017-117273 filed on June 14, 2017 and Japanese Patent Application No. 2017-123731 filed on June 23, 2017, the entire contents of which are incorporated herein by reference.

Embodiments of the present invention can be suitably applied to an image decoding device that decodes encoded data in which image data is encoded, and to an image encoding device that generates encoded data in which image data is encoded. They can also be suitably applied to the data structure of encoded data generated by an image encoding device and referenced by an image decoding device.

(Description of symbols)
1, 1a, 1b, 1c ... Image transmission system
11, 11a, 11b, 11c ... Image encoding device (moving image encoding device, motion vector deriving device)
101, 101a ... Prediction image generation unit
1011, 1011a ... Inter prediction image generation unit
10111, 10111a ... Motion compensation unit
102 ... Subtraction unit
103 ... Transform / quantization unit
104 ... Entropy encoding unit
105 ... Inverse quantization / inverse transform unit
106 ... Addition unit
108 ... Prediction parameter memory
109 ... Reference picture memory
110 ... Encoding parameter determination unit
111, 111b, 111c ... Prediction parameter encoding unit
112, 112b, 112c ... Inter prediction parameter encoding unit (motion vector deriving unit)
1121 ... Merge prediction parameter derivation unit
1122 ... AMVP prediction parameter derivation unit
1123 ... Subtraction unit
1125, 1125b, 1125c ... Sub-block prediction parameter derivation unit
113 ... Intra prediction parameter encoding unit
21 ... Network
31, 31a ... Image decoding device (prediction image generation device, moving image decoding device, motion vector deriving device)
301 ... Entropy decoding unit
302, 302b, 302c ... Prediction parameter decoding unit
303, 303b, 303c ... Inter prediction parameter decoding unit (motion vector deriving unit)
3032 ... AMVP prediction parameter derivation unit
3033 ... Vector candidate derivation unit
3036 ... Merge prediction parameter derivation unit
3038 ... Addition unit
30361 ... Merge candidate derivation unit
303611 ... Merge candidate storage unit
30362 ... Merge candidate selection unit
304 ... Intra prediction parameter decoding unit
306 ... Reference picture memory
307 ... Prediction parameter memory
308, 308a ... Prediction image generation unit
309, 309a ... Inter prediction image generation unit
3091, 3091a ... Motion compensation unit
30911 ... Motion compensation gradient unit
309122 ... Gradient correction coefficient derivation unit
309112 ... Gradient derivation unit
3094 ... Weight prediction unit
310 ... Intra prediction image generation unit
311 ... Inverse quantization / inverse transform unit
312 ... Addition unit
41 ... Image display device
11253, 11253c, 30373, 30373c ... Matching motion deriving unit (first motion vector search unit, second motion vector search unit)
303731 ... First motion vector search unit
303732 ... Second motion vector search unit

Claims

A prediction image generation device that generates a prediction image by performing motion compensation on one or a plurality of reference images, the device comprising:
a prediction image generation unit that generates a prediction image using at least one of
a single prediction mode for generating a prediction image with reference to a first reference image,
a bi-prediction mode for generating a prediction image with reference to the first reference image and a second reference image, and
a BIO mode for generating a prediction image with reference to the first reference image, the second reference image, and a gradient correction term,
wherein the prediction image generation unit prohibits generation of a prediction image using the BIO mode when a reference block in at least one of the first reference image and the second reference image is outside the screen of the reference image.
The prediction image generation device according to claim 1, wherein the prediction image generation unit prohibits generation of a prediction image using the BIO mode when a reference block in at least one of the first reference image and the second reference image is outside a certain range from the boundary of the reference image.
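The condition recited in the two claims above can be illustrated by the following minimal sketch, which tests whether each reference block lies inside the reference image, optionally extended by a margin around its boundary. The names `Block`, `is_within_picture`, and `bio_allowed`, the top-left-origin integer pixel coordinates, and the `margin` parameter are assumptions introduced only for illustration and do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Reference block given by its top-left corner and size in pixels (hypothetical type)."""
    x: int
    y: int
    width: int
    height: int


def is_within_picture(block: Block, pic_width: int, pic_height: int, margin: int = 0) -> bool:
    """Return True if the block lies inside the picture extended by `margin` pixels on every side.

    margin = 0 models "inside the screen of the reference image";
    margin > 0 is one assumed reading of "a certain range from the boundary".
    """
    return (
        block.x >= -margin
        and block.y >= -margin
        and block.x + block.width <= pic_width + margin
        and block.y + block.height <= pic_height + margin
    )


def bio_allowed(ref0: Block, ref1: Block, pic_width: int, pic_height: int, margin: int = 0) -> bool:
    """BIO is prohibited as soon as at least one of the two reference blocks is out of range."""
    return (
        is_within_picture(ref0, pic_width, pic_height, margin)
        and is_within_picture(ref1, pic_width, pic_height, margin)
    )
```

For example, with a 64 × 64 reference image, a reference block whose top-left corner is at (-1, 0) makes `bio_allowed` return False when `margin` is 0 and True when `margin` is 2.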
A prediction image generation device that generates a prediction image by performing motion compensation on a plurality of reference images, the device comprising:
a prediction image generation unit that generates a prediction image using at least one of a plurality of modes including a BIO mode for generating a prediction image with reference to a first reference image, a second reference image, and a gradient correction term,
wherein the prediction image generation unit performs a generation process of pixel values outside a reading region, that is, pixel values lying outside the reading region of a corresponding block in at least one of the first reference image and the second reference image, and
wherein, when a prediction image is generated using the BIO mode, the generation process of pixel values outside the reading region along the vertical direction or the horizontal direction by the prediction image generation unit is prohibited.
The prediction image generation device according to claim 3, wherein, when the prediction image generation unit uses the BIO mode for a vertically long corresponding block and for a block adjacent to the vertically long corresponding block in the horizontal direction, the generation process of pixel values outside the reading region along the horizontal direction by the prediction image generation unit is prohibited.
The prediction image generation device according to claim 4, wherein the block size of the vertically long corresponding block and of the block adjacent to the vertically long corresponding block in the horizontal direction is 4 × 8 pixels.
The prediction image generation device according to claim 3, wherein, when the prediction image generation unit uses the BIO mode for a horizontally long corresponding block and for a block adjacent to the horizontally long corresponding block in the vertical direction, the generation process of pixel values outside the reading region along the vertical direction by the prediction image generation unit is prohibited.
The prediction image generation device according to claim 6, wherein the block size of the horizontally long corresponding block and of the block adjacent to the horizontally long corresponding block in the vertical direction is 8 × 4 pixels.
A moving image decoding device comprising the prediction image generation device according to any one of claims 1 to 7, wherein the moving image decoding device restores an encoding target image by adding a residual image to, or subtracting a residual image from, the prediction image.
A moving image encoding device comprising the prediction image generation device according to any one of claims 1 to 7, wherein the moving image encoding device encodes a residual between the prediction image and an encoding target image.
A motion vector deriving device that derives motion vectors of sub-blocks constituting a target block, the device comprising:
a motion vector deriving unit that calculates a motion vector of each of a plurality of sub-blocks included in the target block with reference to motion vectors at a plurality of control points set in a reference block sharing a vertex with the target block,
wherein the motion vector deriving unit limits the range of a difference between the motion vectors at the plurality of control points.
The motion vector deriving device according to claim 10, wherein, when the difference between the motion vectors at the plurality of control points is larger than a predetermined value, the motion vector deriving unit derives motion vectors of sub-blocks having a larger size than when the difference is not more than the predetermined value.
The motion vector deriving device according to claim 10 or 11, wherein the motion vector deriving unit derives motion vectors of sub-blocks having an aspect ratio corresponding to the aspect ratio of the target block.
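The sub-block motion vector derivation recited in the three claims above can be sketched as follows. This is only an illustration under stated assumptions: it uses a 4-parameter affine model with two control points placed at the top-left and top-right corners of the target block itself, limits the control-point difference by clamping each component, and selects between two sub-block sizes. The model, the per-component clamp, the sizes 4 and 8, and all function names are hypothetical and are not taken from the specification.

```python
from typing import List, Tuple

MV = Tuple[float, float]  # (horizontal, vertical) motion vector


def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def limit_control_point_mvs(mv0: MV, mv1: MV, max_diff: float) -> Tuple[MV, MV]:
    """Limit each component of (mv1 - mv0) to [-max_diff, max_diff], keeping mv0 fixed."""
    dx = clamp(mv1[0] - mv0[0], -max_diff, max_diff)
    dy = clamp(mv1[1] - mv0[1], -max_diff, max_diff)
    return mv0, (mv0[0] + dx, mv0[1] + dy)


def choose_subblock_size(mv0: MV, mv1: MV, threshold: float,
                         small: int = 4, large: int = 8) -> int:
    """Use the larger sub-block size when the control-point difference exceeds the threshold."""
    diff = max(abs(mv1[0] - mv0[0]), abs(mv1[1] - mv0[1]))
    return large if diff > threshold else small


def derive_subblock_mvs(mv0: MV, mv1: MV, block_w: int, block_h: int,
                        sub: int) -> List[List[MV]]:
    """Evaluate an assumed 4-parameter affine model at the centre of each sub-block."""
    a = (mv1[0] - mv0[0]) / block_w  # scaling component per pixel
    b = (mv1[1] - mv0[1]) / block_w  # rotation component per pixel
    rows = []
    for y in range(0, block_h, sub):
        row = []
        for x in range(0, block_w, sub):
            cx, cy = x + sub / 2, y + sub / 2
            row.append((mv0[0] + a * cx - b * cy, mv0[1] + b * cx + a * cy))
        rows.append(row)
    return rows
```

In this sketch, control-point vectors (0, 0) and (16, 0) with a threshold of 8 select 8 × 8 sub-blocks, whereas (0, 0) and (4, 0) select 4 × 4 sub-blocks.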
A motion vector deriving device that generates a motion vector referred to in order to generate a prediction image used for encoding or decoding a moving image, the device comprising:
a first motion vector search unit that searches for a motion vector for each prediction block by a matching process; and
a second motion vector search unit that, with reference to the motion vector selected by the first motion vector search unit, searches for a motion vector by a matching process for each of a plurality of sub-blocks included in the prediction block,
wherein the first motion vector search unit searches for a motion vector by performing an initial vector search for the prediction block and then performing a local search,
the second motion vector search unit searches for a motion vector by performing an initial vector search for the sub-block and then performing a local search, and
a search in an oblique direction is prohibited in at least one of the local search by the first motion vector search unit and the local search by the second motion vector search unit.
The motion vector deriving device according to claim 13, wherein a search in an oblique direction is not prohibited in the local search by the first motion vector search unit, and a search in an oblique direction is prohibited in the local search by the second motion vector search unit.
The motion vector deriving device according to claim 13, wherein each of the local search by the first motion vector search unit and the local search by the second motion vector search unit includes a plurality of step round processes, and
in at least one of the local search by the first motion vector search unit and the local search by the second motion vector search unit, a search in an oblique direction is not prohibited in the step round processes up to the M-th (M being a natural number) and is prohibited in the (M+1)-th and subsequent step round processes.
The motion vector deriving device according to claim 13, wherein, in at least one of the local search by the first motion vector search unit and the local search by the second motion vector search unit,
the at least one of the first motion vector search unit and the second motion vector search unit does not prohibit a search in an oblique direction when a search candidate point, which is a candidate for the search destination of the motion vector, is at an integer pixel position, and
prohibits a search in an oblique direction when the search candidate point is at a fractional pixel position.
The motion vector deriving device according to claim 13, wherein, in at least one of the local search by the first motion vector search unit and the local search by the second motion vector search unit,
when at least a part of the reference block indicated by the motion vector searched by at least one of the first motion vector search unit and the second motion vector search unit is outside the screen of the reference image, the search for that motion vector is prohibited.
The motion vector deriving device according to claim 17, wherein at least one of the first motion vector search unit and the second motion vector search unit prohibits the search for the motion vector when at least a part of the reference block indicated by the motion vector searched by that search unit is outside a certain range from the boundary of the reference image.
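The two-stage search with a prohibited oblique direction, as recited in the claims above, can be sketched as follows. This is a minimal illustration under stated assumptions: each step round examines the immediate neighbours of the current best vector, the matching cost is supplied by the caller, and the block-level search allows oblique candidates while the sub-block search does not. The function names, the one-pixel step, and the `max_rounds` limit are hypothetical and are not taken from the specification.

```python
from typing import Callable, List, Sequence, Tuple

MV = Tuple[int, int]

CROSS = [(1, 0), (-1, 0), (0, 1), (0, -1)]       # horizontal and vertical neighbours
DIAGONAL = [(1, 1), (1, -1), (-1, 1), (-1, -1)]  # oblique neighbours


def local_search(start: MV, cost: Callable[[MV], float],
                 allow_diagonal: bool, max_rounds: int = 8) -> MV:
    """Refine `start` over step rounds; oblique candidates are skipped when prohibited."""
    offsets = CROSS + DIAGONAL if allow_diagonal else CROSS
    best, best_cost = start, cost(start)
    for _ in range(max_rounds):
        center = best
        for dx, dy in offsets:
            cand = (center[0] + dx, center[1] + dy)
            c = cost(cand)
            if c < best_cost:
                best, best_cost = cand, c
        if best == center:  # no neighbour improved the cost, so stop early
            break
    return best


def two_stage_search(init_mv: MV, block_cost: Callable[[MV], float],
                     subblock_costs: Sequence[Callable[[MV], float]]) -> Tuple[MV, List[MV]]:
    """Block-level search with oblique candidates, then sub-block searches without them."""
    block_mv = local_search(init_mv, block_cost, allow_diagonal=True)
    sub_mvs = [local_search(block_mv, c, allow_diagonal=False) for c in subblock_costs]
    return block_mv, sub_mvs
```

Prohibiting the oblique direction reduces the number of candidates evaluated per step round from eight to four in this sketch, at the cost of possibly needing more rounds to reach the same vector.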
A prediction image generation device comprising the motion vector deriving device according to any one of claims 10 to 18, wherein the prediction image generation device generates a prediction image with reference to a motion vector generated by the motion vector deriving device.
A moving image decoding device comprising the prediction image generation device according to claim 19, wherein an encoding target image is restored by adding a residual image to, or subtracting a residual image from, the prediction image.
A moving image encoding device comprising the prediction image generation device according to claim 19, wherein a residual between the prediction image and an encoding target image is encoded.