CN116389749A

CN116389749A - Moving image decoding device and method, and moving image encoding device and method

Info

Publication number: CN116389749A
Application number: CN202310342830.1A
Authority: CN
Inventors: 福岛茂; 中村博哉; 坂爪智; 熊仓彻; 仓重宏之; 竹原英树
Original assignee: JVCKenwood Corp
Current assignee: JVCKenwood Corp
Priority date: 2018-12-28
Filing date: 2019-12-20
Publication date: 2023-07-04
Also published as: CN113068038A; CN116389750A; CN113068038B; CN116389751A

Abstract

The present invention relates to a moving picture decoding apparatus and method, and a moving picture encoding apparatus and method. In order to provide a low-load and efficient encoding technique, an image decoding device includes: a spatial motion information candidate deriving unit that derives a spatial motion information candidate from motion information of a block spatially close to the block to be decoded; a temporal motion information candidate deriving unit that derives a temporal motion information candidate from motion information of a block temporally close to the block to be decoded; and a history motion information candidate deriving unit that derives a history motion information candidate from a memory that holds motion information of decoded blocks. The motion information of the temporal motion information candidate is not compared with that of either the spatial motion information candidate or the history motion information candidate.

Description

Moving image decoding device and method, and moving image encoding device and method

The present application is a divisional application of patent application No. 201980050793.9, filed on December 20, 2019 by JVCKenwood Corporation and entitled "moving picture decoding device, moving picture decoding method, moving picture decoding program, moving picture encoding device, moving picture encoding method, and moving picture encoding program".

Technical Field

The present invention relates to an image encoding and decoding technique for dividing an image into blocks and predicting the blocks.

Background

In image encoding and decoding, an image to be processed is divided into blocks, each being a set of a predetermined number of pixels, and is processed in units of blocks. Coding efficiency is improved by dividing the image into appropriate blocks and appropriately selecting intra-picture prediction (intra prediction) or inter-picture prediction (inter prediction).

In encoding/decoding of a moving picture, coding efficiency is improved by inter prediction, which predicts from a picture that has already been encoded/decoded. Patent document 1 discloses a technique of applying affine transformation at the time of inter prediction. In a moving image, it is not uncommon for an object to undergo deformation such as enlargement, reduction, or rotation, and efficient encoding can be performed by applying the technique of patent document 1.

Prior art literature

Patent literature

Patent document 1: Japanese Patent Laid-Open No. 9-172644.

Disclosure of Invention

Problems to be solved by the invention

However, the technique of patent document 1 involves image transformation and therefore has the problem of a large processing load. In view of this problem, the present invention provides a low-load and efficient coding technique.

Means for solving the problems

In order to solve the above problems, a moving picture decoding apparatus according to an embodiment of the present invention includes: a spatial motion information candidate deriving unit that derives a spatial motion information candidate from motion information of a block spatially close to the block to be decoded; a temporal motion information candidate deriving unit that derives a temporal motion information candidate from motion information of a block temporally close to the block to be decoded; and a history motion information candidate deriving unit that derives a history motion information candidate from a memory that holds motion information of decoded blocks, wherein the motion information of the temporal motion information candidate is not compared with that of either the spatial motion information candidate or the history motion information candidate.

A moving picture decoding method according to another aspect of the present invention includes the steps of: deriving a spatial motion information candidate from motion information of a block spatially close to the block to be decoded; deriving a temporal motion information candidate from motion information of a block temporally close to the block to be decoded; and deriving a history motion information candidate from a memory that holds motion information of decoded blocks, wherein the motion information of the temporal motion information candidate is not compared with that of either the spatial motion information candidate or the history motion information candidate.

A moving picture decoding program according to another aspect of the present invention causes a computer to function as: a spatial motion information candidate deriving unit that derives a spatial motion information candidate from motion information of a block spatially close to the block to be decoded; a temporal motion information candidate deriving unit that derives a temporal motion information candidate from motion information of a block temporally close to the block to be decoded; and a history motion information candidate deriving unit that derives a history motion information candidate from a memory that holds motion information of decoded blocks, wherein the motion information of the temporal motion information candidate is not compared with that of either the spatial motion information candidate or the history motion information candidate.

Advantageous Effects of Invention

According to the present invention, efficient image encoding/decoding processing can be realized with a low load.

Drawings

Fig. 1 is a block diagram of an image encoding device according to an embodiment of the present invention;

Fig. 2 is a block diagram of an image decoding device according to an embodiment of the present invention;

Fig. 3 is a flowchart for explaining the operation of dividing a tree block;

Fig. 4 is a diagram showing how an input image is divided into tree blocks;

Fig. 5 is a diagram illustrating z-scan;

Fig. 6A is a diagram showing a division shape of a block;

Fig. 6B is a diagram showing a division shape of a block;

Fig. 6C is a diagram showing a division shape of a block;

Fig. 6D is a diagram showing a division shape of a block;

Fig. 6E is a diagram showing a division shape of a block;

Fig. 7 is a flowchart for explaining the operation of dividing a block into four;

Fig. 8 is a flowchart for explaining the operation of dividing a block into two or three;

Fig. 9 shows syntax for describing the shape of block division;

Fig. 10A is a diagram for explaining intra prediction;

Fig. 10B is a diagram for explaining intra prediction;

Fig. 11 is a diagram for explaining reference blocks of inter prediction;

Fig. 12 shows syntax for describing the prediction mode of a coding block;

Fig. 13 is a diagram showing the correspondence between syntax elements and modes related to inter prediction;

Fig. 14 is a diagram for explaining affine transformation motion compensation with two control points;

Fig. 15 is a diagram for explaining affine transformation motion compensation with three control points;

Fig. 16 is a block diagram showing the detailed configuration of the inter prediction unit 102 in Fig. 1;

Fig. 17 is a block diagram showing the detailed configuration of the normal prediction motion vector mode deriving unit 301 in Fig. 16;

Fig. 18 is a block diagram showing the detailed configuration of the normal merge mode deriving unit 302 in Fig. 16;

Fig. 19 is a flowchart for explaining the normal prediction motion vector mode derivation process performed by the normal prediction motion vector mode deriving unit 301 in Fig. 16;

Fig. 20 is a flowchart showing the processing steps of the normal prediction motion vector mode derivation process;

Fig. 21 is a flowchart illustrating the processing steps of the normal merge mode derivation process;

Fig. 22 is a block diagram showing the detailed configuration of the inter prediction unit 203 in Fig. 2;

Fig. 23 is a block diagram showing the detailed configuration of the normal prediction motion vector mode deriving unit 401 in Fig. 22;

Fig. 24 is a block diagram showing the detailed configuration of the normal merge mode deriving unit 402 in Fig. 22;

Fig. 25 is a flowchart for explaining the normal prediction motion vector mode derivation process performed by the normal prediction motion vector mode deriving unit 401 in Fig. 22;

Fig. 26 is a diagram illustrating the procedure for initializing/updating the history prediction motion vector candidate list;

Fig. 27 is a flowchart of the identical element checking step in the history prediction motion vector candidate list initialization/update procedure;

Fig. 28 is a flowchart of the element shift step in the history prediction motion vector candidate list initialization/update procedure;

Fig. 29 is a flowchart illustrating the steps of the history prediction motion vector candidate derivation process;

Fig. 30 is a flowchart illustrating the steps of the history merge candidate derivation process;

Fig. 31A is a diagram for explaining an example of the history prediction motion vector candidate list update process;

Fig. 31B is a diagram for explaining an example of the history prediction motion vector candidate list update process;

Fig. 31C is a diagram for explaining an example of the history prediction motion vector candidate list update process;

Fig. 32 is a diagram for explaining motion compensation prediction in the case where, in L0 prediction, the reference picture of L0 (RefL0Pic) is at a time before the processing target picture (CurPic);

Fig. 33 is a diagram for explaining motion compensation prediction in the case where, in L0 prediction, the reference picture for L0 prediction is at a time after the processing target picture;

Fig. 34 is a diagram for explaining the prediction direction of motion compensation prediction in the case where, in bi-prediction, the reference picture for L0 prediction is at a time before the processing target picture and the reference picture for L1 prediction is at a time after the processing target picture;

Fig. 35 is a diagram for explaining the prediction direction of motion compensation prediction in the case where, in bi-prediction, the reference picture for L0 prediction and the reference picture for L1 prediction are both at a time before the processing target picture;

Fig. 36 is a diagram for explaining the prediction direction of motion compensation prediction in the case where, in bi-prediction, the reference picture for L0 prediction and the reference picture for L1 prediction are both at a time after the processing target picture;

Fig. 37 is a diagram for explaining an example of a hardware configuration of the encoding/decoding device according to the embodiment of the present invention;

Fig. 38 is a block diagram showing the detailed configuration of the normal prediction motion vector mode deriving unit 301 in Fig. 16 according to the second embodiment of the present invention;

Fig. 39 is a block diagram showing the detailed configuration of the normal prediction motion vector mode deriving unit 401 in Fig. 22 according to the second embodiment of the present invention;

Fig. 40 is a block diagram showing the detailed configuration of the normal prediction motion vector mode deriving unit 301 in Fig. 16 according to the third embodiment of the present invention;

Fig. 41 is a block diagram showing the detailed configuration of the normal prediction motion vector mode deriving unit 401 in Fig. 22 according to the third embodiment of the present invention.

Detailed Description

The techniques and technical terms used in this embodiment are defined below.

< Tree Block >

In the embodiment, the image to be encoded or decoded is divided equally into units of a predetermined size. This unit is defined as a tree block. In Fig. 4, the size of the tree block is 128×128 pixels, but the size of the tree block is not limited thereto, and any size may be set. The tree block to be processed (the encoding target in the encoding process and the decoding target in the decoding process) is switched in raster scan order, that is, from left to right and from top to bottom. The interior of each tree block can be further divided recursively. A block that results from recursively dividing a tree block and that is the target of encoding and decoding is defined as a coding block. Tree blocks and coding blocks are collectively referred to as blocks. Appropriate block division enables efficient encoding. The size of the tree block may be a fixed value predetermined in the encoding device and the decoding device, or may be a size determined by the encoding device and transmitted to the decoding device. Here, the maximum size of the tree block is set to 128×128 pixels, and the minimum size of the tree block is set to 16×16 pixels. The maximum size of the coding block is 64×64 pixels, and the minimum size of the coding block is 4×4 pixels.
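As an illustration only, the raster-scan traversal of tree blocks and the recursive division into four can be sketched as follows. The function names, the `should_split` callback, and the treatment of tree blocks that extend past the image edge are assumptions made for this sketch and are not taken from the embodiment; division into two or three is not covered.

```python
def tree_block_origins(width, height, tree_block_size=128):
    """Yield the top-left (x, y) of each tree block in raster-scan order
    (left to right, then top to bottom). Blocks at the right and bottom
    edges may extend past the image; how that is handled is left open."""
    for y in range(0, height, tree_block_size):
        for x in range(0, width, tree_block_size):
            yield (x, y)


def quad_split(x, y, size, should_split, min_size=4):
    """Recursively divide a square block into four equal sub-blocks.

    should_split is a hypothetical callback (x, y, size) -> bool standing in
    for the decision made by the encoder or signalled to the decoder.
    Returns the resulting blocks as (x, y, size) tuples."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        blocks = []
        for dy in (0, half):          # upper row first, then lower row
            for dx in (0, half):      # left block first, then right block
                blocks += quad_split(x + dx, y + dy, half, should_split, min_size)
        return blocks
    return [(x, y, size)]


origins = list(tree_block_origins(300, 200))
blocks = quad_split(0, 0, 128, lambda x, y, size: size > 64)
```

In this usage a 300×200 image is covered by a 3×2 grid of 128×128 tree blocks, and the first tree block is divided once into four 64×64 blocks.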

< prediction mode >

In units of the processing target coding block, switching is performed between intra prediction (mode_intra), which predicts from a processed image signal of the processing target image, and inter prediction (mode_inter), which predicts from an image signal of a processed image.

The term "processed" refers, in the encoding process, to an image, image signal, tree block, block, coding block, or the like obtained by decoding a signal whose encoding has been completed, and, in the decoding process, to an image, image signal, tree block, block, coding block, or the like whose decoding has been completed.

The mode that distinguishes intra prediction (mode_intra) from inter prediction (mode_inter) is defined as the prediction mode (PredMode). The prediction mode (PredMode) takes intra prediction (mode_intra) or inter prediction (mode_inter) as its value.

< inter prediction >

In inter prediction, which predicts from the image signal of a processed image, a plurality of processed images can be used as reference pictures. To manage the plurality of reference pictures, two reference lists, L0 (reference list 0) and L1 (reference list 1), are defined, and a reference index is used in each list to identify the reference picture. In a P slice, L0 prediction (pred_l0) can be used. In a B slice, L0 prediction (pred_l0), L1 prediction (pred_l1), and bi-prediction (pred_bi) can be used. L0 prediction (pred_l0) is inter prediction that refers to a reference picture managed in L0, and L1 prediction (pred_l1) is inter prediction that refers to a reference picture managed in L1. Bi-prediction (pred_bi) is inter prediction in which L0 prediction and L1 prediction are both performed, referring to one reference picture managed in each of L0 and L1. Information identifying L0 prediction, L1 prediction, or bi-prediction is defined as the inter prediction mode. In the subsequent processing, constants and variables whose names carry the suffix LX are assumed to be processed for each of L0 and L1.
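The relationship between slice type, inter prediction mode, and reference lists described above can be summarized in a small lookup, given here as a sketch only. The dictionary and function names are hypothetical; the string values reuse the identifiers that appear in the text.

```python
# Inter prediction modes usable in each slice type, as stated in the text.
ALLOWED_INTER_PRED = {
    "P": {"pred_l0"},
    "B": {"pred_l0", "pred_l1", "pred_bi"},
}

# Reference lists consulted by each inter prediction mode.
REFERENCE_LISTS = {
    "pred_l0": ("L0",),
    "pred_l1": ("L1",),
    "pred_bi": ("L0", "L1"),
}


def reference_lists_used(slice_type, inter_pred_mode):
    """Return the reference lists used, or raise if the mode is not
    available in the given slice type."""
    if inter_pred_mode not in ALLOWED_INTER_PRED[slice_type]:
        raise ValueError(f"{inter_pred_mode} is not available in a {slice_type} slice")
    return REFERENCE_LISTS[inter_pred_mode]
```

Processing "for each of L0 and L1" then corresponds to iterating over the tuple returned for the current block.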

< predictive motion vector mode >

The prediction motion vector mode is a mode in which an index for identifying a prediction motion vector, a differential motion vector, an inter prediction mode, and a reference index are transmitted, and the inter prediction information of the processing target block is determined from them. The prediction motion vector is derived from prediction motion vector candidates and the index for identifying the prediction motion vector. The candidates are derived from a processed block adjacent to the processing target block, or from a block that belongs to a processed image and is located at the same position as, or in the vicinity of, the processing target block.

< merge mode >

The merge mode is a mode in which the differential motion vector and the reference index are not transmitted, and the inter prediction information of the processing target block is derived from the inter prediction information of a processed block adjacent to the processing target block, or of a block that belongs to a processed image and is located at the same position as, or in the vicinity of, the processing target block.

A processed block adjacent to the processing target block, together with the inter prediction information of that processed block, is defined as a spatial merge candidate. A block that belongs to a processed image and is located at the same position as, or in the vicinity of, the processing target block, together with the inter prediction information derived from the inter prediction information of that block, is defined as a temporal merge candidate. Each merge candidate is registered in a merge candidate list, and the merge candidate used for prediction of the processing target block is identified by a merge index.

< adjacent Block >

Fig. 11 is a diagram illustrating the reference blocks referred to for deriving inter prediction information in the prediction motion vector mode and the merge mode. A0, A1, A2, B0, B1, B2, and B3 are processed blocks adjacent to the processing target block. T0 is a block that belongs to a processed image and is located at the same position as, or in the vicinity of, the processing target block in the processing target image.

A1 and A2 are blocks located on the left side of the processing target encoding block and adjacent to the processing target encoding block. B1 and B3 are blocks located above the processing target encoding block and adjacent to the processing target encoding block. A0, B0, and B2 are blocks located at the lower left, upper right, and upper left of the processing target encoding block, respectively.

Details of how neighboring blocks are handled in the prediction motion vector mode and the merge mode are described later.

< affine transformation motion Compensation >

Affine transformation motion compensation is performed by dividing a coding block into sub-blocks of a predetermined unit and determining a motion vector individually for each sub-block. The motion vector of each sub-block is derived based on one or more control points, which are derived from the inter prediction information of a processed block adjacent to the processing target block, or of a block that belongs to a processed image and is located at the same position as, or in the vicinity of, the processing target block. In the present embodiment, the size of the sub-block is set to 4×4 pixels, but the size of the sub-block is not limited to this, and the motion vector may be derived in units of pixels.

Fig. 14 shows an example of affine transformation motion compensation with two control points. In this case, each of the two control points has two parameters, a horizontal component and a vertical component. Therefore, the affine transformation with two control points is referred to as a four-parameter affine transformation. CP1 and CP2 in Fig. 14 are the control points.

Fig. 15 shows an example of affine transformation motion compensation with three control points. In this case, each of the three control points has two parameters, a horizontal component and a vertical component. Therefore, the affine transformation with three control points is referred to as a six-parameter affine transformation. CP1, CP2, and CP3 in Fig. 15 are the control points.

Affine transformation motion compensation can be used in both the prediction motion vector mode and the merge mode. A mode in which affine transformation motion compensation is applied in the prediction motion vector mode is defined as the sub-block prediction motion vector mode, and a mode in which affine transformation motion compensation is applied in the merge mode is defined as the sub-block merge mode.
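The text does not give the formula by which sub-block motion vectors are obtained from the control points. The following is a minimal sketch under stated assumptions: the control points are taken to sit at the top-left, top-right, and (for three control points) bottom-left corners of the coding block; the widely used four-parameter and six-parameter affine motion models are applied; and the motion vector is evaluated at the center of each 4×4 sub-block in floating point. The function name and the data layout are hypothetical.

```python
def affine_subblock_mvs(control_points, width, height, sub=4):
    """Derive one motion vector per sub-block from 2 or 3 control-point MVs.

    control_points: [(v0x, v0y), (v1x, v1y)] or with a third (v2x, v2y).
    Assumed positions: v0 top-left, v1 top-right, v2 bottom-left corner.
    Returns {(x, y): (mvx, mvy)} keyed by sub-block top-left position."""
    (v0x, v0y), (v1x, v1y) = control_points[0], control_points[1]
    a = (v1x - v0x) / width    # change of mvx per pixel in x
    b = (v1y - v0y) / width    # change of mvy per pixel in x
    if len(control_points) == 3:
        v2x, v2y = control_points[2]
        c = (v2x - v0x) / height   # change of mvx per pixel in y
        d = (v2y - v0y) / height   # change of mvy per pixel in y
    else:
        # Four-parameter model: rotation and uniform zoom only,
        # so the vertical gradients follow from the horizontal ones.
        c, d = -b, a
    mvs = {}
    for y in range(0, height, sub):
        for x in range(0, width, sub):
            cx, cy = x + sub / 2, y + sub / 2   # sub-block center
            mvs[(x, y)] = (a * cx + c * cy + v0x, b * cx + d * cy + v0y)
    return mvs


translation = affine_subblock_mvs([(1.0, 2.0), (1.0, 2.0)], 8, 8)
zoom4 = affine_subblock_mvs([(0, 0), (8, 0)], 8, 8)
zoom6 = affine_subblock_mvs([(0, 0), (8, 0), (0, 8)], 8, 8)
```

When both control points carry the same motion vector, every sub-block receives that vector, which is ordinary translational motion compensation. The two zoom examples describe the same motion with two and three control points respectively.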

< Syntax of inter prediction >

The syntax related to inter prediction will be described with reference to fig. 12 and 13.

The merge_flag in Fig. 12 is a flag indicating whether the processing target coding block is set to the merge mode or to the prediction motion vector mode. The merge_affine_flag is a flag indicating whether the sub-block merge mode is applied to a processing target coding block in the merge mode. The inter_affine_flag is a flag indicating whether the sub-block prediction motion vector mode is applied to a processing target coding block in the prediction motion vector mode. The cu_affine_type_flag is a flag for determining the number of control points in the sub-block prediction motion vector mode.

Fig. 13 shows the values of the respective syntax elements and the corresponding prediction methods. merge_flag=1 and merge_affine_flag=0 correspond to the normal merge mode. The normal merge mode is a merge mode that is not the sub-block merge mode. merge_flag=1 and merge_affine_flag=1 correspond to the sub-block merge mode. merge_flag=0 and inter_affine_flag=0 correspond to the normal prediction motion vector mode. The normal prediction motion vector mode is a prediction motion vector mode that is not the sub-block prediction motion vector mode. merge_flag=0 and inter_affine_flag=1 correspond to the sub-block prediction motion vector mode. In the case of merge_flag=0 and inter_affine_flag=1, the cu_affine_type_flag is additionally transmitted to determine the number of control points.
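The correspondence between the flag values and the four modes can be written as a short decision function, shown here as a sketch. The function name and the returned strings are illustrative; the mapping of cu_affine_type_flag values to a number of control points is not stated in this passage and is therefore left out.

```python
def inter_prediction_method(merge_flag, merge_affine_flag=0, inter_affine_flag=0):
    """Map the syntax element values to the mode names used in the text."""
    if merge_flag == 1:
        # merge_affine_flag is only meaningful in the merge mode.
        if merge_affine_flag == 1:
            return "sub-block merge mode"
        return "normal merge mode"
    # inter_affine_flag is only meaningful in the prediction motion vector mode.
    if inter_affine_flag == 1:
        return "sub-block prediction motion vector mode"
    return "normal prediction motion vector mode"
```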

<POC>

POC (Picture Order Count) is a variable associated with a picture to be encoded, and is set to a value that increases by 1 in the output order of the pictures. From the POC values it is possible to determine whether two pictures are identical, which of them comes first in output order, and the distance between them. For example, if the POCs of two pictures have the same value, the pictures can be determined to be the same picture. If the POCs of two pictures have different values, the picture with the smaller POC value can be determined to be the picture that is output first, and the difference between the POCs of the two pictures indicates the distance between the pictures along the time axis.
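The three uses of POC named above can be expressed as simple comparisons. This is a sketch with hypothetical function names; taking the absolute value of the difference as the distance is an assumption made here for illustration.

```python
def is_same_picture(poc_a, poc_b):
    """Two pictures with equal POC are the same picture."""
    return poc_a == poc_b


def is_output_before(poc_a, poc_b):
    """The picture with the smaller POC is output first."""
    return poc_a < poc_b


def poc_distance(poc_a, poc_b):
    """Distance between two pictures along the time axis (assumed unsigned)."""
    return abs(poc_a - poc_b)
```

For instance, a picture with POC 8 is output before a picture with POC 12, and the two are 4 pictures apart in output order.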

(first embodiment)

An image encoding device 100 and an image decoding device 200 according to a first embodiment of the present invention will be described.

Fig. 1 is a block diagram of the image encoding device 100 according to the first embodiment. The image encoding device 100 according to the embodiment includes a block dividing unit 101, an inter prediction unit 102, an intra prediction unit 103, a decoded image memory 104, a prediction method determining unit 105, a residual generating unit 106, an orthogonal transform/quantization unit 107, a bit string encoding unit 108, an inverse quantization/inverse orthogonal transform unit 109, a decoded image signal superimposing unit 110, and an encoding information storage memory 111.

The block dividing unit 101 recursively divides the input image to generate coding blocks. The block dividing unit 101 includes a 4-division unit that divides a block to be divided in both the horizontal direction and the vertical direction, and a 2-3 division unit that divides a block to be divided in either the horizontal direction or the vertical direction. The block dividing unit 101 sets a generated coding block as the processing target coding block, and supplies the image signal of the processing target coding block to the inter prediction unit 102, the intra prediction unit 103, and the residual generating unit 106. The block dividing unit 101 also supplies information indicating the determined recursive division structure to the bit string encoding unit 108. The detailed operation of the block dividing unit 101 will be described later.

The inter prediction unit 102 performs inter prediction of the processing target coded block. The inter-prediction unit 102 derives a plurality of candidates of inter-prediction information from the inter-prediction information stored in the encoded information storage memory 111 and the decoded image signal stored in the decoded image memory 104, selects an appropriate inter-prediction mode from the derived plurality of candidates, and supplies the selected inter-prediction mode and the predicted image signal corresponding to the selected inter-prediction mode to the prediction method decision unit 105. The detailed structure and operation of the inter prediction section 102 will be described later.

The intra prediction unit 103 performs intra prediction of the processing target coded block. The intra-frame prediction unit 103 refers to the decoded image signal stored in the decoded image memory 104 as a reference pixel, and generates a predicted image signal by intra-frame prediction based on the encoding information such as the intra-frame prediction mode stored in the encoding information storage memory 111. In intra prediction, the intra prediction unit 103 selects an appropriate intra prediction mode from among a plurality of intra prediction modes, and supplies the selected intra prediction mode and a predicted image signal corresponding to the selected intra prediction mode to the prediction method determination unit 105.

Figs. 10A and 10B show an example of intra prediction. Fig. 10A shows the correspondence between the prediction direction of intra prediction and the intra prediction mode number. For example, intra prediction mode 50 generates an intra prediction image by copying reference pixels in the vertical direction. Intra prediction mode 1 is the DC mode, in which all pixel values of the processing target block are set to the average value of the reference pixels. Intra prediction mode 0 is the Planar mode (two-dimensional mode), in which a two-dimensional intra prediction image is generated from reference pixels in the vertical direction and the horizontal direction. Fig. 10B shows an example of generating an intra prediction image in the case of intra prediction mode 40. The intra prediction unit 103 copies, to each pixel of the processing target block, the value of the reference pixel in the direction indicated by the intra prediction mode. When the reference pixel indicated by the intra prediction mode is not at an integer position, the intra prediction unit 103 determines the reference pixel value by interpolation from the reference pixel values at the surrounding integer positions.
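The vertical mode (mode 50) and the DC mode (mode 1) described above can be sketched as follows. The function names are hypothetical, and two details are assumptions of this sketch rather than statements of the embodiment: the DC average is taken over the reference pixels directly above and directly to the left of the block, and it is rounded to the nearest integer.

```python
def predict_vertical(top, size):
    """Mode 50: copy each reference pixel above the block straight down."""
    return [list(top[:size]) for _ in range(size)]


def predict_dc(top, left, size):
    """Mode 1 (DC): fill the block with the average of the reference pixels.

    Assumption: the average uses `size` pixels above and `size` pixels to
    the left, with rounding to the nearest integer."""
    refs = list(top[:size]) + list(left[:size])
    dc = (sum(refs) + len(refs) // 2) // len(refs)
    return [[dc] * size for _ in range(size)]


top = [10, 20, 30, 40]
left = [50, 50, 50, 50]
vertical_block = predict_vertical(top, 4)
dc_block = predict_dc(top, left, 4)
```

Directional modes that point between integer positions would additionally need the interpolation step mentioned in the text, which this sketch omits.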

The decoded image memory 104 stores the decoded image generated by the decoded image signal superimposing unit 110. The decoded image memory 104 supplies the stored decoded image to the inter prediction unit 102 and the intra prediction unit 103.

The prediction method determining unit 105 determines an optimal prediction mode by evaluating each of intra prediction and inter prediction using coding information, a coding amount of a residual, a distortion amount between a predicted image signal and a processing target image signal, and the like. In the case of intra prediction, the prediction method determination unit 105 supplies intra prediction information such as an intra prediction mode to the bit string encoding unit 108 as encoding information. In the case of the merge mode of inter prediction, the prediction method determination unit 105 supplies inter prediction information such as a merge index, information indicating whether or not the mode is a sub-block merge mode (sub-block merge flag), and the like, as encoding information to the bit string encoding unit 108. In the case of the inter-prediction motion vector mode, the prediction method determination unit 105 supplies inter-prediction information such as the inter-prediction mode, the prediction motion vector index, the reference indices of L0 and L1, the differential motion vector, and information indicating whether or not the inter-prediction mode is the sub-block prediction motion vector mode (sub-block prediction motion vector flag) as encoding information to the bit string encoding unit 108. The prediction method determining unit 105 supplies the determined encoded information to the encoded information storage memory 111. The prediction method determining unit 105 supplies the predicted image signal to the residual generating unit 106 and the decoded image signal superimposing unit 110.
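The text does not specify how the code amount and the distortion amount are combined when evaluating the candidates. One common approach, assumed here purely for illustration, is a Lagrangian cost of the form distortion + λ × bits, with the candidate of minimum cost selected. The function name, the dictionary keys, and the parameter `lam` are hypothetical.

```python
def choose_prediction(candidates, lam):
    """Pick the candidate with the lowest assumed cost: distortion + lam * bits.

    candidates: list of dicts with keys "name", "distortion", "bits"."""
    return min(candidates, key=lambda c: c["distortion"] + lam * c["bits"])


candidates = [
    {"name": "intra", "distortion": 100, "bits": 10},
    {"name": "inter", "distortion": 80, "bits": 30},
]
low_lambda_choice = choose_prediction(candidates, lam=0.5)   # costs 105 vs 95
high_lambda_choice = choose_prediction(candidates, lam=2.0)  # costs 120 vs 140
```

A larger `lam` weights the code amount more heavily, so the cheaper-to-signal candidate wins even though its distortion is higher.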

The residual generation unit 106 generates a residual by subtracting the predicted image signal from the image signal to be processed, and supplies the generated residual to the orthogonal transformation/quantization unit 107.

The orthogonal transform/quantization section 107 performs orthogonal transform and quantization on the residual according to the quantization parameter to generate an orthogonal transformed/quantized residual, and supplies the generated residual to the bit string encoding section 108 and the inverse quantization/inverse orthogonal transform section 109.

The bit string encoding unit 108 encodes, in addition to information in units of sequences, pictures, slices, and coding blocks, the encoding information corresponding to the prediction method determined by the prediction method determining unit 105 for each coding block. Specifically, the bit string encoding unit 108 encodes the prediction mode PredMode for each coding block. When the prediction mode is inter prediction (mode_inter), the bit string encoding unit 108 encodes encoding information (inter prediction information) such as a flag for discriminating whether or not the mode is the merge mode, a sub-block merge flag, a merge index in the case of the merge mode, and, in the case of a non-merge mode, an inter prediction mode, a prediction motion vector index, information on a differential motion vector, and a sub-block prediction motion vector flag, according to a predetermined syntax (syntax rule of a bit string), and generates a first bit string. When the prediction mode is intra prediction (mode_intra), the bit string encoding unit 108 encodes encoding information (intra prediction information) such as the intra prediction mode according to a predetermined syntax (syntax rule of a bit string), and generates a first bit string. The bit string encoding unit 108 also entropy-encodes the orthogonally transformed and quantized residual according to a predetermined syntax, and generates a second bit string. The bit string encoding unit 108 multiplexes the first bit string and the second bit string according to a predetermined syntax, and outputs a bit stream.

The inverse quantization/inverse orthogonal transformation section 109 performs inverse quantization and inverse orthogonal transformation on the orthogonally transformed/quantized residual supplied from the orthogonal transformation/quantization section 107 to calculate a residual, and supplies the calculated residual to the decoded image signal superimposition section 110.

The decoded image signal superimposing unit 110 superimposes the predicted image signal corresponding to the determination by the prediction method determining unit 105 and the residual error obtained by the inverse quantization and inverse orthogonal transformation by the inverse quantization/inverse orthogonal transformation unit 109, and generates a decoded image, which is stored in the decoded image memory 104. The decoded image signal superimposing unit 110 may apply a filter process to the decoded image to reduce distortion such as block distortion caused by encoding, and store the result in the decoded image memory 104.
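The data flow through the residual generating unit 106, the quantization and inverse quantization stages, and the decoded image signal superimposing unit 110 can be sketched as below. This is a simplified illustration: the orthogonal transform and its inverse are omitted, quantization is a plain uniform scalar quantizer with step `qstep`, the optional filter process is left out, and all function names are hypothetical.

```python
def quantize(residual, qstep):
    """Uniform scalar quantization (stand-in for transform + quantization)."""
    return [round(r / qstep) for r in residual]


def dequantize(levels, qstep):
    """Inverse of quantize, up to the quantization error."""
    return [level * qstep for level in levels]


def encode_block(original, predicted, qstep):
    """Return (quantized levels, locally decoded samples) for one block."""
    # Residual generation: subtract the prediction from the input signal.
    residual = [o - p for o, p in zip(original, predicted)]
    levels = quantize(residual, qstep)
    # Local decoding: the encoder reconstructs what the decoder will see.
    recon_residual = dequantize(levels, qstep)
    decoded = [p + r for p, r in zip(predicted, recon_residual)]
    return levels, decoded


original = [100, 105, 97, 91]
predicted = [96, 96, 96, 96]
levels, decoded = encode_block(original, predicted, qstep=4)
_, lossless = encode_block(original, predicted, qstep=1)
```

Because the decoded samples rather than the original samples are stored for later prediction, the encoder and the decoder predict from identical reference data even when quantization discards information.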

The encoding information storage memory 111 stores encoding information such as the prediction mode (inter prediction or intra prediction) determined by the prediction method determining unit 105. In the case of inter prediction, the encoding information stored in the encoding information storage memory 111 includes inter prediction information such as the determined motion vector, the reference indexes of the reference lists L0 and L1, and a history prediction motion vector candidate list. In the case of the merge mode of inter prediction, the encoding information stored in the encoding information storage memory 111 includes, in addition to the above information, inter prediction information such as a merge index and information indicating whether or not the mode is the sub-block merge mode (sub-block merge flag). In the case of the prediction motion vector mode of inter prediction, the encoding information stored in the encoding information storage memory 111 includes, in addition to the above information, inter prediction information such as the inter prediction mode, the prediction motion vector index, the differential motion vector, and information indicating whether or not the mode is the sub-block prediction motion vector mode (sub-block prediction motion vector flag). In the case of intra prediction, the encoding information stored in the encoding information storage memory 111 includes intra prediction information such as the determined intra prediction mode.

Fig. 2 is a block diagram showing the configuration of an image decoding device according to an embodiment of the present invention corresponding to the image encoding device of fig. 1. The image decoding device of the embodiment includes a bit string decoding unit 201, a block dividing unit 202, an inter prediction unit 203, an intra prediction unit 204, an encoded information storage memory 205, an inverse quantization/inverse orthogonal transform unit 206, a decoded image signal superimposing unit 207, and a decoded image memory 208.

The decoding process of the image decoding apparatus of fig. 2 corresponds to the decoding process provided inside the image encoding apparatus of fig. 1, and therefore each configuration of the encoded information storage memory 205, the inverse quantization/inverse orthogonal transformation unit 206, the decoded image signal superimposing unit 207, and the decoded image memory 208 of fig. 2 has a function corresponding to each configuration of the encoded information storage memory 111, the inverse quantization/inverse orthogonal transformation unit 109, the decoded image signal superimposing unit 110, and the decoded image memory 104 of the image encoding apparatus of fig. 1.

The bit stream supplied to the bit string decoding unit 201 is separated according to a predetermined syntax rule. The bit string decoding unit 201 decodes the separated first bit string to obtain information in units of sequences, pictures, slices, and coding blocks, as well as coding information in units of coding blocks. Specifically, the bit string decoding unit 201 decodes, in units of coding blocks, the prediction mode PredMode, which distinguishes between inter prediction (MODE_INTER) and intra prediction (MODE_INTRA). When the prediction mode is inter prediction (MODE_INTER), the bit string decoding unit 201 decodes, according to a predetermined syntax, coding information (inter prediction information) such as a flag discriminating whether the mode is the merge mode, a merge index and a sub-block merge flag in the case of the merge mode, and an inter prediction mode, a prediction motion vector index, a differential motion vector, and a sub-block prediction motion vector flag in the case of the prediction motion vector mode, and supplies this coding information to the encoded information storage memory 205 via the inter prediction unit 203 and the block dividing unit 202. When the prediction mode is intra prediction (MODE_INTRA), coding information (intra prediction information) such as the intra prediction mode is decoded according to a predetermined syntax and supplied to the encoded information storage memory 205 via the inter prediction unit 203 or the intra prediction unit 204, and the block dividing unit 202. The bit string decoding unit 201 also decodes the separated second bit string to calculate an orthogonally transformed and quantized residual, and supplies it to the inverse quantization/inverse orthogonal transform unit 206.

When the prediction mode PredMode of the coding block to be processed is the prediction motion vector mode of inter prediction (MODE_INTER), the inter prediction unit 203 derives a plurality of prediction motion vector candidates using the coding information of the already decoded image signal stored in the encoded information storage memory 205, and registers the derived candidates in a prediction motion vector candidate list described later. The inter prediction unit 203 selects, from the candidates registered in the prediction motion vector candidate list, the prediction motion vector corresponding to the prediction motion vector index decoded and supplied by the bit string decoding unit 201, calculates a motion vector from the differential motion vector decoded by the bit string decoding unit 201 and the selected prediction motion vector, and stores the calculated motion vector in the encoded information storage memory 205 together with other coding information. The coding information of the coding block supplied and stored here includes the prediction mode PredMode, the flags predFlagL0[xP][yP] and predFlagL1[xP][yP] indicating whether L0 prediction and L1 prediction are used, the reference indexes refIdxL0[xP][yP] and refIdxL1[xP][yP] of L0 and L1, and the motion vectors mvL0[xP][yP] and mvL1[xP][yP] of L0 and L1. Here, xP and yP are indexes indicating the position of the upper left pixel of the coding block within the picture. When the prediction mode PredMode is inter prediction (MODE_INTER) and the inter prediction mode is L0 prediction (PRED_L0), the flag predFlagL0 indicating whether L0 prediction is used is 1, and the flag predFlagL1 indicating whether L1 prediction is used is 0. When the inter prediction mode is L1 prediction (PRED_L1), predFlagL0 is 0 and predFlagL1 is 1. When the inter prediction mode is bi-prediction (PRED_BI), both predFlagL0 and predFlagL1 are 1.

When the prediction mode PredMode of the coding block to be processed is the merge mode of inter prediction (MODE_INTER), merge candidates are derived. Using the coding information of the already decoded coding blocks stored in the encoded information storage memory 205, the inter prediction unit 203 derives a plurality of merge candidates and registers them in a merge candidate list described later. It then selects, from the merge candidates registered in the merge candidate list, the merge candidate corresponding to the merge index decoded and supplied by the bit string decoding unit 201, and stores the inter prediction information of the selected merge candidate, such as the flags predFlagL0[xP][yP] and predFlagL1[xP][yP] indicating whether L0 prediction and L1 prediction are used, the reference indexes refIdxL0[xP][yP] and refIdxL1[xP][yP] of L0 and L1, and the motion vectors mvL0[xP][yP] and mvL1[xP][yP] of L0 and L1, in the encoded information storage memory 205. Here, xP and yP are indexes indicating the position of the upper left pixel of the coding block within the picture. The detailed configuration and operation of the inter prediction unit 203 will be described later.
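The relationship between the inter prediction mode and the two usage flags can be illustrated with a minimal sketch. The function name pred_flags and the string constants are hypothetical and chosen only for this illustration; the mapping itself follows the three cases stated above.

```python
# Hypothetical constants standing in for the three inter prediction modes.
PRED_L0, PRED_L1, PRED_BI = "PRED_L0", "PRED_L1", "PRED_BI"


def pred_flags(inter_pred_mode):
    """Return (predFlagL0, predFlagL1) for the given inter prediction mode."""
    if inter_pred_mode == PRED_L0:
        return 1, 0  # only L0 prediction is used
    if inter_pred_mode == PRED_L1:
        return 0, 1  # only L1 prediction is used
    if inter_pred_mode == PRED_BI:
        return 1, 1  # both L0 and L1 prediction are used
    raise ValueError(f"unknown inter prediction mode: {inter_pred_mode}")
```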

When the prediction mode PredMode of the coding block to be processed is intra prediction (MODE_INTRA), the intra prediction unit 204 performs intra prediction. The coding information decoded by the bit string decoding unit 201 includes an intra prediction mode. The intra prediction unit 204 generates a predicted image signal by intra prediction from the decoded image signal stored in the decoded image memory 208, according to the intra prediction mode included in the coding information decoded by the bit string decoding unit 201, and supplies the generated predicted image signal to the decoded image signal superimposing unit 207. The intra prediction unit 204 corresponds to the intra prediction unit 103 of the image encoding device 100 and therefore performs the same processing as the intra prediction unit 103.

The inverse quantization/inverse orthogonal transform unit 206 performs inverse orthogonal transform and inverse quantization on the residual error after the orthogonal transform/quantization decoded by the bit string decoding unit 201, and obtains the residual error after the inverse orthogonal transform/inverse quantization.

The decoded image signal superimposing unit 207 superimposes the predicted image signal obtained by the inter prediction unit 203 or the intra prediction unit 204 on the residual obtained through inverse orthogonal transform and inverse quantization by the inverse quantization/inverse orthogonal transform unit 206, thereby generating a decoded image signal, and stores it in the decoded image memory 208. Before storing it in the decoded image memory 208, the decoded image signal superimposing unit 207 may apply to the decoded image a filtering process that reduces block distortion or the like caused by encoding.
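As a rough illustration of the superimposition step, the following sketch adds a residual block to a predicted block sample by sample. The function name superimpose, the list-of-rows block layout, and the clipping of the result to the range of an assumed bit depth are all assumptions made for this example; the text above does not specify them, and the optional filtering is not modelled.

```python
def superimpose(pred, resid, bit_depth=8):
    """Add residual samples to predicted samples.

    Clipping to [0, 2**bit_depth - 1] is an assumption of this sketch.
    """
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, resid_row)]
        for pred_row, resid_row in zip(pred, resid)
    ]
```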

Next, an operation of the block dividing unit 101 in the image encoding device 100 will be described. Fig. 3 is a flowchart showing an operation of dividing an image into tree blocks and further dividing each tree block. First, an input image is divided into tree blocks of a predetermined size (step S1001). Each tree block is scanned in a predetermined order, that is, in raster scan order (step S1002), and the inside of the tree block to be processed is divided (step S1003).
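Steps S1001 and S1002 can be sketched as follows. The function name tree_block_origins is hypothetical, and the handling of pictures whose size is not a multiple of the tree block size (the last row or column simply starts inside the picture) is an assumption of this sketch.

```python
def tree_block_origins(width, height, size):
    """List the upper left corners of the tree blocks in raster scan order.

    Raster scan order means left to right within a row, rows top to bottom.
    """
    return [(x, y) for y in range(0, height, size) for x in range(0, width, size)]
```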

Fig. 7 is a flowchart showing the detailed operation of the segmentation process in step S1003. First, it is determined whether or not the block to be processed is 4-divided (step S1101).

When it is determined that the processing target block is to be divided into 4, the processing target block is divided into 4 (step S1102). Each block obtained by the division is scanned in Z-scan order, that is, in the order of upper left, upper right, lower left, and lower right (step S1103). Fig. 5 shows an example of the Z-scan order, and 601 in fig. 6A shows an example in which the processing target block is divided into 4. The numbers 0 to 3 in 601 of fig. 6A indicate the order of processing. Then, the division processing of fig. 7 is recursively executed for each block divided in step S1101 (step S1104).
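The recursive 4-division with Z-scan ordering can be sketched as below. The names quad_split and z_scan and the should_split callback are hypothetical; in an encoder the decision would come from an optimization method and in a decoder from the bit string. For brevity this sketch stops at a block that is not divided into 4 and does not model the subsequent 2-3 division.

```python
def quad_split(x, y, w, h):
    """Divide a block into 4, returned as upper left, upper right, lower left, lower right."""
    hw, hh = w // 2, h // 2
    return [
        (x, y, hw, hh),
        (x + hw, y, hw, hh),
        (x, y + hh, hw, hh),
        (x + hw, y + hh, hw, hh),
    ]


def z_scan(x, y, w, h, should_split):
    """Recursively apply 4-division and return the leaf blocks in Z-scan order."""
    if not should_split(x, y, w, h):
        return [(x, y, w, h)]
    leaves = []
    for child in quad_split(x, y, w, h):
        leaves.extend(z_scan(*child, should_split))
    return leaves
```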

If it is determined that the processing target block is not 4-segmented, 2-3 segmentation is performed (step S1105).

Fig. 8 is a flowchart showing the detailed operation of the 2-3 division processing in step S1105. First, it is determined whether or not to perform 2-3 division on the block to be processed, that is, whether or not to perform either of 2 division and 3 division (step S1201).

If it is determined that the processing target block is not divided by 2-3, that is, if it is determined that the processing target block is not divided, the division is terminated (step S1211). That is, no further recursive partitioning process is performed on the blocks partitioned by the recursive partitioning process.

If it is determined that the block to be processed is divided into 2-3 blocks, it is determined whether or not the block to be processed is further divided into 2 blocks (step S1202).

When it is determined that the processing target block is to be divided into 2, it is determined whether the block is to be divided into upper and lower parts (vertical direction) (step S1203). Based on the result, the processing target block is divided into 2 parts, upper and lower (vertical direction) (step S1204), or into 2 parts, left and right (horizontal direction) (step S1205). As a result of step S1204, the processing target block is divided into upper and lower (vertical direction) parts as shown by 602 in fig. 6B. As a result of step S1205, the processing target block is divided into left and right (horizontal direction) parts as shown by 604 in fig. 6D.

When it is not determined in step S1202 that the processing target block is to be divided into 2, that is, when it is determined that the block is to be divided into 3, it is determined whether the block is to be divided into upper, middle, and lower parts (vertical direction) (step S1206). Based on the result, the processing target block is divided into 3 parts, upper, middle, and lower (vertical direction) (step S1207), or into 3 parts, left, middle, and right (horizontal direction) (step S1208). As a result of step S1207, the processing target block is divided into upper, middle, and lower (vertical direction) parts as shown by 603 in fig. 6C, and as a result of step S1208, the processing target block is divided into left, middle, and right (horizontal direction) parts as shown by 605 in fig. 6E.

After any one of step S1204, step S1205, step S1207, and step S1208 is executed, each block obtained by dividing the processing target block is scanned in the order from left to right and from top to bottom (step S1209). The numbers 0 to 2 from 602 to 605 in fig. 6B to 6E indicate the order of processing. For each of the divided blocks, the 2-3 division processing of fig. 8 is recursively executed (step S1210).
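The geometry of the 2-division and 3-division, together with the left-to-right, top-to-bottom processing order, can be sketched as follows. The function name split_2_3 and its parameters are hypothetical. The 1:2:1 size ratio used for the 3-division is an assumption of this sketch; the text above states only that the block is divided into three parts.

```python
def split_2_3(x, y, w, h, parts, stacked):
    """Divide a block into 2 or 3 parts.

    stacked=True yields upper/(middle)/lower parts, stacked=False yields
    left/(middle)/right parts. Blocks are returned in processing order.
    """
    length = h if stacked else w
    if parts == 2:
        sizes = [length // 2, length - length // 2]
    elif parts == 3:
        quarter = length // 4  # assumed 1:2:1 ratio, not stated in the text
        sizes = [quarter, length - 2 * quarter, quarter]
    else:
        raise ValueError("parts must be 2 or 3")
    blocks, offset = [], 0
    for size in sizes:
        if stacked:
            blocks.append((x, y + offset, w, size))
        else:
            blocks.append((x + offset, y, size, h))
        offset += size
    return blocks
```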

In the recursive block division described here, whether division is permitted may be restricted depending on the number of divisions, the size of the processing target block, or the like. Information restricting division may be agreed in advance between the encoding device and the decoding device so that it need not be transmitted, or the encoding device may determine the restriction and record it in the bit string to transmit it to the decoding device.

When a block is divided, the block before division is referred to as a parent block, and each block after division is referred to as a child block.

Next, an operation of the block dividing unit 202 in the image decoding apparatus 200 will be described. The block dividing unit 202 divides the tree blocks in the same processing steps as the block dividing unit 101 of the image encoding apparatus 100. However, the block dividing unit 101 of the image encoding apparatus 100 is different in that an optimization method such as estimation of an optimal shape based on image recognition or distortion optimization is applied to determine an optimal block divided shape, whereas the block dividing unit 202 of the image decoding apparatus 200 decodes block divided information recorded in a bit string to determine a block divided shape.

Fig. 9 shows the syntax (syntax rules of the bit string) related to block division in the first embodiment. coding_quadtree() represents the syntax for the 4-division processing of a block. multi_type_tree() represents the syntax for the 2-division or 3-division processing of a block. qt_split is a flag indicating whether a block is divided into 4: qt_split=1 when the block is divided into 4, and qt_split=0 when it is not. In the case of 4-division (qt_split=1), the 4-division processing is performed recursively on each of the resulting blocks (coding_quadtree(0), coding_quadtree(1), coding_quadtree(2), coding_quadtree(3); the arguments 0 to 3 correspond to the numbers in 601 of fig. 6A). If 4-division is not performed (qt_split=0), the subsequent division is determined according to multi_type_tree(). mtt_split is a flag indicating whether further division is performed. In the case of division (mtt_split=1), mtt_split_vertical, a flag indicating whether division is performed in the vertical direction or in the horizontal direction, and mtt_split_bin, a flag determining whether 2-division or 3-division is performed, are transmitted. mtt_split_vertical=1 indicates division in the vertical direction, and mtt_split_vertical=0 indicates division in the horizontal direction. mtt_split_bin=1 indicates 2-division, and mtt_split_bin=0 indicates 3-division. In the case of 2-division (mtt_split_bin=1), the division processing is performed recursively on each of the resulting blocks (multi_type_tree(0), multi_type_tree(1); the arguments 0 to 1 correspond to the numbers in 602 of fig. 6B or 604 of fig. 6D). In the case of 3-division (mtt_split_bin=0), the division processing is performed recursively on each of the resulting blocks (multi_type_tree(0), multi_type_tree(1), multi_type_tree(2); the arguments 0 to 2 correspond to the numbers in 603 of fig. 6C or 605 of fig. 6E). Hierarchical block division is performed by recursively calling multi_type_tree() until mtt_split=0.
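The recursive structure of this syntax can be sketched as a small parser over a sequence of already decoded flag values. The flag order within multi_type_tree (mtt_split, then mtt_split_vertical, then mtt_split_bin), the tuple-based tree representation, and the labels "QT", "BT", "TT", and "LEAF" are assumptions of this sketch, and no size-based restriction on division is modelled.

```python
def coding_quadtree(flags):
    """Parse the 4-division syntax from an iterator of flag values."""
    qt_split = next(flags)
    if qt_split == 1:
        # 4-division: recurse on each of the four resulting blocks (0 to 3).
        return ("QT", [coding_quadtree(flags) for _ in range(4)])
    # No 4-division: the subsequent division follows multi_type_tree.
    return multi_type_tree(flags)


def multi_type_tree(flags):
    """Parse the 2-division / 3-division syntax from an iterator of flag values."""
    mtt_split = next(flags)
    if mtt_split == 0:
        return "LEAF"
    mtt_split_vertical = next(flags)
    mtt_split_bin = next(flags)
    count = 2 if mtt_split_bin == 1 else 3
    kind = ("BT" if mtt_split_bin == 1 else "TT") + ("_V" if mtt_split_vertical == 1 else "_H")
    return (kind, [multi_type_tree(flags) for _ in range(count)])
```

For example, the flag sequence 1, 0,0, 0,0, 0,0, 0,1,1,1, 0,0 describes a 4-division whose last block is further divided into 2 in the vertical direction.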

< inter prediction >

The inter prediction method according to the embodiment is implemented in the inter prediction unit 102 of the image encoding apparatus of fig. 1 and the inter prediction unit 203 of the image decoding apparatus of fig. 2.

The inter prediction method according to the embodiment will be described with reference to the drawings. The inter prediction method is carried out in both the encoding process and the decoding process, in units of coding blocks.

< description of inter prediction unit 102 on encoding side >

Fig. 16 is a diagram showing a detailed configuration of the inter prediction unit 102 of the image encoding device of fig. 1. The normal prediction motion vector pattern deriving unit 301 derives a plurality of normal prediction motion vector candidates, selects a prediction motion vector from them, and calculates the differential motion vector between the selected prediction motion vector and the detected motion vector. The detected inter prediction mode, reference index, and motion vector, together with the calculated differential motion vector, constitute the inter prediction information of the normal prediction motion vector mode. This inter prediction information is supplied to the inter prediction mode determination unit 305. The detailed structure and processing of the normal prediction motion vector pattern deriving unit 301 will be described later.

The normal merge mode deriving unit 302 derives a plurality of normal merge candidates, and selects the normal merge candidates to obtain inter prediction information of the normal merge mode. The inter prediction information is supplied to the inter prediction mode determination unit 305. The detailed structure and processing of the normal merge mode derivation unit 302 will be described later.

The sub-block prediction motion vector pattern deriving unit 303 derives a plurality of sub-block prediction motion vector candidates to select a sub-block prediction motion vector, and calculates a differential motion vector between the selected sub-block prediction motion vector and the detected motion vector. The detected inter prediction mode, the reference index, the motion vector, and the calculated differential motion vector are inter prediction information of the sub-block prediction motion vector mode. The inter prediction information is supplied to the inter prediction mode determination unit 305.

The sub-block merge mode deriving unit 304 derives a plurality of sub-block merge candidates, and selects the sub-block merge candidates to obtain inter prediction information of the sub-block merge mode. The inter prediction information is supplied to the inter prediction mode determination unit 305.

The inter prediction mode determination unit 305 determines inter prediction information based on the inter prediction information supplied from the normal prediction motion vector mode derivation unit 301, the normal merge mode derivation unit 302, the sub-block prediction motion vector mode derivation unit 303, and the sub-block merge mode derivation unit 304. The inter prediction information corresponding to the determination result is supplied from the inter prediction mode determination unit 305 to the motion compensation prediction unit 306.

The motion compensation prediction unit 306 performs inter prediction on the reference image signal stored in the decoded image memory 104 based on the determined inter prediction information. The detailed structure and processing of the motion compensation prediction unit 306 will be described later.

< description of inter prediction unit 203 on decoding side >

Fig. 22 is a diagram showing a detailed configuration of the inter prediction unit 203 of the image decoding apparatus of fig. 2.

The normal prediction motion vector pattern deriving unit 401 derives a plurality of normal prediction motion vector candidates to select a prediction motion vector, and calculates an addition value of the selected prediction motion vector and the decoded differential motion vector as a motion vector. The decoded inter prediction mode, reference index, and motion vector are inter prediction information of a normal prediction motion vector mode. The inter prediction information is supplied to the motion compensation prediction unit 406 via the switch 408. The detailed structure and processing of the normal prediction motion vector pattern deriving unit 401 will be described later.

The normal merge mode deriving unit 402 derives a plurality of normal merge candidates to select the normal merge candidates, thereby obtaining inter prediction information of the normal merge mode. The inter prediction information is supplied to the motion compensation prediction unit 406 via the switch 408. The detailed structure and processing of the normal merge mode derivation unit 402 will be described later.

The sub-block prediction motion vector pattern deriving unit 403 derives a plurality of sub-block prediction motion vector candidates to select a sub-block prediction motion vector, and calculates an addition value of the selected sub-block prediction motion vector and the decoded differential motion vector as a motion vector. The decoded inter prediction mode, reference index, and motion vector become inter prediction information of the sub-block prediction motion vector mode. The inter prediction information is supplied to the motion compensation prediction unit 406 via the switch 408.

The sub-block merge mode deriving unit 404 derives a plurality of sub-block merge candidates to select the sub-block merge candidates, thereby obtaining inter prediction information of the sub-block merge mode. The inter prediction information is supplied to the motion compensation prediction unit 406 via the switch 408.

The motion compensation prediction unit 406 performs inter prediction on the reference image signal stored in the decoded image memory 208 based on the determined inter prediction information. The detailed configuration and processing of the motion compensation prediction unit 406 are the same as those of the motion compensation prediction unit 306 on the encoding side.

< usual prediction motion vector mode derivation unit (usual AMVP) >

The normal predicted motion vector pattern deriving unit 301 of fig. 17 includes a spatial predicted motion vector candidate deriving unit 321, a temporal predicted motion vector candidate deriving unit 322, a history predicted motion vector candidate deriving unit 323, a predicted motion vector candidate supplementing unit 325, a normal motion vector detecting unit 326, a predicted motion vector candidate selecting unit 327, and a motion vector subtracting unit 328.

The normal predicted motion vector pattern deriving unit 401 of fig. 23 includes a spatial predicted motion vector candidate deriving unit 421, a temporal predicted motion vector candidate deriving unit 422, a history predicted motion vector candidate deriving unit 423, a predicted motion vector candidate supplementing unit 425, a predicted motion vector candidate selecting unit 426, and a motion vector adding unit 427.

The processing steps of the normal prediction motion vector pattern deriving unit 301 on the encoding side and the normal prediction motion vector pattern deriving unit 401 on the decoding side will be described with reference to the flowcharts of fig. 19 and fig. 25, respectively. Fig. 19 is a flowchart showing the procedure of the normal prediction motion vector mode derivation process performed by the normal prediction motion vector pattern deriving unit 301 on the encoding side, and fig. 25 is a flowchart showing the procedure of the normal prediction motion vector mode derivation process performed by the normal prediction motion vector pattern deriving unit 401 on the decoding side.

< usual prediction motion vector mode derivation unit (usual AMVP): description of encoding side >

The procedure of the normal prediction motion vector pattern derivation processing on the encoding side will be described with reference to fig. 19. In the description of the processing steps of fig. 19, the term "normal" shown in fig. 19 may be omitted.

First, the normal motion vector detection unit 326 detects a normal motion vector for each inter prediction mode and reference index (step S100 in fig. 19).

Next, the spatial prediction motion vector candidate deriving unit 321, the temporal prediction motion vector candidate deriving unit 322, the history prediction motion vector candidate deriving unit 323, the prediction motion vector candidate supplementing unit 325, the prediction motion vector candidate selecting unit 327, and the motion vector subtracting unit 328 calculate, for each of L0 and L1, the differential motion vector of the motion vector used in inter prediction in the normal prediction motion vector mode (steps S101 to S106 in fig. 19). Specifically, when the prediction mode PredMode of the processing target block is inter prediction (MODE_INTER) and the inter prediction mode is L0 prediction (PRED_L0), the prediction motion vector candidate list mvpListL0 of L0 is calculated, a prediction motion vector mvpL0 is selected, and the differential motion vector mvdL0 of the motion vector mvL0 of L0 is calculated. When the inter prediction mode of the processing target block is L1 prediction (PRED_L1), the prediction motion vector candidate list mvpListL1 of L1 is calculated, a prediction motion vector mvpL1 is selected, and the differential motion vector mvdL1 of the motion vector mvL1 of L1 is calculated. When the inter prediction mode of the processing target block is bi-prediction (PRED_BI), both L0 prediction and L1 prediction are performed: the prediction motion vector candidate list mvpListL0 of L0 is calculated, the prediction motion vector mvpL0 of L0 is selected, and the differential motion vector mvdL0 of the motion vector mvL0 of L0 is calculated, and likewise the prediction motion vector candidate list mvpListL1 of L1 is calculated, the prediction motion vector mvpL1 of L1 is selected, and the differential motion vector mvdL1 of the motion vector mvL1 of L1 is calculated.

The differential motion vector calculation is performed for each of L0 and L1, but the processing is common to both. Therefore, in the following description, L0 and L1 are denoted collectively as LX. In the process of calculating the differential motion vector of L0, X of LX is 0, and in the process of calculating the differential motion vector of L1, X of LX is 1. In addition, when information of the other list is referred to instead of LX during the calculation of the differential motion vector of LX, the other list is denoted as LY.

When the motion vector mvLX of LX is used (yes in step S102 in fig. 19), candidates for the prediction motion vector of LX are calculated, and the prediction motion vector candidate list mvpListLX of LX is constructed (step S103 in fig. 19). The spatial prediction motion vector candidate deriving unit 321, the temporal prediction motion vector candidate deriving unit 322, the history prediction motion vector candidate deriving unit 323, and the prediction motion vector candidate supplementing unit 325 in the normal prediction motion vector pattern deriving unit 301 derive a plurality of prediction motion vector candidates, and the prediction motion vector candidate list mvpListLX is constructed. The detailed processing procedure of step S103 of fig. 19 will be described later using the flowchart of fig. 20.

Next, the prediction motion vector candidate selecting unit 327 selects the prediction motion vector mvpLX of LX from the prediction motion vector candidate list mvpListLX of LX (step S104 in fig. 19). Here, the i-th element (counting from 0) of the prediction motion vector candidate list mvpListLX is expressed as mvpListLX[i]. For each prediction motion vector candidate mvpListLX[i] stored in the list, a differential motion vector is calculated as the difference between the motion vector mvLX and that candidate, and the code amount required to encode that differential motion vector is calculated. Then, among the elements registered in the prediction motion vector candidate list mvpListLX, the candidate mvpListLX[i] whose code amount is the smallest is selected as the prediction motion vector mvpLX, and its index i is acquired. When a plurality of candidates in the list yield the smallest generated code amount, the candidate mvpListLX[i] with the smallest index i is selected as the optimal prediction motion vector mvpLX, and that index i is acquired.
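The selection rule of step S104 can be sketched as follows. The function names and the code amount estimate mvd_code_amount are hypothetical: the text does not specify how the code amount is computed, so a simple length estimate based on the magnitude of each component is used here purely as a stand-in. Motion vectors are represented as (x, y) tuples, and the list is assumed to be non-empty.

```python
def mvd_code_amount(mvd):
    """Hypothetical stand-in for the code amount of a differential motion vector."""
    return sum(2 * abs(component).bit_length() + 1 for component in mvd)


def select_mvp(mv, mvp_list, cost=mvd_code_amount):
    """Return (index, candidate) of the predictor with the smallest code amount.

    The strict '<' comparison keeps the earlier candidate on a tie, so the
    candidate with the smallest index is selected.
    """
    best_index, best_cost = None, None
    for i, mvp in enumerate(mvp_list):
        mvd = (mv[0] - mvp[0], mv[1] - mvp[1])
        amount = cost(mvd)
        if best_cost is None or amount < best_cost:
            best_index, best_cost = i, amount
    return best_index, mvp_list[best_index]
```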

Next, the motion vector subtracting unit 328 subtracts the selected prediction motion vector mvpLX of LX from the motion vector mvLX of LX to calculate the differential motion vector mvdLX of LX, that is, mvdLX = mvLX - mvpLX (step S105 in fig. 19).

< usual prediction motion vector mode derivation unit (usual AMVP): description of decoding side >

Next, the processing procedure of the normal prediction motion vector mode on the decoding side will be described with reference to fig. 25. On the decoding side, the spatial prediction motion vector candidate deriving unit 421, the temporal prediction motion vector candidate deriving unit 422, the history prediction motion vector candidate deriving unit 423, and the prediction motion vector candidate supplementing unit 425 calculate, for each of L0 and L1, the motion vector used in inter prediction in the normal prediction motion vector mode (steps S201 to S206 in fig. 25). Specifically, when the prediction mode PredMode of the processing target block is inter prediction (MODE_INTER) and the inter prediction mode of the processing target block is L0 prediction (PRED_L0), the prediction motion vector candidate list mvpListL0 of L0 is calculated, a prediction motion vector mvpL0 is selected, and the motion vector mvL0 of L0 is calculated. When the inter prediction mode of the processing target block is L1 prediction (PRED_L1), the prediction motion vector candidate list mvpListL1 of L1 is calculated, a prediction motion vector mvpL1 is selected, and the motion vector mvL1 of L1 is calculated. When the inter prediction mode of the processing target block is bi-prediction (PRED_BI), both L0 prediction and L1 prediction are performed: the prediction motion vector candidate list mvpListL0 of L0 is calculated, the prediction motion vector mvpL0 of L0 is selected, and the motion vector mvL0 of L0 is calculated, and likewise the prediction motion vector candidate list mvpListL1 of L1 is calculated, the prediction motion vector mvpL1 of L1 is selected, and the motion vector mvL1 of L1 is calculated.

As on the encoding side, the decoding side performs the motion vector calculation for each of L0 and L1, but the processing is common to both. Therefore, in the following description, L0 and L1 are denoted collectively as LX. LX represents the inter prediction mode used for inter prediction of the coding block to be processed. In the process of calculating the motion vector of L0, X is 0, and in the process of calculating the motion vector of L1, X is 1. In addition, when information of another reference list is referred to instead of the reference list LX being calculated during the calculation of the motion vector of LX, that other reference list is denoted as LY.

When the motion vector mvLX of LX is used (yes in step S202 of fig. 25), candidates for the prediction motion vector of LX are calculated, and the prediction motion vector candidate list mvpListLX of LX is constructed (step S203 of fig. 25). The spatial prediction motion vector candidate deriving unit 421, the temporal prediction motion vector candidate deriving unit 422, the history prediction motion vector candidate deriving unit 423, and the prediction motion vector candidate supplementing unit 425 in the normal prediction motion vector pattern deriving unit 401 calculate a plurality of prediction motion vector candidates, and the prediction motion vector candidate list mvpListLX is constructed. The detailed processing procedure of step S203 of fig. 25 will be described later using the flowchart of fig. 20.

Next, the predicted motion vector candidate selecting unit 426 extracts, from the predicted motion vector candidate list mvpListLX, the predicted motion vector candidate mvpListLX[mvpIdxLX] corresponding to the predicted motion vector index mvpIdxLX decoded by the bit string decoding unit 201, as the selected predicted motion vector mvpLX (step S204 in fig. 25).

Next, the motion vector adder 427 adds the differential motion vector mvdLX of LX supplied from the bit string decoding unit 201 and the predicted motion vector mvpLX of LX, and calculates the motion vector mvLX of LX as mvLX = mvpLX + mvdLX (step S205 in fig. 25).
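Steps S204 and S205 amount to an index lookup followed by a component-wise addition. The following is a minimal Python sketch, assuming motion vectors are represented as plain (x, y) integer tuples; the function name `reconstruct_mv` and this representation are illustrative assumptions and do not appear in the embodiment.

```python
from typing import List, Tuple

MotionVector = Tuple[int, int]  # (horizontal, vertical); this representation is an assumption


def reconstruct_mv(mvp_list_lx: List[MotionVector],
                   mvp_idx_lx: int,
                   mvd_lx: MotionVector) -> MotionVector:
    """Select the predictor at the decoded index (S204) and add the difference (S205)."""
    mvp_lx = mvp_list_lx[mvp_idx_lx]   # mvpLX = mvpListLX[mvpIdxLX]
    return (mvp_lx[0] + mvd_lx[0],     # mvLX = mvpLX + mvdLX, per component
            mvp_lx[1] + mvd_lx[1])


# Two candidates, decoded index 1, decoded difference (3, -2)
print(reconstruct_mv([(4, 0), (-8, 16)], 1, (3, -2)))  # (-5, 14)
```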

< Normal prediction motion vector mode derivation unit (normal AMVP): motion vector prediction method >

Fig. 20 is a flowchart showing the procedure of the normal prediction motion vector mode derivation processing, whose function is common to the normal prediction motion vector mode derivation unit 301 of the image encoding apparatus and the normal prediction motion vector mode derivation unit 401 of the image decoding apparatus according to the embodiment of the present invention.

The normal prediction motion vector mode derivation unit 301 and the normal prediction motion vector mode derivation unit 401 have a predicted motion vector candidate list mvpListLX. The predicted motion vector candidate list mvpListLX has a list structure and is provided with a storage area that stores, as elements, a predicted motion vector index indicating a position within the list and the predicted motion vector candidate corresponding to that index. Predicted motion vector indexes are numbered from 0, and the predicted motion vector candidates are held in the storage area of the predicted motion vector candidate list mvpListLX. In the present embodiment, it is assumed that the predicted motion vector candidate list mvpListLX can register at least two predicted motion vector candidates (inter prediction information). Further, the variable numCurrMvpCand indicating the number of predicted motion vector candidates registered in the predicted motion vector candidate list mvpListLX is set to 0.

The spatial prediction motion vector candidate derivation units 321 and 421 derive a predicted motion vector candidate from the blocks adjacent on the left. In this process, the predicted motion vector mvLXA is derived by referring to the inter prediction information of the block adjacent on the left (A0 or A1 in fig. 11), that is, a flag indicating whether the predicted motion vector candidate can be used, a motion vector, a reference index, and the like, and the derived mvLXA is added to the predicted motion vector candidate list mvpListLX (step S301 in fig. 20). X is 0 for L0 prediction and 1 for L1 prediction (the same applies hereinafter). Next, the spatial prediction motion vector candidate derivation units 321 and 421 derive a predicted motion vector candidate from the blocks adjacent on the upper side. In this process, the predicted motion vector mvLXB is derived by referring to the inter prediction information of the block adjacent on the upper side (B0, B1, or B2 in fig. 11), that is, a flag indicating whether the predicted motion vector candidate can be used, a motion vector, a reference index, and the like, and if the derived mvLXA and mvLXB are not equal, mvLXB is added to the predicted motion vector candidate list mvpListLX (step S302 in fig. 20). The processing of steps S301 and S302 of fig. 20 is common except for the positions and the number of the adjacent blocks referred to, and derives a flag availableFlagLXN indicating whether the predicted motion vector candidate of the coding block can be used, a motion vector mvLXN, and a reference index refIdxN (N denotes A or B; the same applies hereinafter).

Next, the history prediction motion vector candidate derivation units 323 and 423 add the history prediction motion vector candidates registered in the history prediction motion vector candidate list HmvpCandList to the predicted motion vector candidate list mvpListLX (step S303 in fig. 20). Details of the registration processing in step S303 will be described later using the flowchart of fig. 29.

Next, the temporal prediction motion vector candidate derivation units 322 and 422 derive a predicted motion vector candidate from a block in a picture at a time different from that of the current processing target picture. In this process, a flag availableFlagLXCol indicating whether the predicted motion vector candidate of the coding block in the picture at the different time can be used, a motion vector mvLXCol, a reference index refIdxCol, and a reference list listCol are derived, and mvLXCol is added to the predicted motion vector candidate list mvpListLX (step S304 of fig. 20).

Further, it is assumed that the processing of the temporal prediction motion vector candidate derivation units 322 and 422 can be omitted in units of sequences (SPS), pictures (PPS), or slices.

Next, the predicted motion vector candidate supplementing units 325 and 425 add predicted motion vector candidates having a predetermined value such as (0, 0) until the predicted motion vector candidate list mvpListLX is full (step S305 in fig. 20).
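The order of steps S301 to S305 can be illustrated with a short Python sketch. It is not the embodiment's implementation: the function name `build_mvp_list`, the use of `None` for an unavailable candidate, and the capacity checks before the history and temporal steps are assumptions made for illustration. The history candidates are passed in as an already extracted list of motion vectors, and the temporal candidate is appended without any comparison against the other candidates, in line with the abstract.

```python
from typing import List, Optional, Tuple

MotionVector = Tuple[int, int]
MAX_MVP_CAND = 2  # the list holds two candidates in this embodiment


def build_mvp_list(mv_a: Optional[MotionVector],
                   mv_b: Optional[MotionVector],
                   history_mvs: List[MotionVector],
                   mv_col: Optional[MotionVector]) -> List[MotionVector]:
    mvp_list: List[MotionVector] = []
    # S301: candidate from the left neighbours (None models "not available")
    if mv_a is not None:
        mvp_list.append(mv_a)
    # S302: candidate from the upper neighbours, added only if it differs from mvLXA
    if mv_b is not None and mv_b != mv_a:
        mvp_list.append(mv_b)
    # S303: history candidates; ones equal to an existing element are skipped
    for mv in history_mvs:
        if len(mvp_list) >= MAX_MVP_CAND:
            break
        if mv not in mvp_list:
            mvp_list.append(mv)
    # S304: temporal candidate, appended without comparing it to the others
    if mv_col is not None and len(mvp_list) < MAX_MVP_CAND:
        mvp_list.append(mv_col)
    # S305: pad with (0, 0) until the list is full
    while len(mvp_list) < MAX_MVP_CAND:
        mvp_list.append((0, 0))
    return mvp_list


print(build_mvp_list((1, 2), (1, 2), [(1, 2), (5, 5)], (9, 9)))  # [(1, 2), (5, 5)]
print(build_mvp_list(None, (3, 3), [], (3, 3)))                  # [(3, 3), (3, 3)]
```

The second call shows the consequence of skipping the comparison for the temporal candidate: a duplicate can enter the list, which trades a small loss in candidate diversity for a lower processing load.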

< Normal merge mode derivation unit (normal merge) >

The normal merge mode deriving unit 302 in fig. 18 includes a spatial merge candidate deriving unit 341, a temporal merge candidate deriving unit 342, an average merge candidate deriving unit 344, a history merge candidate deriving unit 345, a merge candidate supplementing unit 346, and a merge candidate selecting unit 347.

The normal merge mode deriving unit 402 in fig. 24 includes a spatial merge candidate deriving unit 441, a temporal merge candidate deriving unit 442, an average merge candidate deriving unit 444, a history merge candidate deriving unit 445, a merge candidate supplementing unit 446, and a merge candidate selecting unit 447.

Fig. 21 is a flowchart illustrating steps of a normal merge mode derivation process having a function common to the normal merge mode derivation unit 302 of the image encoding apparatus and the normal merge mode derivation unit 402 of the image decoding apparatus according to the embodiment of the present invention.

The respective processes are described in order below. Unless otherwise specified, the following description assumes that the slice type slice_type is a B slice, but the description also applies to a P slice. However, when the slice type slice_type is a P slice, only L0 prediction (Pred_L0) exists as an inter prediction mode, and L1 prediction (Pred_L1) and bi-prediction (Pred_BI) do not exist. Therefore, the processing related to L1 can be omitted.

The normal merge mode derivation unit 302 and the normal merge mode derivation unit 402 have a merge candidate list mergeCandList. The merge candidate list mergeCandList has a list structure and is provided with a storage area that stores, as elements, a merge index indicating a position within the merge candidate list and the merge candidate corresponding to that index. Merge indexes are numbered from 0, and the merge candidates are held in the storage area of the merge candidate list mergeCandList. In the subsequent processing, the merge candidate with merge index i registered in the merge candidate list mergeCandList is denoted by mergeCandList[i]. In the present embodiment, it is assumed that the merge candidate list mergeCandList can register at least six merge candidates (inter prediction information). Further, the variable numCurrMergeCand indicating the number of merge candidates registered in the merge candidate list mergeCandList is set to 0.

The spatial merge candidate derivation unit 341 and the spatial merge candidate derivation unit 441 derive spatial merge candidates from the blocks adjacent on the left and upper sides of the block to be processed (B1, A1, B0, A0, B2 in fig. 11), in the order B1, A1, B0, A0, B2, based on the encoding information stored in the encoding information storage memory 111 of the image encoding apparatus or the encoding information storage memory 205 of the image decoding apparatus, and register the derived spatial merge candidates in the merge candidate list mergeCandList (step S401 in fig. 21). Here, N is defined to denote any one of the spatial merge candidates B1, A1, B0, A0, and B2 or the temporal merge candidate Col. A flag availableFlagN indicating whether the inter prediction information of block N can be used as a spatial merge candidate, a reference index refIdxL0N of L0 and a reference index refIdxL1N of L1 of the spatial merge candidate N, an L0 prediction flag predFlagL0N indicating whether L0 prediction is performed, an L1 prediction flag predFlagL1N indicating whether L1 prediction is performed, a motion vector mvL0N of L0, and a motion vector mvL1N of L1 are derived. However, in the present embodiment, since the merge candidates are derived without referring to the inter prediction information of blocks included in the coding block to be processed, spatial merge candidates that use the inter prediction information of blocks included in the coding block to be processed are not derived.

Next, the temporal merge candidate derivation unit 342 and the temporal merge candidate derivation unit 442 derive a temporal merge candidate from a picture at a different time, and register the derived temporal merge candidate in the merge candidate list mergeCandList (step S402 in fig. 21). A flag availableFlagCol indicating whether the temporal merge candidate can be used, an L0 prediction flag predFlagL0Col indicating whether L0 prediction of the temporal merge candidate is performed, an L1 prediction flag predFlagL1Col indicating whether L1 prediction is performed, a motion vector mvL0Col of L0, and a motion vector mvL1Col of L1 are derived.

Further, the processing of the temporal merge candidate derivation unit 342 and the temporal merge candidate derivation unit 442 can be omitted in units of sequences (SPS), pictures (PPS), or slices.

Next, the history merge candidate derivation unit 345 and the history merge candidate derivation unit 445 register the history prediction motion vector candidates registered in the history prediction motion vector candidate list HmvpCandList in the merge candidate list mergeCandList (step S403 in fig. 21).

Further, when the number of merge candidates numCurrMergeCand registered in the merge candidate list mergeCandList is smaller than the maximum number of merge candidates MaxNumMergeCand, history merge candidates are derived and registered in the merge candidate list mergeCandList, with MaxNumMergeCand as the upper limit on numCurrMergeCand.

Next, the average merge candidate derivation unit 344 and the average merge candidate derivation unit 444 derive an average merge candidate from the merge candidate list mergeCandList, and add the derived average merge candidate to the merge candidate list mergeCandList (step S404 in fig. 21).

Further, when the number of merge candidates numCurrMergeCand registered in the merge candidate list mergeCandList is smaller than the maximum number of merge candidates MaxNumMergeCand, an average merge candidate is derived and registered in the merge candidate list mergeCandList, with MaxNumMergeCand as the upper limit on numCurrMergeCand.

Here, the average merge candidate is a new merge candidate having a motion vector obtained by averaging the motion vectors possessed by the first and second merge candidates registered in the merge candidate list mergeCandList, for each L0 prediction and L1 prediction.

Next, in the merge candidate supplementing unit 346 and the merge candidate supplementing unit 446, when the number of merge candidates numCurrMergeCand registered in the merge candidate list mergeCandList is smaller than the maximum number of merge candidates MaxNumMergeCand, additional merge candidates are derived and registered in the merge candidate list mergeCandList, with MaxNumMergeCand as the upper limit on numCurrMergeCand (step S405 in fig. 21). In a P slice, merge candidates whose motion vector has the value (0, 0) and whose prediction mode is L0 prediction (Pred_L0) are added, up to the maximum number of merge candidates MaxNumMergeCand. In a B slice, merge candidates whose motion vector has the value (0, 0) and whose prediction mode is bi-prediction (Pred_BI) are added. The reference index used when adding a merge candidate differs from the reference indexes that have already been added.
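The supplementing step S405 can be sketched in Python as follows. The `MergeCand` record, the function name, the use of -1 for an unused reference index, and the choice of assigning reference indexes 0, 1, 2, ... in order are all assumptions made for illustration; the description only requires that each added candidate use a reference index different from those already added.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MergeCand:
    pred_flag_l0: bool
    pred_flag_l1: bool
    ref_idx_l0: int           # -1 is assumed here to mean "not used"
    ref_idx_l1: int
    mv_l0: Tuple[int, int]
    mv_l1: Tuple[int, int]


def supplement_merge_candidates(merge_cand_list: List[MergeCand],
                                max_num_merge_cand: int,
                                is_b_slice: bool) -> None:
    """Append zero-motion candidates until the list holds max_num_merge_cand entries."""
    ref_idx = 0  # assumed starting index; incremented so each added candidate differs
    while len(merge_cand_list) < max_num_merge_cand:
        merge_cand_list.append(MergeCand(
            pred_flag_l0=True,                 # L0 is used in both P and B slices
            pred_flag_l1=is_b_slice,           # bi-prediction only in B slices
            ref_idx_l0=ref_idx,
            ref_idx_l1=ref_idx if is_b_slice else -1,
            mv_l0=(0, 0),
            mv_l1=(0, 0)))
        ref_idx += 1


cands: List[MergeCand] = []
supplement_merge_candidates(cands, 3, is_b_slice=True)
print([c.ref_idx_l0 for c in cands])  # [0, 1, 2]
```

A real codec would also bound the reference index by the number of available reference pictures; that bound is not stated in this passage and is therefore left out of the sketch.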

Next, the merge candidate selecting unit 347 and the merge candidate selecting unit 447 select a merge candidate from among the merge candidates registered in the merge candidate list mergeCandList. The encoding-side merge candidate selecting unit 347 selects a merge candidate by calculating the code amount and the distortion amount, and supplies the merge index indicating the selected merge candidate and the inter prediction information of that merge candidate to the motion compensation prediction unit 306 via the inter prediction mode determination unit 305. On the other hand, the decoding-side merge candidate selecting unit 447 selects a merge candidate based on the decoded merge index, and supplies the selected merge candidate to the motion compensation prediction unit 406.

< Updating the history prediction motion vector candidate list >

Next, the method of initializing and updating the history prediction motion vector candidate list HmvpCandList provided in the encoding-side encoding information storage memory 111 and the decoding-side encoding information storage memory 205 will be described in detail. Fig. 26 is a flowchart for explaining the procedure of the history prediction motion vector candidate list initialization and update processing.

In the present embodiment, it is assumed that the history prediction motion vector candidate list HmvpCandList is updated in the encoding information storage memory 111 and the encoding information storage memory 205. Alternatively, a history prediction motion vector candidate list update unit may be provided in the inter prediction unit 102 and the inter prediction unit 203 to update the history prediction motion vector candidate list HmvpCandList.

The history prediction motion vector candidate list HmvpCandList is initialized at the beginning of a slice. On the encoding side, HmvpCandList is updated when the prediction method determination unit 105 selects the normal prediction motion vector mode or the normal merge mode. On the decoding side, HmvpCandList is updated when the prediction information decoded by the bit string decoding unit 201 indicates the normal prediction motion vector mode or the normal merge mode.

The inter prediction information used when inter prediction is performed in the normal prediction motion vector mode or the normal merge mode is registered in the history prediction motion vector candidate list HmvpCandList as an inter prediction information candidate hMvpCand. The inter prediction information candidate hMvpCand includes the reference index refIdxL0 of L0 and the reference index refIdxL1 of L1, the L0 prediction flag predFlagL0 indicating whether L0 prediction is performed, the L1 prediction flag predFlagL1 indicating whether L1 prediction is performed, the motion vector mvL0 of L0, and the motion vector mvL1 of L1.

When inter prediction information having the same value as the inter prediction information candidate hMvpCand exists among the elements (i.e., inter prediction information) registered in the history prediction motion vector candidate list HmvpCandList provided in the encoding-side encoding information storage memory 111 and the decoding-side encoding information storage memory 205, that element is deleted from the history prediction motion vector candidate list HmvpCandList. On the other hand, when no inter prediction information having the same value as the inter prediction information candidate hMvpCand exists, the element at the beginning of the history prediction motion vector candidate list HmvpCandList is deleted, and the inter prediction information candidate hMvpCand is added at the end of the history prediction motion vector candidate list HmvpCandList.

The number of elements of the history prediction motion vector candidate list HmvpCandList provided in the encoding-side encoding information storage memory 111 and the decoding-side encoding information storage memory 205 of the present invention is set to 6.

First, the history prediction motion vector candidate list HmvpCandList is initialized in units of slices (step S2101 of fig. 26). At the beginning of the slice, all elements of the history prediction motion vector candidate list HmvpCandList are set to empty, and the number of history prediction motion vector candidates (current number of candidates) NumHmvpCand registered in the history prediction motion vector candidate list HmvpCandList is set to 0.

It is assumed here that the history prediction motion vector candidate list HmvpCandList is initialized in units of slices (at the first coding block of the slice), but it may instead be initialized in units of pictures, rectangles (tiles), or tree blocks.

Next, the following update processing of the history prediction motion vector candidate list HmvpCandList is repeated for each coding block within the slice (steps S2102 to S2111 in fig. 26).

First, initial setting is performed in units of coding blocks. The flag identicalCandExist indicating whether an identical candidate exists is set to FALSE, and the deletion target index removeIdx indicating the candidate to be deleted is set to 0 (step S2103 in fig. 26).

It is determined whether an inter prediction information candidate hMvpCand to be registered exists (step S2104 in fig. 26). When the encoding-side prediction method determination unit 105 determines that the normal prediction motion vector mode or the normal merge mode is used, or when the decoding-side bit string decoding unit 201 decodes the normal prediction motion vector mode or the normal merge mode, the corresponding inter prediction information is set as the inter prediction information candidate hMvpCand to be registered. When the encoding-side prediction method determination unit 105 determines the intra prediction mode, the sub-block prediction motion vector mode, or the sub-block merge mode, or when the decoding-side bit string decoding unit 201 decodes the intra prediction mode, the sub-block prediction motion vector mode, or the sub-block merge mode, the history prediction motion vector candidate list HmvpCandList is not updated, and no inter prediction information candidate hMvpCand to be registered exists. When no inter prediction information candidate hMvpCand to be registered exists, steps S2105 to S2106 are skipped (no in step S2104 in fig. 26). When an inter prediction information candidate hMvpCand to be registered exists, the processing from step S2105 onward is executed (yes in step S2104 in fig. 26).

Next, it is determined whether an element (inter prediction information) having the same value as the inter prediction information candidate hMvpCand to be registered, that is, an identical element, exists among the elements of the history prediction motion vector candidate list HmvpCandList (step S2105 in fig. 26). Fig. 27 is a flowchart of this identical element check processing. When the number of history prediction motion vector candidates NumHmvpCand is 0 (no in step S2121 in fig. 27), the history prediction motion vector candidate list HmvpCandList is empty and no identical candidate exists, so steps S2122 to S2125 in fig. 27 are skipped and the identical element check processing ends. When the number of history prediction motion vector candidates NumHmvpCand is greater than 0 (yes in step S2121 of fig. 27), the processing of step S2123 is repeated for the history prediction motion vector index hMvpIdx from 0 to NumHmvpCand-1 (steps S2122 to S2125 of fig. 27). First, the hMvpIdx-th element HmvpCandList[hMvpIdx], counted from 0, of the history prediction motion vector candidate list is compared with the inter prediction information candidate hMvpCand to determine whether they are identical (step S2123 of fig. 27). If they are identical (yes in step S2123 in fig. 27), the flag identicalCandExist indicating whether an identical candidate exists is set to TRUE, the deletion target index removeIdx indicating the position of the element to be deleted is set to the current value of the history prediction motion vector index hMvpIdx, and the identical element check processing ends. If they are not identical (no in step S2123 in fig. 27), hMvpIdx is incremented by 1, and if the history prediction motion vector index hMvpIdx is equal to or less than NumHmvpCand-1, the processing from step S2123 onward is performed again.
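The identical element check of fig. 27 is a linear search that stops at the first match. A minimal Python sketch follows, assuming the list is a Python list whose length plays the role of NumHmvpCand and whose elements can be compared with `==`; the function name `find_identical` is hypothetical.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def find_identical(hmvp_cand_list: List[T], h_mvp_cand: T) -> Tuple[bool, int]:
    """Return (identicalCandExist, removeIdx) for the candidate to be registered."""
    identical_cand_exist = False   # initial values set in S2103
    remove_idx = 0
    # An empty list (NumHmvpCand == 0, S2121: no) skips the loop entirely.
    for h_mvp_idx in range(len(hmvp_cand_list)):        # S2122 to S2125
        if hmvp_cand_list[h_mvp_idx] == h_mvp_cand:     # S2123
            identical_cand_exist = True
            remove_idx = h_mvp_idx
            break                                       # the check ends at the first match
    return identical_cand_exist, remove_idx


print(find_identical(["HMVP0", "HMVP1", "HMVP2"], "HMVP1"))  # (True, 1)
print(find_identical([], "HMVP1"))                           # (False, 0)
```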

Returning to the flowchart of fig. 26, the shift and addition processing of the elements of the history prediction motion vector candidate list HmvpCandList is performed (step S2106 of fig. 26). Fig. 28 is a flowchart of the element shift and addition processing of the history prediction motion vector candidate list HmvpCandList in step S2106 of fig. 26. First, it is determined whether a new element is to be added after removing an element stored in the history prediction motion vector candidate list HmvpCandList, or added without removing any element. Specifically, it is checked whether the flag identicalCandExist indicating whether an identical candidate exists is TRUE or NumHmvpCand is 6 (step S2141 of fig. 28). When either condition is satisfied, that is, the flag identicalCandExist is TRUE or the current number of candidates NumHmvpCand is 6 (yes in step S2141 of fig. 28), an element stored in the history prediction motion vector candidate list HmvpCandList is removed, and then the new element is added. The initial value of the index i is set to removeIdx + 1. The element shift processing of step S2143 is repeated from this initial value up to NumHmvpCand (steps S2142 to S2144 of fig. 28). The elements are shifted forward by copying the element HmvpCandList[i] to HmvpCandList[i-1] (step S2143 of fig. 28), and i is incremented by 1 (steps S2142 to S2144 of fig. 28). Next, the inter prediction information candidate hMvpCand is added to the (NumHmvpCand-1)-th element HmvpCandList[NumHmvpCand-1], counted from 0, which corresponds to the end of the history prediction motion vector candidate list (step S2145 of fig. 28), and the element shift and addition processing of the history prediction motion vector candidate list HmvpCandList ends.
On the other hand, when neither condition is satisfied, that is, the flag identicalCandExist is not TRUE and NumHmvpCand is not 6 (no in step S2141 of fig. 28), the inter prediction information candidate hMvpCand is added at the end of the history prediction motion vector candidate list without removing any element stored in the history prediction motion vector candidate list HmvpCandList (step S2146 of fig. 28). Here, the end of the history prediction motion vector candidate list is the NumHmvpCand-th element HmvpCandList[NumHmvpCand], counted from 0. Further, NumHmvpCand is incremented by 1, and the shift and addition processing of the elements of the history prediction motion vector candidate list HmvpCandList ends.

Fig. 31 is a diagram for explaining an example of the update processing of the history prediction motion vector candidate list. When a new element is added to a history prediction motion vector candidate list HmvpCandList in which six elements (inter prediction information) are already registered, the elements of HmvpCandList are compared with the new inter prediction information in order from the first element (fig. 31A). If the new element has the same value as HMVP2, the third element from the beginning of HmvpCandList, then HMVP2 is deleted from HmvpCandList, the subsequent elements HMVP3 to HMVP5 are each shifted (copied) forward by one position, and the new element is added at the end of HmvpCandList (fig. 31B), which completes the update of the history prediction motion vector candidate list HmvpCandList.
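The update described above behaves like a six-entry first-in first-out queue with duplicate removal. The Python sketch below reproduces the fig. 31 example. It is an illustration under assumptions: the list is a Python list whose length stands in for NumHmvpCand, candidates are compared with `==`, and `del` performs the forward shift that the flowchart of fig. 28 carries out element by element. The function name `update_hmvp_list` is hypothetical.

```python
from typing import List

MAX_HMVP_CAND = 6  # number of elements of HmvpCandList in this embodiment


def update_hmvp_list(hmvp_cand_list: List, h_mvp_cand) -> None:
    """Register h_mvp_cand as the newest element, keeping at most six unique entries."""
    identical_cand_exist = h_mvp_cand in hmvp_cand_list
    # removeIdx: position of the identical element, or 0 (the oldest) when none exists
    remove_idx = hmvp_cand_list.index(h_mvp_cand) if identical_cand_exist else 0
    if identical_cand_exist or len(hmvp_cand_list) == MAX_HMVP_CAND:   # S2141
        # S2142 to S2144: deleting shifts every later element forward by one
        del hmvp_cand_list[remove_idx]
    # S2145 / S2146: the new candidate always ends up at the end of the list
    hmvp_cand_list.append(h_mvp_cand)


history = [f"HMVP{k}" for k in range(6)]
update_hmvp_list(history, "HMVP2")   # same value as the third element
print(history)  # ['HMVP0', 'HMVP1', 'HMVP3', 'HMVP4', 'HMVP5', 'HMVP2']
```

Keeping the most recently used motion information at the end of the list lets the derivation processes read the newest candidates first by indexing from the end.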

< History prediction motion vector candidate derivation processing >

Next, the method of deriving history prediction motion vector candidates from the history prediction motion vector candidate list HmvpCandList, which is the processing of step S303 of fig. 20, will be described in detail. This processing is common to the history prediction motion vector candidate derivation unit 323 of the encoding-side normal prediction motion vector mode derivation unit 301 and the history prediction motion vector candidate derivation unit 423 of the decoding-side normal prediction motion vector mode derivation unit 401. Fig. 29 is a flowchart illustrating the procedure of the history prediction motion vector candidate derivation processing.

When the current number of predicted motion vector candidates numCurrMvpCand is equal to or greater than the maximum number of elements (here, 2) of the predicted motion vector candidate list mvpListLX, or the number of history prediction motion vector candidates NumHmvpCand is 0 (no in step S2201 of fig. 29), the processing of steps S2202 to S2209 of fig. 29 is omitted, and the history prediction motion vector candidate derivation processing ends. When the current number of predicted motion vector candidates numCurrMvpCand is smaller than 2, the maximum number of elements of the predicted motion vector candidate list mvpListLX, and the number of history prediction motion vector candidates NumHmvpCand is greater than 0 (yes in step S2201 of fig. 29), the processing of steps S2202 to S2209 of fig. 29 is performed.

Next, the processing of steps S2203 to S2208 of fig. 29 is repeated for the index i from 1 up to the smaller of 4 and the number of history prediction motion vector candidates numCheckedHMVPCand (steps S2202 to S2209 of fig. 29). When the current number of predicted motion vector candidates numCurrMvpCand is equal to or greater than 2, the maximum number of elements of the predicted motion vector candidate list mvpListLX (no in step S2203 in fig. 29), the processing of steps S2204 to S2209 in fig. 29 is omitted, and the history prediction motion vector candidate derivation processing ends. When the current number of predicted motion vector candidates numCurrMvpCand is smaller than 2, the maximum number of elements of the predicted motion vector candidate list mvpListLX (yes in step S2203 in fig. 29), the processing from step S2204 of fig. 29 onward is performed.

Next, the processing of steps S2205 to S2207 is performed for Y equal to 0 and 1 (L0 and L1), respectively (steps S2204 to S2208 of fig. 29). When the current number of predicted motion vector candidates numCurrMvpCand is equal to or greater than 2, the maximum number of elements of the predicted motion vector candidate list mvpListLX (no in step S2205 in fig. 29), the processing of steps S2206 to S2209 in fig. 29 is omitted, and the history prediction motion vector candidate derivation processing ends. When the current number of predicted motion vector candidates numCurrMvpCand is smaller than 2, the maximum number of elements of the predicted motion vector candidate list mvpListLX (yes in step S2205 in fig. 29), the processing from step S2206 of fig. 29 onward is performed.

Next, when an element of the history prediction motion vector candidate list HmvpCandList has the same reference index as the reference index refIdxLX of the motion vector to be encoded or decoded and differs from every element of the predicted motion vector candidate list mvpListLX (yes in step S2206 of fig. 29), the motion vector of LY of the history prediction motion vector candidate HmvpCandList[NumHmvpCand] is added to the numCurrMvpCand-th element mvpListLX[numCurrMvpCand], counted from 0, of the predicted motion vector candidate list, and the current number of predicted motion vector candidates numCurrMvpCand is incremented by 1 (step S2207 of fig. 29). When the history prediction motion vector candidate list HmvpCandList contains no element that has the same reference index as the reference index refIdxLX of the motion vector to be encoded or decoded and differs from every element of the predicted motion vector candidate list mvpListLX (no in step S2206 of fig. 29), the addition processing of step S2207 is skipped.

The above processing of steps S2205 to S2207 of fig. 29 is performed for both L0 and L1 (steps S2204 to S2208 of fig. 29). The index i is then incremented by 1, and when the index i is equal to or smaller than the smaller of 4 and the number of history prediction motion vector candidates NumHmvpCand, the processing from step S2203 onward is performed again (steps S2202 to S2209 of fig. 29).
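The nested loops of fig. 29 can be sketched in Python as shown below. Several points are assumptions of this sketch rather than statements of the embodiment: the `HmvpCand` record layout, the use of -1 for an unused reference index, the function name, and in particular the element examined in iteration i, which is taken to be HmvpCandList[NumHmvpCand - i] (newest first) because index NumHmvpCand itself lies past the last registered element.

```python
from dataclasses import dataclass
from typing import List, Tuple

MotionVector = Tuple[int, int]
MAX_MVP_CAND = 2  # maximum number of elements of mvpListLX


@dataclass
class HmvpCand:
    ref_idx: Tuple[int, int]                # (refIdxL0, refIdxL1); -1 assumed to mean unused
    mv: Tuple[MotionVector, MotionVector]   # (mvL0, mvL1)


def add_history_mvp(mvp_list_lx: List[MotionVector],
                    hmvp_cand_list: List[HmvpCand],
                    ref_idx_lx: int) -> None:
    num_hmvp_cand = len(hmvp_cand_list)
    if len(mvp_list_lx) >= MAX_MVP_CAND or num_hmvp_cand == 0:       # S2201
        return
    for i in range(1, min(4, num_hmvp_cand) + 1):                    # S2202 to S2209
        if len(mvp_list_lx) >= MAX_MVP_CAND:                         # S2203
            return
        cand = hmvp_cand_list[num_hmvp_cand - i]   # assumed order: newest element first
        for y in (0, 1):                                             # S2204 to S2208
            if len(mvp_list_lx) >= MAX_MVP_CAND:                     # S2205
                return
            # S2206: same reference index and different from every existing element
            if cand.ref_idx[y] == ref_idx_lx and cand.mv[y] not in mvp_list_lx:
                mvp_list_lx.append(cand.mv[y])                       # S2207


mvp_list = [(1, 1)]
history = [HmvpCand((0, -1), ((1, 1), (0, 0))),
           HmvpCand((0, 0), ((2, 2), (3, 3)))]
add_history_mvp(mvp_list, history, ref_idx_lx=0)
print(mvp_list)  # [(1, 1), (2, 2)]
```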

< History merge candidate derivation processing >

Next, the method of deriving history merge candidates from the history merge candidate list HmvpCandList, which is the processing of step S403 of fig. 21, will be described in detail. This processing is common to the history merge candidate derivation unit 345 of the encoding-side normal merge mode derivation unit 302 and the history merge candidate derivation unit 445 of the decoding-side normal merge mode derivation unit 402. Fig. 30 is a flowchart illustrating the procedure of the history merge candidate derivation processing.

First, an initialization process is performed (step S2301 of fig. 30). Each element isPruned[i] for i from 0 to (numCurrMergeCand-1) is set to FALSE, and the variable numOrigMergeCand is set to the number of elements numCurrMergeCand registered in the current merge candidate list.

Next, the initial value of the index hMvpIdx is set to 1, and the addition processing of steps S2303 to S2310 of fig. 30 is repeated from this initial value up to NumHmvpCand (steps S2302 to S2311 of fig. 30). If the number of elements numCurrMergeCand registered in the current merge candidate list is not equal to or less than (the maximum number of merge candidates MaxNumMergeCand-1), merge candidates have been added to all elements of the merge candidate list, and the history merge candidate derivation processing ends (no in step S2303 of fig. 30). If the number of elements numCurrMergeCand registered in the current merge candidate list is equal to or less than (the maximum number of merge candidates MaxNumMergeCand-1), the processing from step S2304 onward is performed. sameMotion is set to FALSE (step S2304 of fig. 30). Next, the initial value of the index i is set to 0, and the processing of steps S2306 and S2307 of fig. 30 is performed from this initial value up to numOrigMergeCand-1 (steps S2305 to S2308 of fig. 30). The (NumHmvpCand-hMvpIdx)-th element HmvpCandList[NumHmvpCand-hMvpIdx], counted from 0, of the history prediction motion vector candidate list is compared with the i-th element mergeCandList[i], counted from 0, of the merge candidate list to determine whether they have the same value (step S2306 of fig. 30).

Merge candidates have the same value when the values of all the constituent elements of the merge candidates (inter prediction mode, reference index, and motion vector) are the same. When the merge candidates have the same value and isPruned[i] is FALSE (yes in step S2306 in fig. 30), both sameMotion and isPruned[i] are set to TRUE (step S2307 in fig. 30). Otherwise (no in step S2306 in fig. 30), the process in step S2307 is skipped. After the repetition processing of steps S2305 to S2308 of fig. 30 is completed, it is checked whether sameMotion is FALSE (step S2309 of fig. 30). If sameMotion is FALSE (yes in step S2309 of fig. 30), the (NumHmvpCand-hMvpIdx)-th element from 0 of the history prediction motion vector candidate list, HmvpCandList[NumHmvpCand-hMvpIdx], is not present in mergeCandList, so it is added as the numCurrMergeCand-th element mergeCandList[numCurrMergeCand] of the merge candidate list (step S2310 of fig. 30). The index hMvpIdx is then incremented by 1 (step S2302 in fig. 30), and the repetition processing of steps S2302 to S2311 in fig. 30 continues.

The history merge candidate derivation process is completed once all elements in the history prediction motion vector candidate list have been checked or merge candidates have been added to all elements in the merge candidate list.
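The flow of fig. 30 described above can be sketched in Python as follows. This is only an illustrative sketch: the container MergeCand and the function name are hypothetical, the lists are plain Python lists, and the step numbers in the comments follow the description above rather than any reference implementation.

```python
from typing import List, NamedTuple, Tuple


class MergeCand(NamedTuple):
    """Hypothetical container for the three compared elements."""
    inter_pred_mode: str              # e.g. "L0", "L1" or "BI"
    ref_idx: Tuple[int, int]          # reference indices for L0 and L1
    mv: Tuple[Tuple[int, int], ...]   # motion vectors for L0 and L1


def derive_history_merge_candidates(
    merge_cand_list: List[MergeCand],
    hmvp_cand_list: List[MergeCand],
    max_num_merge_cand: int,
) -> List[MergeCand]:
    """Append history candidates to merge_cand_list in place and return it."""
    num_hmvp_cand = len(hmvp_cand_list)
    # S2301: initialization.
    num_orig_merge_cand = len(merge_cand_list)
    is_pruned = [False] * num_orig_merge_cand

    # S2302 to S2311: walk the history list from its last element backwards.
    for h_mvp_idx in range(1, num_hmvp_cand + 1):
        # S2303: stop once numCurrMergeCand exceeds MaxNumMergeCand - 1.
        if len(merge_cand_list) > max_num_merge_cand - 1:
            break
        hist = hmvp_cand_list[num_hmvp_cand - h_mvp_idx]
        same_motion = False  # S2304
        # S2305 to S2308: compare only with the originally registered entries.
        for i in range(num_orig_merge_cand):
            if not is_pruned[i] and hist == merge_cand_list[i]:  # S2306
                same_motion = True   # S2307
                is_pruned[i] = True
        if not same_motion:  # S2309
            merge_cand_list.append(hist)  # S2310
    return merge_cand_list
```

Because isPruned[i] is set once a match is found, each originally registered merge candidate is used to reject at most one history candidate, and history candidates are never compared with one another.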

< motion Compensation prediction Process >

The motion compensation prediction unit 306 acquires the position and size of the block currently subject to prediction processing during encoding, and acquires inter prediction information from the inter prediction mode determination unit 305. It derives the reference index and the motion vector from the acquired inter prediction information, acquires from the reference picture specified by the reference index in the decoded image memory 104 the image signal at the position shifted by the amount of the motion vector from the same position as the image signal of the prediction block, and then generates a prediction signal.
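As a minimal illustration of fetching the shifted image signal, the following Python sketch copies a block from the reference picture selected by the reference index. It rests on simplifying assumptions that are not stated in the text: the motion vector is in whole-pixel units (no fractional interpolation), pictures are lists of rows, and positions outside the picture are clamped to the nearest edge sample. The function name is hypothetical.

```python
from typing import List, Tuple

Picture = List[List[int]]


def motion_compensate(
    ref_pictures: List[Picture],
    ref_idx: int,
    x0: int,
    y0: int,
    width: int,
    height: int,
    mv: Tuple[int, int],
) -> Picture:
    """Return the width x height block at (x0, y0) shifted by mv."""
    ref = ref_pictures[ref_idx]          # picture specified by the reference index
    pic_h, pic_w = len(ref), len(ref[0])
    mvx, mvy = mv
    block = []
    for y in range(height):
        # Assumption: clamp coordinates that fall outside the picture.
        ry = min(max(y0 + y + mvy, 0), pic_h - 1)
        row = []
        for x in range(width):
            rx = min(max(x0 + x + mvx, 0), pic_w - 1)
            row.append(ref[ry][rx])
        block.append(row)
    return block
```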

When the inter prediction mode is prediction from a single reference picture, such as L0 prediction or L1 prediction, the prediction signal obtained from the one reference picture is used as the motion compensation prediction signal. When the inter prediction mode is prediction from two reference pictures, such as BI prediction, the signal obtained by weighted-averaging the prediction signals obtained from the two reference pictures is used as the motion compensation prediction signal. The motion compensation prediction signal is supplied to the prediction method determination unit 105. Here, the weighted-average ratio for bi-prediction is set to 1:1, but other ratios may be used. For example, the weighting ratio may be made larger as the picture interval between the picture to be predicted and the reference picture becomes smaller. The weighting ratio may also be calculated using a table that associates combinations of picture intervals with weighting ratios.
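The weighted averaging for bi-prediction can be sketched as follows. The rounding rule (round to nearest by adding half of the weight sum) and the helper weights_from_intervals, which simply swaps the two picture intervals so that the closer reference picture receives the larger weight, are assumptions made for illustration; the text only states that the ratio is 1:1 by default and may depend on the picture interval.

```python
from typing import List, Tuple

Block = List[List[int]]


def weights_from_intervals(interval_l0: int, interval_l1: int) -> Tuple[int, int]:
    """Hypothetical rule: the closer reference picture gets the larger weight.

    Both picture intervals are assumed to be non-zero.
    """
    return abs(interval_l1), abs(interval_l0)


def bi_predict(pred_l0: Block, pred_l1: Block, w0: int = 1, w1: int = 1) -> Block:
    """Weighted average of two prediction blocks with round-to-nearest."""
    total = w0 + w1
    return [
        [(w0 * a + w1 * b + total // 2) // total for a, b in zip(row0, row1)]
        for row0, row1 in zip(pred_l0, pred_l1)
    ]
```

With the default weights the result is the plain 1:1 average; passing the output of weights_from_intervals(1, 3) gives a 3:1 ratio in favor of the L0 reference picture, which is one interval away.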

The motion compensation prediction unit 406 has the same function as the motion compensation prediction unit 306 on the encoding side. The motion compensation prediction unit 406 acquires inter prediction information from the normal prediction motion vector mode derivation unit 401, the normal merge mode derivation unit 402, the sub-block prediction motion vector mode derivation unit 403, and the sub-block merge mode derivation unit 404 through the switch 408. The motion compensation prediction unit 406 supplies the obtained motion compensation prediction signal to the decoded image signal superimposing unit 207.

< concerning inter prediction mode >

Prediction from a single reference picture is defined as single prediction. In single prediction, prediction uses one reference picture registered in either of the two reference lists L0 and L1, as in L0 prediction or L1 prediction.

Fig. 32 shows a case where the L0 prediction reference picture (RefL0Pic) in single prediction is at a time before the processing target picture (CurPic). Fig. 33 shows a case where the L0 prediction reference picture in single prediction is at a time after the processing target picture. Similarly, single prediction can be performed by replacing the L0 prediction reference picture in fig. 32 and 33 with the L1 prediction reference picture (RefL1Pic).

Prediction from two reference pictures is defined as bi-prediction. Bi-prediction is expressed as BI prediction, which uses both L0 prediction and L1 prediction. Fig. 34 shows a case where the L0 prediction reference picture in bi-prediction is at a time before the processing target picture and the L1 prediction reference picture is at a time after the processing target picture. Fig. 35 shows a case where the L0 prediction reference picture and the L1 prediction reference picture in bi-prediction are both at times before the processing target picture. Fig. 36 shows a case where the L0 prediction reference picture and the L1 prediction reference picture in bi-prediction are both at times after the processing target picture.

In this way, the relationship between the L0/L1 prediction type and time is not restricted: L0 is not limited to the past direction, and L1 is not limited to the future direction. In addition, in the case of bi-prediction, L0 prediction and L1 prediction may each be performed using the same reference picture. Further, whether motion compensation prediction is performed as single prediction or bi-prediction is determined based on information (e.g., flags) indicating whether L0 prediction is used and whether L1 prediction is used.
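The determination from the two usage flags can be sketched as below. The flag names pred_flag_l0 and pred_flag_l1 and the returned mode strings are hypothetical labels chosen for illustration; the text only says that information such as flags indicates whether L0 prediction and L1 prediction are used.

```python
def inter_pred_mode(pred_flag_l0: bool, pred_flag_l1: bool) -> str:
    """Map the two usage flags to an inter prediction mode label."""
    if pred_flag_l0 and pred_flag_l1:
        return "BI"   # bi-prediction: both L0 and L1 prediction are used
    if pred_flag_l0:
        return "L0"   # single prediction from list L0
    if pred_flag_l1:
        return "L1"   # single prediction from list L1
    raise ValueError("neither L0 nor L1 prediction is used")
```

Note that the mode says nothing about temporal direction: an "L0" block may reference a picture after the processing target picture, as in fig. 33.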

< reference index >

In the embodiment of the present invention, in order to improve the accuracy of motion compensation prediction, an optimal reference picture can be selected from among a plurality of reference pictures in motion compensation prediction. Accordingly, the reference picture used in motion compensation prediction is specified by a reference index, and the reference index is encoded into the bitstream together with the differential motion vector.

< motion Compensation Process based on usual prediction motion vector mode >

As also shown in the inter prediction unit 102 on the encoding side of fig. 16, when the inter prediction information based on the normal prediction motion vector mode derivation unit 301 is selected in the inter prediction mode determination unit 305, the motion compensation prediction unit 306 acquires the inter prediction information from the inter prediction mode determination unit 305, derives the inter prediction mode, the reference index, and the motion vector of the block currently being processed, and generates a motion compensation prediction signal. The generated motion compensation prediction signal is supplied to the prediction method determination unit 105.

Similarly, as also shown in the inter prediction unit 203 on the decoding side in fig. 22, when the switch 408 is connected to the normal prediction motion vector mode derivation unit 401 during decoding, the motion compensation prediction unit 406 acquires inter prediction information based on the normal prediction motion vector mode derivation unit 401, derives the inter prediction mode, the reference index, and the motion vector of the block currently being processed, and generates a motion compensation prediction signal. The generated motion compensation prediction signal is supplied to the decoded image signal superimposing unit 207.

< motion Compensation Process based on usual merge mode >

As also shown in the inter prediction unit 102 on the encoding side of fig. 16, when the inter prediction information based on the normal merge mode derivation unit 302 is selected in the inter prediction mode determination unit 305, the motion compensation prediction unit 306 acquires the inter prediction information from the inter prediction mode determination unit 305, derives the inter prediction mode, the reference index, and the motion vector of the block to be currently processed, and generates a motion compensation prediction signal. The generated motion compensation prediction signal is supplied to the prediction method determination unit 105.

Similarly, as also shown in the inter prediction unit 203 on the decoding side in fig. 22, when the switch 408 is connected to the normal merge mode derivation unit 402 during decoding, the motion compensation prediction unit 406 acquires inter prediction information based on the normal merge mode derivation unit 402, derives an inter prediction mode, a reference index, and a motion vector of a block to be currently processed, and generates a motion compensation prediction signal. The generated motion-compensated prediction signal is supplied to the decoded image signal superimposing section 207.

< motion Compensation Process based on sub-Block prediction motion vector mode >

As also shown in the inter prediction unit 102 on the encoding side of fig. 16, when the inter prediction information based on the sub-block prediction motion vector mode derivation unit 303 is selected in the inter prediction mode determination unit 305, the motion compensation prediction unit 306 acquires the inter prediction information from the inter prediction mode determination unit 305, derives the inter prediction mode, the reference index, and the motion vector of the block currently being processed, and generates a motion compensation prediction signal. The generated motion compensation prediction signal is supplied to the prediction method determination unit 105.

Similarly, as also shown in the inter prediction unit 203 on the decoding side in fig. 22, when the switch 408 is connected to the sub-block prediction motion vector mode derivation unit 403 during decoding, the motion compensation prediction unit 406 acquires inter prediction information based on the sub-block prediction motion vector mode derivation unit 403, derives the inter prediction mode, the reference index, and the motion vector of the block currently being processed, and generates a motion compensation prediction signal. The generated motion compensation prediction signal is supplied to the decoded image signal superimposing unit 207.

< motion Compensation Process based on sub-Block merge mode >

As also shown in the inter prediction unit 102 on the encoding side of fig. 16, when the inter prediction information based on the sub-block merge mode deriving unit 304 is selected in the inter prediction mode determining unit 305, the motion compensation predicting unit 306 obtains the inter prediction information from the inter prediction mode determining unit 305, derives the inter prediction mode, the reference index, and the motion vector of the block to be currently processed, and generates a motion compensation prediction signal. The generated motion compensation prediction signal is supplied to the prediction method determination unit 105.

Similarly, as also shown in the inter prediction unit 203 on the decoding side in fig. 22, when the switch 408 is connected to the sub-block merge mode deriving unit 404 during decoding, the motion compensation predicting unit 406 obtains inter prediction information based on the sub-block merge mode deriving unit 404, derives an inter prediction mode, a reference index, and a motion vector of a block to be currently processed, and generates a motion compensation prediction signal. The generated motion-compensated prediction signal is supplied to the decoded image signal superimposing section 207.

< motion Compensation Process based on affine transformation prediction >

In the normal prediction motion vector mode and the normal merge mode, affine model-based motion compensation can be used depending on the following flags. In the encoding process, the inter prediction conditions decided by the inter prediction mode determination unit 305 are reflected in these flags, which are encoded into the bitstream. In the decoding process, whether to perform affine model-based motion compensation is determined based on these flags in the bitstream.

The sps_affine_enabled_flag indicates whether affine model-based motion compensation can be used in inter prediction. If sps_affine_enabled_flag is 0, affine model-based motion compensation is disabled in units of sequences, and inter_affine_flag and cu_affine_type_flag are not transmitted in the CU (coding block) syntax of the coded video sequence. If sps_affine_enabled_flag is 1, affine model-based motion compensation can be used in the coded video sequence.

The sps_affine_type_flag indicates whether motion compensation based on a six-parameter affine model can be used in inter prediction. If sps_affine_type_flag is 0, motion compensation based on a six-parameter affine model is disabled, and cu_affine_type_flag is not transmitted in the CU syntax of the coded video sequence. If sps_affine_type_flag is 1, motion compensation based on a six-parameter affine model can be used in the coded video sequence. When sps_affine_type_flag is not present, it is treated as 0.

In the case of decoding a P slice or a B slice, if inter_affine_flag is 1 in a CU that is a current processing object, motion compensation prediction signals of the CU that is the current processing object are generated using affine model-based motion compensation. If inter_affine_flag is 0, the affine model is not used for the CU that is the current processing object. In the case where the inter_affine_flag is not present, it is set to 0.

In the case of decoding a P slice or a B slice, if the cu_affine_type_flag is 1 in the CU that is the current processing object, a motion compensation prediction signal of the CU that is the current processing object is generated using motion compensation based on the six-parameter affine model. If the cu_affine_type_flag is 0, a motion compensation prediction signal of the CU that is the current processing object is generated using motion compensation based on a four-parameter affine model.
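The way the four flags interact can be summarized by the following Python sketch. The function name and return labels are hypothetical, and absent flags are modeled as None. Treating an absent cu_affine_type_flag as 0 is an assumption; the text states the inferred value only for sps_affine_type_flag and inter_affine_flag.

```python
from typing import Optional


def affine_mode(
    sps_affine_enabled_flag: int,
    sps_affine_type_flag: Optional[int] = None,
    inter_affine_flag: Optional[int] = None,
    cu_affine_type_flag: Optional[int] = None,
) -> str:
    """Return "none", "four-parameter" or "six-parameter" for the current CU."""
    # Absent flags are treated as 0.
    sps_type = sps_affine_type_flag or 0
    inter_affine = inter_affine_flag or 0
    cu_type = cu_affine_type_flag or 0  # assumption: inferred as 0 when absent

    if sps_affine_enabled_flag == 0:
        # Neither CU-level flag is transmitted, so affine is never used.
        return "none"
    if sps_type == 0:
        # cu_affine_type_flag is not transmitted in this sequence.
        cu_type = 0
    if inter_affine == 0:
        return "none"
    return "six-parameter" if cu_type == 1 else "four-parameter"
```

For example, affine_mode(1, 0, 1, 1) yields "four-parameter", because a sequence with sps_affine_type_flag equal to 0 never carries cu_affine_type_flag, so the value passed for it is ignored.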

In affine model-based motion compensation, the reference index and the motion vector are derived in units of sub-blocks, so the motion compensation prediction signal is generated in units of sub-blocks using the reference index and the motion vector of the sub-block being processed.

The four-parameter affine model derives the motion vector of each sub-block from four parameters, namely the horizontal and vertical components of the motion vectors of two control points, and performs motion compensation in units of sub-blocks.
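A sketch of such a derivation is given below, using the commonly used form of the four-parameter model. Several points are assumptions not stated in the text: the two control points are taken to be the top-left and top-right corners of the block, motion vectors are evaluated at sub-block centers, the sub-block size defaults to 4, and floating-point arithmetic replaces the fixed-point arithmetic a real codec would use.

```python
from typing import List, Tuple

MV = Tuple[float, float]


def affine_4param_mv(cp0: MV, cp1: MV, width: int, x: float, y: float) -> MV:
    """Motion vector at position (x, y) relative to the block's top-left corner.

    cp0 and cp1 are the control-point motion vectors, assumed to sit at the
    top-left and top-right corners of a block that is `width` pixels wide.
    """
    a = (cp1[0] - cp0[0]) / width   # scaling term
    b = (cp1[1] - cp0[1]) / width   # rotation term
    return (a * x - b * y + cp0[0], b * x + a * y + cp0[1])


def subblock_mvs(cp0: MV, cp1: MV, width: int, height: int,
                 sb_size: int = 4) -> List[List[MV]]:
    """One motion vector per sub-block, evaluated at the sub-block center."""
    return [
        [affine_4param_mv(cp0, cp1, width, x + sb_size / 2, y + sb_size / 2)
         for x in range(0, width, sb_size)]
        for y in range(0, height, sb_size)
    ]
```

When both control points carry the same motion vector, every sub-block receives that vector and the model reduces to ordinary translational motion compensation.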

In the present embodiment, in the derivation of the predicted motion vector candidate list in the normal predicted motion vector mode, candidates are added in the order of spatial predicted motion vector candidates, historical predicted motion vector candidates, and temporal predicted motion vector candidates. By adopting such a structure, the following effects can be obtained.

1. In the history prediction motion vector candidate derivation process, each element in the history prediction motion vector candidate list is checked against the elements already added to the prediction motion vector candidate list, and is added to the prediction motion vector candidate list only if it differs from them. This ensures that the elements of the prediction motion vector candidate list differ from one another. Further, the spatial prediction motion vector candidates, which use spatial correlation, and the history prediction motion vector candidates, which use the processing history, have different characteristics. Therefore, it becomes more likely that a plurality of prediction motion vector candidates with different characteristics can be provided, and the coding efficiency can be improved.

2. The history prediction motion vector candidate derivation process performs the same-element check against the spatial prediction motion vector candidates, but not against the temporal prediction motion vector candidates. The number of same-element checks is therefore limited, which reduces the processing load associated with deriving the prediction motion vector candidate list.

3. The temporal prediction motion vector candidate derivation process does not perform the same-element check against the spatial prediction motion vector candidates or the history prediction motion vector candidates. Thus, the history prediction motion vector candidates and the temporal prediction motion vector candidates can be derived independently, and throughput can be improved by parallel processing.
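The candidate ordering and the limited checking described in the three points above can be sketched as follows. This is an illustration under assumptions: motion vectors are plain tuples, the maximum list size max_cands is a hypothetical parameter, the history candidates are visited in the order supplied by the caller, and each history candidate is compared with the spatial candidates only.

```python
from typing import List, Tuple

MV = Tuple[int, int]


def build_mvp_list(spatial: List[MV], history: List[MV], temporal: List[MV],
                   max_cands: int = 2) -> List[MV]:
    """Spatial, then history (checked against spatial), then temporal."""
    cand_list = list(spatial[:max_cands])
    num_spatial = len(cand_list)

    for hist in history:
        if len(cand_list) >= max_cands:
            break
        # Same-element check against the spatial candidates only.
        if hist not in cand_list[:num_spatial]:
            cand_list.append(hist)

    for temp in temporal:
        if len(cand_list) >= max_cands:
            break
        # No same-element check for temporal candidates.
        cand_list.append(temp)
    return cand_list
```

Because the temporal step never inspects the list contents, a temporal candidate may duplicate an earlier entry; that is the price paid for being able to derive it independently of the other two candidate types.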

(second embodiment)

In the second embodiment, in the generation of the predicted motion vector candidate list in the normal predicted motion vector mode, the temporal predicted motion vector candidates are not derived, but candidates are added in the order of spatial predicted motion vector candidates and historical predicted motion vector candidates.

Fig. 38 is a block diagram showing the detailed configuration of the normal prediction motion vector mode derivation unit 301 in fig. 16 in the second embodiment.

Fig. 39 is a block diagram showing the detailed configuration of the normal prediction motion vector mode derivation unit 401 in fig. 22 in the second embodiment.

In the second embodiment, the prediction motion vector candidate list is generated without deriving temporal prediction motion vector candidates, so the processing load can be reduced. In addition, in the normal prediction motion vector mode, the prediction motion vector candidate list is sufficiently filled by the history prediction motion vector candidates, so the coding efficiency is not lowered either.

(third embodiment)

In the third embodiment, in the generation of the prediction motion vector candidate list in the normal prediction motion vector mode, candidates are added in the order of spatial prediction motion vector candidates, temporal prediction motion vector candidates, and history prediction motion vector candidates. Here, the history prediction motion vector candidate derivation process does not perform the same-element check against the spatial prediction motion vector candidates or the temporal prediction motion vector candidates.

Fig. 40 is a block diagram showing the detailed configuration of the normal prediction motion vector mode derivation unit 301 in fig. 16 in the third embodiment.

Fig. 41 is a block diagram showing the detailed configuration of the normal prediction motion vector mode derivation unit 401 in fig. 22 in the third embodiment.

In the third embodiment, as in the first embodiment, the number of same-element checks can be limited, which reduces the processing load associated with deriving the prediction motion vector candidate list. Further, temporal prediction motion vector candidates are added to the prediction motion vector candidate list ahead of the history prediction motion vector candidates. Temporal prediction motion vector candidates, which have high prediction efficiency, are thus prioritized over history prediction motion vector candidates, and no same-element check is performed between candidates of different types (spatial, temporal, and history prediction motion vector candidates). As a result, a prediction motion vector candidate list with high coding efficiency can be generated while the processing load is suppressed.
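To make the difference in checking effort concrete, the sketch below builds a candidate list from configurable stages and counts how many same-element comparisons are made. The stage description format, the function name, and the maximum list size are all hypothetical, and checks among spatial candidates themselves are ignored; the purpose is only to contrast the ordering of the first embodiment with that of the third.

```python
from typing import List, Sequence, Tuple

MV = Tuple[int, int]
Stage = Tuple[str, Sequence[MV], Sequence[str]]  # (kind, candidates, kinds to check against)


def build_candidate_list(stages: Sequence[Stage],
                         max_cands: int) -> Tuple[List[MV], int]:
    """Return the candidate list and the number of comparisons performed."""
    entries: List[Tuple[str, MV]] = []
    comparisons = 0
    for kind, candidates, check_kinds in stages:
        for mv in candidates:
            if len(entries) >= max_cands:
                return [m for _, m in entries], comparisons
            duplicate = False
            for existing_kind, existing_mv in entries:
                if existing_kind in check_kinds:
                    comparisons += 1
                    if existing_mv == mv:
                        duplicate = True
                        break
            if not duplicate:
                entries.append((kind, mv))
    return [m for _, m in entries], comparisons


spatial, temporal, history = [(1, 0)], [(5, 5)], [(1, 0), (3, 3)]

# First embodiment: spatial, history (checked against spatial), temporal.
first = build_candidate_list(
    [("spatial", spatial, ()), ("history", history, ("spatial",)),
     ("temporal", temporal, ())], 3)

# Third embodiment: spatial, temporal, history, with no checks at all.
third = build_candidate_list(
    [("spatial", spatial, ()), ("temporal", temporal, ()),
     ("history", history, ())], 3)
```

With these inputs the first configuration performs two comparisons and produces three distinct candidates, while the third performs none and may admit a duplicate.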

Any plurality of the embodiments described above may be combined.

In all of the embodiments described above, the bitstream output by the image encoding apparatus has a specific data format so as to be able to be decoded according to the encoding method used in the embodiment. In addition, the image decoding device corresponding to the image encoding device can decode the bit stream of the specific data format.

When a wired or wireless network is used to exchange a bit stream between the image encoding device and the image decoding device, the bit stream may be converted into a data format suited to the transmission form of the communication line before being transmitted. In this case, the following are provided: a transmitting device that converts the bit stream output from the image encoding device into encoded data in a data format suited to the transmission form of the communication line and transmits the encoded data to the network; and a receiving device that receives the encoded data from the network, restores it to a bit stream, and supplies the bit stream to the image decoding device. The transmitting device includes: a memory that buffers the bit stream output from the image encoding device; a packet processing unit that packetizes the bit stream; and a transmitting unit that transmits the packetized encoded data via the network. The receiving device includes: a receiving unit that receives the packetized encoded data via the network; a memory that buffers the received encoded data; and a packet processing unit that performs packet processing on the encoded data to generate a bit stream and supplies the bit stream to the image decoding device.
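The packet processing on both sides can be illustrated with the following Python sketch. The packet layout (a 4-byte sequence number and a 2-byte payload length in network byte order), the default payload size, and the function names are all assumptions for illustration; the text does not specify any particular transmission format.

```python
import struct
from typing import Iterable, List

# Hypothetical header: sequence number (uint32) and payload length (uint16).
HEADER = struct.Struct("!IH")


def packetize(bitstream: bytes, payload_size: int = 1400) -> List[bytes]:
    """Transmitting side: split the buffered bit stream into packets."""
    packets = []
    for seq, offset in enumerate(range(0, len(bitstream), payload_size)):
        payload = bitstream[offset:offset + payload_size]
        packets.append(HEADER.pack(seq, len(payload)) + payload)
    return packets


def depacketize(packets: Iterable[bytes]) -> bytes:
    """Receiving side: reorder buffered packets and restore the bit stream."""
    parsed = []
    for packet in packets:
        seq, length = HEADER.unpack_from(packet)
        parsed.append((seq, packet[HEADER.size:HEADER.size + length]))
    parsed.sort(key=lambda item: item[0])
    return b"".join(payload for _, payload in parsed)
```

The sequence number lets the receiving side restore the original order even if packets arrive out of order; loss handling is outside the scope of this sketch.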

In addition, a display unit for displaying the image decoded by the image decoding device may be added to the configuration as a display device. In this case, the display section reads out the decoded image signal generated by the decoded image signal superimposing section 207 and stored in the decoded image memory 208, and displays it on the screen.

An imaging unit may also be added to the configuration so that the device serves as an imaging device, with the captured image being input to the image encoding device. In this case, the imaging unit inputs the captured image signal to the block dividing unit 101.

Fig. 37 shows an example of a hardware configuration of the codec device according to the present embodiment. The codec device includes the configurations of the image encoding device and the image decoding device according to the embodiment of the present invention. The codec device 9000 includes a CPU 9001, a codec IC 9002, an I/O interface 9003, a memory 9004, an optical disc drive 9005, a network interface 9006, and a video interface 9009, which are connected to one another via a bus 9010.

The image encoding portion 9007 and the image decoding portion 9008 are typically mounted as a codec IC 9002. The image encoding process of the image encoding device according to the embodiment of the present invention is performed by the image encoding unit 9007, and the image decoding process of the image decoding device according to the embodiment of the present invention is performed by the image decoding unit 9008. The I/O interface 9003 is realized by a USB interface, for example, and is connected to an external keyboard 9104, a mouse 9105, and the like. The CPU 9001 controls the codec device 9000 based on a user operation input through the I/O interface 9003 to perform an action desired by the user. Examples of operations performed by the user through the keyboard 9104, the mouse 9105, and the like include selection of a function to perform encoding or decoding, setting of encoding quality, input/output destinations of a bitstream, input/output destinations of an image, and the like.

When the user wishes to reproduce an image recorded on the disc recording medium 9100, the optical disc drive 9005 reads out a bit stream from the inserted disc recording medium 9100 and transmits the read-out bit stream to the image decoding unit 9008 of the codec IC 9002 via the bus 9010. The image decoding unit 9008 performs the image decoding processing of the image decoding device according to the embodiment of the present invention on the input bit stream, and transmits the decoded image to the external monitor 9103 via the video interface 9009. The codec device 9000 has a network interface 9006, and is connectable to an external distribution server 9106 and a mobile terminal 9107 via a network 9101. When the user wishes to reproduce an image recorded on the distribution server 9106 or the mobile terminal 9107 instead of an image recorded on the disc recording medium 9100, the network interface 9006 acquires a bit stream from the network 9101 instead of the bit stream being read out from the inserted disc recording medium 9100. In addition, when the user wishes to reproduce an image recorded in the memory 9004, the image decoding processing of the image decoding device according to the embodiment of the present invention is performed on the bit stream recorded in the memory 9004.

In the case where the user desires to perform an operation of encoding an image captured by the external camera 9102 and recording the encoded image in the memory 9004, the video interface 9009 inputs an image from the camera 9102 and transmits the image to the image encoding unit 9007 of the codec IC9002 via the bus 9010. The image encoding unit 9007 performs image encoding processing in the image encoding device according to the embodiment of the present invention on an image input via the video interface 9009, and generates a bit stream. The bit stream is then sent to the memory 9004 over the bus 9010. When the user wishes to record a bit stream on the disc recording medium 9100 instead of in the memory 9004, the optical disc drive 9005 performs writing of the bit stream to the inserted disc recording medium 9100.

A hardware configuration having an image encoding device without an image decoding device, or a hardware configuration having an image decoding device without an image encoding device may be realized. Such a hardware configuration is realized by, for example, replacing the codec IC9002 with the image encoding unit 9007 or the image decoding unit 9008, respectively.

The processing related to the above-described encoding and decoding may of course be implemented as transmission, storage, and reception devices using hardware, and may also be implemented by firmware stored in a ROM (read only memory), a flash memory, or the like, or by software running on a computer or the like. The firmware program and the software program may be provided recorded on a recording medium readable by a computer or the like, may be provided from a server via a wired or wireless network, or may be provided as a data broadcast of terrestrial or satellite digital broadcasting.

The present invention has been described above based on the embodiments. The embodiments are illustrative, and it will be understood by those skilled in the art that various modifications are possible in the combinations of the constituent elements and processing steps, and that such modifications are also within the scope of the present invention.

Symbol description

100 image encoding device, 101 block dividing unit, 102 inter prediction unit, 103 intra prediction unit, 104 decoded image memory, 105 prediction method determination unit, 106 residual generating unit, 107 orthogonal transformation/quantization unit, 108 bit string encoding unit, 109 inverse quantization/inverse orthogonal transformation unit, 110 decoded image signal superimposing unit, 111 encoded information storage memory, 200 image decoding device, 201 bit string decoding unit, 202 block dividing unit, 203 inter prediction unit, 204 intra prediction unit, 205 encoded information storage memory, 206 inverse quantization/inverse orthogonal transformation unit, 207 decoded image signal superimposing unit, 208 decoded image memory.

Claims

1. A moving picture decoding apparatus, comprising:

a spatial motion information candidate deriving unit that derives a spatial motion information candidate from motion information of a block spatially close to the block to be decoded;

a temporal motion information candidate deriving unit that derives a temporal motion information candidate from motion information of a block that is temporally close to the block to be decoded; and

a historical motion information candidate derivation unit for deriving a historical motion information candidate from a memory for holding motion information of the decoded block,

the historical motion information candidates are compared with the spatial motion information candidates for motion information, but not with the temporal motion information candidates for motion information,

the temporal motion information candidate is registered in a motion information candidate list without a comparison of motion information with the spatial motion information candidate, and the historical motion information candidate is registered in the motion information candidate list if it is not identical to the spatial motion information candidate.

2. A moving picture decoding method in a moving picture decoding apparatus, the moving picture decoding method characterized by comprising the steps of:

deriving spatial motion information candidates from motion information of a block spatially close to the decoding object block;

deriving temporal motion information candidates from motion information of a block temporally close to the decoding object block; and

historical motion information candidates are derived from a memory that holds motion information for the decoded block,

3. A moving picture encoding apparatus, comprising:

a spatial motion information candidate deriving unit that derives a spatial motion information candidate from motion information of a block spatially close to the block to be encoded;

a temporal motion information candidate deriving unit that derives a temporal motion information candidate from motion information of a block that is temporally close to the block to be encoded; and

a historical motion information candidate derivation unit for deriving a historical motion information candidate from a memory for holding motion information of the encoded block,

4. A moving picture encoding method in a moving picture encoding apparatus, the moving picture encoding method characterized by comprising the steps of:

deriving spatial motion information candidates from motion information of a block spatially close to the encoding target block;

deriving temporal motion information candidates from motion information of a block temporally close to the block to be encoded; and

a step of deriving historical motion information candidates from a memory holding motion information of the encoded block,