WO2011013253A1 - Prediction-signal producing device using geometric transformation motion-compensation prediction, time-varying image encoding device, and time-varying image decoding device - Google Patents

Prediction-signal producing device using geometric transformation motion-compensation prediction, time-varying image encoding device, and time-varying image decoding device Download PDF

Info

Publication number
WO2011013253A1
WO2011013253A1 PCT/JP2009/063692 JP2009063692W
Authority
WO
WIPO (PCT)
Prior art keywords
geometric transformation
prediction
transformation parameter
unit
block
Prior art date
Application number
PCT/JP2009/063692
Other languages
French (fr)
Japanese (ja)
Inventor
Akiyuki Tanizawa
Taichiro Shiodera
Takeshi Chujoh
Original Assignee
Kabushiki Kaisha Toshiba (Toshiba Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kabushiki Kaisha Toshiba (Toshiba Corporation)
Priority to PCT/JP2009/063692 priority Critical patent/WO2011013253A1/en
Publication of WO2011013253A1 publication Critical patent/WO2011013253A1/en

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/02 Affine transformations
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/223 Analysis of motion using block-matching
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 ... using adaptive coding
    • H04N19/102 ... characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/105 Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/169 ... characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 ... the unit being an image region, e.g. an object
    • H04N19/174 ... the region being a slice, e.g. a line of blocks or a group of blocks
    • H04N19/176 ... the region being a block, e.g. a macroblock
    • H04N19/189 ... characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
    • H04N19/196 ... being specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters
    • H04N19/46 Embedding additional information in the video signal during the compression process
    • H04N19/50 ... using predictive coding
    • H04N19/503 ... involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/513 Processing of motion vectors
    • H04N19/60 ... using transform coding
    • H04N19/61 ... using transform coding in combination with predictive coding
    • H04N19/70 ... characterised by syntax aspects related to video coding, e.g. related to compression standards

Definitions

  • The present invention relates to a prediction signal generation apparatus that derives geometric transformation parameters using motion information of neighboring blocks and the prediction target block and performs geometric transformation prediction processing for the prediction target block based on the derived parameters, and to a video encoding apparatus and a video decoding apparatus using it.
  • A moving picture coding method that greatly improves coding efficiency, ITU-T Rec. H.264 / ISO/IEC 14496-10 (hereinafter referred to as "H.264"), has been jointly developed by ITU-T and ISO/IEC.
  • In H.264, prediction processing, transform processing, and entropy coding processing are performed in units of rectangular blocks (for example, 16×16 pixels or 8×8 pixels). For this reason, when an object that cannot be expressed by a rectangular block is to be predicted, H.264 increases prediction efficiency by selecting a smaller prediction block (for example, 4×4 pixels).
  • Methods for effectively predicting such objects include a method of preparing a plurality of prediction patterns in a rectangular block, and a method of applying motion compensation using affine transformation to a deformed object.
  • Japanese Patent Application Laid-Open No. 2007-312397 discloses a video frame transfer method that models object motion as an affine transformation and calculates the optimum affine transformation parameters for each prediction target block, thereby taking the enlargement, reduction, and rotation of the object into consideration.
  • The method described in Kordasiewicz et al. estimates affine transformation parameters using the motion vectors of eight adjacent blocks surrounding the prediction target pixel block (above, below, left, right, and the diagonals) together with a motion vector calculated from the prediction target block itself; obtaining the optimal motion vectors requires re-encoding each frame a plurality of times. On the other hand, when the motion vectors are calculated only once per frame from the original image, this conventional method is not optimal from the viewpoint of code amount and coding distortion, and there is a problem that coding efficiency decreases.
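  • As an illustrative sketch (not taken from the patent text), affine motion-compensated prediction of a single block can be written as follows; the 6-parameter layout (a, b, c, d, e, f), the nearest-neighbour sampling, and the function name are assumptions for illustration:

```python
import numpy as np

def affine_predict(ref, params, block_xy, block_size):
    """Generate a prediction block by warping the reference frame with a
    6-parameter affine model (a, b, c, d, e, f):
        x' = a*x + b*y + c,   y' = d*x + e*y + f
    Nearest-neighbour sampling; out-of-frame samples are clamped to the edge."""
    a, b, c, d, e, f = params
    bx, by = block_xy
    h, w = ref.shape
    pred = np.zeros((block_size, block_size), dtype=ref.dtype)
    for y in range(block_size):
        for x in range(block_size):
            sx = int(round(a * (bx + x) + b * (by + y) + c))
            sy = int(round(d * (bx + x) + e * (by + y) + f))
            sx = min(max(sx, 0), w - 1)   # clamp to frame
            sy = min(max(sy, 0), h - 1)
            pred[y, x] = ref[sy, sx]
    return pred
```

With the identity parameters (1, 0, 0, 0, 1, 0) this reproduces the co-located block; a pure translation corresponds to setting only c and f, which is how an ordinary motion vector embeds into the affine model.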
  • An object of the present invention is to provide a prediction signal generation apparatus, a video encoding apparatus, and a video decoding apparatus that reduce the motion detection processing needed to estimate the geometric transformation parameters used for geometric-transformation motion-compensated prediction, and that improve prediction efficiency without increasing the code amount.
  • A prediction signal generation apparatus according to one aspect comprises: a first setting unit that sets prediction selection information indicating whether to use a first geometric transformation parameter or a second geometric transformation parameter, each indicating information on the shape of an image under geometric transformation of a pixel block; an acquisition unit that acquires the motion information or the geometric transformation parameters of one or more second adjacent blocks, among a plurality of first adjacent blocks adjacent to one pixel block of the plurality of pixel blocks into which the image signal is divided, for which prediction signal generation has already been completed; a derivation unit that derives predicted geometric transformation parameters of the one pixel block from the geometric transformation parameters of the one or more second adjacent blocks; a second setting unit that derives and sets the first geometric transformation parameter by a predetermined method from a derived geometric transformation parameter value and the predicted geometric transformation parameters; a third setting unit that sets the second geometric transformation parameter based on the motion information of the one pixel block and of the one or more second adjacent blocks; and a generation unit that generates a prediction signal by applying, to the reference image signal, geometric transformation processing using whichever of the first and second geometric transformation parameters the prediction selection information indicates.
  • the prediction selection information indicating whether to use the first geometric transformation parameter or the second geometric transformation parameter is encoded using the prediction signal generation device described above.
  • a derived value of the geometric transformation parameter is derived from the first geometric transformation parameter and the predicted geometric transformation parameter by a predetermined method.
  • A video decoding apparatus according to another aspect decodes moving-image encoded data obtained by encoding an input image signal in units of a plurality of pixel blocks, performing the decoding process by a prescribed method. It comprises: a motion information acquisition unit that acquires, for one or more second adjacent blocks that have already been decoded among a plurality of first adjacent blocks adjacent to one pixel block of the plurality of pixel blocks into which the image signal is divided, the motion information or the geometric transformation parameters indicating information on the shape of an image under geometric transformation of the pixel block; a decoding unit that decodes selection information indicating whether to use the first geometric transformation parameter or the second geometric transformation parameter; a derivation unit that derives predicted geometric transformation parameters of the one pixel block from the geometric transformation parameters of the second adjacent blocks; and a setting unit that obtains the first geometric transformation parameter from the decoded derived value of the geometric transformation parameter and the predicted geometric transformation parameters.
  • the first and second embodiments relate to a video encoding device
  • the third and fourth embodiments relate to a video decoding device.
  • The video encoding apparatus described in the following embodiments divides each frame constituting an input image signal into a plurality of pixel blocks, performs encoding processing on the divided pixel blocks, and outputs a compression-encoded code string.
  • the moving picture coding apparatus 100 is connected to the coding control unit 114.
  • the subtractor 101 calculates a difference between the input image signal 115 and the prediction signal 206, and outputs a prediction error signal 116.
  • the output terminal of the subtractor 101 is connected to the transform / quantization unit 102.
  • The transform/quantization unit 102 includes, for example, an orthogonal transformer (discrete cosine transformer) and a quantizer; it performs an orthogonal transform (discrete cosine transform) on the prediction error signal 116 and quantizes it to produce transform coefficients 117.
  • the output terminal of the transform / quantization unit 102 is connected to the inverse quantization / inverse transform unit 103 and the entropy coding unit 112.
  • The inverse quantization/inverse transform unit 103 includes an inverse quantizer and an inverse orthogonal transformer (inverse discrete cosine transformer); it inversely quantizes the transform coefficients 117 and performs an inverse orthogonal transform to restore the decoded prediction error signal 118.
  • the output terminal of the inverse quantization / inverse transform unit 103 is connected to the adder 104.
  • the adder 104 adds the decoded prediction error signal 118 and the prediction signal 206 to generate a decoded image signal 119.
  • the output terminal of the adder 104 is connected to the reference image memory 105.
  • the reference image memory 105 stores the decoded image signal 119 as a reference image signal.
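  • The subtractor 101 / quantizer / inverse quantizer / adder 104 loop above can be sketched as follows; for brevity the orthogonal transform is omitted and the prediction error is quantized directly, so the step size QSTEP stands in for the real quantization parameter (an illustrative simplification, not the patent's method):

```python
import numpy as np

QSTEP = 8  # quantization step; illustrative stand-in for the quantization parameter

def encode_block(block, pred):
    """Subtractor 101 plus a toy quantizer of unit 102: the orthogonal transform
    is omitted, so the prediction error signal 116 is quantized directly."""
    err = block - pred                        # prediction error signal 116
    return np.round(err / QSTEP).astype(int)  # quantized coefficients 117

def decode_block(coeff, pred):
    """Inverse quantizer 103 plus adder 104: reconstruct decoded image signal 119."""
    return coeff * QSTEP + pred

block = np.array([[100, 104], [98, 101]])
pred = np.array([[96, 96], [96, 96]])
rec = decode_block(encode_block(block, pred), pred)
assert np.max(np.abs(rec - block)) <= QSTEP // 2
```

The reconstruction error is bounded by half the quantization step, which is why the decoder must run the same inverse quantization loop as the encoder's local decoder: both sides then hold an identical reference image signal 207.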
  • the output terminal of the reference image memory 105 is connected to the motion information search unit 106, the intra prediction signal generation device 107, and the inter prediction signal generation device 109.
  • the motion information search unit 106 uses the input image signal 115 and the reference image signal 207 to calculate motion information (motion vector) 210 suitable for the prediction target block.
  • the output terminal of the motion information search unit 106 is connected to the derived parameter derivation unit 108 and the inter prediction signal generation device 109.
  • the derivation parameter derivation unit 108 includes a predictive geometric transformation parameter derivation unit 601 and a converter 602.
  • The derivation parameter derivation unit 108 has a function of deriving a predicted geometric transformation parameter, described later, and a function of calculating derived parameters using the predicted geometric transformation parameter and the input motion information 210.
  • the output terminal of the derived parameter deriving unit 108 is connected to the inter prediction signal generation device 109 and the entropy coding unit 112.
  • The inter prediction signal generation device 109 has a function of generating a prediction signal 206 using the input motion information 210, derived parameter information 211, reference image signal 207, and prediction selection information 123.
  • the intra prediction signal generation device 107 has a function of performing intra prediction using the input reference image signal 207.
  • Output terminals of the intra prediction signal generation device 107 and the inter prediction signal generation device 109 are connected to terminals of the prediction separation switch 110, respectively.
  • the prediction selection unit 111 sets the prediction selection information 123 according to the prediction mode controlled by the encoding control unit 114.
  • the output terminal of the prediction selection unit 111 is connected to the inter prediction signal generation device 109, the prediction separation switch 110, and the entropy encoding unit 112.
  • the prediction separation switch 110 switches between the intra prediction signal generation device 107 and the inter prediction signal generation device 109 according to the prediction selection information 123 of the prediction selection unit 111.
  • the switching terminal of the prediction separation switch 110 is connected to the subtractor 101 and the adder 104, and introduces the prediction signal of the intra prediction signal generation device 107 or the inter prediction signal generation device 109 to the subtractor 101 and the adder 104.
  • the entropy encoding unit 112 includes an encoder and a multiplexer, and entropy-encodes and multiplexes the transform coefficient 117, the derived parameter information 211, and the prediction selection information 123.
  • the output terminal of the entropy encoding unit 112 is connected to the output buffer 113.
  • the output buffer 113 temporarily stores the multiplexed data and outputs it as encoded data 129 according to the output timing managed by the encoding control unit 114.
  • The moving image encoding apparatus 100 having the above configuration performs intra prediction (intraframe prediction) or inter prediction (interframe prediction) encoding processing on the input image signal 115 based on the encoding parameters input from the encoding control unit 114, generates the prediction signal 206, and outputs the encoded data 129. That is, an input image signal 115 of a moving image or a still image is divided into pixel blocks, for example macroblocks, and input to the moving image encoding apparatus 100.
  • One encoding processing unit of the input image signal may be either a frame or a field; in the present embodiment, an example in which a frame is used as one encoding processing unit will be described.
  • the moving picture encoding apparatus 100 performs encoding in a plurality of prediction modes with different block sizes and generation methods of the prediction signal 206.
  • The generation method of the prediction signal 206 is roughly divided into intra prediction (intraframe prediction), in which a prediction signal is generated only within the frame to be encoded, and inter prediction, which uses a plurality of temporally different reference frames. In the following, an example in which a prediction signal is generated using inter prediction will be described in detail.
  • The macroblock is set as the basic processing block size of the encoding process.
  • The macroblock is typically a 16×16-pixel block as shown in FIG. 3B, for example, but may be a 32×32-pixel or 8×8-pixel block unit.
  • the shape of the macroblock does not necessarily need to be a square lattice.
  • the encoding target block or macroblock of the input image signal 115 is simply referred to as a “prediction target block”.
  • the input image signal 115 is input to the subtractor 101.
  • the subtracter 101 further receives a prediction signal 206 corresponding to each prediction mode output from the prediction separation switch 110.
  • the subtractor 101 calculates a prediction error signal 116 obtained by subtracting the prediction signal 206 from the input image signal 115.
  • the prediction error signal 116 is input to the transform / quantization unit 102.
  • The transform/quantization unit 102 performs an orthogonal transform such as the discrete cosine transform (DCT) on the prediction error signal 116 to generate transform coefficients.
  • The transform in the transform/quantization unit 102 is not limited to the transform defined in H.264; a discrete sine transform, a wavelet transform, or component analysis may also be used.
  • the transform / quantization unit 102 quantizes the transform coefficient in accordance with quantization information represented by a quantization parameter, a quantization matrix, and the like given by the encoding control unit 114.
  • the transform / quantization unit 102 outputs the quantized transform coefficient 117 to the entropy coding unit 112 and also outputs it to the inverse quantization / inverse transform unit 103.
  • the entropy encoding unit 112 performs entropy encoding, for example, Huffman encoding or arithmetic encoding, on the quantized transform coefficient 117.
  • The entropy encoding unit 112 further entropy-encodes the various encoding parameters, including the prediction information output from the encoding control unit 114, that were used when the encoding target block was encoded. As a result, encoded data 129 is generated.
  • the encoding parameter is a parameter required for decoding prediction information, information on transform coefficients, information on quantization, and the like.
  • the encoding parameter of the prediction target block is held in an internal memory of the encoding control unit 114, and is used when the prediction target block is used as an adjacent block of another pixel block.
  • The encoded data 129 generated and multiplexed by the entropy encoding unit 112 is temporarily stored in the output buffer 113 and then output from the moving image encoding apparatus 100 according to the output timing managed by the encoding control unit 114.
  • the encoded data 129 is sent to, for example, a storage system (storage medium) or a transmission system (communication line) (not shown).
  • In the inverse quantization/inverse transform unit 103, quantization information corresponding to that used in the transform/quantization unit 102 is loaded from the internal memory of the encoding control unit 114, and inverse quantization processing is performed.
  • the quantization information is, for example, a parameter represented by a quantization parameter, a quantization matrix, or the like.
  • The inverse quantization/inverse transform unit 103 further reproduces the decoded prediction error signal 118 by performing an inverse orthogonal transform, such as the inverse discrete cosine transform (IDCT), on the inversely quantized transform coefficients.
  • the decoded prediction error signal 118 is input to the adder 104.
  • the adder 104 adds the decoded prediction error signal 118 and the prediction signal 206 output from the prediction separation switch 110 to generate a decoded image signal 119.
  • the decoded image signal 119 is a local decoded image signal.
  • the decoded image signal 119 is stored as the reference image signal 207 in the reference image memory 105.
  • the reference image signal 207 stored in the reference image memory 105 is output to the motion information search unit 106, the intra prediction signal generation device 107, the inter prediction signal generation device 109, etc., and is referred to when performing prediction.
  • the motion information search unit 106 uses the input image signal 115 and the reference image signal 207 to calculate motion information 210 suitable for the prediction target block.
  • the motion information may be represented by an affine transformation parameter, for example.
  • the motion information can be represented by a motion vector.
  • the motion information 210 may be, for example, a predicted value for predicting an affine transformation parameter with another affine transformation parameter or the like, or may be a predicted value for predicting a motion vector with another motion vector or the like.
  • In the present embodiment, motion information including geometric deformation between images is used as the motion information.
  • The motion information search unit 106 calculates the motion information 210 (affine transformation parameters and motion vectors) by performing a search, such as block matching, between the prediction target block of the input image signal 115 and an interpolated image of the reference image signal 207.
  • As an evaluation criterion for matching, for example, a value obtained by accumulating, for each pixel, the difference between the input image signal 115 and the interpolated image after matching, or a value to which the difference between the calculated affine transformation parameters and the search center is added, is used.
  • The motion information 210 may be determined using a value obtained by converting the difference between the predicted image and the original image while taking the magnitude of the motion vector or the affine transformation parameters into account, or by taking the code amount of the affine transformation parameters and the like into account. Costs such as Equation (1) and Equation (2), described later, may also be used. Further, the matching may be performed as a search within a matching range based on search range information provided from outside the moving image encoding apparatus 100, or may be performed hierarchically for each pixel accuracy.
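  • A minimal full-search block-matching routine of the kind performed by the motion information search unit 106, using the SAD as the matching criterion, might look like the following; the search is integer-pel only and the function name and interface are assumptions for illustration:

```python
import numpy as np

def block_match(cur, ref, bx, by, bsize, srange):
    """Full-search block matching: return the integer motion vector (dx, dy)
    minimising the sum of absolute differences (SAD) between the current
    block at (bx, by) and candidate windows in the reference frame."""
    target = cur[by:by + bsize, bx:bx + bsize].astype(int)
    h, w = ref.shape
    best, best_sad = (0, 0), None
    for dy in range(-srange, srange + 1):
        for dx in range(-srange, srange + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + bsize > w or y + bsize > h:
                continue  # candidate window falls outside the frame
            sad = np.abs(ref[y:y + bsize, x:x + bsize].astype(int) - target).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best = sad, (dx, dy)
    return best
```

A hierarchical search (coarse grid first, then refinement) or a sub-pel search over an interpolated reference, as mentioned above, would wrap this same SAD kernel.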
  • the motion information 210 calculated for a plurality of reference image signals in this way is input to the inter prediction signal generation device 109 and used to generate the prediction signal 206.
  • the plurality of reference image signals are locally decoded images having different display times.
  • the calculated motion information 210 is output to the derived parameter derivation unit 108.
  • The derived parameter derivation unit 108 includes a predicted geometric transformation parameter derivation unit 601 and a converter 602; it derives a predicted geometric transformation parameter, described later, and calculates the derived parameters using the predicted geometric transformation parameter and the input motion information 210.
  • the converter 602 may be, for example, a subtracter.
  • The converter 602 may instead be an adder, a multiplier, a divider, a converter that performs conversion using a predetermined matrix, or a combination of these.
  • In the following, the converter 602 is described as a subtractor.
  • the derived parameter information 211 derived by the derived parameter deriving unit 108 is output to the entropy encoding unit 112, and after being subjected to entropy encoding, is multiplexed into encoded data. Furthermore, the motion information 210 obtained by encoding the target pixel block is stored in the internal memory of the encoding control unit 114, and is appropriately loaded from the inter prediction signal generation device 109 and used.
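  • Treating the converter 602 as a subtractor, the derived parameter information 211 can be sketched as the difference between the estimated parameters and a prediction formed from already-encoded neighbouring blocks; the component-wise averaging used for the prediction here is an illustrative assumption, not a rule stated in the patent:

```python
import numpy as np

def derive_parameter_info(estimated, neighbour_params):
    """Predict the geometric transformation parameters from neighbouring
    blocks (component-wise average: an illustrative assumption) and, with the
    converter 602 acting as a subtractor, emit only the difference, i.e. the
    derived parameter information 211 to be entropy-coded."""
    predicted = np.mean(neighbour_params, axis=0)
    return estimated - predicted

def reconstruct_parameter(derived, neighbour_params):
    """Decoder side: form the same prediction and add the decoded difference."""
    return np.mean(neighbour_params, axis=0) + derived
```

Because both sides compute the prediction from already-decoded neighbours, only the (typically small) difference needs to be transmitted, which is the code-amount saving the scheme aims at.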
  • the reference image signal 207 stored in the reference image memory 105 is output to the intra prediction signal generation device 107.
  • The intra prediction signal generation device 107 performs intra prediction using the input reference image signal 207.
  • A prediction signal is generated by performing pixel interpolation in a prediction direction, such as the vertical or horizontal direction, using encoded reference pixel values adjacent to the prediction target block.
  • the interpolated pixel value may be copied in a predetermined prediction direction.
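  • The directional pixel copying described above can be sketched as follows; only two prediction directions are shown, and the function interface is an illustrative assumption:

```python
import numpy as np

def intra_predict(top, left, mode, size=4):
    """Minimal directional intra prediction: replicate the reconstructed
    reference pixels above the block ('vertical') or to its left
    ('horizontal') across the prediction target block."""
    if mode == "vertical":
        return np.tile(top[:size], (size, 1))          # each row copies the top pixels
    if mode == "horizontal":
        return np.tile(left[:size].reshape(-1, 1), (1, size))  # each column copies the left pixels
    raise ValueError(f"unknown mode: {mode}")
```

A real codec offers many more directions (e.g. the nine 4×4 modes of H.264), but all of them follow this pattern of propagating adjacent reconstructed pixels along a fixed direction.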
  • the generated prediction signal 206 is output to the prediction separation switch 110.
  • The inter prediction signal generation device 109 generates the prediction signal 206 using the input motion information 210, derived parameter information 211, reference image signal 207, and prediction selection information 123.
  • the generated prediction signal 206 is output to the prediction separation switch 110.
  • the prediction separation switch 110 selects the output terminal of the intra prediction signal generation device 107 and the output terminal of the inter prediction signal generation device 109 according to the prediction selection information 123.
  • When the prediction selection information 123 indicates intra prediction, the switch is connected to the intra prediction signal generation device 107; when it indicates inter prediction, the switch is connected to the inter prediction signal generation device 109.
  • An example of the prediction selection information 123 is shown in FIG.
  • the prediction selection unit 111 sets the prediction selection information 123 according to the prediction mode controlled by the encoding control unit 114.
  • As the prediction mode, intra prediction or inter prediction can be selected, and a plurality of modes may exist for each.
  • the encoding control unit 114 controls which mode is selected.
  • the prediction signal 206 may be generated for all prediction modes, and one prediction mode may be selected from these, or the prediction mode may be limited according to the characteristics of the input image.
  • the prediction selection information 123 is determined using a cost such as the following equation.
  • Let OH be the code amount required for the prediction information when a prediction mode is selected (for example, the code amount of the derived parameter information 211 and of the prediction block size), and let SAD be the absolute cumulative sum of the prediction error signal 116, that is, of the difference between the input image signal 115 and the prediction signal 206. The following determination formula is then used:
  •     K = SAD + λ × OH     (1)
  • Here K is the cost and λ is a constant, a Lagrangian multiplier determined based on the quantization scale and the value of the quantization parameter. The mode giving the smallest cost K is selected as the optimum prediction mode.
  • instead of formula (1), the prediction selection information 123 may be determined using (a) only the prediction information or (b) only the SAD; alternatively, a value obtained by applying a Hadamard transform to (a) or (b), or an approximation of such a value, may be used.
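The mode decision based on formula (1) can be sketched as follows. The candidate modes, SAD and OH values, and the value of λ below are hypothetical examples for illustration, not values taken from the embodiment.

```python
# Sketch of prediction-mode selection with the cost K = SAD + lambda * OH
# (formula (1)).  Candidate modes and all numeric values are invented.

def select_mode(candidates, lam):
    """Return the candidate with the smallest cost K = SAD + lam * OH."""
    best_mode, best_cost = None, float("inf")
    for mode, sad, oh in candidates:
        k = sad + lam * oh          # formula (1)
        if k < best_cost:
            best_mode, best_cost = mode, k
    return best_mode, best_cost

# Hypothetical modes: (name, SAD, prediction-info code amount OH)
modes = [("intra_16x16", 1200, 8), ("inter_16x16", 900, 24), ("inter_8x8", 820, 64)]
mode, cost = select_mode(modes, lam=4.0)
print(mode, cost)   # inter_16x16 wins: 900 + 4*24 = 996 beats 1232 and 1076
```

The same loop structure applies whether the cost is computed from the SAD, a Hadamard-transformed residual, or an approximation of either.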
  • alternatively, a provisional encoding unit may be prepared, and the prediction selection information 123 may be determined using the code amount obtained when the prediction error signal 116 generated in each prediction mode is actually encoded by the provisional encoding unit, together with the square error between the input image signal 115 and the decoded image signal 119.
  • the judgment formula in this case is as follows:

        J = D + λ × R    (2)

  • here J is the encoding cost, D is the encoding distortion representing the square error between the input image signal 115 and the decoded image signal 119, and R represents the code amount estimated by provisional encoding.
  • when the encoding cost J of Equation (2) is used, provisional encoding and local decoding processing are required for each prediction mode, so the circuit scale or calculation amount increases. However, since a more accurate code amount and encoding distortion are used, high encoding efficiency can be maintained.
  • the cost may be calculated using only R or only D instead of the expression (2), or the cost function may be created using a value approximating R or D.
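The rate-distortion decision of Equation (2) has the same selection structure; the distortion and rate values below are invented stand-ins for the outputs of a provisional encoding pass.

```python
# Sketch of mode selection with the encoding cost J = D + lambda * R
# (Equation (2)).  D stands for the squared error against the decoded
# image and R for the code amount from provisional encoding; the numbers
# are hypothetical.

def select_mode_rd(candidates, lam):
    """Return the (mode, D, R) triple minimizing J = D + lam * R."""
    return min(candidates, key=lambda m: m[1] + lam * m[2])

modes = [("mode_a", 5000, 300), ("mode_b", 6500, 120)]
best = select_mode_rd(modes, lam=10.0)
print(best[0])  # mode_b: 6500 + 10*120 = 7700 < 5000 + 10*300 = 8000
```

As the text notes, the same selection can be run with only D, only R, or approximations of either in place of the full cost.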
  • Next, the inter prediction signal generation device 109 will be described with reference to FIG.
  • the inter prediction signal generation device 109 includes a prediction separation switch 201, a geometric transformation prediction unit 202, a second geometric transformation parameter derivation unit 203, a first geometric transformation parameter derivation unit 204, and a predicted geometric transformation parameter derivation unit 205.
  • the second geometric transformation parameter derivation unit 203 derives the second geometric transformation parameter 208 of the prediction target block using the motion information 210 of the prediction target block output from the motion information search unit 106 and the motion information of the encoded blocks stored in the encoding control unit 114.
  • a motion vector included in the motion information 210 of an adjacent block stored in the encoding control unit 114 is hereinafter referred to as an "adjacent motion vector".
  • the second geometric transformation parameter derivation unit 203 includes a motion information acquisition unit 501 and a second parameter derivation unit 502.
  • the motion information acquisition unit 501 determines an adjacent block from which motion information is acquired from among a plurality of adjacent blocks, and acquires motion information of the adjacent block, for example, a motion vector.
  • the second parameter derivation unit 502 derives a second geometric transformation parameter from the motion vector of the adjacent block.
  • FIG. 6A shows an example in which the sizes of prediction target blocks and adjacent blocks (for example, 16 ⁇ 16 pixel blocks) match.
  • a hatched pixel block p is a pixel block that has already been encoded or predicted (hereinafter referred to as “predicted pixel block”).
  • a block c with dot hatching indicates a prediction target block, and a pixel block n displayed in white is an uncoded pixel (unpredicted) block.
  • X represents an encoding (prediction) target pixel block.
  • the adjacent block A is the adjacent block to the left of the prediction target block X, the adjacent block B is the adjacent block above the prediction target block X, the adjacent block C is the adjacent block at the upper right of the prediction target block X, and the adjacent block D is the adjacent block at the upper left of the prediction target block X.
  • the adjacent motion vectors held in the internal memory of the encoding control unit 114 are only the motion vectors of the predicted pixel blocks. As shown in FIG. 3A, the pixel blocks are encoded and predicted from the upper left to the lower right, so when the pixel block X is predicted, the pixel blocks to its right and below it have not yet been encoded. Therefore, an adjacent motion vector cannot be derived from these blocks.
  • 6B to 6E are diagrams illustrating examples of adjacent blocks when the prediction target block is an 8 ⁇ 8 pixel block.
  • bold lines represent macroblock boundaries.
  • FIG. 6B shows an example in which the pixel block located at the upper left in the macroblock is the prediction target block, FIG. 6C the pixel block located at the upper right, FIG. 6D the pixel block located at the lower left, and FIG. 6E the pixel block located at the lower right.
  • the position of the adjacent block changes according to the encoding order of the 8 ⁇ 8 pixel block.
  • an encoded pixel block is used as an adjacent block of the pixel blocks processed after it, and the pixel block located at the upper right of the encoded pixel block is set as an adjacent block.
  • when a plurality of candidate blocks adjoin the prediction target block X, the blocks having the shortest Euclidean distance to the prediction target block X are set as the adjacent blocks A, B, C, and D, respectively; for example, the nearest block among the upper-right candidates is set as the adjacent block C.
  • the case where the block is 16 × 16 pixels or 8 × 8 pixels has been described as an example, but adjacent blocks may be determined in a similar framework for other square pixel blocks such as 32 × 32 pixels or 4 × 4 pixels, and for rectangular pixel blocks such as 16 × 8 pixels and 8 × 16 pixels.
  • adjacent blocks may also be defined more widely; for example, a pixel block to the left of the adjacent block A may be used, or a pixel block further above the adjacent block B may be used.
  • the second geometric transformation parameter 208 is derived by the second parameter deriving unit 502.
  • the adjacent motion vectors held by the adjacent blocks are defined by equations (3) to (6), respectively.
  • the motion information 210 provided from the motion information search unit 106 is defined by equation (7). Note that the motion information 210 indicates a motion vector of the prediction target block X.
  • the second geometric transformation parameter 208 is derived using the motion vector and the adjacent motion vector represented by the equations (3) to (7).
  • the transformation formula is expressed by the following formula (8).
  • in equation (8), the parameters (c, f) correspond to the motion vector, and the parameters (a, b, d, e) are the parameters associated with the geometric deformation; u and v indicate the coordinates of the encoding target block, and x and y indicate the coordinates of the reference image. If the parameters (a, b, d, e) are (1, 0, 0, 1), the transformation is the same as motion compensation with the parallel translation model (formula (19) described later).
  • affine transformation was shown here as the geometric transformation, but another geometric transformation such as a bilinear transformation, Helmert transformation, second-order conformal transformation, projective transformation, or three-dimensional projective transformation may be used.
  • the required number of parameters varies depending on the geometric transformation used; a suitable geometric transformation should be selected according to the nature of the image to which it is applied, the code amount required when the parameters are encoded, and the geometric deformation patterns the transformation can represent.
  • here, the affine transformation will be described. In equation (8), coordinates (x, y) are converted to coordinates (u, v) by the affine transformation, and the six parameters a, b, c, d, e, and f included in equation (8) represent the geometric transformation parameters.
  • since these six parameters are derived from adjacent motion vectors in the affine transformation, six or more input values are required.
  • the geometric transformation parameters are derived by the following equation (9), where the motion vectors have 1/4-pel precision and the parameters (a, b, d, e) have a precision of 1/64.
  • ax and ay are variables based on the size of the prediction target block, and are calculated by the following equation (10).
  • mb_size_x and mb_size_y indicate the horizontal and vertical sizes of the macroblock.
  • Equation (8) shows an example in which a, b, d, and e are obtained as real numbers, but by determining the calculation precision of these parameters in advance, integer values can easily be obtained as in Equation (9).
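The coordinate mapping of equation (8), with (a, b, d, e) held in 1/64 units and (c, f) in 1/4-pel units as described above, might be sketched as follows. The exact fixed-point layout is an assumption for illustration, not the embodiment's definition.

```python
# Affine mapping in the spirit of equation (8):
#   u = a*x + b*y + c,  v = d*x + e*y + f,
# with (a, b, d, e) stored in 1/64 units and (c, f) in 1/4-pel units.
# The fixed-point bookkeeping here is an illustrative assumption.

def affine_map(params, x, y):
    a, b, c, d, e, f = params
    u = (a * x + b * y) / 64.0 + c / 4.0
    v = (d * x + e * y) / 64.0 + f / 4.0
    return u, v

# The identity deformation (a, b, d, e) = (1, 0, 0, 1) is (64, 0, 0, 64)
# in 1/64 units, so the mapping reduces to the translation model:
# here a pure shift of (+0.5, -1.5) pel applied to the point (8, 8).
print(affine_map((64, 0, 2, 0, 64, -6), 8, 8))  # (8.5, 6.5)
```

Non-identity values of (a, b, d, e) then produce rotation, scaling, and shearing of the block, which is exactly the extra freedom the geometric transformation prediction adds over translation-only motion compensation.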
  • the predicted geometric transformation parameter derivation unit 205 includes a motion information acquisition unit 701 and a predicted geometric transformation parameter calculation unit 702 as shown in FIG.
  • the motion information acquisition unit 701 determines an adjacent block in the same procedure as the motion information acquisition unit 501 of the second geometric transformation parameter derivation unit 203. However, the motion information acquired from the adjacent block is a geometric transformation parameter.
  • the predicted geometric transformation parameter derivation unit 205 derives the predicted geometric transformation parameter 212 of the prediction target block using the motion information 210 of the encoded blocks stored in the encoding control unit 114.
  • the geometric transformation parameters of the adjacent encoded blocks stored in the encoding control unit 114 are hereinafter referred to as "adjacent geometric transformation parameters".
  • a method for deriving the predicted geometric transformation parameter 212 will be described with reference to FIG. 6A.
  • the adjacent geometric transformation parameters held by the adjacent blocks are defined by equations (11) to (14), respectively.
  • ap indicates an affine transformation parameter, which is a six-dimensional parameter as shown in Equation (8).
  • the predicted geometric transformation parameter calculation unit 702 calculates a predicted geometric transformation parameter by median processing using the spatial correlation between the prediction target block and the adjacent block.
  • pred_ap represents the predicted geometric transformation parameter.
  • the function affine_median() is a function that takes the median value of the six-dimensional affine transformation parameters. The predicted geometric transformation parameter may also be determined using the following equation.
  • in equation (16), the median is a scalar median, and T means transposition; a predicted geometric transformation parameter is derived by taking the median value for each geometric transformation parameter component.
  • alternatively, for the adjacent block A, a geometric transformation parameter may be re-derived using Equation (9) from the four already-encoded blocks adjacent to it (the blocks further to its left, above it, at its upper left, and at its upper right), and this geometric transformation parameter may be used as the adjacent geometric transformation parameter ap A. Predicted geometric transformation parameters can likewise be derived by re-deriving the geometric transformation parameters for the adjacent blocks B, C, and D in the same manner.
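The component-wise median of equation (16) can be sketched as follows. The parameter values are hypothetical, and since the embodiment does not specify here how many adjacent parameters enter the median, four (ap A through ap D) are used for illustration.

```python
# Component-wise median of adjacent 6-dimensional affine parameters,
# in the spirit of equation (16).  All parameter values are invented.

def scalar_median(values):
    """Scalar median; for an even count, the mean of the middle pair."""
    s = sorted(values)
    n = len(s)
    return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2

def affine_median(*aps):
    """Take the median independently for each of the six components."""
    return tuple(scalar_median(comp) for comp in zip(*aps))

ap_A = (64, 0, 4, 0, 64, 0)
ap_B = (66, 1, 8, 0, 62, 4)
ap_C = (60, -2, 0, 2, 64, 8)
ap_D = (64, 0, 4, 0, 64, 4)
pred_ap = affine_median(ap_A, ap_B, ap_C, ap_D)
print(pred_ap)  # (64.0, 0.0, 4.0, 0.0, 64.0, 4.0)
```

The median exploits the spatial correlation between the prediction target block and its neighbors while rejecting a single outlying neighbor in each component.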
  • the predicted geometric transformation parameter 212 derived by the predicted geometric transformation parameter derivation unit 205 is output to the first geometric transformation parameter derivation unit 204.
  • the first geometric transformation parameter derivation unit 204 derives the first geometric transformation parameter 209 by adding the input derivation parameter information 211 of the geometric transformation parameter of the prediction target block to the predicted geometric transformation parameter 212 (pred_ap), and outputs it to the geometric transformation prediction unit 202.
  • the predicted geometric transformation parameter derivation unit 205 and the first geometric transformation parameter derivation unit 204 define the derivation formula so as to reduce, as much as possible, the amount of information needed to represent the first geometric transformation parameter 209 of the prediction target block.
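This prediction-plus-difference derivation can be sketched per component as follows. The name delta_ap for the derivation parameter information 211 and all numeric values are illustrative assumptions.

```python
# The first geometric transformation parameter 209 is the sum of the
# predicted parameter (pred_ap, 212) and the per-component difference
# carried by the derivation parameter information 211 (here delta_ap).
# Names and values are hypothetical.

def derive_first_param(pred_ap, delta_ap):
    """Add the decoded difference to the predicted affine parameters."""
    return tuple(p + d for p, d in zip(pred_ap, delta_ap))

pred_ap = (64, 0, 4, 0, 64, 4)      # predicted geometric transformation parameter 212
delta_ap = (1, 0, -2, 0, -1, 2)     # derivation parameter information 211
first_ap = derive_first_param(pred_ap, delta_ap)
print(first_ap)  # (65, 0, 2, 0, 63, 6)
```

Because only delta_ap is encoded, a good predictor pred_ap keeps the differences small and thus keeps the code amount of the derivation parameter low.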
  • the geometric transformation prediction unit 202 has a function of generating a prediction signal using the input geometric transformation parameters and the reference image signal 207.
  • the geometric transformation prediction unit 202 includes a geometric transformation unit 401 and an interpolation unit 402 as shown in FIG.
  • the geometric transformation unit 401 performs geometric transformation on the reference image signal 207 and calculates the position of the predicted pixel.
  • the interpolation unit 402 calculates the predicted pixel value corresponding to the fractional position of the predicted pixel obtained by the geometric transformation by interpolation or the like.
  • the prediction target block is a square pixel block CR; the pixels corresponding to motion compensation prediction form a square pixel block MER; and the pixels corresponding to geometric transformation prediction form a pixel block GTR, which is, for example, a parallelogram.
  • the region after motion compensation and the region after geometric transformation indicate the corresponding regions of the reference image signal relative to the coordinates of the frame to be encoded.
  • with geometric transformation prediction, it is possible to generate a prediction signal that follows deformations of the rectangular pixel block such as rotation, enlargement/reduction, shearing, and mirror transformation.
  • the geometric transformation unit 401 calculates coordinates (u, v) after the geometric transformation using the input geometric transformation parameters using the equation (8).
  • the calculated coordinates (u, v) after geometric transformation are real values. Therefore, the predicted value is generated by interpolating the luminance value corresponding to the coordinates (u, v) from the reference image signal.
  • the interpolation method used is expressed by the following equation (17), where R(x, y) denotes the integer-position pixel value of the reference image signal.
  • a new prediction signal is generated by applying interpolation for each coordinate in the prediction target block subjected to geometric transformation.
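The interpolation of a luminance value at a fractional position can be sketched with bilinear weighting in the spirit of equation (17). The reference array R and the sample position are hypothetical, and boundary handling is omitted.

```python
# Bilinear interpolation of the reference signal R(x, y) at a fractional
# position (u, v).  R is a tiny hypothetical luminance array; boundary
# clipping is omitted for brevity.

def bilinear(R, u, v):
    x0, y0 = int(u), int(v)          # integer part of the position
    du, dv = u - x0, v - y0          # fractional part of the position
    return ((1 - du) * (1 - dv) * R[y0][x0]
            + du * (1 - dv) * R[y0][x0 + 1]
            + (1 - du) * dv * R[y0 + 1][x0]
            + du * dv * R[y0 + 1][x0 + 1])

R = [[10, 20],
     [30, 40]]
print(bilinear(R, 0.5, 0.5))  # 25.0, the average of the four neighbors
```

Applying this per transformed coordinate (u, v) of the prediction target block yields the prediction signal described in the text.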
  • the geometric transformation prediction unit 202 can also generate a prediction signal using the conventional translation model, whose coordinate conversion formula is expressed by formula (19).
  • Equation (19) is equivalent to equation (8) with the parameters (a, b, d, e) set to (1, 0, 0, 1). Accordingly, regardless of the values of the first geometric transformation parameter 209 and the second geometric transformation parameter 208, the conventional motion compensated prediction can be realized by replacing equation (8) with equation (19) when deriving the coordinates.
  • in the present embodiment, bilinear interpolation is used as the interpolation method, but any interpolation method may be applied, such as nearest-neighbor interpolation, cubic convolution interpolation, linear filter interpolation, Lagrange interpolation, spline interpolation, or Lanczos interpolation.
  • the prediction separation switch 201 switches between the two prediction signals 206 output from the geometric transformation prediction unit 202. That is, the prediction separation switch 201 selects between the output terminal of the prediction signal 215 generated with the first geometric transformation parameter 209 and the output terminal of the prediction signal 214 generated with the second geometric transformation parameter 208, in accordance with the prediction selection information 213 (123 in FIG. 1).
  • Examples of the prediction selection information 213 and 123 are shown in FIG.
  • when the skip mode is selected, transform coefficients, motion vectors, and the like are not encoded. For this reason, the first geometric transformation prediction, which would require encoding the additional motion information 210, is not selected in the skip mode.
  • when the index of the prediction selection information 213 is 9, intra prediction is selected. In this case, since the output terminal of the prediction separation switch 110 is connected to the intra prediction signal generation device 107, the inter prediction signal generation device 109 does not need to perform the prediction signal generation processing.
  • index tables may be divided into a plurality of tables, or a plurality of index tables may be integrated. Moreover, it is not always necessary to use the same term, and it may be arbitrarily changed depending on the form to be used. Furthermore, each element described in the index table may be changed so as to be described by an independent flag.
  • first, the prediction separation switch 201 determines whether the prediction selection information 213 indicates the first geometric transformation parameter 209 (S502). When this determination is YES, the prediction separation switch 201 switches the output terminal of the geometric transformation prediction unit 202 to the first prediction signal 215. On the other hand, when the determination is NO, the prediction separation switch 201 switches the output terminal of the geometric transformation prediction unit 202 to the second prediction signal 214.
  • when the determination is YES, the motion information acquisition unit 701 in the predicted geometric transformation parameter derivation unit 205 determines an adjacent block based on the motion information 210 input from the outside (S508). Using the motion information 210 of the determined adjacent block, adjacent geometric transformation parameters are derived (S509). Receiving the derived adjacent geometric transformation parameters, the predicted geometric transformation parameter calculation unit 702 derives the predicted geometric transformation parameter 212 using equation (15) or equation (16) (S510). The predicted geometric transformation parameter 212 is output to the first geometric transformation parameter derivation unit 204.
  • the first geometric transformation parameter derivation unit 204 derives the first geometric transformation parameter 209 using the derived parameter information 211 and the predicted geometric transformation parameter 212 input from the outside (S511).
  • the first geometric transformation parameter 209 is input to the geometric transformation prediction unit 202, and the geometric transformation unit 401 derives the coordinates after geometric transformation using the equation (8) (S512).
  • based on the calculated coordinates, the interpolation unit 402 performs interpolation processing on the reference image signal 207 input from the outside using Expression (18) to generate the first prediction signal 215 (S513).
  • the first prediction signal 215 is output to the outside via the prediction separation switch 201, to which the output terminal is connected (S515), and the first geometric transformation parameter 209 used for the prediction target block is stored in the memory (S514).
  • the first geometric transformation parameter 209 held in the memory is used as an adjacent geometric transformation parameter or an adjacent motion vector of the next block (S517).
  • on the other hand, when the determination in step S502 is NO, the motion information acquisition unit 501 in the second geometric transformation parameter derivation unit 203 determines an adjacent block based on the motion information 210 input from the outside (S503). An adjacent motion vector is derived using the motion information 210 of the determined adjacent block (S504). Receiving the derived adjacent motion vector, the second parameter derivation unit 502 derives the second geometric transformation parameter 208 using equations (9) and (10) (S505).
  • the second geometric transformation parameter 208 is input to the geometric transformation prediction unit 202, and the geometric transformation unit 401 derives the coordinates after the geometric transformation using equation (8) (S506). Based on the derived coordinates, the interpolation unit 402 performs interpolation processing on the reference image signal 207 input from the outside using Expression (18) to generate the second prediction signal 214 (S507).
  • the second prediction signal 214 is output to the outside via the prediction separation switch 201 (S515), and the second geometric transformation parameter 208 used in the prediction target block is stored in the memory (S514).
  • the second geometric transformation parameter 208 held in the memory is used as an adjacent geometric transformation parameter or an adjacent motion vector of the next block (S517).
  • the syntax 1600 mainly has three parts.
  • the high-level syntax 1601 has higher layer syntax information that is equal to or higher than a slice.
  • the slice level syntax 1602 has information necessary for decoding for each slice, and the macroblock level syntax 1603 has information necessary for decoding for each macroblock.
  • High level syntax 1601 includes sequence and picture level syntax, such as sequence parameter set syntax 1604 and picture parameter set syntax 1605.
  • the slice level syntax 1602 includes a slice header syntax 1606, a slice data syntax 1607, and the like.
  • the macroblock level syntax 1603 includes a macroblock layer syntax 1608, a macroblock prediction syntax 1609, and the like.
  • slice_affine_motion_prediction_flag is a syntax element indicating whether to apply geometric transformation prediction to a slice.
  • when this flag is 0, the geometric transformation prediction unit 202 does not use the parameters (a, b, d, e) in equation (8) but uses equation (19) for this slice.
  • equation (19) represents motion compensation prediction using the translation model employed in H.264 and the like, in which the parameters (c, f) correspond to the motion vector; that is, when this flag is 0, prediction is the same as the motion compensation prediction of the conventional translation model.
  • when slice_affine_motion_prediction_flag is 1, the prediction separation switch 201 dynamically switches the prediction signal within the slice as indicated by the prediction selection information 213.
  • mb_skip_flag is a flag indicating whether or not the macroblock is encoded in the skip mode. In the skip mode, transform coefficients, motion vectors, etc. are not encoded. For this reason, the first geometric transformation prediction is not applied to the skip mode.
  • AvailAffineMode is an internal parameter indicating whether or not the second geometric transformation prediction can be used in the macroblock. When AvailAffineMode is 0, it means that the prediction selection information 213 is set not to use the second geometric transformation prediction. When the adjacent motion vector of the adjacent block and the motion vector of the prediction target block have the same value, AvailAffineMode is 0. Otherwise, AvailAffineMode is 1.
  • the setting of AvailAffineMode can also be set using an adjacent geometric transformation parameter or an adjacent motion vector. For example, when the adjacent motion vector points in a completely different direction, there is a possibility that an object boundary exists in the adjacent block of the current prediction target block. Therefore, it is possible to set AvailAffineMode to 0.
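The AvailAffineMode decision described above can be sketched as follows. The divergence check is the optional heuristic mentioned in the text, and its threshold value is an assumption introduced here for illustration.

```python
# Sketch of the AvailAffineMode decision: the second geometric
# transformation prediction is marked unavailable (0) when all adjacent
# motion vectors equal the motion vector of the prediction target block,
# since only pure translation could then be derived.  The divergence
# check and its threshold are an assumed illustration of the optional
# object-boundary heuristic.

def avail_affine_mode(mv_x, adjacent_mvs, divergence_threshold=None):
    # All adjacent vectors identical to the target block's vector.
    if all(mv == mv_x for mv in adjacent_mvs):
        return 0
    # Optional: widely diverging adjacent vectors may indicate an object
    # boundary in the adjacent blocks, so affine prediction is disabled.
    if divergence_threshold is not None:
        spread = max(abs(a[0] - b[0]) + abs(a[1] - b[1])
                     for a in adjacent_mvs for b in adjacent_mvs)
        if spread > divergence_threshold:
            return 0
    return 1

print(avail_affine_mode((2, 0), [(2, 0), (2, 0), (2, 0)]))  # 0
print(avail_affine_mode((2, 0), [(2, 0), (3, 1), (2, 0)]))  # 1
```

When the function returns 0, the prediction selection information 213 is set so that the second geometric transformation prediction is not used for the macroblock.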
  • mb_affine_motion_skip_flag, which indicates whether to use the second geometric transformation prediction or motion compensation prediction, is encoded.
  • when mb_affine_motion_skip_flag is 1, the second geometric transformation prediction is applied to the skip mode; when it is 0, motion compensation prediction using equation (19) is applied.
  • mb_type indicates the macroblock type information; that is, it contains information such as whether the current macroblock is intra-coded or inter-coded, which block shape is used for prediction, and whether the prediction direction is unidirectional or bidirectional.
  • the mb_type is passed to the macroblock prediction syntax and the submacroblock prediction syntax indicating the syntax of the subblock in the macroblock.
  • mb_affine_pred_flag indicates whether the first geometric transformation prediction or the second geometric transformation prediction is used in the block.
  • when this flag is 0, the prediction selection information 213 is set to use the second geometric transformation parameter; when this flag is 1, the prediction selection information 213 is set to use the first geometric transformation parameter.
  • NumMbPart() is an internal function that returns the number of block partitions specified in mb_type: 1 for a 16 × 16 pixel block, 2 for a 16 × 8 or 8 × 16 pixel block, and 4 for an 8 × 8 pixel block.
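The partition counts returned by NumMbPart() can be sketched as a simple lookup. The string keys are illustrative stand-ins; actual mb_type values are codec-internal codes, and the H.264-style partition counts are an assumption.

```python
# Sketch of the internal function NumMbPart(): partitions per mb_type
# block shape.  String keys and the H.264-style counts are assumptions.

def num_mb_part(mb_type):
    return {"16x16": 1, "16x8": 2, "8x16": 2, "8x8": 4}[mb_type]

print(num_mb_part("8x16"))  # 2
```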
  • mv_l0 and mv_l1 indicate motion vector difference information in the macroblock.
  • the motion vector information is a value set by the motion information search unit 106 and obtained by taking a difference from a predicted motion vector not disclosed in the present embodiment.
  • mvd_l0_affine and mvd_l1_affine indicate derived parameters in the macroblock, and indicate difference information of components (a, b, d, e) excluding motion vectors of affine transformation parameters. This syntax element is encoded only when the first geometric transformation parameter is selected.
  • mb_affine_pred_flag indicates whether the first geometric transformation prediction or the second geometric transformation prediction is used in the sub-block.
  • when this flag is 0, the prediction selection information 213 is set to use the second geometric transformation parameter; when this flag is 1, the prediction selection information 213 is set to use the first geometric transformation parameter.
  • mv_l0 and mv_l1 indicate motion vector difference information in the sub macroblock.
  • the motion vector information is a value set by the motion information search unit 106 and obtained by taking a difference from a predicted motion vector not disclosed in the present embodiment.
  • Mvd_l0_affine and mvd_l1_affine in the figure indicate derived parameters in the sub-macroblock, and indicate difference information of components (a, b, d, e) excluding the motion vectors of the affine transformation parameters.
  • This syntax element is encoded only when the first geometric transformation parameter is selected.
  • as described above, in the skip mode either the conventional motion compensated prediction using the translation model or the second geometric transformation prediction can be selected, and in the other inter predictions either the first geometric transformation prediction or the second geometric transformation prediction can be selected.
  • syntax elements not defined in the present embodiment may be inserted between lines in the syntax tables shown in FIGS. 12 to 17, and descriptions regarding other conditional branches may be included.
  • the syntax table may be divided into a plurality of tables, or a plurality of syntax tables may be integrated. Moreover, it is not always necessary to use the same term, and it may be arbitrarily changed depending on the form to be used.
  • as described above, two geometric transformation parameters indicating information related to the deformation of the image caused by the geometric transformation of the pixel block, that is, the first geometric transformation parameter and the second geometric transformation parameter, are derived, and a prediction signal is generated by performing motion compensated prediction using the geometric transformation parameter selected according to the prediction selection information indicating which of these geometric transformation parameters to use.
  • FIG. 18 is a diagram illustrating an example of the macroblock layer syntax 1608.
  • mb_type shown in the figure indicates the macroblock type information; that is, it contains information such as whether the current macroblock is intra-coded or inter-coded, which block shape is used for prediction, and whether the prediction direction is unidirectional or bidirectional.
  • the mb_type is passed to the macroblock prediction syntax and the submacroblock prediction syntax indicating the syntax of the subblock in the macroblock.
  • mb_additional_affine_motion_flag indicates flag information for selecting whether to use the first geometric transformation parameter or the second geometric transformation parameter for the prediction target block. When this flag is 0, the second geometric transformation parameter is used, and when this flag is 1, the first geometric transformation parameter is used.
  • mb_affine_pred_flag indicates whether geometric transformation prediction (the first geometric transformation prediction or the second geometric transformation prediction) is used in the block, or the motion compensation prediction of the translation model is used. When this flag is 0, the prediction selection information 213 is set to use the motion compensated prediction of the translation model regardless of mb_additional_affine_motion_flag. On the other hand, when this flag is 1, the prediction selection information 213 is set to use the first geometric transformation parameter or the second geometric transformation parameter according to the flag information of mb_additional_affine_motion_flag.
  • as described above, in the skip mode either the conventional motion compensation using the translation model or the second geometric transformation prediction can be selected. At the macroblock level it is determined whether the first geometric transformation prediction or the second geometric transformation prediction is used, and at the sub-macroblock level it is determined whether geometric transformation prediction including the first or the second geometric transformation prediction is used, or the motion compensated prediction of the translation model is used.
  • syntax elements not defined in the present embodiment may be inserted between lines in the syntax tables shown in FIGS. 18 to 20, and descriptions regarding other conditional branches may be included.
  • the syntax table may be divided into a plurality of tables, or a plurality of syntax tables may be integrated. Moreover, it is not always necessary to use the same term, and it may be arbitrarily changed depending on the form to be used.
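The flag combinations described in the preceding paragraphs can be summarized as a small decision procedure. The sketch below is illustrative only: the function name and the return labels are hypothetical, and only the flag semantics follow the description above.

```python
def select_prediction_mode(mb_affine_pred_flag, mb_additional_affine_motion_flag):
    """Illustrative decision tree for the two flags described above.

    Returns a label for the prediction that the prediction selection
    information would indicate. Names are hypothetical sketches.
    """
    if mb_affine_pred_flag == 0:
        # Translation-model motion compensation is used regardless of
        # mb_additional_affine_motion_flag.
        return "translation"
    # Geometric transformation prediction: the additional flag chooses
    # between the first and second geometric transformation parameters.
    if mb_additional_affine_motion_flag == 1:
        return "first_geometric"
    return "second_geometric"
```

For example, a block with mb_affine_pred_flag = 1 and mb_additional_affine_motion_flag = 0 would use the second geometric transformation parameter.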
  • the video decoding device 300 decodes, for example, encoded data generated by the video encoding device according to the first embodiment.
  • the video decoding device 300 decodes the encoded data 311 stored in the input buffer 301 and outputs a decoded image signal 317 to the output buffer 309.
  • the encoded data 311 is, for example, multiplexed encoded data transmitted from the moving image encoding apparatus 100 through a storage system or a transmission system and temporarily stored in the input buffer 301.
  • the video decoding device 300 includes an entropy decoding unit 302, an inverse quantization / inverse conversion unit 303, an adder 304, a reference image memory 305, an intra prediction signal generation device 306, an inter prediction signal generation device 307, and a prediction separation switch 308.
  • the moving picture decoding apparatus 300 is also connected to the input buffer 301, the output buffer 309, and the decoding control unit 310.
  • the entropy decoding unit 302 decodes the encoded data 311 by syntax analysis based on the syntax for each frame or field.
  • the entropy decoding unit 302 sequentially entropy-decodes the code string of each syntax, and reproduces the motion information 315, the derived parameter information 316, the encoding parameters of the decoding target block, and the like.
  • the encoding parameter includes all parameters necessary for decoding such as prediction information, information on transform coefficients, information on quantization, and the like.
  • the transform coefficient decoded by the entropy decoding unit 302 is input to an inverse quantization / inverse transform unit 303 including an inverse quantizer and an inverse transformer.
  • Various pieces of information relating to quantization decoded by the entropy decoding unit 302, that is, the quantization parameter and the quantization matrix, are set in the internal memory of the decoding control unit 310 and loaded when used in the inverse quantization process.
  • the inverse quantization process is first performed by the inverse quantizer using the information on the loaded quantization.
  • the inverse quantized transform coefficient is then subjected to inverse transform processing, for example an inverse discrete cosine transform, by the inverse transformer.
  • Although the inverse orthogonal transform has been described here, when the encoding side uses a wavelet transform, the inverse quantization / inverse transform unit 303 may perform the corresponding inverse quantization and inverse wavelet transform.
  • the restored prediction error signal 312 is input to the adder 304.
  • the adder 304 adds the prediction error signal 312 and the prediction signal 416 output from the prediction separation switch 308 described later to generate a decoded image signal 317.
  • the generated decoded image signal 317 is output from the moving image decoding apparatus 300, temporarily stored in the output buffer 309, and then output according to the output timing managed by the decoding control unit 310.
  • the decoded image signal 317 is stored in the reference image memory 305 and becomes a reference image signal 313.
  • the reference image signal 313 is sequentially read from the reference image memory 305 for each frame or each field, and is input to the intra prediction signal generation device 306 or the inter prediction signal generation device 307.
  • the motion information 315 used in the decoding target pixel block is stored in the decoding control unit 310, and is appropriately loaded from the decoding control unit 310 and used in the inter prediction signal generation processing of the next block.
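The decoding data flow described above (inverse quantization, inverse transform, then addition of the prediction signal by the adder 304) can be sketched as follows. This is a simplified illustration, not the H.264 dequantization process: the uniform quantization step, the pluggable inverse-transform hook, and the function names are all assumptions.

```python
def dequantize(levels, qstep):
    # Simplified uniform inverse quantization: scale each decoded
    # coefficient level by the quantization step. (Real codecs apply
    # per-coefficient scaling matrices and rounding.)
    return [level * qstep for level in levels]

def reconstruct_block(levels, qstep, prediction, inverse_transform):
    # Restore the prediction error signal (312) by inverse quantization
    # followed by the inverse transform, then add the prediction signal
    # supplied via the prediction separation switch, as the adder (304)
    # does to produce the decoded image signal (317).
    error = inverse_transform(dequantize(levels, qstep))
    return [e + p for e, p in zip(error, prediction)]
```

With an identity stand-in for the inverse transform, reconstruct_block([2, 0, 1], 4, [10, 10, 10], lambda c: c) yields [18, 10, 14].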
  • the intra prediction signal generation device 306 has the same function and configuration as the intra prediction signal generation device 107 in the video encoding device 100 shown in FIG. That is, the intra prediction signal generation device 306 performs intra prediction using the input reference image signal 313.
  • a prediction signal is generated by performing pixel interpolation in the prediction direction such as the vertical direction and the horizontal direction using an encoded reference pixel value adjacent to the prediction target block.
  • the interpolated pixel value may be copied in a predetermined prediction direction.
  • the generated prediction signal 416 is output to the prediction separation switch 308.
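As a concrete illustration of the directional pixel copying described above, the following sketch implements the two simplest intra prediction directions. The function names are ours, and interpolation and filtering of the reference pixels are omitted.

```python
def intra_vertical_prediction(top_row, height):
    # Vertical prediction: every row of the predicted block copies the
    # reconstructed reference pixels directly above the block.
    return [list(top_row) for _ in range(height)]

def intra_horizontal_prediction(left_col, width):
    # Horizontal prediction: every column copies the reconstructed
    # reference pixels to the left of the block.
    return [[value] * width for value in left_col]
```

For a 4-wide block with top reference row [1, 2, 3, 4], vertical prediction simply repeats that row for each line of the block.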
  • the inter prediction signal generation device 307 has the same function and configuration as the inter prediction signal generation device 109 shown in FIGS. 1 and 2 and FIGS. That is, the inter prediction signal generation device 307 generates the prediction signal 416 using the input motion information 315, derivation parameter information 316, reference image signal 313, and prediction selection information 314.
  • the motion information 315, derived parameter information 316, reference image signal 313, and prediction selection information 314 correspond to the motion information 210, derived parameter information 211, reference image signal 207, and prediction selection information input to the inter prediction signal generation device 109 of the video encoding device 100.
  • the prediction signal 416 is therefore generated in the same manner as in the inter prediction signal generation device 109 shown in FIG.
  • the generated prediction signal 416 is output to the prediction separation switch 308.
  • the prediction separation switch 308 selects between the output terminal of the intra prediction signal generation device 306 and the output terminal of the inter prediction signal generation device 307 according to the prediction selection information 314. When the prediction selection information 314 indicates intra prediction, the switch is connected to the intra prediction signal generation device 306; when it indicates inter prediction, the switch is connected to the inter prediction signal generation device 307.
  • the prediction selection information 314 is the same as the prediction selection information 123 set by the prediction selection unit 111 of the video encoding device 100, and is shown in FIG.
  • the encoded data 311 decoded by the video decoding device 300 may have the same syntax structure as that of the video encoding device 100.
  • the same syntax as in FIGS. 12 to 17 is used.
  • the syntax 1600 has mainly three parts as shown in FIG.
  • the high-level syntax 1601 contains syntax information of layers at or above the slice level.
  • the slice level syntax 1602 has information necessary for decoding for each slice, and the macroblock level syntax 1603 has information necessary for decoding for each macroblock.
  • High level syntax 1601 includes sequence and picture level syntax, such as sequence parameter set syntax 1604 and picture parameter set syntax 1605.
  • the slice level syntax 1602 includes a slice header syntax 1606, a slice data syntax 1607, and the like.
  • the macroblock level syntax 1603 includes a macroblock layer syntax 1608, a macroblock prediction syntax 1609, and the like.
  • slice_affine_motion_prediction_flag is a syntax element indicating whether to apply geometric transformation prediction to a slice.
  • When slice_affine_motion_prediction_flag is 0, the geometric transformation prediction unit 202 does not use the parameters (a, b, d, e) in Equation (8) but uses Equation (19) for the slice.
  • Equation (19) represents motion compensated prediction using the translation model used in H.264 and the like, and its parameters (c, f) correspond to the motion vector.
  • Accordingly, when this flag is 0, the prediction is the same as the conventional motion compensated prediction of the translation model.
  • When slice_affine_motion_prediction_flag is 1, the prediction separation switch 201 dynamically switches the prediction signal within the slice as indicated by the prediction selection information 314.
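Since Equations (8) and (19) are not reproduced in this excerpt, the sketch below assumes a conventional six-parameter affine mapping. Under that assumption, fixing (a, b, d, e) to the identity leaves only (c, f), which is consistent with the statement above that (c, f) correspond to the motion vector of the translation model.

```python
def affine_warp(x, y, a, b, c, d, e, f):
    # Assumed conventional affine parameterization (the patent's exact
    # Equation (8) is not reproduced here):
    #   x' = a*x + b*y + c,  y' = d*x + e*y + f
    return (a * x + b * y + c, d * x + e * y + f)

def translation_warp(x, y, mv_x, mv_y):
    # Translation model: a = e = 1 and b = d = 0, leaving only (c, f),
    # which play the role of the motion vector components.
    return affine_warp(x, y, 1.0, 0.0, mv_x, 0.0, 1.0, mv_y)
```

The identity affine parameters map every position to itself, while the translation model shifts each position by the motion vector.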
  • mb_skip_flag is a flag indicating whether or not the macroblock is encoded in the skip mode. In the skip mode, transform coefficients, motion vectors, etc. are not encoded. For this reason, the first geometric transformation prediction is not applied to the skip mode.
  • AvailAffineMode is an internal parameter indicating whether or not the second geometric transformation prediction can be used in the macroblock. When AvailAffineMode is 0, it means that the prediction selection information 314 is set not to use the second geometric transformation prediction. When the adjacent motion vector of the adjacent block and the motion vector of the prediction target block have the same value, AvailAffineMode is 0. Otherwise, AvailAffineMode is 1.
  • AvailAffineMode can also be set using adjacent geometric transformation parameters or adjacent motion vectors. For example, when adjacent motion vectors point in completely different directions, an object boundary may exist in a block adjacent to the current prediction target block, so AvailAffineMode may be set to 0.
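A minimal sketch of the AvailAffineMode rule stated above (the function name is hypothetical): the second geometric transformation prediction is marked unavailable when the adjacent motion vectors and the prediction target block's motion vector all share the same value, since no deformation can then be inferred from them.

```python
def avail_affine_mode(neighbor_mvs, current_mv):
    """Return 0 when the second geometric transformation prediction is
    unavailable, 1 otherwise, following the rule described above."""
    # All adjacent motion vectors equal to the current block's motion
    # vector -> pure translation, so disable the second prediction.
    if all(mv == current_mv for mv in neighbor_mvs):
        return 0
    return 1
```

Extensions such as disabling the mode when neighbors diverge strongly (the object-boundary heuristic mentioned above) would add further conditions to this check.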
  • mb_affine_motion_skip_flag, which indicates whether to use the second geometric transformation prediction or motion compensated prediction, is encoded.
  • When mb_affine_motion_skip_flag is 1, the second geometric transformation prediction is applied in the skip mode.
  • When mb_affine_motion_skip_flag is 0, motion compensated prediction using Equation (19) is applied.
  • mb_type indicates macroblock type information: whether the current macroblock is intra-coded or inter-coded, which block shape is used for prediction, whether the prediction direction is unidirectional or bidirectional, and so on.
  • the mb_type is passed to the macroblock prediction syntax and the submacroblock prediction syntax indicating the syntax of the subblock in the macroblock.
  • mb_affine_pred_flag indicates whether the first geometric transformation prediction or the second geometric transformation prediction is used in the block; according to its value, the prediction selection information 314 is set to use either the first geometric transformation parameter or the second geometric transformation parameter.
  • NumMbPart() is an internal function that returns the number of block partitions specified by mb_type: it returns 1 for a 16×16 pixel block, 2 for a 16×8 or 8×16 pixel block, and 4 for an 8×8 pixel block.
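The NumMbPart() behavior described above can be illustrated for a 16×16 macroblock as follows; passing partition dimensions directly is our own simplification of what mb_type encodes.

```python
def num_mb_part(part_width, part_height):
    # Illustrative stand-in for NumMbPart(): the number of partitions
    # of a 16x16 macroblock for a given partition shape.
    return (16 // part_width) * (16 // part_height)
```

This reproduces the values listed above: one 16×16 partition, two 16×8 or 8×16 partitions, and four 8×8 partitions.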
  • mv_l0 and mv_l1 indicate motion vector difference information in the macroblock.
  • the motion vector information is set by the motion information search unit 106 of the video encoding device 100 and is obtained as the difference from a predicted motion vector, the derivation of which is not detailed in the present embodiment.
  • mvd_l0_affine and mvd_l1_affine indicate derived parameters in the macroblock, and indicate difference information of components (a, b, d, e) excluding motion vectors of affine transformation parameters. This syntax element is encoded only when the first geometric transformation parameter is selected.
  • mb_affine_pred_flag indicates whether the first geometric transformation prediction or the second geometric transformation prediction is used in the block; according to its value, the prediction selection information 314 is set to use either the first geometric transformation parameter or the second geometric transformation parameter.
  • mv_l0 and mv_l1 indicate motion vector difference information in the sub macroblock.
  • the motion vector information is set by the motion information search unit 106 of the video encoding device 100 and is obtained as the difference from a predicted motion vector, the derivation of which is not detailed in the present embodiment.
  • mvd_l0_affine and mvd_l1_affine in the figure indicate derived parameters in the sub-macroblock: the difference information of the components (a, b, d, e) of the affine transformation parameters, excluding the motion vector. These syntax elements are encoded only when the first geometric transformation parameter is selected.
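The differential coding of the affine components (a, b, d, e) described above can be sketched as a simple residual round trip. The function names are hypothetical, and how the predicted parameters themselves are derived is outside this sketch.

```python
def encode_affine_residual(actual, predicted):
    # mvd_l0_affine / mvd_l1_affine carry differences for the affine
    # components (a, b, d, e): the encoder transmits actual - predicted.
    return tuple(av - pv for av, pv in zip(actual, predicted))

def decode_affine_params(residual, predicted):
    # The decoder reverses the step: predicted + transmitted residual
    # restores the actual affine components.
    return tuple(rv + pv for rv, pv in zip(residual, predicted))
```

Encoding then decoding with the same predicted parameters recovers the original components exactly.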
  • With the syntax structure shown in FIGS. 12 to 17, when the prediction target pixel block is in the skip mode, either the conventional motion compensated prediction using the translation model or the second geometric transformation prediction can be selected, and for other inter predictions, either the first geometric transformation prediction or the second geometric transformation prediction can be selected.
  • syntax elements not defined in the present embodiment may be inserted between lines in the syntax tables shown in FIGS. 12 to 17, and descriptions regarding other conditional branches may be included.
  • the syntax table may be divided into a plurality of tables, or a plurality of syntax tables may be integrated. Moreover, it is not always necessary to use the same term, and it may be arbitrarily changed depending on the form to be used.
  • mb_type indicates macroblock type information: whether the current macroblock is intra-coded or inter-coded, which block shape is used for prediction, whether the prediction direction is unidirectional or bidirectional, and so on.
  • the mb_type is passed to the macroblock prediction syntax and the submacroblock prediction syntax indicating the syntax of the subblock in the macroblock.
  • mb_additional_affine_motion_flag indicates flag information for selecting whether to use the first geometric transformation parameter or the second geometric transformation parameter for the prediction target block. When this flag is 0, the second geometric transformation parameter is used, and when this flag is 1, the first geometric transformation parameter is used.
  • mb_affine_pred_flag indicates whether the block uses geometric transformation prediction (including the first geometric transformation prediction or the second geometric transformation prediction) or motion compensated prediction of the translation model. When this flag is 0, the prediction selection information 314 is set to use the motion compensated prediction of the translation model regardless of mb_additional_affine_motion_flag. On the other hand, when this flag is 1, the prediction selection information 314 is set to use the first geometric transformation parameter or the second geometric transformation parameter according to the flag information of mb_additional_affine_motion_flag.
  • When the prediction target pixel block is in the skip mode, either the conventional motion compensation using the translation model or the second geometric transformation prediction can be selected. At the macroblock level, it is determined whether the first geometric transformation prediction or the second geometric transformation prediction is used, and at the sub-macroblock level, whether geometric transformation prediction (including the first or second geometric transformation prediction) or motion compensated prediction of the translation model is used.
  • syntax elements not defined in the present embodiment may be inserted between lines in the syntax tables shown in FIGS. 18 to 20, and descriptions regarding other conditional branches may be included.
  • the syntax table may be divided into a plurality of tables, or a plurality of syntax tables may be integrated. Moreover, it is not always necessary to use the same term, and it may be arbitrarily changed depending on the form to be used.
  • In the above embodiments, the case has been described in which the processing target frame is divided into small blocks of 16×16 pixel size or the like and, as shown in FIG., encoded and decoded in sequence; however, the encoding order and decoding order are not limited to this.
  • Encoding and decoding may be performed sequentially from the lower right to the upper left, or spirally outward from the center of the screen.
  • encoding and decoding may be performed in order from the upper right to the lower left, or encoding and decoding may be performed in order from the peripheral part to the center part of the screen.
  • In the above embodiments, the block size has been described as a 4×4 pixel block or an 8×8 pixel block, but the prediction target block need not have a uniform block shape; any block size, such as a 16×8, 8×16, 8×4, or 4×8 pixel block, may be used.
  • In the above embodiments, the luminance signal and the color difference signal are not treated separately, and the description is limited to a single color signal component. When they are treated separately, different prediction methods may be used for the luminance and color difference signals, or the same prediction method may be used. When a different prediction method is used, the prediction method selected for the color difference signal is encoded or decoded in the same manner as for the luminance signal.
  • The present invention is not limited to the above-described embodiments as such; in the implementation stage, the constituent elements can be modified and embodied without departing from the scope of the invention.
  • various inventions can be formed by appropriately combining a plurality of components disclosed in the embodiment. For example, some components may be deleted from all the components shown in the embodiment.
  • Constituent elements across different embodiments may be combined as appropriate.
  • The methods of the present invention described in the above embodiments can be executed by a computer, and can also be stored and distributed, as a computer-executable program, on a recording medium such as a magnetic disk (flexible disk, hard disk, etc.), an optical disk (CD-ROM, DVD, etc.), or a semiconductor memory.
  • As described above, the prediction signal generation device, moving image encoding device, and moving image decoding device according to the embodiments of the present invention reduce the parameter error produced when deriving the geometric transformation parameters used for geometric transformation motion compensated prediction, suppress error propagation, and improve prediction efficiency without increasing the code amount.
  • the following prediction signal generation method, video encoding method, and video decoding method can be provided.
  • A prediction signal generation method for generating a prediction signal includes: setting prediction selection information indicating whether to use a first geometric transformation parameter or a second geometric transformation parameter, each indicating information on the shape of an image under geometric transformation of a pixel block; acquiring the motion information or geometric transformation parameters of one or more second adjacent blocks, among a plurality of first adjacent blocks adjacent to one pixel block of the plurality of pixel blocks into which the image signal is divided, for which prediction signal generation has already been completed; deriving a predicted geometric transformation parameter of the one pixel block from the geometric transformation parameters of the one or more second adjacent blocks; deriving and setting the first geometric transformation parameter by a predetermined method from an input derived value of the geometric transformation parameter and the predicted geometric transformation parameter; setting the second geometric transformation parameter based on the motion information of the one pixel block and the one or more second adjacent blocks; and generating the prediction signal by performing geometric transformation processing, using the first or second geometric transformation parameter indicated by the selection information, on a reference image signal when performing motion compensation for the one pixel block.
  • The moving image encoding method includes: setting prediction selection information indicating whether to use the first geometric transformation parameter or the second geometric transformation parameter, each indicating information on the shape of an image under geometric transformation of a pixel block; acquiring the motion information or geometric transformation parameters of one or more second adjacent blocks, among a plurality of first adjacent blocks adjacent to one pixel block of the plurality of pixel blocks into which the image signal is divided, for which prediction signal generation has already been completed; deriving a predicted geometric transformation parameter of the one pixel block from the geometric transformation parameters of the one or more second adjacent blocks; deriving and setting the first geometric transformation parameter by a predetermined method from a derived value of the geometric transformation parameter input from the outside and the predicted geometric transformation parameter; setting the second geometric transformation parameter based on the motion information of the one pixel block and the one or more second adjacent blocks; generating a prediction signal by performing geometric transformation processing, using the first or second geometric transformation parameter indicated in the selection information, on a reference image signal when performing motion compensation for the one pixel block; encoding the prediction selection information indicating whether the first or second geometric transformation parameter is used; encoding the derived value when the first geometric transformation parameter is selected; and encoding the difference signal between the input image signal and the prediction signal.
  • A moving image decoding method, which decodes moving image encoded data obtained by encoding an input image signal in units of a plurality of pixel blocks and performs decoding processing by a prescribed method, includes: acquiring the motion information of one or more second adjacent blocks that have already been decoded, among a plurality of first adjacent blocks adjacent to one pixel block of the plurality of pixel blocks into which the input image signal is divided, or the first and second geometric transformation parameters indicating information on the shape of an image under geometric transformation of a pixel block; decoding selection information indicating whether the first geometric transformation parameter or the second geometric transformation parameter is used; decoding a derived value of the geometric transformation parameter when the first geometric transformation parameter is selected; deriving a predicted geometric transformation parameter of the one pixel block from the geometric transformation parameters of the one or more second adjacent blocks; deriving and setting the first geometric transformation parameter by a prescribed method from the decoded derived value of the geometric transformation parameter and the predicted geometric transformation parameter; setting the second geometric transformation parameter based on the motion information of the one pixel block and the one or more second adjacent blocks; and generating a prediction signal by performing geometric transformation processing, using the first or second geometric transformation parameter indicated in the selection information, on a reference image signal when performing motion compensation for the one pixel block.
  • As described above, the moving image encoding device, the moving image decoding device, the moving image encoding method, and the moving image decoding method according to the present invention are useful for highly efficient moving image encoding, and are particularly suitable for moving image encoding that reduces the motion detection processing necessary for estimating the geometric transformation parameters used for geometric transformation motion compensated prediction.


Abstract

Disclosed is a prediction-signal producing device comprising: a setting unit for setting prediction selection information indicating which of a first and a second geometric transformation parameter, each indicating information relating to the shape of an image under geometric transformation of a pixel block, is to be used; an acquiring unit for acquiring the motion information or geometric transformation parameters of one or more second adjacent blocks, among the first adjacent blocks adjacent to one of the pixel blocks into which the image signal is divided, that have already undergone prediction-signal production; a deriving unit for deriving predicted geometric transformation parameters of the one pixel block from the geometric transformation parameters of the second adjacent blocks; a setting unit for deriving the first geometric transformation parameter by a predetermined method from derived values of the geometric transformation parameters and the predicted geometric transformation parameters and setting the derived parameter; a setting unit for setting the second geometric transformation parameters on the basis of the motion information of the one pixel block and the second adjacent blocks; and a producing unit for subjecting the reference image signal used when performing motion compensation for the one pixel block to geometric transformation processing using the first or second geometric transformation parameters indicated in the selection information, thereby producing a prediction signal.

Description

Prediction signal generation device using geometric transformation motion-compensated prediction, moving picture encoding device, and moving picture decoding device
The present invention relates to a prediction signal generation device, a moving picture encoding device, and a moving picture decoding device that derive geometric transformation parameters using the motion information of adjacent blocks and a prediction target block, and perform geometric transformation prediction processing for the prediction target block based on the derived geometric transformation parameters.
In recent years, a moving picture coding method with greatly improved coding efficiency has been recommended jointly by ITU-T and ISO/IEC as ITU-T Rec. H.264 and ISO/IEC 14496-10 (hereinafter "H.264"). In H.264, prediction processing, transform processing, and entropy coding are performed in units of rectangular blocks (for example, 16×16 or 8×8 pixels). For this reason, when predicting an object that cannot be represented by a rectangular block, H.264 raises prediction efficiency by selecting a smaller prediction block (for example, 4×4 pixels). Methods for effectively predicting such objects include preparing a plurality of prediction patterns for a rectangular block and applying motion compensation using an affine transformation to a deformed object.
For example, Japanese Patent Application Laid-Open No. 2007-312397 discloses a video frame transfer method that models object motion as an affine transformation and, by calculating optimum affine transformation parameters for each block to be predicted, uses prediction that takes enlargement, reduction, rotation, and the like of the object into account.
R. C. Kordasiewicz, M. D. Gallant, and S. Shirani, "Affine Motion Prediction Based on Translational Motion Vectors," IEEE Trans. on Circuits and Systems for Video Technology, Vol. 17, No. 10, October 2007, discloses a method that approximates motion-compensated prediction with an affine model: based on motion vector information calculated under a translation model, a block is divided into triangular patches and affine transformation parameters are estimated for each patch.
However, with the method described in Japanese Patent Application Laid-Open No. 2007-312397, information on six affine transformation parameters is transmitted for every pixel block, so the overhead increases.
The method described in Kordasiewicz et al. estimates affine transformation parameters using the motion vectors of the eight adjacent blocks (above, below, left, right, and diagonal) of the pixel block to be predicted together with the motion vector calculated for the prediction target pixel block, so a frame must be re-encoded several times to obtain optimal motion vectors. On the other hand, if motion vectors are calculated only once per frame from the original image, this conventional method is not optimal in terms of code amount and coding distortion, and coding efficiency decreases. Furthermore, when an edge or a different object exists between an adjacent block and the prediction target block, an error arises in deriving the affine transformation parameters and prediction efficiency falls; and because the erroneous affine transformation parameters are reused in deriving subsequent affine transformation parameters, the error accumulates and propagates, degrading the prediction efficiency of subsequent affine transformation prediction.
An object of the present invention is to provide a prediction signal generation device, a moving picture encoding device, and a moving picture decoding device that reduce the motion detection processing necessary for estimating the geometric transformation parameters used for geometric transformation motion-compensated prediction and improve prediction efficiency without increasing the code amount.
According to a first aspect of the present invention, there is provided a prediction signal generation device comprising: a first setting unit that sets prediction selection information indicating whether to use a first geometric transformation parameter or a second geometric transformation parameter, each indicating information on the shape of an image under geometric transformation of a pixel block; an acquisition unit that acquires the motion information or geometric transformation parameters of one or more second adjacent blocks, among a plurality of first adjacent blocks adjacent to one pixel block of the plurality of pixel blocks into which the image signal is divided, for which prediction signal generation has already been completed; a derivation unit that derives a predicted geometric transformation parameter of the one pixel block from the geometric transformation parameters of the one or more second adjacent blocks; a second setting unit that derives and sets the first geometric transformation parameter by a predetermined method from an input derived value of the geometric transformation parameter and the predicted geometric transformation parameter; a third setting unit that sets the second geometric transformation parameter based on the motion information of the one pixel block and the one or more second adjacent blocks; and a generation unit that generates a prediction signal by performing geometric transformation processing, using the first or second geometric transformation parameter indicated by the selection information, on a reference image signal when performing motion compensation for the one pixel block.
A second aspect of the present invention provides a moving image encoding device that uses the above prediction signal generation device and comprises: a first encoding unit that encodes the prediction selection information indicating which of the first and second geometric transformation parameters is used; a second encoding unit that, when the first geometric transformation parameter is selected, derives a derived value of the geometric transformation parameter from the first geometric transformation parameter and the predicted geometric transformation parameter by a predetermined method and encodes the derived value; and a third encoding unit that encodes information indicating the difference signal between the input image signal and the prediction signal.
A third aspect of the present invention provides a moving image decoding device that parses moving image encoded data, produced by encoding an input image signal in units of a plurality of pixel blocks, and decodes it by a prescribed method, the device comprising: a motion information acquisition unit that acquires, from among a plurality of first adjacent blocks adjacent to one pixel block of the plurality of pixel blocks into which the input image signal is divided, the motion information, or the geometric transformation parameters indicating information on the shape of an image under a geometric transformation of a pixel block, of one or more second adjacent blocks for which decoding has already been completed; a first decoding unit that decodes selection information indicating which of the first and second geometric transformation parameters is used; a second decoding unit that decodes a derived value of the geometric transformation parameter when the first geometric transformation parameter is selected; a derivation unit that derives a predicted geometric transformation parameter for the one pixel block from the geometric transformation parameters of the one or more second adjacent blocks; a first setting unit that derives and sets the first geometric transformation parameter by a prescribed method from the decoded derived value of the geometric transformation parameter and the predicted geometric transformation parameter; a second setting unit that, when the second geometric transformation parameter is selected, sets the second geometric transformation parameter based on the motion information of the one pixel block and of the one or more second adjacent blocks; and a generation unit that generates a prediction signal by applying, to a reference image signal used when performing motion compensation for the one pixel block, a geometric transformation process using whichever of the first and second geometric transformation parameters is indicated by the selection information.
Block diagram of a moving image encoding device according to the first and second embodiments.
Block diagram of an inter prediction signal generation device according to the first and second embodiments.
Diagram showing the flow of the encoding process.
Diagram showing a 16×16 pixel block.
Block diagram of a derived parameter derivation unit.
Block diagram of a second geometric transformation parameter derivation unit.
Diagram showing the positional relationship between a pixel block to be encoded or decoded and its adjacent blocks.
Diagram showing the positions of adjacent blocks when the pixel block to be encoded or decoded is at the upper left of a macroblock.
Diagram showing the positions of adjacent blocks when the pixel block to be encoded or decoded is at the upper right of a macroblock.
Diagram showing the positions of adjacent blocks when the pixel block to be encoded or decoded is at the lower left of a macroblock.
Diagram showing the positions of adjacent blocks when the pixel block to be encoded or decoded is at the lower right of a macroblock.
Block diagram of a predicted geometric transformation parameter derivation unit.
Block diagram of a geometric transformation prediction unit.
Diagram showing an example of generating, by interpolation, the pixel values at fractional positions after a geometric transformation.
Diagram showing an example of prediction selection information indices.
Flowchart showing the flow of geometric transformation prediction processing in the inter prediction signal generation device of the first embodiment.
Diagram showing a syntax structure.
Diagram showing the information contained in the slice header syntax.
Diagram showing the information contained in the slice data syntax in the first to fourth embodiments.
Diagram showing the information contained in the macroblock layer syntax in the first and third embodiments.
Diagram showing the information contained in the macroblock prediction syntax in the first and third embodiments.
Diagram showing the information contained in the sub-macroblock prediction syntax in the first and third embodiments.
Diagram showing the information contained in the macroblock layer syntax in the second and fourth embodiments.
Diagram showing the information contained in the macroblock prediction syntax in the second and fourth embodiments.
Diagram showing the information contained in the sub-macroblock prediction syntax in the second and fourth embodiments.
Block diagram showing a moving image decoding device according to the third and fourth embodiments.
Hereinafter, the first to fourth embodiments will be described with reference to the drawings. The first and second embodiments relate to a moving image encoding device, and the third and fourth embodiments relate to a moving image decoding device.
The moving image encoding device described in the following embodiments divides each frame of an input image signal into a plurality of pixel blocks, compresses and encodes the divided pixel blocks, and outputs a code string.
[First Embodiment]
The configuration of a moving image encoding device 100 that uses geometric transformation prediction according to the first embodiment will be described with reference to FIG. 1.
The moving image encoding device 100 is connected to an encoding control unit 114. In the device 100, a subtractor 101 calculates the difference between an input image signal 115 and a prediction signal 206 and outputs a prediction error signal 116. The output of the subtractor 101 is connected to a transform/quantization unit 102. The transform/quantization unit 102 includes, for example, an orthogonal transformer (discrete cosine transformer) and a quantizer; it applies an orthogonal transform (discrete cosine transform) to the prediction error signal 116 and quantizes the result into transform coefficients 117. The output of the transform/quantization unit 102 is connected to an inverse quantization/inverse transform unit 103 and an entropy encoding unit 112.
The inverse quantization/inverse transform unit 103 has an inverse quantizer and an inverse orthogonal transformer (inverse discrete cosine transformer); it inversely quantizes the transform coefficients 117 and applies an inverse orthogonal transform to restore a decoded prediction error signal 118. Its output is connected to an adder 104. The adder 104 adds the decoded prediction error signal 118 and the prediction signal 206 to generate a decoded image signal 119. The output of the adder 104 is connected to a reference image memory 105, which stores the decoded image signal 119 as a reference image signal. The output of the reference image memory 105 is connected to a motion information search unit 106, an intra prediction signal generation device 107, and an inter prediction signal generation device 109.
The motion information search unit 106 uses the input image signal 115 and the reference image signal 207 to calculate motion information (a motion vector) 210 suitable for the prediction target block. Its output is connected to a derived parameter derivation unit 108 and the inter prediction signal generation device 109. As shown in FIG. 4, the derived parameter derivation unit 108 has a predicted geometric transformation parameter derivation unit 601 and a converter 602; it derives a predicted geometric transformation parameter, described later, and calculates a derived parameter from that predicted parameter and the input motion information 210. Its output is connected to the inter prediction signal generation device 109 and the entropy encoding unit 112.
As shown in FIG. 2, the inter prediction signal generation device 109 generates the prediction signal 206 using the input motion information 210, the derived parameter information 211, the reference image signal 207, and the prediction selection information 123. The intra prediction signal generation device 107, in contrast, performs intra prediction using the input reference image signal 207. The outputs of the intra prediction signal generation device 107 and the inter prediction signal generation device 109 are connected to the terminals of a prediction separation switch 110.
A prediction selection unit 111 sets prediction selection information 123 according to the prediction mode controlled by the encoding control unit 114. Its output is connected to the inter prediction signal generation device 109, the prediction separation switch 110, and the entropy encoding unit 112. The prediction separation switch 110 switches between the intra prediction signal generation device 107 and the inter prediction signal generation device 109 according to the prediction selection information 123 from the prediction selection unit 111. The switching terminal of the prediction separation switch 110 is connected to the subtractor 101 and the adder 104, so that the prediction signal of whichever device is selected is fed to the subtractor 101 and the adder 104.
The entropy encoding unit 112 includes an encoder and a multiplexer; it entropy-encodes and multiplexes the transform coefficients 117, the derived parameter information 211, and the prediction selection information 123. Its output is connected to an output buffer 113, which temporarily stores the multiplexed data and outputs it as encoded data 129 according to the output timing managed by the encoding control unit 114.
Based on the encoding parameters supplied from the encoding control unit 114, the moving image encoding device 100 configured as above performs intra prediction (intra-frame prediction) or inter prediction (inter-frame prediction) encoding of the input image signal 115, generates the prediction signal 206, and outputs the encoded data 129. That is, the input image signal 115 of a moving image or a still image is divided into pixel blocks, for example macroblocks, before being input to the device 100. The input image signal is one encoding processing unit, which may be either a frame or a field. In this embodiment, an example in which a frame is the encoding processing unit is described.
The moving image encoding device 100 performs encoding in a plurality of prediction modes that differ in block size and in the method of generating the prediction signal 206. The generation methods fall broadly into intra prediction (intra-frame prediction), which generates the prediction signal only from within the frame being encoded, and inter prediction, which predicts from a plurality of temporally different reference frames. This embodiment describes in detail an example in which the prediction signal is generated using inter prediction.
In the first to fourth embodiments, the macroblock is the basic processing block size of the encoding process. A macroblock is typically a 16×16 pixel block as shown in FIG. 3B, but 32×32 or 8×8 pixel blocks may also be used, and a macroblock need not be a square lattice. Hereinafter, the block or macroblock of the input image signal 115 to be encoded is simply called the "prediction target block".
In the first to fourth embodiments, for simplicity of explanation, encoding is assumed to proceed from the upper left to the lower right as shown in FIG. 3A. In FIG. 3A, in the frame f being encoded, the blocks located to the left of and above the block c to be encoded are the already-encoded blocks p.
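The division of a frame into macroblocks processed from the upper left to the lower right can be sketched as follows (the function name and the toy 32×32 frame are illustrative assumptions, not part of the patent):

```python
def split_into_blocks(frame, bsize=16):
    """Split a frame (2D list, height x width) into bsize x bsize pixel
    blocks, ordered from the upper left to the lower right."""
    h, w = len(frame), len(frame[0])
    blocks = []
    for by in range(0, h, bsize):          # rows of macroblocks, top to bottom
        for bx in range(0, w, bsize):      # within a row, left to right
            blocks.append([row[bx:bx + bsize] for row in frame[by:by + bsize]])
    return blocks

frame = [[x + y for x in range(32)] for y in range(32)]  # toy 32x32 frame
blocks = split_into_blocks(frame, 16)                    # four 16x16 blocks
```

With a 32×32 frame and 16×16 macroblocks, the list holds four blocks in raster order: upper left, upper right, lower left, lower right.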
Next, the flow of encoding in the moving image encoding device 100 will be described. First, the input image signal 115 is input to the subtractor 101. The subtractor 101 also receives the prediction signal 206, corresponding to the selected prediction mode, output from the prediction separation switch 110. The subtractor 101 calculates the prediction error signal 116 by subtracting the prediction signal 206 from the input image signal 115, and the prediction error signal 116 is input to the transform/quantization unit 102.
In the transform/quantization unit 102, the prediction error signal 116 undergoes an orthogonal transform, such as the discrete cosine transform (DCT), to generate transform coefficients. Besides the discrete cosine transform used in H.264, the transform may be a discrete sine transform, a wavelet transform, or a component analysis.
The transform/quantization unit 102 quantizes the transform coefficients according to quantization information, typified by a quantization parameter and a quantization matrix, given by the encoding control unit 114. It outputs the quantized transform coefficients 117 to the entropy encoding unit 112 and also to the inverse quantization/inverse transform unit 103.
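As an illustration of the transform and quantization steps, the following sketch applies an orthonormal 2D DCT-II to a small prediction-error block, quantizes the coefficients with a uniform step, and reconstructs the block. The 4×4 size, the flat quantization step, and all names are assumptions of this sketch; H.264 itself uses an integer transform with scaling matrices rather than a floating-point DCT:

```python
import math

def dct2(block):
    """Separable, orthonormal 2D DCT-II of a square block (list of lists)."""
    n = len(block)
    def c(k):  # orthonormal scaling factor
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = c(u) * c(v) * s
    return out

def idct2(coeff):
    """Inverse of dct2 (2D DCT-III with the same scaling)."""
    n = len(coeff)
    def c(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    out = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            s = 0.0
            for u in range(n):
                for v in range(n):
                    s += (c(u) * c(v) * coeff[u][v]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[x][y] = s
    return out

def quantize(coeff, qstep):
    """Uniform quantization: map each coefficient to an integer level."""
    return [[round(c / qstep) for c in row] for row in coeff]

def dequantize(levels, qstep):
    """Inverse quantization: scale the levels back."""
    return [[lv * qstep for lv in row] for row in levels]

# A small prediction-error block: transform, quantize, then reconstruct
# as the inverse quantization/inverse transform unit 103 would.
err = [[4, 2, 0, 0], [2, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
rec = idct2(dequantize(quantize(dct2(err), 2.0), 2.0))
```

Without quantization the transform round-trip is lossless up to floating-point error; the quantization step is where the coding loss is introduced.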
The entropy encoding unit 112 applies entropy encoding, for example Huffman coding or arithmetic coding, to the quantized transform coefficients 117. It also entropy-encodes the various encoding parameters used to encode the target block, including the prediction information output from the encoding control unit 114. The encoded data 129 is thereby generated.
The encoding parameters are the parameters needed for decoding, such as the prediction information, information on the transform coefficients, and information on quantization. The encoding parameters of the prediction target block are held in the internal memory of the encoding control unit 114 and are used when the prediction target block serves as an adjacent block of another pixel block.
The encoded data 129 generated and multiplexed by the entropy encoding unit 112 is output from the moving image encoding device 100 after being temporarily stored in the output buffer 113, according to the output timing managed by the encoding control unit 114. The encoded data 129 is sent, for example, to a storage system (storage medium) or a transmission system (communication line), not shown.
The inverse quantization/inverse transform unit 103 inversely quantizes the quantized transform coefficients 117 output from the transform/quantization unit 102. Here, the quantization information corresponding to that used in the transform/quantization unit 102 is loaded from the internal memory of the encoding control unit 114 for the inverse quantization. The quantization information comprises parameters typified by the quantization parameter and the quantization matrix.
The inverse quantization/inverse transform unit 103 then applies an inverse orthogonal transform, such as the inverse discrete cosine transform (IDCT), to the inversely quantized transform coefficients, thereby reproducing the decoded prediction error signal 118.
The decoded prediction error signal 118 is input to the adder 104, where it is added to the prediction signal 206 output from the prediction separation switch 110 to generate the decoded image signal 119. The decoded image signal 119 is a locally decoded image signal and is stored in the reference image memory 105 as the reference image signal 207. The reference image signal 207 stored in the reference image memory 105 is output to the motion information search unit 106, the intra prediction signal generation device 107, the inter prediction signal generation device 109, and so on, and is referred to during prediction.
The motion information search unit 106 uses the input image signal 115 and the reference image signal 207 to calculate motion information 210 suitable for the prediction target block. The motion information may be represented, for example, by affine transformation parameters, or alternatively by a motion vector. The motion information 210 may also be a predicted value used when predicting affine transformation parameters from other affine transformation parameters, or when predicting a motion vector from other motion vectors. Here, motion information that includes geometric deformation between images is used as the motion information.
The motion information search unit 106 calculates the motion information 210 (affine transformation parameters or a motion vector) by performing a search, such as block matching, between the prediction target block of the input image signal 115 and an interpolated image of the reference image signal 207. As a matching criterion, for example, the per-pixel accumulated difference between the input image signal 115 and the matched interpolated image may be used, or that value plus the difference between the calculated affine transformation parameters and the center of the search.
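The block-matching search with the accumulated absolute difference (SAD) criterion described above can be sketched as a minimal full search over integer-pixel displacements. The function names, the toy frames, and the search range are assumptions of this sketch; the patent's search may also use interpolated (fractional-pixel) positions and hierarchical refinement:

```python
def sad(cur, ref, bx, by, dx, dy, bsize):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    total = 0
    for y in range(bsize):
        for x in range(bsize):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total

def full_search(cur, ref, bx, by, bsize, srange):
    """Return the displacement (dx, dy) minimizing SAD within +/- srange."""
    best = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, bsize)
    for dy in range(-srange, srange + 1):
        for dx in range(-srange, srange + 1):
            # skip displacements that leave the reference frame
            if not (0 <= by + dy and by + dy + bsize <= len(ref)
                    and 0 <= bx + dx and bx + dx + bsize <= len(ref[0])):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, bsize)
            if cost < best_cost:
                best_cost, best = cost, (dx, dy)
    return best, best_cost

# Reference frame with a bright 2x2 patch; in the current frame the
# patch has moved down-right by one pixel.
ref = [[0] * 8 for _ in range(8)]
ref[2][2] = ref[2][3] = ref[3][2] = ref[3][3] = 100
cur = [[0] * 8 for _ in range(8)]
cur[3][3] = cur[3][4] = cur[4][3] = cur[4][4] = 100

mv, cost = full_search(cur, ref, 2, 2, 4, 2)  # block at (2, 2), size 4
```

The search finds the displacement (-1, -1) into the reference that aligns the patch exactly, giving a SAD of zero.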
Besides the method described above, the motion information 210 may be determined using a value obtained by transforming the difference between the predicted image and the original image, or by additionally taking into account the magnitude of the motion vector or affine transformation parameters, or their code amount. Costs such as equations (1) and (2), described later, may also be used. The matching may be an exhaustive search of a range based on search range information provided from outside the moving image encoding device 100, or may be performed hierarchically for each pixel precision.
The motion information 210 thus calculated for a plurality of reference image signals is input to the inter prediction signal generation device 109 and used to generate the prediction signal 206. The plurality of reference image signals are locally decoded images with different display times.
The calculated motion information 210 is output to the derived parameter derivation unit 108. As shown in FIG. 4, the derived parameter derivation unit 108 has the predicted geometric transformation parameter derivation unit 601 and the converter 602; it derives the predicted geometric transformation parameter, described later, and calculates the derived parameter from that predicted parameter and the input motion information 210. The converter 602 may be, for example, a subtractor; besides a subtractor, it may be an adder, a multiplier, a divider, a converter that transforms using a predetermined matrix, or a converter realizing a combination of these. In the following, the converter 602 is treated as a subtractor.
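Treating the converter 602 as a subtractor, the derivation can be pictured as encoding only the residual between the parameters found by the motion search and the parameters predicted from already-processed adjacent blocks. In this sketch the 6-component parameter vectors and the component-wise median prediction are illustrative assumptions, not the patent's prescribed method:

```python
def predict_params(neighbors):
    """Component-wise median of the adjacent blocks' parameter vectors."""
    def median(vals):
        s = sorted(vals)
        return s[len(s) // 2]
    return [median([nb[i] for nb in neighbors]) for i in range(len(neighbors[0]))]

def derive(estimated, predicted):
    """Converter 602 as a subtractor: the residual to be entropy-coded."""
    return [e - p for e, p in zip(estimated, predicted)]

def reconstruct(residual, predicted):
    """Decoder side: add the residual back onto the prediction."""
    return [r + p for r, p in zip(residual, predicted)]

# Three already-processed adjacent blocks, each with an affine-like
# 6-component parameter vector (illustrative values).
neighbors = [[1, 0, 0, 1, 4, 2], [1, 0, 0, 1, 6, 2], [1, 0, 0, 1, 5, 3]]
pred = predict_params(neighbors)        # predicted geometric parameters
estimated = [1, 0, 0, 1, 5, 3]          # parameters from the motion search
residual = derive(estimated, pred)      # only this residual is encoded
assert reconstruct(residual, pred) == estimated
```

Because the adjacent blocks move similarly, the residual is mostly zeros, which is why sending the derived value rather than the raw parameters saves code amount.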
The derived parameter information 211 produced by the derived parameter derivation unit 108 is output to the entropy encoding unit 112 and, after entropy encoding, is multiplexed into the encoded data. Furthermore, the motion information 210 used to encode the target pixel block is stored in the internal memory of the encoding control unit 114 and is loaded and used by the inter prediction signal generation device 109 as needed.
The reference image signal 207 stored in the reference image memory 105 is output to the intra prediction signal generation device 107, which performs intra prediction using it. In H.264, for example, a prediction signal is generated by padding pixels along a prediction direction, such as the vertical or horizontal direction, using already-encoded reference pixel values adjacent to the prediction target block. Alternatively, pixel values may first be interpolated by a predetermined interpolation method and the interpolated values then copied along a predetermined prediction direction. The generated prediction signal 206 is output to the prediction separation switch 110.
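The directional pixel padding just described can be sketched as follows, with vertical prediction copying the row of decoded pixels above the block down every row, and horizontal prediction copying each decoded pixel on the left across its row. This is a simplified illustration, not the full H.264 intra mode set:

```python
def intra_vertical(top, n):
    """Copy the n reference pixels above the block down every row."""
    return [list(top) for _ in range(n)]

def intra_horizontal(left, n):
    """Copy each of the n reference pixels on the left across its row."""
    return [[left[y]] * n for y in range(n)]

top = [10, 20, 30, 40]    # decoded pixels just above the 4x4 block
left = [15, 25, 35, 45]   # decoded pixels just to the left of the block
v = intra_vertical(top, 4)
h = intra_horizontal(left, 4)
```

Each mode fills the 4×4 prediction block from the adjacent already-encoded pixels only, which is what lets the decoder reproduce the same prediction without extra side information.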
The inter prediction signal generation device 109 generates the prediction signal 206 using the input motion information 210, the derived parameter information 211, the reference image signal 207, and the prediction selection information 123. The generated prediction signal 206 is output to the prediction separation switch 110.
The prediction separation switch 110 selects between the output of the intra prediction signal generation device 107 and that of the inter prediction signal generation device 109 according to the prediction selection information 123. When the prediction selection information 123 indicates intra prediction, the switch connects to the intra prediction signal generation device 107; when it indicates inter prediction, the switch connects to the inter prediction signal generation device 109. Examples of the prediction selection information 123 are shown in FIG. 10, described later.
The prediction selection unit 111 sets the prediction selection information 123 according to the prediction mode controlled by the encoding control unit 114. Intra prediction or inter prediction can be selected as the prediction mode, and each may have a plurality of modes. Which of these modes is selected is controlled by the encoding control unit 114. For example, the prediction signal 206 may be generated for all prediction modes and one mode selected among them, or the candidate prediction modes may be limited according to the characteristics of the input image.
 より具体的に説明すると、次式のようなコストを用いて予測選択情報123を決定する。予測モードを選択した際に必要となる予測情報に関する符号量(例えば導出パラメータ211の符号量や予測ブロックサイズの符号量など)をOH、入力画像信号115と予測信号206の差分絶対和(予測誤差信号116の絶対累積和を意味する)をSADとすると、以下の判定式を用いる。
 K = SAD + λ × OH   …(1)
More specifically, the prediction selection information 123 is determined using a cost such as the following equation. Let OH be the code amount of the prediction information required when a prediction mode is selected (for example, the code amount of the derivation parameter 211 and the code amount of the prediction block size), and let SAD be the sum of absolute differences between the input image signal 115 and the prediction signal 206 (that is, the absolute cumulative sum of the prediction error signal 116); the following determination formula is then used.
 K = SAD + λ × OH   …(1)
 ここでKはコスト、λは定数をそれぞれ表す。λは量子化スケールや量子化パラメータの値に基づいて決められるラグランジュ未定乗数である。本判定式では、コストKが最も小さい値を与えるモードが最適な予測モードとして選択される。 Where K is the cost and λ is a constant. λ is a Lagrangian undetermined multiplier determined based on the quantization scale and the value of the quantization parameter. In this determination formula, the mode giving the value with the smallest cost K is selected as the optimum prediction mode.
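 The mode decision described by equation (1) can be sketched as follows (all numeric values and mode names are hypothetical; an actual encoder would compute SAD from the prediction error signal 116 and OH from the side-information code amount):

```python
# Sketch of the mode decision in equation (1): K = SAD + lambda * OH.
# SAD is the sum of absolute differences between the input image signal
# and the prediction signal; OH is the side-information code amount of
# the candidate mode; lam is the Lagrange multiplier determined from the
# quantization parameter. All values below are hypothetical.

def mode_cost(sad, overhead_bits, lam):
    """Cost K of equation (1)."""
    return sad + lam * overhead_bits

def select_mode(candidates, lam):
    """Return the candidate mode whose cost K is smallest."""
    return min(candidates, key=lambda m: mode_cost(m["sad"], m["oh"], lam))

candidates = [
    {"name": "intra_16x16", "sad": 1800, "oh": 12},
    {"name": "inter_16x16", "sad": 1500, "oh": 40},
    {"name": "inter_8x8",   "sad": 1400, "oh": 96},
]
best = select_mode(candidates, lam=10.0)
# With lam = 10.0 the costs are 1920, 1900 and 2360, so inter_16x16 wins:
# a mode with a slightly larger SAD can still be chosen when its overhead
# is smaller, which is exactly the trade-off equation (1) encodes.
```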
 式(1)に代えて(a)予測情報のみ、(b)SADのみ、を用いて予測選択情報123の決定を行ってもよいし、これら(a)、(b)にアダマール変換を施した値、又はそれに近似した値を利用してもよい。さらに別の例として、仮符号化ユニットを用意し、仮符号化ユニットによりある予測モードで生成された予測誤差信号116を実際に符号化した場合の符号量と、入力画像信号115と復号画像信号119との間の二乗誤差を用いて予測選択情報123を決定してもよい。この場合の判定式は、以下のようになる。
 J = D + λ × R   …(2)
The prediction selection information 123 may be determined using (a) only the prediction information or (b) only the SAD instead of equation (1), or using values obtained by applying a Hadamard transform to (a) or (b), or approximations thereof. As yet another example, a provisional encoding unit may be prepared, and the prediction selection information 123 may be determined using the code amount obtained when the prediction error signal 116 generated in a given prediction mode is actually encoded by the provisional encoding unit, together with the squared error between the input image signal 115 and the decoded image signal 119. The determination formula in this case is as follows.
 J = D + λ × R   …(2)
ここで、Jは符号化コスト、Dは入力画像信号115と復号画像信号119との間の二乗誤差を表す符号化歪みである。一方、Rは仮符号化によって見積もられた符号量を表している。 Here, J is the encoding cost, and D is the encoding distortion, representing the squared error between the input image signal 115 and the decoded image signal 119. R represents the code amount estimated by the provisional encoding.
 式(2)の符号化コストJを用いると、予測モード毎に仮符号化と局部復号処理が必要となるため、回路規模又は演算量は増大する。しかしながら、より正確な符号量と符号化歪みを用いるため、高い符号化効率を維持することができる。式(2)に代えてRのみ、又はDのみを用いてコストを算出してもよいし、R又はDを近似した値を用いてコスト関数を作成してもよい。 When the encoding cost J of Equation (2) is used, provisional encoding and local decoding processing are required for each prediction mode, so that the circuit scale or calculation amount increases. However, since a more accurate code amount and encoding distortion are used, high encoding efficiency can be maintained. The cost may be calculated using only R or only D instead of the expression (2), or the cost function may be created using a value approximating R or D.
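 The rate-distortion decision of equation (2) can be sketched in the same way (the distortion and rate figures below are hypothetical stand-ins for the results of the provisional encoding and local decoding described above):

```python
# Sketch of the rate-distortion decision in equation (2): J = D + lambda * R.
# D is the squared error between the input image signal and the locally
# decoded image signal, and R is the code amount measured by actually
# (provisionally) encoding the candidate mode. The numbers are hypothetical.

def rd_cost(distortion, rate_bits, lam):
    """Cost J of equation (2)."""
    return distortion + lam * rate_bits

trials = {
    "geom_pred":  {"D": 5200.0, "R": 310},   # hypothetical trial-encoding results
    "trans_pred": {"D": 6100.0, "R": 220},
}
lam = 4.0
best_mode = min(trials, key=lambda m: rd_cost(trials[m]["D"], trials[m]["R"], lam))
# geom_pred: 5200 + 4*310 = 6440; trans_pred: 6100 + 4*220 = 6980.
```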
 次に、図2を参照してインター予測信号生成装置109を説明する。 Next, the inter prediction signal generation device 109 will be described with reference to FIG.
 インター予測信号生成装置109は、予測分離スイッチ201、幾何変換予測部202、第二幾何変換パラメータ導出部203、第一幾何変換パラメータ導出部204、予測幾何変換パラメータ導出部205を有する。 The inter prediction signal generation device 109 includes a prediction separation switch 201, a geometric transformation prediction unit 202, a second geometric transformation parameter derivation unit 203, a first geometric transformation parameter derivation unit 204, and a predicted geometric transformation parameter derivation unit 205.
 先ず、第二幾何変換パラメータ導出部203の処理について具体的に説明する。第二幾何変換パラメータ導出部203では、動き情報探索部106から出力された予測対象ブロックの動き情報210と符号化制御部114に保存されている符号化済みのブロックの動き情報を用いて、予測対象ブロックの第二幾何変換パラメータ208を導出する。符号化制御部114に保存されている動き情報210は、隣接ブロックの動き情報210に含まれる動きベクトルであり、以下、「隣接動きベクトル」という。 First, the process of the second geometric transformation parameter derivation unit 203 will be specifically described. The second geometric transformation parameter derivation unit 203 uses the motion information 210 of the prediction target block output from the motion information search unit 106 and the motion information of the encoded block stored in the encoding control unit 114 to perform prediction. The second geometric transformation parameter 208 of the target block is derived. The motion information 210 stored in the encoding control unit 114 is a motion vector included in the motion information 210 of the adjacent block, and is hereinafter referred to as “adjacent motion vector”.
 図5に示すように、第二幾何変換パラメータ導出部203は、動き情報取得部501と第二パラメータ導出部502とを有する。動き情報取得部501は、複数の隣接ブロックのうち、動き情報を取得する隣接ブロックを決定し、その隣接ブロックの動き情報、例えば、動きベクトルを取得する。第二パラメータ導出部502は、隣接ブロックの動きベクトルから、第二幾何変換パラメータを導出する。 As shown in FIG. 5, the second geometric transformation parameter derivation unit 203 includes a motion information acquisition unit 501 and a second parameter derivation unit 502. The motion information acquisition unit 501 determines an adjacent block from which motion information is acquired from among a plurality of adjacent blocks, and acquires motion information of the adjacent block, for example, a motion vector. The second parameter derivation unit 502 derives a second geometric transformation parameter from the motion vector of the adjacent block.
 以下、図6を用いて、動き情報取得部501による隣接動きベクトルを導出する処理について説明する。 Hereinafter, the process of deriving the adjacent motion vector by the motion information acquisition unit 501 will be described with reference to FIG.
 ≪隣接ブロックと隣接動きベクトルの導出≫
 図6A乃至図6Eは、予測対象ブロックに対する隣接ブロックの関係を示す図である。図6Aでは、予測対象ブロックと隣接ブロックのサイズ(例えば16×16画素ブロック)が一致する場合の例を示す。
≪Derivation of adjacent block and adjacent motion vector≫
FIGS. 6A to 6E are diagrams illustrating the relationship of adjacent blocks to a prediction target block. FIG. 6A shows an example in which the prediction target block and the adjacent blocks have the same size (for example, 16 × 16 pixel blocks).
 図6A中、斜線のハッチングが付された画素ブロックpは既に符号化又は予測が完了している画素ブロック(以下「予測済画素ブロック」という。)である。ドットのハッチングが付されたブロックcは予測対象ブロックを示しており、白で表示されている画素ブロックnは未符号化画素(未予測)ブロックである。図中Xは符号化(予測)対象画素ブロックを表している。 In FIG. 6A, a hatched pixel block p is a pixel block that has already been encoded or predicted (hereinafter referred to as “predicted pixel block”). A block c with dot hatching indicates a prediction target block, and a pixel block n displayed in white is an uncoded pixel (unpredicted) block. In the figure, X represents an encoding (prediction) target pixel block.
 隣接ブロックAは、予測対象ブロックXの左の隣接ブロック、隣接ブロックBは、予測対象ブロックXの上の隣接ブロック、隣接ブロックCは、予測対象ブロックXの右上の隣接ブロック、隣接ブロックDは、予測対象ブロックXの左上の隣接ブロックである。 The adjacent block A is the adjacent block on the left of the prediction target block X, the adjacent block B is the adjacent block on the prediction target block X, the adjacent block C is the adjacent block on the upper right of the prediction target block X, and the adjacent block D is This is an adjacent block at the upper left of the prediction target block X.
 符号化制御部114の内部メモリに保持されている隣接動きベクトルは、予測済画素ブロックの動きベクトルのみである。図3Aで示したように画素ブロックは左上から右下に向かって符号化及び予測の処理がされていくため、画素ブロックXの予測を行う際には、右及び下方向の画素ブロックは未だ符号化が行われていない。そのため、これらの隣接ブロックから隣接動きベクトルを導出することができない。 The adjacent motion vectors held in the internal memory of the encoding control unit 114 are only the motion vectors of predicted pixel blocks. As shown in FIG. 3A, pixel blocks are encoded and predicted from the upper left toward the lower right, so when the pixel block X is predicted, the pixel blocks to its right and below have not yet been encoded. Therefore, adjacent motion vectors cannot be derived from those blocks.
 図6B乃至図6Eは、予測対象ブロックが8×8画素ブロックの場合の、隣接ブロックの例を示す図である。図6B乃至図6Eにおいて、太線はマクロブロックの境界を表す。図6Bは、マクロブロック内の左上に位置する画素ブロック、図6Cは、マクロブロック内の右上に位置する画素ブロック、図6Dは、マクロブロック内の左下に位置する画素ブロック、図6Eは、マクロブロック内の右下に位置する画素ブロックを、それぞれ、予測対象ブロックとする例を示す。 FIGS. 6B to 6E are diagrams illustrating examples of adjacent blocks when the prediction target block is an 8 × 8 pixel block. In FIGS. 6B to 6E, the bold lines represent macroblock boundaries. FIGS. 6B, 6C, 6D, and 6E show examples in which the prediction target block is the pixel block located at the upper left, the upper right, the lower left, and the lower right of the macroblock, respectively.
 マクロブロックの内部も同様に左上から右下に向かって符号化処理が行われるため、8×8画素ブロックの符号化順序に応じて隣接ブロックの位置が変化する。対応する8×8画素ブロックの符号化処理又は予測信号生成処理が完了すると、その画素ブロックは符号化済み画素ブロックとなり、後に処理される画素ブロックの隣接ブロックとして利用される。図6Eでは、隣接ブロックCに対応する右上の画素ブロックが未符号化画素ブロックであるため、符号化済み画素ブロックの右上に位置する画素ブロックを隣接ブロックとする。 Since the inside of the macro block is similarly encoded from the upper left to the lower right, the position of the adjacent block changes according to the encoding order of the 8 × 8 pixel block. When the encoding process or the prediction signal generation process of the corresponding 8 × 8 pixel block is completed, the pixel block becomes an encoded pixel block and is used as an adjacent block of the pixel block to be processed later. In FIG. 6E, since the upper right pixel block corresponding to the adjacent block C is an unencoded pixel block, the pixel block located at the upper right of the encoded pixel block is set as an adjacent block.
 なお、図6B乃至図6Eで説明したとおり、予測対象ブロックXに対するユークリッド距離が最も近い隣接ブロックをそれぞれ隣接ブロックA、B、C、Dとする。例えば、図6Eでは、右上に位置する隣接ブロックが未符号化ブロックであるため、右上の隣接ブロックでもっとも近いブロックが隣接ブロックCとなる。また、ブロックサイズの大きさが異なる画素ブロックが混在している場合にも、同様に予測対象ブロックXに対するユークリッド距離が最も近いブロックを隣接ブロックとする。 Note that, as described with reference to FIGS. 6B to 6E, adjacent blocks having the closest Euclidean distance to the prediction target block X are referred to as adjacent blocks A, B, C, and D, respectively. For example, in FIG. 6E, since the adjacent block located at the upper right is an uncoded block, the nearest block among the upper right adjacent blocks is the adjacent block C. Further, even when pixel blocks having different block sizes are mixed, a block having the shortest Euclidean distance with respect to the prediction target block X is set as an adjacent block.
 以上の説明では、ブロックが16×16画素及び8×8画素の場合を例に挙げて説明したが、同様の枠組みを用いて32×32画素、4×4画素などの正方画素ブロックや16×8画素、8×16画素などの矩形画素ブロックに対しても隣接ブロックを決定してよい。 In the above description, blocks of 16 × 16 pixels and 8 × 8 pixels were taken as examples, but adjacent blocks may be determined in the same framework for square pixel blocks such as 32 × 32 or 4 × 4 pixels and for rectangular pixel blocks such as 16 × 8 or 8 × 16 pixels.
 なお、隣接ブロックとしてA,B,C,Dの4つの画素ブロックを用いる他に、隣接ブロックを更に広く定義してもかまわない。例えば、隣接ブロックAの更に左の画素ブロックを用いてもよいし、隣接ブロックBの更に上の画素ブロックを用いても良い。 In addition to using the four pixel blocks A, B, C, and D as adjacent blocks, the adjacent blocks may be defined more widely. For example, the pixel block further to the left of the adjacent block A may be used, or the pixel block further above the adjacent block B may be used.
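 The neighbor selection described above (blocks A, B, C, D, with a fallback to the nearest already-encoded block when the nominal position is not yet encoded, as for block C in Fig. 6E) can be sketched as follows. The grid, positions, and the exact fallback scan are illustrative assumptions, not the patent's normative procedure:

```python
# Sketch of determining the adjacent blocks A (left), B (above), C (upper
# right) and D (upper left) of a prediction target block. When the nominal
# neighbor has not been encoded yet, fall back to the encoded block nearest
# to the target in Euclidean distance. Positions are (x, y) block indices
# on a hypothetical grid; `encoded` lists blocks already processed in
# raster order (left-to-right, top-to-bottom).

import math

def pick_neighbor(target, offset, encoded):
    """Return the nominal neighbor if already encoded, else the encoded
    block nearest to the target in Euclidean distance."""
    nominal = (target[0] + offset[0], target[1] + offset[1])
    if nominal in encoded:
        return nominal
    return min(encoded, key=lambda blk: math.dist(blk, target)) if encoded else None

encoded = [(0, 0), (1, 0), (2, 0), (0, 1)]
target = (1, 1)
A = pick_neighbor(target, (-1, 0), encoded)    # left
B = pick_neighbor(target, (0, -1), encoded)    # above
C = pick_neighbor(target, (1, -1), encoded)    # upper right
D = pick_neighbor(target, (-1, -1), encoded)   # upper left
# For a target whose nominal upper-right neighbor is not yet encoded,
# the fallback picks the nearest encoded block instead:
C2 = pick_neighbor((3, 1), (1, -1), encoded)
```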
 ≪第二幾何変換パラメータの導出≫
 次に、第二パラメータ導出部502における第二幾何変換パラメータ208の導出方法について説明する。第二幾何変換パラメータ208は、第二パラメータ導出部502により導出される。隣接ブロックが保持する隣接動きベクトルをそれぞれ式(3)乃至(6)により定義する。
 mv_A = (mv_Ax, mv_Ay)^T   …(3)
 mv_B = (mv_Bx, mv_By)^T   …(4)
 mv_C = (mv_Cx, mv_Cy)^T   …(5)
 mv_D = (mv_Dx, mv_Dy)^T   …(6)
≪Derivation of second geometric transformation parameter≫
Next, a method for deriving the second geometric transformation parameter 208 in the second parameter derivation unit 502 will be described. The second geometric transformation parameter 208 is derived by the second parameter derivation unit 502. The adjacent motion vectors held by the adjacent blocks are defined by equations (3) to (6), respectively.
 mv_A = (mv_Ax, mv_Ay)^T   …(3)
 mv_B = (mv_Bx, mv_By)^T   …(4)
 mv_C = (mv_Cx, mv_Cy)^T   …(5)
 mv_D = (mv_Dx, mv_Dy)^T   …(6)
 また、動き情報探索部106から提供される動き情報210を式(7)により定義する。なお、動き情報210は、予測対象ブロックXの動きベクトルを示す。
 mv_X = (mv_Xx, mv_Xy)^T   …(7)
Also, the motion information 210 provided from the motion information search unit 106 is defined by equation (7). Note that the motion information 210 indicates a motion vector of the prediction target block X.
 mv_X = (mv_Xx, mv_Xy)^T   …(7)
 式(3)乃至(7)で表される動きベクトル及び隣接動きベクトルを用いて、第二幾何変換パラメータ208を導出する。幾何変換としてアフィン変換を利用する場合には、変換式は次式(8)で表される。
 u = a·x + b·y + c,  v = d·x + e·y + f   …(8)
The second geometric transformation parameter 208 is derived using the motion vector and the adjacent motion vector represented by the equations (3) to (7). When affine transformation is used as geometric transformation, the transformation formula is expressed by the following formula (8).
 u = a·x + b·y + c,  v = d·x + e·y + f   …(8)
ここで、パラメータ(c,f)は動きベクトルに対応しており、パラメータ(a,b,d,e)は幾何変形に伴うパラメータを指している。x,yが符号化対象ブロックの座標を示し、u,vは参照画像上の座標を示している。仮にパラメータ(a,b,d,e)が(1,0,0,1)である場合、平行移動モデルの動き補償(後述する式(19))と同一であることを意味する。 Here, the parameters (c, f) correspond to the motion vector, and the parameters (a, b, d, e) are the parameters associated with the geometric deformation. x and y indicate the coordinates in the encoding target block, and u and v indicate the coordinates in the reference image. If the parameters (a, b, d, e) are (1, 0, 0, 1), this means that the prediction is identical to motion compensation with the translation model (equation (19) described later).
 なお、幾何変換として、ここではアフィン変換を用いた例を示したが、幾何変換に対応する、共一次変換、へルマート変換、二次等角変換、射影変換、及び3次元射影変換などの幾何変換を用いても良い。この場合、利用する幾何変換によって、必要となるパラメータ数が変動するが、パラメータを符号化するときの符号量と対応する幾何変換のパターンによって、適用する画像の性質に合わせて、好適な幾何変換を選択すれば良い。以降、本実施形態ではアフィン変換を用いた例を説明する。 Although an example using an affine transformation is shown here as the geometric transformation, other geometric transformations such as a bilinear transformation, a Helmert transformation, a second-order conformal transformation, a projective transformation, or a three-dimensional projective transformation may be used. In that case, the number of required parameters varies with the geometric transformation used, and a suitable geometric transformation may be selected according to the nature of the image to which it is applied, in view of the code amount needed to encode the parameters and the corresponding transformation pattern. Hereinafter, this embodiment will be described using an example of the affine transformation.
 式(8)では、座標(x、y)がアフィン変換によって座標(u,v)へ変換される。式(8)に含まれるa、b、c、d、e、fの6個のパラメータが幾何変換パラメータを表している。アフィン変換ではこの6種類のパラメータを隣接ベクトルから導出するため、6個以上の入力値が必要となる。隣接ブロックA、B及び予測対象ブロックXのそれぞれの動きベクトルを用いると、次式(9)により幾何変換パラメータが導出される。ここでは、動きベクトルが1/4精度であることを前提とし、(a,b,d,e)のパラメータの精度を1/64としている。
Figure JPOXMLDOC01-appb-M000009
In equation (8), coordinates (x, y) are converted to coordinates (u, v) by affine transformation. Six parameters a, b, c, d, e, and f included in Expression (8) represent geometric transformation parameters. In affine transformation, since these six types of parameters are derived from adjacent vectors, six or more input values are required. When the motion vectors of the adjacent blocks A and B and the prediction target block X are used, a geometric transformation parameter is derived by the following equation (9). Here, it is assumed that the motion vector is ¼ precision, and the precision of the parameters (a, b, d, e) is 1/64.
Figure JPOXMLDOC01-appb-M000009
但し、ax、ayは予測対象ブロックのサイズに基づく変数であり、次式(10)で算出される。
Figure JPOXMLDOC01-appb-M000010
However, ax and ay are variables based on the size of the prediction target block, and are calculated by the following equation (10).
Figure JPOXMLDOC01-appb-M000010
ここで、mb_size_x及びmb_size_yはマクロブロックの水平、垂直方向のサイズを示しており、16×16画素ブロックとすると、mb_size_x=16、mb_size_y=16となる。また、blk_size_x及びblk_size_yは予測対象画素ブロックの水平、垂直サイズを表しており、図6Bの場合は、blk_size_x=8、blk_size_y=8となる。 Here, mb_size_x and mb_size_y indicate the horizontal and vertical sizes of the macroblock. When a 16 × 16 pixel block is used, mb_size_x = 16 and mb_size_y = 16. Moreover, blk_size_x and blk_size_y represent the horizontal and vertical sizes of the prediction target pixel block. In the case of FIG. 6B, blk_size_x = 8 and blk_size_y = 8.
 ここでは、入力値として隣接画素ブロックA及びBの動きベクトルを用いて、幾何変換パラメータを導出する例を示したが、必ずしも、隣接画素ブロックA及びBの動きベクトルを用いる必要はなく、隣接画素ブロックC、D及びそれ以外の隣接画素ブロックから算出された動きベクトルを用いても良いし、これらの複数の隣接画素ブロックの動きベクトルからパラメータフィッティングを用いて、幾何変換パラメータを求めても良い。また、式(8)ではそれぞれa,b,d,eが実数で得られる例を示しているが、予めこれらのパラメータの演算精度を決めておくことで式(9)のように簡単に整数化が可能である。 Here, an example has been shown in which the geometric transformation parameters are derived using the motion vectors of the adjacent pixel blocks A and B as input values, but it is not always necessary to use the motion vectors of the adjacent pixel blocks A and B; motion vectors calculated from the adjacent pixel blocks C and D or from other adjacent pixel blocks may be used, or the geometric transformation parameters may be obtained from the motion vectors of a plurality of adjacent pixel blocks by parameter fitting. Although equation (8) gives a, b, d, and e as real numbers, they can easily be converted to integers, as in equation (9), by fixing the computation precision of these parameters in advance.
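 Since the equation images for (9) and (10) are not reproduced here, the following is only an illustration of the general idea of this step: determining the six affine parameters of equation (8) from motion vectors anchored at three known positions (here the prediction target block X and two adjacent blocks). The anchor positions, the real-valued arithmetic, and the function name are assumptions, not the patent's integerized formula:

```python
# Hypothetical sketch: fit the affine parameters (a, b, c, d, e, f) of
# equation (8) from three motion vectors. Each motion vector at anchor
# position (x, y) gives one point correspondence (x, y) -> (x+mvx, y+mvy),
# so three of them determine the 6 unknowns exactly. The two 3x3 linear
# systems (one for the u-row, one for the v-row) are solved by Cramer's rule.

def affine_from_three_mvs(points, mvs):
    """points: three (x, y) anchor positions; mvs: their (mvx, mvy) vectors.
    Returns (a, b, c, d, e, f) so that each (x, y) maps to (x+mvx, y+mvy)."""
    (x0, y0), (x1, y1), (x2, y2) = points
    # Determinant of [[x0, y0, 1], [x1, y1, 1], [x2, y2, 1]].
    det = x0 * (y1 - y2) - y0 * (x1 - x2) + (x1 * y2 - x2 * y1)

    def solve_row(t0, t1, t2):
        # Cramer's rule for [x y 1] . (p, q, r)^T = t.
        p = (t0 * (y1 - y2) - y0 * (t1 - t2) + (t1 * y2 - t2 * y1)) / det
        q = (x0 * (t1 - t2) - t0 * (x1 - x2) + (x1 * t2 - x2 * t1)) / det
        r = (x0 * (y1 * t2 - y2 * t1) - y0 * (x1 * t2 - x2 * t1)
             + t0 * (x1 * y2 - x2 * y1)) / det
        return p, q, r

    us = [p[0] + m[0] for p, m in zip(points, mvs)]
    vs = [p[1] + m[1] for p, m in zip(points, mvs)]
    a, b, c = solve_row(*us)
    d, e, f = solve_row(*vs)
    return a, b, c, d, e, f

# Identical motion vectors at all three anchors must yield pure translation,
# i.e. (a, b, d, e) = (1, 0, 0, 1) with (c, f) equal to the motion vector.
anchors = [(0, 0), (16, 0), (0, 16)]   # hypothetical anchor positions
mvs = [(4, 2), (4, 2), (4, 2)]
params = affine_from_three_mvs(anchors, mvs)
```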
 ≪第一幾何変換パラメータの導出≫
 次に予測幾何変換パラメータ導出部205について説明する。
≪Derivation of first geometric transformation parameter≫
Next, the predicted geometric transformation parameter derivation unit 205 will be described.
 予測幾何変換パラメータ導出部205は、図7に示すように動き情報取得部701と予測幾何変換パラメータ算出部702とを有する。動き情報取得部701は、第二幾何変換パラメータ導出部203が有する動き情報取得部501と同様の手順で隣接ブロックを決定する。但し、隣接ブロックから算出される動き情報は幾何変換パラメータである。 The predicted geometric transformation parameter derivation unit 205 includes a motion information acquisition unit 701 and a predicted geometric transformation parameter calculation unit 702, as shown in FIG. 7. The motion information acquisition unit 701 determines the adjacent blocks in the same procedure as the motion information acquisition unit 501 of the second geometric transformation parameter derivation unit 203. However, the motion information obtained from the adjacent blocks here consists of geometric transformation parameters.
 動き情報取得部701では、符号化制御部114に保存されている符号化済みブロックの動き情報210を用いて、予測対象ブロックの予測幾何変換パラメータ212を導出する。符号化制御部114に保存されている動き情報210は、隣接符号化済みブロックの動き情報210であり、以下、「隣接幾何変換パラメータ」という。図6Aを用いて予測幾何変換パラメータ212の導出方法を説明する。隣接ブロックが保持する隣接幾何変換パラメータをそれぞれ式(11)乃至(14)により定義する。
 ap_A = (a_A, b_A, c_A, d_A, e_A, f_A)   …(11)
 ap_B = (a_B, b_B, c_B, d_B, e_B, f_B)   …(12)
 ap_C = (a_C, b_C, c_C, d_C, e_C, f_C)   …(13)
 ap_D = (a_D, b_D, c_D, d_D, e_D, f_D)   …(14)
The motion information acquisition unit 701 derives a prediction geometric transformation parameter 212 of the prediction target block using the motion information 210 of the encoded block stored in the encoding control unit 114. The motion information 210 stored in the encoding control unit 114 is motion information 210 of adjacent encoded blocks, and is hereinafter referred to as “adjacent geometric transformation parameter”. A method for deriving the predicted geometric transformation parameter 212 will be described with reference to FIG. 6A. The adjacent geometric transformation parameters held by the adjacent blocks are defined by equations (11) to (14), respectively.
 ap_A = (a_A, b_A, c_A, d_A, e_A, f_A)   …(11)
 ap_B = (a_B, b_B, c_B, d_B, e_B, f_B)   …(12)
 ap_C = (a_C, b_C, c_C, d_C, e_C, f_C)   …(13)
 ap_D = (a_D, b_D, c_D, d_D, e_D, f_D)   …(14)
apはアフィン変換パラメータを指し、式(8)に示すように6次元のパラメータである。このように導出された隣接幾何変換パラメータ(隣接アフィン変換パラメータ)を用いて、予測幾何変換パラメータ算出部702で予測幾何変換パラメータが算出される。 ap indicates an affine transformation parameter, which is a six-dimensional parameter as shown in Equation (8). Using the adjacent geometric transformation parameter (adjacent affine transformation parameter) derived in this way, the predicted geometric transformation parameter calculation unit 702 calculates the predicted geometric transformation parameter.
予測幾何変換パラメータ算出部702では、予測対象ブロックと隣接ブロック間の空間相関を利用して、メディアン処理によって予測幾何変換パラメータを算出する。
 pred_ap = affine_median(ap_A, ap_B, ap_C)   …(15)
The predicted geometric transformation parameter calculation unit 702 calculates a predicted geometric transformation parameter by median processing using the spatial correlation between the prediction target block and the adjacent block.
 pred_ap = affine_median(ap_A, ap_B, ap_C)   …(15)
ここでpred_apは予測幾何変換パラメータを表している。なお、関数affine_median()は、6次元のアフィン変換パラメータの中央値を取る関数である。また、次式を用いて予測幾何変換パラメータを決定してもよい。
 pred_ap = (median(a_A, a_B, a_C), median(b_A, b_B, b_C), median(c_A, c_B, c_C), median(d_A, d_B, d_C), median(e_A, e_B, e_C), median(f_A, f_B, f_C))^T   …(16)
Here, pred_ap represents the predicted geometric transformation parameter. The function affine_median() takes the median of six-dimensional affine transformation parameters. The predicted geometric transformation parameter may also be determined using the following equation.
 pred_ap = (median(a_A, a_B, a_C), median(b_A, b_B, b_C), median(c_A, c_B, c_C), median(d_A, d_B, d_C), median(e_A, e_B, e_C), median(f_A, f_B, f_C))^T   …(16)
ここで、関数median()はスカラーメディアンを示している。なお、式(16)のTは転置を意味する。式(16)では各幾何変換パラメータの成分毎に中央値を取ることによって予測幾何変換パラメータを導出する。 Here, the function median () indicates a scalar median. Note that T in equation (16) means transposition. In equation (16), a predicted geometric transformation parameter is derived by taking a median value for each geometric transformation parameter component.
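 The component-wise median described for equation (16) can be sketched as follows. Which adjacent blocks enter the median (A, B, and C here, mirroring conventional motion vector prediction) and all parameter values are assumptions for illustration:

```python
# Sketch of a component-wise median over 6-dimensional affine parameter
# sets: for each of the six components, take the scalar median across the
# adjacent blocks' parameters. Values below are hypothetical.

import statistics

def predict_affine_params(*neighbor_params):
    """Component-wise median over 6-dimensional affine parameter tuples."""
    return tuple(statistics.median(comp) for comp in zip(*neighbor_params))

ap_A = (1.0, 0.0, 2.0, 0.0, 1.0, 3.0)
ap_B = (1.1, 0.1, 4.0, 0.0, 0.9, 1.0)
ap_C = (0.9, 0.0, 1.0, 0.1, 1.0, 2.0)
pred_ap = predict_affine_params(ap_A, ap_B, ap_C)
# Each component of pred_ap is the middle value of the three inputs, so a
# single outlying neighbor (e.g. the large c of ap_B) cannot dominate.
```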
 ここでは、隣接幾何変換パラメータから一意に予測幾何変換パラメータを導出する例を示したが、どの隣接幾何変換パラメータを使うかに関する情報を付加することも可能である。この場合、第一幾何変換パラメータ209が選択された場合に、選択情報を付加する必要があるため、情報量は増加するが、隣接ブロックと予測対象ブロックにエッジが存在したり、異なるオブジェクト同士が隣り合っていたりする場合に、適切な予測幾何変換パラメータを選択することが可能となる。本実施形態では、隣接ブロックとして4つのブロックを利用しているため、4通りの組み合わせを示す情報を導出パラメータ情報211とともにエントロピー符号化部112へ出力し符号化する。 Here, an example has been shown in which the predicted geometric transformation parameter is derived uniquely from the adjacent geometric transformation parameters, but it is also possible to add information indicating which adjacent geometric transformation parameter is used. In this case, when the first geometric transformation parameter 209 is selected, the selection information must be added, so the amount of information increases; however, when an edge lies between an adjacent block and the prediction target block, or when different objects adjoin each other, an appropriate predicted geometric transformation parameter can be selected. In this embodiment, since four adjacent blocks are used, information indicating the four combinations is output to the entropy encoding unit 112 together with the derived parameter information 211 and encoded.
 ここで、隣接ブロックがスキップモードなどの場合、幾何変換パラメータが設定されない場合が存在する。このような場合は、ap=(1,0,c,0,1,f)に初期化しても良い。また、このような隣接ブロックの幾何変換パラメータを再度導出しなおしても良い。例えば図6Aで示される隣接ブロックAで、通常の動き補償予測が選択された場合、幾何変換パラメータは、ap=(1,0,c,0,1,f)となる。ここで、隣接ブロックAを基準として、既に符号化処理が完了している4つのブロック(更に左に位置するブロック、上に位置するブロック、左上に位置するブロック、右上に位置するブロック)を隣接ブロックとし、式(9)を用いて幾何変換パラメータを導出する。この幾何変換パラメータを隣接幾何変換パラメータap_Aとして用いることも可能である。隣接ブロックB、C、Dも同様に幾何変換パラメータを再導出することによって、予測幾何変換パラメータを導出することも可能である。 Here, when an adjacent block is in skip mode or the like, there are cases where no geometric transformation parameter is set. In such a case, the parameter may be initialized to ap = (1, 0, c, 0, 1, f). Alternatively, the geometric transformation parameters of such an adjacent block may be derived again. For example, when ordinary motion compensation prediction is selected for the adjacent block A shown in FIG. 6A, its geometric transformation parameter is ap = (1, 0, c, 0, 1, f). Taking the adjacent block A as the reference, the four blocks for which encoding has already been completed (the blocks to its left, above it, to its upper left, and to its upper right) are treated as its adjacent blocks, and a geometric transformation parameter is derived using equation (9). This geometric transformation parameter can then be used as the adjacent geometric transformation parameter ap_A. The predicted geometric transformation parameter can likewise be derived by re-deriving the geometric transformation parameters of the adjacent blocks B, C, and D in the same manner.
 予測幾何変換パラメータ導出部205で導出された予測幾何変換パラメータ212は、第一幾何変換パラメータ導出部204へと出力される。第一幾何変換パラメータ導出部204は、入力されてきた予測対象ブロックの幾何変換パラメータの導出パラメータ情報211と、予測幾何変換パラメータ212を加算する。加算されたパラメータは、第一幾何変換パラメータ209となり、幾何変換予測部202へと出力される。 The predicted geometric transformation parameter 212 derived by the predicted geometric transformation parameter derivation unit 205 is output to the first geometric transformation parameter derivation unit 204. The first geometric transformation parameter derivation unit 204 adds the inputted geometric transformation parameter derivation parameter information 211 of the prediction target block and the predicted geometric transformation parameter 212. The added parameter becomes the first geometric transformation parameter 209 and is output to the geometric transformation prediction unit 202.
 ここでは、第一幾何変換パラメータ導出部204は、予測幾何変換パラメータ212と予測対象ブロックの幾何変換パラメータの導出パラメータ情報211を加算して、第一幾何変換パラメータ209を導出する例を示したが、導出パラメータ情報211の生成方法によって、加算、減算、乗算、除算、又は予め定められた行列を用いた変換、及びこれらを組み合わせた式を用いて導出された値など、いずれを用いても良い。例えば、予測幾何変換パラメータが-pred_apであるとき、第一幾何変換パラメータ導出部204は、予測幾何変換パラメータ212(-pred_ap)と導出パラメータ情報211を加算することによって、第一幾何変換パラメータ209を導出する。 Here, an example has been shown in which the first geometric transformation parameter derivation unit 204 derives the first geometric transformation parameter 209 by adding the predicted geometric transformation parameter 212 and the derived parameter information 211 of the geometric transformation parameter of the prediction target block. However, depending on how the derived parameter information 211 is generated, any of addition, subtraction, multiplication, division, conversion using a predetermined matrix, or a value derived using a formula combining these may be used. For example, when the predicted geometric transformation parameter is -pred_ap, the first geometric transformation parameter derivation unit 204 derives the first geometric transformation parameter 209 by adding the predicted geometric transformation parameter 212 (-pred_ap) and the derived parameter information 211.
 いずれにしても、予測幾何変換パラメータ212と第一幾何変換パラメータ導出部204の導出式は、予測対象ブロックで算出された第一幾何変換パラメータ209の情報量をできるだけ削減するように規定される。 In any case, the derivation formula relating the predicted geometric transformation parameter 212 and the first geometric transformation parameter derivation unit 204 is defined so as to reduce, as much as possible, the amount of information of the first geometric transformation parameter 209 calculated for the prediction target block.
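 A minimal sketch of the addition-based variant described above, with component-wise subtraction on the encoder side and component-wise addition on the decoder side (all parameter values are hypothetical):

```python
# Sketch of the predictive coding of the geometric transformation parameter:
# the encoder transmits only the residual (derived parameter information 211)
# between the block's actual parameters and the predicted parameters 212,
# and the decoder recovers the first geometric transformation parameter 209
# by adding the residual back. Component-wise add/subtract is one of the
# combinations the text allows.

def encode_residual(actual_ap, pred_ap):
    """Encoder side: residual = actual - predicted, component-wise."""
    return tuple(a - p for a, p in zip(actual_ap, pred_ap))

def derive_first_param(residual, pred_ap):
    """Decoder side: first parameter = residual + predicted, component-wise."""
    return tuple(r + p for r, p in zip(residual, pred_ap))

actual = (1.05, 0.02, 3.0, -0.01, 0.98, 1.5)   # hypothetical block parameters
pred   = (1.00, 0.00, 2.0,  0.00, 1.00, 1.0)   # hypothetical prediction
res = encode_residual(actual, pred)
recovered = derive_first_param(res, pred)
# The residual components are small when the prediction is good, which is
# exactly why they cost fewer bits to encode than the raw parameters.
```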
≪幾何変換予測部の処理≫
 幾何変換予測部202は、入力された幾何変換パラメータと参照画像信号207を用いて、予測信号を生成する機能を有する。幾何変換予測部202は、図8に示すように幾何変換部401と内挿補間部402とを有する。幾何変換部401は、参照画像信号207に対する幾何変換を行い、予測画素の位置を算出する。内挿補間部402は、幾何変換により求められた予測画素の分数位置に対応する予測画素の値を、内挿補間等により算出する。
≪Processing of geometric transformation prediction part≫
The geometric transformation prediction unit 202 has a function of generating a prediction signal using the input geometric transformation parameters and the reference image signal 207. The geometric transformation prediction unit 202 includes a geometric transformation unit 401 and an interpolation unit 402, as shown in FIG. 8. The geometric transformation unit 401 performs a geometric transformation on the reference image signal 207 and calculates the positions of the prediction pixels. The interpolation unit 402 calculates, by interpolation or the like, the prediction pixel values corresponding to the fractional positions obtained by the geometric transformation.
 図9を参照して、16×16画素の予測対象ブロックに対する幾何変換予測と動き補償予測の例を説明する。 An example of geometric transformation prediction and motion compensation prediction for a 16 × 16 pixel prediction target block will be described with reference to FIG.
 図9において、予測対象ブロックは△で示される画素からなる正方形画素ブロックCRである。動き補償予測の対応する画素は●で示される。●で示される画素からなる画素ブロックMERは、正方形である。一方、幾何変換予測の対応する画素は×で示され、これらの画素からなる画素ブロックGTRは、例えば平行四辺形となる。 In FIG. 9, the prediction target block is a square pixel block CR composed of pixels indicated by Δ. The corresponding pixel of motion compensation prediction is indicated by ●. A pixel block MER composed of pixels indicated by ● is square. On the other hand, the pixel corresponding to the geometric transformation prediction is indicated by x, and the pixel block GTR composed of these pixels is, for example, a parallelogram.
 動き補償後の領域と幾何変換後の領域は、参照画像信号の対応する領域を符号化対象のフレームの座標に合わせて記述している。このように、幾何変換予測を用いることによって、矩形画素ブロックの回転、拡大・縮小、せん断、鏡面変換などの変形に合わせた予測信号の生成が可能となる。 The region after motion compensation and the region after geometric transformation describe the corresponding region of the reference image signal according to the coordinates of the frame to be encoded. As described above, by using the geometric transformation prediction, it is possible to generate a prediction signal in accordance with deformation such as rotation, enlargement / reduction, shearing, and mirror transformation of the rectangular pixel block.
 幾何変換部401は、入力された幾何変換パラメータを用い、式(8)を用いて幾何変換後の座標(u,v)を算出する。算出された幾何変換後の座標(u,v)は、実数値である。そこで、座標(u,v)に対応する輝度値を参照画像信号から内挿補間することによって予測値を生成する。 The geometric transformation unit 401 calculates coordinates (u, v) after the geometric transformation using the input geometric transformation parameters using the equation (8). The calculated coordinates (u, v) after geometric transformation are real values. Therefore, the predicted value is generated by interpolating the luminance value corresponding to the coordinates (u, v) from the reference image signal.
 本実施形態では、内挿補間法として共一次内挿法を用いる。共一次内挿法は次式(17)で表される。
Figure JPOXMLDOC01-appb-M000017
In this embodiment, bilinear interpolation is used as the interpolation method. It is expressed by the following equation (17).
Figure JPOXMLDOC01-appb-M000017
ここでP(u,v)は内挿補間処理後の予測画素値を示しており、R(x,y)は、利用した参照画像信号の整数画素値を表している。画素の補間精度を1/64とすると、(x-u)=U/64、(y-v)=V/64となり、式(17)は、式(18)に示す整数演算に変形できる。
Figure JPOXMLDOC01-appb-M000018
Here, P (u, v) represents the predicted pixel value after the interpolation process, and R (x, y) represents the integer pixel value of the used reference image signal. Assuming that the pixel interpolation accuracy is 1/64, (x−u) = U / 64 and (y−v) = V / 64, and Equation (17) can be transformed into an integer calculation shown in Equation (18).
Figure JPOXMLDOC01-appb-M000018
ここで、fは丸めのオフセットを表している。本実施形態ではf=0としている。 Here, f represents a rounding offset. In this embodiment, f = 0.
 以上のように、幾何変換を行った予測対象ブロック内の座標毎に内挿補間を適用することによって、新たな予測信号を生成する。 As described above, a new prediction signal is generated by applying interpolation for each coordinate in the prediction target block subjected to geometric transformation.
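 The flow just described (mapping each coordinate of the prediction target block through the affine transform of equation (8) and interpolating the reference image at the resulting fractional position) can be sketched as follows. A plain floating-point version of bilinear interpolation is used for clarity; equations (17) and (18) perform the equivalent computation in 1/64-pel integer arithmetic, and the reference data and parameters below are hypothetical:

```python
# Sketch of geometric-transformation prediction: warp each target-block
# coordinate with the affine transform, then bilinearly interpolate the
# reference image at the resulting fractional position.

def affine_map(x, y, a, b, c, d, e, f):
    """Equation (8): map a target-block coordinate to a reference position."""
    return a * x + b * y + c, d * x + e * y + f

def bilinear(ref, u, v):
    """ref: 2D list indexed ref[row][col]; (u, v) is a fractional (x, y)."""
    x0, y0 = int(u), int(v)          # integer pel position (u, v >= 0 here)
    fx, fy = u - x0, v - y0          # fractional offsets
    p00 = ref[y0][x0]
    p10 = ref[y0][x0 + 1]
    p01 = ref[y0 + 1][x0]
    p11 = ref[y0 + 1][x0 + 1]
    return ((1 - fx) * (1 - fy) * p00 + fx * (1 - fy) * p10
            + (1 - fx) * fy * p01 + fx * fy * p11)

def predict_block(ref, w, h, params):
    """Generate a w x h prediction block from the reference image."""
    return [[bilinear(ref, *affine_map(x, y, *params)) for x in range(w)]
            for y in range(h)]

# Hypothetical 4x4 reference ramp and a pure half-pel translation
# (a, b, d, e) = (1, 0, 0, 1), (c, f) = (0.5, 0.5).
ref = [[x + 10 * y for x in range(4)] for y in range(4)]
pred = predict_block(ref, 2, 2, (1.0, 0.0, 0.5, 0.0, 1.0, 0.5))
```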
 なお、幾何変換予測部202では、従来の平行移動モデルを用いた予測信号を生成することも可能である。平行移動モデルの座標変換式は式(19)で表される。
 u = x + c,  v = y + f   …(19)
The geometric transformation prediction unit 202 can also generate a prediction signal using a conventional translation model. The coordinate conversion formula of the translation model is expressed by formula (19).
 u = x + c,  v = y + f   …(19)
 式(19)は、式(8)のパラメータ(a,b,d,e)が(1,0,0,1)であることと同一である。予測選択情報213が、幾何変換予測を用いないように指定されている場合、第一幾何変換パラメータ209及び第二幾何変換パラメータ208がいかなる値であっても、幾何変換予測部202は、式(8)を式(19)に変更して、座標の導出を行うことによって、従来の動き補償予測を実現できる。 Equation (19) is identical to equation (8) with the parameters (a, b, d, e) set to (1, 0, 0, 1). When the prediction selection information 213 specifies that geometric transformation prediction is not to be used, the geometric transformation prediction unit 202 realizes conventional motion compensation prediction by deriving the coordinates with equation (19) in place of equation (8), regardless of the values of the first geometric transformation parameter 209 and the second geometric transformation parameter 208.
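 A quick self-contained check of this equivalence (function names are illustrative only):

```python
# With (a, b, d, e) = (1, 0, 0, 1), the affine transform of equation (8)
# degenerates to the translation model of equation (19): u = x + c, v = y + f.

def affine_map(x, y, a, b, c, d, e, f):
    """Equation (8): affine coordinate transform."""
    return a * x + b * y + c, d * x + e * y + f

def translation_map(x, y, c, f):
    """Equation (19): translation-only motion model."""
    return x + c, y + f

# With the identity linear part, both models agree on every coordinate.
same = all(affine_map(x, y, 1, 0, 3, 0, 1, -2) == translation_map(x, y, 3, -2)
           for x in range(16) for y in range(16))
```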
 なお、本実施形態では、内挿補間の方法として共一次内挿法を用いる例を示したが、最近接内挿法、3次畳み込み内挿法、線形フィルタ内挿法、ラグランジュ補間法、スプライン補間法、ランツォシュ補間法などのいかなる内挿補間法を適用しても構わない。 In this embodiment, an example using bilinear interpolation as the interpolation method has been shown, but any interpolation method may be applied, such as nearest-neighbor interpolation, cubic convolution interpolation, linear filter interpolation, Lagrange interpolation, spline interpolation, or Lanczos interpolation.
 また、本実施形態では、補間の精度として1/64画素精度の例を示したが、いずれの精度を用いても構わない。 In the present embodiment, an example of 1/64 pixel accuracy is shown as the interpolation accuracy, but any accuracy may be used.
 予測分離スイッチ201は、幾何変換予測部202から出力される2つの予測信号206の出力端を切り替える。即ち、予測分離スイッチ201は、第一幾何変換パラメータ209によって生成された予測信号215の出力端と第二幾何変換パラメータ208によって生成された予測信号214の出力端を、予測選択情報213(図1の123に相当)に従って切り替える。 The prediction separation switch 201 switches the output terminals of the two prediction signals 206 output from the geometric transformation prediction unit 202. That is, the prediction separation switch 201 uses the prediction selection information 213 (FIG. 1) as the output terminal of the prediction signal 215 generated by the first geometric transformation parameter 209 and the output terminal of the prediction signal 214 generated by the second geometric transformation parameter 208. In accordance with 123).
Examples of the prediction selection information 213 and 123 are shown in FIG. 10. When the index of the prediction selection information 213 is 0, the skip mode is selected. In the skip mode, transform coefficients, motion vectors, and the like are not encoded. This means that the first geometric transformation prediction, which requires the additional motion information 210 to be encoded, is not selected. When the index of the prediction selection information 213 is 9, intra prediction is selected. In this case, the output terminal of the prediction separation switch 110 is connected to the intra prediction signal generation device 107, which means that the inter prediction signal generation device 109 need not perform prediction signal generation processing.
Note that elements not defined in this embodiment may be inserted between the rows of the index table shown in FIG. 10, and descriptions carrying information on the prediction method, the prediction block size, the name of the prediction mode, or a combination of these may be included. The index table may be divided into a plurality of tables, or a plurality of index tables may be integrated. It is not necessary to use exactly the same terms; they may be changed arbitrarily according to the form of use. Furthermore, each element described in the index table may be changed so as to be described by an independent flag.
The above is the outline of the inter prediction signal generation device 200 in this embodiment of the present invention.
The prediction signal generation processing of the inter prediction signal generation device 109 will be described with reference to FIG. 11. When the inter prediction signal generation processing is started (S501), the prediction separation switch 201 determines, according to the prediction selection information 213 input from outside the inter prediction signal generation device 109, whether the prediction selection information 213 indicates the first geometric transformation parameter 209 (S502). If this determination is YES, the prediction separation switch 201 connects the output terminal of the geometric transformation prediction unit 202 to the first prediction signal 215. If the determination is NO, the prediction separation switch 201 connects the output terminal of the geometric transformation prediction unit 202 to the second prediction signal 214.
If the determination is YES, the motion information acquisition unit 701 in the predicted geometric transformation parameter derivation unit 205 determines the adjacent blocks based on the motion information 210 input from outside (S508). Using the motion information 210 of the determined adjacent blocks, adjacent geometric transformation parameters are derived (S509). Receiving the derived adjacent geometric transformation parameters, the predicted geometric transformation parameter derivation unit 702 derives the predicted geometric transformation parameter 212 using Equation (15) or Equation (16) (S510). The predicted geometric transformation parameter 212 is output to the first geometric transformation parameter derivation unit 204.
The first geometric transformation parameter derivation unit 204 derives the first geometric transformation parameter 209 using the derivation parameter information 211 input from outside and the predicted geometric transformation parameter 212 (S511). The first geometric transformation parameter 209 is input to the geometric transformation prediction unit 202, and the geometric transformation unit 401 derives the coordinates after geometric transformation using Equation (8) and the like (S512). Based on the calculated coordinates, the interpolation unit 402 performs interpolation processing on the reference image signal 207 input from outside using Equation (18), and generates the first prediction signal 215 (S513). The first prediction signal 215 is output to the outside via the prediction separation switch 201 to whose output terminal it is connected (S515), and the first geometric transformation parameter 209 used for the prediction target block is stored in memory (S514). The first geometric transformation parameter 209 held in memory is used as an adjacent geometric transformation parameter or an adjacent motion vector for the next block (S517).
If the determination in step S502 is NO, the motion information acquisition unit 501 in the second geometric transformation parameter derivation unit 203 determines the adjacent blocks based on the motion information 210 input from outside (S503). Using the motion information 210 of the determined adjacent blocks, adjacent motion vectors are derived (S504). Receiving the derived adjacent motion vectors, the second parameter derivation unit 502 derives the second geometric transformation parameter 208 using Equations (9) to (10) (S505).
The second geometric transformation parameter 208 is input to the geometric transformation prediction unit 202, and the geometric transformation unit 401 derives the coordinates after geometric transformation using Equation (8) and the like (S506). Based on the derived coordinates, the interpolation unit 402 performs interpolation processing on the reference image signal 207 input from outside using Equation (18), and generates the second prediction signal 214 (S507). The second prediction signal 214 is output to the outside via the prediction separation switch 201 (S515), and the second geometric transformation parameter 208 used for the prediction target block is stored in memory (S514). The second geometric transformation parameter 208 held in memory is used as an adjacent geometric transformation parameter or an adjacent motion vector for the next block (S517).
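The two branches of FIG. 11 share the same back end (geometric transformation, interpolation, parameter storage) and differ only in how the parameter is derived. The control flow can be sketched as follows; this is a hedged outline with hypothetical callables standing in for the derivation units and the geometric transformation prediction unit, not the disclosed implementation.

```python
def generate_inter_prediction(selection_is_first, derive_first, derive_second,
                              geometric_predict, parameter_memory):
    """Control-flow sketch of FIG. 11 (S502-S517): select the first or the
    second geometric transformation parameter, run the shared geometric
    transformation + interpolation step, store the used parameter so the
    next block can treat it as an adjacent parameter, then output."""
    if selection_is_first:                   # S502 == YES
        params = derive_first()              # S508-S511
    else:                                    # S502 == NO
        params = derive_second()             # S503-S505
    prediction = geometric_predict(params)   # S506/S512 and S507/S513
    parameter_memory.append(params)          # S514; reused at S517
    return prediction                        # S515

memory = []
pred = generate_inter_prediction(True, lambda: "P1", lambda: "P2",
                                 lambda p: "pred:" + p, memory)
assert pred == "pred:P1" and memory == ["P1"]
```

The shared tail (S512/S506 onward) is what allows both parameter types to feed a single geometric transformation prediction unit 202.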
The above is the processing flow of the inter prediction signal generation device 200 in this embodiment.
Next, the syntax structure in the moving picture encoding device 100 will be described. As shown in FIG. 12, the syntax 1600 mainly has three parts. The high-level syntax 1601 carries the syntax information of upper layers at or above the slice level. The slice-level syntax 1602 carries the information necessary for decoding each slice, and the macroblock-level syntax 1603 carries the information necessary for decoding each macroblock.
Each part is composed of more detailed syntax. The high-level syntax 1601 includes sequence- and picture-level syntax such as the sequence parameter set syntax 1604 and the picture parameter set syntax 1605. The slice-level syntax 1602 includes the slice header syntax 1606, the slice data syntax 1607, and the like. The macroblock-level syntax 1603 includes the macroblock layer syntax 1608, the macroblock prediction syntax 1609, and the like.
In the example of the slice header syntax 1606 shown in FIG. 13, slice_affine_motion_prediction_flag is a syntax element indicating whether geometric transformation prediction is applied to the slice. When slice_affine_motion_prediction_flag is 0, the geometric transformation prediction unit 202 does not use the parameters (a, b, d, e) of Equation (8) for this slice, but uses Equation (19).
Equation (19) represents motion-compensated prediction using the translation model employed in H.264 and the like, and the parameters (c, f) correspond to the motion vector. When this flag is 0, the behavior is identical to conventional translation-model motion-compensated prediction. On the other hand, when slice_affine_motion_prediction_flag is 1, the prediction separation switch 201 dynamically switches the prediction signal within the slice as indicated by the prediction selection information 213.
In the example of the slice data syntax 1607 shown in FIG. 14, mb_skip_flag is a flag indicating whether the macroblock is encoded in the skip mode. In the skip mode, transform coefficients, motion vectors, and the like are not encoded. Therefore, the first geometric transformation prediction is not applied in the skip mode.
AvailAffineMode is an internal parameter indicating whether the second geometric transformation prediction can be used for the macroblock. When AvailAffineMode is 0, the prediction selection information 213 is set so that the second geometric transformation prediction is not used. When the adjacent motion vectors of the adjacent blocks and the motion vector of the prediction target block have the same value, AvailAffineMode is 0; otherwise, AvailAffineMode is 1.
AvailAffineMode can also be set using the adjacent geometric transformation parameters or the adjacent motion vectors. For example, when the adjacent motion vectors point in completely different directions, an object boundary may exist in a block adjacent to the prediction target block, so AvailAffineMode may be set to 0.
On the other hand, when AvailAffineMode is 1, mb_affine_motion_skip_flag, which indicates whether the second geometric transformation prediction or motion-compensated prediction is used, is encoded. When mb_affine_motion_skip_flag is 1, the second geometric transformation prediction is applied to the skip mode. When mb_affine_motion_skip_flag is 0, motion-compensated prediction using Equation (19) is applied.
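The skip-mode branching just described can be summarized in a short sketch. The function name and the string return values are illustrative only; the sketch assumes, per the text above, that AvailAffineMode is 0 when the adjacent motion vector equals the current one, and that mb_affine_motion_skip_flag is present in the bitstream only when AvailAffineMode is 1.

```python
def skip_mode_prediction(adjacent_mv, current_mv, mb_affine_motion_skip_flag=None):
    """Skip-mode decision sketch: AvailAffineMode gates whether the second
    geometric transformation prediction is even selectable; when it is,
    mb_affine_motion_skip_flag chooses between it (flag == 1) and
    translation-model motion compensation per Equation (19) (flag == 0)."""
    avail_affine_mode = 0 if adjacent_mv == current_mv else 1
    if avail_affine_mode == 0:
        return "translation"          # Equation (19), flag not coded
    if mb_affine_motion_skip_flag == 1:
        return "second_geometric"     # second geometric transformation prediction
    return "translation"              # Equation (19)
```

Because the flag is only coded when AvailAffineMode is 1, no bits are spent on blocks where the affine prediction would coincide with plain translation.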
In the example of the macroblock layer syntax 1608 shown in FIG. 15, mb_type indicates the macroblock type information; that is, it includes information such as whether the current macroblock is intra-coded or inter-coded, in what block shape prediction is performed, and whether the prediction direction is unidirectional or bidirectional. mb_type is passed to the macroblock prediction syntax and further to the sub-macroblock prediction syntax, which describes the syntax of the sub-blocks within the macroblock.
In the example of the macroblock prediction syntax shown in FIG. 16, mb_affine_pred_flag indicates whether the first geometric transformation prediction or the second geometric transformation prediction is used for the block. When the flag is 0, the prediction selection information 213 is set so that the second geometric transformation parameter is used. When the flag is 1, the prediction selection information 213 is set so that the first geometric transformation parameter is used.
NumMbPart() is an internal function that returns the number of block partitions specified by mb_type: it outputs 1 for a 16×16-pixel block, 2 for a 16×8 or 8×16 pixel block, and 4 for an 8×8 pixel block.
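The NumMbPart() mapping can be written down directly. The sketch below is illustrative only: it takes the partition shape as a (width, height) tuple rather than an mb_type code, since the encoding of mb_type itself is not specified in this passage.

```python
def num_mb_part(partition_shape):
    """Sketch of the internal function NumMbPart(): number of partitions
    implied by the macroblock partition shape recorded in mb_type.
    The shape is given here as a (width, height) tuple for illustration."""
    return {
        (16, 16): 1,  # one 16x16 partition
        (16, 8): 2,   # two 16x8 partitions
        (8, 16): 2,   # two 8x16 partitions
        (8, 8): 4,    # four 8x8 partitions (sub-macroblocks)
    }[partition_shape]
```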
In the figure, mv_l0 and mv_l1 indicate the motion vector difference information for the macroblock. The motion vector information is set by the motion information search unit 106 and is the value obtained by taking the difference from a predicted motion vector, which is not disclosed in this embodiment.
In the figure, mvd_l0_affine and mvd_l1_affine indicate the derivation parameters for the macroblock, namely the difference information of the components (a, b, d, e) of the affine transformation parameters excluding the motion vector. These syntax elements are encoded only when the first geometric transformation parameter is selected.
In the example of the sub-macroblock prediction syntax shown in FIG. 17, mb_affine_pred_flag indicates whether the first geometric transformation prediction or the second geometric transformation prediction is used for the block. When the flag is 0, the prediction selection information 213 is set so that the second geometric transformation parameter is used. When the flag is 1, the prediction selection information 213 is set so that the first geometric transformation parameter is used.
In the figure, mv_l0 and mv_l1 indicate the motion vector difference information for the sub-macroblock. The motion vector information is set by the motion information search unit 106 and is the value obtained by taking the difference from a predicted motion vector, which is not disclosed in this embodiment.
In the figure, mvd_l0_affine and mvd_l1_affine indicate the derivation parameters for the sub-macroblock, namely the difference information of the components (a, b, d, e) of the affine transformation parameters excluding the motion vector. These syntax elements are encoded only when the first geometric transformation parameter is selected.
With the syntax structure according to the first embodiment, when the prediction target pixel block is in the skip mode, either conventional motion-compensated prediction using the translation model or the second geometric transformation prediction can be selected; for other inter prediction, either the first geometric transformation prediction or the second geometric transformation prediction can be selected.
Note that syntax elements not defined in this embodiment may be inserted between the rows of the syntax tables shown in FIGS. 12 to 17, and descriptions concerning other conditional branches may be included. The syntax tables may be divided into a plurality of tables, or a plurality of syntax tables may be integrated. It is not necessary to use exactly the same terms; they may be changed arbitrarily according to the form of use.
According to the above embodiment, two geometric transformation parameters indicating information on the shape of the image under geometric transformation of the pixel block, namely the first geometric transformation parameter and the second geometric transformation parameter, are derived, and a prediction signal is generated by performing motion-compensated prediction using the geometric transformation parameter selected in accordance with the prediction selection information, which indicates which of these geometric transformation parameters is to be selected.
[Second Embodiment]
Next, a second embodiment will be described. The configuration of the moving picture encoding device according to the second embodiment is the same as that of the first embodiment. Blocks and syntax having the same functions as those in the first embodiment are given the same reference numerals, and their description is omitted here. The second embodiment differs from the first embodiment only in the syntax structure.
FIG. 18 shows an example of the macroblock layer syntax 1608. mb_type shown in the figure indicates the macroblock type information; that is, it includes information such as whether the current macroblock is intra-coded or inter-coded, in what block shape prediction is performed, and whether the prediction direction is unidirectional or bidirectional. mb_type is passed to the macroblock prediction syntax and further to the sub-macroblock prediction syntax, which describes the syntax of the sub-blocks within the macroblock. mb_additional_affine_motion_flag is flag information for selecting whether the first geometric transformation parameter or the second geometric transformation parameter is used for the prediction target block. When this flag is 0, the second geometric transformation parameter is used; when this flag is 1, the first geometric transformation parameter is used.
In the example of the macroblock prediction syntax shown in FIG. 19, mb_affine_pred_flag indicates whether geometric transformation prediction, comprising the first geometric transformation prediction and the second geometric transformation prediction, is used for the block, or motion-compensated prediction with the translation model is used. When the flag is 0, the prediction selection information 213 is set so that motion-compensated prediction with the translation model is used, regardless of mb_additional_affine_motion_flag. When the flag is 1, the prediction selection information 213 is set, according to the flag information of mb_additional_affine_motion_flag, to use either the first geometric transformation parameter or the second geometric transformation parameter.
In the example of the sub-macroblock prediction syntax shown in FIG. 20, mb_affine_pred_flag likewise indicates whether geometric transformation prediction, comprising the first geometric transformation prediction and the second geometric transformation prediction, is used for the block, or motion-compensated prediction with the translation model is used. When the flag is 0, the prediction selection information 213 is set so that motion-compensated prediction with the translation model is used, regardless of mb_additional_affine_motion_flag. When the flag is 1, the prediction selection information 213 is set, according to the flag information of mb_additional_affine_motion_flag, to use either the first geometric transformation parameter or the second geometric transformation parameter.
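The two-flag selection of the second embodiment reduces to a small decision table. The sketch below is illustrative (the string return values are stand-ins, not syntax element values), following the semantics stated above for mb_affine_pred_flag and mb_additional_affine_motion_flag.

```python
def select_prediction(mb_affine_pred_flag, mb_additional_affine_motion_flag):
    """Second-embodiment selection sketch: mb_affine_pred_flag == 0 forces
    translation-model motion compensation regardless of the other flag;
    mb_affine_pred_flag == 1 defers to mb_additional_affine_motion_flag,
    which picks the first (1) or second (0) geometric transformation
    parameter."""
    if mb_affine_pred_flag == 0:
        return "translation"
    if mb_additional_affine_motion_flag == 1:
        return "first_geometric"
    return "second_geometric"
```

Note the contrast with the first embodiment, where mb_affine_pred_flag alone chose between the first and second geometric transformation predictions.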
With the syntax structure according to the second embodiment, when the prediction target pixel block is in the skip mode, either conventional motion compensation using the translation model or the second geometric transformation prediction can be selected; for other inter prediction, whether the first geometric transformation prediction or the second geometric transformation prediction is used is determined at the macroblock level, and at the sub-macroblock level it is possible to select between geometric transformation prediction, comprising the first or second geometric transformation prediction, and motion-compensated prediction with the translation model.
Note that syntax elements not defined in this embodiment may be inserted between the rows of the syntax tables shown in FIGS. 18 to 20, and descriptions concerning other conditional branches may be included. The syntax tables may be divided into a plurality of tables, or a plurality of syntax tables may be integrated. It is not necessary to use exactly the same terms; they may be changed arbitrarily according to the form of use.
As described above, the first embodiment prevents the increase in block partitioning information caused by excessive block partitioning when predicting a moving object that does not fit a rectangular block. By predicting motion accompanied by geometric deformation within a block without greatly increasing the amount of additional information, and by applying the optimal prediction method to each block, the coding efficiency is improved and, furthermore, the subjective image quality is improved.
Next, third and fourth embodiments relating to moving picture decoding will be described.
[Third Embodiment]
A moving picture decoding device according to the third embodiment will be described with reference to FIG. 21. The moving picture decoding device 300 decodes, for example, encoded data generated by the moving picture encoding device according to the first embodiment.
The moving picture decoding device 300 decodes the encoded data 311 stored in the input buffer 301 and outputs a decoded image signal 317 to the output buffer 309. The encoded data 311 is multiplexed encoded data that is sent from, for example, the moving picture encoding device 100, delivered via a storage system or a transmission system, and temporarily stored in the input buffer 301.
The moving picture decoding device 300 has an entropy decoding unit 302, an inverse quantization/inverse transform unit 303, an adder 304, a reference image memory 305, an intra prediction signal generation device 306, an inter prediction signal generation device 307, and a prediction separation switch 308. The moving picture decoding device 300 is also connected to the input buffer 301, the output buffer 309, and the decoding control unit 310.
The entropy decoding unit 302 parses and decodes the encoded data 311 frame by frame or field by field based on the syntax. The entropy decoding unit 302 sequentially entropy-decodes the code string of each syntax, and reproduces the motion information 315, the derivation parameter information 316, the encoding parameters of the decoding target block, and so on. The encoding parameters include all parameters necessary for decoding, such as prediction information, information on transform coefficients, and information on quantization.
The transform coefficients decoded by the entropy decoding unit 302 are input to the inverse quantization/inverse transform unit 303, which includes an inverse quantizer and an inverse transformer. The various information on quantization decoded by the entropy decoding unit 302, that is, the quantization parameter and the quantization matrix, is set in the internal memory of the decoding control unit 310 and is loaded when used for the inverse quantization processing.
Using the loaded information on quantization, the inverse quantization/inverse transform unit 303 first performs inverse quantization processing with the inverse quantizer. The inverse-quantized transform coefficients are then subjected to inverse transform processing, for example an inverse discrete cosine transform, by the inverse transformer. The inverse orthogonal transform has been described here, but when a wavelet transform or the like is performed in the encoding device, the inverse quantization/inverse transform unit 303 may instead perform the corresponding inverse quantization and inverse wavelet transform.
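The order of operations in the inverse quantization/inverse transform unit 303 can be sketched as follows. This is a simplified illustration only: real codecs use per-coefficient scaling with a quantization matrix rather than a single step size, and the inverse transform here is passed in as a callable because its concrete form (inverse DCT, inverse wavelet transform, etc.) depends on the encoder.

```python
def reconstruct_error(levels, quant_step, inverse_transform):
    """Sketch of unit 303: first scale the decoded coefficient levels back
    (inverse quantization), then apply the inverse transform to obtain the
    restored prediction error signal (312 in the text)."""
    dequantized = [level * quant_step for level in levels]  # inverse quantization
    return inverse_transform(dequantized)                   # e.g. inverse DCT

# With an identity stand-in for the inverse transform, only the scaling shows.
assert reconstruct_error([1, -2, 0], 8, lambda coeffs: coeffs) == [8, -16, 0]
```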
The prediction error signal 312 restored through the inverse quantization/inverse transform unit 303 is input to the adder 304. The adder 304 adds the prediction error signal 312 and the prediction signal 416 output from the prediction separation switch 308, which will be described later, to generate a decoded image signal 317.
The generated decoded image signal 317 is output from the moving picture decoding device 300, temporarily stored in the output buffer 309, and then output according to the output timing managed by the decoding control unit 310. The decoded image signal 317 is also stored in the reference image memory 305 and becomes a reference image signal 313.
The reference image signal 313 is sequentially read from the reference image memory 305 frame by frame or field by field, and is input to the intra prediction signal generation device 306 or the inter prediction signal generation device 307. The motion information 315 used for the decoding target pixel block is stored in the decoding control unit 310, from which it is loaded and used as appropriate in the inter prediction signal generation processing for the next block.
The intra prediction signal generation device 306 has the same function and configuration as the intra prediction signal generation device 107 in the moving picture encoding device 100 shown in FIG. 1. That is, the intra prediction signal generation device 306 performs intra prediction using the input reference image signal 313. For example, in H.264, a prediction signal is generated by padding pixels along a prediction direction, such as the vertical or horizontal direction, using already-encoded reference pixel values adjacent to the prediction target block. Alternatively, after interpolating pixel values with a predetermined interpolation method, the interpolated pixel values may be copied in a predetermined prediction direction. The generated prediction signal 416 is output to the prediction separation switch 308.
 インター予測信号生成装置307は、図1及び図2並びに図4乃至図10で示したインター予測信号生成装置109と同一の機能及び構成を有する。即ち、インター予測信号生成装置307では、入力された動き情報315、導出パラメータ情報316、参照画像信号313、予測選択情報314を利用して、予測信号416が生成される。動き情報315、導出パラメータ情報316、参照画像信号313及び予測選択情報314は動画像符号化装置100のインター予測信号生成装置109に入力される動き情報210、導出パラメータ情報211、参照画像信号207及び予測選択情報213にそれぞれ対応し、図2に示されるインター予測信号生成装置109において予測信号416が生成される。 The inter prediction signal generation device 307 has the same function and configuration as the inter prediction signal generation device 109 shown in FIG. 1, FIG. 2, and FIGS. 4 to 10. That is, the inter prediction signal generation device 307 generates the prediction signal 416 using the input motion information 315, derivation parameter information 316, reference image signal 313, and prediction selection information 314. These correspond respectively to the motion information 210, derivation parameter information 211, reference image signal 207, and prediction selection information 213 input to the inter prediction signal generation device 109 of the video encoding device 100, and the prediction signal 416 is generated as in the inter prediction signal generation device 109 shown in FIG. 2.
 生成された予測信号416は、予測分離スイッチ308へと出力される。予測分離スイッチ308は、イントラ予測信号生成装置306の出力端とインター予測信号生成装置307の出力端を、予測選択情報314に従って選択する。予測選択情報314に示される情報がイントラ予測である場合はスイッチをイントラ予測信号生成装置306へと接続する。一方、予測選択情報314がインター予測である場合はスイッチをインター予測信号生成装置307へと接続する。予測選択情報314は動画像符号化装置100の予測選択部111によって設定される予測選択情報123と同一であり、図10に示される。 The generated prediction signal 416 is output to the prediction separation switch 308. The prediction separation switch 308 selects between the output terminal of the intra prediction signal generation device 306 and the output terminal of the inter prediction signal generation device 307 according to the prediction selection information 314. When the prediction selection information 314 indicates intra prediction, the switch is connected to the intra prediction signal generation device 306; when it indicates inter prediction, the switch is connected to the inter prediction signal generation device 307. The prediction selection information 314 is the same as the prediction selection information 123 set by the prediction selection unit 111 of the video encoding device 100, and is shown in FIG. 10.
 以上が、第3の実施形態の動画像復号化装置300の処理の概要である。 The above is the outline of the processing of the video decoding device 300 of the third embodiment.
 次に、動画像復号化装置300が復号する符号化データのシンタクス構造について説明する。動画像復号化装置300が復号する符号化データ311は、動画像符号化装置100と同一のシンタクス構造を有するとよい。ここでは、図12乃至図17と同一のシンタクスを用いることとする。 Next, the syntax structure of the encoded data decoded by the video decoding device 300 will be described. The encoded data 311 decoded by the video decoding device 300 preferably has the same syntax structure as that of the video encoding device 100. Here, the same syntax as in FIGS. 12 to 17 is used.
 即ち、動画像復号化装置300におけるシンタクス構造では、図12に示すとおり、シンタクス1600は主に3つのパートを有する。ハイレベルシンタクス1601は、スライス以上の上位レイヤのシンタクス情報を有する。スライスレベルシンタクス1602は、スライス毎に復号に必要な情報を有し、マクロブロックレベルシンタクス1603は、マクロブロック毎に復号に必要とされる情報を有する。 That is, in the syntax structure used by the video decoding device 300, the syntax 1600 mainly has three parts as shown in FIG. 12. The high-level syntax 1601 carries syntax information of layers at or above the slice level. The slice level syntax 1602 carries information necessary for decoding each slice, and the macroblock level syntax 1603 carries information necessary for decoding each macroblock.
 各パートは、更に詳細なシンタクスで構成されている。ハイレベルシンタクス1601は、シーケンスパラメータセットシンタクス1604とピクチャパラメータセットシンタクス1605などの、シーケンス及びピクチャレベルのシンタクスを含む。スライスレベルシンタクス1602は、スライスヘッダーシンタクス1606、スライスデータシンタクス1607等を含む。マクロブロックレベルシンタクス1603は、マクロブロックレイヤーシンタクス1608、マクロブロックプレディクションシンタクス1609等を含む。 Each part has a more detailed syntax. High level syntax 1601 includes sequence and picture level syntax, such as sequence parameter set syntax 1604 and picture parameter set syntax 1605. The slice level syntax 1602 includes a slice header syntax 1606, a slice data syntax 1607, and the like. The macroblock level syntax 1603 includes a macroblock layer syntax 1608, a macroblock prediction syntax 1609, and the like.
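As a minimal illustration (not part of the specification; the nesting below simply restates Figure 12 and the identifier names are ours), the three-part hierarchy can be represented as a lookup table:

```python
# Hypothetical sketch of the three-level syntax hierarchy of Fig. 12.
# Names follow the reference numerals used in the text.
SYNTAX_HIERARCHY = {
    "high_level_syntax_1601": [
        "sequence_parameter_set_syntax_1604",
        "picture_parameter_set_syntax_1605",
    ],
    "slice_level_syntax_1602": [
        "slice_header_syntax_1606",
        "slice_data_syntax_1607",
    ],
    "macroblock_level_syntax_1603": [
        "macroblock_layer_syntax_1608",
        "macroblock_prediction_syntax_1609",
    ],
}

def parent_part(syntax_name):
    """Return the syntax part (1601/1602/1603) containing a detailed syntax."""
    for part, children in SYNTAX_HIERARCHY.items():
        if syntax_name in children:
            return part
    raise KeyError(syntax_name)
```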
 図13に示されるスライスヘッダーシンタクス1606の例では、slice_affine_motion_prediction_flagは、スライスに幾何変換予測を適用するかどうかを示すシンタクス要素である。slice_affine_motion_prediction_flagが0である場合、幾何変換予測部202は、本スライスに対しては、式(8)におけるパラメータ(a,b,d,e)のパラメータを利用せず、式(19)を用いる。 In the example of the slice header syntax 1606 shown in FIG. 13, slice_affine_motion_prediction_flag is a syntax element indicating whether geometric transformation prediction is applied to the slice. When slice_affine_motion_prediction_flag is 0, the geometric transformation prediction unit 202 does not use the parameters (a, b, d, e) of Equation (8) for this slice, but uses Equation (19).
 式(19)は、H.264などで用いられている平行移動モデルを用いた動き補償予測を表しており、パラメータ(c,f)は動きベクトルに相当する。本フラグが0の場合、従来の平行移動モデルの動き補償予測が行われることと同一である。一方、slice_affine_motion_prediction_flagが1である場合、スライスにおいて予測選択情報314に示すように、予測分離スイッチ201は予測信号を動的に切り替える。 Equation (19) represents motion compensation prediction using the translation model employed in H.264 and similar standards, and the parameters (c, f) correspond to a motion vector. When this flag is 0, the behavior is identical to conventional motion compensation prediction with the translation model. On the other hand, when slice_affine_motion_prediction_flag is 1, the prediction separation switch 201 dynamically switches the prediction signal within the slice as indicated by the prediction selection information 314.
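Equations (8) and (19) themselves are not reproduced in this excerpt. Assuming the common 6-parameter affine form for Equation (8), the relationship between the two models can be sketched as follows; this is a non-authoritative illustration, and the exact parameterization in the specification may differ:

```python
def affine_map(x, y, a, b, c, d, e, f):
    """Assumed form of Eq. (8): a 6-parameter affine mapping of pixel (x, y).
    x' = a*x + b*y + c,  y' = d*x + e*y + f
    """
    return (a * x + b * y + c, d * x + e * y + f)

def translational_map(x, y, c, f):
    """Eq. (19): pure translation; (c, f) plays the role of a motion vector.
    Equivalent to the affine map with (a, b, d, e) fixed to (1, 0, 0, 1),
    i.e. the non-translational components are not used.
    """
    return (x + c, y + f)
```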
 図14に示されるスライスデータシンタクス1607の例では、mb_skip_flagは、マクロブロックがスキップモードで符号化されているかどうかを示すフラグである。スキップモードである場合、変換係数や動きベクトルなどは符号化されない。このため、第一幾何変換予測はスキップモードには適用されない。 In the example of the slice data syntax 1607 shown in FIG. 14, mb_skip_flag is a flag indicating whether the macroblock is encoded in skip mode. In skip mode, transform coefficients, motion vectors, and the like are not encoded. For this reason, the first geometric transformation prediction is not applied in skip mode.
 AvailAffineModeはマクロブロックで第二幾何変換予測が利用できるかどうかを示す内部パラメータである。AvailAffineModeが0の場合、第二幾何変換予測を利用しないように予測選択情報314が設定されていることを意味する。隣接ブロックの隣接動きベクトルと予測対象ブロックの動きベクトルが同一の値を持つ場合、AvailAffineModeは0となり、それ以外の場合、AvailAffineModeは1となる。 AvailAffineMode is an internal parameter indicating whether the second geometric transformation prediction can be used for the macroblock. When AvailAffineMode is 0, it means that the prediction selection information 314 is set so that the second geometric transformation prediction is not used. When the adjacent motion vector of the adjacent block and the motion vector of the prediction target block have the same value, AvailAffineMode is 0; otherwise, AvailAffineMode is 1.
 AvailAffineModeの設定は、隣接幾何変換パラメータや隣接動きベクトルを用いて設定することも可能である。例えば、隣接動きベクトルがまったく異なる方向を指している場合、本予測対象ブロックの隣接ブロックにオブジェクトの境界が存在する可能性があるため、AvailAffineModeを0と設定することも可能である。 AvailAffineMode can also be set using neighboring geometric transformation parameters or neighboring motion vectors. For example, when the neighboring motion vectors point in completely different directions, an object boundary may exist in a block adjacent to the prediction target block, so AvailAffineMode can be set to 0.
 一方、AvailAffineModeが1の場合は、第二幾何変換予測と動き補償予測のどちらを利用するかを示すmb_affine_motion_skip_flagが符号化される。mb_affine_motion_skip_flagが1の場合、スキップモードに対して第二幾何変換予測が適用されることを意味する。mb_affine_motion_skip_flagが0の場合、式(19)を用いて、動き補償予測が適用されることを意味する。 On the other hand, when AvailAffineMode is 1, mb_affine_motion_skip_flag, which indicates whether the second geometric transformation prediction or motion compensation prediction is used, is encoded. When mb_affine_motion_skip_flag is 1, it means that the second geometric transformation prediction is applied in skip mode. When mb_affine_motion_skip_flag is 0, it means that motion compensation prediction using Equation (19) is applied.
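The skip-mode branching described above can be summarized as decoder-side pseudologic. This is a sketch under our own naming; the actual neighbor derivation and entropy decoding are more involved:

```python
def avail_affine_mode(neighbor_mv, current_mv):
    # AvailAffineMode is 0 when the neighboring motion vector equals the
    # motion vector of the prediction target block, and 1 otherwise.
    return 0 if neighbor_mv == current_mv else 1

def skip_mode_prediction(avail, read_mb_affine_motion_skip_flag):
    """Choose the prediction applied to a skipped macroblock.
    `read_mb_affine_motion_skip_flag` stands in for the entropy decoder."""
    if avail == 0:
        # Second geometric transformation prediction unavailable:
        # translational motion compensation of Eq. (19) is used.
        return "translation"
    # The flag is present in the bitstream only when AvailAffineMode == 1.
    if read_mb_affine_motion_skip_flag() == 1:
        return "second_geometric_transformation"
    return "translation"
```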
 図15に示すマクロブロックレイヤーシンタクス1608の例では、mb_typeは、マクロブロックタイプ情報を示している。即ち、現在のマクロブロックがイントラ符号化されているか、インター符号化されているか、又はどのようなブロック形状で予測が行われているか、予測の方向が単方向予測か双方向予測か、などの情報を含んでいる。mb_typeは、マクロブロックプレディクションシンタクスと更にマクロブロック内のサブブロックのシンタクスを示すサブマクロブロックプレディクションシンタクスなどに渡される。 In the example of the macroblock layer syntax 1608 shown in FIG. 15, mb_type indicates the macroblock type information. That is, it includes information such as whether the current macroblock is intra-coded or inter-coded, what block shape is used for prediction, and whether the prediction direction is unidirectional or bidirectional. mb_type is passed to the macroblock prediction syntax and to the sub-macroblock prediction syntax, which describes the syntax of the sub-blocks within the macroblock.
 図16に示すマクロブロックプレディクションシンタクスの例では、mb_affine_pred_flagは、ブロックで、第一幾何変換予測を用いるか、第二幾何変換予測を用いるかを示している。フラグが0の場合、予測選択情報314は、第二幾何変換パラメータを用いるように設定されている。一方、フラグが1の場合、予測選択情報314は、第一幾何変換パラメータを用いるように設定されている。 In the example of the macroblock prediction syntax shown in FIG. 16, mb_affine_pred_flag indicates whether the first geometric transformation prediction or the second geometric transformation prediction is used in the block. When the flag is 0, the prediction selection information 314 is set to use the second geometric transformation parameter. On the other hand, when the flag is 1, the prediction selection information 314 is set to use the first geometric transformation parameter.
 NumMbPart()は、mb_typeに規定されたブロック分割数を返す内部関数であり、16×16画素ブロックの場合は1、16×8、8×16画素ブロックの場合は2、8×8画素ブロックの場合は4を出力する。 NumMbPart() is an internal function that returns the number of block partitions specified by mb_type: it returns 1 for a 16×16 pixel block, 2 for 16×8 and 8×16 pixel blocks, and 4 for an 8×8 pixel block.
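A direct transcription of this mapping (the partition-name strings are ours, introduced only for illustration):

```python
def num_mb_part(mb_type):
    """Number of block partitions specified by mb_type, cf. NumMbPart()."""
    partitions = {"16x16": 1, "16x8": 2, "8x16": 2, "8x8": 4}
    return partitions[mb_type]
```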
 図中のmv_l0、mv_l1はマクロブロックにおける動きベクトルの差分情報を示している。動きベクトル情報は、動画像符号化装置100の動き情報探索部106によって設定され、本実施形態で開示しない予測動きベクトルとの差分を取られた値である。 In the figure, mv_l0 and mv_l1 indicate motion vector difference information for the macroblock. The motion vector information is set by the motion information search unit 106 of the video encoding device 100, and is the difference from a predicted motion vector whose derivation is not described in the present embodiment.
 図中のmvd_l0_affine、mvd_l1_affineはマクロブロックにおける導出パラメータを示しており、アフィン変換パラメータの動きベクトルを除いた成分(a,b,d,e)の差分情報を示している。本シンタクス要素は、第一幾何変換パラメータが選択されたときだけ、符号化されている。 In the figure, mvd_l0_affine and mvd_l1_affine indicate derived parameters in the macroblock, and indicate difference information of components (a, b, d, e) excluding motion vectors of affine transformation parameters. This syntax element is encoded only when the first geometric transformation parameter is selected.
 図17に示すサブマクロブロックプレディクションシンタクスの例では、mb_affine_pred_flagは、ブロックで、第一幾何変換予測を用いるか、第二幾何変換予測を用いるかを示している。フラグが0の場合、予測選択情報314は、第二幾何変換パラメータを用いるように設定されている。一方、フラグが1の場合、予測選択情報314は、第一幾何変換パラメータを用いるように設定されている。 In the example of the sub-macroblock prediction syntax shown in FIG. 17, mb_affine_pred_flag indicates whether to use the first geometric transformation prediction or the second geometric transformation prediction in the block. When the flag is 0, the prediction selection information 314 is set to use the second geometric transformation parameter. On the other hand, when the flag is 1, the prediction selection information 314 is set to use the first geometric transformation parameter.
 図中のmv_l0、mv_l1はサブマクロブロックにおける動きベクトルの差分情報を示している。動きベクトル情報は、動画像符号化装置100の動き情報探索部106によって設定され、本実施形態で開示しない予測動きベクトルとの差分を取られた値である。 In the figure, mv_l0 and mv_l1 indicate motion vector difference information for the sub-macroblock. The motion vector information is set by the motion information search unit 106 of the video encoding device 100, and is the difference from a predicted motion vector whose derivation is not described in the present embodiment.
 図中のmvd_l0_affine、mvd_l1_affineはサブマクロブロックにおける導出パラメータを示しており、アフィン変換パラメータの動きベクトルを除いた成分(a,b,d,e)の差分情報を示している。本シンタクス要素は、第一幾何変換パラメータが選択されたときだけ、符号化されている。 Mvd_l0_affine and mvd_l1_affine in the figure indicate derived parameters in the sub-macroblock, and indicate difference information of components (a, b, d, e) excluding the motion vectors of the affine transformation parameters. This syntax element is encoded only when the first geometric transformation parameter is selected.
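The conditional presence of the affine difference components can be sketched as a parsing routine. This is a sketch under assumed reader callbacks; whether the affine components are signaled once per block or per partition is our simplification for illustration:

```python
def parse_prediction(read_flag, read_mvd, read_mvd_affine, num_parts):
    """Sketch of the (sub-)macroblock prediction syntax of Figs. 16 and 17."""
    mb_affine_pred_flag = read_flag()
    parts = []
    for _ in range(num_parts):
        part = {"mvd": read_mvd()}  # mv_l0 / mv_l1 difference information
        if mb_affine_pred_flag == 1:
            # mvd_l0_affine / mvd_l1_affine: differences of the (a, b, d, e)
            # components, coded only when the first geometric transformation
            # parameter is selected.
            part["mvd_affine"] = read_mvd_affine()
        parts.append(part)
    return mb_affine_pred_flag, parts
```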
 図12乃至図17に示されるシンタクス構造は、予測対象画素ブロックがスキップモードのときには、平行移動モデルを用いた従来の動き補償予測又は第二幾何変換予測を選択可能であり、それ以外のインター予測の場合では、第一幾何変換予測又は第二幾何変換予測を選択可能である。 With the syntax structure shown in FIGS. 12 to 17, when the prediction target pixel block is in skip mode, either conventional motion compensation prediction using the translation model or the second geometric transformation prediction can be selected; in other inter prediction cases, either the first geometric transformation prediction or the second geometric transformation prediction can be selected.
 なお、図12乃至図17に示すシンタクスの表中の行間には、本実施形態において規定していないシンタクス要素が挿入されてもよく、その他の条件分岐に関する記述が含まれていてもよい。また、シンタクステーブルを複数のテーブルに分割し、又は複数のシンタクステーブルを統合してもよい。また、必ずしも同一の用語を用いる必要は無く、利用する形態によって任意に変更してもよい。 It should be noted that syntax elements not defined in the present embodiment may be inserted between lines in the syntax tables shown in FIGS. 12 to 17, and descriptions regarding other conditional branches may be included. Further, the syntax table may be divided into a plurality of tables, or a plurality of syntax tables may be integrated. Moreover, it is not always necessary to use the same term, and it may be arbitrarily changed depending on the form to be used.
 [第4の実施形態]
 次に、第4の実施形態について説明する。第4の実施形態に係る動画像復号化装置の構成は、図21に示す第3の実施形態と同一である。なお、第3の実施形態と同じ機能を持つブロック、シンタクスには同一の符号を付し、ここでは説明を省略する。第4の実施形態では、シンタクス構造のみが第3の実施形態と異なるが、第2の実施形態のシンタクス構造と実質的に同じである。
[Fourth Embodiment]
Next, a fourth embodiment will be described. The configuration of the video decoding apparatus according to the fourth embodiment is the same as that of the third embodiment shown in FIG. Note that blocks and syntax having the same functions as those of the third embodiment are denoted by the same reference numerals, and description thereof is omitted here. In the fourth embodiment, only the syntax structure is different from the third embodiment, but is substantially the same as the syntax structure of the second embodiment.
 図18に示すマクロブロックレイヤーシンタクス1608の例のように、mb_typeは、マクロブロックタイプ情報を示している。即ち、現在のマクロブロックがイントラ符号化されているか、インター符号化されているか、又はどのようなブロック形状で予測が行われているか、予測の方向が単方向予測か双方向予測か、などの情報を含んでいる。mb_typeは、マクロブロックプレディクションシンタクスと更にマクロブロック内のサブブロックのシンタクスを示すサブマクロブロックプレディクションシンタクスなどに渡される。mb_additional_affine_motion_flagは、予測対象ブロックに対して、第一幾何変換パラメータを利用するか、第二幾何変換パラメータを利用するかを選択するフラグ情報を示している。本フラグが0の場合、第二幾何変換パラメータが利用され、本フラグが1の場合、第一幾何変換パラメータが利用される。 As in the example of the macroblock layer syntax 1608 shown in FIG. 18, mb_type indicates the macroblock type information. That is, it includes information such as whether the current macroblock is intra-coded or inter-coded, what block shape is used for prediction, and whether the prediction direction is unidirectional or bidirectional. mb_type is passed to the macroblock prediction syntax and to the sub-macroblock prediction syntax, which describes the syntax of the sub-blocks within the macroblock. mb_additional_affine_motion_flag is flag information that selects whether the first geometric transformation parameter or the second geometric transformation parameter is used for the prediction target block. When this flag is 0, the second geometric transformation parameter is used; when this flag is 1, the first geometric transformation parameter is used.
 図19に示すマクロブロックプレディクションシンタクスの例では、mb_affine_pred_flagは、ブロックで、第一幾何変換予測又は第二幾何変換予測を含む、幾何変換予測を用いるか、平行移動モデルの動き補償予測を用いるか、を示している。フラグが0の場合、mb_additional_affine_motion_flagに関わらず、予測選択情報314は、平行移動モデルの動き補償予測を用いるように設定されている。一方、フラグが1の場合、予測選択情報314は、mb_additional_affine_motion_flagのフラグ情報に従って、第一幾何変換パラメータを用いるか、第二幾何変換パラメータを用いるかが設定されている。 In the example of the macroblock prediction syntax shown in FIG. 19, mb_affine_pred_flag indicates whether the block uses geometric transformation prediction (including the first and second geometric transformation predictions) or motion compensation prediction based on the translation model. When the flag is 0, the prediction selection information 314 is set to use motion compensation prediction of the translation model, regardless of mb_additional_affine_motion_flag. On the other hand, when the flag is 1, whether the first geometric transformation parameter or the second geometric transformation parameter is used is set in the prediction selection information 314 according to the flag information of mb_additional_affine_motion_flag.
 図20に示すサブマクロブロックプレディクションシンタクスの例では、mb_affine_pred_flagは、ブロックで、第一幾何変換予測又は第二幾何変換予測を含む、幾何変換予測を用いるか、平行移動モデルの動き補償予測を用いるか、を示している。フラグが0の場合、mb_additional_affine_motion_flagに関わらず、予測選択情報314は、平行移動モデルの動き補償予測を用いるように設定されている。一方、フラグが1の場合、予測選択情報314は、mb_additional_affine_motion_flagのフラグ情報に従って、第一幾何変換パラメータを用いるか、第二幾何変換パラメータを用いるかが設定されている。 In the example of the sub-macroblock prediction syntax shown in FIG. 20, mb_affine_pred_flag indicates whether the block uses geometric transformation prediction (including the first and second geometric transformation predictions) or motion compensation prediction based on the translation model. When the flag is 0, the prediction selection information 314 is set to use motion compensation prediction of the translation model, regardless of mb_additional_affine_motion_flag. On the other hand, when the flag is 1, whether the first geometric transformation parameter or the second geometric transformation parameter is used is set in the prediction selection information 314 according to the flag information of mb_additional_affine_motion_flag.
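Combining the two flags, the mode selection described above can be sketched as follows (the return labels are ours, introduced only for illustration):

```python
def select_prediction(mb_affine_pred_flag, mb_additional_affine_motion_flag):
    """Prediction selection per the syntax of Figs. 18 to 20."""
    if mb_affine_pred_flag == 0:
        # Translational motion compensation, regardless of the other flag.
        return "translation"
    if mb_additional_affine_motion_flag == 1:
        return "first_geometric_transformation"
    return "second_geometric_transformation"
```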
 第4の実施形態に係るシンタクス構造は、予測対象画素ブロックがスキップモードのときには、平行移動モデルを用いた従来の動き補償又は第二幾何変換予測を選択可能であり、それ以外のインター予測の場合では、マクロブロックレベルで、第一幾何変換予測を用いるか、第二幾何変換予測を用いるかを判断し、サブマクロブロックレベルで第一幾何変換予測又は第二幾何変換予測を含む、幾何変換予測を用いるか、平行移動モデルの動き補償予測を用いるか、を選択可能である。 With the syntax structure according to the fourth embodiment, when the prediction target pixel block is in skip mode, either conventional motion compensation using the translation model or the second geometric transformation prediction can be selected. In other inter prediction cases, whether the first or the second geometric transformation prediction is used is determined at the macroblock level, and whether geometric transformation prediction (the first or second geometric transformation prediction) or motion compensation prediction of the translation model is used can be selected at the sub-macroblock level.
 なお、図18乃至図20に示すシンタクスの表中の行間には、本実施形態において規定していないシンタクス要素が挿入されてもよく、その他の条件分岐に関する記述が含まれていてもよい。また、シンタクステーブルを複数のテーブルに分割し、又は複数のシンタクステーブルを統合してもよい。また、必ずしも同一の用語を用いる必要は無く、利用する形態によって任意に変更してもよい。 It should be noted that syntax elements not defined in the present embodiment may be inserted between lines in the syntax tables shown in FIGS. 18 to 20, and descriptions regarding other conditional branches may be included. Further, the syntax table may be divided into a plurality of tables, or a plurality of syntax tables may be integrated. Moreover, it is not always necessary to use the same term, and it may be arbitrarily changed depending on the form to be used.
 (第1乃至第4の実施形態の変形例)
 (1)第1乃至第4の実施形態においては、処理対象フレームを16×16画素サイズなどの矩形ブロックに分割し、図4に示したように画面左上のブロックから右下に向かって順に符号化/復号化する場合について説明しているが、符号化順序及び復号化順序はこれに限られない。例えば、右下から左上に向かって順に符号化及び復号化を行ってもよいし、画面中央から渦巻状に向かって順に符号化及び復号化を行ってもよい。さらに、右上から左下に向かって順に符号化及び復号化を行ってもよいし、画面の周辺部から中心部に向かって順に符号化及び復号化を行ってもよい。
(Modification of the first to fourth embodiments)
(1) In the first to fourth embodiments, the case has been described in which the frame to be processed is divided into rectangular blocks of, for example, 16×16 pixels and encoded/decoded in order from the upper-left block of the screen toward the lower right, as shown in FIG. 4; however, the encoding order and decoding order are not limited to this. For example, encoding and decoding may proceed in order from the lower right toward the upper left, or in a spiral outward from the center of the screen. Furthermore, they may proceed in order from the upper right toward the lower left, or from the periphery of the screen toward the center.
 (2)第1乃至第4の実施形態においては、ブロックサイズを4×4画素ブロック、8×8画素ブロックとして説明を行ったが、予測対象ブロックは均一なブロック形状にする必要なく、16×8画素ブロック、8×16画素ブロック、8×4画素ブロック、4×8画素ブロックなどの何れのブロックサイズであってもよい。また、1つのマクロブロック内でも全てのブロックを同一にする必要はなく、異なるサイズのブロックを混在させてもよい。この場合、分割数が増えると分割情報を符号化又は復号化するための符号量が増加する。そこで、変換係数の符号量と局部復号画像又は復号画像とのバランスを考慮して、ブロックサイズを選択すればよい。 (2) In the first to fourth embodiments, the description used 4×4 and 8×8 pixel blocks as the block sizes, but the prediction target blocks need not have a uniform block shape; any block size, such as 16×8, 8×16, 8×4, or 4×8 pixel blocks, may be used. It is also unnecessary for all blocks within one macroblock to be identical; blocks of different sizes may be mixed. In this case, as the number of partitions increases, the amount of code required to encode or decode the partition information increases. Therefore, the block size may be selected in consideration of the balance between the code amount of the transform coefficients and the locally decoded image or decoded image.
 (3)第1乃至第4の実施形態においては、輝度信号と色差信号を分割せず、一方の色信号成分に限定した例として記述した。しかし、予測処理が輝度信号と色差信号で異なる場合、それぞれ異なる予測方法を用いてもよいし、同一の予測方法を用いても良い。異なる予測方法を用いる場合は、色差信号に対して選択した予測方法を輝度信号と同様の方法で符号化又は復号化する。 (3) In the first to fourth embodiments, the description was given as an example limited to one color signal component, without separating the luminance signal and the color-difference signal. However, when the prediction processing differs between the luminance signal and the color-difference signal, different prediction methods may be used for each, or the same prediction method may be used. When different prediction methods are used, the prediction method selected for the color-difference signal is encoded or decoded in the same manner as for the luminance signal.
 (4)第1乃至第4の実施形態においては、予測幾何変換パラメータをどの隣接ブロックから利用したかの情報を符号化データに含ませない例を記述した。しかし、どの隣接ブロックから利用したかの情報を、符号化データに含ませてもよい。 (4) In the first to fourth embodiments, an example was described in which the information indicating from which neighboring block the predicted geometric transformation parameter was derived is not included in the encoded data. However, the information indicating which neighboring block was used may be included in the encoded data.
 上述した実施形態の手法を用いることで、平行移動モデルに適さない動オブジェクトを予測するために、過度のブロック分割が施されて、ブロック分割情報が増大することを防ぐ。つまり、付加的な情報を大幅に増加させずに、ブロック内のオブジェクトの幾何変形を予測し、それぞれに好適な幾何変換パラメータを適用することによって、符号化効率を向上させると共に主観画質も向上するという効果を奏する。 By using the methods of the above-described embodiments, excessive block partitioning for predicting moving objects that do not fit the translation model, and the resulting increase in block partition information, can be avoided. That is, by predicting the geometric deformation of the objects within a block and applying a suitable geometric transformation parameter to each, without significantly increasing the additional information, both coding efficiency and subjective image quality are improved.
 なお、本発明は上記実施形態そのままに限定されるものではなく、実施段階ではその要旨を逸脱しない範囲で構成要素を変形して具体化できる。また、上記実施形態に開示されている複数の構成要素の適宜な組み合わせにより、種々の発明を形成できる。例えば、実施形態に示される全構成要素から幾つかの構成要素を削除してもよい。さらに、異なる実施形態にわたる構成要素を適宜組み合わせてもよい。 Note that the present invention is not limited to the above-described embodiment as it is, and can be embodied by modifying constituent elements without departing from the scope of the invention in the implementation stage. In addition, various inventions can be formed by appropriately combining a plurality of components disclosed in the embodiment. For example, some components may be deleted from all the components shown in the embodiment. Furthermore, constituent elements over different embodiments may be appropriately combined.
 上記の実施形態に記載した本発明の手法は、コンピュータによって実行させることができ、また、コンピュータに実行させることのできるプログラムとして、磁気ディスク(フレキシブルディスク、ハードディスクなど)、光ディスク(CD-ROM、DVDなど)、半導体メモリなどの記録媒体に格納して頒布することもできる。 The methods of the present invention described in the above embodiments can be executed by a computer, and can also be stored and distributed, as a program executable by a computer, on a recording medium such as a magnetic disk (flexible disk, hard disk, etc.), an optical disk (CD-ROM, DVD, etc.), or a semiconductor memory.
 本発明の実施形態に従った予測信号生成装置、動画像符号化装置及び動画像復号化装置によれば、幾何変換動き補償予測に用いる幾何変換パラメータの導出によって発生したパラメータの誤差を低減するとともに、誤差伝播を抑制し、符号量を増加させることなく予測効率を向上する予測信号生成装置、動画像符号化装置及び動画像復号化装置を提供することが可能になる。 The prediction signal generation device, video encoding device, and video decoding device according to the embodiments of the present invention make it possible to reduce parameter errors arising from the derivation of the geometric transformation parameters used for geometric transformation motion compensation prediction, to suppress error propagation, and to improve prediction efficiency without increasing the code amount.
 また、本発明の他の態様として、以下のような予測信号生成方法、動画像符号化方法及び動画像復号化方法が提供できる。 Also, as another aspect of the present invention, the following prediction signal generation method, video encoding method, and video decoding method can be provided.
 予測信号を生成する予測信号生成方法は、画素ブロックの幾何変換による画像の形状に係る情報を示す第一の幾何変換パラメータと第二の幾何変換パラメータのどちらを用いるかを示す予測選択情報を設定するステップと、画像信号が分割された複数の画素ブロックの1つの画素ブロックに隣接する複数の第1の隣接ブロックのうちの、既に予測信号生成処理が完了した1つ以上の第2の隣接ブロックの動き情報又は前記第一の幾何変換パラメータ及び前記第二の幾何変換パラメータを取得するステップと、前記1つ以上の第2の隣接ブロックの幾何変換パラメータから、前記1つの画素ブロックの予測幾何変換パラメータを導出するステップと、外部から入力された幾何変換パラメータの導出値と前記予測幾何変換パラメータから、予め定められた方法によって前記第一の幾何変換パラメータを導出して、設定するステップと、前記1つの画素ブロック及び前記1つ以上の第2の隣接ブロックの動き情報に基づいて、前記第二の幾何変換パラメータを設定するステップと、前記1つの画素ブロックに対する動き補償を行う際の参照画像信号に対して前記選択情報に示される前記第一の幾何変換パラメータ又は第二の幾何変換パラメータを用いて幾何変換処理を行って予測信号を生成するステップと、を含む。 A prediction signal generation method for generating a prediction signal includes: setting prediction selection information indicating which of a first geometric transformation parameter and a second geometric transformation parameter, each indicating information on the shape of an image under geometric transformation of a pixel block, is used; acquiring motion information, or the first and second geometric transformation parameters, of one or more second adjacent blocks for which prediction signal generation processing has already been completed, among a plurality of first adjacent blocks adjacent to one pixel block of a plurality of pixel blocks into which an image signal is divided; deriving a predicted geometric transformation parameter of the one pixel block from the geometric transformation parameters of the one or more second adjacent blocks; deriving and setting the first geometric transformation parameter by a predetermined method from an externally input derived value of the geometric transformation parameter and the predicted geometric transformation parameter; setting the second geometric transformation parameter based on the motion information of the one pixel block and the one or more second adjacent blocks; and generating a prediction signal by performing geometric transformation processing, using the first geometric transformation parameter or the second geometric transformation parameter indicated in the selection information, on a reference image signal used when performing motion compensation for the one pixel block.
 動画像符号化方法は画素ブロックの幾何変換による画像の形状に係る情報を示す第一の幾何変換パラメータと第二の幾何変換パラメータのどちらを用いるかを示す予測選択情報を設定するステップと、画像信号が分割された複数の画素ブロックの1つの画素ブロックに隣接する複数の第1の隣接ブロックのうちの、既に予測信号生成処理が完了した1つ以上の第2の隣接ブロックの動き情報又は前記第一の幾何変換パラメータ及び前記第二の幾何変換パラメータを取得するステップと、前記1つ以上の第2の隣接ブロックの幾何変換パラメータから、前記1つの画素ブロックの予測幾何変換パラメータを導出するステップと、外部から入力された幾何変換パラメータの導出値と前記予測幾何変換パラメータから、予め定められた方法によって前記第一の幾何変換パラメータを導出して、設定するステップと、前記1つの画素ブロック及び前記1つ以上の第2の隣接ブロックの動き情報に基づいて、前記第二の幾何変換パラメータを設定するステップと、前記1つの画素ブロックに対する動き補償を行う際の参照画像信号に対して前記選択情報に示される前記第一の幾何変換パラメータ又は第二の幾何変換パラメータを用いて幾何変換処理を行って予測信号を生成するステップと、前記第一の幾何変換パラメータと前記第二の幾何変換パラメータのどちらを用いるかを示す前記予測選択情報を符号化するステップと、前記第一の幾何変換パラメータが選択された場合に、前記第一の幾何変換パラメータと前記予測幾何変換パラメータから、予め規定された方法で幾何変換パラメータの導出値を導出して、前記導出値を符号化するステップと、前記入力画像信号と前記予測信号の差分信号を符号化するステップと、を含む。 A video encoding method includes: setting prediction selection information indicating which of a first geometric transformation parameter and a second geometric transformation parameter, each indicating information on the shape of an image under geometric transformation of a pixel block, is used; acquiring motion information, or the first and second geometric transformation parameters, of one or more second adjacent blocks for which prediction signal generation processing has already been completed, among a plurality of first adjacent blocks adjacent to one pixel block of a plurality of pixel blocks into which an image signal is divided; deriving a predicted geometric transformation parameter of the one pixel block from the geometric transformation parameters of the one or more second adjacent blocks; deriving and setting the first geometric transformation parameter by a predetermined method from an externally input derived value of the geometric transformation parameter and the predicted geometric transformation parameter; setting the second geometric transformation parameter based on the motion information of the one pixel block and the one or more second adjacent blocks; generating a prediction signal by performing geometric transformation processing, using the first geometric transformation parameter or the second geometric transformation parameter indicated in the selection information, on a reference image signal used when performing motion compensation for the one pixel block; encoding the prediction selection information indicating which of the first geometric transformation parameter and the second geometric transformation parameter is used; when the first geometric transformation parameter is selected, deriving a derived value of the geometric transformation parameter from the first geometric transformation parameter and the predicted geometric transformation parameter by a predefined method, and encoding the derived value; and encoding a difference signal between the input image signal and the prediction signal.
A moving image decoding method that interprets moving image encoded data, in which an input image signal has been encoded in units of a plurality of pixel blocks, and decodes it by a prescribed method includes: a step of acquiring, for one or more second neighboring blocks whose decoding has already been completed among a plurality of first neighboring blocks adjacent to one pixel block of the plurality of pixel blocks into which the input image signal is divided, motion information or a first geometric transformation parameter and a second geometric transformation parameter indicating information on the shape of the image under a geometric transformation of the pixel block; a step of decoding selection information indicating which of the first geometric transformation parameter and the second geometric transformation parameter is used; a step of decoding a derived value of the geometric transformation parameter when the first geometric transformation parameter is selected; a step of deriving a predicted geometric transformation parameter of the one pixel block from the geometric transformation parameters of the one or more second neighboring blocks; a step of deriving and setting the first geometric transformation parameter from the decoded derived value of the geometric transformation parameter and the predicted geometric transformation parameter by a predetermined method; a step of setting the second geometric transformation parameter based on motion information of the one pixel block and the one or more second neighboring blocks; and a step of generating a prediction signal by performing a geometric transformation process on a reference image signal, using the first geometric transformation parameter or the second geometric transformation parameter indicated by the selection information, when performing motion compensation for the one pixel block.
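The step of deriving a predicted geometric transformation parameter from the neighboring blocks can be sketched as follows. The patent does not fix the derivation rule; a component-wise median over the available neighbors' parameter sets, analogous to the median motion-vector predictor of H.264/AVC, is assumed here purely for illustration.

```python
# Illustrative sketch of deriving the predicted geometric transformation
# parameter of the current block from one or more already-processed second
# neighboring blocks. The derivation rule is an assumption (component-wise
# median); the patent only requires a "predetermined method" shared by the
# encoder and decoder.

def predict_params(neighbor_params):
    """Component-wise median over the neighbors' parameter sets."""
    def median(values):
        s = sorted(values)
        return s[len(s) // 2]  # lower median for even counts
    # zip(*...) groups the i-th component of every neighbor together
    return [median(component) for component in zip(*neighbor_params)]

# Affine parameter sets [a, b, tx, c, d, ty] of three decoded neighbors
left        = [1.0, 0.0, 2.0,  0.0, 1.0, -1.0]
above       = [1.1, 0.0, 3.0,  0.0, 0.9, -2.0]
above_right = [0.9, 0.1, 4.0, -0.1, 1.0, -3.0]
predicted = predict_params([left, above, above_right])
```

Since the rule uses only decoded data, both sides derive the identical predictor without any extra signaling.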
The moving image encoding device, moving image decoding device, moving image encoding method, and moving image decoding method according to the present invention are useful for highly efficient moving image encoding, and are particularly suited to moving image encoding that reduces the motion estimation processing required to estimate the geometric transformation parameters used in geometric transformation motion-compensated prediction.
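The core operation throughout, generating a prediction signal by geometrically transforming the reference image during motion compensation, can be sketched minimally as below. Nearest-neighbor sampling and a plain 6-parameter affine map are simplifying assumptions; a real codec would use sub-pel interpolation filters.

```python
# Minimal sketch of geometric transformation motion-compensated prediction:
# each pixel (x, y) of the current block is predicted from the reference
# image at the affine-mapped position (a*x + b*y + tx, c*x + d*y + ty).
# Nearest-neighbor sampling is used here for brevity; actual codecs
# interpolate at sub-pel accuracy.

def affine_predict(ref, block_w, block_h, params):
    a, b, tx, c, d, ty = params
    h, w = len(ref), len(ref[0])
    pred = []
    for y in range(block_h):
        row = []
        for x in range(block_w):
            # map to reference coordinates and clamp to the picture bounds
            rx = min(max(int(round(a * x + b * y + tx)), 0), w - 1)
            ry = min(max(int(round(c * x + d * y + ty)), 0), h - 1)
            row.append(ref[ry][rx])
        pred.append(row)
    return pred

# 8x8 reference picture with value x + 10*y at position (x, y)
ref = [[x + 10 * y for x in range(8)] for y in range(8)]
# With a pure translation by (2, 1) the affine map reduces to
# conventional block-based motion compensation.
pred = affine_predict(ref, 4, 4, (1.0, 0.0, 2.0, 0.0, 1.0, 1.0))
```

With non-trivial `a, b, c, d` the same routine models rotation, scaling, and shear, which conventional translational motion compensation cannot represent.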

Claims (7)

  1.  A prediction signal generation device comprising:
     a first setting unit that sets prediction selection information indicating which of a first geometric transformation parameter and a second geometric transformation parameter, each indicating information on the shape of an image under a geometric transformation of a pixel block, is used;
     an acquisition unit that acquires motion information or the geometric transformation parameters of one or more second neighboring blocks, for which prediction signal generation processing has already been completed, among a plurality of first neighboring blocks adjacent to one pixel block of a plurality of pixel blocks into which an image signal is divided;
     a derivation unit that derives a predicted geometric transformation parameter of the one pixel block from the geometric transformation parameters of the one or more second neighboring blocks;
     a second setting unit that derives and sets the first geometric transformation parameter from an input derived value of the geometric transformation parameter and the predicted geometric transformation parameter by a predetermined method;
     a third setting unit that sets the second geometric transformation parameter based on motion information of the one pixel block and the one or more second neighboring blocks; and
     a generation unit that generates a prediction signal by performing a geometric transformation process on a reference image signal, using the first geometric transformation parameter or the second geometric transformation parameter indicated by the selection information, when performing motion compensation for the one pixel block.
  2.  The prediction signal generation device according to claim 1, wherein the third setting unit obtains the second geometric transformation parameter by transforming the motion information based on a relative position between the neighboring block and the pixel block.
  3.  The prediction signal generation device according to claim 1, wherein the motion information acquisition unit acquires one piece of motion information based on motion vectors obtained by performing, for each of the plurality of second neighboring blocks, motion prediction with respect to the reference image signal.
  4.  The prediction signal generation device according to claim 1, wherein, when geometric transformation prediction is not selected for the one or more second neighboring blocks, the motion information acquisition unit derives a geometric transformation parameter using motion vectors, for which prediction processing has already been completed, corresponding to the neighboring block with the neighboring block as a reference.
  5.  A moving image encoding device using the prediction signal generation device according to claim 1, further comprising:
     a first encoding unit that encodes the prediction selection information indicating which of the first geometric transformation parameter and the second geometric transformation parameter is used;
     a second encoding unit that, when the first geometric transformation parameter is selected, derives a derived value of the geometric transformation parameter from the first geometric transformation parameter and the predicted geometric transformation parameter by a predetermined method, and encodes the derived value; and
     a third encoding unit that encodes information indicating a differential signal between the input image signal and the prediction signal.
  6.  The moving image encoding device using the prediction signal generation device according to claim 1, wherein the first encoding unit further encodes information indicating which of the geometric transformation parameters of the one or more second neighboring blocks was used in the derivation unit.
  7.  A moving image decoding device that interprets moving image encoded data, in which an input image signal has been encoded in units of a plurality of pixel blocks, and decodes it by a prescribed method, the device comprising:
     a motion information acquisition unit that acquires motion information of one or more second neighboring blocks whose decoding has already been completed, among a plurality of first neighboring blocks adjacent to one pixel block of a plurality of pixel blocks into which the input image signal is divided, or geometric transformation parameters indicating information on the shape of the image under a geometric transformation of the pixel block;
     a first decoding unit that decodes selection information indicating which of the first geometric transformation parameter and the second geometric transformation parameter is used;
     a second decoding unit that decodes a derived value of the geometric transformation parameter when the first geometric transformation parameter is selected;
     a derivation unit that derives a predicted geometric transformation parameter of the one pixel block from the geometric transformation parameters of the one or more second neighboring blocks;
     a first setting unit that derives and sets the first geometric transformation parameter from the decoded derived value of the geometric transformation parameter and the predicted geometric transformation parameter by a predetermined method;
     a second setting unit that, when the second geometric transformation parameter is selected, sets the second geometric transformation parameter based on motion information of the one pixel block and the one or more second neighboring blocks; and
     a generation unit that generates a prediction signal by performing a geometric transformation process on a reference image signal, using the first geometric transformation parameter or the second geometric transformation parameter indicated by the selection information, when performing motion compensation for the one pixel block.
PCT/JP2009/063692 2009-07-31 2009-07-31 Prediction-signal producing device using geometric transformation motion-compensation prediction, time-varying image encoding device, and time-varying image decoding device WO2011013253A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2009/063692 WO2011013253A1 (en) 2009-07-31 2009-07-31 Prediction-signal producing device using geometric transformation motion-compensation prediction, time-varying image encoding device, and time-varying image decoding device


Publications (1)

Publication Number Publication Date
WO2011013253A1 true WO2011013253A1 (en) 2011-02-03

Family

ID=43528928

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2009/063692 WO2011013253A1 (en) 2009-07-31 2009-07-31 Prediction-signal producing device using geometric transformation motion-compensation prediction, time-varying image encoding device, and time-varying image decoding device

Country Status (1)

Country Link
WO (1) WO2011013253A1 (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09266574A (en) * 1996-01-22 1997-10-07 Matsushita Electric Ind Co Ltd Image encoding device and image decoding device
JP2000138935A (en) * 1998-10-29 2000-05-16 Fujitsu Ltd Motion vector encoding device and decoding device
JP2005244503A (en) * 2004-02-25 2005-09-08 Sony Corp Apparatus and method for coding image information
JP2007312397A (en) * 2007-05-25 2007-11-29 Nokia Corp Method and apparatus for video frame transfer in communication system


Cited By (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2504069B (en) * 2012-07-12 2015-09-16 Canon Kk Method and device for predicting an image portion for encoding or decoding of an image
US9779516B2 (en) 2012-07-12 2017-10-03 Canon Kabushiki Kaisha Method and device for predicting an image portion for encoding or decoding of an image
GB2504069A (en) * 2012-07-12 2014-01-22 Canon Kk Intra-prediction using a parametric displacement transformation
US10349079B2 (en) 2015-02-16 2019-07-09 Huawei Technologies Co., Ltd. Video image encoding method, video image decoding method, encoding device, and decoding device
JP2018509087A (en) * 2015-02-16 2018-03-29 華為技術有限公司Huawei Technologies Co.,Ltd. Video image encoding method, video image decoding method, encoding device, and decoding device
JP2018529255A (en) * 2015-08-29 2018-10-04 ホアウェイ・テクノロジーズ・カンパニー・リミテッド Image prediction method and apparatus
US11368678B2 (en) 2015-08-29 2022-06-21 Huawei Technologies Co., Ltd. Image prediction method and device
US11979559B2 (en) 2015-08-29 2024-05-07 Huawei Technologies Co., Ltd. Image prediction method and device
US10880543B2 (en) 2015-08-29 2020-12-29 Huawei Technologies Co., Ltd. Image prediction method and device
KR20180043830A (en) * 2015-09-29 2018-04-30 후아웨이 테크놀러지 컴퍼니 리미티드 Image prediction method and apparatus
US11323736B2 (en) 2015-09-29 2022-05-03 Huawei Technologies Co., Ltd. Image prediction method and apparatus
KR20200057120A (en) * 2015-09-29 2020-05-25 후아웨이 테크놀러지 컴퍼니 리미티드 Image prediction method and device
US10560712B2 (en) 2016-05-16 2020-02-11 Qualcomm Incorporated Affine motion prediction for video coding
US11503324B2 (en) 2016-05-16 2022-11-15 Qualcomm Incorporated Affine motion prediction for video coding
WO2018067823A1 (en) * 2016-10-05 2018-04-12 Qualcomm Incorporated Motion vector prediction for affine motion models in video coding
RU2718225C1 (en) * 2016-10-05 2020-03-31 Квэлкомм Инкорпорейтед Motion vector prediction for affine motion models in video coding
US10448010B2 (en) 2016-10-05 2019-10-15 Qualcomm Incorporated Motion vector prediction for affine motion models in video coding
US11082687B2 (en) 2016-10-05 2021-08-03 Qualcomm Incorporated Motion vector prediction for affine motion models in video coding
EP3758378A1 (en) * 2016-10-05 2020-12-30 QUALCOMM Incorporated Motion vector prediction for affine motion models in video coding
US11736707B2 (en) 2017-08-03 2023-08-22 Lg Electronics Inc. Method and apparatus for processing video signal using affine prediction
CN111066324B (en) * 2017-08-03 2024-05-28 Oppo广东移动通信有限公司 Method and apparatus for processing video signal using affine prediction
CN111066324A (en) * 2017-08-03 2020-04-24 Lg 电子株式会社 Method and apparatus for processing video signal using affine prediction
US11877001B2 (en) 2017-10-10 2024-01-16 Qualcomm Incorporated Affine prediction in video coding
WO2019245228A1 (en) * 2018-06-18 2019-12-26 엘지전자 주식회사 Method and device for processing video signal using affine motion prediction
US11140410B2 (en) 2018-06-18 2021-10-05 Lg Electronics Inc. Method and device for processing video signal using affine motion prediction
CN112567749B (en) * 2018-06-18 2024-03-26 Lg电子株式会社 Method and apparatus for processing video signal using affine motion prediction
CN112567749A (en) * 2018-06-18 2021-03-26 Lg电子株式会社 Method and apparatus for processing video signal using affine motion prediction
US11632567B2 (en) 2018-06-18 2023-04-18 Lg Electronics Inc. Method and device for processing video signal using affine motion prediction
GB2577318A (en) * 2018-09-21 2020-03-25 Canon Kk Video coding and decoding
GB2577318B (en) * 2018-09-21 2021-03-10 Canon Kk Video coding and decoding
US11909953B2 (en) 2018-09-23 2024-02-20 Beijing Bytedance Network Technology Co., Ltd Representation of affine model
US11870974B2 (en) 2018-09-23 2024-01-09 Beijing Bytedance Network Technology Co., Ltd Multiple-hypothesis affine mode
WO2020058954A1 (en) * 2018-09-23 2020-03-26 Beijing Bytedance Network Technology Co., Ltd. Representation of affine model
US11778226B2 (en) 2018-10-22 2023-10-03 Beijing Bytedance Network Technology Co., Ltd Storage of motion information for affine mode
US11785242B2 (en) 2018-11-30 2023-10-10 Hfi Innovation Inc. Video processing methods and apparatuses of determining motion vectors for storage in video coding systems
WO2020108560A1 (en) * 2018-11-30 2020-06-04 Mediatek Inc. Video processing methods and apparatuses of determining motion vectors for storage in video coding systems
US11290739B2 (en) 2018-11-30 2022-03-29 Mediatek Inc. Video processing methods and apparatuses of determining motion vectors for storage in video coding systems
CN113170174B (en) * 2018-11-30 2024-04-12 寰发股份有限公司 Video processing method and apparatus for determining motion vectors for storage in video coding system
TWI737055B (en) * 2018-11-30 2021-08-21 聯發科技股份有限公司 Video processing methods and apparatuses of determining motion vectors for storage in video coding systems
CN113170174A (en) * 2018-11-30 2021-07-23 联发科技股份有限公司 Video processing method and apparatus for determining motion vector for storage in video coding system
WO2024066332A1 (en) * 2022-09-27 2024-04-04 Beijing Xiaomi Mobile Software Co., Ltd. Encoding/decoding video picture data

Similar Documents

Publication Publication Date Title
WO2011013253A1 (en) Prediction-signal producing device using geometric transformation motion-compensation prediction, time-varying image encoding device, and time-varying image decoding device
KR101984764B1 (en) Video Coding and Decoding Method and Apparatus
JP6615287B2 (en) Image decoding device
JP6667609B2 (en) Image encoding device, image encoding method, image decoding device, and image decoding method
KR101362757B1 (en) Method and apparatus for image encoding and decoding using inter color compensation
JP7012809B2 (en) Image coding device, moving image decoding device, moving image coding data and recording medium
JP5061179B2 (en) Illumination change compensation motion prediction encoding and decoding method and apparatus
KR101670532B1 (en) Method for decoding a stream representative of a sequence of pictures, method for coding a sequence of pictures and coded data structure
JP4844449B2 (en) Moving picture encoding apparatus, method, program, moving picture decoding apparatus, method, and program
WO2012176381A1 (en) Moving image encoding apparatus, moving image decoding apparatus, moving image encoding method and moving image decoding method
JP5310614B2 (en) Moving picture coding apparatus, moving picture coding method, moving picture decoding apparatus, and moving picture decoding method
JP2010011075A (en) Method and apparatus for encoding and decoding moving image
WO2014163200A1 (en) Color image encoding apparatus, color image decoding apparatus, color image encoding method, and color image decoding method
WO2010090335A1 (en) Motion picture coding device and motion picture decoding device using geometric transformation motion compensating prediction
JP2005086834A (en) Method for encoding frame sequence, method for decoding frame sequence, apparatus for implementing the method, computer program for implementing the method and recording medium for storing the computer program
JP4360093B2 (en) Image processing apparatus and encoding apparatus and methods thereof
KR20150135457A (en) Method for encoding a plurality of input images and storage medium and device for storing program
JP2010183162A (en) Motion picture encoder
WO2012176387A1 (en) Video encoding device, video decoding device, video encoding method and video decoding method
JP2008193501A (en) Image encoding device and image encoding method
JP5533885B2 (en) Moving picture encoding apparatus and moving picture decoding apparatus
JP2023086397A (en) Intra prediction device, decoding device, and program
WO2013077305A1 (en) Image decoding device, image decoding method, image coding device
JP2009111733A (en) Method, device and program for encoding image
KR20150022952A (en) Reference Frame Creating Method and Apparatus and Video Encoding/Decoding Method and Apparatus Using Same

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09847844

Country of ref document: EP

Kind code of ref document: A1

DPE2 Request for preliminary examination filed before expiration of 19th month from priority date (pct application filed from 20040101)
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09847844

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP