WO2014049982A1 - Video encoding device, video decoding device, video encoding method and video decoding method - Google Patents


Info

Publication number
WO2014049982A1
Authority
WO
WIPO (PCT)
Prior art keywords
block
prediction
encoding
unit
coding
Prior art date
Application number
PCT/JP2013/005285
Other languages
French (fr)
Japanese (ja)
Inventor
Akira Minezawa
Kazuo Sugimoto
Shunichi Sekiguchi
Original Assignee
Mitsubishi Electric Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitsubishi Electric Corporation
Priority to JP2014538129A priority Critical patent/JPWO2014049982A1/en
Publication of WO2014049982A1 publication Critical patent/WO2014049982A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/18Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a set of transform coefficients
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/13Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/157Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/16Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter for a given display mode, e.g. for interlaced or progressive display mode
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/593Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques

Definitions

  • The present invention relates to a video encoding device and video encoding method for encoding a moving image with high efficiency, and to a video decoding device and video decoding method for decoding a moving image that has been encoded with high efficiency.
  • An input video frame is divided into macroblock units of 16×16 pixels each, and after motion compensation prediction is performed, the prediction difference signal is orthogonally transformed in block units and compressed by quantization.
  • FIG. 23 is a configuration diagram illustrating an MPEG-4 AVC/H.264 video encoding device.
  • When the block division unit 101 receives an image signal to be encoded, it divides the signal into macroblock units and outputs each macroblock-unit image signal to the prediction unit 102 as a divided image signal.
  • When the prediction unit 102 receives the divided image signal from the block division unit 101, it predicts the image signal of each color component in the macroblock within a frame or between frames and calculates a prediction difference signal.
  • In inter-frame prediction, a motion vector is searched in units of the macroblock itself or of sub-blocks obtained by further dividing the macroblock. Using the motion vector, motion compensation prediction is performed on the reference image signal stored in the memory 107 to generate a motion-compensated prediction image, and the prediction difference signal is calculated as the difference between the prediction signal indicating that image and the divided image signal.
  • In Non-Patent Document 1, one prediction mode can be selected from a plurality of prediction modes for each block as the luminance intra prediction mode.
  • FIG. 24 is an explanatory diagram illustrating the intra prediction modes when the luminance block size is 4×4 pixels.
  • In the figure, the white circles in the block represent pixels to be encoded, and the black circles represent already-encoded pixels used for prediction.
  • Nine intra prediction modes, mode 0 through mode 8, are defined.
  • Mode 2 performs average-value (DC) prediction: the pixels in the block are predicted from the average value of the adjacent pixels above and to the left of the block.
  • The modes other than mode 2 perform directional prediction.
  • Mode 0 is vertical prediction, in which a prediction image is generated by repeating the adjacent pixels above the block in the vertical direction; mode 0 is selected, for example, for a vertical stripe pattern.
  • Mode 1 is horizontal prediction, in which a prediction image is generated by repeating the adjacent pixels to the left of the block in the horizontal direction; mode 1 is selected, for example, for a horizontal stripe pattern.
  • In modes 3 to 8, a prediction image is generated by creating interpolation pixels in a predetermined direction (the direction indicated by the arrow in the figure) from the encoded pixels above or to the left of the block.
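The average-value (DC), vertical, and horizontal modes just described can be sketched as follows. This is an illustrative simplification, not the normative H.264 process; the function name and the neighbour arrays are our own.

```python
import numpy as np

def intra_predict_4x4(top, left, mode):
    # Sketch of three of the nine 4x4 luma intra modes described above.
    if mode == 0:                       # vertical: repeat the pixels above
        return np.tile(top, (4, 1))
    if mode == 1:                       # horizontal: repeat the pixels on the left
        return np.tile(left.reshape(4, 1), (1, 4))
    if mode == 2:                       # DC: average of the eight neighbours
        dc = (int(top.sum()) + int(left.sum()) + 4) // 8
        return np.full((4, 4), dc, dtype=top.dtype)
    raise NotImplementedError("modes 3-8 interpolate along fixed directions")

top = np.array([10, 20, 30, 40])        # encoded pixels above the block
left = np.array([12, 14, 16, 18])       # encoded pixels to the left
pred = intra_predict_4x4(top, left, 0)  # each row equals `top`
```

A vertically striped block is thus reproduced exactly by mode 0, which is why that mode is selected for such content.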
  • The luminance block size to which intra prediction is applied can be selected from 4×4, 8×8, and 16×16 pixels; in the 8×8-pixel case, nine intra prediction modes are defined, as for 4×4 pixels.
  • In the 8×8-pixel case, however, the pixels used for prediction are not the encoded pixels themselves but pixels obtained by applying filter processing to them.
  • In the 16×16-pixel case, in addition to the intra prediction modes for average-value prediction, vertical prediction, and horizontal prediction, four intra prediction modes called Plane prediction are defined.
  • In the Plane prediction modes, the prediction value is a pixel generated by interpolating the encoded adjacent pixels above and to the left of the block in an oblique direction.
  • The prediction unit 102 outputs the prediction signal generation parameters determined when obtaining the prediction signal to the variable length coding unit 108.
  • The prediction signal generation parameters include, for example, information such as the intra prediction mode indicating how spatial prediction is performed within a frame and the motion vector indicating the amount of motion between frames.
  • When the compression unit 103 receives the prediction difference signal from the prediction unit 102, it performs DCT (discrete cosine transform) processing on the signal to remove signal correlation and then obtains compressed data by quantization.
  • When the local decoding unit 104 receives the compressed data from the compression unit 103, it inversely quantizes the data and performs inverse DCT processing, thereby calculating a prediction difference signal corresponding to the one output from the prediction unit 102.
  • When the adder 105 receives the prediction difference signal from the local decoding unit 104, it adds that signal to the prediction signal output from the prediction unit 102 to generate a locally decoded image.
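The compression and local-decoding loop formed by the compression unit 103, the local decoding unit 104, and the adder 105 can be sketched as follows. An orthonormal floating-point DCT stands in for the standard's integer transform, and all names are illustrative.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis; stands in for the 4x4 integer transform.
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0] *= np.sqrt(1.0 / n)
    m[1:] *= np.sqrt(2.0 / n)
    return m

def local_decode(diff, pred, qstep):
    # Transform and quantize the prediction difference (compression unit 103),
    # then inverse-quantize, inverse-transform, and add the prediction signal
    # back (local decoding unit 104 and adder 105).
    d = dct_matrix(4)
    coeff = d @ diff @ d.T              # forward transform
    level = np.round(coeff / qstep)     # quantization -> compressed data
    recon = d.T @ (level * qstep) @ d   # inverse quantization + inverse DCT
    return pred + recon                 # locally decoded image block

diff = np.arange(16, dtype=float).reshape(4, 4) - 8.0
pred_blk = np.full((4, 4), 100.0)
rec = local_decode(diff, pred_blk, 0.5)  # close to pred_blk + diff
```

The decoder performs the same inverse steps, which is why the encoder keeps this locally decoded image, rather than the original, as its reference.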
  • The loop filter 106 removes block distortion superimposed on the locally decoded image signal indicating the locally decoded image generated by the adder 105, and stores the locally decoded image signal after distortion removal in the memory 107 as a reference image signal.
  • When the variable length coding unit 108 receives the compressed data from the compression unit 103, it entropy-encodes the data and outputs the resulting bit stream. The variable length coding unit 108 also multiplexes the prediction signal generation parameters output from the prediction unit 102 into the bit stream and outputs it.
  • In Non-Patent Document 1, in order to encode interlaced signals efficiently, various encoding tools are incorporated, such as a function for adaptively switching, in units of pictures or macroblocks, whether an interlaced signal is encoded as a frame or as a field.
  • MPEG-4 AVC (ISO/IEC 14496-10)
  • In Non-Patent Document 2, on the other hand, no special encoding tool for improving the encoding efficiency of interlaced signals is provided.
  • However, during intra prediction, a filter process for increasing the continuity of block boundaries is performed on the predicted image in specific intra prediction modes, as shown in FIG. 27.
  • When a field is encoded, the spatial correlation in the vertical direction is lowered, so the effect of the filter process at the upper edge of the block may be greatly reduced.
  • As its orthogonal transform coefficient encoding method, Non-Patent Document 2 divides the orthogonal transform block into 4×4-pixel sub-blocks (orthogonal transform sub-blocks) called Coefficient Groups (CGs) and encodes the coefficients in CG units.
  • FIG. 28 shows the encoding order (scan order) of the coefficients in a 16×16-pixel orthogonal transform block: the sixteen 4×4-pixel CGs are encoded in order starting from the lower-right CG, and within each CG the 16 coefficients are encoded in order starting from the lower-right coefficient.
  • Specifically, flag information indicating whether a significant (non-zero) coefficient exists among the 16 coefficients of the CG is encoded first; then, only when a significant (non-zero) coefficient exists in the CG, whether each coefficient in the CG is a significant (non-zero) coefficient is encoded in the above order; finally, the value information of each significant (non-zero) coefficient is encoded in order. These steps are performed CG by CG in the above order. The efficiency of entropy encoding can be increased by using a scan order biased so that significant (non-zero) coefficients occur as continuously as possible. However, since progressive video and interlaced video have different distributions of significant (non-zero) coefficients, they cannot both be encoded efficiently with the scan order of FIG. 28.
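A minimal sketch of the CG-based scan and significance signalling described above. The reverse raster order inside each traversal is an assumption made for readability (the standard uses a diagonal scan), and coefficient-value coding is omitted.

```python
import numpy as np

def cg_scan_order(block_size=16, cg=4):
    # Visit the 4x4 Coefficient Groups from the lower-right CG, and within
    # each CG the 16 coefficients from the lower-right one.
    order = []
    n = block_size // cg
    for cy in range(n - 1, -1, -1):
        for cx in range(n - 1, -1, -1):
            for y in range(cg - 1, -1, -1):
                for x in range(cg - 1, -1, -1):
                    order.append((cy * cg + y, cx * cg + x))
    return order

def encode_significance(coeffs, order, cg_size=16):
    # Per-CG flag first; per-coefficient significance flags only when the
    # CG contains a significant coefficient (coefficient values omitted).
    bits = []
    for i in range(0, len(order), cg_size):
        group = order[i:i + cg_size]
        sig = any(coeffs[p] != 0 for p in group)
        bits.append(int(sig))
        if sig:
            bits.extend(int(coeffs[p] != 0) for p in group)
    return bits

coeffs = np.zeros((16, 16), dtype=int)
coeffs[15, 15] = 3                       # one significant coefficient
bits = encode_significance(coeffs, cg_scan_order())
```

A CG containing no significant coefficient costs a single flag bit, which is why grouping significant coefficients together in the scan order saves bits.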
  • Since the video encoding device of Non-Patent Document 2 is configured as described above, when an interlaced signal is field-encoded, the vertical spatial resolution is halved and the inter-pixel correlation decreases, so there has been a problem that the prediction efficiency of intra prediction and the encoding efficiency of the orthogonal transform coefficients are reduced.
  • The present invention has been made to solve the above problems, and an object thereof is to obtain a video encoding device, video decoding device, video encoding method, and video decoding method capable of improving encoding efficiency even when an interlaced signal is field-encoded.
  • The video encoding device according to the invention comprises variable length encoding means for generating an encoded bitstream in which compressed data and the coding mode are multiplexed, wherein the variable length encoding means divides the orthogonal transform block into orthogonal transform sub-blocks and, based on whether a flag based on information indicating whether to perform field encoding is valid, switches the encoding order of the quantized transform coefficients constituting the compressed data between orthogonal-transform-block units and orthogonal-transform-sub-block units.
  • According to the invention, the orthogonal transform block is divided into orthogonal transform sub-blocks, and the encoding order of the quantized transform coefficients constituting the compressed data is switched between orthogonal-transform-block units and orthogonal-transform-sub-block units based on whether a flag based on information indicating whether to perform field encoding is valid. Efficient prediction processing and encoding processing suited to the characteristics of the field signal can therefore be realized, with the effect of improving encoding efficiency.
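The claimed order switching can be sketched as follows. The claim only states that the order is switched between orthogonal-transform-block units and orthogonal-transform-sub-block units depending on the flag, so both concrete orders below are illustrative assumptions of ours.

```python
def coefficient_order(block_size, field_flag, cg=4):
    # When the flag is valid, scan over the whole orthogonal transform block;
    # otherwise scan CG by CG (orthogonal transform sub-block units). Both
    # concrete orders are reverse raster scans chosen only for illustration.
    coords = []
    if field_flag:                       # orthogonal-transform-block units
        for y in range(block_size - 1, -1, -1):
            for x in range(block_size - 1, -1, -1):
                coords.append((y, x))
    else:                                # sub-block (CG) units
        n = block_size // cg
        for cy in range(n - 1, -1, -1):
            for cx in range(n - 1, -1, -1):
                for y in range(cg - 1, -1, -1):
                    for x in range(cg - 1, -1, -1):
                        coords.append((cy * cg + y, cx * cg + x))
    return coords
```

Both orders visit every coefficient exactly once; only the grouping changes, matching the differing distributions of significant coefficients in frame- and field-encoded signals.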
  • (a) shows the distribution of coding blocks and prediction blocks after division, and (b) is an explanatory drawing showing how coding mode m(B^n) is assigned by hierarchical division.
  • An explanatory diagram showing relative coordinates whose origin is the upper-left pixel of the prediction block P_i^n.
  • FIG. 1 is a block diagram showing a moving picture coding apparatus according to Embodiment 1 of the present invention.
  • When a video signal is input as the input image, the slice dividing unit 14 divides the input image into one or more partial images called "slices" according to the slice division information determined by the encoding control unit 2.
  • The slice division can be as fine as the coding block unit described later.
  • The slice dividing unit 14 constitutes a slice dividing unit.
  • When the block dividing unit 1 receives a slice divided by the slice dividing unit 14, it divides the slice into largest coding blocks, which are coding blocks of the largest size determined by the encoding control unit 2, and hierarchically divides each largest coding block into coding blocks until the upper-limit number of layers determined by the encoding control unit 2 is reached. That is, the block dividing unit 1 divides the slice into coding blocks according to the divisions determined by the encoding control unit 2 and outputs those coding blocks.
  • Each coding block is divided into one or a plurality of prediction blocks which are prediction processing units.
  • The block dividing unit 1 constitutes a block dividing unit.
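The hierarchical division into coding blocks can be sketched as a quadtree. The `cost` function stands in for the encoder's mode decision and is purely a placeholder, not part of the patent.

```python
def split_coding_blocks(size, depth, max_depth, cost):
    # Keep the block if splitting does not reduce the (stand-in) cost or the
    # upper-limit number of layers is reached; otherwise divide it into four
    # quarter-size coding blocks and recurse.
    if depth == max_depth or cost(size) <= 4 * cost(size // 2):
        return {"size": size}
    return {"size": size,
            "children": [split_coding_blocks(size // 2, depth + 1, max_depth, cost)
                         for _ in range(4)]}

# A superlinear per-block cost makes every block split down to the depth limit.
tree = split_coding_blocks(64, 0, 2, lambda s: s ** 3)
```

With a largest coding block of 64×64 pixels and an upper limit of two layers, the leaves of this tree are 16×16-pixel coding blocks.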
  • The encoding control unit 2 determines the maximum size of the coding block serving as the processing unit of the encoding process and the upper-limit number of layers when a coding block of the maximum size is hierarchically divided, thereby determining the size of each coding block.
  • The encoding control unit 2 also selects, from one or more available coding modes (one or more intra coding modes with different prediction block sizes indicating the prediction processing unit, one or more inter coding modes with different prediction block sizes, and so on), the coding mode to be applied to the coding block output from the block dividing unit 1.
  • As the selection method, for example, the coding mode with the highest coding efficiency for the coding block output from the block dividing unit 1 is selected from the one or more available coding modes.
  • When the coding mode with the highest coding efficiency is an intra coding mode, the encoding control unit 2 determines the intra prediction parameters to be used when intra prediction processing is performed on the coding block in that intra coding mode, for each prediction block serving as the prediction processing unit indicated by the intra coding mode.
  • When the coding mode with the highest coding efficiency is an inter coding mode, the encoding control unit 2 determines the inter prediction parameters to be used when inter prediction processing is performed on the coding block in that inter coding mode, for each prediction block serving as the prediction processing unit indicated by the inter coding mode.
  • Furthermore, the encoding control unit 2 determines the prediction difference encoding parameters to be given to the transform/quantization unit 7 and the inverse quantization/inverse transform unit 8.
  • The prediction difference encoding parameters include orthogonal transform block division information indicating how the orthogonal transform block, which is the orthogonal transform processing unit within the coding block, is divided, and a quantization parameter specifying the quantization step size used when the transform coefficients are quantized.
  • The encoding control unit 2 constitutes an encoding control unit.
  • If the coding mode determined by the encoding control unit 2 is an intra coding mode, the changeover switch 3 outputs the coding block output from the block dividing unit 1 to the intra prediction unit 4; if the coding mode is an inter coding mode, it outputs the coding block to the motion compensation prediction unit 5.
  • When an intra coding mode is selected by the encoding control unit 2 as the coding mode for the coding block output from the changeover switch 3, the intra prediction unit 4 performs intra prediction processing (intra-frame prediction processing) on each prediction block of the coding block, serving as the prediction processing unit, using the intra prediction parameters determined by the encoding control unit 2 while referring to the locally decoded image stored in the intra prediction memory 10, and generates an intra prediction image.
  • When an inter coding mode is selected by the encoding control unit 2, the motion compensation prediction unit 5 searches for a motion vector by comparing the coding block with the locally decoded images of one or more frames stored in the motion compensation prediction frame memory 12 in units of the prediction blocks serving as the prediction processing unit; then, using that motion vector and the inter prediction parameters determined by the encoding control unit 2, such as the frame number to be referenced, it performs inter prediction processing (motion compensation prediction processing) on the coding block for each prediction block and generates an inter prediction image.
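The motion vector search described above can be sketched as full-search block matching over a small window. The SAD matching criterion and the window size are our assumptions; the patent does not prescribe a particular search method here.

```python
import numpy as np

def motion_search(block, ref, bx, by, search_range=4):
    # Compare the prediction block against displaced positions in the
    # locally decoded reference image and return the displacement (dx, dy)
    # minimising the sum of absolute differences (SAD).
    h, w = block.shape
    best, best_sad = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + h > ref.shape[0] or x + w > ref.shape[1]:
                continue
            sad = int(np.abs(block - ref[y:y + h, x:x + w]).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best = sad, (dx, dy)
    return best

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (32, 32)).astype(int)
block = ref[10:18, 12:20]                 # the true block sits at (y=10, x=12)
mv = motion_search(block, ref, bx=14, by=11)
```

The winning displacement, applied to the reference image, yields the motion-compensated prediction image whose difference from the coding block forms the prediction difference signal.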
  • The intra prediction unit 4, the intra prediction memory 10, the motion compensation prediction unit 5, and the motion compensation prediction frame memory 12 constitute a prediction unit.
  • The subtraction unit 6 subtracts the intra prediction image generated by the intra prediction unit 4 or the inter prediction image generated by the motion compensation prediction unit 5 from the coding block output from the block dividing unit 1, and outputs the prediction difference signal indicating the resulting difference image to the transform/quantization unit 7.
  • The subtracting unit 6 constitutes a difference image generating unit.
  • The transform/quantization unit 7 refers to the orthogonal transform block division information included in the prediction difference encoding parameters determined by the encoding control unit 2 and performs orthogonal transform processing (for example, DCT (discrete cosine transform), DST (discrete sine transform), or a KL transform whose basis has been designed in advance for a specific learning sequence) on the prediction difference signal output from the subtraction unit 6 in units of orthogonal transform blocks to calculate transform coefficients. It then refers to the quantization parameter to quantize the transform coefficients of each orthogonal transform block, and outputs the compressed data, i.e. the quantized transform coefficients, to the inverse quantization/inverse transform unit 8 and the variable length encoding unit 13.
  • The transform/quantization unit 7 constitutes an image compression unit.
  • FIG. 10 is an explanatory diagram illustrating an example of a 4×4 DCT quantization matrix.
  • The numbers in the figure indicate the scaling value of the quantization step size for each transform coefficient. For example, to suppress the encoding bit rate, the quantization step size is scaled to a larger value for higher-frequency transform coefficients, as shown in FIG. 10. This suppresses the high-frequency transform coefficients that arise in complex image regions, and thus the code amount, while encoding without discarding information on the low-frequency coefficients that strongly affect subjective quality.
  • In this way, a quantization matrix can be used to control the quantization step size for each transform coefficient.
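Quantization with such a matrix can be sketched as follows. The example matrix values and the neutral scaling value of 16 are illustrative assumptions, not the matrix of FIG. 10.

```python
import numpy as np

def quantize_with_matrix(coeffs, base_qstep, qmatrix):
    # Each coefficient's step size is the base quantization step scaled by
    # the matrix entry at the same frequency position (16 = neutral value).
    step = base_qstep * qmatrix / 16.0
    return np.round(coeffs / step).astype(int)

# Illustrative matrix: coarser steps toward the lower right (high frequencies).
qmatrix = np.array([[16, 16, 16, 16],
                    [16, 16, 16, 22],
                    [16, 16, 22, 25],
                    [16, 22, 25, 28]])
levels = quantize_with_matrix(np.full((4, 4), 32.0), 2.0, qmatrix)
```

For equal input coefficients, the high-frequency positions produce smaller quantized levels, i.e. fewer bits, while the low-frequency positions are preserved at full precision.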
  • As the quantization matrix, an independent matrix can be used for each color signal and coding mode (intra coding or inter coding) at each orthogonal transform size, and either a new quantization matrix or an existing matrix (a matrix prepared in advance and shared by the video encoding device and the video decoding device, or an already-encoded matrix) can be selected. Accordingly, the transform/quantization unit 7 sets, in the quantization matrix parameters to be encoded, flag information indicating whether a new quantization matrix is used for each color signal and coding mode at each orthogonal transform size. Furthermore, when a new quantization matrix is used, each scaling value of the quantization matrix as shown in FIG. 10 is set in the quantization matrix parameters to be encoded.
  • When a new quantization matrix is not used, an index specifying the matrix to be used, from among the quantization matrices prepared in advance as initial values by the video encoding device and the video decoding device or the already-encoded quantization matrices, is set in the quantization matrix parameters to be encoded. However, when no already-encoded quantization matrix is available for reference, only the quantization matrices prepared in advance by the video encoding device and the video decoding device can be selected.
  • The inverse quantization/inverse transform unit 8 refers to the quantization parameter and the orthogonal transform block division information included in the prediction difference encoding parameters determined by the encoding control unit 2, inversely quantizes the compressed data output from the transform/quantization unit 7 in units of orthogonal transform blocks, performs inverse orthogonal transform processing on the transform coefficients, i.e. the inversely quantized compressed data, and calculates a locally decoded prediction difference signal corresponding to the prediction difference signal output from the subtraction unit 6.
  • When the transform/quantization unit 7 uses a quantization matrix in the quantization process, the corresponding inverse quantization process is performed by referring to the same quantization matrix.
  • The addition unit 9 adds the locally decoded prediction difference signal calculated by the inverse quantization/inverse transform unit 8 to the intra prediction image generated by the intra prediction unit 4 or the inter prediction image generated by the motion compensation prediction unit 5, and calculates the locally decoded image corresponding to the coding block output from the block dividing unit 1.
  • The inverse quantization/inverse transform unit 8 and the addition unit 9 constitute a local decoded image generation unit.
  • The intra prediction memory 10 is a recording medium that stores the locally decoded image calculated by the addition unit 9.
  • The loop filter unit 11 performs predetermined filter processing on the locally decoded image calculated by the addition unit 9 and outputs the filtered locally decoded image. Specifically, it performs filter (deblocking filter) processing that reduces distortion occurring at the boundaries of orthogonal transform blocks and prediction blocks, processing that adaptively adds an offset in units of pixels (pixel adaptive offset processing), and adaptive filter processing that adaptively switches among linear filters, such as Wiener filters, to perform the filtering.
  • The loop filter unit 11 determines whether to perform each of the deblocking filter processing, the pixel adaptive offset processing, and the adaptive filter processing, and outputs the enable flag of each process to the variable length encoding unit 13 as header information. When a plurality of these filter processes are used, they are performed in order.
  • FIG. 11 shows a configuration example of the loop filter unit 11 when a plurality of filter processes are used.
  • In general, the more types of filter processing are used, the better the image quality, but the higher the processing load; that is, image quality and processing load are in a trade-off relationship.
  • Moreover, the image quality improvement effect of each filter process varies with the characteristics of the image to be filtered, so the filter processing to be used may be determined according to the processing load tolerated by the video encoding device and the characteristics of the image to be encoded.
  • The loop filter unit 11 constitutes filtering means.
  • In the deblocking filter processing, the various parameters used to select the filter strength applied to block boundaries can be changed from their initial values; in that case, the parameters are output to the variable length encoding unit 13 as header information.
  • In the pixel adaptive offset processing, the image is first divided into a plurality of blocks, and for each block one class classification method is selected from a plurality of methods prepared in advance (not performing offset processing for the block is defined as one of the class classification methods). Next, each pixel in the block is classified by the selected method, and an offset value for compensating the coding distortion is calculated for each class. Finally, the image quality of the locally decoded image is improved by adding the offset value to the luminance value of the locally decoded image. Therefore, in the pixel adaptive offset processing, the block division information, an index indicating the class classification method of each block, and offset information specifying the offset value of each class in block units are output to the variable length encoding unit 13 as header information.
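The pixel adaptive offset processing can be sketched with a simple intensity-band class classification. The 32-band split mirrors HEVC's band offset but is an assumption here; edge-based classification and the encoder's offset derivation are omitted.

```python
import numpy as np

def band_offset(block, offsets, n_bands=32, bit_depth=8):
    # Classify each pixel by its intensity band, add the per-class offset
    # the encoder derived to compensate coding distortion, and clip to the
    # valid sample range.
    shift = bit_depth - int(np.log2(n_bands))
    band = block >> shift                 # class index of each pixel
    out = block + np.take(offsets, band)  # per-class offset addition
    return np.clip(out, 0, (1 << bit_depth) - 1)

offsets = np.zeros(32, dtype=int)
offsets[1] = 5                            # offset for band 1 (values 8..15)
filtered = band_offset(np.full((2, 2), 10), offsets)
```

Only the per-class offsets and the classification choice need to be signalled, so the decoder can reproduce exactly the same correction.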
  • In the adaptive filter processing, the filter designed for each class is output to the variable length encoding unit 13 as header information.
  • As the class classification method, there are a simple method of spatially dividing the image at equal intervals and a method of classifying in units of blocks according to the local characteristics (variance and the like) of the image.
  • The number of classes used in the adaptive filter processing may be set in advance to a value common to the video encoding device and the video decoding device, or may be a parameter to be encoded. Compared with the former, the latter allows the number of classes to be set freely and thus improves the image quality improvement effect, but increases the code amount because the number of classes must be encoded.
  • The motion compensated prediction frame memory 12 is a recording medium that stores the locally decoded image after the filter processing of the loop filter unit 11.
  • The variable length encoding unit 13 variable-length-encodes the compressed data output from the transform/quantization unit 7, the output signals of the encoding control unit 2 (the block division information within the largest coding block, the coding mode, the prediction difference encoding parameters, and the intra prediction parameters or inter prediction parameters), and the motion vector output from the motion compensation prediction unit 5 (when the coding mode is an inter coding mode), and generates encoded data. Further, as illustrated in FIG. 13, the variable length encoding unit 13 encodes a sequence level header and a picture level header as the header information of the encoded bitstream, and generates the encoded bitstream together with the picture data.
  • The variable length encoding unit 13 constitutes variable length encoding means.
  • The picture data is composed of one or more pieces of slice data, and each piece of slice data is a combination of a slice level header and the encoded data in that slice.
  • The sequence level header is a collection of header information that is generally common in sequence units, such as the image size, the color signal format, the bit depths of the signal values of the luminance signal and the color difference signals, the enable flag information of each filter process in the loop filter unit 11 (adaptive filter processing, pixel adaptive offset processing, deblocking filter processing) in sequence units, the quantization matrix enable flag information, and a flag indicating whether to perform field encoding.
  • The picture level header is a collection of header information set in units of pictures, such as the index of the sequence level header to be referenced, the number of reference pictures for motion compensation, the entropy coding probability table initialization flag, and the quantization matrix parameters.
  • The slice level header is a collection of slice-unit parameters such as position information indicating where in the picture the slice is located, an index indicating which picture level header is referenced, the slice coding type (all-intra coding, inter coding, etc.), and flag information indicating whether to perform each filter process in the loop filter unit 11 (adaptive filter processing, pixel adaptive offset processing, deblocking filter processing).
  • a block division unit 1 an encoding control unit 2, a changeover switch 3, an intra prediction unit 4, a motion compensation prediction unit 5, a subtraction unit 6, transform / quantization, which are components of the moving image encoding device.
  • FIG. 2 is a flowchart showing the processing contents (moving image coding method) of the moving image coding apparatus according to Embodiment 1 of the present invention.
  • FIG. 3 is a block diagram showing a moving picture decoding apparatus according to Embodiment 1 of the present invention.
  • the variable length decoding unit 31 receives the encoded bit stream generated by the moving image encoding apparatus of FIG. 1 and performs variable length decoding, from the bit stream, of each piece of header information, such as the sequence level header, picture level header, and slice level header.
  • the block division information indicating the division status of each encoded block divided hierarchically is variable-length decoded from the bitstream.
  • the quantization matrix is specified from the quantization matrix parameters variable-length decoded by the variable-length decoding unit 31.
  • when the quantization matrix parameters indicate that a quantization matrix prepared in advance in common by the moving picture encoding device and the moving picture decoding device, or an already decoded quantization matrix, is to be used, the quantization matrix is specified by referring to index information identifying which of those matrices is used.
  • when the quantization matrix parameters indicate that a new quantization matrix is to be used, the quantization matrix included in the quantization matrix parameters is specified as the quantization matrix to be used.
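The decoder-side selection just described can be sketched as follows. This is an illustrative sketch only; the field names (`use_new_matrix`, `matrix_index`, `new_matrix`) and the preset table are assumptions for the example, not syntax from the patent or any standard.

```python
# Hypothetical sketch of decoder-side quantization-matrix selection:
# either take the new matrix carried in the quantization matrix parameters,
# or look up a preset / previously decoded matrix by index.

PRESET_MATRICES = {
    0: [[16] * 4 for _ in range(4)],  # flat matrix prepared in advance (initial value)
}

def select_quantization_matrix(params, decoded_matrices):
    """Pick the quantization matrix for one orthogonal transform size.

    params           -- dict holding the variable-length-decoded parameters
    decoded_matrices -- matrices decoded earlier in the stream, keyed by index
                        (indices assumed disjoint from the presets)
    """
    if params["use_new_matrix"]:
        # The new matrix is included in the quantization matrix parameters.
        return params["new_matrix"]
    idx = params["matrix_index"]
    # Otherwise the index refers either to a matrix prepared in advance in
    # common by encoder and decoder, or to an already decoded matrix.
    if idx in PRESET_MATRICES:
        return PRESET_MATRICES[idx]
    return decoded_matrices[idx]
```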
  • the variable length decoding unit 31 refers to each header information to specify the slice division state, specifies the largest decoding block included in the slice data of each slice (the block corresponding to the "largest coding block" of the moving picture encoding device in FIG. 1), and, referring to the block division information, hierarchically divides the largest decoding block to specify the decoding block (the block corresponding to the "coding block" of the moving picture encoding device in FIG. 1) that is the unit of decoding processing.
  • the variable length decoding unit 31 constitutes variable length decoding means.
  • the inverse quantization / inverse transform unit 32 refers to the quantization parameter and orthogonal transform block division information included in the prediction difference encoding parameters variable-length decoded by the variable length decoding unit 31, inversely quantizes the variable-length decoded compressed data in units of orthogonal transform blocks, and performs inverse orthogonal transform processing on the transform coefficients, which are the compressed data after inverse quantization, thereby calculating the same decoded prediction difference signal as the local decoded prediction difference signal output from the inverse quantization / inverse transform unit 8 in FIG. 1.
  • the inverse quantization / inverse transform unit 32 constitutes a difference image generation unit.
  • when each header information variable-length decoded by the variable-length decoding unit 31 indicates that inverse quantization processing using a quantization matrix is performed in the slice, the inverse quantization processing is performed using the quantization matrix. Specifically, inverse quantization processing is performed using the quantization matrix specified from each header information.
  • if the coding mode variable-length decoded by the variable length decoding unit 31 is the intra coding mode, the changeover switch 33 outputs the intra prediction parameters variable-length decoded by the variable length decoding unit 31 to the intra prediction unit 34; if the coding mode variable-length decoded by the variable length decoding unit 31 is the inter coding mode, it outputs the inter prediction parameters and motion vector variable-length decoded by the variable length decoding unit 31 to the motion compensation unit 35.
  • when the coding mode for the decoding block specified from the block division information variable-length decoded by the variable length decoding unit 31 is the intra coding mode, the intra prediction unit 34 performs, for each prediction block that is the unit of prediction processing of the decoding block, intra prediction processing (intra-frame prediction processing) using the intra prediction parameters output from the changeover switch 33 while referring to the decoded image stored in the intra prediction memory 37, thereby generating an intra prediction image.
  • when the coding mode for the decoding block specified from the block division information variable-length decoded by the variable length decoding unit 31 is the inter coding mode, the motion compensation unit 35 performs, for each prediction block that is the unit of prediction processing of the decoding block, inter prediction processing (motion compensated prediction processing) using the motion vector and inter prediction parameters output from the changeover switch 33 while referring to the decoded image stored in the motion compensated prediction frame memory 39, thereby generating an inter prediction image.
  • the intra prediction unit 34, the intra prediction memory 37, the motion compensation unit 35, and the motion compensated prediction frame memory 39 constitute a prediction unit.
  • the addition unit 36 adds the decoded prediction difference signal calculated by the inverse quantization / inverse transform unit 32 and either the intra prediction image generated by the intra prediction unit 34 or the inter prediction image generated by the motion compensation unit 35, thereby calculating the same decoded image as the local decoded image output from the addition unit 9 in FIG. 1.
  • the adding unit 36 constitutes a decoded image generating unit.
  • the intra prediction memory 37 is a recording medium that stores the decoded image calculated by the adding unit 36 as a reference image used in the intra prediction process.
  • the loop filter unit 38 performs predetermined filter processing on the decoded image calculated by the addition unit 36 and outputs the decoded image after filter processing. Specifically, it performs filter (deblocking filter) processing for reducing distortion occurring at the boundaries of orthogonal transform blocks and prediction blocks, processing for adaptively adding an offset in pixel units (pixel adaptive offset processing), adaptive filter processing that performs filtering while adaptively switching linear filters such as Wiener filters, and the like.
  • the loop filter unit 38 specifies, with reference to each header information variable-length decoded by the variable length decoding unit 31, whether or not to perform each of the above deblocking filter processing, pixel adaptive offset processing, and adaptive filter processing in the corresponding slice. At this time, when two or more filter processes are performed, if the loop filter unit 11 of the moving picture encoding device is configured as shown in FIG. 11, the loop filter unit 38 is configured correspondingly as shown in FIG.
  • the loop filter unit 38 constitutes filtering means.
  • in the deblocking filter processing, when the header information variable-length decoded by the variable length decoding unit 31 includes change information for changing, from their initial values, the various parameters used to select the filter strength applied to block boundaries, the deblocking filter processing is performed based on that change information. When there is no change information, it is performed according to a predetermined method.
  • in the pixel adaptive offset processing, the decoded image is divided based on the block division information variable-length decoded by the variable length decoding unit 31, and, in block units, if the index variable-length decoded by the variable length decoding unit 31 indicating the block class classification method is not the index indicating "do not perform offset processing", each pixel in the block is classified according to the class classification method indicated by that index.
  • as class classification method candidates, the same candidates as the pixel classification method candidates of the pixel adaptive offset processing of the loop filter unit 11 are prepared in advance. Then, referring to the offset information specifying the offset value of each class in block units, processing for adding the offset to the luminance value of the decoded image is performed.
  • when the block division information is not encoded and the image is always divided into fixed-size block units (for example, largest coding block units), the loop filter unit 38 also performs the pixel adaptive offset processing in block units of the same fixed size as the loop filter unit 11.
  • the motion compensation prediction frame memory 39 is a recording medium that stores the decoded image after the filter processing of the loop filter unit 38 as a reference image used in the inter prediction processing (motion compensation prediction processing).
  • it is assumed that each of the variable length decoding unit 31, the inverse quantization / inverse transform unit 32, the changeover switch 33, the intra prediction unit 34, the motion compensation unit 35, the addition unit 36, the intra prediction memory 37, the loop filter unit 38, and the motion compensated prediction frame memory 39, which are components of the video decoding device, is configured by dedicated hardware (for example, a semiconductor integrated circuit on which a CPU is mounted, a one-chip microcomputer, or the like).
  • FIG. 4 is a flowchart showing the processing contents (moving image decoding method) of the moving image decoding apparatus according to Embodiment 1 of the present invention.
  • a moving picture encoding device that takes each frame image of a video as an input image, performs intra prediction from encoded neighboring pixels or motion compensated prediction between adjacent frames, applies compression processing by orthogonal transform and quantization to the obtained prediction difference signal, and then performs variable length encoding to generate an encoded bit stream, and a moving picture decoding device that decodes the encoded bit stream output from the moving picture encoding device, will be described.
  • the moving picture encoding device in FIG. 1 is characterized in that it performs intra-frame / inter-frame adaptive encoding by dividing the video signal into blocks of various sizes in accordance with local changes of the video signal in the spatial and temporal directions.
  • a video signal has a characteristic that the complexity of the signal changes locally in space and time.
  • for example, on a video frame, a pattern having uniform signal characteristics over a relatively wide image area, such as the sky or a wall, and a pattern having a complicated texture in a small image area, such as a person or a fine texture, may coexist.
  • viewed temporally as well, the sky and a wall have small local changes of pattern in the time direction, but a moving person or object has large temporal change because its outline moves rigidly or non-rigidly over time.
  • a prediction difference signal with small signal power and entropy is generated by temporal and spatial prediction to reduce the overall code amount.
  • if the parameters used for prediction can be applied uniformly to as large an image signal region as possible, the code amount of those parameters can be reduced.
  • on the other hand, if the same prediction parameters are applied to a large image region for an image signal pattern that changes greatly in time and space, prediction errors increase and the code amount of the prediction difference signal increases. Therefore, in regions of large temporal and spatial change, it is desirable to reduce the block size over which the same prediction parameters are applied for prediction processing, accepting a larger data amount of prediction parameters in exchange for lower power and entropy of the prediction difference signal.
  • in the first embodiment, in order to perform encoding adapted to such general characteristics of a video signal, prediction processing and the like are first started from a predetermined largest block size, the region of the video signal is divided hierarchically, and the prediction processing and the encoding processing of the prediction difference are adapted for each divided region.
  • the video signal format to be processed by the moving image encoding apparatus of FIG. 1 is a color video signal in an arbitrary color space, such as a YUV signal composed of a luminance signal and two color difference signals or an RGB signal output from a digital image sensor, or an arbitrary video signal, such as a monochrome image signal or an infrared image signal, in which a video frame consists of a horizontal and vertical two-dimensional digital sample (pixel) array.
  • the gradation of each pixel may be 8 bits, or a gradation such as 10 bits or 12 bits.
  • in the following, the case will be described in which the video signal of the input image is a YUV signal and a 4:2:0 format signal, in which the two color difference components U and V are subsampled with respect to the luminance component Y, is handled.
  • the format of the color difference signal is not limited to the 4:2:0 format of the YUV signal; it may be the 4:2:2 format or 4:4:4 format of the YUV signal, or an RGB signal.
  • a processing data unit corresponding to each frame of the video signal is referred to as a “picture”. Note that “picture” represents a frame signal when encoded in frame units, and represents a field signal when encoded in field units.
  • the encoding control unit 2 determines the slice division state of the picture to be encoded (current picture), and also determines the size of the largest coding block used for encoding the picture and the upper limit on the number of layers into which the largest coding block is hierarchically divided (step ST1 in FIG. 2).
  • as a method of determining the size of the largest coding block, for example, the same size may be determined for all pictures according to the resolution of the video signal of the input image, or the difference in local motion complexity of the video signal of the input image may be quantified as a parameter, with a small size determined for pictures with intense motion and a large size determined for pictures with little motion.
  • as a method of determining the upper limit of the number of division layers, for example, the same number of layers may be determined for all pictures according to the resolution of the video signal of the input image, or the number of layers may be increased when the motion of the video signal of the input image is intense so that finer motion can be detected, and suppressed when there is little motion.
  • the size of the largest coding block and the upper limit of the number of layers into which the largest coding block is divided may be encoded in the sequence level header or the like, or may be determined by the same process on the moving picture decoding device side without being encoded.
  • in the former case, the code amount of the header information increases, but since the determination process need not be performed on the moving picture decoding device side, the processing load of the moving picture decoding device can be suppressed, and an optimum value can be searched for and sent on the moving picture encoding device side. In the latter case, conversely, since the determination process is performed on the moving picture decoding device side, the processing load of the moving picture decoding device increases, but the code amount of the header information does not increase.
  • the encoding control unit 2 selects an encoding mode corresponding to each hierarchically divided coding block from one or more available coding modes (step ST2). That is, the encoding control unit 2 hierarchically divides each image area of the largest coding block size into coding blocks until the upper limit of the number of division layers determined above is reached, and determines a coding mode for each coding block. There are one or more intra coding modes (collectively referred to as "INTRA") and one or more inter coding modes (collectively referred to as "INTER"), and the encoding control unit 2 selects the coding mode corresponding to each coding block from all the coding modes available for the picture, or from a subset thereof.
  • each coding block hierarchically divided by the block division unit 1, described later, is further divided into one or more prediction blocks, which are the units of prediction processing, and the division state of the prediction blocks is also included in the coding mode as information.
  • the coding mode is an index identifying whether the mode is intra or inter and with what kind of prediction block division. Since the coding mode selection method of the encoding control unit 2 is a known technique, its detailed description is omitted; for example, there is a method of performing encoding processing on the coding block with each available coding mode, verifying the coding efficiency, and selecting the coding mode with the best coding efficiency from among the available coding modes.
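The selection method just mentioned (try every available mode, keep the one with the best efficiency) can be sketched as follows. This is a toy illustration under stated assumptions: the cost here is a plain sum of absolute prediction errors rather than a true rate-distortion cost, and the two candidate predictors are made-up stand-ins, not the patent's coding modes.

```python
# Exhaustive mode selection sketch: encode the block with every candidate
# mode, measure a cost, and keep the mode with the lowest cost.

def select_coding_mode(block, modes):
    """block: list of samples; modes: dict mode_name -> predictor function."""
    best_mode, best_cost = None, float("inf")
    for name, predict in modes.items():
        pred = predict(block)
        # Sum of absolute differences stands in for a rate-distortion cost.
        cost = sum(abs(a - b) for a, b in zip(block, pred))
        if cost < best_cost:
            best_mode, best_cost = name, cost
    return best_mode, best_cost

# Two toy candidate modes: a DC-like (mean) predictor and a copy predictor.
modes = {
    "dc": lambda b: [round(sum(b) / len(b))] * len(b),
    "copy_first": lambda b: [b[0]] * len(b),
}
```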
  • further, the encoding control unit 2 determines, for each coding block, the quantization parameter and orthogonal transform block division state used when the difference image is compressed, and the prediction parameters (intra prediction parameters or inter prediction parameters) used when the prediction processing is performed.
  • a prediction parameter (intra prediction parameter or inter prediction parameter) can be selected for each prediction block.
  • the selectable transform block size is limited to the size of the prediction block or less.
  • the encoding control unit 2 outputs the prediction difference encoding parameter including the quantization parameter and the transform block size to the transform / quantization unit 7, the inverse quantization / inverse transform unit 8, and the variable length coding unit 13. Also, the encoding control unit 2 outputs intra prediction parameters to the intra prediction unit 4 as necessary. Also, the encoding control unit 2 outputs inter prediction parameters to the motion compensation prediction unit 5 as necessary.
  • the slice division unit 14 divides the input image into slices, which are one or more partial images, according to the slice division information determined by the encoding control unit 2.
  • each time a slice is input from the slice division unit 14, the block division unit 1 divides the slice into blocks of the largest coding block size determined by the encoding control unit 2, further hierarchically divides each largest coding block into the coding blocks determined by the encoding control unit 2, and outputs the coding blocks.
  • FIG. 5 is an explanatory diagram showing an example in which the maximum coding block is hierarchically divided into a plurality of coding blocks.
  • in FIG. 5, the largest coding block is a coding block whose luminance component, described as the "0th layer", has a size of (L 0 , M 0 ).
  • the coding blocks are obtained by hierarchically dividing the largest coding block, using a quadtree structure, down to a predetermined depth defined separately.
  • at depth n, the coding block is an image area of size (L n , M n ).
  • in the following, the coding block of the nth layer is expressed as B n , and the coding modes selectable for the coding block B n are denoted m(B n ).
  • the coding mode m(B n ) may be configured so that an individual mode is used for each color component, or so that a common mode is used for all color components. In the following, unless otherwise specified, the description assumes that the coding mode refers to the coding mode for the luminance component of a coding block of a YUV signal in the 4:2:0 format.
  • the encoded block B n is divided by the block dividing unit 1 into one or a plurality of prediction blocks representing a prediction processing unit.
  • a prediction block belonging to the coding block B n is denoted as P i n (i is a prediction block number in the n-th layer).
  • FIG. 5 shows an example of P 0 0 and P 1 0 .
  • how the prediction blocks are divided within the coding block B n is included as information in the coding mode m(B n ). Prediction processing is performed on every prediction block P i n according to the coding mode m(B n ), and individual prediction parameters (intra prediction parameters or inter prediction parameters) can be selected for each prediction block P i n .
  • the encoding control unit 2 determines a block division state for the largest coding block as illustrated in FIG. 6, and specifies the coding blocks.
  • a rectangle surrounded by a dotted line in FIG. 6A represents each coding block, and a block painted with diagonal lines in each coding block represents a division state of each prediction block.
  • FIG. 6B shows, in a quadtree graph, a situation in which the encoding mode m (B n ) is assigned by hierarchical division in the example of FIG. 6A. Nodes surrounded by squares in FIG. 6B are nodes (encoding blocks) to which the encoding mode m (B n ) is assigned.
  • Information of the quadtree graph is output from the encoding control unit 2 to the variable length encoding unit 13 together with the encoding mode m (B n ), and is multiplexed into the bit stream.
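The hierarchical quadtree division of FIGS. 5 and 6 can be sketched as follows; a minimal illustration, in which the decision to split a block is a caller-supplied function (a real encoder would decide it from coding efficiency, as described above).

```python
# Quadtree division sketch: starting from the largest coding block of size
# (L0, M0), each block either becomes a leaf coding block or is split into
# four half-size blocks, until the upper limit on division layers is reached.

def divide_block(x, y, size, depth, max_depth, should_split):
    """Return the leaf coding blocks as (x, y, size, depth) tuples."""
    if depth < max_depth and should_split(x, y, size, depth):
        half = size // 2
        blocks = []
        for dy in (0, half):
            for dx in (0, half):
                blocks += divide_block(x + dx, y + dy, half,
                                       depth + 1, max_depth, should_split)
        return blocks
    return [(x, y, size, depth)]
```

For example, splitting only the top-left block at each level of a 64x64 largest coding block with an upper limit of two layers yields four 16x16 blocks and three 32x32 blocks, which together tile the original area.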
  • when the coding mode m(B n ) determined by the encoding control unit 2 is the intra coding mode (when m(B n ) ∈ INTRA), the changeover switch 3 outputs the coding block B n output from the block division unit 1 to the intra prediction unit 4.
  • when the coding mode m(B n ) determined by the encoding control unit 2 is the inter coding mode (when m(B n ) ∈ INTER), it outputs the coding block B n output from the block division unit 1 to the motion compensation prediction unit 5.
  • when the coding mode m(B n ) determined by the encoding control unit 2 is the intra coding mode (when m(B n ) ∈ INTRA) and the coding block B n is received from the changeover switch 3 (step ST3), the intra prediction unit 4 performs intra prediction processing on each prediction block P i n in the coding block B n using the intra prediction parameters determined by the encoding control unit 2 while referring to the local decoded image stored in the intra prediction memory 10, and generates an intra prediction image P INTRAi n (step ST4).
  • the intra prediction parameters used for generating the intra prediction image P INTRAi n are output from the encoding control unit 2 to the variable length encoding unit 13 and multiplexed into the bit stream. Details of the processing contents of the intra prediction unit 4 will be described later.
  • when the coding mode m(B n ) determined by the encoding control unit 2 is the inter coding mode (when m(B n ) ∈ INTER) and the coding block B n is received from the changeover switch 3 (step ST3), the motion compensated prediction unit 5 searches for a motion vector by comparing each prediction block P i n in the coding block B n with the filtered local decoded image stored in the motion compensated prediction frame memory 12, and, using the motion vector and the inter prediction parameters determined by the encoding control unit 2, performs inter prediction processing on each prediction block P i n in the coding block B n to generate an inter prediction image P INTERi n (step ST5).
  • the inter prediction parameters used for generating the inter prediction image P INTERi n are output from the encoding control unit 2 to the variable length encoding unit 13 and multiplexed into the bit stream.
  • the motion vector searched by the motion compensation prediction unit 5 is also output to the variable length encoding unit 13 and multiplexed into the bit stream.
  • upon receiving the coding block B n from the block division unit 1, the subtraction unit 6 subtracts either the intra prediction image P INTRAi n generated by the intra prediction unit 4 or the inter prediction image P INTERi n generated by the motion compensated prediction unit 5 from each prediction block P i n in the coding block B n , and outputs the prediction difference signal e i n representing the difference image, which is the subtraction result, to the transform / quantization unit 7 (step ST6).
  • when the transform / quantization unit 7 receives the prediction difference signal e i n from the subtraction unit 6, it refers to the orthogonal transform block division information included in the prediction difference encoding parameters determined by the encoding control unit 2, performs orthogonal transform processing on the prediction difference signal e i n (for example, DCT (discrete cosine transform), DST (discrete sine transform), or an orthogonal transform such as a KL transform whose bases have been designed in advance for a specific learning sequence) in units of orthogonal transform blocks, and calculates transform coefficients.
  • the transform / quantization unit 7 also refers to the quantization parameter included in the prediction difference encoding parameters, quantizes the transform coefficients in units of orthogonal transform blocks, and outputs the compressed data, which are the quantized transform coefficients, to the inverse quantization / inverse transform unit 8 and the variable length encoding unit 13 (step ST7).
  • the quantization process may be performed using a quantization matrix that scales the quantization step size calculated from the quantization parameter for each transform coefficient.
  • as the quantization matrix, an independent matrix can be used for each color signal and coding mode (intra coding or inter coding) at each orthogonal transform size, and it is possible to select either a quantization matrix prepared in advance in common by the moving picture encoding device and the moving picture decoding device or an already encoded quantization matrix, or a new quantization matrix. Accordingly, the transform / quantization unit 7 sets, in the quantization matrix parameters to be encoded, flag information indicating whether or not a new quantization matrix is used for each orthogonal transform size for each color signal and coding mode. Furthermore, when a new quantization matrix is used, each scaling value of the quantization matrix as shown in FIG. 10 is set in the quantization matrix parameters to be encoded.
  • on the other hand, when a new quantization matrix is not used, an index specifying which matrix is to be used, from among the quantization matrices prepared in advance as initial values in common by the moving picture encoding device and the moving picture decoding device and the already encoded quantization matrices, is set in the quantization matrix parameters to be encoded.
  • the transform / quantization unit 7 outputs the set quantization matrix parameter to the variable length coding unit 13.
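Quantization in which the matrix scales the per-coefficient quantization step size, together with its matching inverse, can be sketched as follows. The flat-16 convention (a scaling value of 16 meaning "no change") mirrors common codec practice and is an assumption of this sketch; the patent does not fix these numbers.

```python
# Quantization sketch: each transform coefficient is divided by the base
# quantization step size scaled by the corresponding quantization-matrix
# entry; inverse quantization multiplies back.

def quantize(coeffs, qstep, qmatrix):
    """coeffs, qmatrix: 2-D lists of equal shape; qstep: base step size."""
    return [[int(round(c / (qstep * m / 16.0)))
             for c, m in zip(crow, mrow)]
            for crow, mrow in zip(coeffs, qmatrix)]

def dequantize(levels, qstep, qmatrix):
    """Reconstruct coefficients from quantized levels (lossy in general)."""
    return [[l * qstep * m / 16.0
             for l, m in zip(lrow, mrow)]
            for lrow, mrow in zip(levels, qmatrix)]
```

With a flat matrix every coefficient uses the same step; raising a matrix entry coarsens only that coefficient, which is how a matrix can, for example, quantize high-frequency coefficients more strongly.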
  • when the inverse quantization / inverse transform unit 8 receives the compressed data from the transform / quantization unit 7, it refers to the quantization parameter and orthogonal transform block division information included in the prediction difference encoding parameters determined by the encoding control unit 2 and inversely quantizes the compressed data in units of orthogonal transform blocks.
  • when the transform / quantization unit 7 uses a quantization matrix for the quantization processing, the quantization matrix is also referred to during the inverse quantization processing to perform the corresponding inverse quantization.
  • the inverse quantization / inverse transform unit 8 performs inverse orthogonal transform processing (for example, inverse DCT, inverse DST, inverse KL transform, etc.) on transform coefficients that are compressed data after inverse quantization in units of orthogonal transform blocks. Then, a local decoded prediction difference signal corresponding to the prediction difference signal e i n output from the subtraction unit 6 is calculated and output to the addition unit 9 (step ST8).
  • upon receiving the local decoded prediction difference signal from the inverse quantization / inverse transform unit 8, the addition unit 9 adds the local decoded prediction difference signal and either the intra prediction image P INTRAi n generated by the intra prediction unit 4 or the inter prediction image P INTERi n generated by the motion compensated prediction unit 5, thereby calculating a local decoded image (step ST9).
  • the adding unit 9 outputs the locally decoded image to the loop filter unit 11 and stores the locally decoded image in the intra prediction memory 10. This locally decoded image becomes an encoded image signal used in the subsequent intra prediction processing.
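The addition step above can be sketched in one line; a minimal illustration that assumes an 8-bit sample range for clipping (the actual bit depth is signalled in the sequence level header, as noted earlier).

```python
# Reconstruction sketch: local decoded image = prediction image +
# decoded prediction difference signal, clipped to the valid sample range.

def reconstruct(prediction, residual, max_val=255):
    """prediction, residual: lists of samples of equal length."""
    return [min(max(p + r, 0), max_val)
            for p, r in zip(prediction, residual)]
```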
  • when the loop filter unit 11 receives the local decoded image from the addition unit 9, it performs predetermined filter processing on the local decoded image and stores the filtered local decoded image in the motion compensated prediction frame memory 12 (step ST10).
  • specifically, filter (deblocking filter) processing for reducing distortion occurring at the boundaries of orthogonal transform blocks and prediction blocks, processing for adaptively adding an offset in pixel units (pixel adaptive offset processing), adaptive filter processing that performs filtering while adaptively switching linear filters such as Wiener filters, and the like are performed.
  • the loop filter unit 11 determines whether or not to perform each of the above deblocking filter processing, pixel adaptive offset processing, and adaptive filter processing, and outputs the enable flag of each process to the variable length encoding unit 13 as part of the sequence level header and part of the slice level header.
  • when a plurality of the above filter processes are used, each filter process is performed in order.
  • FIG. 11 shows a configuration example of the loop filter unit 11 when a plurality of filter processes are used.
  • in general, the more types of filter processing are used, the better the image quality, but also the higher the processing load; that is, image quality and processing load are in a trade-off relationship.
  • the image quality improvement effect of each filter process varies depending on the characteristics of the image to be filtered. Therefore, the filter processing to be used may be determined according to the processing load allowed by the moving image encoding device and the characteristics of the encoding target image.
  • in the deblocking filter processing, when the various parameters used for selecting the filter strength applied to block boundaries are changed from their initial values, the changed parameters are output to the variable length encoding unit 13 as header information.
  • in the pixel adaptive offset processing, the image is first divided into a plurality of blocks, the case of not performing offset processing is defined as one of the class classification methods in block units, and one class classification method is selected from a plurality of class classification methods prepared in advance.
  • next, each pixel in the block is classified by the selected class classification method, and an offset value for compensating the coding distortion is calculated for each class.
  • finally, the image quality of the local decoded image is improved by performing processing for adding that offset value to the luminance value of the local decoded image.
  • as class classification methods, there are a method of classifying by the magnitude of the luminance value of the local decoded image (referred to as the BO method) and a method of classifying, for each edge direction, according to the circumstances around each pixel (whether it is an edge portion, etc.) (referred to as the EO method).
  • the pixel adaptive offset processing outputs the block division information, the index indicating the class classification method of each block, and the offset information of each block to the variable length coding unit 13 as header information.
  • in the adaptive filter processing, a filter that compensates for the superimposed distortion is designed for each region (of the locally decoded image) belonging to each class, and the locally decoded image is filtered using these filters. The filter designed for each class is then output to the variable length encoding unit 13 as header information.
  • as class classification methods, there are a simple method of spatially dividing the image at equal intervals, and a method of classifying in units of blocks according to local characteristics (variance, etc.) of the image.
  • the number of classes used in the adaptive filter processing may be set in advance to a value common to the video encoding device and the video decoding device, or may be one of the parameters to be encoded. Compared with the former, the latter can set the number of classes freely, so the image quality improvement effect is greater; on the other hand, the amount of code increases because the number of classes must be encoded.
  • steps ST3 to ST9 are repeated until the processing for all the hierarchically divided coding blocks B^n is completed, and when the processing for all the coding blocks B^n is completed, the process proceeds to step ST13 (steps ST11 and ST12).
  • the variable length encoding unit 13 variable-length encodes the compressed data output from the transform/quantization unit 7, the block division information in the maximum coding block output from the encoding control unit 2 (FIG. 6(B)), the intra prediction parameters (when the coding mode is the intra coding mode) or inter prediction parameters (when the coding mode is the inter coding mode), and the motion vector output from the motion compensated prediction unit 5 (when the coding mode is the inter coding mode), and generates encoded data indicating the encoding results (step ST13).
  • the orthogonal transform block is further divided into blocks of 4×4 pixel units (coding sub-blocks) called "Coefficient Groups (CGs)".
  • the coefficient encoding process is performed in units of CGs. FIG. 28 shows the coding order (scan order) of the coefficients in a 16×16 pixel orthogonal transform block.
  • the 16 CGs in units of 4×4 pixels are encoded in this order from the lower-right CG, and within each CG the 16 coefficients are encoded in order from the lower-right coefficient.
  • first, flag information indicating whether or not a significant (non-zero) coefficient exists among the 16 coefficients in the CG is encoded. Next, only when a significant (non-zero) coefficient exists in the CG, whether each coefficient in the CG is a significant (non-zero) coefficient is encoded in the above order, and finally coefficient value information is encoded, in order, for each significant (non-zero) coefficient. This is performed in the above order in units of CGs. The encoding efficiency of entropy encoding can be increased by using a biased scan order in which significant (non-zero) coefficients occur as continuously as possible.
  • since the coefficients after the orthogonal transformation represent lower frequency components the closer they lie to the upper left, starting from the DC component located at the upper left, many significant (non-zero) coefficients are generally generated toward the upper left in progressive video; therefore, efficient encoding can be performed by encoding sequentially from the lower right as shown in FIG. 28.
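  • The CG-unit coding order described above can be sketched as follows; the raster-style reverse scan used here is a simplification of the actual scan of FIG. 28, and the function name is illustrative:

```python
def cg_scan(coeffs):
    """Scan a 16x16 block of quantized coefficients in CG units from the
    lower right, emitting a per-CG significance flag, then (only for
    flagged CGs) the 16 coefficients from the lower-right coefficient."""
    out = []
    for cy in range(3, -1, -1):                # CG rows, bottom to top
        for cx in range(3, -1, -1):            # CG columns, right to left
            cg = [coeffs[4 * cy + y][4 * cx + x]
                  for y in range(4) for x in range(4)]
            flag = 1 if any(c != 0 for c in cg) else 0
            vals = []
            if flag:                           # coefficients only when flagged
                for y in range(3, -1, -1):     # lower-right coefficient first
                    for x in range(3, -1, -1):
                        vals.append(coeffs[4 * cy + y][4 * cx + x])
            out.append((flag, vals))
    return out
```

When significant coefficients cluster near the upper left (progressive video), most early CGs emit only a zero flag, which is what makes this order efficient for entropy coding.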
  • in field coding, the vertical spatial correlation of the signal decreases, so the vertical prediction efficiency decreases. As a result, the vertical frequency components of the transform coefficients obtained by orthogonal transformation of the prediction difference signal e_i^n also increase, and the distribution of significant (non-zero) coefficients is biased toward the left side of the orthogonal transform block compared with progressive video. Accordingly, since encoding cannot be performed efficiently in the coding order shown in FIG. 28, the coding order is switched, for example, to that shown in FIG. 17.
  • in the above, the 16×16 pixel orthogonal transform block has been described, but encoding in units of CGs is also performed for block sizes other than 16×16 pixels, such as a 32×32 pixel orthogonal transform block.
  • in these cases as well, the coding order is switched according to the flag indicating whether or not field coding is enabled in the sequence level header, as in the case of the 16×16 pixel orthogonal transform block.
  • in the coding order shown in FIG. 17, the coding order in units of coding blocks (the coding order within a 16×16 pixel coding block) and in units of coding sub-blocks (the coding order within a 4×4 pixel CG) is changed; alternatively, the shape of the CG itself may be changed from a 4×4 pixel block to an 8×2 pixel block as shown in FIG. 18. Even in this way, encoding of significant (non-zero) coefficients is processed continuously by the CGs later in the coding order, and the encoding efficiency of entropy encoding can be improved.
  • in the case of FIG. 28 the coding order is left unchanged, whereas in the case of FIG. 17 the coding order is switched in units of coding blocks and coding sub-blocks, so the coding efficiency can be improved. Further, in the case of FIG. 18, in addition to switching the coding order in units of coding blocks and coding sub-blocks, the shape of the coding sub-block is also changed, so the coding efficiency can be increased further.
  • in the above, the case where the coding order is switched in both units of coding blocks and units of coding sub-blocks has been described; however, the order may be switched only in units of coding blocks or only in units of coding sub-blocks.
  • alternatively, the coding order shown in FIG. 19 may be used. In this case, encoding of even more significant (non-zero) coefficients can be processed continuously by the CGs later in the coding order, and the encoding efficiency of entropy encoding can be increased further.
  • a flag indicating whether or not to perform field coding may be prepared in the picture level header instead of the sequence level header, and the coding order of the coefficients when encoding the compressed data (the quantized orthogonal transform coefficients) may be switched adaptively for each picture. By doing so, control adaptive to each picture can be realized, and the encoding efficiency can be improved. Note that, to realize coding that adaptively switches between frame coding and field coding on a picture-by-picture basis, this flag needs to be prepared in the picture level header.
  • in the above, the case where the coding order, the CG shape, and the like are switched based on the flag indicating whether or not field coding is performed in the sequence level header or the picture level header has been described; however, a flag indicating whether or not to perform this switching process may be defined in the sequence level header or the picture level header separately from the field coding flag, and the coding order, the CG shape, and the scan order within the CG may be switched based on that flag.
  • FIGS. 17, 18, and 19 are illustrated as examples of the coding order, the CG shape, and the scan order within the CG, but the processing order is not limited to these; a coding order, CG shape, and scan order within the CG other than those shown in FIGS. 17, 18, and 19 may be used. Likewise, the combination of the CG shape and the scan order within the CG is not limited to those of FIGS. 17, 18, and 19.
  • for example, the CG may be 1×2, 1×4, 1×8, 1×16, 2×2, 2×4, or 4×8 pixels, or the like.
  • in the above description of field coding, a single predetermined one of FIGS. 17, 18, and 19 is used and cannot be selected; however, one of a plurality of candidates (e.g., FIGS. 17, 18, and 19) may be selected instead. In that case, a flag indicating which of the plurality of candidates has been selected is prepared in the header. This flag may be shared with the flag indicating whether or not to perform field coding, or with the flag indicating whether or not to perform this switching process.
  • variable length encoding unit 13 encodes a sequence level header and a picture level header as header information of the encoded bit stream, and generates an encoded bit stream together with the picture data.
  • picture data is composed of one or more slice data, and each slice data is a combination of a slice level header and the encoded data in the slice.
  • the sequence level header includes the image size, the color signal format, the bit depths of the signal values of the luminance signal and the color difference signals, valid flag information of each filter process (adaptive filter process, pixel adaptive offset process, deblocking filter process) of the loop filter unit 11 in sequence units, valid flag information of the quantization matrix, and a flag indicating whether or not to perform field coding.
  • the picture level header is a collection of header information set in units of pictures such as an index of a sequence level header to be referenced, the number of reference pictures at the time of motion compensation, an entropy encoding probability table initialization flag, and the like.
  • the slice level header is a collection of parameters in units of slices, such as position information indicating where the slice is located in the picture, an index indicating which picture level header is referred to, the slice coding type (all-intra coding, inter coding, etc.), and flag information indicating whether or not to perform each filter process (adaptive filter process, pixel adaptive offset process, deblocking filter process) of the loop filter unit 11.
  • FIG. 7 is an explanatory diagram showing an example of the intra prediction modes, which are the intra prediction parameters selectable for each prediction block P_i^n within a coding block B^n.
  • N_I represents the number of intra prediction modes.
  • FIG. 7 shows the index values of the intra prediction modes and the prediction direction vectors indicated by the intra prediction modes. In the example of FIG. 7, the design is such that the angle between prediction direction vectors becomes smaller as the number of selectable intra prediction modes increases.
  • as described above, the intra prediction unit 4 refers to the intra prediction parameters of the prediction block P_i^n and carries out the intra prediction process for the prediction block P_i^n to generate an intra prediction image P_INTRAi^n; here, the intra process of generating an intra prediction signal of the prediction block P_i^n for the luminance signal is described.
  • the size of the prediction block P_i^n is assumed to be l_i^n × m_i^n pixels.
  • the (2×l_i^n+1) encoded pixels above the prediction block P_i^n and the (2×m_i^n) encoded pixels to its left are used as the pixels for prediction.
  • the number of pixels used for prediction may be larger or smaller than the number of pixels shown in FIG. 8. Although one row or one column of pixels in the vicinity of the prediction block P_i^n is used for prediction here, two rows or two columns, or more pixels, may be used for prediction.
  • using the encoded pixels adjacent above the prediction block P_i^n and the encoded pixels adjacent to the left of the prediction block P_i^n, a predicted image is generated with a value interpolated according to the distance between these adjacent pixels and the prediction target pixel in the prediction block P_i^n as the predicted value.
  • then, for the regions A, B, and C in FIG. 20, located at the upper end and the left end of the prediction block P_i^n, a final predicted image is obtained by performing a filtering process for smoothing the block boundary. For example, the filtering process is performed using the following filter coefficients with the reference pixel arrangement of the filter shown in the figure.
  • note that the region A may be subjected to the same filtering process as the region C, or the region A may be left unfiltered and only the region C may be filtered.
  • when the index value of the intra prediction mode for the prediction block P_i^n is 26 (vertical prediction), the predicted values of the pixels in the prediction block P_i^n are calculated from the following equation (1) to generate a predicted image.
  • here, the coordinates (x, y) are relative coordinates with the origin at the upper-left pixel in the prediction block P_i^n (see FIG. 9), S'(x, y) is the predicted value at the coordinates (x, y), and S(x, y) is the luminance value (decoded luminance value) of the encoded pixel at the coordinates (x, y).
  • if the calculated predicted value exceeds the range of values that the luminance value can take, the value is rounded so that the predicted value falls within that range.
  • Equation (1) corresponds to the filtering process for vertical prediction in FIG. 27. The expression in the first row of Equation (1) means that filtering is performed so that the block boundary is smoothed, by adding a value obtained by halving the amount of change in the vertical direction of the luminance values of the adjacent encoded pixels to S(x, −1), which is the predicted value of vertical prediction in MPEG-4 AVC/H.264. The expression in the second row of Equation (1) is the same prediction formula as the vertical prediction in MPEG-4 AVC/H.264.
  • Equation (2) corresponds to the filtering process for horizontal prediction in FIG. 27. The expression in the first row of Equation (2) means that filtering is performed so that the block boundary is smoothed, by adding a value obtained by halving the amount of change in the horizontal direction of the luminance values of the adjacent encoded pixels to S(−1, y), which is the predicted value of horizontal prediction in MPEG-4 AVC/H.264. The expression in the second row of Equation (2) is the same prediction formula as the horizontal prediction in MPEG-4 AVC/H.264.
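  • The boundary smoothing described for Equation (1) can be sketched as follows, assuming the first-row expression applies to the left column of the block and the halved change is computed with integer division; the horizontal case of Equation (2) is symmetric (top row, horizontal change):

```python
def vertical_pred_filtered(top, left, corner, n, max_val=255):
    """Vertical intra prediction with left-boundary smoothing (sketch).

    top[x]  : decoded pixels S(x, -1) above the block
    left[y] : decoded pixels S(-1, y) to the left of the block
    corner  : decoded pixel S(-1, -1)
    """
    # Plain vertical prediction: every row copies S(x, -1).
    pred = [list(top[:n]) for _ in range(n)]
    for y in range(n):
        # First-row expression of Eq. (1): add half the vertical change of
        # the left neighbours, S(-1, y) - S(-1, -1), to smooth the boundary.
        v = top[0] + (left[y] - corner) // 2
        pred[y][0] = min(max(v, 0), max_val)   # keep within the valid range
    return pred
```

Only the left column differs from plain vertical prediction; the interior of the block is an unmodified copy of the row above.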
  • when field coding is performed, equation (3) is used instead of equation (2) for horizontal prediction. That is, as shown in FIG. 22, the filtering process is not performed on the upper end of the prediction block (in the case of average value prediction and vertical prediction, the filtering process is performed only on the left end of the prediction block, and in the case of horizontal prediction, no filtering process is performed).
  • a flag indicating whether or not to perform field coding may be prepared in the picture level header instead of the sequence level header, and the filtering process at the upper end of the prediction blocks of average value (DC) prediction and horizontal prediction may be switched ON/OFF according to the correlation between pixels in the vertical direction of each picture. By doing so, control adaptive to each picture can be realized, and the prediction efficiency can be improved. Note that, to realize encoding that adaptively switches between frame encoding and field encoding on a picture-by-picture basis, the flag needs to be prepared in the picture level header.
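  • The ON/OFF switching described above can be summarized as a small decision function; the mode names and the returned edge sets are illustrative assumptions, not syntax from this description:

```python
def edges_to_filter(mode, field_coding_flag):
    """Which prediction-block edges get boundary smoothing (sketch)."""
    if not field_coding_flag:
        # Frame coding: DC filters both boundary edges, vertical filters
        # the left edge, horizontal filters the top edge.
        if mode == "dc":
            return {"top", "left"}
        if mode == "vertical":
            return {"left"}
        if mode == "horizontal":
            return {"top"}
        return set()
    # Field coding: vertical correlation is low, so the top edge is never
    # filtered (DC and vertical keep only the left edge; horizontal: none).
    if mode in ("dc", "vertical"):
        return {"left"}
    return set()
```

Driving this decision from a picture-level flag rather than the sequence-level flag gives the per-picture adaptivity described above.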
  • in the above, the case where the filtering process at the upper end of the prediction block is switched ON/OFF based on the flag indicating whether or not field coding is performed in the sequence level header or the picture level header has been described. However, a flag indicating whether or not to perform this switching process may be defined separately from the field coding flag in the sequence level header or the picture level header, and the filtering process at the upper end of the prediction block may be switched ON/OFF based on that flag. The flag indicating whether or not to perform the switching process may also be derived from the flag indicating whether or not field encoding is performed.
  • in the above, the switching of the coding order and the switching of the filter processing have been described separately; however, they may be set in combination.
  • in addition, the block sizes subject to the filtering process may be limited; for example, the block-boundary filtering process of average value (DC) prediction, vertical prediction, and horizontal prediction may be limited to blocks of 16×16 pixels or less. In this way, the amount of computation required for the filtering process can be reduced.
  • when the index value of the intra prediction mode is other than the above, the predicted values of the pixels in the prediction block P_i^n are generated based on the prediction direction vector υ_p = (dx, dy) indicated by the index value. With the origin at the upper-left pixel of the prediction block P_i^n, the relative coordinates in the prediction block P_i^n are set to (x, y), and the position of the reference pixel used for prediction is the intersection of the following line L and the line of adjacent pixels.
  • k is a negative scalar value.
  • when the reference pixel is at an integer pixel position, that integer pixel is set as the predicted value of the prediction target pixel. When the reference pixel is not at an integer pixel position, an interpolated pixel generated from the integer pixels adjacent to the reference pixel is set as the predicted value. In the example of FIG. 8, since the reference pixel is not located at an integer pixel position, a value interpolated from the two pixels adjacent to the reference pixel is set as the predicted value. Note that an interpolated pixel may be generated not only from two adjacent pixels but also from more than two adjacent pixels and used as the predicted value.
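  • The directional prediction with interpolation described above can be sketched as follows, assuming a direction with dy > 0 so that every target pixel projects back (with a negative scalar k) onto the reference row above the block, and simple two-tap linear interpolation; names and the restriction to the top reference row are illustrative:

```python
import math

def angular_pred(top, n, dx, dy):
    """Directional intra prediction over the reference row S(x, -1) (sketch).

    top : decoded reference row, long enough for all projected offsets
    (dx, dy) : prediction direction vector, dy > 0 assumed
    """
    pred = [[0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            # Intersection of (x, y) + k*(dx, dy), k negative, with the
            # reference row y = -1:  k = -(y + 1)/dy.
            rx = x - (y + 1) * dx / dy         # sub-pel reference position
            i = math.floor(rx)
            f = rx - i                         # fractional offset
            # Two-tap linear interpolation between adjacent integer pixels.
            pred[y][x] = round((1 - f) * top[i] + f * top[i + 1])
    return pred
```

For dx = 0, dy = 1 this degenerates to plain vertical prediction, since every pixel projects exactly onto top[x].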
  • by the processing described above, predicted pixels are generated for all the pixels of the luminance signal in the prediction block P_i^n, and the intra prediction image P_INTRAi^n is output.
  • the intra prediction parameters used for generating the intra prediction image P_INTRAi^n are output to the variable length encoding unit 13 in order to be multiplexed into the bitstream.
  • similar to the smoothing process performed on the reference pixels in the 8×8 pixel block intra prediction of MPEG-4 AVC/H.264, even when the intra prediction unit 4 is configured so that the encoded pixels adjacent to the prediction block P_i^n are smoothed before being used as the reference pixels in generating an intermediate predicted image of the prediction block P_i^n, the same filtering process as in the above example can be performed on the intermediate predicted image.
  • for the color difference signals of the prediction block P_i^n, intra prediction processing based on the intra prediction parameter (intra prediction mode) is carried out in the same manner, and the intra prediction parameters used for generating the intra prediction image are output to the variable length encoding unit 13.
  • however, the intra prediction parameters (intra prediction modes) selectable for the color difference signals may be different from those of the luminance signal.
  • for the color difference signals, a prediction method similar to that of MPEG-4 AVC/H.264 may be used. For example, when the color difference signals (U and V signals) have their resolution reduced to 1/2 of that of the luminance signal (Y signal) in both the horizontal and vertical directions, the complexity of the image signal is low and prediction is easy; therefore, the number of selectable intra prediction parameters may be made smaller than that of the luminance signal, so that the amount of code required to encode the intra prediction parameters can be reduced.
  • when the variable length decoding unit 31 receives the encoded bitstream generated by the moving picture encoding apparatus in FIG. 1, it performs variable length decoding processing on the bitstream (step ST21 in FIG. 4) and decodes the header information composed in units of sequences of one or more pictures (sequence level header), such as the flag indicating whether or not field encoding is performed and the frame size information, the header information in units of pictures (picture level header), and the filter parameters and quantization matrix parameters used in the loop filter unit 38.
  • the quantization matrix is specified with reference to the quantization matrix parameters variable-length decoded by the variable length decoding unit 31. When a quantization matrix parameter indicates that a quantization matrix prepared in advance is to be used, that quantization matrix is specified; when a quantization matrix parameter indicates that a new quantization matrix is to be used, the quantization matrix included in the quantization matrix parameter is specified as the quantization matrix to be used.
  • slice unit header information such as slice division information is decoded from slice data constituting picture unit data, and encoded data of each slice is decoded.
  • the variable length decoding unit 31 determines the maximum coding block size and the upper limit of the number of division layers determined by the encoding control unit 2 of the moving picture encoding apparatus in FIG. 1, in the same procedure as the moving picture encoding apparatus (step ST22). For example, when the maximum coding block size and the upper limit of the number of division layers are determined according to the resolution of the video signal, the maximum coding block size is determined based on the decoded frame size information in the same procedure as the moving picture encoding apparatus. When the maximum coding block size and the upper limit of the number of division layers have been multiplexed into the sequence level header or the like on the moving picture encoding apparatus side, the values decoded from that header are used.
  • hereinafter, the maximum coding block size is referred to as the maximum decoding block size, and the maximum coding block is referred to as the maximum decoding block.
  • the variable length decoding unit 31 decodes the division state of the maximum decoding block as shown in FIG. 6 for each determined maximum decoding block. Based on the decoded division state, a decoded block (a block corresponding to the “encoded block” of the moving image encoding apparatus in FIG. 1) is identified hierarchically (step ST23).
  • variable length decoding unit 31 decodes the encoding mode assigned to the decoding block. Based on the information included in the decoded coding mode, the decoded block is further divided into one or more prediction blocks which are prediction processing units, and the prediction parameters assigned to the prediction block units are decoded (step ST24).
  • when the coding mode assigned to the decoding block is the intra coding mode, the variable length decoding unit 31 decodes the intra prediction parameters for each of the one or more prediction blocks that are included in the decoding block and serve as prediction processing units. When the coding mode assigned to the decoding block is the inter coding mode, the inter prediction parameters and motion vectors are decoded for each of the one or more prediction blocks that are included in the decoding block and serve as prediction processing units (step ST24).
  • furthermore, the variable length decoding unit 31 decodes the compressed data (quantized transform coefficients) for each orthogonal transform block based on the orthogonal transform block division information included in the prediction difference encoding parameters (step ST24).
  • at that time, the variable length decoding unit 31 decodes the 16 CGs in units of 4×4 pixels in order from the lower-right CG, and for each CG the 16 coefficients in the CG are decoded in order from the lower-right coefficient.
  • specifically, flag information indicating whether or not a significant (non-zero) coefficient exists among the 16 coefficients in the CG is decoded first; next, only when the decoded flag information indicates that a significant (non-zero) coefficient exists in the CG, whether each coefficient in the CG is a significant (non-zero) coefficient is decoded in the above order; and finally, the coefficient value information is decoded, in order, for each coefficient indicated as significant (non-zero). This is performed in the above order in units of CGs.
  • at that time, decoding is performed in the same order as the processing order determined by the variable length encoding unit 13 of the moving picture encoding apparatus in FIG. 1, based on the flag, decoded by the variable length decoding unit 31, indicating whether or not field encoding is enabled in the sequence level header. In this way, the compressed data of the stream generated by the moving picture encoding apparatus in FIG. 1 can be decoded correctly.
  • when the flag is prepared in the picture level header, the variable length decoding unit 31 similarly switches the decoding order of the compressed data adaptively in units of pictures according to that flag.
  • if the coding mode m(B^n) variable-length decoded by the variable length decoding unit 31 is the intra coding mode (when m(B^n) ∈ INTRA), the changeover switch 33 outputs the intra prediction parameters in units of prediction blocks variable-length decoded by the variable length decoding unit 31 to the intra prediction unit 34.
  • if the coding mode m(B^n) variable-length decoded by the variable length decoding unit 31 is the inter coding mode, the inter prediction parameters and motion vectors in units of prediction blocks variable-length decoded by the variable length decoding unit 31 are output to the motion compensation unit 35.
  • when the coding mode m(B^n) is the intra coding mode, the intra prediction unit 34 receives the intra prediction parameters in units of prediction blocks output from the changeover switch 33 and, in the same procedure as the intra prediction unit 4 in FIG. 1, carries out the intra prediction process for each prediction block P_i^n of the decoding block B^n using those intra prediction parameters while referring to the decoded image stored in the intra prediction memory 37, thereby generating an intra prediction image P_INTRAi^n (step ST26).
  • however, when the flag indicating whether or not field encoding is enabled in the sequence level header is valid, the filtering process at the upper end of the prediction blocks of average value (DC) prediction and horizontal prediction is not performed, as in the moving picture encoding apparatus in FIG. 1. In this way, the same predicted image as in the stream generated by the moving picture encoding apparatus in FIG. 1 can be generated.
  • when the moving picture encoding apparatus prepares the flag indicating whether or not to perform field encoding in the picture level header, the filtering process at the upper end of the prediction blocks of average value (DC) prediction and horizontal prediction is switched ON/OFF in units of pictures according to the value of that flag in the picture level header. In this way, the same predicted image as in the stream generated by the moving picture encoding apparatus according to Embodiment 1 configured as described above can be generated.
  • when the coding mode m(B^n) is the inter coding mode, the motion compensation unit 35 receives the motion vectors and inter prediction parameters in units of prediction blocks output from the changeover switch 33, and carries out the inter prediction process for each prediction block P_i^n of the decoding block B^n using the motion vectors and inter prediction parameters while referring to the filtered decoded image stored in the motion compensated prediction frame memory 39, thereby generating an inter prediction image P_INTERi^n (step ST27).
  • when receiving the compressed data and the prediction difference encoding parameters from the variable length decoding unit 31, the inverse quantization/inverse transform unit 32, in the same procedure as the inverse quantization/inverse transform unit 8 in FIG. 1, inversely quantizes the compressed data in units of orthogonal transform blocks with reference to the quantization parameter and the orthogonal transform block division information included in the prediction difference encoding parameters. At this time, if the header information variable-length decoded by the variable length decoding unit 31 indicates that the inverse quantization process is to be performed using a quantization matrix in the slice, the inverse quantization process is performed using the quantization matrix.
  • at this time, the quantization matrix to be used is identified for each color signal and coding mode (intra coding or inter coding) at each orthogonal transform size, with reference to the quantization matrix parameters variable-length decoded by the variable length decoding unit 31. Further, the inverse quantization/inverse transform unit 32 performs an inverse orthogonal transform process on the transform coefficients, which are the compressed data after inverse quantization, in units of orthogonal transform blocks, and calculates a decoded prediction difference signal identical to the local decoded prediction difference signal output from the inverse quantization/inverse transform unit 8 in FIG. 1 (step ST28).
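  • The inverse quantization step can be sketched as follows; the nominal weight of 16 for quantization matrix entries is an assumed, HEVC-style convention used only for illustration, and the function name is hypothetical:

```python
def dequantize(levels, qstep, qmatrix=None):
    """Inverse-quantize a 2-D list of coefficient levels (sketch)."""
    out = []
    for y, row in enumerate(levels):
        if qmatrix is None:
            # Flat inverse quantization when no matrix is signalled.
            out.append([lv * qstep for lv in row])
        else:
            # Per-frequency weighting by the quantization matrix; weights
            # assumed relative to a nominal value of 16 (illustration only).
            out.append([lv * qstep * qmatrix[y][x] // 16
                        for x, lv in enumerate(row)])
    return out
```

An all-16 matrix reproduces the flat case, while larger entries coarsen the reconstruction of the corresponding frequency positions.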
  • the addition unit 36 adds the decoded prediction difference signal calculated by the inverse quantization/inverse transform unit 32 and either the intra prediction image P_INTRAi^n generated by the intra prediction unit 34 or the inter prediction image P_INTERi^n generated by the motion compensation unit 35 to calculate a decoded image, outputs the decoded image to the loop filter unit 38, and stores the decoded image in the intra prediction memory 37 (step ST29).
  • This decoded image becomes a decoded image signal used in the subsequent intra prediction processing.
  • the loop filter unit 38 performs a predetermined filtering process on the decoded image output from the adding unit 36, and stores the decoded image after the filtering process in the motion compensated prediction frame memory 39 (step ST31).
  • specifically, deblocking filter processing, pixel adaptive offset processing that adaptively adds an offset in units of pixels, and adaptive filter processing that performs filtering by adaptively switching linear filters such as Wiener filters are carried out.
  • however, the loop filter unit 38 refers to the header information variable-length decoded by the variable length decoding unit 31 and specifies whether or not to perform each of the above deblocking filter processing, pixel adaptive offset processing, and adaptive filter processing in the corresponding slice. At this time, when two or more filter processes are performed and the loop filter unit 11 of the moving picture encoding apparatus is configured as shown in FIG. 11, the loop filter unit 38 is configured correspondingly.
  • in the deblocking filter processing, when the header information variable-length decoded by the variable length decoding unit 31 contains information for changing, from their initial values, the various parameters used for selecting the filter strength applied to the block boundary, the deblocking filter processing is performed based on that change information. When there is no change information, it is performed according to a predetermined method.
  • In the pixel adaptive offset processing, each pixel in a block is classified into classes in accordance with the class classification method indicated by the decoded index.
  • As class classification method candidates, the same candidates as those of the pixel adaptive offset processing of the loop filter unit 11 are prepared in advance.
  • The loop filter unit 38 then refers to the offset information, variable-length decoded by the variable-length decoding unit 31, that specifies the offset value of each class in block units, and adds the offset to the luminance values of the decoded image.
  • In the adaptive filter processing, filtering is performed based on the class classification information.
  • The decoded image after the filtering process by the loop filter unit 38 becomes the reference image for motion compensation prediction and also becomes the reproduced image.
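The per-class offset addition of the pixel adaptive offset processing described above can be sketched as follows. This is a minimal illustration only: the band-based classifier and the offset values are hypothetical placeholders, not the class classification method candidates or decoded offsets of the actual scheme.

```python
# Illustrative sketch of pixel adaptive offset: classify each pixel,
# then add the decoded per-class offset to its luminance value.

def apply_pixel_adaptive_offset(block, classify, offsets):
    """Add the offset of each pixel's class to its luminance value."""
    return [[px + offsets[classify(px)] for px in row] for row in block]

# Hypothetical classifier: split the 8-bit luminance range into 4 bands.
def band_class(px):
    return min(px // 64, 3)

decoded = [[10, 70], [130, 250]]
offsets = [2, -1, 0, 3]          # one decoded offset value per class
filtered = apply_pixel_adaptive_offset(decoded, band_class, offsets)
print(filtered)  # [[12, 69], [130, 253]]
```

In the actual decoder, both the classification method index and the offset table are obtained per block from the variable-length decoded offset information rather than chosen locally.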
  • As described above, according to Embodiment 1, when the flag indicating that the input video signal is encoded in field units is valid, the intra prediction unit 4 and the intra prediction unit 34 do not perform the filtering process at the upper end of the prediction block when performing intra prediction by average value prediction or horizontal prediction, and the inverse quantization/inverse transform unit 32 changes the decoding order of the transform coefficients. These configurations, implemented separately or in combination, realize efficient prediction processing and coding processing according to the characteristics of the field signal, so that coding efficiency can be improved and the bitstream encoded by the moving picture encoding device of Embodiment 1 can be correctly decoded.
  • The moving picture encoding device, moving picture decoding device, moving picture encoding method, and moving picture decoding method according to the present invention are useful for moving picture encoding devices, moving picture decoding devices, and the like that perform encoding and decoding with high coding efficiency.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present invention addresses the problem of intra prediction efficiency or orthogonal transformation coefficient encoding efficiency decreasing due to spatial resolution in the vertical direction dropping to 1/2, and correlation between pixels decreasing when subjecting an interlace signal to field encoding. The present invention is provided with a variable-length encoding means that splits orthogonal transformation blocks into orthogonal transformation sub-blocks, and switches the encoding order of post-quantization transformation coefficients, which comprise compressed data, between orthogonal transformation block units and orthogonal transformation sub-block units, on the basis of whether a flag based on information indicating whether field encoding is to be performed is enabled.

Description

Moving picture encoding device, moving picture decoding device, moving picture encoding method, and moving picture decoding method
The present invention relates to a moving picture encoding device and moving picture encoding method for encoding moving pictures with high efficiency, and to a moving picture decoding device and moving picture decoding method for decoding moving pictures that have been encoded with high efficiency.
Conventionally, in international standard video coding schemes such as MPEG and ITU-T H.26x, an input video frame is divided into macroblock units each consisting of 16×16 pixel blocks, and, after motion compensation prediction is performed, information compression is achieved by orthogonally transforming and quantizing the prediction difference signal in block units.
Here, FIG. 23 is a configuration diagram showing the MPEG-4 AVC/H.264 moving picture encoding device disclosed in Non-Patent Document 1.
In this moving picture encoding device, when the block division unit 101 receives an image signal to be encoded, it divides the image signal into macroblock units and outputs each macroblock-unit image signal to the prediction unit 102 as a divided image signal.
When the prediction unit 102 receives a divided image signal from the block division unit 101, it predicts the image signal of each color component in the macroblock within a frame or between frames, and calculates a prediction difference signal.
In particular, when motion compensation prediction is performed between frames, a motion vector is searched for in units of the macroblock itself or of sub-blocks obtained by further dividing the macroblock.
Then, using that motion vector, a motion-compensated prediction image is generated by performing motion compensation prediction on the reference image signal stored in the memory 107, and the prediction difference signal is calculated by taking the difference between the prediction signal indicating the motion-compensated prediction image and the divided image signal.
On the other hand, when performing intra-frame prediction, Non-Patent Document 1 allows one prediction mode to be selected from a plurality of prediction modes on a block-by-block basis as the luminance intra prediction mode.
FIG. 24 is an explanatory diagram showing the intra prediction modes when the luminance block size is 4×4 pixels.
In FIG. 24, the white circles in the block represent the pixels to be encoded, and the black circles represent already-encoded pixels used for prediction. When the luminance block size is 4×4 pixels, nine intra prediction modes, mode 0 to mode 8, are defined.
In FIG. 24, mode 2 is a mode that performs average value (DC) prediction, in which the pixels in the block are predicted from the average value of the adjacent pixels above and to the left of the block.
The modes other than mode 2 are modes that perform directional prediction. Mode 0 is vertical prediction, which generates a prediction image by repeating the adjacent pixels above the block in the vertical direction. For example, mode 0 is selected for a vertical stripe pattern.
Mode 1 is horizontal prediction, which generates a prediction image by repeating the adjacent pixels to the left of the block in the horizontal direction. For example, mode 1 is selected for a horizontal stripe pattern.
Modes 3 to 8 generate a prediction image by generating interpolated pixels in a predetermined direction (the direction indicated by the arrow) using the encoded pixels above or to the left of the block.
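The three basic modes described above can be sketched for a 4×4 block as follows. This is a simplified illustration, not the normative mode definitions; the DC rounding rule used here is one common convention and the directional modes 3 to 8 are omitted.

```python
# Sketch of vertical (mode 0), horizontal (mode 1) and DC (mode 2) intra
# prediction for a 4x4 block, given the row of already-encoded pixels above
# the block and the column of already-encoded pixels to its left.

def intra_predict_4x4(mode, above, left):
    if mode == 0:                     # vertical: repeat the pixels above downward
        return [above[:] for _ in range(4)]
    if mode == 1:                     # horizontal: repeat the left pixels rightward
        return [[left[y]] * 4 for y in range(4)]
    if mode == 2:                     # DC: average of the above and left neighbours
        dc = (sum(above) + sum(left) + 4) // 8   # +4 for rounding (assumed rule)
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("modes 3-8 (directional interpolation) not sketched here")

above = [100, 102, 104, 106]
left = [90, 92, 94, 96]
print(intra_predict_4x4(0, above, left)[2])   # [100, 102, 104, 106]
print(intra_predict_4x4(2, above, left)[0])   # DC value repeated across the row
```

A vertical stripe pattern reproduces exactly under mode 0, and a horizontal stripe pattern under mode 1, which is why those modes are selected for such content.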
Here, the luminance block size to which intra prediction is applied can be selected from 4×4 pixels, 8×8 pixels, and 16×16 pixels; in the case of 8×8 pixels, nine intra prediction modes are defined as for 4×4 pixels. However, the pixels used for prediction are not the encoded pixels themselves but pixels obtained by applying a filtering process to them.
In contrast, in the case of 16×16 pixels, four intra prediction modes are defined: those for average value prediction, vertical prediction, and horizontal prediction, plus a mode called Plane prediction.
The intra prediction mode for Plane prediction is a mode in which the prediction value of a pixel is generated by diagonally interpolating the encoded adjacent pixels above and to the left of the block.
The prediction unit 102 also outputs the prediction signal generation parameters determined when obtaining the prediction signal to the variable length coding unit 108.
The prediction signal generation parameters include, for example, information such as the intra prediction mode indicating how spatial prediction within a frame is performed, and motion vectors indicating the amount of motion between frames.
When the compression unit 103 receives the prediction difference signal from the prediction unit 102, it removes signal correlation by performing DCT (discrete cosine transform) processing on the prediction difference signal, and then obtains compressed data by quantization.
When the local decoding unit 104 receives the compressed data from the compression unit 103, it inversely quantizes the compressed data and performs inverse DCT processing, thereby calculating a prediction difference signal corresponding to the prediction difference signal output from the prediction unit 102.
When the adder 105 receives the prediction difference signal from the local decoding unit 104, it adds that prediction difference signal to the prediction signal output from the prediction unit 102 to generate a locally decoded image.
The loop filter 106 removes block distortion superimposed on the locally decoded image signal indicating the locally decoded image generated by the adder 105, and stores the locally decoded image signal after distortion removal in the memory 107 as a reference image signal.
When the variable length coding unit 108 receives the compressed data from the compression unit 103, it entropy-encodes the compressed data and outputs a bitstream as the coding result.
When outputting the bitstream, the variable length coding unit 108 multiplexes the prediction signal generation parameters output from the prediction unit 102 into the bitstream.
Here, video signal formats to be encoded generally include progressive signals, in which each frame consists entirely of signals at the same time as shown in FIG. 25, and interlace signals, in which each frame consists of signals (fields) at two different times as shown in FIG. 26. In order to encode interlace signals efficiently, Non-Patent Document 1 incorporates various coding tools, such as a function that adaptively switches, in units of pictures or macroblocks, between encoding the interlace signal as frames and encoding it as fields.
On the other hand, the method disclosed in Non-Patent Document 2 provides no special coding tool for improving the coding efficiency of interlace signals. In Non-Patent Document 2, during intra prediction, a filtering process for increasing the continuity of block boundaries is applied to the prediction image in specific intra prediction modes, as shown in FIG. 27. However, when encoding is performed in field units, the spatial correlation in the vertical direction decreases, so the effect of the filtering process on the upper edge of the block may be greatly reduced.
In addition, as an orthogonal transform coefficient encoding method, Non-Patent Document 2 further divides each orthogonal transform block into 4×4 pixel blocks (orthogonal transform sub-blocks) called Coefficient Groups (CGs), and performs coefficient encoding in CG units. FIG. 28 shows the coding order (scan order) of the coefficients in a 16×16 pixel orthogonal transform block. The sixteen 4×4-pixel CGs are encoded in order starting from the lower-right CG, and within each CG the sixteen coefficients are likewise encoded in order starting from the lower-right coefficient. Specifically, flag information indicating whether a significant (non-zero) coefficient exists among the sixteen coefficients in the CG is encoded first; next, only when a significant (non-zero) coefficient exists in the CG, whether each coefficient in the CG is a significant (non-zero) coefficient is encoded in the above order; finally, for each significant (non-zero) coefficient, its coefficient value information is encoded in order. This is performed CG by CG in the above order. Here, a scan order biased so that significant (non-zero) coefficients occur as consecutively as possible increases the efficiency of entropy coding. Since progressive video and interlaced video have different distributions of significant (non-zero) coefficients, interlaced video cannot be encoded efficiently with the scan order of FIG. 28.
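The CG-by-CG signalling order described above can be sketched as follows. This is a simplified model: the symbol list stands in for the entropy coder, and the scan within each list is simple reversal rather than the actual diagonal scan.

```python
# Sketch of CG-based coefficient signalling: for each 4x4 Coefficient Group,
# processed from the last CG backwards, emit a "CG has a non-zero coefficient"
# flag; only if it is set, emit a per-coefficient significance flag (in reverse
# scan order) and then the values of the significant coefficients.

def encode_cgs(cgs):
    """cgs: list of CGs, each a list of 16 quantized coefficients in scan order."""
    symbols = []
    for cg in reversed(cgs):                      # lower-right CG first
        has_nonzero = any(c != 0 for c in cg)
        symbols.append(("cg_flag", int(has_nonzero)))
        if not has_nonzero:
            continue                              # nothing more for an all-zero CG
        for c in reversed(cg):                    # lower-right coefficient first
            symbols.append(("sig", int(c != 0)))
        symbols.extend(("level", c) for c in reversed(cg) if c != 0)
    return symbols

cgs = [[5] + [0] * 15, [0] * 16]                  # DC-only block, second CG empty
out = encode_cgs(cgs)
print(out[0], out[-1])                            # ('cg_flag', 0) ('level', 5)
```

The empty CG costs a single flag, which is why grouping significant coefficients into as few CGs as possible, and scanning so that they occur consecutively, helps the entropy coder.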
Since the moving picture encoding device of Non-Patent Document 2 is configured as described above, when an interlace signal is field-encoded, the vertical spatial resolution is halved and the correlation between pixels decreases, so there has been a problem that the prediction efficiency of intra prediction and the coding efficiency of the orthogonal transform coefficients deteriorate.
The present invention has been made to solve the above problems, and an object of the present invention is to obtain a moving picture encoding device, moving picture decoding device, moving picture encoding method, and moving picture decoding method that can increase the coding efficiency when an interlace signal is field-encoded.
The moving picture encoding device according to the present invention includes variable length coding means for generating an encoded bitstream in which compressed data and a coding mode are multiplexed, and the variable length coding means divides an orthogonal transform block into orthogonal transform sub-blocks and switches the coding order of the quantized transform coefficients constituting the compressed data between orthogonal transform block units and orthogonal transform sub-block units, based on whether or not a flag based on information indicating whether field encoding is performed is valid.
According to the present invention, an orthogonal transform block is divided into orthogonal transform sub-blocks, and the coding order of the quantized transform coefficients constituting the compressed data is switched between orthogonal transform block units and orthogonal transform sub-block units based on whether or not a flag based on information indicating whether field encoding is performed is valid. This makes it possible to realize efficient prediction processing and coding processing according to the characteristics of the field signal, with the effect of increasing coding efficiency.
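The flag-controlled switching of the coefficient coding order can be sketched conceptually as follows. Both scan orders below are illustrative placeholders, not the patent's actual scans: the "frame" order walks whole-block diagonals, while the "field" order visits each 4×4 sub-block (CG) separately.

```python
# Conceptual sketch: a field-coding flag selects between two coefficient scan
# orders over an (here 8x8) orthogonal transform block.

def scan_order(size, field_flag):
    coords = [(y, x) for y in range(size) for x in range(size)]
    if field_flag:  # per-sub-block scan: finish one 4x4 CG before the next
        return sorted(coords, key=lambda p: (p[0] // 4, p[1] // 4, p[0], p[1]))
    return sorted(coords, key=lambda p: (p[0] + p[1], p[0]))  # whole-block diagonal

frame_scan = scan_order(8, field_flag=False)
field_scan = scan_order(8, field_flag=True)
print(frame_scan[:3])  # [(0, 0), (0, 1), (1, 0)]
print(field_scan[:5])  # stays inside the top-left 4x4 sub-block
```

Since both orders visit every coefficient exactly once, the decoder can invert either one; only the clustering of significant coefficients along the chosen order, and hence the entropy-coding cost, differs.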
FIG. 1 is a configuration diagram showing a moving picture encoding device according to Embodiment 1 of the present invention.
FIG. 2 is a flowchart showing the processing (moving picture encoding method) of the moving picture encoding device according to Embodiment 1 of the present invention.
FIG. 3 is a configuration diagram showing a moving picture decoding device according to Embodiment 1 of the present invention.
FIG. 4 is a flowchart showing the processing (moving picture decoding method) of the moving picture decoding device according to Embodiment 1 of the present invention.
FIG. 5 is an explanatory diagram showing an example in which a largest coding block is hierarchically divided into a plurality of coding blocks.
FIG. 6(a) shows the distribution of coding blocks and prediction blocks after division, and FIG. 6(b) is an explanatory diagram showing a situation in which a coding mode m(B^n) is assigned by hierarchical division.
FIG. 7 is an explanatory diagram showing an example of the intra prediction parameters (intra prediction modes) selectable for each prediction block P_i^n in a coding block B^n.
FIG. 8 is an explanatory diagram showing an example of the pixels used to generate the prediction values of the pixels in a prediction block P_i^n when l_i^n = m_i^n = 4.
FIG. 9 is an explanatory diagram showing relative coordinates with the upper-left pixel in the prediction block P_i^n as the origin.
FIG. 10 is an explanatory diagram showing an example of a quantization matrix.
FIG. 11 is an explanatory diagram showing a configuration example in which a plurality of loop filter processes are used in the loop filter unit of the moving picture encoding device according to Embodiment 1 of the present invention.
FIG. 12 is an explanatory diagram showing a configuration example in which a plurality of loop filter processes are used in the loop filter unit of the moving picture decoding device according to Embodiment 1 of the present invention.
FIG. 13 is an explanatory diagram showing an example of an encoded bitstream.
FIG. 14 is an explanatory diagram showing the indices of the class classification methods of the pixel adaptive offset processing.
FIG. 15 is an explanatory diagram showing an example of the distribution of transform coefficients in an orthogonal transform of 16×16 pixel size.
FIG. 16 is an explanatory diagram showing an example of the distribution of transform coefficients in an orthogonal transform of 16×16 pixel size for a field signal.
FIG. 17 is an explanatory diagram showing a coding order of transform coefficients in an orthogonal transform of 16×16 pixel size for a field signal.
FIG. 18 is an explanatory diagram showing a coding order of transform coefficients in an orthogonal transform of 16×16 pixel size for a field signal.
FIG. 19 is an explanatory diagram showing a coding order of transform coefficients in an orthogonal transform of 16×16 pixel size for a field signal.
FIG. 20 is an explanatory diagram showing the filter switching regions in the filtering process at the time of average value prediction.
FIG. 21 is an explanatory diagram showing the reference pixel arrangement of the filtering process at the time of average value prediction.
FIG. 22 is an explanatory diagram showing the filtering process applied to an intra prediction image at the time of field encoding.
FIG. 23 is a configuration diagram showing the moving picture encoding device disclosed in Non-Patent Document 1.
FIG. 24 is an explanatory diagram showing the intra prediction modes when the luminance block size is 4×4 pixels.
FIG. 25 is an explanatory diagram showing a progressive video signal.
FIG. 26 is an explanatory diagram showing an interlace video signal.
FIG. 27 is an explanatory diagram showing the filtering process in intra prediction.
FIG. 28 is an explanatory diagram showing a coding order of transform coefficients in an orthogonal transform of 16×16 pixel size.
Embodiment 1.
FIG. 1 is a configuration diagram showing a moving picture encoding device according to Embodiment 1 of the present invention.
In FIG. 1, when a video signal is input as an input image, the slice dividing unit 14 divides the input image into one or more partial images called "slices" according to the slice division information determined by the encoding control unit 2. The slice division can be made as fine as the coding block units described later. The slice dividing unit 14 constitutes slice dividing means.
Each time the block dividing unit 1 receives a slice divided by the slice dividing unit 14, it divides the slice into largest coding blocks, which are coding blocks of the maximum size determined by the encoding control unit 2, and hierarchically divides each largest coding block into coding blocks until the upper limit on the number of hierarchy levels determined by the encoding control unit 2 is reached.
That is, the block dividing unit 1 divides the slice into coding blocks according to the division determined by the encoding control unit 2, and outputs those coding blocks. Each coding block is further divided into one or more prediction blocks, which are the units of prediction processing.
The block dividing unit 1 constitutes block dividing means.
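The hierarchical division of a largest coding block can be sketched as a recursive quadtree split. The per-block split decision here is a hypothetical placeholder; in the device it follows the division determined by the encoding control unit 2.

```python
# Sketch: recursively split a largest coding block into quadrants until the
# upper limit on the number of hierarchy levels is reached.

def split_block(x, y, size, depth, max_depth, should_split):
    """Return the list of (x, y, size) coding blocks inside the largest block."""
    if depth < max_depth and should_split(x, y, size):
        half = size // 2
        blocks = []
        for dy in (0, half):
            for dx in (0, half):
                blocks += split_block(x + dx, y + dy, half,
                                      depth + 1, max_depth, should_split)
        return blocks
    return [(x, y, size)]

# Hypothetical decision rule: split only the block anchored at the origin.
blocks = split_block(0, 0, 64, 0, 2, lambda x, y, s: x == 0 and y == 0)
print(len(blocks), blocks[0])  # 7 (0, 0, 16)
```

With this rule a 64×64 largest block yields four 16×16 blocks in its top-left quadrant and three 32×32 blocks elsewhere, i.e. finer blocks only where the decision function asked for them.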
The encoding control unit 2 determines the maximum size of the coding blocks that serve as the processing unit when encoding is performed, and determines the size of each coding block by determining the upper limit on the number of hierarchy levels when a coding block of the maximum size is hierarchically divided.
The encoding control unit 2 also selects the coding mode to apply to the coding block output from the block dividing unit 1 from among one or more selectable coding modes (one or more intra coding modes that differ in, for example, the size of the prediction block indicating the prediction processing unit, and one or more inter coding modes that likewise differ in prediction block size). One example of the selection method is to select, from the one or more selectable coding modes, the coding mode with the highest coding efficiency for the coding block output from the block dividing unit 1.
When the coding mode with the highest coding efficiency is an intra coding mode, the encoding control unit 2 determines the intra prediction parameters to be used when performing intra prediction processing on the coding block in that intra coding mode, for each prediction block that is the prediction processing unit indicated by the intra coding mode; when the coding mode with the highest coding efficiency is an inter coding mode, it determines the inter prediction parameters to be used when performing inter prediction processing on the coding block in that inter coding mode, for each prediction block that is the prediction processing unit indicated by the inter coding mode.
Further, the encoding control unit 2 determines the prediction difference coding parameters to be given to the transform/quantization unit 7 and the inverse quantization/inverse transform unit 8. The prediction difference coding parameters include orthogonal transform block division information indicating how the coding block is divided into orthogonal transform blocks, which are the units of orthogonal transform processing, and a quantization parameter specifying the quantization step size used when quantizing the transform coefficients.
The encoding control unit 2 constitutes encoding control means.
If the coding mode determined by the encoding control unit 2 is an intra coding mode, the changeover switch 3 outputs the coding block output from the block dividing unit 1 to the intra prediction unit 4; if the coding mode determined by the encoding control unit 2 is an inter coding mode, it outputs the coding block output from the block dividing unit 1 to the motion compensation prediction unit 5.
When an intra coding mode is selected by the encoding control unit 2 as the coding mode corresponding to the coding block output from the changeover switch 3, the intra prediction unit 4 performs, for each prediction block serving as the prediction processing unit of that coding block, intra prediction processing (intra-frame prediction processing) using the intra prediction parameters determined by the encoding control unit 2 while referring to the locally decoded image stored in the intra prediction memory 10, and generates an intra prediction image.
When an inter coding mode is selected by the encoding control unit 2 as the coding mode corresponding to the coding block output from the changeover switch 3, the motion compensation prediction unit 5 searches for a motion vector by comparing the coding block with one or more frames of locally decoded images stored in the motion compensated prediction frame memory 12 in units of prediction blocks, the prediction processing unit, and, using that motion vector and the inter prediction parameters determined by the encoding control unit 2, such as the frame number to be referenced, performs inter prediction processing (motion compensation prediction processing) on the coding block in units of prediction blocks to generate an inter prediction image.
The intra prediction unit 4, the intra prediction memory 10, the motion compensation prediction unit 5, and the motion compensated prediction frame memory 12 constitute prediction means.
The subtraction unit 6 subtracts the intra prediction image generated by the intra prediction unit 4, or the inter prediction image generated by the motion-compensated prediction unit 5, from the coding block output by the block division unit 1, and outputs a prediction difference signal representing the resulting difference image to the transform/quantization unit 7. The subtraction unit 6 constitutes difference image generating means.
The transform/quantization unit 7 refers to the orthogonal transform block division information included in the prediction difference coding parameters determined by the encoding control unit 2 and performs an orthogonal transform process (for example, a DCT (discrete cosine transform), a DST (discrete sine transform), or an orthogonal transform such as a KL transform whose bases have been designed in advance for a specific training sequence) on the prediction difference signal output by the subtraction unit 6 in units of orthogonal transform blocks to compute transform coefficients. It then refers to the quantization parameter included in the prediction difference coding parameters to quantize the transform coefficients of each orthogonal transform block, and outputs the quantized transform coefficients as compressed data to the inverse quantization/inverse transform unit 8 and the variable length encoding unit 13.
The transform/quantization unit 7 constitutes image compression means.
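As an illustrative sketch (not the normative procedure of any particular standard), the transform and quantization of one 4×4 residual block by the transform/quantization unit 7 can be outlined as follows; the function names, the choice of a plain orthonormal DCT, and the single uniform quantization step are assumptions made for illustration only:

```python
import math

N = 4  # assumed 4x4 orthogonal transform block

def dct2d(block):
    """2-D DCT-II of an NxN residual block (separable, orthonormal)."""
    def c(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out

def quantize(coeffs, step):
    """Uniform quantization: divide each coefficient by the step size."""
    return [[int(round(coef / step)) for coef in row] for row in coeffs]

# A flat residual block concentrates all its energy in the DC coefficient.
residual = [[8] * N for _ in range(N)]
coeffs = dct2d(residual)
compressed = quantize(coeffs, step=4)
```

The `compressed` levels are what would be handed to the inverse quantization/inverse transform unit 8 and the variable length encoding unit 13.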
When quantizing the transform coefficients, the transform/quantization unit 7 may carry out the quantization using a quantization matrix that scales, for each transform coefficient, the quantization step size calculated from the quantization parameter.
FIG. 10 is an explanatory diagram showing an example of a quantization matrix for a 4×4 DCT. The numbers in the figure indicate the scaling value applied to the quantization step size of each transform coefficient.
For example, to limit the coding bit rate, the quantization step size can be scaled to larger values for higher-frequency transform coefficients, as shown in FIG. 10. This suppresses the high-frequency coefficients that arise in complex image regions and thereby reduces the code amount, while still encoding, without loss, the information in the low-frequency coefficients that strongly affects subjective quality.
Thus, a quantization matrix should be used whenever the quantization step size is to be controlled for each individual transform coefficient.
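The per-coefficient scaling can be sketched as follows. The matrix values below are hypothetical (FIG. 10 itself is not reproduced here); they merely follow the stated pattern of larger scaling values toward the high-frequency (lower-right) positions:

```python
# Hypothetical 4x4 scaling matrix: larger entries toward the lower right
# so that higher-frequency coefficients are quantized more coarsely.
SCALING = [
    [16, 16, 18, 22],
    [16, 18, 22, 26],
    [18, 22, 26, 32],
    [22, 26, 32, 40],
]

def quantize_with_matrix(coeffs, base_step):
    """Scale the quantization step per coefficient, then quantize.

    The effective step for coefficient (u, v) is
    base_step * SCALING[u][v] / 16, so an entry of 16 leaves the base
    step (derived from the quantization parameter) unchanged, and
    larger entries coarsen it.
    """
    out = [[0] * 4 for _ in range(4)]
    for u in range(4):
        for v in range(4):
            step = base_step * SCALING[u][v] / 16.0
            out[u][v] = int(round(coeffs[u][v] / step))
    return out
```

A high-frequency coefficient thus needs a larger magnitude than a low-frequency one to survive quantization, which is how the matrix trades code amount against subjective quality.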
For each orthogonal transform size, an independent quantization matrix can be used for each color signal and each coding mode (intra coding or inter coding). For each such combination it is possible to choose either a matrix selected from among the quantization matrices prepared in advance as initial values shared by the video encoding device and the video decoding device and the quantization matrices that have already been encoded, or a new quantization matrix.
Accordingly, for each color signal and coding mode of each orthogonal transform size, the transform/quantization unit 7 sets, in the quantization matrix parameters to be encoded, flag information indicating whether a new quantization matrix is used.
Further, when a new quantization matrix is used, each scaling value of the quantization matrix, such as the one shown in FIG. 10, is set in the quantization matrix parameters to be encoded. When a new quantization matrix is not used, an index identifying the matrix to be used, chosen from among the quantization matrices prepared in advance as initial values shared by the video encoding device and the video decoding device or the quantization matrices that have already been encoded, is set in the quantization matrix parameters to be encoded. However, when no previously encoded quantization matrix is available for reference, only the quantization matrices prepared in advance and shared by the video encoding device and the video decoding device can be selected.
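The decision structure for building the quantization matrix parameters can be sketched as follows; the dictionary field names and the number of preset matrices are hypothetical, and only the branching follows the text above:

```python
NUM_PRESET_MATRICES = 4  # assumed count of presets shared by encoder/decoder

def build_quant_matrix_params(use_new_matrix, new_matrix=None,
                              ref_index=None, have_encoded_matrices=True):
    """Quantization matrix parameters for one combination of orthogonal
    transform size, color signal and coding mode (hypothetical fields)."""
    params = {"new_matrix_flag": use_new_matrix}
    if use_new_matrix:
        # A new matrix: every scaling value is encoded.
        params["scaling_values"] = new_matrix
    else:
        # Otherwise an index is encoded, identifying either a shared
        # preset matrix or an already-encoded matrix.  When no encoded
        # matrix can be referenced, only the presets may be selected.
        if not have_encoded_matrices and ref_index >= NUM_PRESET_MATRICES:
            raise ValueError("only preset matrices can be selected")
        params["matrix_index"] = ref_index
    return params
```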
The inverse quantization/inverse transform unit 8 refers to the quantization parameter and the orthogonal transform block division information included in the prediction difference coding parameters determined by the encoding control unit 2, inversely quantizes the compressed data output by the transform/quantization unit 7 in units of orthogonal transform blocks, and performs an inverse orthogonal transform on the resulting transform coefficients to compute a local decoded prediction difference signal corresponding to the prediction difference signal output by the subtraction unit 6. When the transform/quantization unit 7 performs the quantization using a quantization matrix, the corresponding inverse quantization is likewise carried out by referring to that quantization matrix.
The addition unit 9 adds the local decoded prediction difference signal computed by the inverse quantization/inverse transform unit 8 to the intra prediction image generated by the intra prediction unit 4 or the inter prediction image generated by the motion-compensated prediction unit 5, thereby computing a local decoded image corresponding to the coding block output by the block division unit 1.
The inverse quantization/inverse transform unit 8 and the addition unit 9 constitute local decoded image generating means.
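The local decoding path (inverse quantization, inverse transform, addition of the prediction image) can be sketched as follows; as before this is an illustrative sketch assuming a plain orthonormal DCT and a single uniform step, with names chosen for illustration:

```python
import math

N = 4  # assumed 4x4 orthogonal transform block

def dequantize(levels, step):
    """Inverse quantization: multiply each level by the step size."""
    return [[lv * step for lv in row] for row in levels]

def idct2d(coeffs):
    """2-D inverse DCT (DCT-III) of an NxN coefficient block."""
    def c(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            s = 0.0
            for u in range(N):
                for v in range(N):
                    s += (c(u) * c(v) * coeffs[u][v]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[x][y] = s
    return out

def reconstruct(levels, step, prediction):
    """Local decoded block = inverse-transformed residual + prediction."""
    residual = idct2d(dequantize(levels, step))
    return [[prediction[x][y] + residual[x][y] for y in range(N)]
            for x in range(N)]
```

The same pair of operations is what the decoder-side inverse quantization/inverse transform unit 32 and addition unit 36 perform, which is why encoder and decoder reconstruct identical images.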
The intra prediction memory 10 is a recording medium that stores the local decoded image computed by the addition unit 9.
The loop filter unit 11 performs predetermined filter processing on the local decoded image computed by the addition unit 9 and outputs the filtered local decoded image.
Specifically, it carries out filtering (deblocking filtering) that reduces distortion arising at the boundaries of orthogonal transform blocks and prediction blocks, processing that adaptively adds an offset on a per-pixel basis (pixel adaptive offset), adaptive filtering that adaptively switches among linear filters such as Wiener filters, and so on.
The loop filter unit 11 decides, for each of the deblocking filtering, the pixel adaptive offset processing, and the adaptive filtering, whether to perform that processing, and outputs an enable flag for each process to the variable length encoding unit 13 as header information. When two or more of these filter processes are used, they are applied in order. FIG. 11 shows a configuration example of the loop filter unit 11 when a plurality of filter processes are used.
In general, the more kinds of filter processing are used, the better the image quality, but the higher the processing load; that is, image quality and processing load are in a trade-off relationship. Moreover, the image-quality improvement achieved by each filter process depends on the characteristics of the image to be filtered. The filter processes to use should therefore be chosen according to the processing load the video encoding device can tolerate and the characteristics of the image to be encoded.
The loop filter unit 11 constitutes filtering means.
In the deblocking filtering, the various parameters used to select the filter strength applied to block boundaries can be changed from their initial values; when they are changed, those parameters are output to the variable length encoding unit 13 as header information.
In the pixel adaptive offset processing, the image is first divided into a plurality of blocks, and for each block one class classification method is selected from a plurality of class classification methods prepared in advance, where performing no offset processing is itself defined as one of the class classification methods.
Next, each pixel in the block is classified by the selected class classification method, and an offset value that compensates the coding distortion is computed for each class.
Finally, the image quality of the local decoded image is improved by adding that offset value to the luminance values of the local decoded image.
Accordingly, the pixel adaptive offset processing outputs, as header information, the block division information, an index indicating the class classification method of each block, and offset information specifying the offset value of each class for each block to the variable length encoding unit 13.
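The three steps of the pixel adaptive offset processing can be sketched in one dimension as follows. The intensity-band classification used here is only one assumed stand-in for the class classification methods the text leaves open:

```python
def band_class(pixel, num_classes=4, max_val=256):
    """Classify a pixel by which intensity band it falls into."""
    return min(pixel * num_classes // max_val, num_classes - 1)

def derive_offsets(original, decoded, num_classes=4):
    """Per class, the offset compensating the coding distortion is taken
    as the mean difference between original and decoded pixels."""
    sums = [0] * num_classes
    counts = [0] * num_classes
    for o, d in zip(original, decoded):
        k = band_class(d, num_classes)
        sums[k] += o - d
        counts[k] += 1
    return [round(sums[k] / counts[k]) if counts[k] else 0
            for k in range(num_classes)]

def apply_offsets(decoded, offsets, num_classes=4):
    """Add the class offset to each decoded pixel (luminance values)."""
    return [d + offsets[band_class(d, num_classes)] for d in decoded]
```

The classification index and the per-class offsets are exactly the items the text says are placed in the header information for the decoder.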
In the adaptive filtering, the local decoded image is classified by a predetermined method, a filter that compensates the superimposed distortion is designed for each region (of the local decoded image) belonging to each class, and that filter is then used to filter the local decoded image.
The filter designed for each class is output to the variable length encoding unit 13 as header information.
Possible class classification methods include a simple method that partitions the image spatially at equal intervals and a method that classifies each block according to local characteristics of the image (variance and the like).
The number of classes used in the adaptive filtering may be set in advance as a value common to the video encoding device and the video decoding device, or may itself be a parameter to be encoded.
Compared with the former, the latter can set the number of classes freely and therefore achieves a larger image-quality improvement, but the code amount increases correspondingly because the number of classes must be encoded.
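A minimal one-dimensional sketch of the per-class filter design follows, classifying samples by local variance and fitting, per class, a single least-squares scalar gain as a stand-in for a full Wiener filter design (the threshold and all names are illustrative assumptions):

```python
def classify(decoded, threshold=25.0):
    """Class 0 = flat region, class 1 = textured, by local variance."""
    classes = []
    for i in range(len(decoded)):
        seg = decoded[max(0, i - 1):i + 2]
        mean = sum(seg) / len(seg)
        var = sum((s - mean) ** 2 for s in seg) / len(seg)
        classes.append(1 if var > threshold else 0)
    return classes

def design_gains(original, decoded, classes, num_classes=2):
    """Least-squares scalar gain per class: g = sum(o*d) / sum(d*d)."""
    num = [0.0] * num_classes
    den = [0.0] * num_classes
    for o, d, k in zip(original, decoded, classes):
        num[k] += o * d
        den[k] += d * d
    return [num[k] / den[k] if den[k] else 1.0 for k in range(num_classes)]

def apply_gains(decoded, classes, gains):
    """Filter each sample with the gain designed for its class."""
    return [gains[k] * d for d, k in zip(decoded, classes)]
```

In the scheme described above, the per-class filters (here, the gains) are what is sent to the variable length encoding unit 13 as header information.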
When the pixel adaptive offset processing and the adaptive filtering are performed, the loop filter unit 11 needs to refer to the video signal, so the video encoding device of FIG. 1 must be modified so that the video signal is also input to the loop filter unit 11.
The motion-compensated prediction frame memory 12 is a recording medium that stores the local decoded image filtered by the loop filter unit 11.
The variable length encoding unit 13 performs variable length encoding on the compressed data output by the transform/quantization unit 7, the output signals of the encoding control unit 2 (the block division information within the largest coding block, the coding mode, the prediction difference coding parameters, and the intra prediction parameters or inter prediction parameters), and the motion vector output by the motion-compensated prediction unit 5 (when the coding mode is the inter coding mode), thereby generating encoded data.
In addition, as illustrated in FIG. 13, the variable length encoding unit 13 encodes a sequence level header and a picture level header as header information of the encoded bitstream, and generates the encoded bitstream together with the picture data.
The variable length encoding unit 13 constitutes variable length encoding means.
The picture data consists of one or more pieces of slice data, and each piece of slice data is an aggregation of a slice level header and the encoded data within that slice.
The sequence level header is an aggregation of header information that is generally common to a whole sequence, such as the image size, the color signal format, the bit depths of the signal values of the luminance signal and the color difference signals, the sequence-level enable flag information of each filter process in the loop filter unit 11 (adaptive filtering, pixel adaptive offset processing, deblocking filtering), the enable flag information of the quantization matrix, and a flag indicating whether field coding is used.
The picture level header is an aggregation of header information set on a per-picture basis, such as the index of the sequence level header to be referred to, the number of reference pictures for motion compensation, the probability table initialization flag for entropy coding, and the quantization matrix parameters.
The slice level header is an aggregation of per-slice parameters, such as position information indicating where in the picture the slice is located, an index indicating which picture level header is referred to, the coding type of the slice (all-intra coding, inter coding, and so on), and flag information indicating whether each filter process in the loop filter unit 11 (adaptive filtering, pixel adaptive offset processing, deblocking filtering) is performed.
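The three-level header hierarchy described above, in which each lower level refers to the level above it by index, can be sketched with hypothetical field names as:

```python
from dataclasses import dataclass

@dataclass
class SequenceLevelHeader:
    width: int
    height: int
    chroma_format: str
    bit_depth_luma: int
    bit_depth_chroma: int
    loop_filter_enabled: dict   # per filter process, e.g. {"deblocking": True}
    quant_matrix_enabled: bool
    field_coding_flag: bool

@dataclass
class PictureLevelHeader:
    seq_header_index: int       # which sequence level header is referred to
    num_reference_pictures: int
    init_entropy_tables: bool
    quant_matrix_params: dict

@dataclass
class SliceLevelHeader:
    pic_header_index: int       # which picture level header is referred to
    position: int               # where in the picture the slice is located
    slice_type: str             # e.g. "all_intra" or "inter"
    loop_filter_flags: dict     # per-slice on/off for each filter process
```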
In the example of FIG. 1, it is assumed that each of the components of the video encoding device (the block division unit 1, the encoding control unit 2, the changeover switch 3, the intra prediction unit 4, the motion-compensated prediction unit 5, the subtraction unit 6, the transform/quantization unit 7, the inverse quantization/inverse transform unit 8, the addition unit 9, the intra prediction memory 10, the loop filter unit 11, the motion-compensated prediction frame memory 12, and the variable length encoding unit 13) is implemented in dedicated hardware (for example, a semiconductor integrated circuit on which a CPU is mounted, or a one-chip microcomputer). When the video encoding device is implemented on a computer, however, a program describing the processing of the block division unit 1, the encoding control unit 2, the changeover switch 3, the intra prediction unit 4, the motion-compensated prediction unit 5, the subtraction unit 6, the transform/quantization unit 7, the inverse quantization/inverse transform unit 8, the addition unit 9, the loop filter unit 11, and the variable length encoding unit 13 may be stored in the memory of the computer, and the CPU of the computer may execute the program stored in that memory.
FIG. 2 is a flowchart showing the processing (video encoding method) of the video encoding device according to Embodiment 1 of the present invention.
FIG. 3 is a block diagram showing the video decoding device according to Embodiment 1 of the present invention.
In FIG. 3, when the variable length decoding unit 31 receives the encoded bitstream generated by the video encoding device of FIG. 1, it decodes each piece of header information, such as the sequence level header, the picture level header, and the slice level headers, from the bitstream, and also variable-length-decodes, from the bitstream, the block division information indicating the division state of each hierarchically divided coding block.
At this time, the quantization matrix is identified from the quantization matrix parameters variable-length-decoded by the variable length decoding unit 31. Specifically, for each color signal and coding mode of each orthogonal transform size, when the quantization matrix parameters indicate that the matrix is one of the quantization matrices prepared in advance as initial values shared by the video encoding device and the video decoding device, or a quantization matrix that has already been decoded (that is, not a new quantization matrix), the quantization matrix is identified by referring to the index information specifying which of those matrices it is; when the quantization matrix parameters indicate that a new quantization matrix is used, the quantization matrix included in the quantization matrix parameters is identified as the matrix to use.
The variable length decoding unit 31 also refers to each piece of header information to identify the slice division state, identifies the largest decoding blocks (blocks corresponding to the "largest coding blocks" of the video encoding device of FIG. 1) included in the slice data of each slice, refers to the block division information to identify the decoding blocks (blocks corresponding to the "coding blocks" of the video encoding device of FIG. 1), which are the units of decoding obtained by hierarchically dividing a largest decoding block, and variable-length-decodes, for each decoding block, the compressed data, the coding mode, the intra prediction parameters (when the coding mode is the intra coding mode), the inter prediction parameters (when the coding mode is the inter coding mode), the prediction difference coding parameters, and the motion vector (when the coding mode is the inter coding mode). The variable length decoding unit 31 constitutes variable length decoding means.
The inverse quantization/inverse transform unit 32 refers to the quantization parameter and the orthogonal transform block division information included in the prediction difference coding parameters variable-length-decoded by the variable length decoding unit 31, inversely quantizes the compressed data variable-length-decoded by the variable length decoding unit 31 in units of orthogonal transform blocks, and performs an inverse orthogonal transform on the resulting transform coefficients to compute a decoded prediction difference signal identical to the local decoded prediction difference signal output by the inverse quantization/inverse transform unit 8 of FIG. 1. The inverse quantization/inverse transform unit 32 constitutes difference image generating means.
Here, when the header information variable-length-decoded by the variable length decoding unit 31 indicates that inverse quantization is to be performed in the slice using a quantization matrix, the inverse quantization is performed using a quantization matrix.
Specifically, it is performed using the quantization matrix identified from the header information.
If the coding mode variable-length-decoded by the variable length decoding unit 31 is the intra coding mode, the changeover switch 33 outputs the intra prediction parameters variable-length-decoded by the variable length decoding unit 31 to the intra prediction unit 34; if the coding mode variable-length-decoded by the variable length decoding unit 31 is the inter coding mode, the changeover switch 33 outputs the inter prediction parameters and the motion vector variable-length-decoded by the variable length decoding unit 31 to the motion compensation unit 35.
When the coding mode of a decoding block identified from the block division information variable-length-decoded by the variable length decoding unit 31 is the intra coding mode, the intra prediction unit 34 performs, for each prediction block serving as the unit of prediction processing of that decoding block, an intra prediction process (intra-frame prediction process) using the intra prediction parameters output by the changeover switch 33 while referring to the decoded image stored in the intra prediction memory 37, thereby generating an intra prediction image.
When the coding mode of a decoding block identified from the block division information variable-length-decoded by the variable length decoding unit 31 is the inter coding mode, the motion compensation unit 35 performs, for each prediction block serving as the unit of prediction processing of that decoding block, an inter prediction process (motion-compensated prediction process) using the motion vector and inter prediction parameters output by the changeover switch 33 while referring to the decoded image stored in the motion-compensated prediction frame memory 39, thereby generating an inter prediction image.
The intra prediction unit 34, the intra prediction memory 37, the motion compensation unit 35, and the motion-compensated prediction frame memory 39 constitute prediction means.
The addition unit 36 adds the decoded prediction difference signal computed by the inverse quantization/inverse transform unit 32 to the intra prediction image generated by the intra prediction unit 34 or the inter prediction image generated by the motion compensation unit 35, thereby computing a decoded image identical to the local decoded image output by the addition unit 9 of FIG. 1. The addition unit 36 constitutes decoded image generating means.
The intra prediction memory 37 is a recording medium that stores the decoded image computed by the addition unit 36 as a reference image used in the intra prediction process.
The loop filter unit 38 performs predetermined filter processing on the decoded image computed by the addition unit 36 and outputs the filtered decoded image.
Specifically, it carries out filtering (deblocking filtering) that reduces distortion arising at the boundaries of orthogonal transform blocks and prediction blocks, processing that adaptively adds an offset on a per-pixel basis (pixel adaptive offset), adaptive filtering that adaptively switches among linear filters such as Wiener filters, and so on.
For each of the deblocking filtering, the pixel adaptive offset processing, and the adaptive filtering, however, the loop filter unit 38 refers to the header information variable-length-decoded by the variable length decoding unit 31 to determine whether that processing is performed in the slice.
In this case, when two or more filter processes are performed and the loop filter unit 11 of the video encoding device is configured as shown in FIG. 11, the loop filter unit 38 is configured as shown in FIG. 12.
The loop filter unit 38 constitutes filtering means.
Here, in the deblocking filtering, the header information variable-length-decoded by the variable length decoding unit 31 is referred to, and when information is present that changes, from their initial values, the various parameters used to select the filter strength applied to block boundaries, the deblocking filtering is performed on the basis of that change information; when there is no change information, it is performed according to a predetermined method.
In the pixel adaptive offset processing, the decoded image is divided on the basis of the block division information of the pixel adaptive offset processing variable-length-decoded by the variable length decoding unit 31, and for each block, the index indicating the class classification method of that block, variable-length-decoded by the variable length decoding unit 31, is referred to; when that index is not the index indicating "no offset processing is performed", each pixel in the block is classified according to the class classification method indicated by the index.
The candidate class classification methods prepared in advance are identical to the candidate class classification methods of the pixel adaptive offset processing of the loop filter unit 11.
Then, referring to the offset information specifying the offset value of each class for each block, the offset is added to the luminance values of the decoded image.
However, when the pixel adaptive offset processing of the loop filter unit 11 of the video encoding device is configured so that the block division information is not encoded and the image is always divided into blocks of a fixed size (for example, largest coding block units), with a class classification method selected for each such block and the adaptive offset processing performed per class, the loop filter unit 38 likewise performs the pixel adaptive offset processing in block units of the same fixed size as the loop filter unit 11.
In the adaptive filtering, classification is performed by the same method as in the video encoding device of FIG. 1 using the per-class filters variable-length-decoded by the variable length decoding unit 31, and filtering is then performed on the basis of that class classification information.
The motion-compensated prediction frame memory 39 is a recording medium that stores the decoded image filtered by the loop filter unit 38 as a reference image used in the inter prediction process (motion-compensated prediction process).
In the example of FIG. 3, it is assumed that each of the components of the video decoding device (the variable length decoding unit 31, the inverse quantization/inverse transform unit 32, the changeover switch 33, the intra prediction unit 34, the motion compensation unit 35, the addition unit 36, the intra prediction memory 37, the loop filter unit 38, and the motion-compensated prediction frame memory 39) is implemented in dedicated hardware (for example, a semiconductor integrated circuit on which a CPU is mounted, or a one-chip microcomputer). When the video decoding device is implemented on a computer, however, a program describing the processing of the variable length decoding unit 31, the inverse quantization/inverse transform unit 32, the changeover switch 33, the intra prediction unit 34, the motion compensation unit 35, the addition unit 36, and the loop filter unit 38 may be stored in the memory of the computer, and the CPU of the computer may execute the program stored in that memory.
FIG. 4 is a flowchart showing the processing (video decoding method) of the video decoding device according to Embodiment 1 of the present invention.
Next, the operation will be described.
In Embodiment 1, a video encoding device and a video decoding device will be described. The video encoding device takes each frame image of a video as an input image, performs intra prediction from encoded neighboring pixels or motion-compensated prediction between adjacent frames, applies compression by orthogonal transform and quantization to the resulting prediction difference signal, and then performs variable length encoding to generate an encoded bitstream. The video decoding device decodes the encoded bitstream output from the video encoding device.
The video encoding device of FIG. 1 is characterized in that it divides the video signal into blocks of various sizes, adapting to local changes of the video signal in the spatial and temporal directions, and performs intra-frame/inter-frame adaptive encoding.
In general, a video signal has the characteristic that its complexity varies locally in space and time. Viewed spatially, a given video frame may contain a mixture of patterns having uniform signal characteristics over relatively wide image areas, such as sky or walls, and patterns having complicated texture within small image areas, such as persons or pictures with fine texture.
Viewed temporally, the sky and walls show only small local changes of pattern in the time direction, whereas a moving person or object shows large temporal changes because its outline undergoes rigid or non-rigid motion over time.
The encoding process generates a prediction difference signal with small signal power and entropy by temporal and spatial prediction, thereby reducing the overall code amount. If the parameters used for prediction can be applied uniformly over an image signal region that is as large as possible, the code amount of those parameters can be kept small.
On the other hand, if the same prediction parameter is applied over a large image area to an image signal pattern that changes greatly in time or space, prediction errors increase, and hence the code amount of the prediction difference signal increases.
Therefore, in regions with large temporal and spatial changes, it is preferable to reduce the block size over which the same prediction parameter is applied in the prediction process, increasing the data amount of the parameters used for prediction while reducing the power and entropy of the prediction difference signal.
In Embodiment 1, in order to perform encoding adapted to these general characteristics of a video signal, the prediction processing and the like are first started from a predetermined maximum block size, the region of the video signal is divided hierarchically, and the prediction processing and the encoding of its prediction difference are adapted for each divided region.
The video signal format to be processed by the video encoding device of FIG. 1 is an arbitrary video signal in which the video frame consists of a horizontal/vertical two-dimensional sequence of digital samples (pixels): a color video signal in an arbitrary color space, such as a YUV signal consisting of a luminance signal and two color difference signals or an RGB signal output from a digital image sensor, as well as a monochrome image signal, an infrared image signal, and so on.
The gradation of each pixel may be 8 bits, or may be a gradation such as 10 bits or 12 bits.
In the following description, for convenience and unless otherwise specified, the video signal of the input image is assumed to be a YUV signal in the 4:2:0 format, in which the two color difference components U and V are subsampled with respect to the luminance component Y.
The format of the color difference signals is not limited to the 4:2:0 format of the YUV signal; it may be the 4:2:2 format or 4:4:4 format of the YUV signal, an RGB signal, or the like.
The processing data unit corresponding to each frame of the video signal is referred to as a "picture".
A "picture" denotes a frame signal when encoding is performed in units of frames, and a field signal when encoding is performed in units of fields.
First, the processing contents of the video encoding device of FIG. 1 will be described.
First, the encoding control unit 2 determines the slice division state of the picture to be encoded (current picture), and also determines the size of the largest coding blocks used for encoding the picture and the upper limit on the number of layers into which each largest coding block is hierarchically divided (step ST1 in FIG. 2).
As a method of determining the size of the largest coding blocks, for example, the same size may be set for all pictures according to the resolution of the video signal of the input image; alternatively, differences in the local motion complexity of the video signal of the input image may be quantified as a parameter, and a small size may be set for pictures with vigorous motion while a large size is set for pictures with little motion.
The upper limit on the number of division layers may be determined, for example, by setting the same number of layers for all pictures according to the resolution of the video signal of the input image, or by increasing the number of layers so that finer motion can be detected when the motion of the video signal of the input image is vigorous, and suppressing the number of layers when there is little motion.
The size of the largest coding block and the upper limit on the number of layers into which the largest coding block is divided may be encoded in the sequence level header or the like, or, without being encoded, the same determination process may be performed on the video decoding device side as well. In the former case the code amount of the header information increases, but since the video decoding device need not perform the determination process, its processing load can be suppressed and, moreover, the video encoding device can search for and send optimum values. In the latter case, conversely, the determination process is performed on the video decoding device side, so the processing load of the video decoding device increases, but the code amount of the header information does not.
The encoding control unit 2 also selects, from among one or more available coding modes, the coding mode corresponding to each hierarchically divided coding block (step ST2).
That is, for each image region of the largest coding block size, the encoding control unit 2 divides the region hierarchically into coding blocks until the previously determined upper limit on the number of division layers is reached, and determines the coding mode for each coding block.
The coding modes include one or more intra coding modes (generically referred to as "INTRA") and one or more inter coding modes (generically referred to as "INTER"), and the encoding control unit 2 selects the coding mode corresponding to each coding block from among all the coding modes available for the picture, or from a subset thereof.
Each coding block hierarchically divided by the block dividing unit 1 described later is further divided into one or more prediction blocks, which are the units of prediction processing, and the division state of the prediction blocks is also included as information in the coding mode. That is, the coding mode is an index identifying an intra or inter coding mode together with its prediction block division.
Since the method by which the encoding control unit 2 selects the coding mode is a known technique, a detailed description is omitted; one example is a method of carrying out the encoding process on the coding block with each available coding mode to verify its coding efficiency, and selecting the coding mode with the best coding efficiency from among the available coding modes.
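One common form of such a coding-efficiency test is rate-distortion optimization: each candidate mode is evaluated by a Lagrangian cost D + λR, and the mode with the smallest cost is chosen. The following is a minimal sketch under that assumption; the candidate modes, the distortion/rate figures and the value of λ are illustrative, not values defined by this description.

```python
# Minimal sketch of coding-mode selection by rate-distortion cost.
# Candidate modes and their costs are illustrative assumptions.

def select_coding_mode(candidates, lam):
    """Return the mode whose Lagrangian cost D + lam * R is smallest.

    candidates: list of (mode_name, distortion, rate_bits) tuples obtained
    by actually encoding the block with each available coding mode.
    """
    best_mode, best_cost = None, float("inf")
    for mode, distortion, rate in candidates:
        cost = distortion + lam * rate
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost

# Example: the intra candidate wins despite its higher rate.
modes = [("INTRA_16x16", 120.0, 40), ("INTER_16x16", 150.0, 30)]
mode, cost = select_coding_mode(modes, lam=2.0)
```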
For each coding block, the encoding control unit 2 also determines the quantization parameter and the orthogonal transform block division state used when the difference image is compressed, and determines the prediction parameters (intra prediction parameters or inter prediction parameters) used when the prediction processing is carried out.
When the coding block is further divided into prediction block units on which prediction processing is performed, prediction parameters (intra prediction parameters or inter prediction parameters) can be selected for each prediction block.
Furthermore, in a coding block whose coding mode is an intra coding mode, encoded pixels adjacent to the prediction block are used when the intra prediction processing is carried out, as described in detail later. Encoding must therefore be performed in prediction block units, so the selectable transform block sizes are limited to the size of the prediction block or smaller.
The encoding control unit 2 outputs the prediction difference encoding parameters, including the quantization parameter and the transform block size, to the transform/quantization unit 7, the inverse quantization/inverse transform unit 8 and the variable length encoding unit 13.
The encoding control unit 2 also outputs the intra prediction parameters to the intra prediction unit 4 as necessary.
The encoding control unit 2 also outputs the inter prediction parameters to the motion compensated prediction unit 5 as necessary.
When a video signal is input as the input image, the slice dividing unit 14 divides the input image into one or more partial images, called slices, according to the slice division information determined by the encoding control unit 2.
Each time a slice is input from the slice dividing unit 14, the block dividing unit 1 divides the slice into blocks of the largest coding block size determined by the encoding control unit 2, further divides each largest coding block hierarchically into the coding blocks determined by the encoding control unit 2, and outputs those coding blocks.
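The first stage of this division can be pictured as tiling the picture with fixed-size largest coding blocks. A minimal sketch follows; the function name is illustrative, and how partial blocks at the right and bottom borders are handled is codec-specific and not specified here.

```python
def split_into_largest_blocks(width, height, block_size):
    """Return the top-left coordinates of each largest coding block.

    Blocks on the right/bottom border may extend past the picture edge;
    handling of such partial blocks is left unspecified in this sketch.
    """
    coords = []
    for y in range(0, height, block_size):
        for x in range(0, width, block_size):
            coords.append((x, y))
    return coords

# A 100x40 picture tiled with 32x32 largest coding blocks gives a 4x2 grid.
blocks = split_into_largest_blocks(100, 40, 32)
```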
FIG. 5 is an explanatory diagram showing an example in which a largest coding block is hierarchically divided into a plurality of coding blocks.
In FIG. 5, the largest coding block is the coding block labeled "layer 0", whose luminance component has a size of (L_0, M_0).
Taking the largest coding block as the starting point, coding blocks are obtained by dividing hierarchically, in a quadtree structure, down to a predetermined depth that is defined separately.
At depth n, the coding block is an image region of size (L_n, M_n).
L_n and M_n may be the same or different; FIG. 5 shows the case where L_n = M_n.
Hereinafter, the coding block size determined by the encoding control unit 2 is defined as the size (L_n, M_n) of the luminance component of the coding block.
Since quadtree division is performed, (L_{n+1}, M_{n+1}) = (L_n/2, M_n/2) always holds.
In a color video signal in which all color components have the same number of samples, such as an RGB signal (4:4:4 format), the size of all color components is (L_n, M_n); when the 4:2:0 format is handled, the coding block size of the corresponding color difference components is (L_n/2, M_n/2).
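The size relations above can be sketched directly: each quadtree level halves both dimensions, and in the 4:2:0 format the chroma block is half the luma block in each dimension. The function names are illustrative.

```python
def luma_block_size(l0, m0, depth):
    """Size (L_n, M_n) of a luminance coding block at quadtree depth n,
    given the largest coding block size (L_0, M_0): halved per level."""
    return (l0 >> depth, m0 >> depth)

def chroma_block_size_420(l_n, m_n):
    """Corresponding color-difference block size in the 4:2:0 format."""
    return (l_n // 2, m_n // 2)

# A 64x64 largest coding block at depth 2 is 16x16 in luma, 8x8 in 4:2:0 chroma.
luma = luma_block_size(64, 64, 2)
chroma = chroma_block_size_420(*luma)
```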
Hereinafter, the coding block in layer n is denoted B^n, and the coding modes selectable for the coding block B^n are denoted m(B^n).
In the case of a color video signal consisting of a plurality of color components, the coding mode m(B^n) may be configured to use an individual mode for each color component, or to use a common mode for all color components. Hereinafter, unless otherwise specified, the coding mode refers to the coding mode for the luminance component of a coding block in the 4:2:0-format YUV signal.
As shown in FIG. 5, the coding block B^n is divided by the block dividing unit 1 into one or more prediction blocks, which represent the units of prediction processing.
Hereinafter, a prediction block belonging to the coding block B^n is denoted P_i^n (i is the prediction block number in layer n). FIG. 5 shows examples of P_0^0 and P_1^0.
How the prediction blocks in the coding block B^n are divided is included as information in the coding mode m(B^n).
All the prediction blocks P_i^n undergo prediction processing according to the coding mode m(B^n), but individual prediction parameters (intra prediction parameters or inter prediction parameters) can be selected for each prediction block P_i^n.
For the largest coding block, the encoding control unit 2 generates a block division state such as that shown in FIG. 6, for example, and identifies the coding blocks.
The rectangles enclosed by dotted lines in FIG. 6(a) represent the coding blocks, and the hatched blocks within each coding block represent the division state of the prediction blocks.
FIG. 6(b) shows, as a quadtree graph, the situation in which coding modes m(B^n) are assigned by the hierarchical division in the example of FIG. 6(a). The nodes enclosed by squares in FIG. 6(b) are the nodes (coding blocks) to which a coding mode m(B^n) has been assigned.
The information of this quadtree graph is output from the encoding control unit 2 to the variable length encoding unit 13 together with the coding modes m(B^n), and multiplexed into the bitstream.
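One common way such a quadtree graph is represented for multiplexing is as a sequence of split flags visited in depth-first order; the sketch below illustrates that general idea only, and the flag format is an assumption, not the actual syntax of this description.

```python
def serialize_quadtree(node):
    """Depth-first serialization of a quadtree into split flags.

    node is either a leaf marker (here, a coding-mode string) or a list
    of exactly four child nodes.
    """
    if isinstance(node, list):           # internal node: emit 1, then recurse
        flags = [1]
        for child in node:
            flags.extend(serialize_quadtree(child))
        return flags
    return [0]                           # leaf: emit 0 (mode is signaled separately)

# Root split into four blocks; the first child is split once more.
tree = [["m", "m", "m", "m"], "m", "m", "m"]
flags = serialize_quadtree(tree)
```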
When the coding mode m(B^n) determined by the encoding control unit 2 is an intra coding mode (when m(B^n) ∈ INTRA), the changeover switch 3 outputs the coding block B^n output from the block dividing unit 1 to the intra prediction unit 4.
When the coding mode m(B^n) determined by the encoding control unit 2 is an inter coding mode (when m(B^n) ∈ INTER), the changeover switch 3 outputs the coding block B^n output from the block dividing unit 1 to the motion compensated prediction unit 5.
When the coding mode m(B^n) determined by the encoding control unit 2 is an intra coding mode (when m(B^n) ∈ INTRA) and the intra prediction unit 4 receives the coding block B^n from the changeover switch 3 (step ST3), the intra prediction unit 4 carries out the intra prediction processing on each prediction block P_i^n in the coding block B^n, using the intra prediction parameters determined by the encoding control unit 2 while referring to the local decoded image stored in the intra prediction memory 10, to generate an intra predicted image P_INTRAi^n (step ST4).
Since the video decoding device must generate an intra predicted image exactly identical to the intra predicted image P_INTRAi^n, the intra prediction parameters used to generate the intra predicted image P_INTRAi^n are output from the encoding control unit 2 to the variable length encoding unit 13 and multiplexed into the bitstream.
Details of the processing contents of the intra prediction unit 4 will be described later.
When the coding mode m(B^n) determined by the encoding control unit 2 is an inter coding mode (when m(B^n) ∈ INTER) and the motion compensated prediction unit 5 receives the coding block B^n from the changeover switch 3 (step ST3), the motion compensated prediction unit 5 searches for a motion vector by comparing each prediction block P_i^n in the coding block B^n with the filtered local decoded image stored in the motion compensated prediction frame memory 12, and carries out the inter prediction processing on each prediction block P_i^n in the coding block B^n using that motion vector and the inter prediction parameters determined by the encoding control unit 2, to generate an inter predicted image P_INTERi^n (step ST5).
Since the video decoding device must generate an inter predicted image exactly identical to the inter predicted image P_INTERi^n, the inter prediction parameters used to generate the inter predicted image P_INTERi^n are output from the encoding control unit 2 to the variable length encoding unit 13 and multiplexed into the bitstream.
The motion vector found by the motion compensated prediction unit 5 is also output to the variable length encoding unit 13 and multiplexed into the bitstream.
Upon receiving the coding block B^n from the block dividing unit 1, the subtraction unit 6 subtracts either the intra predicted image P_INTRAi^n generated by the intra prediction unit 4 or the inter predicted image P_INTERi^n generated by the motion compensated prediction unit 5 from the prediction block P_i^n in the coding block B^n, and outputs the prediction difference signal e_i^n representing the difference image obtained as the result of the subtraction to the transform/quantization unit 7 (step ST6).
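The subtraction in step ST6 is a per-pixel difference between the original prediction block and the predicted image. A minimal numpy sketch (the array shapes and sample values are illustrative):

```python
import numpy as np

def prediction_difference(original_block, predicted_block):
    """Prediction difference signal e = original - prediction, per pixel.

    Widening to int16 keeps negative differences representable when the
    inputs are 8-bit samples.
    """
    return original_block.astype(np.int16) - predicted_block.astype(np.int16)

orig = np.array([[100, 102], [98, 101]], dtype=np.uint8)
pred = np.array([[99, 100], [99, 100]], dtype=np.uint8)
residual = prediction_difference(orig, pred)
```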
Upon receiving the prediction difference signal e_i^n from the subtraction unit 6, the transform/quantization unit 7 refers to the orthogonal transform block division information included in the prediction difference encoding parameters determined by the encoding control unit 2, and carries out an orthogonal transform process (for example, a DCT (discrete cosine transform), a DST (discrete sine transform), or an orthogonal transform such as a KL transform whose bases have been designed in advance for a specific learning sequence) on the prediction difference signal e_i^n in units of orthogonal transform blocks, to calculate transform coefficients.
The transform/quantization unit 7 also refers to the quantization parameter included in the prediction difference encoding parameters, quantizes the transform coefficients of each orthogonal transform block, and outputs the compressed data, which are the quantized transform coefficients, to the inverse quantization/inverse transform unit 8 and the variable length encoding unit 13 (step ST7). At this time, the quantization process may be carried out using a quantization matrix that scales the quantization step size calculated from the quantization parameter for each transform coefficient.
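Scaling the quantization step size per coefficient with a quantization matrix can be sketched as follows. The derivation of the base step from the quantization parameter (doubling every 6 QP steps) and the convention that a matrix value of 16 means "no extra scaling" are common conventions assumed for illustration, not the exact arithmetic of this description.

```python
import numpy as np

def quantize(coeffs, qp, qmatrix):
    """Quantize transform coefficients with a per-coefficient step size.

    Assumed conventions: the base step doubles every 6 QP steps, and the
    quantization matrix scales it per coefficient with 16 meaning 'flat'.
    """
    base_step = 2.0 ** (qp / 6.0)
    step = base_step * (qmatrix / 16.0)
    return np.round(coeffs / step).astype(np.int32)

coeffs = np.array([[64.0, 8.0], [8.0, 4.0]])
flat = np.full((2, 2), 16)            # flat matrix: uniform quantization
levels = quantize(coeffs, qp=12, qmatrix=flat)
```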
As the quantization matrix, a matrix that is independent for each color signal and coding mode (intra coding or inter coding) at each orthogonal transform size can be used, and it is possible to select, in each case, whether to use a quantization matrix prepared in advance as an initial value in common by the video encoding device and the video decoding device, an already-encoded quantization matrix, or a new quantization matrix.
Accordingly, for each color signal and coding mode at each orthogonal transform size, the transform/quantization unit 7 sets flag information indicating whether or not a new quantization matrix is to be used in the quantization matrix parameters to be encoded.
Furthermore, when a new quantization matrix is used, the scaling values of the quantization matrix, as shown in FIG. 10, are set in the quantization matrix parameters to be encoded. When a new quantization matrix is not used, an index specifying the matrix to be used, from among the quantization matrices prepared in advance as initial values in common by the video encoding device and the video decoding device or the already-encoded quantization matrices, is set in the quantization matrix parameters to be encoded. When there is no already-encoded quantization matrix that can be referred to, however, only the quantization matrices prepared in advance in common by the video encoding device and the video decoding device are selectable.
The transform/quantization unit 7 then outputs the quantization matrix parameters it has set to the variable length encoding unit 13.
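The selection logic for the quantization matrix parameters can be sketched as follows; the parameter names, the dictionary layout and the count of default matrices are illustrative assumptions, not the actual syntax elements.

```python
NUM_DEFAULT_MATRICES = 2  # illustrative count of matrices predefined in both devices

def build_qmatrix_params(new_matrix=None, ref_index=None, has_encoded_refs=False):
    """Build quantization-matrix parameters for one combination of
    transform size, color signal and coding mode.

    new_matrix: scaling values when a new matrix is to be encoded.
    ref_index:  index of a default or already-encoded matrix otherwise.
    """
    if new_matrix is not None:
        return {"use_new_matrix": True, "scaling_values": new_matrix}
    if ref_index is not None:
        # Without already-encoded matrices, only the defaults shared by
        # encoder and decoder may be referenced.
        if not has_encoded_refs and ref_index >= NUM_DEFAULT_MATRICES:
            raise ValueError("only default matrices are selectable")
        return {"use_new_matrix": False, "matrix_index": ref_index}
    raise ValueError("either a new matrix or a reference index is required")

params = build_qmatrix_params(ref_index=1)
```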
Upon receiving the compressed data from the transform/quantization unit 7, the inverse quantization/inverse transform unit 8 refers to the quantization parameter and the orthogonal transform block division information included in the prediction difference encoding parameters determined by the encoding control unit 2, and inversely quantizes the compressed data in units of orthogonal transform blocks.
When the transform/quantization unit 7 uses a quantization matrix in the quantization process, the corresponding inverse quantization process is carried out with reference to that quantization matrix in the inverse quantization as well.
The inverse quantization/inverse transform unit 8 also carries out an inverse orthogonal transform process (for example, an inverse DCT, inverse DST, inverse KL transform, or the like) on the transform coefficients, which are the inversely quantized compressed data, in units of orthogonal transform blocks, thereby calculating a local decoded prediction difference signal corresponding to the prediction difference signal e_i^n output from the subtraction unit 6, and outputs it to the addition unit 9 (step ST8).
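Inverse quantization mirrors the forward step: each quantized level is multiplied back by the same per-coefficient step size. A minimal sketch under the same illustrative conventions as the quantization sketch (base step doubling every 6 QP steps, matrix value 16 meaning "flat"):

```python
import numpy as np

def dequantize(levels, qp, qmatrix):
    """Reconstruct approximate transform coefficients from quantized levels."""
    base_step = 2.0 ** (qp / 6.0)        # assumed convention, as in quantization
    step = base_step * (qmatrix / 16.0)  # per-coefficient scaling, 16 = flat
    return levels * step

levels = np.array([[16, 2], [2, 1]])
flat = np.full((2, 2), 16)
coeffs = dequantize(levels, qp=12, qmatrix=flat)
```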
Upon receiving the local decoded prediction difference signal from the inverse quantization/inverse transform unit 8, the addition unit 9 calculates a local decoded image by adding the local decoded prediction difference signal to either the intra predicted image P_INTRAi^n generated by the intra prediction unit 4 or the inter predicted image P_INTERi^n generated by the motion compensated prediction unit 5 (step ST9).
The addition unit 9 outputs the local decoded image to the loop filter unit 11 and stores it in the intra prediction memory 10.
This local decoded image becomes the encoded image signal used in subsequent intra prediction processing.
Upon receiving the local decoded image from the addition unit 9, the loop filter unit 11 carries out predetermined filter processing on the local decoded image, and stores the filtered local decoded image in the motion compensated prediction frame memory 12 (step ST10).
Specifically, it performs filter (deblocking filter) processing that reduces distortion occurring at the boundaries of orthogonal transform blocks and at the boundaries of prediction blocks, processing (pixel adaptive offset processing) that adaptively adds an offset in units of pixels, adaptive filter processing that adaptively switches between linear filters such as Wiener filters to perform filtering, and so on.
The loop filter unit 11 determines, for each of the above deblocking filter processing, pixel adaptive offset processing and adaptive filter processing, whether or not the processing is to be performed, and outputs an enable flag for each process to the variable length encoding unit 13 as part of the sequence level header and part of the slice level header. When a plurality of the above filter processes are used, each filter process is carried out in order. FIG. 11 shows an example configuration of the loop filter unit 11 when a plurality of filter processes are used.
In general, the more types of filter processing that are used, the better the image quality, but the higher the processing load; that is, image quality and processing load are in a trade-off relationship. Moreover, the image quality improvement effect of each filter process depends on the characteristics of the image to be filtered. The filter processing to be used can therefore be decided according to the processing load allowed by the video encoding device and the characteristics of the image to be encoded.
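Running the enabled filter processes in a fixed order, as in the configuration of FIG. 11, can be sketched as follows. The filter implementations are stand-ins that merely tag the image to make the processing order visible; only the chaining per enable flag reflects the text.

```python
def apply_loop_filters(image, enable_flags, filters):
    """Apply each enabled filter, in order, to the local decoded image.

    enable_flags: dict name -> bool, as signaled in the sequence/slice
    level headers. filters: dict name -> callable, in processing order.
    """
    for name, filt in filters.items():
        if enable_flags.get(name, False):
            image = filt(image)
    return image

# Stand-in filters that tag the image to show the processing order.
filters = {
    "deblocking": lambda img: img + ["deblocked"],
    "pixel_adaptive_offset": lambda img: img + ["sao"],
    "adaptive_filter": lambda img: img + ["alf"],
}
out = apply_loop_filters([], {"deblocking": True, "pixel_adaptive_offset": True}, filters)
```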
Here, in the deblocking filter processing, the various parameters used to select the filter strength applied to block boundaries can be changed from their initial values. When a parameter is changed, it is output to the variable length encoding unit 13 as header information.
In the pixel adaptive offset processing, first, the image is divided into a plurality of blocks, and one class classification method is selected for each block from a plurality of class classification methods prepared in advance, with the case of performing no offset processing also defined as one of the class classification methods.
Next, each pixel in the block is classified by the selected class classification method, and an offset value that compensates the coding distortion is calculated for each class.
Finally, the image quality of the locally decoded image is improved by adding that offset value to the luminance values of the locally decoded image.
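The three steps above can be sketched as follows for the luminance-magnitude (BO-style) classification. This is a minimal illustration, not the encoder's exact behaviour: the band count, band boundaries, and the use of the mean difference as the offset are assumptions made for the example.

```python
# Hedged sketch of pixel adaptive offset with a band-offset (BO-style)
# class classification. num_bands, the band boundaries, and the averaging
# rule for the offset are illustrative assumptions.

def band_classify(pixel, bit_depth=8, num_bands=32):
    """Classify a pixel by the magnitude of its luminance value."""
    band_width = (1 << bit_depth) // num_bands
    return pixel // band_width

def compute_offsets(original, decoded, num_bands=32):
    """Per-class offset compensating coding distortion: here, the mean
    difference between original and locally decoded pixels in each class."""
    sums = [0] * num_bands
    counts = [0] * num_bands
    for org, dec in zip(original, decoded):
        c = band_classify(dec, num_bands=num_bands)
        sums[c] += org - dec
        counts[c] += 1
    return [round(s / n) if n else 0 for s, n in zip(sums, counts)]

def apply_offsets(decoded, offsets, num_bands=32):
    """Add each class's offset to the decoded luminance values."""
    return [dec + offsets[band_classify(dec, num_bands=num_bands)]
            for dec in decoded]

original = [100, 102, 104, 200]
decoded  = [ 98, 100, 102, 203]
offsets  = compute_offsets(original, decoded)
restored = apply_offsets(decoded, offsets)
```

In a real codec only the block division, the per-block method index, and the per-class offsets are transmitted; the decoder repeats the classification and the final addition step.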
The class classification methods include a method that classifies pixels by the magnitude of the luminance value of the locally decoded image (called the BO method) and a method that classifies pixels, for each edge direction, according to the situation around each pixel, such as whether it lies on an edge (called the EO method).
These methods are prepared in advance and shared by the video encoding device and the video decoding device; for example, as shown in FIG. 14, the case of performing no offset processing is also defined as one of the class classification methods, and an index indicating which of these methods is used for class classification is selected for each of the above blocks.
Therefore, the pixel adaptive offset processing outputs the block division information, the index indicating the class classification method for each block, and the offset information for each block to the variable length encoding unit 13 as header information.
In the adaptive filter processing, the locally decoded image is classified by a predetermined method, a filter that compensates the superimposed distortion is designed for each region (of the locally decoded image) belonging to each class, and the locally decoded image is filtered using those filters.
The filter designed for each class is then output to the variable length encoding unit 13 as header information.
Class classification methods here include a simple method that partitions the image spatially at equal intervals and a method that classifies each block according to local characteristics of the image (variance, etc.). The number of classes used in the adaptive filter processing may be set in advance to a value common to the video encoding device and the video decoding device, or may be one of the parameters to be encoded.
Compared with the former, the latter can set the number of classes freely and therefore improves the image quality more, but the code amount increases correspondingly because the number of classes must be encoded.
The processing of steps ST3 to ST9 is repeated until it has been completed for all hierarchically divided coding blocks B n ; when the processing for all coding blocks B n is complete, the flow proceeds to step ST13 (steps ST11 and ST12).
The variable length encoding unit 13 variable-length encodes the compressed data output from the transform/quantization unit 7; the block division information within the largest coding block output from the encoding control unit 2 (the quadtree information exemplified in FIG. 6(b)), the coding mode m(B n ), and the prediction difference coding parameters; the intra prediction parameters output from the encoding control unit 2 (when the coding mode is an intra coding mode) or the inter prediction parameters (when the coding mode is an inter coding mode); and the motion vector output from the motion compensated prediction unit 5 (when the coding mode is an inter coding mode), and generates encoded data representing those encoding results (step ST13).
At that time, as the encoding method for the compressed data, which consists of the quantized orthogonal transform coefficients, the orthogonal transform block is further divided into blocks of 4×4 pixels called Coefficient Groups (CGs; coding sub-blocks), and coefficient encoding is performed in CG units. FIG. 28 shows the coding order (scan order) of the coefficients in a 16×16 pixel orthogonal transform block. In Non-Patent Document 2, the 16 CGs of 4×4 pixels are encoded in this way in order from the lower-right CG, and within each CG the 16 coefficients are encoded in order from the lower-right coefficient. Specifically, flag information indicating whether a significant (non-zero) coefficient exists among the 16 coefficients in the CG is encoded first; next, only when a significant (non-zero) coefficient exists in the CG, whether each coefficient in the CG is a significant (non-zero) coefficient is encoded in the above order; and finally, the coefficient value information of each significant (non-zero) coefficient is encoded in order. This is done CG by CG in the above order. A scan order biased so that significant (non-zero) coefficients occur as consecutively as possible raises the coding efficiency of the entropy coding. Since the coefficients after the orthogonal transform represent lower frequency components the closer they are to the upper left, beginning with the DC component at the upper-left position, significant (non-zero) coefficients in progressive video generally occur more often toward the upper left, as in the example of FIG. 15, so they can be encoded efficiently by encoding in order from the lower right as shown in FIG. 28.
On the other hand, when the flag in the sequence level header indicating field coding is enabled, that is, when the input signal is encoded field by field, the spatial correlation in the vertical direction decreases and the prediction efficiency in the vertical direction falls; the transform coefficients obtained by orthogonally transforming the prediction difference signal e i n therefore also contain more vertical frequency components, and the distribution of significant (non-zero) coefficients tends to be biased toward the left side of the orthogonal transform block compared with progressive video, as in the example of FIG. 16. Consequently, encoding is no longer efficient in the coding order of FIG. 28, so the order is switched, for example, to the coding order shown in FIG. 17. In this way, significant (non-zero) coefficients are processed consecutively toward the end of the coding order, and the coding efficiency of the entropy coding can be raised.
Although the 16×16 pixel orthogonal transform block has been described above, encoding in CG (coding sub-block) units is also performed for block sizes other than 16×16 pixels, such as a 32×32 pixel orthogonal transform block, and the coding order is switched, as for the 16×16 pixel orthogonal transform block, depending on whether the flag in the sequence level header indicating field coding is enabled.
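The frame/field switch of the CG coding order can be sketched as follows. The exact scan patterns of FIGS. 17 and 28 are not reproduced; a reverse raster scan (frame case) and a reverse column-by-column scan (field case, biased so the left-side CGs come last) stand in for them as illustrative assumptions.

```python
# Illustrative sketch of switching the CG coding order depending on the
# field-coding flag. The concrete scans are stand-ins for the patterns of
# FIG. 28 (frame) and FIG. 17 (field), not exact reproductions.

def cg_scan_order(block_w, block_h, cg_w=4, cg_h=4, field_coding=False):
    """Return the list of CG (coding sub-block) origins in coding order."""
    cols, rows = block_w // cg_w, block_h // cg_h
    order = []
    if field_coding:
        # Field case: significant coefficients cluster toward the left of
        # the block, so visit CG columns from the right so that the left
        # CGs are coded last, in one consecutive run.
        for cx in reversed(range(cols)):
            for cy in reversed(range(rows)):
                order.append((cx * cg_w, cy * cg_h))
    else:
        # Frame case: visit CGs from the lower right so the upper-left
        # (low-frequency) CGs are coded last.
        for cy in reversed(range(rows)):
            for cx in reversed(range(cols)):
                order.append((cx * cg_w, cy * cg_h))
    return order

frame_order = cg_scan_order(16, 16)                    # starts at lower right
field_order = cg_scan_order(16, 16, field_coding=True)
```

Both orders visit all 16 CGs and end at the upper-left CG; they differ in whether rows or columns are traversed first, which is what moves the left-biased significant coefficients of field coding to the end of the scan.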
In the above, when the flag in the sequence level header indicating field coding is enabled, the coding order is switched to that of FIG. 17 (both per coding block, i.e. the coding order within a 16×16 pixel coding block, and per coding sub-block, i.e. the coding order within a 4×4 pixel CG); alternatively, the CG shape may be changed from a 4×4 pixel block to an 8×2 pixel block as shown in FIG. 18. This also causes significant (non-zero) coefficients to be processed consecutively in the CGs toward the end of the coding order, raising the coding efficiency of the entropy coding. That is, when the flag indicating field coding in the sequence level header is disabled, the coding order is that of FIG. 28; in the case of FIG. 17 the coding order is switched per coding block and per coding sub-block, which raises coding efficiency, and in the case of FIG. 18 the coding sub-block shape is changed in addition to switching the coding order per coding block and per coding sub-block, so coding efficiency can be raised further. Although switching the coding order both per coding block and per coding sub-block has been described above, only one of the two, per coding block or per coding sub-block, may be switched.
Alternatively, when the flag in the sequence level header indicating field coding is enabled, the coding order shown in FIG. 19 may be used. By changing not only the CG shape but also the scan order within each CG so that the coefficients on the right side of the block are encoded preferentially, even more significant (non-zero) coefficients can be processed consecutively toward the end of the coding order, further raising the coding efficiency of the entropy coding.
A flag indicating field coding may instead be provided in the picture level header, so that the coding order of the coefficients when encoding the compressed data, which consists of the quantized orthogonal transform coefficients, is switched adaptively picture by picture. Doing so realizes picture-adaptive control and can raise coding efficiency. Note that to realize coding that adaptively switches between frame coding and field coding picture by picture, this flag must be provided in the picture level header.
In this first embodiment, the case of switching the coding order, the CG shape, and so on based on the flag indicating field coding in the sequence level header or the picture level header has been described; however, a flag indicating whether to perform this switching may be defined separately from the flag indicating field coding in the sequence level header or picture level header, and the coding order, the CG shape, the scan order within the CG, and so on may be switched based on that flag.
FIGS. 17, 18, and 19 have been given as examples of the coding order, the CG shape, and the scan order within the CG, but as long as significant (non-zero) coefficients can be processed consecutively toward the end of the coding order, the coding order, the CG shape, and the scan order within the CG are not limited to those of FIGS. 17, 18, and 19, and neither is the combination of CG shape and scan order within the CG. For example, the CG may be 1×2, 1×4, 1×8, 1×16, 2×2, 2×4, 4×8 pixels, or the like.
In this first embodiment, the case of field coding has been described as using exactly one of FIGS. 17, 18, and 19 (with no selection possible), but one may instead be selected from a plurality of candidates (FIG. 17, FIG. 18, FIG. 19, and others). In that case, a flag indicating which of the candidates was selected is provided in the above header. This flag may be shared with the flag indicating field coding or with the flag indicating whether to perform this switching.
Further, as illustrated in FIG. 13, the variable length encoding unit 13 encodes a sequence level header and a picture level header as the header information of the encoded bitstream and generates the encoded bitstream together with the picture data.
The picture data consists of one or more pieces of slice data, and each piece of slice data combines a slice level header with the above encoded data contained in that slice.
The sequence level header collects header information generally common to a whole sequence, such as the image size, the color signal format, the bit depth of the luminance and color difference signal values, per-sequence enable flag information for each filter process in the loop filter unit 11 (adaptive filter processing, pixel adaptive offset processing, deblocking filter processing), enable flag information for the quantization matrix, and the flag indicating field coding.
The picture level header collects header information set per picture, such as the index of the sequence level header to be referenced, the number of reference pictures for motion compensation, and the probability table initialization flag for entropy coding.
The slice level header collects per-slice parameters, such as position information indicating where the slice lies in the picture, an index indicating which picture level header is referenced, the slice coding type (all-intra coding, inter coding, etc.), and flag information indicating whether to perform each filter process in the loop filter unit 11 (adaptive filter processing, pixel adaptive offset processing, deblocking filter processing).
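The three header layers just described can be summarized as simple records. The field names and types below are illustrative assumptions chosen to mirror the text, not the actual bitstream syntax elements.

```python
# Hedged sketch of the three header layers. Field names/types are
# assumptions mirroring the description, not real syntax elements.
from dataclasses import dataclass

@dataclass
class SequenceLevelHeader:
    image_width: int
    image_height: int
    bit_depth_luma: int
    bit_depth_chroma: int
    loop_filter_enable: dict     # per filter: adaptive / offset / deblocking
    quant_matrix_enable: bool
    field_coding_flag: bool      # the flag the coding-order switch reads

@dataclass
class PictureLevelHeader:
    seq_header_index: int        # which sequence level header is referenced
    num_reference_pictures: int
    prob_table_init_flag: bool   # entropy-coding probability table init

@dataclass
class SliceLevelHeader:
    slice_position: int          # where the slice starts in the picture
    pic_header_index: int        # which picture level header is referenced
    slice_coding_type: str       # e.g. "all_intra" or "inter"
    loop_filter_on: dict         # per-slice on/off for each filter process

seq = SequenceLevelHeader(1920, 1080, 8, 8,
                          {"adaptive": False, "offset": True, "deblocking": True},
                          False, True)
```

The nesting mirrors the bitstream of FIG. 13: slices reference a picture header by index, and picture headers reference a sequence header by index.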
Next, the processing of the intra prediction unit 4 is described in detail.
FIG. 7 is an explanatory diagram showing an example of the intra prediction modes, which are the intra prediction parameters selectable for each prediction block P i n within a coding block B n , where N I denotes the number of intra prediction modes.
FIG. 7 shows the index values of the intra prediction modes and the prediction direction vectors indicated by those modes; in the example of FIG. 7, the design is such that the relative angles between prediction direction vectors become smaller as the number of selectable intra prediction modes increases.
As described above, the intra prediction unit 4 refers to the intra prediction parameters of a prediction block P i n and performs intra prediction processing on that prediction block P i n to generate an intra prediction image P INTRAi n ; here, the intra processing that generates the intra prediction signal of a prediction block P i n in the luminance signal is described.
Let the size of the prediction block P i n be l i n × m i n pixels.
FIG. 8 is an explanatory diagram showing an example of the pixels used when generating the predicted values of the pixels in the prediction block P i n for l i n = m i n = 4.
In FIG. 8, the (2 × l i n + 1) encoded pixels above the prediction block P i n and the (2 × m i n ) encoded pixels to its left are used for prediction, but more or fewer pixels than shown in FIG. 8 may be used.
Also, although in FIG. 8 one row or one column of pixels adjacent to the prediction block P i n is used for prediction, two rows or columns, or more, may be used.
When the index value of the intra prediction mode for the prediction block P i n is 0 (planar prediction), a prediction image is generated using the encoded pixels adjacent above the prediction block P i n and the encoded pixels adjacent to its left, taking as each predicted value a value interpolated according to the distance between these pixels and the prediction target pixel in the prediction block P i n .
When the index value of the intra prediction mode for the prediction block P i n is 2 (average value (DC) prediction), a prediction image is generated by taking the average of the encoded pixels adjacent above the prediction block P i n and the encoded pixels adjacent to its left as the predicted value of the pixels in the prediction block P i n .
Furthermore, filter processing that smooths the block boundary is applied to the regions A, B, and C of FIG. 20, located at the upper and left edges of the prediction block P i n , to give the final prediction image. For example, the filter processing is performed with the following filter coefficients, using the reference pixel arrangement of the filter in FIG. 21.
- Region A (the upper-left pixel of the partition P i n ): a 0 = 1/2, a 1 = 1/4, a 2 = 1/4
- Region B (the pixels at the upper edge of the partition P i n other than region A): a 0 = 3/4, a 2 = 1/4 (a 1 = 0)
- Region C (the pixels at the left edge of the partition P i n other than region A): a 0 = 3/4, a 1 = 1/4 (a 2 = 0)
However, when the flag in the sequence level header indicating field coding is enabled, the filter processing is not applied to the upper edge of the prediction block, as shown in FIG. 22. In the case of field coding, the correlation between vertically adjacent pixels is low, so the filter processing used in the horizontal prediction of FIG. 27 may worsen the prediction efficiency. Therefore, filtering only regions A and C and not filtering region B reduces the amount of computation while suppressing the loss of prediction efficiency.
In the above, only regions A and C are filtered when the flag in the sequence level header indicating field coding is enabled, but region A may instead be filtered in the same way as region C. By thus avoiding the use of vertically adjacent pixels, whose correlation is low, the amount of computation required for the filter processing can be reduced while further lowering the risk of worse prediction efficiency. Alternatively, when reducing the amount of computation is given more weight, region A may also be left unfiltered and only region C filtered.
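The region-A/B/C smoothing after DC prediction, including the field-coding case where region B is skipped, can be sketched as follows. It is an assumed reading of the FIG. 21 arrangement that a 0 weights the predicted (DC) value, a 1 the left neighbour, and a 2 the upper neighbour; the integer rounding is also an illustrative assumption.

```python
# Hedged sketch of DC prediction with boundary smoothing using the quoted
# region-A/B/C coefficients. a0 -> DC value, a1 -> left neighbour,
# a2 -> upper neighbour is an assumed reading of FIG. 21; integer rounding
# (+2, //4) is an illustrative convention.

def dc_predict(block_size, top, left, field_coding=False):
    """top/left: lists of neighbouring decoded luma values, length block_size."""
    dc = (sum(top) + sum(left) + block_size) // (2 * block_size)
    pred = [[dc] * block_size for _ in range(block_size)]
    # Region A: top-left pixel (a0 = 1/2, a1 = 1/4, a2 = 1/4)
    pred[0][0] = (2 * dc + left[0] + top[0] + 2) // 4
    if not field_coding:
        # Region B: rest of top row (a0 = 3/4, a2 = 1/4); skipped when the
        # field-coding flag is enabled, as the text describes.
        for x in range(1, block_size):
            pred[0][x] = (3 * dc + top[x] + 2) // 4
    # Region C: rest of left column (a0 = 3/4, a1 = 1/4); always filtered.
    for y in range(1, block_size):
        pred[y][0] = (3 * dc + left[y] + 2) // 4
    return pred

pred_frame = dc_predict(4, [120] * 4, [80] * 4)
pred_field = dc_predict(4, [120] * 4, [80] * 4, field_coding=True)
```

With these neighbours the DC value is 100; in the field case the top row other than the corner stays at the unfiltered DC value, saving the region-B computation.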
When the index value of the intra prediction mode for the prediction block P i n is 26 (vertical prediction), a prediction image is generated by calculating the predicted values of the pixels in the prediction block P i n from equation (1) below.

  S'(x, y) = S(x, -1) + (S(-1, y) - S(-1, -1))/2   (x = 0)
  S'(x, y) = S(x, -1)                              (x > 0)      (1)

Here, the coordinates (x, y) are relative coordinates with the upper-left pixel in the prediction block P i n as the origin (see FIG. 9), S'(x, y) is the predicted value at coordinates (x, y), and S(x, y) is the luminance value (decoded luminance value) of the encoded pixel at coordinates (x, y). When a calculated predicted value exceeds the range of values the luminance value can take, it is rounded so that it falls within that range.
Equation (1) represents the filter processing used in the vertical prediction of FIG. 27. Specifically, the first line of equation (1) means filtering so that the block boundary is smoothed, by adding to S(x, -1), the predicted value of vertical prediction in MPEG-4 AVC/H.264, half the amount of change in the vertical direction of the luminance values of the adjacent encoded pixels; the second line of equation (1) is the same prediction formula as the vertical prediction in MPEG-4 AVC/H.264.
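Equation (1) can be sketched directly in code. Truncating integer division stands in for the exact rounding, and clip() implements the value-range rounding mentioned above; both conventions are assumptions for the example.

```python
# Minimal sketch of equation (1): vertical prediction with left-edge
# boundary smoothing. Rounding/clipping conventions are assumptions.

def clip(v, bit_depth=8):
    return max(0, min(v, (1 << bit_depth) - 1))

def vertical_predict(block_size, top, left, corner):
    """top[x] = S(x,-1), left[y] = S(-1,y), corner = S(-1,-1)."""
    pred = [[0] * block_size for _ in range(block_size)]
    for y in range(block_size):
        for x in range(block_size):
            if x == 0:
                # first line of eq. (1): add half the vertical change of
                # the adjacent encoded pixels to smooth the block boundary
                pred[y][x] = clip(top[x] + (left[y] - corner) // 2)
            else:
                # second line of eq. (1): plain vertical prediction
                pred[y][x] = top[x]
    return pred

pred = vertical_predict(4, [100, 110, 120, 130], [100, 104, 108, 112], 100)
```

Only the x = 0 column differs from copying the row above the block; it follows the vertical gradient of the left-hand neighbours at half strength.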
When the index value of the intra prediction mode for the prediction block P i n is 10 (horizontal prediction), a prediction image is generated by calculating the predicted values of the pixels in the prediction block P i n from equation (2) below.

  S'(x, y) = S(-1, y) + (S(x, -1) - S(-1, -1))/2   (y = 0)
  S'(x, y) = S(-1, y)                              (y > 0)      (2)

Here, the coordinates (x, y) are relative coordinates with the upper-left pixel in the prediction block P i n as the origin (see FIG. 9), S'(x, y) is the predicted value at coordinates (x, y), and S(x, y) is the luminance value (decoded luminance value) of the encoded pixel at coordinates (x, y). When a calculated predicted value exceeds the range of values the luminance value can take, it is rounded so that it falls within that range.
Equation (2) represents the filter processing used in the horizontal prediction of FIG. 27. Specifically, the first line of equation (2) means filtering so that the block boundary is smoothed, by adding to S(-1, y), the predicted value of horizontal prediction in MPEG-4 AVC/H.264, half the amount of change in the horizontal direction of the luminance values of the adjacent encoded pixels; the second line of equation (2) is the same prediction formula as the horizontal prediction in MPEG-4 AVC/H.264.
However, when the flag in the sequence level header indicating field coding is enabled, equation (3) is used for horizontal prediction instead of equation (2).

  S'(x, y) = S(-1, y)                                             (3)

That is, as shown in FIG. 22, no filter processing is applied to the upper edge of the prediction block (for average value prediction and vertical prediction, the filter processing is applied only to the left edge of the prediction block; for horizontal prediction, no filter processing is applied). In the case of field coding, the correlation between vertically adjacent pixels is low, so raising the continuity of the block boundary by the filter processing of the horizontal prediction in FIG. 27 may worsen the prediction efficiency. Therefore, omitting this filter processing reduces the amount of computation while suppressing the loss of prediction efficiency.
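The switch between equations (2) and (3) can be sketched as one function with a field-coding flag. As before, the integer division and clipping are assumed conventions, not the exact specification.

```python
# Minimal sketch of equations (2)/(3): horizontal prediction whose top-edge
# smoothing (eq. (2), first line) is disabled under field coding (eq. (3)).
# Rounding/clipping conventions are assumptions.

def clip(v, bit_depth=8):
    return max(0, min(v, (1 << bit_depth) - 1))

def horizontal_predict(block_size, top, left, corner, field_coding=False):
    """top[x] = S(x,-1), left[y] = S(-1,y), corner = S(-1,-1)."""
    pred = [[0] * block_size for _ in range(block_size)]
    for y in range(block_size):
        for x in range(block_size):
            if y == 0 and not field_coding:
                # eq. (2), first line: smooth the top boundary with half the
                # horizontal change of the adjacent encoded pixels
                pred[y][x] = clip(left[y] + (top[x] - corner) // 2)
            else:
                # eq. (2) second line / eq. (3): plain horizontal prediction
                pred[y][x] = left[y]
    return pred

pred_frame = horizontal_predict(4, [100, 104, 108, 112], [100, 110, 120, 130], 100)
pred_field = horizontal_predict(4, [100, 104, 108, 112], [100, 110, 120, 130], 100,
                                field_coding=True)
```

In the field case the y = 0 branch is never taken, so the whole block is a plain copy of the left-hand column, saving the filter computation.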
A flag indicating field coding may instead be provided in the picture level header, so that the filter processing at the upper edge of the prediction block in average value (DC) prediction and horizontal prediction is switched on or off for each picture according to the correlation between vertically adjacent pixels in that picture. Doing so realizes picture-adaptive control and can raise prediction efficiency. Note that to realize coding that adaptively switches between frame coding and field coding picture by picture, this flag must be provided in the picture level header.
In this first embodiment, the case of switching the filter processing at the upper edge of the prediction block on or off based on the flag indicating field coding in the sequence level header or the picture level header has been described; however, a flag indicating whether to perform this switching may be defined separately from the flag indicating field coding in the sequence level header or picture level header, and the filter processing at the upper edge of the prediction block may be switched on or off based on that flag. This switching flag is a flag based on the flag indicating field coding.
In this first embodiment, the switching of the coding order described earlier and the switching of the filter processing above have been described separately, but they may be set in combination.
The block sizes to which the filter processing is applied may also be limited, for example by applying the block-boundary filter processing of average value (DC) prediction, vertical prediction, and horizontal prediction only to blocks of 16×16 pixels or less. This reduces the amount of computation required for the filter processing.
When the index value of the intra prediction mode is other than 0 (planar prediction), 2 (average value prediction), 26 (vertical prediction), and 10 (horizontal prediction), the prediction values of the pixels in the prediction block P_i^n are generated on the basis of the prediction direction vector υ_p = (dx, dy) indicated by the index value.
As shown in FIG. 9, when the upper-left pixel of the prediction block P_i^n is taken as the origin and the relative coordinates within the prediction block P_i^n are set to (x, y), the position of the reference pixel used for prediction is the intersection of an adjacent pixel line and the following line L:

L = (x, y) + kυ_p

where k is a negative scalar value.
When the reference pixel is at an integer pixel position, that integer pixel is used as the prediction value of the prediction target pixel; when the reference pixel is not at an integer pixel position, an interpolated pixel generated from the integer pixels adjacent to the reference pixel is used as the prediction value.
In the example of FIG. 8, since the reference pixel is not at an integer pixel position, a value interpolated from the two pixels adjacent to the reference pixel is used as the prediction value. Note that the interpolated pixel may be generated not only from the two adjacent pixels but from two or more adjacent pixels.
Increasing the number of pixels used in the interpolation process improves the interpolation accuracy of the interpolated pixel, but also increases the complexity of the computation required for the interpolation process. Therefore, in the case of a moving picture encoding device that requires high coding performance even at a large computational load, it is better to generate the interpolated pixel from a larger number of pixels.
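As a rough illustration of the directional prediction just described, the following sketch computes the prediction value of one pixel from the reference row above the block, interpolating linearly between two adjacent integer pixels when the intersection of the line L with the reference row falls at a fractional position. This is an illustrative simplification, not the embodiment's normative procedure: the reference row is modeled as a plain mapping from horizontal displacement to pixel value, and only intersections with the pixel row directly above the block are handled.

```python
def angular_predict(x, y, dx, dy, ref):
    """Predict pixel (x, y) along direction (dx, dy), with dy != 0.

    The line L = (x, y) + k*(dx, dy), for a negative scalar k, is
    intersected with the reference row one pixel above the block
    (vertical coordinate -1), i.e. k satisfies y + k*dy == -1.
    `ref` maps horizontal displacement from the block origin to the
    decoded pixel value on that row.
    """
    # Horizontal position of the intersection with the reference row;
    # it may be fractional, so split it into integer and fractional parts.
    pos = x - (y + 1) * dx / dy
    base = int(pos // 1)      # integer reference pixel to the left
    frac = pos - base         # fractional offset in [0, 1)
    if frac == 0.0:
        # Reference pixel falls exactly on an integer position.
        return ref[base]
    # Otherwise interpolate between the two adjacent integer pixels.
    return round((1 - frac) * ref[base] + frac * ref[base + 1])
```

For example, with direction (1, 2) the intersection for pixel (1, 0) lands halfway between displacements 0 and 1, so the prediction is the average of those two reference pixels.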
Through the processing described above, prediction pixels are generated for all the pixels of the luminance signal in the prediction block P_i^n, and an intra prediction image P_INTRAi^n is output.
Note that the intra prediction parameter (intra prediction mode) used for generating the intra prediction image P_INTRAi^n is output to the variable length encoding unit 13 in order to be multiplexed into the bitstream.
Note that, similarly to the smoothing process applied to the reference image at the time of intra prediction of an 8×8-pixel block in MPEG-4 AVC/H.264 described above, even when the intra prediction unit 4 is configured so that the reference pixels used for generating the intermediate prediction image of the prediction block P_i^n are pixels obtained by smoothing the encoded pixels adjacent to the prediction block P_i^n, the same filter processing on the intermediate prediction image as in the above example can be performed.
For the color difference signals of the prediction block P_i^n as well, intra prediction processing based on the intra prediction parameter (intra prediction mode) is performed in the same procedure as for the luminance signal, and the intra prediction parameter used for generating the intra prediction image is output to the variable length encoding unit 13.
However, the intra prediction parameters (intra prediction modes) selectable for the color difference signals may differ from those for the luminance signal. For example, in order to reduce the amount of computation, vertical prediction and horizontal prediction of the color difference signals may use the same prediction method as MPEG-4 AVC/H.264, without the block boundary filter processing. In the case of the YUV 4:2:0 format, the color difference signals (U and V signals) have half the resolution of the luminance signal (Y signal) in both the horizontal and vertical directions, and the image signal is less complex and easier to predict than the luminance signal. Therefore, the number of selectable intra prediction parameters may be made smaller than for the luminance signal, thereby reducing the amount of code required to encode the intra prediction parameter and lowering the computational cost of the prediction processing.
Next, the processing of the moving picture decoding device in FIG. 3 will be described specifically.
When the variable length decoding unit 31 receives the encoded bitstream generated by the moving picture encoding device in FIG. 1, it performs variable length decoding processing on the bitstream (step ST21 in FIG. 4) and decodes the header information of each sequence composed of one or more pictures (sequence level header), such as the flag indicating whether or not field coding is performed and the frame size information, the header information of each picture (picture level header), and the filter parameters and quantization matrix parameters used by the loop filter unit 38.
At this time, the quantization matrix is specified by referring to the quantization matrix parameters variable-length decoded by the variable length decoding unit 31. Specifically, for each color signal and encoding mode at each orthogonal transform size, when the quantization matrix parameter indicates a quantization matrix prepared in advance as an initial value and shared by the moving picture encoding device and the moving picture decoding device, or a quantization matrix that has already been decoded (i.e., not a new quantization matrix), the quantization matrix is specified by referring to index information, included in the quantization matrix parameter, that identifies which of those matrices is to be used; when the quantization matrix parameter indicates that a new quantization matrix is used, the quantization matrix included in the quantization matrix parameter is specified as the quantization matrix to be used.
Then, from the slice data constituting the data of each picture, the header information of each slice (slice level header), such as slice division information, is decoded, and the encoded data of each slice is decoded.
In addition, the variable length decoding unit 31 determines the maximum encoding block size and the upper limit of the number of division layers determined by the encoding control unit 2 of the moving picture encoding device in FIG. 1, using the same procedure as the moving picture encoding device (step ST22).
For example, when the maximum encoding block size and the upper limit of the number of division layers have been determined according to the resolution of the video signal, the maximum encoding block size is determined on the basis of the decoded frame size information, using the same procedure as the moving picture encoding device.
When the maximum encoding block size and the upper limit of the number of division layers have been multiplexed into the sequence level header or the like on the moving picture encoding device side, the values decoded from that header are used.
Hereinafter, in the moving picture decoding device, the maximum encoding block size is referred to as the maximum decoding block size, and the maximum encoding block is referred to as the maximum decoding block.
The variable length decoding unit 31 decodes, for each determined maximum decoding block, the division state of the maximum decoding block as shown in FIG. 6. On the basis of the decoded division state, the decoding blocks (blocks corresponding to the "encoding blocks" of the moving picture encoding device in FIG. 1) are specified hierarchically (step ST23).
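The hierarchical identification of decoding blocks from the decoded division state can be pictured as a quadtree walk over split flags. The sketch below is a hypothetical simplification, not the embodiment's actual syntax parsing: it consumes one split flag per node in depth-first order, stops splitting once the upper-limit number of division layers is reached, and returns the resulting leaf decoding blocks as (x, y, size) triples.

```python
def parse_partition(x, y, size, max_depth, flags, depth=0, out=None):
    """Recover the leaf decoding blocks of one maximum decoding block.

    `flags` is a list of 0/1 split flags consumed in depth-first order;
    a node at the maximum depth is a leaf and consumes no flag.
    """
    if out is None:
        out = []
    if depth < max_depth and flags.pop(0):
        half = size // 2
        # Split into four quadrants, visited in raster order.
        for off_y in (0, half):
            for off_x in (0, half):
                parse_partition(x + off_x, y + off_y, half,
                                max_depth, flags, depth + 1, out)
    else:
        out.append((x, y, size))  # leaf decoding block
    return out
```

With a 64×64 maximum decoding block, an upper limit of 2 layers, and flags [1, 0, 0, 0, 1], the first split yields four 32×32 blocks, of which only the lower-right one splits again into four 16×16 leaves.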
Next, the variable length decoding unit 31 decodes the encoding mode assigned to the decoding block. On the basis of the information included in the decoded encoding mode, the decoding block is further divided into one or more prediction blocks, which are the units of prediction processing, and the prediction parameters assigned to each prediction block are decoded (step ST24).
That is, when the encoding mode assigned to the decoding block is an intra encoding mode, the variable length decoding unit 31 decodes the intra prediction parameter for each of the one or more prediction blocks that are included in the decoding block and serve as units of prediction processing.
On the other hand, when the encoding mode assigned to the decoding block is an inter encoding mode, it decodes the inter prediction parameter and the motion vector for each of the one or more prediction blocks that are included in the decoding block and serve as units of prediction processing (step ST24).
Furthermore, the variable length decoding unit 31 decodes the compressed data (transform coefficients after transform and quantization) for each orthogonal transform block on the basis of the orthogonal transform block division information included in the prediction difference encoding parameter (step ST24).
At this time, the coefficients are decoded in units of CGs, in the same way as the compressed data are encoded by the variable length encoding unit 13 of the moving picture encoding device in FIG. 1. Therefore, normally, as shown in FIG. 28, the 16 CGs of 4×4 pixels are decoded in order from the lower-right CG, and within each CG the 16 coefficients are decoded in order from the lower-right coefficient. Specifically, first, flag information indicating whether or not a significant (non-zero) coefficient exists among the 16 coefficients in the CG is decoded; next, only when the decoded flag information indicates that a significant (non-zero) coefficient exists in the CG, whether or not each coefficient in the CG is a significant (non-zero) coefficient is decoded in the above order; and finally, for each coefficient indicated as a significant (non-zero) coefficient, its coefficient value information is decoded in order. This is performed CG by CG in the above order. However, when the flag indicating whether or not field coding is performed, in the sequence level header decoded by the variable length decoding unit 31, is valid, the decoding processing is performed in the same order, among those of FIG. 17, FIG. 18, and FIG. 19, as the processing order determined by the variable length encoding unit 13 of the moving picture encoding device in FIG. 1. By doing so, the same compressed data as in the stream generated by the moving picture encoding device in FIG. 1 can be obtained.
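The CG-by-CG decoding order described above can be sketched for a 16×16 transform block as follows. This is an illustration of the ordering concept only: a reverse raster order is assumed both over the CGs and inside each 4×4 unit as a simplification of the scan of FIG. 28, whereas the actual scan pattern is the one defined by the embodiment (or, under field coding, the one selected among FIG. 17 to FIG. 19).

```python
def cg_decoding_order(block_size=16, cg_size=4):
    """Yield (row, col) coefficient positions, lower-right CG first,
    and lower-right coefficient first within each CG."""
    n = block_size // cg_size  # CGs per side (4 for a 16x16 block)
    order = []
    # Reverse raster over CGs: the lower-right CG comes first.
    for cg_idx in range(n * n - 1, -1, -1):
        cg_r, cg_c = divmod(cg_idx, n)
        # Reverse raster inside the CG: lower-right coefficient first.
        for k in range(cg_size * cg_size - 1, -1, -1):
            r, c = divmod(k, cg_size)
            order.append((cg_r * cg_size + r, cg_c * cg_size + c))
    return order
```

The first 16 entries cover the lower-right CG, ending the full scan at the DC position (0, 0).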
Note that, when the variable length encoding unit 13 of the moving picture encoding device in FIG. 1 is configured so that a flag indicating whether or not field coding is performed, like the one in the sequence level header, is prepared in the picture level header and the coefficient encoding order used when encoding the compressed data (the quantized orthogonal transform coefficients) is adaptively switched in units of pictures, the variable length decoding unit 31 likewise adaptively switches the decoding order of the compressed data in units of pictures according to that flag.
When the encoding mode m(B^n) variable-length decoded by the variable length decoding unit 31 is an intra encoding mode (when m(B^n) ∈ INTRA), the changeover switch 33 outputs the intra prediction parameter of each prediction block, variable-length decoded by the variable length decoding unit 31, to the intra prediction unit 34.
On the other hand, when the encoding mode m(B^n) variable-length decoded by the variable length decoding unit 31 is an inter encoding mode (when m(B^n) ∈ INTER), it outputs the inter prediction parameter and the motion vector of each prediction block, variable-length decoded by the variable length decoding unit 31, to the motion compensation unit 35.
When the encoding mode m(B^n) variable-length decoded by the variable length decoding unit 31 is an intra encoding mode (m(B^n) ∈ INTRA) (step ST25), the intra prediction unit 34 receives the intra prediction parameter of each prediction block output from the changeover switch 33 and, using the same procedure as the intra prediction unit 4 in FIG. 1, performs intra prediction processing on each prediction block P_i^n in the decoding block B^n using that intra prediction parameter while referring to the decoded image stored in the intra prediction memory 37, thereby generating an intra prediction image P_INTRAi^n (step ST26).
However, when the flag indicating whether or not field coding is performed, in the sequence level header decoded by the variable length decoding unit 31, is valid, the filter processing at the upper end of the prediction block for average value (DC) prediction and horizontal prediction is not performed, as in the moving picture encoding device in FIG. 1. By doing so, the same prediction image as in the stream generated by the moving picture encoding device in FIG. 1 can be generated.
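The ON/OFF behavior of the block-top filter can be illustrated with the following hypothetical sketch of average value (DC) prediction. The 3:1 weighting of the filtered top row is an assumption for the sketch, not the embodiment's exact filter; the point is only that the smoothing of the top row toward the pixels above is skipped when the field coding flag is valid, because vertical inter-pixel correlation is weak within a field.

```python
def dc_predict_block(size, top_ref, left_ref, field_coding_flag):
    """DC-predict a size x size block from the decoded reference pixels
    above (top_ref) and to the left (left_ref), filtering the top row
    only when field coding is off."""
    # DC value: rounded mean of the adjacent reference pixels.
    dc = (sum(top_ref[:size]) + sum(left_ref[:size]) + size) // (2 * size)
    block = [[dc] * size for _ in range(size)]
    if not field_coding_flag:
        # Frame coding: smooth the top edge toward the row above
        # (assumed 3:1 weighting, for illustration only).
        for x in range(size):
            block[0][x] = (top_ref[x] + 3 * dc + 2) // 4
    return block
```

With field coding on, the top row keeps the plain DC value; with field coding off, it is pulled toward the reference row above.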
Note that, when the moving picture encoding device of the first embodiment prepares the flag indicating whether or not field coding is performed, like the one in the sequence level header, in the picture level header, the filter processing at the upper end of the prediction block for average value (DC) prediction and horizontal prediction is switched ON/OFF in units of pictures according to the value of that flag in the picture level header. By doing so, the same prediction image as in the stream generated by the moving picture encoding device of the first embodiment configured as described above can be generated.
When the encoding mode m(B^n) variable-length decoded by the variable length decoding unit 31 is an inter encoding mode (m(B^n) ∈ INTER) (step ST25), the motion compensation unit 35 receives the motion vector and the inter prediction parameter of each prediction block output from the changeover switch 33 and performs inter prediction processing on each prediction block P_i^n in the decoding block B^n using that motion vector and inter prediction parameter while referring to the filtered decoded image stored in the motion compensated prediction frame memory 39, thereby generating an inter prediction image P_INTERi^n (step ST27).
When the inverse quantization and inverse transform unit 32 receives the compressed data and the prediction difference encoding parameter from the variable length decoding unit 31, it inverse-quantizes the compressed data in units of orthogonal transform blocks by referring to the quantization parameter and the orthogonal transform block division information included in the prediction difference encoding parameter, using the same procedure as the inverse quantization and inverse transform unit 8 in FIG. 1.
At this time, each piece of header information variable-length decoded by the variable length decoding unit 31 is referred to, and when that header information indicates that the inverse quantization processing in the slice is to be performed using a quantization matrix, the inverse quantization processing is performed using the quantization matrix.
At this time, by referring to each piece of header information variable-length decoded by the variable length decoding unit 31, the quantization matrix to be used for each color signal and encoding mode (intra encoding or inter encoding) at each orthogonal transform size is specified.
The inverse quantization and inverse transform unit 32 also performs inverse orthogonal transform processing on the transform coefficients, which are the compressed data after inverse quantization, in units of orthogonal transform blocks, and calculates a decoded prediction difference signal identical to the local decoded prediction difference signal output from the inverse quantization and inverse transform unit 8 in FIG. 1 (step ST28).
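A minimal sketch of inverse quantization with a per-position quantization matrix follows, assuming a simple multiplicative scaling in which a matrix weight of 16 is neutral. The real procedure uses integer level-scale tables, step sizes derived from the quantization parameter, and bit shifts, so this is illustrative only.

```python
def dequantize(levels, qmatrix, qstep):
    """Scale quantized levels back to transform coefficients.

    levels, qmatrix: equally sized 2-D lists; each coefficient is
    scaled by its quantization-matrix weight (16 = neutral) and the
    quantization step size.
    """
    return [[lvl * w * qstep // 16 for lvl, w in zip(lrow, wrow)]
            for lrow, wrow in zip(levels, qmatrix)]
```

Doubling a position's matrix weight doubles the reconstructed coefficient for the same level, which is how the matrix shapes quantization coarseness per frequency.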
The adding unit 36 adds the decoded prediction difference signal calculated by the inverse quantization and inverse transform unit 32 to either the intra prediction image P_INTRAi^n generated by the intra prediction unit 34 or the inter prediction image P_INTERi^n generated by the motion compensation unit 35, thereby calculating the decoded image, outputs the decoded image to the loop filter unit 38, and stores the decoded image in the intra prediction memory 37 (step ST29).
This decoded image becomes the decoded image signal used in subsequent intra prediction processing.
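The addition in step ST29 amounts to a per-pixel clipped sum of the prediction image and the decoded prediction difference signal; a sketch, with the clipping range taken from an assumed 8-bit depth:

```python
def reconstruct(pred, resid, bit_depth=8):
    """Per-pixel clipped sum; pred and resid are equally sized 2-D lists."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, resid)]
```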
When the processing of steps ST23 to ST29 has been completed for all the decoding blocks B^n (step ST30), the loop filter unit 38 performs predetermined filter processing on the decoded image output from the adding unit 36, and stores the filtered decoded image in the motion compensated prediction frame memory 39 (step ST31).
Specifically, it performs filter (deblocking filter) processing that reduces distortion occurring at the boundaries of orthogonal transform blocks and at the boundaries of prediction blocks, processing that adaptively adds an offset in units of pixels (pixel adaptive offset), adaptive filter processing that adaptively switches among linear filters such as Wiener filters, and the like.
However, for each of the above deblocking filter processing, pixel adaptive offset processing, and adaptive filter processing, the loop filter unit 38 refers to each piece of header information variable-length decoded by the variable length decoding unit 31 to determine whether or not to perform that processing in the slice.
At this time, when two or more filter processes are performed and the loop filter unit 11 of the moving picture encoding device is configured as shown in FIG. 11, the loop filter unit 38 is configured as shown in FIG. 12.
Here, in the deblocking filter processing, the header information variable-length decoded by the variable length decoding unit 31 is referred to, and when there is information for changing, from their initial values, the various parameters used to select the filter strength applied to block boundaries, the deblocking filter processing is performed on the basis of that change information. When there is no change information, it is performed according to a predetermined method.
In the pixel adaptive offset processing, the decoded image is divided on the basis of the block division information for the pixel adaptive offset processing variable-length decoded by the variable length decoding unit 31, and for each block the index indicating that block's class classification method, variable-length decoded by the variable length decoding unit 31, is referred to; when that index is not the index indicating "do not perform offset processing", each pixel in the block is classified according to the class classification method indicated by the index.
Note that the same candidates as the class classification method candidates of the pixel adaptive offset processing of the loop filter unit 11 are prepared in advance as the class classification method candidates.
Then, the loop filter unit 38 refers to the offset information, variable-length decoded by the variable length decoding unit 31, that specifies the offset value of each class of each block, and performs processing that adds the offset to the luminance value of the decoded image.
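As an illustration of this offset addition, the following sketch implements a simplified band-offset style classification as one plausible example of a class classification method. The 32-band split, the dictionary of per-class offsets, and the clipping range are assumptions for the sketch, not the embodiment's exact definitions.

```python
def apply_band_offset(pixels, offsets, bit_depth=8):
    """Classify each pixel by its intensity band and add that band's
    decoded offset, clipping to the valid sample range.

    pixels: flat list of luminance values; offsets: dict mapping band
    index -> offset (bands without an entry get no offset).
    """
    shift = bit_depth - 5            # 32 equal bands over the value range
    max_val = (1 << bit_depth) - 1
    out = []
    for p in pixels:
        band = p >> shift            # class of this pixel
        q = p + offsets.get(band, 0)
        out.append(min(max(q, 0), max_val))  # clip to [0, max_val]
    return out
```

Only pixels falling in bands that carry a decoded offset are modified; all others pass through unchanged.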
In the adaptive filter processing, class classification is performed using the same method as the moving picture encoding device in FIG. 1, and then filter processing is performed on the basis of that class classification information, using the filter for each class variable-length decoded by the variable length decoding unit 31.
The decoded image after the filter processing by this loop filter unit 38 becomes the reference image for motion compensated prediction and also becomes the reproduced image.
As is apparent from the above, according to the first embodiment, when the flag indicating that the input video signal is encoded in units of fields is valid, the configuration in which the intra prediction unit 4 does not perform the filter processing at the upper end of the prediction block when performing intra prediction processing by average value prediction or horizontal prediction, and the configuration in which the transform and quantization unit 7 changes the encoding order of the transform coefficients, are each implemented independently or in combination. Therefore, efficient prediction processing and encoding processing adapted to the characteristics of the field signal can be realized, and the encoding efficiency can be increased.
In addition, according to the first embodiment, when the flag, decoded by the variable length decoding unit 31, indicating that the input video signal is encoded in units of fields is valid, the configuration in which the intra prediction unit 34 does not perform the filter processing at the upper end of the prediction block when performing intra prediction processing by average value prediction or horizontal prediction, and the configuration in which the inverse quantization and inverse transform unit 32 changes the decoding order of the transform coefficients, are each implemented independently or in combination. Therefore, the bitstream encoded by the moving picture encoding device of the first embodiment, which realizes efficient prediction processing and encoding processing adapted to the characteristics of the field signal and increases the encoding efficiency, can be correctly decoded.
As described above, the moving picture encoding device, moving picture decoding device, moving picture encoding method, and moving picture decoding method according to the present invention are useful for moving picture encoding devices, moving picture decoding devices, and the like that perform encoding and decoding processing with high encoding efficiency.
1 block division unit (block division means), 2 encoding control unit (encoding control means), 3 changeover switch, 4 intra prediction unit (prediction means), 5 motion compensated prediction unit (prediction means), 6 subtracting unit (difference image generation means), 7 transform and quantization unit (image compression means), 8 inverse quantization and inverse transform unit (local decoded image generation means), 9 adding unit (local decoded image generation means), 10 intra prediction memory (prediction means), 11 loop filter unit (filtering means), 12 motion compensated prediction frame memory (prediction means), 13 variable length encoding unit (variable length encoding means), 14 slice division unit (slice division means), 31 variable length decoding unit (variable length decoding means), 32 inverse quantization and inverse transform unit (difference image generation means), 33 changeover switch, 34 intra prediction unit (prediction means), 35 motion compensation unit (prediction means), 36 adding unit (decoded image generation means), 37 intra prediction memory (prediction means), 38 loop filter unit (filtering means), 39 motion compensated prediction frame memory (prediction means), 101 block division unit, 102 prediction unit, 103 compression unit, 104 local decoding unit, 105 adder, 106 loop filter, 107 memory, 108 variable length encoding unit.

Claims (6)

1.  A moving picture encoding device comprising variable length encoding means for generating an encoded bitstream in which compressed data and an encoding mode are multiplexed,
    wherein the variable length encoding means divides an orthogonal transform block into orthogonal transform sub-blocks and switches the encoding order of the quantized transform coefficients, which are the compressed data, between units of the orthogonal transform block and units of the orthogonal transform sub-blocks, on the basis of whether or not a flag based on information indicating whether or not field coding is performed is valid.
  2.  The video encoding device according to claim 1, comprising: a slice dividing unit that divides an input image into slices, each being a partial image; an encoding control unit that determines the maximum size of a coding block serving as the processing unit when the encoding process is carried out, determines the upper limit on the number of hierarchies into which a coding block of the maximum size is hierarchically divided, and selects, from one or more available coding modes, a coding mode for each hierarchically divided coding block; a block dividing unit that divides each slice obtained by the slice dividing unit into coding blocks of the maximum size determined by the encoding control unit and hierarchically divides those coding blocks until the upper-limit number of hierarchies determined by the encoding control unit is reached; a difference image generating unit that generates a difference image between a coding block obtained by the block dividing unit and a prediction image generated by the intra prediction unit; an image compressing unit that carries out a transform process on the difference image generated by the difference image generating unit, quantizes the transform coefficients of the difference image, and outputs the quantized transform coefficients as compressed data; a local decoded image generating unit that decodes a difference image from the compressed data output from the image compressing unit and adds the decoded difference image to the prediction image generated by the prediction unit to generate a local decoded image; and a variable length encoding unit that generates an encoded bitstream,
     wherein the variable length encoding unit variable-length-encodes the compressed data output from the image compressing unit, the coding mode selected by the encoding control unit, and a flag indicating whether or not field coding is used, and generates an encoded bitstream in which the encoded data of the compressed data, the coding mode, and the flag are multiplexed.
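The hierarchical division described in claim 2, where a maximum-size coding block is recursively split until an upper-limit number of hierarchies is reached, can be sketched as a quadtree split. This is an illustrative sketch only, not the claimed implementation; the function name and the `should_split` decision callback are assumptions standing in for the coding-mode decision made by the encoding control unit.

```python
def split_coding_block(x, y, size, depth, max_depth, should_split):
    """Return a list of (x, y, size) leaf coding blocks.

    A block is split into four half-size blocks while the upper-limit
    hierarchy count (max_depth) has not been reached and the caller's
    should_split decision asks for a further split.
    """
    if depth == max_depth or not should_split(x, y, size, depth):
        return [(x, y, size)]
    half = size // 2
    blocks = []
    for dy in (0, half):          # two rows of sub-blocks
        for dx in (0, half):      # two columns of sub-blocks
            blocks.extend(split_coding_block(x + dx, y + dy, half,
                                             depth + 1, max_depth,
                                             should_split))
    return blocks
```

For example, a 64×64 maximum-size coding block with an upper limit of two hierarchies, always splitting, yields sixteen 16×16 leaf coding blocks.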
  3.  A video decoding device comprising a variable length decoding unit that variable-length-decodes the compressed data and the coding mode associated with each hierarchically divided coding block,
     wherein the variable length decoding unit divides an orthogonal transform block into orthogonal transform sub-blocks and switches the decoding order of the quantized transform coefficients, which are the compressed data, between the orthogonal transform block unit and the orthogonal transform sub-block unit, depending on whether a flag based on information indicating whether or not field coding is used is valid.
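The order switch described in claim 3, between visiting the coefficients of a whole orthogonal transform block and visiting them sub-block by sub-block, can be sketched as follows. A plain raster scan stands in for the actual coefficient scan pattern, which the claim does not fix, and all names and parameters are illustrative assumptions.

```python
def coefficient_scan_order(block_w, block_h, sub_w, sub_h, field_flag):
    """Yield (x, y) coefficient positions of an orthogonal transform block.

    When field_flag is valid, coefficients are visited sub-block by
    sub-block; otherwise the whole transform block is scanned in one
    pass. Raster order is used here purely for illustration.
    """
    if not field_flag:
        # One pass over the whole orthogonal transform block.
        for y in range(block_h):
            for x in range(block_w):
                yield (x, y)
        return
    # Sub-block by sub-block: finish one sub-block before the next.
    for sy in range(0, block_h, sub_h):
        for sx in range(0, block_w, sub_w):
            for y in range(sy, sy + sub_h):
                for x in range(sx, sx + sub_w):
                    yield (x, y)
```

Both modes visit every coefficient exactly once; only the order in which the decoder (or encoder) processes them changes with the flag.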
  4.  The video decoding device according to claim 3, comprising: a variable length decoding unit that variable-length-decodes, from encoded data multiplexed into an encoded bitstream, header information including a flag indicating whether or not field coding is used, and variable-length-decodes the compressed data and the coding mode associated with each coding block hierarchically divided from the encoded data; a prediction unit that generates a prediction image by carrying out a prediction process according to the coding mode, variable-length-decoded by the variable length decoding unit, associated with a coding block; a difference image generating unit that inverse-quantizes the transform coefficients, which are the compressed data associated with a coding block variable-length-decoded by the variable length decoding unit, and inverse-transforms the inverse-quantized transform coefficients to generate the pre-compression difference image; and a decoded image generating unit that generates a decoded image by adding the difference image generated by the difference image generating unit and the prediction image generated by the prediction unit.
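The reconstruction pipeline of claim 4 (inverse quantization, inverse transform, then addition of the prediction image) reduces to the following sketch. Uniform scalar dequantization and a caller-supplied inverse transform are simplifying assumptions; real codecs use quantization matrices and fixed transform kernels.

```python
def reconstruct_block(quantized, qstep, inverse_transform, predicted):
    """Claim-4-style reconstruction sketch.

    Dequantize the transform coefficients, inverse-transform them to
    recover the difference (residual) image, then add the prediction
    image to obtain the decoded block.
    """
    dequantized = [c * qstep for c in quantized]      # inverse quantization
    residual = inverse_transform(dequantized)         # inverse transform
    return [p + r for p, r in zip(predicted, residual)]
```

With an identity transform, quantized coefficients [2, -1, 0, 3], a quantization step of 5, and a flat prediction of 10, the decoded samples are [20, 5, 10, 25].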
  5.  A video encoding method comprising a variable length encoding step of generating an encoded bitstream in which compressed data and a coding mode are multiplexed,
     wherein the variable length encoding step divides an orthogonal transform block into orthogonal transform sub-blocks and switches the encoding order of the quantized transform coefficients, which are the compressed data, between the orthogonal transform block unit and the orthogonal transform sub-block unit, depending on whether a flag based on information indicating whether or not field coding is used is valid.
  6.  A video decoding method comprising a variable length decoding step of variable-length-decoding the compressed data and the coding mode associated with each hierarchically divided coding block,
     wherein the variable length decoding step divides an orthogonal transform block into orthogonal transform sub-blocks and switches the decoding order of the quantized transform coefficients, which are the compressed data, between the orthogonal transform block unit and the orthogonal transform sub-block unit, depending on whether a flag based on information indicating whether or not field coding is used is valid.
PCT/JP2013/005285 2012-09-28 2013-09-06 Video encoding device, video decoding device, video encoding method and video decoding method WO2014049982A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2014538129A JPWO2014049982A1 (en) 2012-09-28 2013-09-06 Moving picture encoding apparatus, moving picture decoding apparatus, moving picture encoding method, and moving picture decoding method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2012215380 2012-09-28
JP2012-215380 2012-09-28

Publications (1)

Publication Number Publication Date
WO2014049982A1 true WO2014049982A1 (en) 2014-04-03

Family

ID=50387429

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2013/005285 WO2014049982A1 (en) 2012-09-28 2013-09-06 Video encoding device, video decoding device, video encoding method and video decoding method

Country Status (2)

Country Link
JP (1) JPWO2014049982A1 (en)
WO (1) WO2014049982A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006054846A (en) * 2004-07-12 2006-02-23 Sony Corp Coding method and device, decoding method and device, and program thereof
WO2011128268A1 (en) * 2010-04-13 2011-10-20 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Probability interval partioning encoder and decoder
WO2011128303A2 (en) * 2010-04-13 2011-10-20 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Coding of significance maps and transform coefficient blocks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YUE YU ET AL.: "Adaptive Scan for Large Blocks for HEVC", JOINT COLLABORATIVE TEAM ON VIDEO CODING (JCT-VC) OF ITU-T SG16 WP3 AND ISO/IEC JTC1/SC29/WG11 JCTVC-F569_R2, ITU-T, 16 July 2011 (2011-07-16), pages 1 - 6 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019194485A1 (en) * 2018-04-01 2019-10-10 김기백 Method and apparatus for encoding/decoding image
US11297309B2 (en) 2018-04-01 2022-04-05 B1 Institute Of Image Technology, Inc. Method and apparatus for encoding/decoding image
CN115315958A (en) * 2020-03-30 2022-11-08 Kddi 株式会社 Image decoding device, image decoding method, and program

Also Published As

Publication number Publication date
JPWO2014049982A1 (en) 2016-08-22

Similar Documents

Publication Publication Date Title
JP6573689B2 (en) Image coding apparatus and image coding method
JP6226863B2 (en) Image encoding device, image decoding device, image encoding method, and image decoding method
JP6082073B2 (en) Image decoding apparatus, image decoding method, image encoding apparatus, image encoding method, and bit stream
JP6716836B2 (en) Video coded data
KR101563834B1 (en) Image decoding device, image decoding method, image encoding device, and image encoding method
KR101565228B1 (en) Image encoding apparatus, image decoding apparatus, image encoding method, image decoding method, and image prediction device
WO2014049981A1 (en) Video encoding device, video decoding device, video encoding method and video decoding method
WO2014163200A1 (en) Color image encoding apparatus, color image decoding apparatus, color image encoding method, and color image decoding method
JP2014090327A (en) Moving image encoder, moving image decoder, moving image encoding method and moving image decoding method
JP2014090326A (en) Moving image encoder, moving image decoder, moving image encoding method and moving image decoding method
WO2014049982A1 (en) Video encoding device, video decoding device, video encoding method and video decoding method
JP2014204311A (en) Color image encoding device, color image decoding device, color image encoding method and color image decoding method
WO2013108882A1 (en) Video encoding apparatus, video decoding apparatus, video encoding method, and video decoding method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13840454

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2014538129

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13840454

Country of ref document: EP

Kind code of ref document: A1