WO2010007719A1

WO2010007719A1 - Image encoding apparatus, image encoding method, image decoding apparatus, and image decoding method

Info

Publication number: WO2010007719A1
Application number: PCT/JP2009/002453
Authority: WO
Inventors: 齋藤昇平; 影山昌弘; 横山徹; 中村克行; 高橋昌史
Original assignee: 株式会社日立製作所
Priority date: 2008-07-16
Filing date: 2009-06-02
Publication date: 2010-01-21
Also published as: JPWO2010007719A1

Abstract

A reference image having a higher resolution is generated to contribute to improvement of motion prediction accuracy from the viewpoint of encoding. An image encoding apparatus is capable of encoding the difference between original image data and motion prediction image data, and generating the motion prediction image data on the basis of local decoded image data that can be acquired by decoding the encoded data. The image encoding apparatus includes a frame memory (109) that maintains the local decoded image data across a plurality of frames, and decimal pixel image data processing units (110) and (111) for generating decimal pixel precision image data by using local decoded image data of a frame to be encoded that is maintained by the frame memory and local decoded image data of the preceding frame. The image encoding apparatus executes motion detection by using the decimal pixel precision image data generated by the decimal pixel image data processing units as reference image data, and generates motion prediction image data.

Description

Image coding apparatus, image coding method, image decoding apparatus, and image decoding method

The present invention relates to a moving picture compression and decompression technique using motion compensation, and more particularly to an apparatus and method for image encoding or decoding that use a decimal (fractional) pixel accuracy image for the motion compensation.

In image encoding and decoding processes such as MPEG-2, MPEG-4, and H.264, as described in Non-Patent Document 1, the reference image used for motion detection and compensation with decimal pixel accuracy is generated by filter interpolation from adjacent pixels. Specifically, the pixel values at decimal pixel positions are obtained by filter interpolation around the pixel at which the cost function was minimized in the integer accuracy motion search. Compared with using integer pixel accuracy image data as the reference image, this improves the quality of the image reproduced by decoding.

In existing image coding standards such as MPEG-2, MPEG-4, and H.264, filter interpolation is used in the encoding and decoding processes to obtain the reference image for motion compensation with decimal pixel accuracy. MPEG-2 applies a 2-tap filter, but this simple pixel interpolation cuts high-frequency components and lowers motion prediction accuracy. MPEG-4 ASP and H.264 apply 8-tap and 6-tap filters, respectively, but the adjustment of high-frequency components is still insufficient, so improving encoding efficiency through better prediction accuracy has remained a challenge. In every case, the conventional reference image for motion compensation with decimal pixel accuracy is generated only from image data in the frame to which the macroblock being processed belongs.
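For orientation, the two conventional interpolation styles mentioned above can be sketched as follows. This is a minimal illustration, not code from the specification: the function names are hypothetical, the 2-tap case is written as a rounded average in the style of MPEG-2 half-sample prediction, the 6-tap case uses the H.264 luma half-sample taps (1, -5, 20, 20, -5, 1) / 32, and clamping indices at the row boundary is an assumption made only to keep the example self-contained.

```python
def clip8(value):
    """Clamp a value to the 8-bit sample range."""
    return max(0, min(255, value))


def half_pel_2tap(row, i):
    """Half-sample value between row[i] and row[i + 1] by rounded averaging (2-tap)."""
    return (row[i] + row[i + 1] + 1) >> 1


def half_pel_6tap(row, i):
    """Half-sample value between row[i] and row[i + 1] with a 6-tap filter.

    Taps are (1, -5, 20, 20, -5, 1) / 32 as in H.264 luma interpolation.
    Out-of-range indices are clamped to the row ends (an assumption here).
    """
    taps = (1, -5, 20, 20, -5, 1)
    last = len(row) - 1
    acc = 0
    for k, tap in enumerate(taps):
        j = min(max(i - 2 + k, 0), last)
        acc += tap * row[j]
    return clip8((acc + 16) >> 5)


if __name__ == "__main__":
    samples = [10, 10, 10, 200, 200, 200, 200, 10, 10, 10]
    print([half_pel_2tap(samples, i) for i in range(len(samples) - 1)])
    print([half_pel_6tap(samples, i) for i in range(len(samples) - 1)])
```

Both filters read only samples of a single frame, which is the limitation the preceding paragraph points out.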

The present invention has been made in view of the above problems, and an object thereof is to generate a reference image with higher resolution in an apparatus and method for image encoding and image decoding. Another object is to improve motion prediction accuracy. Yet another object is to contribute to high image quality.

The above and other objects and novel features of the present invention will be apparent from the description of this specification and the accompanying drawings.

The outline of typical inventions disclosed in the present application will be briefly described as follows.

That is, in the image data encoding technique, the difference between the original image data and the motion prediction image data is encoded, and the motion prediction image data is generated based on local decoded image data obtained by decoding the encoded data. In doing so, the local decoded image data is held for a plurality of frames, and decimal pixel accuracy image data is generated using the local decoded image data of the frame to be encoded and the local decoded image data of a previous frame. Motion detection is then performed using the decimal pixel accuracy image data as reference image data to generate the motion prediction image data.

Also, in the image data decoding technique, image data is reproduced by adding motion prediction image data to a motion prediction error obtained by decoding encoded image data separated from encoded data that was encoded by motion prediction, and the motion prediction image data is generated based on the reproduced image data. In doing so, decimal pixel accuracy image data is generated using the reproduced image data of the decoding target frame and the reproduced image data of a previous frame, both read from a frame memory that holds the reproduced image data for a plurality of frames. Motion detection is then performed using the generated decimal pixel accuracy image data as reference image data to generate the motion prediction image data.

In both encoding and decoding, the decimal pixel precision image data as the reference image data is generated by using both the image data of the processing target frame and the image data of another frame separated from the frame. Therefore, it is possible to generate decimal pixel accuracy image data with higher accuracy than in the case of generating only by interpolation.

The following is a brief description of the effects obtained by the representative inventions disclosed in the present application.

That is, a higher-resolution reference image can be generated. Thereby, it can contribute to improvement of motion prediction accuracy. Furthermore, it can contribute to high image quality.

Block diagram of an example of the configuration of the image encoding device according to the present invention.
Explanatory diagram schematically showing the decimal pixel accuracy image generation methods selected by the decimal pixel accuracy image generation method determination unit.
Flowchart illustrating the determination process in the decimal pixel accuracy image generation method determination unit.
Flowchart showing an example of the processing of the image encoding device according to the present invention.
Diagram showing an outline of the resolution enhancement technique in the image encoding device.
Block diagram of an example of the resolution enhancement processing in the image encoding device.
Diagram showing an outline of the resolution enhancement technique in the image encoding device.
Explanatory diagram of the phase relationship of the input signals in the first configuration example of the decimal pixel image generation unit in the image encoding device.
Explanatory diagram of the phase relationship of the input signals in the second configuration example of the decimal pixel image generation unit in the image encoding device.
Frequency-gain characteristic of the up-rater used in the second configuration example of the decimal pixel image generation unit in the image encoding device.
Example of the filter tap coefficients obtained by inverse Fourier transform of the frequency characteristic of the up-rater used in the second configuration example of the decimal pixel image generation unit in the image encoding device.
Frequency-gain characteristic of the π/2 phase shifter used in the second configuration example of the decimal pixel image generation unit in the image encoding device.
Filter tap coefficients obtained by inverse Fourier transform of the frequency characteristic of the π/2 phase shifter used in the second configuration example of the decimal pixel image generation unit in the image encoding device.
Example of the coefficients of the coefficient determination unit used in the second configuration example of the decimal pixel image generation unit in the image encoding device.
Explanatory diagram of the intra-screen motion search in the image encoding device.
Flowchart of an example of the resolution enhancement processing in the image encoding device.
Block diagram of an example of the configuration of the image decoding device according to the present invention.
Flowchart showing an example of the processing of the image decoding device according to the present invention.
Diagram showing an example of the bit stream according to the present invention.

DESCRIPTION OF SYMBOLS
101 ... Original image memory
102 ... Subtractor
103 ... Frequency conversion unit
104 ... Quantization unit
105 ... Variable length coding unit
106, 1503 ... Inverse quantization unit
107, 1504 ... Inverse frequency conversion unit
108, 1505 ... Adder
109, 113, 1506, 1510 ... Frame memory
110, 1507 ... Decimal pixel accuracy image generation method determination unit
111, 1508 ... Decimal pixel accuracy image generation unit
112, 1509 ... Motion detection / motion compensation unit
114, 1512 ... Intra-screen prediction unit
401 ... Position estimation unit
403, 404 ... Up-rater
406, 407 ... Phase shifter
410, 411, 412, 413 ... Multiplier
409 ... Coefficient determination unit
1501 ... Variable length decoding unit
1502 ... Syntax analysis unit
1511 ... Video display device

1. First, an outline of a typical embodiment of the invention disclosed in the present application will be described. Reference numerals in the drawings referred to in parentheses in the outline description of the representative embodiments merely exemplify what are included in the concept of the components to which the reference numerals are attached.

[1] << Image Encoding Device >> An image encoding device that encodes the difference between original image data and motion prediction image data, and that can generate the motion prediction image data based on local decoded image data obtained by decoding the encoded data, includes a frame memory (109) that holds the local decoded image data for a plurality of frames, and a decimal pixel image data processing unit (110, 111) that generates decimal pixel accuracy image data using the local decoded image data of the encoding target frame held in the frame memory and the local decoded image data of a previous frame. The image encoding device performs motion detection using the decimal pixel accuracy image data generated by the decimal pixel image data processing unit as reference image data to generate motion prediction image data.

As described above, the decimal pixel accuracy image data serving as the reference image data can be generated using both the local decoded image data of the encoding target frame and the local decoded image data of another frame separated from that frame. It is therefore possible to generate more accurate decimal pixel accuracy image data than when it is generated by interpolation alone.

Thereby, it is possible to contribute to improvement of motion prediction accuracy from the viewpoint of encoding, and it is possible to contribute to high image quality of an image obtained through encoding and decoding.

[2] In the image encoding device according to item 1, the decimal pixel image data processing unit determines whether or not the amount of motion between the local decoded image data of the encoding target frame stored in the frame memory and the local decoded image data of a previous frame within a predetermined range before the encoding target frame has decimal pixel accuracy. When a determination result of decimal pixel accuracy is obtained, the decimal pixel accuracy image data is generated using the local decoded image data of that previous frame and the local decoded image data of the encoding target frame. When a determination result of decimal pixel accuracy is not obtained, the decimal pixel accuracy image data is generated by an interpolation operation on the local decoded image data of the encoding target frame.

As a result, when the amount of motion has integer pixel accuracy (for example, for a still image), generating the decimal pixel accuracy image data by the former processing gives substantially the same result as the latter. Selecting the latter interpolation processing in this case eliminates useless data processing and reduces the processing amount and processing time. In addition, the further the previous frame used in the former processing is from the encoding target frame, the less improvement in accuracy or image quality of the decimal pixel accuracy image data can be expected. For this reason, if no frame whose motion amount has decimal pixel accuracy is found within a range of a predetermined number of frames from the encoding target frame, the determination is not carried out for frames further back, and the processing by interpolation is selected. Wasteful processing is thereby reduced as much as possible.

[3] In the image encoding device according to item 2, the decimal pixel image data processing unit limits the frames subject to the determination of decimal pixel accuracy to I pictures (Intra-Picture), which are screens encoded using only information within the screen without motion prediction in the time direction, and P pictures (Predictive-Picture), which are screens obtained by forward predictive coding between screens. B pictures (Bi-directional Predictive-Picture), which are obtained by predictive coding from both the past and the future, are excluded. Thereby, it is possible to generate decimal pixel accuracy image data with higher accuracy or higher image quality.

[4] In the image encoding device according to item 3, the decimal pixel image data processing unit determines the frame form and whether or not the motion amount has decimal pixel accuracy for the previous frames within the predetermined range in order, starting from the previous frame closest to the encoding target frame. This is because the closer the frame whose local decoded image data is used is to the encoding target frame, the higher the image quality (accuracy) of the decimal pixel accuracy image data that can be generated.

[5] In the image encoding device according to item 1, the decimal pixel image generation unit, for example, performs phase shift processing on each of a plurality of image signals of the image data to generate a plurality of new image signals, multiplies the plurality of image signals and the plurality of new image signals by coefficients, and combines them to generate the decimal pixel accuracy image data.

[6] << Image coding method >> The image coding method includes the following processes (a) to (l).
(a) Read processing for reading motion prediction image data from the prediction image memory.
(b) Difference processing for calculating the difference between the read motion prediction image data and input image data as prediction error data.
(c) Frequency conversion processing for frequency-converting the prediction error data calculated in the difference processing.
(d) Quantization processing for quantizing the data frequency-converted in the frequency conversion processing.
(e) Variable-length encoding processing for variable-length encoding the data quantized in the quantization processing and generating an encoded stream.
(f) Inverse quantization processing for inverse-quantizing the data quantized in the quantization processing.
(g) Inverse frequency conversion processing for reproducing the prediction error data by inverse frequency conversion of the data inverse-quantized in the inverse quantization processing.
(h) Addition processing for calculating local decoded image data by adding the prediction error data reproduced in the inverse frequency conversion processing and the motion prediction image data.
(i) Processing for storing the local decoded image data calculated in the addition processing in a frame memory.
(j) Decimal pixel image data processing for generating decimal pixel accuracy image data using the local decoded image data of the encoding target frame held in the frame memory and the local decoded image data of a previous frame.
(k) Motion detection / motion compensation processing for generating a predicted image by performing motion detection using the decimal pixel accuracy image data generated in the decimal pixel image data processing as reference image data.
(l) Write processing for writing the motion prediction image data generated in the motion detection / motion compensation processing to the prediction image memory.
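The role of the local decoding steps (f) to (h) can be illustrated with a small sketch. It is a simplification under stated assumptions: the frequency conversion of steps (c) and (g) is omitted, quantization is plain uniform scalar quantization with a hypothetical step size, and the names `encode_block` and `decode_block` are not from the specification. The point it shows is that the encoder reconstructs exactly what a decoder would reconstruct, so that later predictions on both sides are built from the same data.

```python
def encode_block(original, prediction, step):
    """Return (levels, local_decoded) for one block given as a flat list of samples."""
    # (b) difference between input image data and motion prediction image data
    residual = [o - p for o, p in zip(original, prediction)]
    # (d) quantization (frequency conversion (c) is omitted in this sketch)
    levels = [round(r / step) for r in residual]
    # (f), (g) inverse quantization reproduces an approximate prediction error
    reproduced_error = [q * step for q in levels]
    # (h) addition yields the local decoded image data kept in the frame memory
    local_decoded = [p + e for p, e in zip(prediction, reproduced_error)]
    return levels, local_decoded


def decode_block(levels, prediction, step):
    """Decoder-side reconstruction from the same quantized levels."""
    return [p + q * step for p, q in zip(prediction, levels)]


if __name__ == "__main__":
    original = [100, 104, 97, 120]
    prediction = [98, 100, 100, 110]
    levels, local_decoded = encode_block(original, prediction, step=4)
    print(levels, local_decoded)
    print(decode_block(levels, prediction, step=4) == local_decoded)
```

Because the decimal pixel accuracy reference is generated from this local decoded data rather than from the original image, a decoder holding the same reproduced frames can regenerate the same reference.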

[7] << Image decoding apparatus >> An image decoding device that receives encoded data encoded by motion prediction, separates it into encoded image data and additional information, reproduces image data by adding motion prediction image data to the motion prediction error obtained by decoding the separated encoded image data, and can generate the motion prediction image data based on the reproduced image data, includes a frame memory (1506) that holds the reproduced image data for a plurality of frames, and a decimal pixel image data processing unit (1507, 1508) that generates decimal pixel accuracy image data using the reproduced image data of the decoding target frame and the reproduced image data of a previous frame held in the frame memory. The image decoding device performs motion detection using the decimal pixel accuracy image data generated by the decimal pixel image data processing unit as reference image data to generate motion prediction image data.

As described above, the decimal pixel precision image data as the reference image data can be generated using both the reproduced image data of the decoding target frame and the reproduced image data of another frame separated from the frame. Therefore, it is possible to generate more accurate decimal pixel accuracy image data than in the case of generating only by interpolation.

This can contribute to improvement of motion prediction accuracy from the viewpoint of decoding, and can further contribute to high image quality of an image obtained through coding and decoding.

[8] In the image decoding device according to item 7, the decimal pixel image data processing unit determines whether or not the amount of motion between the reproduced image data of the decoding target frame stored in the frame memory and the reproduced image data of a previous frame within a predetermined range before the decoding target frame has decimal pixel accuracy. When a determination result of decimal pixel accuracy is obtained, the decimal pixel accuracy image data is generated using the reproduced image data of that previous frame and the reproduced image data of the decoding target frame. When a determination result of decimal pixel accuracy is not obtained, the decimal pixel accuracy image data is generated by an interpolation operation on the reproduced image data of the decoding target frame. As a result, useless data processing is reduced, which contributes to a reduction in processing amount and processing time.

[9] In the image decoding device according to item 8, the decimal pixel image data processing unit limits the frames subject to the determination of decimal pixel accuracy to frames in the form of an I picture or a P picture. This makes it possible to generate decimal pixel accuracy image data with higher accuracy or higher image quality.

[10] In the image decoding device according to item 9, the decimal pixel image data processing unit determines the frame form and whether or not the motion amount has decimal pixel accuracy for the previous frames within the predetermined range in order, starting from the previous frame closest to the decoding target frame. This is because the closer the frame whose reproduced image data is used is to the decoding target frame, the higher the image quality (accuracy) of the decimal pixel accuracy image data that can be generated.

[11] In the image decoding device according to item 7, the decimal pixel image generation unit, for example, performs phase shift processing on each of a plurality of image signals of the image data to generate a plurality of new image signals, multiplies the plurality of image signals and the plurality of new image signals by coefficients, and combines them to generate the decimal pixel accuracy image data.

[12] << Decoding Method >> The image decoding method includes the following processes (a) to (i).
(a) Variable-length decoding processing for receiving an encoded stream made up of encoded data encoded by motion prediction and variable-length decoding it.
(b) Processing for separating the variable-length decoded encoded data into encoded image data and additional information.
(c) Inverse quantization processing for inverse-quantizing the separated encoded image data.
(d) Inverse frequency conversion processing for reproducing the motion prediction error by inverse frequency conversion of the data inverse-quantized in the inverse quantization processing.
(e) Addition processing for reproducing image data by adding the motion prediction image data in the prediction image memory to the motion prediction error reproduced in the inverse frequency conversion processing.
(f) Reproduced image data write processing for writing the reproduced image data reproduced in the addition processing to a frame memory.
(g) Decimal pixel image data processing for generating decimal pixel accuracy image data using the reproduced image data of the decoding target frame and the reproduced image data of a previous frame held in the frame memory.
(h) Motion compensation processing for generating motion prediction image data using the decimal pixel accuracy image data generated in the decimal pixel image data processing and the additional information.
(i) Write processing for writing the motion prediction image data generated in the motion compensation processing into the prediction image memory.

[13] In the image decoding method according to item 12, in the decimal pixel image data processing, a target of determination as to whether or not the decimal pixel accuracy is accurate is limited to a frame form of an I picture or a P picture.

[14] In the image decoding method according to item 13, the decimal pixel image data processing determines the frame form and whether or not the motion amount has decimal pixel accuracy for the previous frames within the predetermined range in order, starting from the previous frame closest to the decoding target frame.

[15] In the image decoding method according to item 12, the decimal pixel image data processing performs phase shift processing on each of the plurality of image signals of the image data to generate a plurality of new image signals, A plurality of image signals and a new plurality of image signals are multiplied and combined to generate decimal pixel precision image data.

2. Details of Embodiments The embodiments for carrying out the present invention will now be described in further detail with reference to the drawings. Note that components having the same function are denoted by the same reference symbols throughout the drawings for describing the embodiments, and repetitive description thereof is omitted.

<< Image Encoding Device and Image Encoding Method >>
FIG. 1 shows an example of an image encoding device according to the present invention. Reference numeral 101 denotes an original image memory that stores input image data. Reference numeral 102 denotes a subtractor that takes the difference between the input image data output from the original image memory 101 and the predicted image data output from the frame memory 113. Reference numeral 103 denotes a frequency conversion unit that converts data such as the difference image between the original image data and the predicted image, calculated by the subtractor 102, into the spatial frequency domain. Reference numeral 104 denotes a quantization unit that quantizes the data frequency-converted by the frequency conversion unit 103. Reference numeral 105 denotes a variable length coding unit that performs variable length coding on the data quantized by the quantization unit 104. Reference numeral 106 denotes an inverse quantization unit that inversely quantizes the data quantized by the quantization unit 104. Reference numeral 107 denotes an inverse frequency conversion unit that performs inverse frequency conversion on the data inversely quantized by the inverse quantization unit 106. Reference numeral 108 denotes an adder that adds the predicted image data stored in the frame memory 113 to the data subjected to inverse frequency conversion by the inverse frequency conversion unit 107. Reference numeral 109 denotes a frame memory that stores the data (local decoded image data) obtained by the addition in the adder 108. Reference numeral 110 denotes a decimal pixel accuracy image generation method determination unit, and 111 denotes a decimal pixel accuracy image generation unit.

The decimal pixel accuracy image generation method determination unit 110 and the decimal pixel accuracy image generation unit 111 together constitute a decimal pixel image data processing unit that can generate decimal pixel accuracy image data using the local decoded image data of the encoding target frame and the local decoded image data of a previous frame held in the frame memory 109. The decimal pixel accuracy image generation method determination unit 110 determines the generation method, and the decimal pixel accuracy image generation unit 111 generates the decimal pixel accuracy image data from the local decoded image data according to the determined generation method. Details thereof will be described later.

112 is a motion detection / compensation unit that detects an image close to the original image by motion detection using the decimal pixel accuracy image generated by the decimal pixel accuracy image generation unit 111 as a reference image, and generates predicted image data. A frame memory 113 stores an image (predicted image data) generated by the motion detection / compensation unit 112. Reference numeral 114 denotes an intra-screen prediction unit that generates a predicted image using data in a frame from local decoded image data stored in the frame memory 109.

The decimal pixel accuracy image generation methods determined by the decimal pixel accuracy image generation method determination unit 110 are roughly divided into A and B in FIG. In the first method, shown in A, decimal pixel accuracy image data is generated using the local decoded image data of two different frames, namely the encoding target frame and a previous frame. For the generation, the resolution of the image blocks (for example, macroblocks) of the two frames is increased by the super-resolution process described later. In the second method, shown in B, decimal pixel accuracy image data is generated using the local decoded image data of a single frame, the encoding target frame. For the generation, the resolution of an image block (for example, a macroblock) of the one frame is increased by an interpolation operation. Because the first method can use both the local decoded image data of the encoding target frame and the local decoded image data of another frame separated from that frame, it can generate more accurate decimal pixel accuracy image data than the second method, which relies on interpolation alone. The second method, on the other hand, requires less computation to generate the decimal pixel accuracy image data, because it processes less data than the first method.

Whether the first method or the second method is used is decided, for example, by determining whether or not the amount of motion between the local decoded image data of the encoding target frame stored in the frame memory and the local decoded image data of a previous frame within a predetermined range before it has decimal pixel accuracy. The first method is selected when a determination result of decimal pixel accuracy is obtained, and the second method is selected when it is not.

The previous frame candidate is selected as follows.
(1) Selection candidates are limited to frames that are I pictures (Intra-Picture), which are screens encoded using only information within the screen without motion prediction in the time direction, or P pictures (Predictive-Picture), which are screens obtained by forward predictive coding between screens. B pictures (Bi-directional Predictive-Picture), which are obtained by predictive coding from both the past and the future, are excluded. Thereby, it is possible to generate decimal pixel accuracy image data with higher accuracy or higher image quality.
(2) Of the frames satisfying condition (1), the picture closest to the encoding target frame is set as the candidate frame. This is because the closer the frame whose local decoded image data is used is to the encoding target frame, the higher the image quality (accuracy) of the decimal pixel accuracy image data that can be generated.
(3) When the motion detection result between the frame satisfying conditions (1) and (2) and the encoding target frame is a motion amount with integer pixel accuracy in both the vertical and horizontal directions, the next nearest frame that is an I or P picture is set as the candidate. This avoids the problem that super-resolution processing cannot be performed with a motion amount of integer pixel accuracy.
(4) If no frame satisfying (3) is found even after tracing back through the past frames in a predetermined range, the selection of candidates is terminated. In this case, the second method, using the normal interpolation enlargement process, is selected. The further the previous frame used in the first method is from the encoding target frame, the less improvement in accuracy or image quality of the decimal pixel accuracy image data can be expected. For this reason, if no frame whose motion amount has decimal pixel accuracy is found within a range of a predetermined number of frames from the encoding target frame, the determination is not carried out for frames further back, and the processing by interpolation is selected, so that useless processing is reduced as much as possible. Limiting the range of the number of frames in this way also reduces the number of frames that must be stored in the frame memory and therefore the hardware resources constituting the frame memory, which lowers cost.

In FIG. 1, the decimal pixel accuracy image generation method determination unit 110 determines the first method or the second method by the frame selection unit 120, the motion detection unit 121, and the determination unit 122. The frame selection unit 120 reads local decoded image data of a predetermined frame from the frame memory 109 according to a predetermined procedure. The motion detection unit 121 performs motion detection on local decoded image data of two frames read by the frame selection unit. The determination unit 122 selects the first method or the second method as described above according to the motion detection result or the like.

FIG. 3 illustrates a flowchart of the determination processing in the decimal pixel accuracy image generation method determination unit 110. The selection of the previous frame starts from the frame immediately before the encoding target frame (130). It is determined whether the selected previous frame is separated from the encoding target frame by a predetermined number of frames (131). If it is, the second method is selected, and an instruction to generate fractional pixel accuracy image data by the second method, together with the necessary data, is provided to the decimal pixel precision image generation unit 111 (132).

If the selected previous frame is not the predetermined number of frames away, it is determined whether the previous frame is a B picture. If it is a B picture, processing returns to step 131. If it is not a B picture, motion detection is performed between the previous frame and the encoding target frame (134). If the motion detection shows that the motion amount has integer pixel accuracy, processing again returns to step 131. If the motion amount does not have integer pixel accuracy, an instruction to generate decimal pixel accuracy image data by the first method and the necessary data (local decoded image data of the encoding target frame, local decoded image data of the selected previous frame, and the image data and motion vector information of the motion detection result) are provided to the decimal pixel precision image generation unit 111 (136).
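The determination flow of FIG. 3 can be sketched roughly as follows. This is a minimal illustration, not the implementation of the determination unit 110: the names `Frame`, `choose_method`, `detect_motion`, and `max_distance` are hypothetical, the motion amount is assumed to be a pair of pixel displacements, and "separated by a predetermined number of frames" is assumed to mean a distance greater than `max_distance`.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# (horizontal, vertical) motion in pixels; a hypothetical representation.
Motion = Tuple[float, float]


@dataclass
class Frame:
    index: int         # hypothetical frame identifier
    picture_type: str  # "I", "P", or "B"


def is_integer_motion(motion: Motion, tol: float = 1e-6) -> bool:
    """True when both motion components are whole numbers of pixels."""
    return all(abs(c - round(c)) <= tol for c in motion)


def choose_method(
    previous_frames: Sequence[Frame],
    detect_motion: Callable[[Frame], Motion],
    max_distance: int,
) -> Tuple[str, Optional[Frame], Optional[Motion]]:
    """previous_frames lists already-coded frames, nearest to the target first."""
    for distance, frame in enumerate(previous_frames, start=1):
        if distance > max_distance:      # step 131: too far from the target frame
            break
        if frame.picture_type == "B":    # B pictures are not candidates
            continue
        motion = detect_motion(frame)    # step 134: motion detection
        if is_integer_motion(motion):    # integer motion cannot be super-resolved
            continue
        return "first", frame, motion    # step 136: multi-frame generation
    return "second", None, None          # step 132: interpolation fallback
```

Because the same function can run on the decoding side with the same inputs, both sides would arrive at the same method without extra signalling, provided the motion detection itself is reproducible.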

FIG. 4 shows the overall flow of the encoding process, which is described below. First, the encoding apparatus stores input image data in the original image memory 101 (201). Examples of input image data include digital signals such as RGB signals and Y, Cb, and Cr signals. The input image may be stored in the original image memory 101 one frame at a time, or may be divided into a plurality of pixel blocks and stored in units of those pixel blocks. Next, the difference between the original image data read from the original image memory 101 and the predicted image data is calculated (202). If there is no difference between the original image data and the predicted image data, the encoding process is terminated (203). At this time, if information indicating that there is no difference is added to the stream, the processing on the decoding side can be simplified. When there is a difference between the original image data and the predicted image data, the difference image calculated by the subtracter 102 is converted into the frequency domain by the frequency conversion unit 103 using, for example, the discrete cosine transform (DCT). Transforms other than the DCT, such as the Hadamard transform and the Fourier transform, may also be used. When a plurality of frequency conversions are used, information identifying the type of frequency conversion may be added to the stream. In addition, the block size of the frequency conversion may have the same vertical and horizontal sizes, such as 8 × 8 pixels, or different vertical and horizontal sizes, such as 16 × 8 pixels. The block size information for the frequency conversion may be added to the stream.

The data frequency-converted by the frequency conversion unit 103 is quantized by the quantization unit 106 (205). For the quantization process, a method based on a conventional moving image coding standard may be used, or a new quantization step may be determined. When a new quantization step is determined, quantization step information may be added to the stream. The data quantized by the quantization unit 106 is encoded by the variable length encoding unit 105. As the variable-length coding method, a method adopted in conventional coding standards, such as CABAC (Context-Adaptive Binary Arithmetic Coding) or CAVLC (Context-Adaptive Variable Length Coding), may be used, or a new method may be created. In the latter case, the code table information is added to the encoded stream.
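As a rough illustration of the frequency conversion and quantization steps, the following sketch applies a one-dimensional orthonormal DCT and a uniform quantizer to a small residual and then reverses both, as the encoder does on its local decoding path. The one-dimensional block, the uniform step size, and the function names are assumptions made for brevity; actual codecs use two-dimensional integer transforms and standard-specific quantizers.

```python
import math


def dct(block):
    """Orthonormal 1-D DCT-II of a list of samples."""
    n = len(block)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(
            x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(block)))
    return out


def idct(coeffs):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out


def quantize(coeffs, step):
    """Uniform quantization to integer levels (the lossy step)."""
    return [round(c / step) for c in coeffs]


def dequantize(levels, step):
    """Map integer levels back to coefficient values."""
    return [level * step for level in levels]


residual = [4, 6, 5, 3, -2, -4, -3, 1]   # made-up difference samples
step = 2.0                               # assumed quantization step
levels = quantize(dct(residual), step)
reconstructed = idct(dequantize(levels, step))
max_error = max(abs(a - b) for a, b in zip(residual, reconstructed))
```

Only `levels` would be entropy-coded; `reconstructed` corresponds to the locally decoded residual, and `max_error` shows the loss introduced by the chosen step.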

Next, inverse quantization is performed by the inverse quantization unit 106 (206). As the inverse quantization method, a method based on a conventional video coding standard may be used. The data calculated by the inverse quantization unit 106 is subjected to inverse frequency conversion by the inverse frequency conversion unit 107 (207). The inverse frequency conversion unit 107 performs the inverse transform from the frequency domain to the spatial domain using the block size and the type of frequency conversion used by the frequency conversion unit 103. The inverse frequency converted data and the data stored in the frame memory 113 are added, and the result is stored in the frame memory 109. Next, as described above, the frame selection unit 120 selects data stored in the frame memory 109, and the motion detection unit 110 performs pixel-by-pixel motion detection processing on the selected frame data and the encoding target frame data to determine the decimal pixel precision image generation method (208). The motion detection process may use the block matching method used in conventional encoding processes, or may be performed on a pixel-by-pixel basis in order to improve the accuracy of the decimal pixel image. At this time, adding the motion vector information to the encoded stream would make the amount of data enormous. The amount of data can therefore be reduced by performing the same motion detection process on the decoding side as on the encoding side. In that case, a table defining the motion detection methods is prepared on the encoding side, and the table number may be added to the stream. On the other hand, when the motion vector information is sent, it is not necessary to perform motion detection on the decoding side or to add the motion detection method to the stream. Therefore, by letting the user decide whether to reduce the amount of data by adding the motion detection method, and by adding that decision information to the stream, an encoding process appropriate to the processing performance of the hardware or the like can be performed. The decimal pixel accuracy image generation unit 111 generates image data with decimal pixel accuracy using the motion vector detected by the decimal pixel accuracy image generation method determination unit 110 and a plurality of image data stored in the frame memory 109 (209). Details of the method of generating image data with decimal pixel accuracy according to the first method will be described later. The second method, based on interpolation calculation, is the same as the calculation method used in the fractional pixel accuracy motion search known from MPEG-4, H.264 / AVC, and the like, so a detailed description is omitted.

Next, motion detection / compensation processing is performed using the original image data and the decimal pixel image data generated by the decimal pixel image generation unit 111 to generate a predicted image (210). In the motion detection / compensation processing (210), a motion vector with decimal pixel accuracy may be calculated using the block matching method used in conventional encoding processing. At this time, adding motion vector information with decimal pixel accuracy to the encoded stream would make the amount of data enormous, so the amount of data can be reduced by performing the same motion detection processing on the decoding side as on the encoding side. The above processing is repeated until all the blocks in the frame of the input video have been processed (211).
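Block matching against a reference held at decimal pixel precision might look like the sketch below. It assumes the reference image is stored at twice the pixel density (so motion vectors are in half-pixel units), uses the sum of absolute differences as the cost, and performs an exhaustive search; the function name `block_match` and its parameters are illustrative and not taken from the specification.

```python
def block_match(ref_hi, cur, cx, cy, size, search, scale=2):
    """Full search over a reference stored at `scale` times the pixel density.

    Returns (cost, mvx, mvy) with the motion vector in 1/scale pixel units.
    """
    height, width = len(ref_hi), len(ref_hi[0])
    last = (size - 1) * scale
    best = None
    for mvy in range(-search, search + 1):
        for mvx in range(-search, search + 1):
            ox, oy = cx * scale + mvx, cy * scale + mvy
            if ox < 0 or oy < 0 or ox + last >= width or oy + last >= height:
                continue  # candidate block falls outside the reference
            # Sum of absolute differences, sampling the reference every `scale` samples.
            cost = sum(
                abs(ref_hi[oy + dy * scale][ox + dx * scale] - cur[cy + dy][cx + dx])
                for dy in range(size)
                for dx in range(size)
            )
            if best is None or cost < best[0]:
                best = (cost, mvx, mvy)
    return best


# Synthetic data: the current frame equals the reference shifted by half a pixel horizontally.
ref_hi = [[x + 100 * y for x in range(16)] for y in range(16)]
cur = [[ref_hi[2 * y][2 * x + 1] for x in range(8)] for y in range(8)]
best = block_match(ref_hi, cur, cx=2, cy=2, size=2, search=2)
```

In this synthetic case the search finds a zero-cost match one half-pixel step to the right, which an integer-precision reference could not represent.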

Here, an outline of a case where, for example, a method described in Japanese Patent Application Laid-Open No. 2007-324789 is applied as a method for generating image data with decimal pixel accuracy by the first method will be described.

FIG. 5 shows an outline of the generation processing of the decimal pixel accuracy image. The decimal pixel precision image generation unit 111 aligns a plurality of image data 301 stored in the frame memory 109 by using the motion vectors between the plurality of image data 301 detected by the motion detection unit 110, multiplies the pixel values by predetermined coefficients, and synthesizes each pixel of the aligned images, thereby generating a decimal pixel precision image (also referred to as a high resolution image) 302.

FIG. 16 is a flowchart showing the flow of processing of the decimal pixel accuracy image generation unit 111. The decimal pixel accuracy image generation unit 111 generates a high resolution image by three processes, for example, (1) position estimation, (2) wideband interpolation, and (3) weighted sum. Here, (1) position estimation estimates the difference in sampling phase (sampling position) between the image data of a plurality of input image frames (1401, 1402). (2) Wideband interpolation increases the density of each image data by interpolating to increase the number of pixels (sampling points), using a wide-band low-pass filter that transmits all high-frequency components of the original signal, including aliasing components (1403). (3) The weighted sum takes a weighted sum with coefficients corresponding to the sampling phase of each densified data, thereby canceling out the aliasing components generated during pixel sampling and, at the same time, restoring the high-frequency components of the original signal (1404).

FIG. 7 shows an overview of this high-resolution image generation technology. As shown in FIG. 7A, assume that frame #1 (501), frame #2 (502), and frame #3 (503) on different time axes are input and synthesized to obtain an output frame (506). For simplicity, first consider the case where the subject has moved (504) in the horizontal direction, and consider generating a fractional pixel image by one-dimensional signal processing on the horizontal line (505). At this time, as shown in FIG. 7B and FIG. 7D, the position of the signal waveform in frame #2 (502) and frame #1 (501) is displaced according to the amount of movement (504) of the subject. This positional deviation is obtained by (1) position estimation; as shown in FIG. 7C, frame #2 (502) is motion-compensated (507) so that the positional deviation is eliminated, and the phase difference θ (511) between the sampling phases (509) and (510) of the pixels (508) of each frame is obtained. By performing the above (2) wideband interpolation and (3) weighted sum on the basis of this phase difference θ (511), a new pixel (512) is generated at the position exactly midway between the original pixels (508) (phase difference θ = π), as shown in FIG. 7E, and sub-pixel image generation is thereby realized. The weighted sum of (3) will be described later. In practice, the movement of the subject may involve rotation, enlargement, and reduction as well as parallel movement, but if the time interval between frames is very small or the subject moves slowly, these movements can also be handled by approximating them as local parallel movement.

As the first configuration example of the decimal pixel precision image generation unit 111, the techniques described in Reference Document 1 (Japanese Patent Laid-Open No. 8-336046), Reference Document 2 (Japanese Patent Laid-Open No. 9-69755), and Reference Document 3 (Shin Aoki, "Super-resolution processing by multiple digital image data", Ricoh Technical Report, No. 24, pp. 19-25, November 1998) can be used. In the first configuration example of the sub-pixel image generation unit 111, when the weighted sum of (3) above is performed using the signals of at least three frame images, as shown in FIG. 8, an image with twice the resolution in the one-dimensional direction can be generated.

Here, the decimal pixel accuracy image generation processing in the first configuration example of the decimal pixel accuracy image generation unit 111 will be described with reference to FIG. 8. FIG. 8 is a diagram showing the frequency spectrum of each component in the one-dimensional frequency domain. In the figure, the distance from the frequency axis represents the signal intensity, and the rotation angle around the frequency axis represents the phase. The weighted sum of (3) above will be described in detail below.

When pixel interpolation is performed in the wideband interpolation of (2) above with a wide-band low-pass filter that transmits twice the Nyquist frequency band (the frequency band from 0 to the sampling frequency fs), the result is the sum of a component identical to the original signal (hereinafter referred to as the original component) and an aliasing component that depends on the sampling phase. It is well known that, when the wideband interpolation of (2) is performed on the signals of three frame images, the phases of the original components (601), (602), and (603) of the frames all match, as shown in FIG. 8A, while the phases of the aliasing components (604), (605), and (606) rotate in accordance with the sampling phase difference of each frame. To make the respective phase relationships easier to understand, the phase relationship of the original components of each frame is shown in FIG. 8B, and the phase relationship of the aliasing components of each frame is shown in FIG. 8C.

Here, by appropriately selecting the coefficients to be multiplied and performing the weighted sum of (3) above on the signals of the three frame images, the aliasing components (604), (605), and (606) of the frames cancel each other out and are removed, so that only the original components are extracted. For the vector sum of the aliasing components (604), (605), and (606) of the frames to be 0, that is, for both the Re axis (real axis) component and the Im axis (imaginary axis) component to be 0, at least three aliasing components are required. Therefore, by using the signals of at least three frame images, it is possible to generate a fractional pixel image of twice the resolution, that is, to remove one aliasing component.
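The three-frame cancellation condition can be illustrated numerically. The sketch below assumes that frame #1 is the phase reference, that the original components of all frames have phase 0, and that the aliasing component of a frame with sampling phase difference θ is represented by the unit phasor e^(jθ); the weights are then the solution of three real equations (weights sum to 1, and the Re and Im parts of the weighted aliasing sum are 0). The function names are hypothetical.

```python
import cmath
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def three_frame_weights(theta2, theta3):
    """Weights for frames #1..#3 that keep the original and cancel the aliasing."""
    thetas = [0.0, theta2, theta3]          # frame #1 is the phase reference
    a = [[1.0, 1.0, 1.0],                   # original components add up to 1
         [math.cos(t) for t in thetas],     # Re part of the aliasing sum is 0
         [math.sin(t) for t in thetas]]     # Im part of the aliasing sum is 0
    b = [1.0, 0.0, 0.0]
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("sampling phases do not allow the aliasing to be cancelled")
    weights = []
    for col in range(3):                    # Cramer's rule
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        weights.append(det3(m) / d)
    return weights


weights = three_frame_weights(2 * math.pi / 3, 4 * math.pi / 3)
alias_sum = sum(w * cmath.exp(1j * t)
                for w, t in zip(weights, [0.0, 2 * math.pi / 3, 4 * math.pi / 3]))
```

With sampling phases spread evenly at 0, 2π/3, and 4π/3, the three weights come out equal, and the system becomes singular when two frames share the same sampling phase.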

As described above, in the first configuration example, it is possible to generate a high-precision decimal pixel using the image signals of three frames.

Next, a second configuration example of the decimal pixel precision image generation unit 111 is shown in FIG. 6. In the second configuration example of the decimal pixel accuracy image generation unit 111, a fractional pixel image with twice the resolution in the one-dimensional direction can be generated by using the signals of at least two frame images. Details will be described below.

First, a plurality of frames, that is, a frame to be encoded and a frame that has been encoded in the past, are input from the frame memory 109 to the input unit 400.

Next, the position estimation unit 401 estimates the position of the corresponding pixel on frame #2 with reference to the sampling phase (sampling position) of the pixel to be processed on frame #1 input to the input unit 400, and obtains the sampling phase difference θ (402). The up-raters 403 and 404 of the motion compensation / up-rate unit 415 then use the information of the phase difference θ (402) to motion-compensate frame #2 so that it is aligned with frame #1, and double the number of pixels of frame #1 and frame #2 to increase their density. The phase shift unit 416 shifts the phase of the densified data by a certain amount. Here, π/2 phase shifters 406 and 408 can be used as the means for shifting the data phase by a certain amount. To compensate for the delay caused by the π/2 phase shifters 406 and 408, the densified signals of frame #1 and frame #2 are delayed by the delay units 405 and 407. In the aliasing component removal unit 417, the output signals of the delay units 405 and 407 and the π/2 phase shifters 406 and 408 are multiplied by the coefficients C0, C2, C1, and C3, respectively, which are generated by the coefficient determiner 409 on the basis of the phase difference θ (402), by the multipliers 410, 411, 412, and 413, and these signals are added by the adder 414 to obtain an output. This output is output from the output unit 418.

Note that the position estimation unit 401 can be realized using the above-described conventional technique as it is. Details of the up-raters 403 and 404, the π / 2 phase shifters 406 and 408, and the aliasing component removing unit 417 will be described later.

FIG. 9 shows the operation of the second configuration example of the decimal pixel precision image generation unit 111. The figure shows the outputs of the delay units 405 and 407 and the π/2 phase shifters 406 and 408 shown in FIG. 6 in the one-dimensional frequency domain. In FIG. 9A, the up-rated signals of frame #1 and frame #2 output from the delay units 405 and 407 are the sums of the original components 701 and 702 and the aliasing components 705 and 706 folded back from the original sampling frequency (fs), respectively. Here, the phase of the aliasing component 706 is rotated by the above-described phase difference θ (402). On the other hand, the up-rated signals of frame #1 and frame #2 output from the π/2 phase shifters 406 and 408 are the sums of the original components 703 and 704 after the π/2 phase shift and the aliasing components 707 and 708 after the π/2 phase shift, respectively. To make the phase relationships of the components shown in FIG. 9A easier to understand, the original components and the aliasing components are extracted and shown in FIG. 9B and FIG. 9C, respectively. If the coefficients to be multiplied by each component are determined so that the vector sum of the four components shown in FIG. 9B has a Re-axis component of 1 and an Im-axis component of 0, and the vector sum of the four components shown in FIG. 9C has Re-axis and Im-axis components of 0, then taking the weighted sum cancels the aliasing components and extracts only the original components. That is, a high-resolution image with twice the resolution in the one-dimensional direction can be generated using only two frame images. Details of this coefficient determination method will be described later.

The operation of the up-raters 403 and 404 used in the second configuration example of the decimal pixel accuracy image generation unit 111 will be described with reference to FIGS. 10 and 11. In FIG. 10, the horizontal axis represents frequency and the vertical axis represents gain (the ratio of the output signal amplitude to the input signal amplitude), showing the “frequency-gain” characteristics of the up-raters 403 and 404. In the up-raters 403 and 404, a frequency (2fs) twice the sampling frequency (fs) of the original signal is set as the new sampling frequency, a sampling point for a new pixel (= zero point) is inserted exactly midway between the original pixels to double the number of pixels and increase the density, and a filter that passes all frequencies between −fs and +fs with a gain of 2.0 is applied. As shown in the figure, the frequency-gain characteristic then repeats at every integral multiple of 2fs owing to the symmetry of the digital signal.

FIG. 11 shows the filter tap coefficients obtained by the inverse Fourier transform of the frequency characteristics shown in FIG. 10. Each tap coefficient Ck (where k is an integer) is the generally known sinc function shifted by (−θ) to compensate for the sampling phase difference θ (402), that is, Ck = 2sin(πk + θ) / (πk + θ). In the up-rater 403, the phase difference θ (402) is set to 0, so that Ck = 2sin(πk) / (πk). Further, by expressing the phase difference θ (402) as the sum of a phase difference in integer pixel units (2π) and a phase difference in decimal pixel units, the phase difference in integer pixel units may be compensated by a simple pixel shift, and the filters of the up-raters 403 and 404 may be used to compensate for the phase difference in decimal pixel units.
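The tap formula above can be evaluated directly. The following sketch computes a finite set of up-rater taps Ck = 2sin(πk + θ) / (πk + θ); the function name `uprater_taps`, the symmetric tap range, and the handling of the removable singularity at πk + θ = 0 (where the limit 2 is used) are assumptions of this example.

```python
import math


def uprater_taps(theta, half_length):
    """Taps Ck for k = -half_length..half_length on the doubled sampling grid."""
    taps = {}
    for k in range(-half_length, half_length + 1):
        x = math.pi * k + theta
        # sin(x)/x tends to 1 as x tends to 0, so the tap tends to 2.
        taps[k] = 2.0 if abs(x) < 1e-12 else 2.0 * math.sin(x) / x
    return taps


taps_frame1 = uprater_taps(0.0, 4)       # theta = 0: only C0 is non-zero
taps_half_pel = uprater_taps(math.pi, 4)  # theta = pi: half an original pixel
taps_quarter = uprater_taps(math.pi / 2, 4)
```

For θ = 0 the filter reduces to a gain of 2 on the centre tap, and for θ = π it reduces to a one-sample shift on the doubled grid, which is consistent with 2π corresponding to one original pixel interval. For other values of θ the taps decay slowly, which is why a finite truncation is an approximation.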

FIG. 12 shows the frequency-gain characteristics of the π/2 phase shifters 406 and 408 used in the second configuration example of the decimal pixel image generation unit 111. As the π/2 phase shifters 406 and 408, generally known Hilbert transformers can be used. In FIG. 12A, the horizontal axis represents frequency and the vertical axis represents gain (the ratio of the output signal amplitude to the input signal amplitude), showing the “frequency-gain” characteristic of the Hilbert transformer. In the Hilbert transformer, a frequency (2fs) twice the sampling frequency (fs) of the original signal is set as the new sampling frequency, and all frequency components between −fs and +fs except 0 are passed with a gain of 1.0. In FIG. 12B, the horizontal axis represents frequency and the vertical axis represents phase difference (the difference of the output signal phase with respect to the input signal phase), showing the “frequency-phase difference” characteristics of the Hilbert transformer. Here, the phase of the frequency components between 0 and fs is delayed by π/2, and the phase of the frequency components between 0 and −fs is advanced by π/2. As shown in the figure, the characteristic repeats at every integral multiple of 2fs owing to the symmetry of the digital signal.

FIG. 13 shows the filter tap coefficients obtained by the inverse Fourier transform of the frequency characteristics shown in FIG. 12. Each tap coefficient Ck may be set to Ck = 0 when k = 2m (where m is an integer) and Ck = −2 / (πk) when k = 2m + 1.

Note that a differentiator may be used as the π/2 phase shifters 406 and 408 used for generating the decimal pixel precision image data. In this case, differentiating the general expression cos(ωt + α) representing a sine wave with respect to t and multiplying by 1/ω gives d(cos(ωt + α))/dt * (1/ω) = −sin(ωt + α) = cos(ωt + α + π/2), which realizes the π/2 phase shift function. In other words, the π/2 phase shift function may be realized by taking the difference between the value of the target pixel and the value of the adjacent pixel and then applying a filter with a frequency-amplitude characteristic of 1/ω.

The operation and a specific example of the coefficient determiner 409 used in the second configuration example of the decimal pixel accuracy image generation unit 111 will be described with reference to FIG. 14. As shown in FIG. 9A, if the coefficients to be multiplied by each component are determined so that the vector sum of the four components shown in FIG. 9B has a Re-axis component of 1 and an Im-axis component of 0, and the vector sum of the four components shown in FIG. 9C has Re-axis and Im-axis components of 0, an image signal processing apparatus that generates a decimal pixel image of twice the resolution using only two frame images can be realized. As shown in FIG. 6, let C0 be the coefficient for the output of the delay unit 405 (the sum of the original component and the aliasing component of frame #1 after the up-rate), C1 the coefficient for the output of the π/2 phase shifter 406 (the sum of the π/2 phase shift results of the original component and the aliasing component of frame #1 after the up-rate), C2 the coefficient for the output of the delay unit 407 (the sum of the original component and the aliasing component of frame #2 after the up-rate), and C3 the coefficient for the output of the π/2 phase shifter 408 (the sum of the π/2 phase shift results of the original component and the aliasing component of frame #2 after the up-rate). From the phase relationships of the components shown in FIGS. 9B and 9C, the simultaneous equations shown in FIG. 14B are obtained, and solving them gives the results shown in FIG. 14C. The coefficient determiner 409 may output the coefficients C0, C1, C2, and C3 obtained in this way. As an example, FIG. 14D shows the values of the coefficients C0, C1, C2, and C3 when the phase difference θ (402) is changed from 0 to 2π in steps of π/8. This corresponds to the case where the position of the signal of the original frame #2 is estimated with an accuracy of 1/16 pixel and motion-compensated with respect to frame #1.
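The coefficient conditions described above can be checked with a short phasor calculation. The figures are not reproduced in this text, so the sketch assumes its own sign convention: the delayed original components have phase 0, the π/2 phase shift multiplies an original component by −j and an aliasing component by +j, and the aliasing component of frame #2 is rotated by e^(jθ). Under that assumption the conditions are satisfied by C0 = C2 = 1/2 and C1 = −C3 = −(1 + cos θ) / (2 sin θ), evaluated below in the equivalent form −cot(θ/2) / 2; agreement with FIG. 14C has not been verified. The function names are hypothetical.

```python
import cmath
import math


def alias_cancel_coefficients(theta):
    """Return (C0, C1, C2, C3) for a sampling phase difference theta."""
    half_sin = math.sin(theta / 2.0)
    if abs(half_sin) < 1e-9:
        # theta is a multiple of 2*pi: integer pixel motion, no solution exists.
        raise ValueError("integer pixel phase difference cannot be resolved")
    c1 = -0.5 * math.cos(theta / 2.0) / half_sin   # -(1 + cos t) / (2 sin t)
    return 0.5, c1, 0.5, -c1


def weighted_sums(theta):
    """Weighted sums of the original and aliasing phasors (assumed convention)."""
    c0, c1, c2, c3 = alias_cancel_coefficients(theta)
    rot = cmath.exp(1j * theta)                    # rotation of the frame #2 aliasing
    original = c0 + c1 * (-1j) + c2 + c3 * (-1j)
    aliasing = c0 + c1 * 1j + c2 * rot + c3 * 1j * rot
    return original, aliasing


# Coefficients in steps of pi/8, skipping theta = 0 where no solution exists.
table = {k: alias_cancel_coefficients(k * math.pi / 8) for k in range(1, 16)}
```

The singularity at θ = 0 matches the statement that a decimal pixel image cannot be generated from integer pixel motion, and at θ = π the phase-shifted branches drop out, leaving a plain average of the two interleaved frames.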

Note that the up-raters 403 and 404 and the π/2 phase shifters 406 and 408 require an infinite number of taps in order to obtain ideal characteristics, but there is no practical problem in simplifying them by truncating the number of taps to a finite number. In doing so, a general window function (such as a Hanning window function or a Hamming window function) may be used. If the tap coefficients of the simplified Hilbert transformer are made point-symmetric about C0, that is, C(−k) = −Ck (where k is an integer), the phase can be shifted by a certain amount.
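A truncated Hilbert transformer of this kind might be generated as in the sketch below, which uses the tap rule Ck = 0 for even k and Ck = −2 / (πk) for odd k. The tap count, the choice of a Hamming window centred on C0, and the function name `hilbert_taps` are assumptions of this example.

```python
import math


def hilbert_taps(half_length, window=True):
    """Taps Ck for k = -half_length..half_length (half_length must be >= 1)."""
    taps = {}
    for k in range(-half_length, half_length + 1):
        # Even taps (including C0) are zero; odd taps follow -2 / (pi * k).
        c = 0.0 if k % 2 == 0 else -2.0 / (math.pi * k)
        if window:
            # Hamming window centred on k = 0; it is even in k, so the
            # antisymmetry C(-k) = -Ck is preserved.
            c *= 0.54 + 0.46 * math.cos(math.pi * k / half_length)
        taps[k] = c
    return taps


taps = hilbert_taps(7)
```

Because the window is symmetric and the ideal taps are antisymmetric, the truncated filter keeps a constant π/2 phase shift; only the flatness of the gain near 0 and near ±fs is affected by the truncation.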

As described above, by configuring the decimal pixel accuracy image generation unit 111 in FIG. 1 as described with reference to FIGS. 6 to 14, high-precision decimal pixel accuracy image data can be generated from a plurality of frames.

In particular, according to the second configuration example of the fractional pixel accuracy image generation unit 111, one piece of high-precision fractional pixel accuracy image data can be generated from two frames, so encoding can be performed with a smaller amount of memory than in the first configuration example.

Next, a description will be given of the decimal pixel accuracy image generation processing when the intra prediction frame is used as a reference frame. Since the past frame cannot be referred to in the intra prediction frame, the block closest to the encoding target block is searched for the motion vector in the screen as shown in FIG. At this time, the search block size may be one pixel instead of block. The decimal pixel generation process after calculating the motion vector is the same as the method performed between the frames.

In the above-described decimal pixel accuracy image generation processing, a decimal pixel image cannot be generated when the position indicated by the motion vector has integer pixel accuracy. Therefore, whether or not the motion amount has integer pixel accuracy is determined on the basis of the result of the decimal pixel accuracy image generation method determination unit 110. When the motion detection position is at an integer pixel accuracy position, a decimal pixel accuracy image is generated by the second method, that is, the filter interpolation used in conventional coding standards. At this time, adding to the stream information indicating whether the conventional second method or the first method is used makes it possible to reproduce the decimal pixel precision image on the decoding side. The unit for switching between the two methods may be a pixel block or a frame. In that case, information indicating whether encoding is performed in units of pixel blocks or in units of frames may be added to the stream.

According to the image coding apparatus and the image coding method described above, a reference image with higher-precision decimal pixel accuracy can be generated, so motion prediction accuracy is improved and the video signal can be compressed efficiently with a smaller amount of data.

Also, the presence or absence of a difference between the original image and the predicted image is determined, and if there is no difference, the frequency conversion process, the quantization process, the inverse quantization process, the inverse frequency conversion process, the motion detection process, and the motion compensation process are omitted, so that the processing amount on the production side can be reduced.

In addition, in a moving picture coding standard that allows a plurality of frames to be referenced, a decimal pixel image can be generated from a larger amount of image data, so a higher-precision decimal pixel image can be generated, motion prediction accuracy is improved, and encoding with a small amount of data becomes possible.

<< Image decoding apparatus and image decoding method >>
FIG. 17 illustrates a block diagram of the image decoding apparatus according to the present invention. In the image decoding apparatus, reference numeral 1501 denotes a variable length decoding unit that decodes encoded data sent from the encoding side. A syntax analysis unit 1502 analyzes the syntax of the data decoded by the variable length decoding unit 1501, and separates the encoded data into encoded image data and additional information. Reference numeral 1503 denotes an inverse quantization unit that inversely quantizes data sent from the syntax analysis unit 1502. Reference numeral 1504 denotes an inverse frequency conversion unit that generates motion prediction error data by performing inverse frequency conversion on the data inversely quantized by the inverse quantization unit 1503. Reference numeral 1505 denotes an adder that generates image data by adding the motion prediction error subjected to inverse frequency conversion by the inverse frequency conversion unit 1504 and the motion prediction image data stored in the frame memory 1510. Reference numeral 1506 denotes a frame memory for storing reproduced image data obtained by addition by the adder 1505. Reference numeral 1507 denotes a decimal pixel accuracy image generation method determination unit, and 1508 denotes a decimal pixel accuracy image generation unit.

Here, the operation of the decimal pixel accuracy image generation method determination unit 1507 is the same as the operation of the decimal pixel accuracy image generation method determination unit 110 of the image encoding device shown in FIG. 1. That is, the decimal pixel accuracy image generation method determination unit 1507 determines the decimal pixel accuracy image generation method using the frame selection unit 1520, the motion detection unit 1521, and the determination unit 1522. The frame selection unit 1520 reads the reproduced image data of a predetermined frame from the frame memory 1506 according to a predetermined procedure. The motion detection unit 1521 performs motion detection on the reproduced image data of the two frames read by the frame selection unit 1520. The determination unit 1522 selects a decimal pixel accuracy image generation method according to the motion detection result and the like.

Here, the details of the operation of the decimal pixel accuracy image generation method determination unit 1507 are the same as those described for the decimal pixel accuracy image generation method determination unit 110 of the image encoding device shown in FIG. 1. That is, for example, according to the flowchart of the determination process shown in FIG. 3, either the first method or the second method shown in FIG. 2 is determined as the decimal pixel precision image generation method.

The operation of the decimal pixel accuracy image generation unit 1508 is the same as that of the decimal pixel accuracy image generation unit 111 of the image encoding device shown in FIG. 1.

As described above, the decimal pixel accuracy image generation method determination unit 1507 and the decimal pixel accuracy image generation unit 1508 generate a decimal pixel precision image by the same processing as the decimal pixel accuracy image generation method determination unit 110 and the decimal pixel accuracy image generation unit 111 of the image encoding device illustrated in FIG. 1, so that the image decoding apparatus shown in FIG. 17 can generate the highly accurate decoded image assumed by the image encoding apparatus.

Next, reference numeral 1509 denotes a motion compensation unit that generates a decoded image from the motion vector sent from the syntax analysis unit 1502 and the image data generated by the decimal pixel precision image generation unit 1508. Reference numeral 1510 denotes a frame memory that stores the decoded image data generated by the motion compensation unit 1509. Reference numeral 1511 denotes a video display device that reads out and outputs the decoded data stored in the frame memory 1510.

The image decoding apparatus in FIG. 17 can decode the stream encoded by the image encoding apparatus in FIG. 1. A detailed image decoding method in the image decoding apparatus will be described below. FIG. 18 is a flowchart showing the entire image decoding process.

In FIG. 18, first, the data encoded on the encoding side is decoded by the variable length decoding unit 1501 (1601). Next, the syntax analysis unit 1502 classifies the data decoded by the variable length decoding unit 1501 (1602). Here, the structure of the encoded stream recorded in the image decoding apparatus will be described with reference to FIG. 19. The encoded stream shown in FIG. 19 is encoded by, for example, an image encoding device. In FIG. 19, the data area 1701 stores, for example, a determination flag indicating whether or not there is a difference. The data area 1702 stores, for example, a determination flag (1707) indicating whether or not to perform motion detection, motion vector information (1708) generated by the decimal pixel precision image generation unit 1508, and a flag (1709) for executing integer pixel position determination. The data area 1703 stores the quantization parameters, the quantization steps, the coefficients multiplied by these, or the matrix number information used in the encoding process. The data area 1704 stores the frequency conversion type and block size. The data area 1705 stores information on the type of the resolution enhancement method generated by the decimal pixel image generation unit 111, and the data area 1706 stores the coefficients obtained by frequency conversion and quantization of the difference image between the original image and the predicted image.
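As a reading aid, the data areas of FIG. 19 can be modeled as a record whose fields are handed to the processing units that consume them. The field names, the Python types, and the routing in `route` are hypothetical choices for this sketch; the text specifies only what each data area contains, not a concrete layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class EncodedBlock:
    """One parsed unit of the stream in FIG. 19 (field names are hypothetical)."""
    has_difference: bool            # data area 1701
    do_motion_detection: bool       # data area 1702, flag 1707
    motion_vector: Tuple[int, int]  # data area 1702, information 1708
    integer_position_flag: bool     # data area 1702, flag 1709
    quantization_step: int          # data area 1703
    transform_type: str             # data area 1704
    block_size: int                 # data area 1704
    resolution_method: int          # data area 1705
    coefficients: List[int] = field(default_factory=list)  # data area 1706


def route(block: EncodedBlock) -> Dict[str, object]:
    """Group the fields by the unit assumed to consume them (illustrative routing)."""
    return {
        "inverse_quantization_1503": (block.quantization_step, block.coefficients),
        "inverse_transform_1504": (block.transform_type, block.block_size),
        "motion_compensation_1509": block.motion_vector,
        "image_generation_1508": block.resolution_method,
    }
```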

Next, the type of data in each data area of the encoded stream shown in FIG. 19 is discriminated, and each flag and each piece of data information is sent to the corresponding processing unit among the inverse quantization unit 1503, the inverse frequency transform unit 1504, the motion compensation unit 1509, the motion detection unit 1507, the frame memory 1510, and the decimal pixel accuracy image generation unit 1508.

Next, the inverse quantization unit 1503 performs an inverse quantization process using the data sent from the syntax analysis unit 1502 (1603). Here, when the encoded stream is one encoded by the image encoding device in FIG. 1, the inverse quantization process in the inverse quantization unit 1503 is the inverse of the process in the quantization unit 104 in FIG. 1. This is the same processing as that of the inverse quantization unit 106 in FIG. 1. It may use the inverse quantization technique of conventional decoding technology, or it may multiply the data by the quantization step stored in the data area 1703.
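The simplest of the options mentioned, multiplication by the quantization step, can be sketched as below. The function name and the use of a single uniform step for all coefficients are assumptions of this example; a real stream may carry per-coefficient matrices, as the description of the data area 1703 suggests.

```python
from typing import List


def dequantize(levels: List[int], quantization_step: int) -> List[int]:
    """Scale quantized levels back by one uniform quantization step."""
    return [level * quantization_step for level in levels]
```

The values lost to rounding during quantization are not recovered; each output is only the nearest multiple of the step.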

Next, the inverse frequency transform unit 1504 performs an inverse frequency transform process on the data inversely quantized by the inverse quantization unit 1503 (1604). At this time, the inverse frequency transform unit 1504 performs the inverse frequency transform process using the frequency transform type and frequency transform block size information sent from the syntax analysis unit 1502. The inverse frequency transform process may use a technique from conventional image decoding technology. Next, motion detection is performed using the data that has been inverse frequency transformed by the inverse frequency transform unit 1504 and passed through the adder 1505, and the data stored in the frame memory 1506 (1605). The determined motion search method may be acquired from the syntax analysis unit 1502.
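The text leaves the transform type open. Purely as an illustration, the sketch below assumes a one-dimensional orthonormal DCT and shows the inverse transform followed by the adder step that combines the reproduced residual with the prediction. The function names are hypothetical.

```python
import math
from typing import List


def idct_1d(coeffs: List[float]) -> List[float]:
    """Orthonormal inverse DCT (DCT-III) of a 1-D coefficient block."""
    n = len(coeffs)
    out = []
    for x in range(n):
        # The DC term uses scale sqrt(1/n); all other terms use sqrt(2/n).
        total = coeffs[0] * math.sqrt(1.0 / n)
        for k in range(1, n):
            angle = math.pi * (2 * x + 1) * k / (2 * n)
            total += coeffs[k] * math.sqrt(2.0 / n) * math.cos(angle)
        out.append(total)
    return out


def reconstruct(prediction: List[float], coeffs: List[float]) -> List[float]:
    """Adder step: prediction plus the inverse-transformed residual."""
    residual = idct_1d(coeffs)
    return [p + r for p, r in zip(prediction, residual)]
```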

Next, a decimal pixel image is generated using the image data stored in the frame memory 1506 and the motion vector acquired from the syntax analysis unit 1502 (1606). At this time, the decimal pixel accuracy image generation unit 1508 generates a decimal pixel accuracy image using the motion vector and a plurality of image data, as in the case of the decimal pixel image generation unit 111 of FIG. 1. The content of the decimal pixel accuracy image generation processing is the same as that described for the decimal pixel accuracy image generation unit 111 in FIG. 1.

In FIG. 17, the decimal pixel accuracy image generation unit 1508 performs resolution enhancement processing using a motion vector, a plurality of images, and their aliasing distortions, as described for the decimal pixel accuracy image generation unit 111 in FIG. 1. As a result, it is possible to increase the resolution of the decimal pixel precision image data referred to by the motion compensation unit 1509.
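A one-dimensional toy case shows why a second frame with sub-pixel motion can supply information that interpolation cannot. It is not the aliasing-based processing of the decimal pixel accuracy image generation unit. It assumes, for illustration only, that frame B shows the same scene as frame A shifted by exactly half a pixel, so that its samples fall halfway between those of frame A. All function names are hypothetical.

```python
from typing import Callable, List


def sample(scene: Callable[[float], float], offset: float, count: int) -> List[float]:
    """Sample a continuous 1-D scene at positions offset, offset + 1, ..."""
    return [scene(offset + i) for i in range(count)]


def merge_half_pel(frame_a: List[float], frame_b: List[float]) -> List[float]:
    """Interleave two frames whose sampling grids differ by half a pixel."""
    merged = []
    for a, b in zip(frame_a, frame_b):
        merged.extend([a, b])
    return merged


def interpolate_half_pel(frame: List[float]) -> List[float]:
    """Fallback: estimate half-pel samples by averaging neighbours.

    The last sample has no right neighbour, so it is repeated.
    """
    out = []
    for i, value in enumerate(frame):
        nxt = frame[i + 1] if i + 1 < len(frame) else value
        out.extend([value, (value + nxt) / 2])
    return out


scene = lambda x: x * x
frame_a = sample(scene, 0.0, 4)   # samples at 0, 1, 2, 3
frame_b = sample(scene, 0.5, 4)   # samples at 0.5, 1.5, 2.5, 3.5
merged = merge_half_pel(frame_a, frame_b)
estimated = interpolate_half_pel(frame_a)
```

For this curved scene, `merged` holds the true half-pel values while `estimated` only approximates them, which is the motivation for preferring the multi-frame method when the detected motion has a fractional part.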

If no difference is sent from the syntax analysis unit 1502, the process ends without performing the inverse quantization, inverse frequency conversion, motion detection, decimal pixel image generation, and motion compensation processing. In addition, in the decimal pixel precision image generation method determination processing 1605 in FIG. 18, when the detection result is determined to be an integer pixel position, or when the syntax analysis unit determines that the encoding side generated the decimal pixel image by the pixel filter interpolation used in conventional moving image encoding methods, the decimal pixel image is generated by that conventional pixel filter interpolation.
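The text does not name a specific conventional filter. As one well-known example, H.264 derives half-sample luma values with the six-tap filter (1, -5, 20, 20, -5, 1), followed by rounding and division by 32. The sketch below applies that filter in one dimension. The function name, the border clamping, and the 8-bit clipping range are assumptions of this example.

```python
from typing import List

H264_TAPS = (1, -5, 20, 20, -5, 1)


def half_pel(samples: List[int], i: int) -> int:
    """Half-sample value between samples[i] and samples[i + 1] (8-bit range)."""
    last = len(samples) - 1
    acc = 0
    for tap, offset in zip(H264_TAPS, range(-2, 4)):
        idx = min(max(i + offset, 0), last)  # clamp positions outside the row
        acc += tap * samples[idx]
    # Round, divide by 32, and clip to the 8-bit sample range.
    return min(max((acc + 16) >> 5, 0), 255)
```

On a linear ramp the filter reproduces the midpoint, so `half_pel([0, 10, 20, 30, 40, 50], 2)` evaluates to 25.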

Next, motion compensation is performed based on the image data generated by the decimal pixel precision image generation unit 1508 and the motion vector information sent from the syntax analysis unit 1502 (1607). In the case of an intra-screen prediction image, the intra-screen prediction unit 1511 generates a prediction image and stores the data in the frame memory 1506. The decoded image is output to a video display device 1511 such as a TV, a PC monitor, or a projector, for example.
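The motion compensation step can be pictured as reading a block out of the decimal pixel precision reference at the position indicated by the motion vector. The sketch below is one-dimensional and assumes a reference stored on a half-pel grid with the vector expressed in half-pel units. The function name and data layout are hypothetical, and bounds checking is omitted for brevity.

```python
from typing import List


def motion_compensate(reference_half_pel: List[int], block_start: int,
                      block_len: int, mv_half_pel: int) -> List[int]:
    """Fetch a predicted block from a reference stored on a half-pel grid.

    reference_half_pel[2 * x] is integer pixel x; odd indices hold the
    half-pel samples between neighbouring integer pixels.
    """
    first = 2 * block_start + mv_half_pel
    # Step by 2 so that consecutive output samples are one full pixel apart.
    return [reference_half_pel[first + 2 * i] for i in range(block_len)]
```

With an odd vector every fetched sample is a half-pel value, so the quality of the prediction depends directly on how those samples were generated.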

According to the image decoding device and the image decoding method described above, it is possible to increase the resolution of the decimal pixel precision image data that is referred to in the motion search process. Therefore, a higher-definition decoded image can be generated.

Further, according to the image decoding device and the image decoding method described above, the frequency conversion processing, quantization processing, inverse quantization processing, and inverse frequency conversion processing are performed based on the difference information between the original image and the predicted image stored in the encoded stream. When no difference is present, the motion detection process and the motion compensation process can therefore be omitted, and the data processing amount on the decoding side can be reduced.

The encoding processing by the image encoding device and the decoding processing by the image decoding device are performed using a computer device. By providing a program that controls the above processing on a recording medium (hard disk, optical disk, magneto-optical disk, etc.) or via a transmission line or an appropriate network, and making it executable by a computer device such as a PC (Personal Computer) or EWS (Engineering Work Station), the image encoding process and the image decoding process can be performed easily.

Although the invention made by the present inventor has been specifically described based on the embodiments, the present invention is not limited thereto, and it goes without saying that various modifications can be made without departing from the scope of the invention.

The image encoding device, image encoding method, image decoding device, and image decoding method of the present invention can be used for image encoding processing, image decoding processing, and the like based on various standards. Further, the super-resolution processing used in the first method and the third method, and the filter interpolation calculation processing used in the second method and the fourth method can be appropriately changed.

Claims

An image encoding device capable of encoding the difference between original image data and motion prediction image data and of generating the motion prediction image data based on local decoded image data obtained by decoding the encoded data, the image encoding device comprising:
a frame memory that holds the local decoded image data for a plurality of frames; and
a decimal pixel image data processing unit that generates decimal pixel accuracy image data using the local decoded image data of a frame to be encoded and the local decoded image data of a preceding frame, both held by the frame memory,
wherein the image encoding device generates the motion prediction image data by performing motion detection using the decimal pixel accuracy image data generated by the decimal pixel image data processing unit as reference image data.
The image encoding device according to claim 1, wherein the decimal pixel image data processing unit determines whether or not the amount of motion between the local decoded image data of the encoding target frame stored in the frame memory and the local decoded image data of a preceding frame within a predetermined range before it is of decimal pixel accuracy; when a determination result of decimal pixel accuracy is obtained, generates the decimal pixel accuracy image data using the local decoded image data of the preceding frame and the local decoded image data of the encoding target frame; and when the determination result of decimal pixel accuracy is not obtained, generates the decimal pixel accuracy image data by an interpolation operation on the local decoded image data of the encoding target frame.
The image encoding device according to claim 2, wherein the decimal pixel image data processing unit limits the target of the determination as to whether or not the motion is of decimal pixel accuracy to frames in the form of an I picture or a P picture.
The image encoding device according to claim 3, wherein the decimal pixel image data processing unit determines the frame form and the decimal pixel accuracy for the preceding frames within the predetermined range sequentially, starting from the preceding frame closest to the encoding target frame.
The image encoding device according to claim 1, wherein the decimal pixel image generation unit performs phase shift processing on each of a plurality of image signals of the image data to generate a plurality of new image signals, and generates the decimal pixel accuracy image data by multiplying each of the plurality of image signals and the plurality of new image signals by a coefficient and combining them.
An image encoding method comprising:
a read process for reading motion prediction image data from a prediction image memory;
a difference process for calculating the difference between the read motion prediction image data and input image data as prediction error data;
a frequency conversion process for frequency converting the prediction error data calculated in the difference process;
a quantization process for quantizing the data frequency converted in the frequency conversion process;
a variable-length encoding process for variable-length encoding the data quantized in the quantization process and generating an encoded stream;
an inverse quantization process for inversely quantizing the data quantized in the quantization process;
an inverse frequency transform process for reproducing the prediction error data by performing an inverse frequency transform on the data inversely quantized in the inverse quantization process;
an addition process for adding the prediction error data reproduced in the inverse frequency transform process and the motion prediction image data to output local decoded image data;
a process of storing the local decoded image data calculated in the addition process in a frame memory;
a decimal pixel image data process for generating decimal pixel accuracy image data using the local decoded image data of a frame to be encoded and the local decoded image data of a preceding frame, both held by the frame memory;
a motion detection / motion compensation process for generating a prediction image by performing motion detection using the decimal pixel accuracy image data generated in the decimal pixel image data process as reference image data; and
a writing process for writing the motion prediction image data generated in the motion detection / motion compensation process into the prediction image memory.
An image decoding device that receives encoded data encoded by motion prediction, separates it into encoded image data and additional information, reproduces image data by adding motion prediction image data to a motion prediction error obtained by decoding the separated encoded image data, and is capable of generating the motion prediction image data based on the reproduced image data, the image decoding device comprising:
a frame memory that holds the reproduced image data for a plurality of frames; and
a decimal pixel image data processing unit that generates decimal pixel accuracy image data using the reproduced image data of a decoding target frame and the reproduced image data of a preceding frame, both held by the frame memory,
wherein the image decoding device generates the motion prediction image data by performing motion detection using the decimal pixel accuracy image data generated by the decimal pixel image data processing unit as reference image data.
The image decoding device according to claim 7, wherein the decimal pixel image data processing unit determines whether or not the amount of motion between the reproduced image data of the decoding target frame stored in the frame memory and the reproduced image data of a preceding frame within a predetermined range before it is of decimal pixel accuracy; when a determination result of decimal pixel accuracy is obtained, generates the decimal pixel accuracy image data using the reproduced image data of the preceding frame and the reproduced image data of the decoding target frame; and when the determination result of decimal pixel accuracy is not obtained, generates the decimal pixel accuracy image data by an interpolation operation on the reproduced image data of the decoding target frame.
The image decoding device according to claim 8, wherein the decimal pixel image data processing unit limits the target of the determination as to whether or not the motion is of decimal pixel accuracy to frames in the form of an I picture or a P picture.
The image decoding device according to claim 9, wherein the decimal pixel image data processing unit determines the frame form and the decimal pixel accuracy for the preceding frames within the predetermined range sequentially, starting from the preceding frame closest to the decoding target frame.
The image decoding device according to claim 7, wherein the decimal pixel image generation unit performs phase shift processing on each of a plurality of image signals of the image data to generate a plurality of new image signals, and generates the decimal pixel accuracy image data by multiplying each of the plurality of image signals and the plurality of new image signals by a coefficient and combining them.
An image decoding method comprising:
a variable-length decoding process in which an encoded stream including encoded data encoded by motion prediction is input and variable-length decoded;
a parsing process for separating the variable-length decoded encoded data into encoded image data and additional information;
an inverse quantization process for inversely quantizing the separated encoded image data;
an inverse frequency transform process for reproducing a motion prediction error by performing an inverse frequency transform on the data inversely quantized in the inverse quantization process;
an addition process for adding motion prediction image data from a prediction image memory to the motion prediction error reproduced in the inverse frequency transform process to reproduce image data;
a reproduced image data writing process for writing the reproduced image data reproduced in the addition process to a frame memory;
a decimal pixel image data process for generating decimal pixel accuracy image data using the reproduced image data of a decoding target frame and the reproduced image data of a preceding frame, both held by the frame memory;
a motion compensation process for generating motion prediction image data using the decimal pixel accuracy image data generated in the decimal pixel image data process and the additional information; and
a writing process for writing the motion prediction image data generated in the motion compensation process into the prediction image memory.
The image decoding method according to claim 12, wherein the decimal pixel image data processing limits the target of the determination as to whether or not the motion is of decimal pixel accuracy to frames in the form of an I picture or a P picture.
The image decoding method according to claim 13, wherein the decimal pixel image data processing determines the frame form and the decimal pixel accuracy for the preceding frames within the predetermined range sequentially, starting from the preceding frame closest to the decoding target frame.
The image decoding method according to claim 12, wherein the decimal pixel image data processing performs phase shift processing on each of a plurality of image signals of the image data to generate a plurality of new image signals, and generates the decimal pixel accuracy image data by multiplying each of the plurality of image signals and the plurality of new image signals by a coefficient and combining them.