WO2011068332A2

WO2011068332A2 - Spatial prediction apparatus and predicting method thereof, image encoding device and method using same, and image decoding device and method using same

Info

Publication number: WO2011068332A2
Application number: PCT/KR2010/008389
Authority: WO
Inventors: 김수년; 임정연; 최재훈; 이규민; 정제창
Original assignee: 에스케이텔레콤 주식회사
Priority date: 2009-12-04
Filing date: 2010-11-25
Publication date: 2011-06-09
Also published as: WO2011068332A3; KR20110062748A; KR101601854B1

Abstract

The present invention relates to a spatial prediction apparatus and a prediction method thereof, an image encoding device and method using the same, and an image decoding device and method using the same. The image encoding device according to embodiments of the present invention comprises: a spatial prediction execution unit which predicts a target block using a directional intra prediction mode and a template matching mode and selects the mode with the lowest rate-distortion cost; and an integer transform execution unit which performs an integer transform on the residual signal of the image predicted through the template matching mode when the template matching mode is selected by the spatial prediction execution unit. The present invention can enhance the accuracy and efficiency of intra prediction while minimizing the increase in overhead in video encoding.

Description

Spatial prediction apparatus and prediction method thereof, image encoding apparatus and method using same, and image decoding apparatus and method

An embodiment of the present invention relates to a spatial prediction apparatus and a prediction method thereof, an image encoding apparatus and method using the same, and an image decoding apparatus and method. More specifically, it relates to a spatial prediction apparatus that uses a template matching method in addition to the directional intra prediction mode for prediction within the same frame of a video, thereby increasing prediction efficiency and accuracy while minimizing the resulting increase in overhead, and to a prediction method thereof, an image encoding apparatus and method using the same, and an image decoding apparatus and method.

As information and communication technology including the Internet develops, video communication is increasing in addition to text and voice communication. Conventional text-based communication methods are not sufficient to satisfy the various needs of consumers, and accordingly, multimedia services that can accommodate various types of information such as text, video, and music are increasing. Multimedia data is very large in volume and requires large-capacity storage media and wide bandwidth for transmission. Therefore, in order to transmit multimedia data including text, video, and audio, it is essential to use a compression coding technique.

The basic principle of compressing data is to eliminate redundancy in the data. Data can be compressed by removing spatial redundancy, such as the same color or object repeating within an image; temporal redundancy, such as adjacent frames of a video that hardly change or the same note repeating in audio; or psychovisual redundancy, which takes into account that human vision and perception are insensitive to high frequencies.

As such a video compression method, interest in H.264 / AVC, which has further improved compression efficiency compared to MPEG-4 (Moving Picture Experts Group-4), has recently increased.

H.264 is a digital video codec standard with a very high data compression ratio, and is also called MPEG-4 Part 10 or Advanced Video Coding (AVC). This standard is the result of the Video Coding Experts Group (VCEG) of the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) and the Moving Picture Experts Group (MPEG) of the International Organization for Standardization / International Electrotechnical Commission (ISO/IEC) jointly forming a Joint Video Team and carrying out the standardization together.

Various methods have been proposed to improve compression efficiency in compression encoding, and typical methods include a method using temporal prediction and a method using spatial prediction.

As shown in FIG. 1, temporal prediction predicts the current block 112 of the current frame 110 by referring to a reference block 122 of another frame 120 that is adjacent in time. That is, in inter prediction of the current block 112 of the current frame 110, the temporally adjacent reference frame 120 is searched, and the reference block 122 most similar to the current block 112 is found in the reference frame 120. Here, the reference block 122 is the block that best predicts the current block 112, and the block having the smallest sum of absolute differences (SAD) with respect to the current block 112 may be the reference block 122. The reference block 122 becomes the prediction block of the current block 112, and a residual block is generated by subtracting the reference block 122 from the current block 112. The generated residual block is encoded and inserted into the bitstream. The relative difference between the position of the current block 112 in the current frame 110 and the position of the reference block 122 in the reference frame 120 is called a motion vector 130, and the motion vector 130 is encoded in the same way as the residual block. Temporal prediction is also referred to as inter prediction or inter-frame prediction.
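The search described above can be sketched as a full search over a small window. This is only an illustrative sketch: the function names, the list-of-lists frame representation, and the `search_range` parameter are assumptions made here and do not appear in the original text.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def get_block(frame, top, left, size):
    """Cut a size x size block out of a frame stored as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def motion_search(cur_frame, ref_frame, top, left, size, search_range):
    """Full search in the reference frame around (top, left).

    Returns the motion vector (dx, dy) of the candidate with the smallest
    SAD and the residual block (current block minus reference block).
    """
    cur = get_block(cur_frame, top, left, size)
    height, width = len(ref_frame), len(ref_frame[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(cur, get_block(ref_frame, y, x, size))
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    _, (dx, dy) = best
    ref = get_block(ref_frame, top + dy, left + dx, size)
    residual = [[c - r for c, r in zip(cur_row, ref_row)]
                for cur_row, ref_row in zip(cur, ref)]
    return (dx, dy), residual
```

Practical encoders usually replace the exhaustive loop with a faster search pattern, but the selection criterion (smallest SAD) is the same.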

Spatial prediction obtains the predicted pixel values of a target block by using the reconstructed pixel values of reference blocks adjacent to the target block within one frame, and is also called directional intra prediction (hereinafter simply referred to as intra prediction) or intra-frame prediction. H.264 specifies encoding and decoding using intra prediction.

Intra prediction is a method of predicting the values of a current sub-block by copying, in a predetermined direction, the adjacent pixels above and to the left of the sub-block, and encoding only the difference. In the intra prediction technique according to the H.264 standard, the prediction block for the current block is generated based on other blocks that precede it in coding order. The value obtained by subtracting the prediction block from the current block is then coded. The video encoder according to H.264 selects, for each block, the prediction mode that minimizes the difference between the current block and the prediction block.

Intra prediction according to the H.264 standard defines nine prediction modes, illustrated in FIG. 2, according to the positions of the adjacent pixels and the prediction direction used to generate the predicted pixel values of 4 x 4 luma blocks and 8 x 8 luma blocks. Depending on the prediction direction, the nine prediction modes are the Vertical prediction mode (prediction mode 0), Horizontal prediction mode (prediction mode 1), DC prediction mode (prediction mode 2), Diagonal_Down_Left prediction mode (prediction mode 3), Diagonal_Down_Right prediction mode (prediction mode 4), Vertical_Right prediction mode (prediction mode 5), Horizontal_Down prediction mode (prediction mode 6), Vertical_Left prediction mode (prediction mode 7), and Horizontal_Up prediction mode (prediction mode 8). Here, the DC prediction mode uses the average value of eight adjacent pixels.

In addition, four prediction modes are used for intra prediction of a 16 × 16 luma block: a vertical prediction mode (prediction mode 0), a horizontal prediction mode (prediction mode 1), a DC prediction mode (prediction mode 2), and a plane prediction mode (prediction mode 3). The same four prediction modes are also used for intra prediction of 8 x 8 chroma blocks.

FIG. 3 shows an example of labeling for explaining the nine prediction modes of FIG. 2. In this case, a prediction block (region including a to p) for the current block is generated using the samples A to M that are decoded in advance. If E, F, G, and H cannot be decoded in advance, E, F, G, and H can be virtually generated by copying D to their positions.

FIG. 4 is a diagram for describing the nine prediction modes of FIG. 2 using the labeling of FIG. 3. Referring to the figure, in prediction mode 0 the prediction block uses the same pixel value along each vertical line. That is, the pixels of the prediction block are predicted from the nearest pixels of the reference block located above the prediction block: the reconstructed pixel value of adjacent pixel A is set as the predicted pixel value for the first-column pixels a, e, i, and m. In the same way, the second-column pixels b, f, j, and n are predicted from the reconstructed pixel value of adjacent pixel B, the third-column pixels c, g, k, and o from that of adjacent pixel C, and the fourth-column pixels d, h, l, and p from that of adjacent pixel D. As a result, as shown in FIG. 5A, a prediction block is generated in which the predicted pixel values of the columns are the pixel values of pixel A, pixel B, pixel C, and pixel D, respectively.

In addition, in prediction mode 1 the prediction block uses the same pixel value along each horizontal line. That is, the pixels of the prediction block are predicted from the nearest pixels of the reference block located to the left of the prediction block: the reconstructed pixel value of adjacent pixel I is set as the predicted pixel value for the first-row pixels a, b, c, and d. In the same way, the second-row pixels e, f, g, and h are predicted from the reconstructed pixel value of adjacent pixel J, the third-row pixels i, j, k, and l from that of adjacent pixel K, and the fourth-row pixels m, n, o, and p from that of adjacent pixel L. As a result, as shown in FIG. 5B, a prediction block is generated in which the predicted pixel values of the rows are the pixel values of pixel I, pixel J, pixel K, and pixel L, respectively.

Also, in prediction mode 2, the pixels of the prediction block are equally replaced by the average of the pixel values of the upper pixels A, B, C and D and the left pixels I, J, K and L.
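Prediction modes 0 to 2 for a 4 x 4 block can be sketched as follows. The function name and argument layout are assumptions made for illustration; the rounding offset in the DC case (adding 4 before dividing by 8) follows common H.264 integer practice and is not stated in the text above.

```python
def predict_4x4(mode, above, left):
    """Build a 4x4 prediction block for prediction mode 0, 1, or 2.

    above = [A, B, C, D] are the reconstructed pixels above the block,
    left  = [I, J, K, L] are the reconstructed pixels to its left.
    """
    if mode == 0:  # vertical: every row repeats A, B, C, D
        return [list(above) for _ in range(4)]
    if mode == 1:  # horizontal: row r is filled with I, J, K, or L
        return [[left[r]] * 4 for r in range(4)]
    if mode == 2:  # DC: rounded average of the eight neighbours
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only modes 0, 1 and 2 are sketched here")
```

For example, `predict_4x4(0, [10, 20, 30, 40], [50, 60, 70, 80])` yields four identical rows `[10, 20, 30, 40]`.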

On the other hand, in prediction mode 3 the pixels of the prediction block are interpolated at a 45° angle between the lower-left and the upper-right, and in prediction mode 4 the pixels of the prediction block are extrapolated in the lower-right direction at a 45° angle. In prediction mode 5, the pixels are extrapolated in the lower-right direction at an angle of about 26.6° from the vertical (width / height = 1/2). In prediction mode 6, the pixels are extrapolated in the lower-right direction at an angle of about 26.6° below the horizontal; in prediction mode 7, they are extrapolated in the lower-left direction at an angle of about 26.6° from the vertical; and in prediction mode 8, they are interpolated in an upward direction at an angle of about 26.6° above the horizontal.

In prediction modes 3 to 8, the pixels of the prediction block may be generated from a weighted average of the previously decoded pixels A to M of the reference block. For example, in prediction mode 4, the pixel d located at the top right of the prediction block may be estimated as in Equation 1. Here, round() is a function that rounds to the nearest integer.

[Equation 1]

$$d = \mathrm{round}\left(B/4 + C/2 + D/4\right)$$
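As a small worked example, in H.264 the top-right pixel d in prediction mode 4 (Diagonal_Down_Right) is a weighted average of the upper neighbours B, C, and D with weights 1/4, 1/2, and 1/4. The sketch below evaluates that average in integer arithmetic; the function name is an assumption made for illustration.

```python
def predict_d_mode4(b, c, d):
    """Weighted average B/4 + C/2 + D/4, rounded to the nearest integer.

    Adding 2 before the right shift by 2 performs the rounding without
    any floating point operation.
    """
    return (b + 2 * c + d + 2) >> 2
```

With B = 10, C = 20, and D = 30, the weighted average is 2.5 + 10 + 7.5 = 20, so the predicted value of pixel d is 20.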

Meanwhile, as described above, 16 × 16 prediction for the luminance component has four modes: prediction mode 0, prediction mode 1, prediction mode 2, and prediction mode 3.

In prediction mode 0, the pixels of the prediction block are extrapolated from the upper pixels, and in prediction mode 1, the pixels are extrapolated from the left pixels. In addition, in the prediction mode 2, the pixels of the prediction block are calculated as an average of upper pixels and left pixels. Finally, for prediction mode 3, a linear "plane" function is used that fits the upper and left pixels. This mode is more suitable for areas where the luminance changes smoothly.

As described above, in the H.264 standard, in each prediction mode except for the DC mode, the pixel value of the prediction block is generated according to the direction corresponding to each mode based on the adjacent pixels of the prediction block to be currently encoded.

In most cases, these directional modes are sufficient. However, because the available prediction modes are limited, for some images the pixel values of the prediction block cannot be predicted accurately and encoding efficiency suffers. In this case, the inaccurate intra prediction prevents entropy coding from achieving its full gain, and the bit rate increases unnecessarily.

An embodiment of the present invention is intended to solve the above-described problem. Its object is to provide a spatial prediction apparatus and a prediction method thereof, an image encoding apparatus and method using the same, and an image decoding apparatus and method that use a template matching method in addition to the directional intra prediction mode for prediction within the same frame of a video, thereby increasing prediction efficiency and accuracy while minimizing the resulting increase in overhead.

In order to achieve the above object, an image encoding apparatus according to an embodiment of the present invention includes: a spatial prediction execution unit which performs prediction on a target block using a template matching mode together with a directional intra prediction mode and selects, from among these modes, the mode having the lowest cost based on rate-distortion; and an integer transform execution unit which performs an integer transform on the residual signal of the image predicted by the template matching mode when the template matching mode is selected by the spatial prediction execution unit.

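Here, the integer transform execution unit may perform the integer transform according to the following equation.

$$Y = \left(C_f X C_f^{T}\right) \otimes E_f = \left(\begin{bmatrix}1&1&1&1\\2&1&-1&-2\\1&-1&-1&1\\1&-2&2&-1\end{bmatrix} X \begin{bmatrix}1&2&1&1\\1&1&-1&-2\\1&-1&-1&2\\1&-2&1&-1\end{bmatrix}\right) \otimes \begin{bmatrix}a^2&ab/2&a^2&ab/2\\ab/2&b^2/4&ab/2&b^2/4\\a^2&ab/2&a^2&ab/2\\ab/2&b^2/4&ab/2&b^2/4\end{bmatrix}$$

Here, $C_f X C_f^{T}$ is the forward integer transform used in H.264, and the values of a, b, and d are $a = 1/2$, $b = \sqrt{2/5}$, and $d = 1/2$, respectively.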

In addition, the spatial prediction execution unit may select a low cost mode by the following equation.

C = E + λB

Here, C is the cost, E is the difference between the original signal and the signal reconstructed by decoding the coded bits, B is the amount of bits required for the coding, and λ is the Lagrangian coefficient, which adjusts the relative weighting of E and B.

The image encoding apparatus may further include an MDDT execution unit that executes a Mode Dependent Directional Transform (MDDT), taking the directionality into account, when any one of the nine directional intra prediction modes is selected by the spatial prediction execution unit.

In this case, it is preferable that the MDDT execution unit transforms the residual signal of the predicted image according to a transform function corresponding to a selected mode among the preset transform functions corresponding to the directional intra prediction mode.

In order to achieve the above object, a spatial prediction apparatus according to an embodiment of the present invention includes: an intra prediction execution unit which performs prediction on a target block using the directional intra prediction mode; a template prediction execution unit which performs prediction on the target block using the template matching mode; a mode selection unit which selects the mode having the lowest cost based on rate-distortion from among the prediction modes executed by the intra prediction execution unit and the template matching mode executed by the template prediction execution unit; and a residual signal calculator which calculates a residual signal between the prediction block and the target block according to the selected mode.

In addition, in order to achieve the above object, an image decoding apparatus according to an embodiment of the present invention includes: a mode type determination unit which determines the mode type of the current block from an input bitstream encoded by spatial predictive encoding; a template matching execution unit which, when the mode type determination unit determines that the mode type of the current block is the template matching mode, divides the current block into N × N blocks and performs template matching on each of the divided N × N blocks; and an inverse integer transform execution unit which performs an inverse integer transform on the residual signal between the prediction block obtained by the template matching and the target block.

Here, when it is determined by the mode type determination unit that the mode type of the current block is the directional intra prediction mode, the video decoding apparatus further includes an inverse MDDT execution unit that executes the inverse MDDT in consideration of the directionality.

In addition, the template matching execution unit may divide the current block into 2 x 2 block units and then perform template matching on each 2 x 2 block.

In order to achieve the above object, an image encoding method according to an embodiment of the present invention includes: performing prediction on a target block using a template matching mode together with a directional intra prediction mode; selecting the mode having the lowest cost from among the modes executed in the prediction step; calculating a residual signal between the target block and the prediction block generated by the mode selected in the selecting step; and performing an integer transform on the calculated residual signal when the selected mode is the template matching mode, and executing the MDDT on the calculated residual signal when the selected mode is a directional prediction mode.

Preferably, the image encoding method may further include selecting a transform function corresponding to the prediction mode among preset transformation functions when the mode selected by the selecting step is a directional prediction mode. In this case, the MDDT execution step preferably executes the MDDT according to the selected conversion function.

In order to achieve the above object, a spatial prediction method according to an embodiment of the present invention includes: performing prediction on a target block using a template matching mode together with a directional intra prediction mode; selecting the mode having the lowest cost from among the modes executed in the prediction step; and calculating a residual signal between the target block and the prediction block generated by the mode selected in the selecting step.

In order to achieve the above object, an image decoding method according to an embodiment of the present invention comprises the steps of: determining a mode type of a current block from a bitstream encoded and input by spatial predictive encoding; If it is determined that the mode type of the current block is a template matching mode, dividing the current block into units of N × N blocks and performing template matching on each of the divided N × N blocks; And performing inverse integer transform on the residual signal between the prediction block and the target block by template matching.

Preferably, the image decoding method may further include executing the inverse MDDT in consideration of the directionality if it is determined that the mode type of the current block is the directional intra prediction mode.

As described above, according to the embodiment of the present invention, compared to the H.264 standard, it is possible to increase the accuracy of intra prediction while reducing the bit rate without significantly increasing the overhead of the bitstream generator.

FIG. 1 is a diagram illustrating general inter prediction.

FIG. 2 is a diagram illustrating the directionality of the intra prediction modes.

FIG. 3 is a diagram illustrating an example of labeling for explaining the intra prediction modes of FIG. 2.

FIG. 4 is a diagram illustrating each of the intra prediction modes of FIG. 2.

FIG. 5A is a diagram illustrating prediction mode 0 of the intra prediction modes of FIG. 2, and FIG. 5B is a diagram illustrating prediction mode 1 of the intra prediction modes of FIG. 2.

FIG. 6 is a diagram schematically illustrating an image encoding apparatus according to an embodiment of the present invention.

FIG. 7 is a diagram illustrating template matching used in an embodiment of the present invention.

FIG. 8 is a diagram showing an example of the structure of a macroblock composed of four 8x8 partitions.

FIG. 9 is a diagram showing an example of the structure of a macroblock composed of sixteen 4x4 partitions.

FIG. 10 is a diagram illustrating a zigzag scan for the transform coefficients of a 4x4 partition.

FIG. 11 is a flowchart illustrating a spatial prediction method according to an embodiment of the present invention.

FIG. 12 is a flowchart illustrating a video encoding method according to another embodiment of the present invention.

FIG. 13 is a diagram illustrating an example of the structure of a bitstream generated by the video encoding apparatus of FIG. 6.

FIG. 14 is a diagram illustrating an image decoding apparatus according to an embodiment of the present invention.

FIG. 15 is a flowchart illustrating an image decoding method performed by the image decoding apparatus of FIG. 14.

Hereinafter, some embodiments of the present invention will be described in detail through exemplary drawings. In adding reference numerals to the components of each drawing, it should be noted that the same reference numerals are assigned to the same components as much as possible even though they are shown in different drawings. In addition, in describing the present invention, when it is determined that the detailed description of the related well-known configuration or function may obscure the gist of the present invention, the detailed description thereof will be omitted.

In addition, in describing the components of the present invention, terms such as first, second, A, B, (a), and (b) may be used. These terms serve only to distinguish one component from another, and the nature, order, or sequence of the components is not limited by them. If a component is described as being "connected", "coupled", or "joined" to another component, that component may be directly connected or joined to the other component, but it should be understood that yet another component may also be "connected", "coupled", or "joined" between them.

FIG. 6 is a diagram schematically illustrating an image encoding apparatus according to an embodiment of the present invention. Referring to the drawing, the image encoding apparatus 600 includes a spatial prediction execution unit 610, an integer transform execution unit 620, and a Mode Dependent Directional Transform (MDDT) execution unit 630. The image encoding apparatus may further include a difference calculator, a quantizer, an inverse quantizer, a motion estimator, a motion compensator, and the like in addition to the illustrated components, but components that are not directly related to an embodiment of the present invention are omitted to simplify the description.

The spatial prediction execution unit 610 performs prediction on the target block within the same frame using the template matching mode together with the directional intra prediction mode, and selects the mode with the lowest cost based on rate-distortion. Here, the spatial prediction execution unit 610 may be implemented as a single component of the image encoding apparatus 600, but, as illustrated, it may also be composed of an intra prediction execution unit 612, a template prediction execution unit 614, a mode selection unit 616, and a residual signal calculator 618.

The intra prediction execution unit 612 performs the prediction on the target block by using the directional intra prediction mode. That is, the intra prediction execution unit 612 predicts pixel values according to each prediction mode from neighboring pixels of the target block in the same frame as shown in FIG. 4.

In addition, the template prediction execution unit 614 executes the prediction on the target block using the template matching mode.

The predicted value of a pixel p in the current frame can be determined by comparing the values N(p) of its neighboring pixels in the current frame. Here, the values N(p) of the neighboring pixels to be compared are referred to as the template of the pixel p.

FIG. 7 illustrates a search region adjacent to a 4 × 4 target block whose pixel values are to be predicted. As shown, the search area 700 has a width of x pixels and a height of y pixels within the previously reconstructed pixels, and the portion that has not yet been reconstructed is excluded, as shown at the lower right.

The 4 × 4 target block 710 is further divided into 2 × 2 target subblocks 720, and template matching is performed in units of each target subblock. In this case, for the target subblock 720, the pixels in the same frame and adjacent to the target subblock 720 become the template 730.

Template matching calculates the SAD between corresponding pixels of the template 730 and each group of pixels in the search area 700 having the same shape as the template 730 (an inverted L-shape in the drawing), and takes the region with the smallest SAD as the candidate neighboring region 740. The candidate subblock 750 adjoining the candidate neighboring region 740 is then determined as the texture signal for the target subblock 720. For ease of explanation, template matching has been described using a 4 x 4 block as an example, but it is not limited thereto and can be applied to blocks of various sizes.
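The template matching step for one 2 x 2 target subblock can be sketched as follows. This is a minimal sketch under several assumptions that are not in the original text: the frame is a list of rows of reconstructed pixels, the inverted-L template consists of the five pixels above and to the left of the subblock, and the caller supplies the list of candidate positions inside the search area whose template and subblock pixels are already reconstructed.

```python
# Inverted-L template around a 2x2 subblock whose top-left pixel is (y, x):
# three pixels in the row above and two pixels in the column to the left.
TEMPLATE_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (1, -1)]


def template_sad(frame, pos_a, pos_b):
    """SAD between the templates of the subblocks located at pos_a and pos_b."""
    (ya, xa), (yb, xb) = pos_a, pos_b
    return sum(abs(frame[ya + dy][xa + dx] - frame[yb + dy][xb + dx])
               for dy, dx in TEMPLATE_OFFSETS)


def predict_subblock(frame, target, candidates):
    """Pick the candidate whose template best matches the target template.

    Returns the chosen position and the 2x2 candidate subblock, which
    serves as the prediction for the target subblock.
    """
    best = min(candidates, key=lambda pos: template_sad(frame, target, pos))
    y, x = best
    return best, [frame[y][x:x + 2], frame[y + 1][x:x + 2]]
```

Because only reconstructed pixels are compared, a decoder can repeat the same search and arrive at the same candidate without any position being signalled in the bitstream.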

The mode selector 616 selects the mode with the lowest cost based on rate-distortion from among the prediction modes executed by the intra prediction execution unit 612 and the template matching mode executed by the template prediction execution unit 614. That is, if the target block is 4 x 4, the mode selector 616 selects the mode with the lowest cost from among the nine directional intra prediction modes executed by the intra prediction execution unit 612 according to the H.264 standard and the template matching mode executed by the template prediction execution unit 614. However, the cost C is not limited to the rate-distortion basis and can be defined in various ways.

With rate-distortion, which is a representative way of defining the cost, the cost can be calculated as in Equation 2. Here, E denotes the difference between the original signal and the signal reconstructed by decoding the encoded bits, and B denotes the amount of bits required for the coding. In addition, λ is a Lagrangian coefficient that adjusts the relative weighting of E and B.

[Equation 2]

$$C = E + \lambda B$$
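The mode decision based on the cost C = E + λB can be sketched as follows. The mode labels and the numbers used in the example are purely illustrative assumptions; they are not measurements from the original document.

```python
def rd_cost(distortion, bits, lam):
    """Rate-distortion cost C = E + lambda * B."""
    return distortion + lam * bits


def select_mode(candidates, lam):
    """Return the mode with the lowest cost.

    candidates maps a mode label to a tuple (E, B): the distortion of the
    reconstructed block and the number of bits needed to code it.
    """
    return min(candidates,
               key=lambda mode: rd_cost(candidates[mode][0],
                                        candidates[mode][1], lam))
```

With hypothetical candidates `{"intra_vertical": (120, 10), "intra_dc": (100, 12), "template_matching": (80, 14)}`, a small λ of 5 favours the low-distortion template matching mode (cost 150), while a large λ of 20 favours the cheapest-to-code vertical mode (cost 320).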

The residual signal calculator 618 calculates a residual signal between the prediction block and the target block according to the mode selected by the mode selector 616.

The integer transform execution unit 620 performs an integer transform on the residual signal of the image predicted by the template matching mode when the template matching mode is selected by the spatial prediction execution unit 610, that is, when the mode selection unit 616 selects the template matching mode as having the lowest cost from among the nine directional intra prediction modes executed by the intra prediction execution unit 612 and the template matching mode executed by the template prediction execution unit 614. For the template matching mode, unlike the directional prediction modes, the adaptive transform described later is not defined, so the integer transform defined in the H.264 standard may be used for a block predicted by the template matching mode.

Unlike H.261, H.263, MPEG-1, MPEG-2, and MPEG-4, the H.264 standard adopts an integer transform, which eliminates at the root the mismatch that can arise from limited precision when performing the transform operation. In other words, the Discrete Cosine Transform (DCT) used in the existing video and still image standards relies on floating point operations, so the result of the transform could vary between individual implementations. In the H.264 standard, by contrast, the transform is defined only by integer and bit shift operations, which removes the possibility of such computational mismatch in digital systems at the standardization stage.

The macroblock defined in the H.264 standard is a set of pixels having a size of 16 × 16, as shown in FIG. 8. The macroblock of FIG. 8 is composed of four 8 × 8 partitions having indices 0 to 3. The indices indicate that the transform coefficients of the four 8 x 8 partitions are encoded in order from 0 to 3. In addition, the H.264 standard determines the coded block pattern for Y (CBPY) based on the presence or absence of nonzero transform coefficients in each 8 x 8 partition.
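The derivation of CBPY from the four 8 x 8 partitions can be sketched as follows. The bit layout (bit i set when partition i contains a nonzero coefficient) and the function name are assumptions made for illustration.

```python
def luma_cbp(partitions):
    """Compute a 4-bit coded block pattern for the luma component.

    partitions holds the four 8x8 coefficient blocks in index order 0..3.
    Bit i of the result is set when partition i has any nonzero coefficient.
    """
    cbp = 0
    for index, block in enumerate(partitions):
        if any(coeff != 0 for row in block for coeff in row):
            cbp |= 1 << index
    return cbp
```

A partition whose bit is 0 needs no coefficient data in the bitstream, which is what makes the pattern useful for skipping empty partitions.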

FIG. 9 shows one macroblock composed of sixteen 4 × 4 partitions. As described with reference to FIG. 8, one macroblock is divided into four 8 × 8 partitions that are processed in a specified order. Likewise, each 8 x 8 partition is divided into four 4 x 4 partitions that are processed in a specified order. This series of configurations is shown in FIG. 9. In addition, the drawing shows that the DC components of the sixteen 4 x 4 partitions can be collected to form another 4 x 4 partition. In other words, the darker part at the upper left of each 4 x 4 partition conceptually indicates the DC coefficient among the transform coefficients of that partition, and these DC coefficients can be collected to form a separate 4 x 4 partition.
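The nested processing order and the collection of DC coefficients can be sketched as follows. The sketch assumes raster order at both levels (four 8 x 8 partitions, then four 4 x 4 partitions inside each), as in H.264, and assumes that each DC coefficient is placed at the spatial position of its partition; the function names are illustrative.

```python
def partition_origin(index):
    """Top-left (x, y) of 4x4 partition `index` (0..15) in a 16x16 macroblock.

    The index counts the four 8x8 partitions in raster order and, inside
    each of them, the four 4x4 partitions in raster order.
    """
    i8, i4 = divmod(index, 4)
    x = (i8 % 2) * 8 + (i4 % 2) * 4
    y = (i8 // 2) * 8 + (i4 // 2) * 4
    return x, y


def collect_dc(coeff_blocks):
    """Gather the DC coefficient of each 4x4 partition into one 4x4 block.

    coeff_blocks holds sixteen 4x4 coefficient blocks in coding order;
    the DC coefficient is the top-left entry block[0][0].
    """
    dc = [[0] * 4 for _ in range(4)]
    for index, block in enumerate(coeff_blocks):
        x, y = partition_origin(index)
        dc[y // 4][x // 4] = block[0][0]
    return dc
```

For instance, partitions 0 to 3 cover the top-left 8 x 8 quadrant, so partition 4 starts at (8, 0) rather than at the beginning of the second row of 4 x 4 blocks.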

On the other hand, the 4 x 4 integer transform is the transform used to compress the residual signal of a 4 x 4 partition in the intra and inter modes. Since all transforms in H.264 can be implemented with addition and bit shift operations only, every basis coefficient is limited to ±1 or ±2, that is, to a power of 2. The basic 4 x 4 integer transform generates the transform coefficients on which a zig-zag scan is performed for the 4 x 4 partition, as shown in FIG. 10.
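The zig-zag scan of a 4 x 4 block of transform coefficients can be sketched with a lookup table of raster indices. The table below is the frame-mode zig-zag order of H.264; the names are chosen here for illustration.

```python
# Raster index (row * 4 + column) of the coefficient visited at each scan step.
ZIGZAG_4X4 = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]


def zigzag_scan(block):
    """Reorder a 4x4 block (list of four rows) into a 16-element zig-zag list."""
    flat = [value for row in block for value in row]
    return [flat[i] for i in ZIGZAG_4X4]
```

Because the scan starts at the DC coefficient and moves toward the high-frequency corner, the nonzero coefficients tend to cluster at the start of the list and the trailing zeros can be coded compactly.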

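The DCT of a 4 x 4 input X is given by Equation 3.

[Equation 3]

$$Y = A X A^{T} = \begin{bmatrix}a&a&a&a\\b&c&-c&-b\\a&-a&-a&a\\c&-b&b&-c\end{bmatrix} X \begin{bmatrix}a&b&a&c\\a&c&-a&-b\\a&-c&-a&b\\a&-b&a&-c\end{bmatrix}$$

Here, $a = 1/2$, $b = \sqrt{1/2}\cos(\pi/8)$, and $c = \sqrt{1/2}\cos(3\pi/8)$. Factoring Equation 3 yields Equation 4.

[Equation 4]

$$Y = \left(C X C^{T}\right) \otimes E = \left(\begin{bmatrix}1&1&1&1\\1&d&-d&-1\\1&-1&-1&1\\d&-1&1&-d\end{bmatrix} X \begin{bmatrix}1&1&1&d\\1&d&-1&-1\\1&-d&-1&1\\1&-1&1&-d\end{bmatrix}\right) \otimes \begin{bmatrix}a^2&ab&a^2&ab\\ab&b^2&ab&b^2\\a^2&ab&a^2&ab\\ab&b^2&ab&b^2\end{bmatrix}$$

Here, E is a scaling factor matrix, and the symbol $\otimes$ denotes element-wise multiplication, in which each value of $(C X C^{T})$ is multiplied by the value at the same position in the matrix E. In addition, $d = c/b \approx 0.414$. To simplify Equation 4, d is approximated as 0.5, which gives the matrix form of Equation 5.

[Equation 5]

$$Y = \left(C_f X C_f^{T}\right) \otimes E_f = \left(\begin{bmatrix}1&1&1&1\\2&1&-1&-2\\1&-1&-1&1\\1&-2&2&-1\end{bmatrix} X \begin{bmatrix}1&2&1&1\\1&1&-1&-2\\1&-1&-1&2\\1&-2&1&-1\end{bmatrix}\right) \otimes \begin{bmatrix}a^2&ab/2&a^2&ab/2\\ab/2&b^2/4&ab/2&b^2/4\\a^2&ab/2&a^2&ab/2\\ab/2&b^2/4&ab/2&b^2/4\end{bmatrix}$$

Here, the values of a, b, and d are $a = 1/2$, $b = \sqrt{2/5}$, and $d = 1/2$, respectively. In Equation 5, $C_f X C_f^{T}$ is the forward integer transform used in H.264, which can be computed as a product of matrices. The first and last matrices in Equation 5 contain only the integer values ±1 and ±2, so the products can be computed with additions, subtractions, and shift operations alone. This property is called 'multiplication-free' and allows a very efficient implementation in a reference encoder.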
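The multiplication-free property can be illustrated by computing the core of the H.264 forward integer transform twice: once as a plain matrix product with the integer matrix of ±1 and ±2 entries, and once with additions, subtractions, and one-bit shifts only. This is a sketch for illustration; the function names are assumptions, and the element-wise scaling by the matrix E is left out.

```python
# Integer core matrix of the H.264 4x4 forward transform.
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def matmul(a, b):
    """Plain 4x4 integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def core_transform_matrix(x):
    """Reference version: CF * X * CF^T by matrix multiplication."""
    return matmul(matmul(CF, x), transpose(CF))


def butterfly(v):
    """Apply CF to a 4-element vector using only add, subtract and shift."""
    s0, s1 = v[0] + v[3], v[1] + v[2]
    d0, d1 = v[0] - v[3], v[1] - v[2]
    return [s0 + s1, (d0 << 1) + d1, s0 - s1, d0 - (d1 << 1)]


def core_transform_fast(x):
    """Same result as core_transform_matrix, without any multiplication."""
    # Column pass: cols[c][r] equals (CF * X)[r][c].
    cols = [butterfly([x[r][c] for r in range(4)]) for c in range(4)]
    rows = [[cols[c][r] for c in range(4)] for r in range(4)]
    # Row pass: multiplying by CF^T on the right transforms each row.
    return [butterfly(row) for row in rows]
```

Both functions return the same integer coefficients for any integer input block, which is why encoder and decoder implementations cannot drift apart the way floating point DCT implementations can.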

The MDDT execution unit 630 executes the MDDT, taking the directionality into account, when any one of the nine directional intra prediction modes is selected by the spatial prediction execution unit 610, that is, when any one of the directional intra prediction modes executed by the intra prediction execution unit 612 is selected as having the lowest cost.

Mode Dependent Directional Transform (MDDT) uses a basis vector designed based on the Karhunen Loeve Transform (KLT) according to the direction of the intra prediction method for a prediction error block generated after intra prediction is performed. This technique compresses the energy of the error block in the frequency domain. Since MDDT applies transform coding according to the direction of the intra prediction method, characteristics of quantized transform coefficients generated after quantization may also appear in different forms according to the direction. In order to encode these coefficients more efficiently, adaptive scanning may be used.

The MDDT may be selected as a set of transform functions classified according to the directional prediction mode, and such a set of transform functions may be considered as shown in Table 1 below.

Table 1

Here, f_xy denotes the x-th transform function corresponding to the y-th prediction mode. Table 1 shows N + 1 functions allocated to each prediction mode, but the present invention is not limited thereto, and the number of functions need not be the same for every prediction mode. For example, mode 0 may have N + 1 assigned transform functions, mode 1 may have N assigned transform functions, and mode 2 may have N − 1 assigned transform functions.

The MDDT execution unit 630 transforms the residual signal of the predicted image using the preset transform function that corresponds to the directional intra prediction mode selected by the mode selection unit 616 of the spatial prediction execution unit 610.

FIG. 11 is a flowchart illustrating a spatial prediction method according to another embodiment of the present invention. The spatial prediction execution unit 610 of FIG. 6 executes prediction on a target block using the template matching mode together with the directional intra prediction mode (S1101). The spatial prediction execution unit 610 then compares the costs of the directional intra prediction modes and the template matching mode, and selects the mode having the lowest cost as the optimal mode (S1103).

When the optimal mode has been selected from among the directional intra prediction modes and the template matching mode as described above, the residual signal between the prediction block generated in the selected mode and the target block is calculated (S1105).
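Steps S1103 and S1105 can be sketched as follows, using the rate-distortion cost C = E + λB recited in the claims. This is a minimal sketch: the candidate dictionary layout, the mode names, and the function names are assumptions made for illustration, and the distortion and bit counts are made-up inputs rather than measured values.

```python
def rd_cost(distortion, bits, lam):
    """Rate-distortion cost C = E + lambda * B."""
    return distortion + lam * bits


def select_mode(candidates, lam):
    """Pick the lowest-cost mode (S1103).

    candidates maps a mode name to a tuple
    (distortion E, bit count B, prediction block).
    """
    return min(candidates,
               key=lambda m: rd_cost(candidates[m][0], candidates[m][1], lam))


def residual(target, prediction):
    """Residual signal between target block and prediction block (S1105)."""
    return [[t - p for t, p in zip(t_row, p_row)]
            for t_row, p_row in zip(target, prediction)]


# Example with two candidate modes for a 2x2 target block.
target = [[5, 6], [7, 8]]
candidates = {
    "intra_vertical": (120.0, 10, [[1, 1], [1, 1]]),
    "template_matching": (100.0, 12, [[2, 2], [2, 2]]),
}
best = select_mode(candidates, lam=5.0)
res = residual(target, candidates[best][2])
```

A larger λ weights the bit count more heavily, so the same candidates can yield a different winner when λ changes.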

FIG. 12 is a flowchart illustrating an image encoding method performed by the image encoding apparatus of FIG. 6. Since steps S1201 to S1205 calculate the residual signal using the same spatial prediction method as that of FIG. 11, a detailed description thereof is omitted.

If the template matching mode is selected as the optimal mode by the spatial prediction execution unit 610 (S1207), the integer transform execution unit 620 performs an integer transform on the residual signal between the target block and the prediction block generated by the template prediction execution unit 614 (S1209).

If one of the directional intra prediction modes, rather than the template matching mode, is selected as the optimal mode by the spatial prediction execution unit 610 (S1207), the MDDT execution unit 630 selects, from among the preset transform functions, the transform function corresponding to the selected prediction mode (S1211), and transforms the residual signal between the target block and the prediction block generated by the intra prediction execution unit 612 using the selected transform function (S1213).
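The branch at S1207 can be sketched as a simple dispatch. The transforms themselves are replaced by stand-in functions that only tag their input, because the source does not give the MDDT basis vectors; the table `MDDT_TABLE`, the mode numbering 0 to 8, and all function names are hypothetical.

```python
TEMPLATE_MATCHING = "template_matching"


def integer_transform(residual):
    # Stand-in for the 4x4 integer transform (S1209).
    return ("integer", residual)


def make_mddt(mode):
    # Stand-in factory: a real MDDT would apply KLT-derived basis
    # vectors designed for this prediction direction.
    def mddt(residual):
        return ("mddt", mode, residual)
    return mddt


# One preset transform function per directional mode 0..8 (assumed layout).
MDDT_TABLE = {mode: make_mddt(mode) for mode in range(9)}


def transform_residual(mode, residual):
    """Route the residual to the transform that matches the selected mode."""
    if mode == TEMPLATE_MATCHING:       # S1207 -> S1209
        return integer_transform(residual)
    transform = MDDT_TABLE[mode]        # S1211: look up by prediction mode
    return transform(residual)          # S1213


# Example: the same residual takes a different path depending on the mode.
tm_result = transform_residual(TEMPLATE_MATCHING, [[1, -1], [0, 2]])
intra_result = transform_residual(3, [[1, -1], [0, 2]])
```

Keeping the lookup table separate from the dispatch makes it easy to assign a different number of transform functions to each mode, as the description of Table 1 allows.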

FIG. 13 is a diagram illustrating an example of the structure of a bitstream generated by the video encoding apparatus 600 of FIG. 6. In H.264, a bitstream is encoded in units of slices. The bitstream includes a slice header 1310 and slice data 1320, and the slice data 1320 includes a plurality of macroblock data (MB) 1321 to 1324. One macroblock data 1323 may include an mb_type field 1330, an mb_pred field 1335, and a texture data field 1339.

Here, a value indicating the type of macroblock is recorded in the mb_type field 1330. That is, it indicates whether the current macroblock is an intra macroblock or an inter macroblock.

In the mb_pred field 1335, a detailed prediction mode according to the type of macroblock is recorded. In the case of an intra macroblock, information on the prediction mode selected during intra prediction is recorded, and in the case of an inter macroblock, a reference frame number and a motion vector are recorded for each macroblock partition. In the case of the template matching mode, only a bit signalling that mode may be recorded and the remaining information omitted, which notifies the decoder that the current mode is the template matching mode. For example, when the template matching mode is selected for the current block, bit 1 is transmitted and the remaining information is omitted. Otherwise, bit 0 is transmitted and the remaining mode information is then encoded.

When the mb_type field 1330 indicates an intra macroblock, the mb_pred field 1335 is divided into a plurality of block information 1342 to 1344, and each block information 1342 is divided into a main_mode field 1345 for recording the value of the main mode described above and a sub_mode field 1346 for recording the value of the sub mode described above.
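The one-bit signalling described above can be sketched as follows. The bit list, the function names, and in particular the fixed 4-bit code used for the directional mode index are assumptions made for illustration; the source does not specify how the remaining mode information is coded.

```python
def write_mode(bits, is_template_matching, intra_mode=None):
    """Append the mode signalling for one block to a list of bits."""
    if is_template_matching:
        bits.append(1)                  # bit 1: template matching, nothing else sent
    else:
        bits.append(0)                  # bit 0: remaining mode information follows
        # Assumed 4-bit fixed-length code for the directional mode index.
        bits.extend(int(b) for b in format(intra_mode, "04b"))


def read_mode(bits, pos=0):
    """Parse one block's mode; return ((mode type, mode index), next position)."""
    if bits[pos] == 1:
        return ("template_matching", None), pos + 1
    mode = int("".join(str(b) for b in bits[pos + 1:pos + 5]), 2)
    return ("directional_intra", mode), pos + 5


# Example: one template matching block followed by directional mode 5.
stream = []
write_mode(stream, True)
write_mode(stream, False, intra_mode=5)
first, pos = read_mode(stream, 0)
second, pos = read_mode(stream, pos)
```

A template matching block costs a single bit in this scheme, which is how the overhead of the added mode is kept small.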

Finally, the encoded residual image, that is, the texture data, is recorded in the texture data field 1339.

FIG. 14 is a diagram schematically illustrating an image decoding apparatus according to an embodiment of the present invention. Referring to the drawing, the image decoding apparatus 1400 according to an embodiment of the present invention may include a mode type determination unit 1410, a template matching execution unit 1420, an inverse integer transform execution unit 1430, and an inverse MDDT execution unit 1440.

The mode type determination unit 1410 determines the mode type of the current block from a bitstream that has been encoded by spatial predictive coding and input. That is, it reads the mode type of the current block from a bitstream structured as shown in FIG. 13. For example, when bit 1, indicating that the mode type of the current block is the template matching mode, is recorded in the input bitstream, the unit recognizes that the bitstream was encoded in the template matching mode and prepares the corresponding decoding. When bit 0, indicating that the mode type of the current block is a directional intra prediction mode, is recorded in the input bitstream, the unit prepares to decode the directional intra prediction block by referring to the information recorded in the sub mode of the bitstream.

If the mode type determination unit 1410 determines that the mode type of the current block is the template matching mode, the template matching execution unit 1420 divides the current block into N × N blocks and performs template matching on each of the divided N × N blocks. Preferably, the template matching execution unit 1420 divides the current block into 2 × 2 blocks and performs template matching on each of them. In template matching, as shown in the drawing, the sum of absolute differences (SAD) between corresponding pixels is calculated for each pixel group in the search area 700 that has the same shape as the template 730 (an inverted L shape in the drawing), and the area with the smallest SAD is used as the candidate neighboring area 740. The candidate subblock 750 adjoining the candidate neighboring area 740 is then determined as the texture signal for the target subblock 720.
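The SAD-based search can be sketched for one 2 × 2 target subblock as follows. The sketch assumes a five-pixel inverted-L template (three pixels above, two to the left), a search window limited to rows that lie entirely above the target, and an image stored as a list of rows; the function names, the window size, and the tie-breaking rule (first minimum wins) are all assumptions.

```python
# Inverted-L template around a 2x2 block whose top-left pixel is (y, x):
# three pixels in the row above and two pixels in the column to the left.
TEMPLATE_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (1, -1)]


def template_sad(img, y0, x0, y1, x1):
    """SAD between the templates of the blocks at (y0, x0) and (y1, x1)."""
    return sum(abs(img[y0 + dy][x0 + dx] - img[y1 + dy][x1 + dx])
               for dy, dx in TEMPLATE_OFFSETS)


def predict_subblock(img, ty, tx, search=4):
    """Predict the 2x2 subblock at (ty, tx) from already reconstructed pixels."""
    width = len(img[0])
    best = None
    # Candidates must lie fully above the target rows (assumed causal region).
    for cy in range(max(1, ty - search), ty - 1):
        for cx in range(max(1, tx - search), min(width - 2, tx + search) + 1):
            sad = template_sad(img, ty, tx, cy, cx)
            if best is None or sad < best[0]:
                best = (sad, cy, cx)
    if best is None:
        raise ValueError("no causal candidate position available")
    _, cy, cx = best
    # The block adjoining the best-matching template becomes the prediction.
    return [[img[cy + dy][cx + dx] for dx in range(2)] for dy in range(2)]


# Example: a texture that repeats every two rows is predicted exactly.
image = [[x * x + 10 * (y % 2) for x in range(8)] for y in range(8)]
prediction = predict_subblock(image, ty=4, tx=3)
```

Because the decoder can run the same search on the same reconstructed pixels, no displacement information has to be transmitted for the matched position.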

The inverse integer transform execution unit 1430 executes an inverse integer transform on the residual signal between the target block and the prediction block matched by the template matching execution unit 1420. In this case, the inverse integer transform execution unit 1430 may perform the inverse integer transform on the residual signal using Equation 5. In other words, the inverse transform of the residual signal can be obtained by inverting Equation 5.

On the other hand, if the mode type determination unit 1410 determines that the mode type of the current block is a directional intra prediction mode, the inverse MDDT execution unit 1440 executes the inverse MDDT in consideration of the directionality of the input bitstream. That is, when the current block of the input bitstream is determined to be in a directional intra prediction mode, the direction is determined by referring to the direction information recorded in the sub mode of the bitstream, and the corresponding inverse MDDT is executed. For example, assuming that the set of transform functions is as shown in Table 1, N + 1 transform functions are assigned to each prediction mode according to its direction, and the inverse MDDT can be executed based on the assigned transform function and the direction information recorded in the bitstream. Here, the number of transform functions assigned to each prediction mode may differ depending on the direction.
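Inverting Equation 5 can be sketched as below. The sketch folds the scaling into the core matrix to form an orthonormal matrix A, so that the inverse is simply the transpose; this floating-point form is used for clarity and is not the integer inverse of a practical decoder. The values a = 1/2 and b = √(2/5) are the standard H.264 values, assumed here to be those of Equation 5, and the function names are hypothetical.

```python
import math

CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]

# Row scale factors a, b/2, a, b/2 with a = 1/2 and b = sqrt(2/5).
ROW_SCALE = [0.5, math.sqrt(0.4) / 2, 0.5, math.sqrt(0.4) / 2]

# A = diag(ROW_SCALE) * CF has orthonormal rows, so its inverse is A^T.
A = [[ROW_SCALE[i] * CF[i][j] for j in range(4)] for i in range(4)]


def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward(x):
    """Y = A X A^T, equivalent to Equation 5 with the scaling folded in."""
    return matmul(matmul(A, x), transpose(A))


def inverse(y):
    """X = A^T Y A, the inverse of the forward transform."""
    return matmul(matmul(transpose(A), y), A)


# Example: a residual block survives a forward and inverse pass.
block = [[5, -3, 0, 2], [1, 1, -4, 0], [0, 7, -2, 3], [-6, 0, 1, 1]]
restored = [[round(v) for v in row] for row in inverse(forward(block))]
```

Without quantization the round trip is lossless up to floating-point error; in a real codec the quantization step between the two transforms is what introduces loss.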

Referring to the drawing, the mode type determination unit 1410 determines the mode type of the current block from the input bitstream encoded by spatial predictive coding (S1501). That is, based on the structure of the bitstream as shown in FIG. 13, it determines whether the mode type of the current block is the template matching mode or a directional intra prediction mode. Here, the mode type determination unit 1410 has been described as determining the mode type for spatial predictive coding, but the present invention is not limited thereto. A detailed description of temporal predictive coding is omitted because it is beyond the subject matter of the present invention.

If it is determined that the mode type of the current block of the input bitstream is the template matching mode (S1503), the template matching execution unit 1420 divides the current block into N × N blocks (S1505) and performs template matching on each of the divided N × N blocks (S1507). In this case, as shown in FIG. 7, the template matching execution unit 1420 preferably divides the 4 × 4 target block into 2 × 2 target subblocks and executes template matching on each target subblock.

The block obtained by template matching becomes the intra prediction result, and the inverse integer transform execution unit 1430 performs inverse quantization and an inverse integer transform on the residual signal between the generated prediction block and the target block (S1509). The result obtained through inverse quantization and the inverse integer transform is added to the template matching result to form a reconstructed image.

If it is determined that the mode type of the current block of the input bitstream is a directional intra prediction mode, the inverse MDDT execution unit 1440 determines the direction of the directional intra prediction mode based on the structure of the input bitstream as shown in FIG. 13, and performs inverse quantization and the inverse MDDT in consideration of that direction (S1511). In this case, the set of transform functions may be set as shown in Table 1, and the inverse MDDT may be executed based on the transform function assigned according to the direction of each prediction mode.
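The final addition step can be sketched as follows. Clipping to an 8-bit sample range is an assumption that the source does not state, and the function name `reconstruct` is hypothetical.

```python
def reconstruct(prediction, residual, max_value=255):
    """Add the decoded residual to the prediction block.

    Samples are clipped to [0, max_value]; the 8-bit range is an
    assumption made for this sketch.
    """
    return [[min(max(p + r, 0), max_value) for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(prediction, residual)]


# Example: a 2x2 template matching result plus its decoded residual.
pred = [[250, 10], [100, 128]]
res = [[10, -20], [5, -8]]
recon = reconstruct(pred, res)
```

The reconstructed samples then serve as the search area for template matching of the blocks decoded afterwards.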

In the above description, all of the components constituting the embodiments of the present invention are described as being combined into one or operating in combination, but the present invention is not necessarily limited to these embodiments. In other words, within the scope of the present invention, one or more of the components may be selectively combined and operated. In addition, although each of the components may be implemented as independent hardware, some or all of the components may be selectively combined and implemented as a computer program having program modules that perform some or all of the combined functions on one or more pieces of hardware. Codes and code segments constituting the computer program may be easily inferred by those skilled in the art. Such a computer program may be stored in a computer readable storage medium and read and executed by a computer, thereby implementing embodiments of the present invention. The storage medium of the computer program may include a magnetic recording medium, an optical recording medium, a carrier wave medium, and the like.

In addition, terms such as "include", "comprise", or "have" described above mean that the corresponding component may be present unless specifically stated otherwise, and should therefore be construed as allowing other components to be included rather than excluding them. All terms, including technical and scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. Commonly used terms, such as terms defined in a dictionary, should be interpreted in accordance with their contextual meaning in the related art, and shall not be interpreted in an idealized or excessively formal sense unless explicitly so defined in the present invention.

The above description is merely illustrative of the technical idea of the present invention, and those skilled in the art to which the present invention pertains may make various modifications and changes without departing from the essential characteristics of the present invention. Therefore, the embodiments disclosed in the present invention are not intended to limit the technical idea of the present invention but to describe the present invention, and the scope of the technical idea of the present invention is not limited by these embodiments. The protection scope of the present invention should be interpreted by the following claims, and all technical ideas within the equivalent scope should be interpreted as being included in the scope of the present invention.

As described above, the embodiment of the present invention is applicable to intra prediction apparatuses and to the field of image encoding and decoding. Compared with the H.264 standard, it is a very useful invention that can increase the accuracy of intra prediction while reducing the bit rate, without greatly increasing the overhead incurred in generating the bitstream.

CROSS-REFERENCE TO RELATED APPLICATION

This patent application claims priority under 35 U.S.C. § 119(a) to Patent Application No. 10-2009-0119570, filed in Korea on December 4, 2009, the entire contents of which are incorporated herein by reference. In addition, if this patent application claims priority in countries other than the United States for the same reason, the entire contents thereof are likewise incorporated herein by reference.

Claims

An image encoding apparatus comprising:

a spatial prediction execution unit that performs prediction on a target block using a template matching mode together with a directional intra prediction mode, and selects the mode having the lowest cost based on rate-distortion; and

an integer transform execution unit that, when the template matching mode is selected by the spatial prediction execution unit, performs an integer transform on a residual signal of an image predicted in the template matching mode.
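The image encoding apparatus of claim 1, wherein

the integer transform execution unit performs the integer transform using the following equation,

$$Y = (C_f X C_f^{T}) \otimes E_f = \left( \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} X \begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix} \right) \otimes \begin{bmatrix} a^{2} & \frac{ab}{2} & a^{2} & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^{2}}{4} & \frac{ab}{2} & \frac{b^{2}}{4} \\ a^{2} & \frac{ab}{2} & a^{2} & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^{2}}{4} & \frac{ab}{2} & \frac{b^{2}}{4} \end{bmatrix}$$

where $C_f X C_f^{T}$ is the forward integer transform used in H.264, and the values of a, b, and d are respectively $a = \frac{1}{2}$, $b = \sqrt{\frac{2}{5}}$, and $d = \frac{1}{2}$.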
The image encoding apparatus of claim 1, wherein

the spatial prediction execution unit selects the lowest-cost mode by the following equation:

C = E + λB

where C is the cost, E is the difference between the original signal and the signal reconstructed by decoding the coded bits, B is the number of bits required for each coding, and λ is a Lagrangian coefficient that adjusts the relative weighting of E and B.
The image encoding apparatus of claim 1, further comprising:

an MDDT execution unit that executes a Mode Dependent Directional Transform (MDDT) in consideration of the prediction direction when any one of the nine directional intra prediction modes is selected by the spatial prediction execution unit.
The image encoding apparatus of claim 4, wherein

the MDDT execution unit transforms the residual signal of the predicted image according to the transform function that corresponds to the selected mode, from among preset transform functions corresponding to the directional intra prediction modes.
A spatial prediction apparatus comprising:

an intra prediction execution unit that performs prediction on a target block using a directional intra prediction mode;

a template prediction execution unit that performs prediction on the target block using a template matching mode;

a mode selection unit that selects, based on rate-distortion, the mode having the lowest cost from among the prediction modes executed by the intra prediction execution unit and the template matching mode executed by the template prediction execution unit; and

a residual signal calculation unit that calculates a residual signal between the target block and the prediction block generated in the selected mode.
An image decoding apparatus comprising:

a mode type determination unit that determines the mode type of a current block from a bitstream that has been encoded by spatial predictive coding and input;

a template matching execution unit that, when the mode type determination unit determines that the mode type of the current block is a template matching mode, divides the current block into N × N blocks and performs template matching on each of the divided N × N blocks; and

an inverse integer transform execution unit that performs an inverse integer transform on the residual signal between the target block and the prediction block obtained by the template matching.
The image decoding apparatus of claim 7, further comprising:

an inverse MDDT execution unit that executes an inverse MDDT in consideration of the prediction direction when the mode type determination unit determines that the mode type of the current block is a directional intra prediction mode.
The image decoding apparatus of claim 7, wherein

the template matching execution unit divides the current block into 2 × 2 blocks and performs template matching on each of the 2 × 2 blocks.
An image encoding method comprising:

performing prediction on a target block using a template matching mode together with a directional intra prediction mode;

selecting the mode having the lowest cost from among the modes executed in the performing step;

calculating a residual signal between the target block and the prediction block generated in the mode selected in the selecting step; and

performing an integer transform on the residual signal calculated in the calculating step when the mode selected in the selecting step is the template matching mode, and executing an MDDT on the residual signal calculated in the calculating step when the mode selected in the selecting step is the directional prediction mode.
The image encoding method of claim 10, further comprising:

selecting, when the mode selected in the selecting step is the directional prediction mode, the transform function corresponding to that prediction mode from among preset transform functions,

wherein the MDDT is executed according to the selected transform function.
A spatial prediction method comprising:

performing prediction on a target block using a template matching mode together with a directional intra prediction mode;

selecting the mode having the lowest cost from among the modes executed in the performing step; and

calculating a residual signal between the target block and the prediction block generated in the mode selected in the selecting step.
An image decoding method comprising:

determining the mode type of a current block from a bitstream that has been encoded by spatial predictive coding and input;

dividing, when the mode type of the current block is determined to be a template matching mode, the current block into N × N blocks and performing template matching on each of the divided N × N blocks; and

performing an inverse integer transform on the residual signal between the target block and the prediction block obtained by the template matching.
The image decoding method of claim 13, further comprising:

executing an inverse MDDT in consideration of the prediction direction when the mode type of the current block is determined to be a directional intra prediction mode.