WO2017043766A1

WO2017043766A1 - Video encoding and decoding method and device

Info

Publication number: WO2017043766A1
Application number: PCT/KR2016/008258
Authority: WO
Inventors: 이진영; 박민우; 김찬열
Original assignee: Samsung Electronics Co., Ltd. (삼성전자 주식회사)
Priority date: 2015-09-10
Filing date: 2016-07-28
Publication date: 2017-03-16
Also published as: EP3297286A1; KR20180040517A; US20180199058A1; EP3297286A4; CN108028933A

Abstract

Disclosed is an encoding device. The encoding device comprises an interface that communicates with a decoding device, and a processor which: divides a target block of the current frame into a first area and a second area according to a predetermined division method; searches for a first motion vector for the first area in a first reference frame, so as to generate a first prediction block including an area corresponding to the first area; divides the first prediction block into a third area and a fourth area according to the predetermined division method, and generates boundary information; searches for a second motion vector for the fourth area corresponding to the second area in a second reference frame, and generates a second prediction block including an area corresponding to the fourth area; merges the first prediction block and the second prediction block according to the boundary information so as to generate a third prediction block corresponding to the target block; and controls the interface to transmit the first motion vector and the second motion vector to the decoding device.

Description

Video encoding and decoding method and apparatus

The present invention relates to a video encoding and decoding method and apparatus, and more particularly, to a video encoding and decoding method and apparatus for performing inter prediction.

BACKGROUND With the development of electronic technology, high resolution images such as high definition (HD) images and ultra high definition (UHD) images are becoming widespread. To provide high resolution images, high efficiency image compression technology is required. For example, processing 30 RGB images per second at 1280 × 720 resolution (HD) with 8 bits per component sample requires 1280 × 720 × 8 × 3 × 30 = 663,552,000 bits per second, whereas processing 30 RGB images per second at 3840 × 2160 resolution (UHD) with 8 bits per component sample requires 3840 × 2160 × 8 × 3 × 30 = 5,971,968,000 bits per second. That is, as the resolution of the image increases, the number of bits to be processed increases sharply, in proportion to the number of pixels, and the storage cost and the transmission cost of the image increase accordingly.
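The two raw bit rates quoted above can be reproduced with a few lines of Python. This is only an arithmetic check; the function name and its default parameters are illustrative and do not appear in the specification.

```python
def raw_bits_per_second(width, height, bits_per_sample=8, components=3, fps=30):
    """Uncompressed bit rate: pixels x bits per sample x components x frames per second."""
    return width * height * bits_per_sample * components * fps


hd = raw_bits_per_second(1280, 720)    # HD, 8-bit RGB, 30 frames per second
uhd = raw_bits_per_second(3840, 2160)  # UHD, 8-bit RGB, 30 frames per second

print(hd)         # 663552000
print(uhd)        # 5971968000
print(uhd // hd)  # 9, because UHD has 3 x 3 times as many pixels as HD
```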

Image compression technology compresses image data by dividing one frame into a plurality of blocks, in order to reduce the bits to be processed, and removing temporal and spatial redundancy for each block. Compressing an image using pixels around a target block to be encoded is an example of removing spatial redundancy, and this method is generally referred to as intra prediction encoding. Compressing an image using a reference block of another frame that was compressed before the target block is an example of removing temporal redundancy, and this method is called inter prediction encoding.

In conventional inter prediction encoding, only a rectangular block is used to encode a target block. In each block, the width and length of the block border are parallel to the width and length of the frame, respectively.

However, in an actual image, objects bounded by curves are far more common, and encoding accuracy suffers when such objects are divided into rectangular blocks and encoded. Accordingly, there is a need for encoding that reflects the boundary of an object included in an image.

SUMMARY OF THE INVENTION The present invention is directed to the above-described needs, and an object of the present invention is to provide a video encoding and decoding method and apparatus for performing inter prediction by dividing a target block of a current frame into a plurality of regions.

According to an embodiment of the present invention for achieving the above object, an encoding method of an encoding apparatus includes: dividing a target block of a current frame into a first region and a second region according to a predetermined division method; generating a first prediction block including a region corresponding to the first region by searching for a first motion vector for the first region in a first reference frame; dividing the first prediction block into a third region and a fourth region according to the predetermined division method, and generating boundary information; generating a second prediction block including a region corresponding to the fourth region by searching for a second motion vector for the fourth region, which corresponds to the second region, in a second reference frame; and generating a third prediction block corresponding to the target block by merging the first prediction block and the second prediction block according to the boundary information.

According to an embodiment of the present invention for achieving the above object, a decoding method of a decoding apparatus includes: receiving, with respect to a target block to be decoded in a current frame, a first motion vector searched in a first reference frame and a second motion vector searched in a second reference frame; generating a first prediction block and a second prediction block in the first reference frame and the second reference frame, respectively, based on the first motion vector and the second motion vector; dividing the first prediction block into a plurality of regions according to a predetermined division method, and generating boundary information; and generating a third prediction block corresponding to the target block by merging the first prediction block and the second prediction block according to the boundary information.

Meanwhile, according to an embodiment of the present invention, an encoding apparatus includes an interface for communicating with a decoding apparatus, and a processor configured to: divide a target block of a current frame into a first region and a second region according to a predetermined division method; search for a first motion vector for the first region in a first reference frame to generate a first prediction block including a region corresponding to the first region; divide the first prediction block into a third region and a fourth region according to the predetermined division method, and generate boundary information; search for a second motion vector for the fourth region, which corresponds to the second region, in a second reference frame to generate a second prediction block including a region corresponding to the fourth region; merge the first prediction block and the second prediction block according to the boundary information to generate a third prediction block corresponding to the target block; and control the interface to transmit the first motion vector and the second motion vector to the decoding apparatus.

Meanwhile, according to an embodiment of the present invention, a decoding apparatus includes an interface for communicating with an encoding apparatus, and a processor configured to: when a first motion vector searched in a first reference frame and a second motion vector searched in a second reference frame are received from the encoding apparatus with respect to a target block to be decoded in a current frame, generate a first prediction block and a second prediction block in the first reference frame and the second reference frame, respectively, based on the first motion vector and the second motion vector; divide the first prediction block into a plurality of regions according to a predetermined division method, and generate boundary information; and merge the first prediction block and the second prediction block according to the boundary information to generate a third prediction block corresponding to the target block.

According to various embodiments of the present disclosure as described above, the prediction accuracy may be improved by dividing the target block of the current frame into a plurality of regions according to pixel values of the target block.

FIG. 1 is a block diagram showing a configuration of an encoding apparatus for helping understanding of the present invention.

FIG. 2 is a block diagram showing a configuration of a decoding apparatus for helping understanding of the present invention.

FIG. 3 is a simplified block diagram illustrating an encoding apparatus according to an embodiment of the present invention.

FIGS. 4A and 4B are diagrams for describing a method of dividing a target block, according to an exemplary embodiment.

FIG. 5 is a diagram for describing a method of generating a prediction block according to an embodiment of the present invention.

FIGS. 6A and 6B are diagrams for describing boundary information according to an embodiment of the present invention.

FIG. 7 is a diagram for describing a method of merging prediction blocks according to an embodiment of the present invention.

FIG. 8 is a simplified block diagram illustrating a decoding apparatus according to an embodiment of the present invention.

FIG. 9 is a flowchart illustrating a prediction block generation method of an encoding apparatus according to an embodiment of the present invention.

FIG. 10 is a flowchart illustrating a method of generating a prediction block of a decoding apparatus according to an embodiment of the present invention.

Hereinafter, various embodiments of the present disclosure will be described with reference to the accompanying drawings. It is to be understood that the description herein is not intended to limit the scope of the invention to specific embodiments, but includes various modifications, equivalents, and / or alternatives to the embodiments. In connection with the description of the drawings, the same or similar reference numerals may be used for similar components.

In addition, in this specification, when one component (e.g., a first component) is described as being operatively or communicatively "coupled" or "connected" to another component (e.g., a second component), this should be understood to cover both the case where the components are directly connected and the case where they are indirectly connected through another component (e.g., a third component). On the other hand, when a component (e.g., a first component) is said to be "directly coupled" or "directly connected" to another component (e.g., a second component), it may be understood that no other component (e.g., a third component) exists between the two components.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the scope of other embodiments. In addition, in the present specification, a singular expression may be used for convenience of description, but it may be interpreted as including the plural unless the context clearly indicates otherwise. In addition, the terms used herein may have the same meaning as commonly understood by those of ordinary skill in the art. Among the terms used herein, terms defined in a general dictionary may be interpreted as having the same or similar meaning as their meaning in the context of the related art and, unless clearly defined herein, are not to be interpreted as having an ideal or excessively formal meaning. In some cases, even terms defined herein may not be interpreted to exclude embodiments of the present disclosure.

Hereinafter, various embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram illustrating a configuration of an encoding apparatus 100 to help understanding of the present invention. As shown in FIG. 1, the encoding apparatus 100 may include a motion predictor 111, a motion compensator 112, an intra predictor 120, a switch 115, a subtractor 125, a transformer 130, a quantization unit 140, an entropy encoding unit 150, an inverse quantization unit 160, an inverse transform unit 170, an adder 175, a filter unit 180, and a reference image buffer 190.

The encoding apparatus 100 is an apparatus that encodes a video and changes it into another signal form. In this case, the video includes a plurality of frames, and each frame may include a plurality of pixels. For example, the encoding apparatus 100 may be an apparatus for compressing raw original data. Alternatively, the encoding apparatus 100 may be an apparatus for changing the precoded data into another signal form.

The encoding apparatus 100 may perform encoding by dividing each frame into a plurality of blocks. The encoding apparatus 100 may perform encoding on a block basis through temporal or spatial prediction, transformation, quantization, filtering, entropy encoding, and the like.

Prediction means generating a prediction block similar to a target block to be encoded. Here, a unit of a target block to be encoded may be defined as a prediction unit (PU), and prediction is divided into temporal prediction and spatial prediction.

Temporal prediction means inter prediction. The encoding apparatus 100 may store some reference pictures having a high correlation with the current image to be encoded, and may perform inter prediction by using the reference pictures. That is, the encoding apparatus 100 may generate a prediction block from a reference image that was encoded and then decoded at a previous time. In this case, the encoding apparatus 100 is said to perform inter prediction encoding.

In the case of inter prediction encoding, the motion predictor 111 may search for a block having the highest temporal correlation with the target block from the reference image stored in the reference image buffer 190. The motion predictor 111 may search for a block having the highest temporal correlation with the target block in the interpolated image by interpolating the reference image.

Here, the reference image buffer 190 is a space for storing reference images. The reference image buffer 190 is used only when performing inter prediction and may store some reference images having a high correlation with the image to be currently encoded. A reference image may be an image generated by sequentially applying transformation, quantization, inverse quantization, inverse transformation, and filtering to a difference block to be described later. That is, the reference image may be an image decoded after encoding.

The motion compensator 112 may generate a prediction block based on the motion information of the block having the highest temporal correlation with the target block found by the motion predictor 111. Here, the motion information may include a motion vector, a reference picture index, and the like.

Spatial prediction means intra prediction. The intra predictor 120 may generate a prediction value for the target block by performing spatial prediction from encoded neighboring pixels in the current image. In this case, the encoding apparatus 100 is said to perform intra prediction encoding.

Inter prediction encoding or intra prediction encoding may be determined in units of coding units (CUs). Here, the coding unit may include at least one prediction unit. When the method of predictive encoding is determined, the position of the switch 115 may be changed to correspond to the method of predictive encoding.

Meanwhile, the reference picture decoded after encoding in temporal prediction may be an image to which filtering has been applied, whereas the adjacent pixels decoded after encoding in spatial prediction may be pixels to which no filtering has been applied.

The subtractor 125 may generate a difference block (residual block) by obtaining the difference between the target block and the prediction block obtained from the temporal prediction or the spatial prediction. The difference block is a block from which much redundancy has been removed by the prediction process, but it still includes information to be encoded because the prediction is not perfect.
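As a minimal sketch of what the subtractor computes, the following Python snippet forms a difference block pixel by pixel and shows that adding it back to the prediction block recovers the target block when nothing is lost in between. The function names and the 2 × 2 sample values are illustrative assumptions, not taken from the specification.

```python
def subtract_blocks(target, prediction):
    """Difference block: target pixel minus prediction pixel at each position."""
    return [[t - p for t, p in zip(t_row, p_row)]
            for t_row, p_row in zip(target, prediction)]


def add_blocks(difference, prediction):
    """Inverse operation: difference pixel plus prediction pixel at each position."""
    return [[d + p for d, p in zip(d_row, p_row)]
            for d_row, p_row in zip(difference, prediction)]


target = [[52, 55], [61, 59]]       # hypothetical target block
prediction = [[50, 54], [60, 60]]   # hypothetical prediction block

difference = subtract_blocks(target, prediction)
print(difference)                          # [[2, 1], [1, -1]]
print(add_blocks(difference, prediction))  # [[52, 55], [61, 59]]
```

The difference values are small compared with the original pixel values, which is why the difference block is cheaper to encode than the target block itself.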

The transformer 130 may output transform coefficients in the frequency domain by transforming the difference block obtained after intra or inter prediction, in order to remove spatial redundancy. In this case, the unit of the transform is a transform unit (TU), which may be determined independently of the prediction unit. For example, a frame including a plurality of difference blocks may be divided into a plurality of transform units regardless of the prediction units, and the transformer 130 may perform the transform for each transform unit. The division into transform units may be determined according to bit rate optimization.

However, the present invention is not limited thereto, and the transform unit may be determined in association with at least one of the coding unit and the prediction unit.

The transformer 130 may perform a transform that concentrates the energy of each transform unit in a specific frequency region. For example, the transformer 130 may concentrate data in the low frequency region by performing a discrete cosine transform (DCT) based transform on each transform unit. Alternatively, the transformer 130 may perform a discrete Fourier transform (DFT) based transform or a discrete sine transform (DST) based transform.
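The energy-concentration property can be illustrated with a one-dimensional orthonormal DCT-II applied to a smooth row of samples. This is a simplified sketch: a real codec applies a two-dimensional integer transform to each transform unit, and the function name and sample values below are assumptions made for illustration.

```python
import math


def dct_1d(samples):
    """Orthonormal 1-D DCT-II of a list of samples."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        total = sum(samples[i] * math.cos(math.pi * (i + 0.5) * k / n)
                    for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * total)
    return coeffs


row = [100, 102, 104, 106, 108, 110, 112, 114]  # smooth, slowly varying samples
coeffs = dct_1d(row)
energy = [c * c for c in coeffs]

# Share of the total energy carried by the two lowest-frequency coefficients.
low_share = (energy[0] + energy[1]) / sum(energy)
print(low_share > 0.99)  # True
```

Because the transform is orthonormal, the total energy of the coefficients equals that of the samples; it is merely redistributed toward the low-frequency coefficients.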

The quantization unit 140 performs quantization on the transform coefficients, approximating them to a predetermined number of representative values. That is, the quantization unit 140 may map input values in a specific range to one representative value. In this process, high frequency signals that are not easily perceived by humans may be removed, and information loss may occur.

The quantization unit 140 may use either a uniform or a non-uniform quantization method according to the probability distribution of the input data or the purpose of quantization. For example, when the probability distribution of the input data is uniform, the quantization unit 140 may use a uniform quantization method. Alternatively, the quantization unit 140 may use a non-uniform quantization method when the probability distribution of the input data is not uniform.
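The mapping of a range of input values to one representative value can be sketched with a uniform quantizer of fixed step size. The step size, the rounding rule, and the coefficient values below are assumptions chosen for illustration; the specification does not fix them.

```python
def quantize(coeff, step):
    """Uniform quantization: map a coefficient to an integer level (round half away from zero)."""
    sign = -1 if coeff < 0 else 1
    return sign * int(abs(coeff) / step + 0.5)


def dequantize(level, step):
    """Inverse quantization: map a level back to its representative value."""
    return level * step


step = 10                                # hypothetical quantization step
coeffs = [302.6, -41.0, 4.9, -3.2, 0.7]  # hypothetical transform coefficients

levels = [quantize(c, step) for c in coeffs]
recon = [dequantize(level, step) for level in levels]

print(levels)  # [30, -4, 0, 0, 0]
print(recon)   # [300, -40, 0, 0, 0]
```

Every input within half a step of a representative value maps to that value, so the reconstruction error is at most half the step size, and small high frequency coefficients collapse to zero.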

The entropy encoder 150 may reduce the amount of data by variably allocating the lengths of the symbols according to the occurrence probability of the symbol with respect to the data input from the quantization unit 140. That is, the entropy encoder 150 may generate a bit stream by expressing the input data as a bit string having a variable length consisting of 0 and 1 based on the probability model.

For example, the entropy encoder 150 may express input data by allocating a small number of bits to a symbol having a high occurrence probability and a large number of bits to a symbol having a low occurrence probability. Accordingly, the size of the bit string of the input data can be reduced, and the compression performance of video encoding can be improved.

The entropy encoder 150 may perform entropy coding by variable length coding, such as Huffman coding or Exponential-Golomb coding, or by arithmetic coding.
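As one concrete instance of variable length coding, the sketch below implements order-0 Exponential-Golomb coding for non-negative integers, in which small (and, by assumption, frequent) values receive short codewords. The function names are illustrative, and the bit stream is represented as a string of '0' and '1' characters for readability.

```python
def exp_golomb_encode(value):
    """Order-0 Exp-Golomb codeword of a non-negative integer, as a '0'/'1' string."""
    if value < 0:
        raise ValueError("value must be non-negative")
    binary = bin(value + 1)[2:]               # binary form of value + 1
    return "0" * (len(binary) - 1) + binary   # prefix of leading zeros, then the binary form


def exp_golomb_decode(bits):
    """Decode a concatenation of order-0 Exp-Golomb codewords."""
    values, pos = [], 0
    while pos < len(bits):
        zeros = 0
        while bits[pos] == "0":               # count the leading zeros
            zeros += 1
            pos += 1
        values.append(int(bits[pos:pos + zeros + 1], 2) - 1)
        pos += zeros + 1
    return values


symbols = [0, 0, 1, 0, 3, 0, 2]
stream = "".join(exp_golomb_encode(s) for s in symbols)
print(exp_golomb_encode(0), exp_golomb_encode(1), exp_golomb_encode(3))  # 1 010 00100
print(exp_golomb_decode(stream) == symbols)                              # True
```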

The inverse quantization unit 160 and the inverse transform unit 170 may receive the quantized transform coefficients and perform inverse quantization followed by inverse transformation, respectively, to generate a reconstructed difference block.

The adder 175 may generate the reconstructed block by adding the reconstructed difference block and the predictive block obtained from the temporal prediction or the spatial prediction.

The filter unit 180 may apply at least one of a deblocking filter, a sample adaptive offset (SAO), and an adaptive loop filter (ALF) to the reconstructed image. The filtered reconstructed image may be stored in the reference image buffer 190 and used as a reference image.

FIG. 2 is a block diagram illustrating a configuration of a decoding apparatus 200 to help understanding of the present invention. As shown in FIG. 2, the decoding apparatus 200 may include an entropy decoder 210, an inverse quantizer 220, an inverse transformer 230, an adder 235, an intra predictor 240, a motion compensator 250, a switch 255, a filter unit 260, and a reference image buffer 270.

The decoding apparatus 200 may reconstruct a video by receiving a bit stream generated by the encoding apparatus and performing decoding. The decoding apparatus 200 may perform decoding through entropy decoding, inverse quantization, inverse transformation, filtering, and the like on a block basis.

The entropy decoder 210 may entropy decode the input bit stream to generate quantized transform coefficients. In this case, the entropy decoding method may be the reverse of the method used by the entropy encoder 150 of FIG. 1.

The inverse quantization unit 220 may receive the quantized transform coefficients and perform inverse quantization. That is, through the operations of the quantization unit 140 and the inverse quantization unit 220, an input value in a specific range is changed to one reference value within that range, and in this process an error equal to the difference between the input value and the reference value may occur.

The inverse transform unit 230 inversely transforms the data output from the inverse quantization unit 220 by applying, in reverse, the method used by the transformer 130. The inverse transform unit 230 may generate a reconstructed difference block by performing the inverse transform.

The adder 235 may generate the reconstructed block by adding the reconstructed difference block and the predictive block. Here, the prediction block may be a block generated by inter prediction encoding or intra prediction encoding.

In the case of inter prediction encoding, the motion compensator 250 may receive motion information about a target block to be decoded from the encoding apparatus 100, or derive it (for example, from a neighboring block), and may generate a prediction block based on the received or derived motion information. Here, the motion compensator 250 may generate the prediction block from the reference picture stored in the reference image buffer 270. The motion information may include a motion vector, a reference picture index, and the like, for a block having the highest temporal correlation with the target block.

Here, the reference image buffer 270 may store some reference images having a high correlation with the image to be currently decoded. The reference image may be an image generated by filtering the above-described reconstruction block. That is, the reference image may be an image obtained by decoding a bit stream generated by the encoding apparatus. In addition, the reference image used in the decoding apparatus may be the same as the reference image used in the encoding apparatus.

In the case of intra prediction encoding, the intra prediction unit 240 may generate a prediction value for a target block by performing spatial prediction from encoded adjacent pixels in the current image.

On the other hand, the switch 255 may be changed in position according to the method of predictive encoding of the target block.

The filter unit 260 may apply at least one of a deblocking filter, SAO, and ALF to the reconstructed image. The filtered reconstructed image may be stored in the reference image buffer 270 and used as a reference image.

Meanwhile, the decoding apparatus 200 may further include a parser (not shown) which parses information related to an encoded image included in a bit stream. The parser may include the entropy decoder 210 or may be included in the entropy decoder 210.

As described above, the encoding apparatus 100 may compress data of a video through an encoding process and transmit the compressed data to the decoding apparatus 200. The decoding apparatus 200 may reconstruct the video by decoding the compressed data.

Hereinafter, a method of performing motion prediction in the case of inter prediction encoding will be described in detail. In particular, a method of performing motion prediction by dividing a target block into a plurality of regions will be described.

FIG. 3 is a simplified block diagram illustrating an encoding apparatus 100 according to an embodiment of the present invention.

As shown in FIG. 3, the encoding apparatus 100 includes an interface 310 and a processor 320.

Meanwhile, FIG. 3 schematically illustrates the components for the case where the encoding apparatus 100 is, for example, a device having functions such as a communication function and a control function. Therefore, depending on the embodiment, some of the components shown in FIG. 3 may be omitted or changed, and other components may be added.

The interface 310 may communicate with the decoding apparatus 200. In detail, the interface 310 may transmit the encoded bit stream, motion information, and the like to the decoding apparatus 200.

The interface 310 may communicate with the decoding apparatus 200 using wired or wireless LAN, WAN, Ethernet, Bluetooth, Zigbee, IEEE 1394, Wi-Fi, Power Line Communication (PLC), or the like.

The processor 320 may divide the target block of the current frame to be encoded into a first region and a second region according to a preset division method. Here, the predetermined division method may be a method of dividing the target block into a plurality of areas based on pixel values of a plurality of pixels constituting the target block.

For example, the processor 320 may calculate an average value from pixel values of a plurality of pixels constituting the target block, and divide the target block into a first area and a second area based on the average value. Alternatively, the processor 320 may divide the target block using a predetermined value rather than an average value.
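A minimal sketch of this kind of division is shown below: each pixel is compared against a threshold, which defaults to the block average but may also be a predetermined value. Treating pixels at or above the threshold as the first region is an assumption made here for illustration, as are the function name and the 4 × 4 sample block.

```python
def split_block(block, threshold=None):
    """Return (mask, threshold); mask[y][x] is True for the first region, False for the second.

    When no threshold is given, the average pixel value of the block is used.
    Assigning pixels >= threshold to the first region is an illustrative choice.
    """
    if threshold is None:
        pixels = [p for row in block for p in row]
        threshold = sum(pixels) / len(pixels)
    mask = [[p >= threshold for p in row] for row in block]
    return mask, threshold


target_block = [
    [200, 200, 40, 40],
    [200, 200, 200, 40],
    [200, 200, 200, 40],
    [200, 200, 200, 200],
]

mask, mean = split_block(target_block)
print(mean)  # 160.0
for row in mask:
    print("".join("1" if m else "2" for m in row))
# 1122
# 1112
# 1112
# 1111
```

The resulting mask follows the bright/dark edge inside the block rather than a straight horizontal or vertical line.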

However, the present invention is not limited thereto, and the processor 320 may use any method as long as it can determine a boundary in the target block.

The processor 320 may generate a first prediction block including a region corresponding to the first region by searching for a first motion vector with respect to the first region in the first reference frame. Here, the reference frame may be one of the reference pictures.

Meanwhile, the motion vector may be represented by (Δx, Δy). For example, the first prediction block may be a region located at an offset of (-1, 5) from the first region, in the frame immediately preceding the frame containing the first region. In this case, the motion vector may be the difference between the same reference point of the first region and of the first prediction block. For example, the motion vector may be the difference between the coordinate values of the upper left point of the first region and the upper left point of the first prediction block.

The processor 320 may search for an area corresponding only to the first region, not an area corresponding to the entire target block. That is, the processor 320 may search for a block having the highest temporal correlation with the first region, instead of searching for a block having the highest temporal correlation with the target block.
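One way to picture a search restricted to the first region is block matching in which only the pixels inside the region mask contribute to the matching cost. The sketch below uses an exhaustive search with a sum of absolute differences (SAD) cost; the cost measure, the search range, and all names and sample data are assumptions, since the specification does not prescribe a particular search algorithm.

```python
def masked_sad(frame, top, left, block, mask):
    """Sum of absolute differences, counting only pixels where mask is True."""
    total = 0
    for y, row in enumerate(block):
        for x, pixel in enumerate(row):
            if mask[y][x]:
                total += abs(frame[top + y][left + x] - pixel)
    return total


def search_motion_vector(ref_frame, block, mask, top, left, search_range):
    """Exhaustive search around (top, left); returns ((dx, dy), cost) of the best match."""
    height, width = len(block), len(block[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y0, x0 = top + dy, left + dx
            if y0 < 0 or x0 < 0:
                continue
            if y0 + height > len(ref_frame) or x0 + width > len(ref_frame[0]):
                continue
            cost = masked_sad(ref_frame, y0, x0, block, mask)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best


# Hypothetical 6 x 6 reference frame in which every pixel value is distinct.
ref_frame = [[10 * y + x for x in range(6)] for y in range(6)]

# Target block at (top=2, left=2) of the current frame. Its first-region pixels
# match the reference frame one pixel up and one pixel right; the second-region
# pixel (99) matches nothing and is excluded from the cost by the mask.
block = [[13, 14], [23, 99]]
mask = [[True, True], [True, False]]

(dx, dy), cost = search_motion_vector(ref_frame, block, mask, top=2, left=2, search_range=2)
first_prediction = [row[2 + dx:2 + dx + 2] for row in ref_frame[2 + dy:2 + dy + 2]]
print((dx, dy), cost)     # (1, -1) 0
print(first_prediction)   # [[13, 14], [23, 24]]
```

The prediction block is still a full rectangle taken from the reference frame, but only the part under the mask was used to choose where to take it from.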

Alternatively, the processor 320 may search for a first motion vector with respect to the first region in the first reference frame and generate a first prediction block corresponding to a region in which different weights are applied to the pixel values constituting the first region and the second region, respectively. Here, the processor 320 may determine the weights to be applied to the first region and the second region based on the pixel values constituting the first region and the second region.

When the first prediction block is generated, the processor 320 may divide the first prediction block into a third region and a fourth region according to a predetermined division method, and generate boundary information. Here, the predetermined division method is the same as the method of dividing the target block.

The decoding apparatus 200 cannot divide the target block because there is no information about the target block (the original image). However, the decoding apparatus 200 may reconstruct the reference frame, and thus may split the first prediction block that is a part of the first reference frame.

Therefore, if the encoding apparatus 100 splits the target block using the predetermined method while the decoding apparatus 200 splits the first prediction block using the same method, the splitting results may differ, and an error will occur.

However, when the encoding apparatus 100 divides the first prediction block into the third region and the fourth region, it divides the same first prediction block that the decoding apparatus 200 divides, so no such error occurs. Therefore, the encoding apparatus 100 splits the first prediction block once again after the first prediction block is generated.

On the other hand, since the first prediction block includes a region corresponding to the first region of the target block, the third region divided by the same method has a form similar to that of the first region. Accordingly, the fourth region also has a similar shape to the second region.
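The reasoning above can be sketched in Python. The same mean-based split is applied to a target block and to its first prediction block; the two masks are similar but not identical, whereas the mask derived from the prediction block is reproduced exactly by a decoder that holds the same prediction block. Representing the boundary information as a binary mask, as well as the function name and sample values, are assumptions made for illustration.

```python
def split_by_mean(block):
    """Binary mask: True where the pixel is at or above the block average."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return [[p >= mean for p in row] for row in block]


# Hypothetical target block (known only to the encoder) and first prediction block.
target = [[200, 198, 45], [201, 199, 50], [197, 150, 48]]
first_prediction = [[199, 200, 47], [200, 198, 49], [198, 90, 50]]

target_mask = split_by_mean(target)                  # first / second region
encoder_boundary = split_by_mean(first_prediction)   # third / fourth region

# The decoder reconstructs the same reference frame, so it holds an identical
# copy of the first prediction block and derives an identical mask.
decoder_prediction = [row[:] for row in first_prediction]
decoder_boundary = split_by_mean(decoder_prediction)

print(target_mask == encoder_boundary)       # False: the masks differ at one pixel
print(encoder_boundary == decoder_boundary)  # True: encoder and decoder agree
```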

The processor 320 may generate a second prediction block including a region corresponding to the fourth region by searching for a second motion vector for the fourth region corresponding to the second region in the second reference frame. Here, the second reference frame may be one of the reference pictures, and may be a different frame from the first reference frame. However, the present invention is not limited thereto, and the second reference frame and the first reference frame may be the same frame.

The processor 320 may search for an area corresponding only to the fourth area, not an area corresponding to the entire first prediction block. That is, the processor 320 may search for the block having the highest temporal correlation with the fourth region, instead of searching for the block having the highest temporal correlation with the first prediction block.

Alternatively, the processor 320 may search for a second motion vector with respect to the fourth region, which corresponds to the second region, in the second reference frame, and generate a second prediction block corresponding to a region in which different weights are applied to the pixel values constituting the third region and the fourth region, respectively. Here, the processor 320 may determine the weights to be applied to the third and fourth regions based on the pixel values constituting the third and fourth regions.

The processor 320 may generate a third prediction block corresponding to the target block by merging the first prediction block and the second prediction block according to the boundary information. For example, the processor 320 may generate the third prediction block by merging the third region of the first prediction block and the region corresponding to the fourth region in the second prediction block based on the boundary information.
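Assuming the boundary information is held as a binary mask (True for the third region, False for the fourth region), the merge can be sketched as a per-pixel selection between the two prediction blocks. The function name and sample values are illustrative.

```python
def merge_predictions(first_prediction, second_prediction, boundary_mask):
    """Take third-region pixels from the first block and the remaining pixels from the second."""
    return [
        [f if in_third else s for f, s, in_third in zip(f_row, s_row, m_row)]
        for f_row, s_row, m_row in zip(first_prediction, second_prediction, boundary_mask)
    ]


first_prediction = [[10, 11, 12], [13, 14, 15]]    # hypothetical first prediction block
second_prediction = [[90, 91, 92], [93, 94, 95]]   # hypothetical second prediction block
boundary_mask = [[True, True, False], [True, False, False]]

third_prediction = merge_predictions(first_prediction, second_prediction, boundary_mask)
print(third_prediction)  # [[10, 11, 92], [13, 94, 95]]
```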

Meanwhile, after generating the third prediction block, the processor 320 may apply horizontal and vertical filtering to boundaries between regions corresponding to the third region and the fourth region.
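The specification does not state which filter is used at the boundary. As one possible sketch, the snippet below applies a 3-tap [1, 2, 1] / 4 smoothing filter, first horizontally and then vertically, only to pixels that have a neighbor in the other region. The filter taps, the rounding, and all names are assumptions.

```python
def smooth_boundary(block, mask):
    """Smooth pixels adjacent to a region boundary with a [1, 2, 1] / 4 filter."""
    height, width = len(block), len(block[0])

    # Horizontal pass: filter pixels whose left or right neighbor is in the other region.
    horizontal = [row[:] for row in block]
    for y in range(height):
        for x in range(width):
            left, right = max(x - 1, 0), min(x + 1, width - 1)
            if mask[y][left] != mask[y][x] or mask[y][right] != mask[y][x]:
                horizontal[y][x] = (block[y][left] + 2 * block[y][x] + block[y][right] + 2) // 4

    # Vertical pass: filter pixels whose upper or lower neighbor is in the other region.
    result = [row[:] for row in horizontal]
    for y in range(height):
        for x in range(width):
            up, down = max(y - 1, 0), min(y + 1, height - 1)
            if mask[up][x] != mask[y][x] or mask[down][x] != mask[y][x]:
                result[y][x] = (horizontal[up][x] + 2 * horizontal[y][x] + horizontal[down][x] + 2) // 4
    return result


# Hypothetical merged block with a vertical boundary between columns 1 and 2.
merged = [[100, 100, 20, 20] for _ in range(3)]
mask = [[True, True, False, False] for _ in range(3)]

filtered = smooth_boundary(merged, mask)
print(filtered[0])  # [100, 80, 40, 20]
```

Only the two columns touching the boundary change; pixels away from the boundary are left untouched, so the filter softens the seam without blurring the interior of either region.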

In addition, the processor 320 may control the interface 310 to transmit the first motion vector and the second motion vector to the decoding apparatus 200.

The left side of FIG. 4A shows the current frame to be encoded, and the right side shows an enlarged view of the target block 410 of the current frame. The current frame is divided into a plurality of blocks of the same size, but this is only an example. For example, the current frame may be divided into a plurality of blocks of different sizes or may include rectangular blocks other than squares.

As shown in the right side of FIG. 4A, the processor 320 may divide the target block 410 into the first region 420 and the second region 430 according to a predetermined division method. For example, the processor 320 may divide the target block 410 based on a predetermined pixel value. The preset pixel value may be an average pixel value of a plurality of pixels constituting the target block 410. Alternatively, the preset pixel value may be an average pixel value of some pixels constituting the target block 410. Alternatively, the preset pixel value may be a pixel value set by the user.

The processor 320 may divide the target block 410 into two regions based on one pixel value, but is not limited thereto. For example, the processor 320 may divide the target block 410 into a plurality of regions based on the plurality of pixel values.
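The pixel-value-based division described above can be sketched as follows for the two-region case. This is only an illustrative sketch: the function name `split_by_mean`, the list-of-rows block layout, and the choice that pixels at or above the average form the "first" region are assumptions made for the example, not details given in the description.

```python
def split_by_mean(block):
    """Split a block (list of rows of ints) into two regions.

    The threshold is the average pixel value of the whole block.
    Returns a mask of the same shape: 1 marks the first region
    (pixel >= average), 0 marks the second region. Which side is
    called "first" is an assumption of this sketch.
    """
    pixels = [p for row in block for p in row]
    threshold = sum(pixels) / len(pixels)
    return [[1 if p >= threshold else 0 for p in row] for row in block]


# Hypothetical 4x4 target block: a bright object in the upper-left corner.
target_block = [
    [200, 210, 40, 30],
    [205, 215, 35, 25],
    [198, 50, 45, 20],
    [60, 55, 42, 22],
]
mask = split_by_mean(target_block)
for row in mask:
    print(row)
```

Because the mask is computed from pixel values alone, the same function can be applied to any block of the same size, which is what later allows the first prediction block to be divided by the same method.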

As illustrated in FIG. 4B, the processor 320 may divide the target block 410 into the first region 420, the second region 430, and the third region 440 based on a predetermined pixel value. In this case, the processor 320 may ignore the third region 440 in consideration of the number of pixels constituting the third region 440, and divide the target block 410 into the first region 420 and the second region 430. Accordingly, the processor 320 may divide the target block 410 based on the most prominent boundary of the target block 410.
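One way to picture ignoring a small region is sketched below. The description only says that the third region is ignored in consideration of its pixel count; the minimum-size parameter `min_pixels`, the two thresholds, and the rule of reassigning the ignored pixels to the remaining region with the closest average pixel value are all assumptions introduced for this example.

```python
def label_regions(block, thresholds):
    """Label each pixel by the number of thresholds it reaches (0, 1, 2, ...)."""
    return [[sum(p >= t for t in thresholds) for p in row] for row in block]


def drop_small_regions(block, labels, min_pixels):
    """Reassign pixels of regions smaller than min_pixels.

    Assumed rule: each such pixel joins the remaining region whose
    average pixel value is closest to the pixel's own value.
    """
    members = {}
    for pixel_row, label_row in zip(block, labels):
        for p, lab in zip(pixel_row, label_row):
            members.setdefault(lab, []).append(p)
    kept = {lab: sum(v) / len(v) for lab, v in members.items()
            if len(v) >= min_pixels}
    if not kept:  # every region is small: leave the labels unchanged
        return labels

    def nearest(p):
        return min(kept, key=lambda lab: abs(kept[lab] - p))

    return [[lab if lab in kept else nearest(p)
             for p, lab in zip(pixel_row, label_row)]
            for pixel_row, label_row in zip(block, labels)]


# Hypothetical block with a single mid-gray pixel forming a tiny third region.
block = [
    [200, 200, 30, 30],
    [200, 200, 30, 30],
    [200, 110, 30, 30],
    [200, 200, 30, 30],
]
labels = label_regions(block, thresholds=[80, 150])
merged = drop_small_regions(block, labels, min_pixels=2)
```

After the small region is dropped, only two labels remain, so the block is effectively divided along its most prominent boundary.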

However, the present invention is not limited thereto, and the processor 320 may not divide the target block 410. For example, the processor 320 may not divide the target block 410 when the pixel values of the plurality of pixels constituting the target block 410 are irregular.

As shown in FIG. 5, the processor 320 may search for a first motion vector for the first region 510 in the reference frame to generate a first prediction block 530 including a region corresponding to the first region 510. In this case, the processor 320 may perform prediction without considering the second region 520. Alternatively, prediction may be performed by considering a part of the second region 520.
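A region-restricted motion search of this kind can be sketched as a full search whose matching cost counts only the pixels of the first region. The sum of absolute differences (SAD) as the cost, the exhaustive search, and the names `masked_sad` and `search_motion_vector` are assumptions of this sketch; the description does not specify a cost function or search strategy.

```python
def masked_sad(block, ref, top, left, mask):
    """Sum of absolute differences over pixels where mask == 1 only."""
    return sum(
        abs(block[y][x] - ref[top + y][left + x])
        for y in range(len(block))
        for x in range(len(block[0]))
        if mask[y][x]
    )


def search_motion_vector(block, mask, ref, origin, search_range):
    """Full search around origin=(row, col) in the reference frame.

    Returns ((dy, dx), prediction_block). The prediction block has the
    full block size, so it also contains pixels outside the first region.
    """
    n_rows, n_cols = len(block), len(block[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            top, left = origin[0] + dy, origin[1] + dx
            if (top < 0 or left < 0 or top + n_rows > len(ref)
                    or left + n_cols > len(ref[0])):
                continue  # candidate falls outside the reference frame
            cost = masked_sad(block, ref, top, left, mask)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    dy, dx = best[1]
    top, left = origin[0] + dy, origin[1] + dx
    prediction = [row[left:left + n_cols] for row in ref[top:top + n_rows]]
    return (dy, dx), prediction


# Hypothetical data: the first region is the left column of a 2x2 block.
block = [[9, 1], [9, 1]]
mask = [[1, 0], [1, 0]]
ref = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 9, 0],
    [0, 0, 0, 9, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
mv, pred = search_motion_vector(block, mask, ref, origin=(2, 2), search_range=1)
```

In this example the second region of the block does not influence the search at all, which corresponds to performing prediction without considering the second region.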

Alternatively, the processor 320 may search for a first motion vector for the first region 510 in the reference frame, and generate the first prediction block 530 corresponding to a region in which different weights are applied to the pixel values constituting the first region 510 and the second region 520, respectively. In this case, the processor 320 may determine the weights to be applied to the first region 510 and the second region 520 based on the pixel values constituting the first region 510 and the second region 520.

For example, the processor 320 may determine a weight to be applied to each region so that the boundary of the first region 510 and the second region 520 is prominent.
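The effect of region-dependent weights on the matching cost can be illustrated with a weighted SAD. The weights 1.0 and 0.25 below are arbitrary placeholders: the description states that weights are determined from pixel values but gives no rule, so the function `weighted_sad` and its default weights are assumptions for illustration only.

```python
def weighted_sad(block, candidate, mask, w_first=1.0, w_second=0.25):
    """SAD in which first-region pixels (mask == 1) and second-region
    pixels (mask == 0) contribute with different weights."""
    total = 0.0
    for y, row in enumerate(block):
        for x, p in enumerate(row):
            weight = w_first if mask[y][x] else w_second
            total += weight * abs(p - candidate[y][x])
    return total


block = [[100, 20], [100, 20]]
mask = [[1, 0], [1, 0]]

# Candidate A matches the first region exactly but misses the second region.
candidate_a = [[100, 60], [100, 60]]
# Candidate B matches the second region exactly but misses the first region.
candidate_b = [[80, 20], [80, 20]]

cost_a = weighted_sad(block, candidate_a, mask)
cost_b = weighted_sad(block, candidate_b, mask)
unweighted_a = weighted_sad(block, candidate_a, mask, 1.0, 1.0)
unweighted_b = weighted_sad(block, candidate_b, mask, 1.0, 1.0)
```

With equal weights candidate B would be chosen, while the weighted cost prefers candidate A, which preserves the first region. Setting the second-region weight to zero reduces this to a search that ignores the second region entirely.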

However, the present invention is not limited thereto, and the processor 320 may perform prediction by giving the shape of the first region 510 greater importance than the pixel values of the first region 510. Alternatively, the processor 320 may perform prediction by giving the shape of the first region 510 greater importance than the shape of the second region 520.

FIG. 5 illustrates only unidirectional prediction, but the present invention is not limited thereto. For example, the processor 320 may perform bidirectional prediction. In particular, the processor 320 may perform bidirectional prediction considering only the first region 510. In this case, the processor 320 may perform weighted prediction.

The processor 320 may generate the first prediction block and then divide the first prediction block by the same division method as the division method of the target block.

The first prediction block is similar to the target block, but may not be exactly the same. Therefore, as shown in FIG. 6A, the division boundary 610 of the target block and the division boundary 620 of the first prediction block may not coincide exactly.

The processor 320 may divide the first prediction block and generate boundary information. The boundary information may be generated as a mask for each region. Alternatively, the boundary information may be information indicating the coordinates of the boundary of each region.
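The two forms of boundary information mentioned above, a per-region mask and boundary coordinates, carry the same information in different layouts. The sketch below derives coordinates from a mask; the 4-neighbour rule and the convention of listing first-region pixels that touch the other region are assumptions, since the description does not define how the coordinates are chosen.

```python
def boundary_coordinates(mask):
    """Return (row, col) of first-region pixels (mask == 1) that have a
    4-connected neighbour belonging to the other region."""
    rows, cols = len(mask), len(mask[0])
    coords = []
    for y in range(rows):
        for x in range(cols):
            if mask[y][x] != 1:
                continue
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] != 1:
                    coords.append((y, x))
                    break
    return coords


# Hypothetical mask produced by dividing a 3x3 prediction block.
mask = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
coords = boundary_coordinates(mask)
```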

The processor 320 may divide the first prediction block into a third region and a fourth region, search for a second motion vector for the fourth region corresponding to the second region in the second reference frame, and generate a second prediction block including a region corresponding to the fourth region. Since this process is the same as the method of generating the prediction block of FIG. 5, a detailed description thereof is omitted.

As shown in FIG. 7, the processor 320 may merge the first prediction block 710 and the second prediction block 720 according to the boundary information 735 to generate the third prediction block 730 corresponding to the target block. Here, the boundary information 735 may be information about the division boundary of the first prediction block.

In particular, the processor 320 may generate the third prediction block 730 by merging, based on the boundary information 735, the third region 715 of the first prediction block 710 and the region 725 of the second prediction block 720 corresponding to the fourth region 716.

Alternatively, the processor 320 may generate the third prediction block 730 by masking the first prediction block 710 and the second prediction block 720. Alternatively, the processor 320 may generate the third prediction block 730 by applying different weights to the first prediction block 710 and the second prediction block 720, respectively.
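The mask-based merge can be sketched in a few lines. The function name `merge_by_mask` and the convention that mask value 1 denotes the third region are assumptions of this example.

```python
def merge_by_mask(first_pred, second_pred, mask):
    """Take third-region pixels (mask == 1) from the first prediction
    block and the remaining pixels from the second prediction block."""
    return [
        [a if m else b for a, b, m in zip(row_a, row_b, row_m)]
        for row_a, row_b, row_m in zip(first_pred, second_pred, mask)
    ]


first_pred = [[9, 9], [9, 9]]
second_pred = [[1, 1], [1, 1]]
boundary_mask = [[1, 0], [1, 1]]
third_pred = merge_by_mask(first_pred, second_pred, boundary_mask)
```

A weighted variant would replace the hard selection with a per-pixel blend of the two prediction blocks; the description mentions that option without specifying the weights.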

Thereafter, having generated the third prediction block 730, the processor 320 may apply horizontal and vertical filtering to the boundary between the third region 715 and the region 725 corresponding to the fourth region 716. In particular, the processor 320 may determine the filter coefficients and the filter size in consideration of the characteristics of the third prediction block 730.
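A minimal sketch of horizontal and vertical boundary filtering follows. The 3-tap [1, 2, 1] / 4 kernel, the integer rounding, and the rule that only pixels directly adjacent to the region boundary are filtered are all assumptions; the description leaves the filter coefficients and size to be determined from the characteristics of the third prediction block.

```python
def smooth_boundary(block, mask):
    """Apply a [1, 2, 1] / 4 filter horizontally, then vertically, but only
    at pixels whose neighbour along that direction lies in the other region."""
    rows, cols = len(block), len(block[0])

    def filter_pass(src, dy, dx):
        out = [row[:] for row in src]
        for y in range(rows):
            for x in range(cols):
                py, px = y - dy, x - dx  # previous neighbour
                ny, nx = y + dy, x + dx  # next neighbour
                if not (0 <= py and ny < rows and 0 <= px and nx < cols):
                    continue  # leave pixels on the block edge untouched
                if mask[py][px] != mask[y][x] or mask[ny][nx] != mask[y][x]:
                    out[y][x] = (src[py][px] + 2 * src[y][x]
                                 + src[ny][nx] + 2) // 4
        return out

    horizontal = filter_pass(block, 0, 1)
    return filter_pass(horizontal, 1, 0)


# Hypothetical merged block with a vertical boundary between columns 1 and 2.
merged = [[100, 100, 20, 20] for _ in range(3)]
mask = [[1, 1, 0, 0] for _ in range(3)]
smoothed = smooth_boundary(merged, mask)
```

In this example only the two columns next to the boundary change, which softens the step between the merged regions while leaving the interior of each region intact.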

The processor 320 may transmit the generated motion vectors to the decoding apparatus 200. The processor 320 may transmit the absolute value of each generated motion vector to the decoding apparatus 200, or may transmit its difference from a predicted motion vector. In this case, the processor 320 may use a different predicted motion vector for each divided region.

The prediction block generation operation of the encoding apparatus 100 has been described above with reference to FIGS. 3 to 7. Since the operation of the encoding apparatus 100 after generating the prediction block is the same as described with reference to FIG. 1, a description thereof is omitted.

FIG. 8 is a simplified block diagram illustrating a decoding apparatus 200 according to an embodiment of the present invention.

As shown in FIG. 8, the decoding device 200 includes an interface 810 and a processor 820.

Meanwhile, FIG. 8 is a diagram briefly illustrating various components, for example, when the decoding apparatus 200 is a device having a function such as a communication function and a control function. Therefore, according to an embodiment, some of the components shown in FIG. 8 may be omitted or changed, and other components may be further added.

The interface 810 may communicate with the encoding apparatus 100. In detail, the interface 810 may receive an encoded bit stream, motion information, and the like from the encoding apparatus 100.

The interface 810 may communicate with the encoding apparatus 100 using wired/wireless LAN, WAN, Ethernet, Bluetooth, Zigbee, IEEE 1394, Wi-Fi, or Power Line Communication (PLC).

The processor 820 may receive, from the encoding apparatus 100, the first motion vector searched in the first reference frame and the second motion vector searched in the second reference frame with respect to the target block to be decoded in the current frame. Here, the processor 820 may receive the absolute values of the first motion vector and the second motion vector from the encoding apparatus 100.

Alternatively, the processor 820 may receive a difference value from a predicted motion vector. In this case, the processor 820 may receive difference values computed using a different predicted motion vector for each divided region.

When the difference value is received, the processor 820 may calculate the motion vector by adding the predicted motion vector and the difference value.
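The difference-based signalling amounts to a subtraction at the encoder and an addition at the decoder. The sketch below uses hypothetical function names and hypothetical predictor values; how the predicted motion vector for each region is derived is not specified in the description and is simply given as input here.

```python
def encode_mvd(mv, predicted_mv):
    """Encoder side: difference between the motion vector and its predictor."""
    return (mv[0] - predicted_mv[0], mv[1] - predicted_mv[1])


def decode_mv(mvd, predicted_mv):
    """Decoder side: add the predictor back to the received difference."""
    return (mvd[0] + predicted_mv[0], mvd[1] + predicted_mv[1])


# Hypothetical values: one predictor per divided region.
first_mv, first_predictor = (3, -2), (2, -2)
second_mv, second_predictor = (-1, 4), (0, 5)

first_mvd = encode_mvd(first_mv, first_predictor)
second_mvd = encode_mvd(second_mv, second_predictor)

recovered_first = decode_mv(first_mvd, first_predictor)
recovered_second = decode_mv(second_mvd, second_predictor)
```

The round trip is exact as long as both sides derive the same predictor for each region.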

The processor 820 may generate a first prediction block and a second prediction block based on the first motion vector and the second motion vector in each of the first reference frame and the second reference frame. Here, the first reference frame and the second reference frame may be the same reference frame. In addition, the reference frame may be one of the reference images.

The processor 820 may divide the first prediction block into a plurality of regions and generate boundary information according to a predetermined division method. Here, the preset division method is the same as the preset method of dividing the target block used by the encoding apparatus 100.

As an example of the division method, there may be a method of dividing the first prediction block into a plurality of regions based on pixel values of a plurality of pixels constituting the first prediction block.

The processor 820 may generate a third prediction block corresponding to the target block by merging the first prediction block and the second prediction block according to the boundary information.

In particular, the processor 820 may divide the first prediction block into the first region and the second region according to the preset division method, and generate the third prediction block by merging, based on the boundary information, the first region of the first prediction block and the region corresponding to the second region in the second prediction block. However, this is only an example. The third prediction block may be generated by dividing the first prediction block into three or more regions.

After generating the third prediction block, the processor 820 may apply horizontal and vertical filtering to boundaries between regions corresponding to the first region and the second region. In particular, the processor 820 may determine the filter coefficient and the size in consideration of the characteristics of the third prediction block.

In FIG. 8, a prediction block generation operation of the decoding apparatus 200 has been described. Except for performing the prediction, the operation of generating the prediction block is the same as the encoding apparatus 100, and thus, a detailed description thereof is omitted. In addition, since the operation of the decoding apparatus 200 after generating the predictive block is the same as described with reference to FIG. 2, it is omitted.

First, the target block of the current frame is divided into a first region and a second region according to a predetermined division method (S910). In operation S920, a first prediction block including a region corresponding to the first region is generated by searching for a first motion vector for the first region in the first reference frame. In operation S930, the first prediction block is divided into a third region and a fourth region, and boundary information is generated according to a predetermined division method. In operation S940, a second prediction block including a region corresponding to the fourth region is generated by searching for a second motion vector for the fourth region corresponding to the second region in the second reference frame. In operation S950, the first prediction block and the second prediction block are merged according to the boundary information to generate a third prediction block corresponding to the target block.
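Steps S910 to S950 can be tied together in a compact sketch. It assumes an average-pixel-value division, a full search with a region-restricted SAD cost, and a hard mask merge; it also assumes that the second search compares the target block's pixels at the fourth-region positions against the second reference frame. None of these choices, nor the names `split_by_mean`, `masked_search`, and `encode_block`, come from the description.

```python
def split_by_mean(block):
    """Mask with 1 where pixel >= block average, else 0 (assumed division)."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return [[1 if p >= mean else 0 for p in row] for row in block]


def masked_search(block, mask, keep, ref, origin, search_range):
    """Full search using only the pixels where mask == keep."""
    h, w = len(block), len(block[0])
    best_cost, best_mv = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            top, left = origin[0] + dy, origin[1] + dx
            if top < 0 or left < 0 or top + h > len(ref) or left + w > len(ref[0]):
                continue  # candidate falls outside the reference frame
            cost = sum(abs(block[y][x] - ref[top + y][left + x])
                       for y in range(h) for x in range(w)
                       if mask[y][x] == keep)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    top, left = origin[0] + best_mv[0], origin[1] + best_mv[1]
    return best_mv, [row[left:left + w] for row in ref[top:top + h]]


def encode_block(target, ref1, ref2, origin, search_range=1):
    target_mask = split_by_mean(target)                        # S910
    mv1, pred1 = masked_search(target, target_mask, 1,
                               ref1, origin, search_range)     # S920
    boundary = split_by_mean(pred1)                            # S930
    mv2, pred2 = masked_search(target, boundary, 0,
                               ref2, origin, search_range)     # S940
    pred3 = [[a if m else b for a, b, m in zip(ra, rb, rm)]
             for ra, rb, rm in zip(pred1, pred2, boundary)]    # S950
    return mv1, mv2, pred3


# Hypothetical 2x2 target block located at (1, 1) in a 4x4 frame.
target = [[90, 10], [90, 10]]
ref1 = [[0, 0, 0, 0], [0, 0, 90, 50], [0, 0, 90, 50], [0, 0, 0, 0]]
ref2 = [[0, 0, 0, 0], [0, 0, 0, 0], [70, 10, 0, 0], [70, 10, 0, 0]]
mv1, mv2, pred3 = encode_block(target, ref1, ref2, origin=(1, 1))
```

In this toy example neither reference block matches the whole target, but each matches one region, and the merged third prediction block reproduces the target exactly. The boundary used for the merge is computed from the first prediction block rather than from the target block, so a decoder holding the same reference frame can recompute it.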

Here, the predetermined division method may be a method of dividing the target block into a plurality of areas based on pixel values of a plurality of pixels constituting the target block.

The generating of the third prediction block (S950) may include generating the third prediction block by merging the third region of the first prediction block and the region corresponding to the fourth region in the second prediction block.

In particular, after generating the third prediction block, horizontal and vertical filtering may be applied to boundaries between regions corresponding to the third and fourth regions.

Meanwhile, the generating of the first prediction block (S920) may include searching for a first motion vector for the first region in the first reference frame and generating a first prediction block corresponding to a region in which different weights are applied to the pixel values constituting the first region and the second region, respectively. The generating of the second prediction block (S940) may include searching for a second motion vector for the fourth region corresponding to the second region in the second reference frame and generating a second prediction block corresponding to a region in which different weights are applied to the pixel values constituting the third region and the fourth region, respectively.

Here, the weights to be applied to the first region and the second region may be determined based on the pixel values constituting the first region and the second region, and the weights to be applied to the third region and the fourth region may be determined based on the pixel values constituting the third region and the fourth region.

First, a first motion vector searched in a first reference frame and a second motion vector searched in a second reference frame are received with respect to a target block to be decoded in the current frame (S1010). In each of the first reference frame and the second reference frame, a first prediction block and a second prediction block are generated based on the first motion vector and the second motion vector (S1020). In operation S1030, the first prediction block is divided into a plurality of areas according to a predetermined division method, and boundary information is generated. The first prediction block and the second prediction block are merged according to the boundary information to generate a third prediction block corresponding to the target block (S1040).
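The decoder-side steps S1020 to S1040 can be sketched as follows, taking the motion vectors of S1010 as already received. The average-pixel-value division, the hard mask merge, and the names `fetch_block`, `split_by_mean`, and `decode_prediction` are assumptions made for illustration; the reference frames and motion vectors are hypothetical values.

```python
def fetch_block(ref, origin, mv, size):
    """Copy the block of the given (rows, cols) size displaced by mv."""
    top, left = origin[0] + mv[0], origin[1] + mv[1]
    return [row[left:left + size[1]] for row in ref[top:top + size[0]]]


def split_by_mean(block):
    """Mask with 1 where pixel >= block average, else 0 (assumed division)."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return [[1 if p >= mean else 0 for p in row] for row in block]


def decode_prediction(ref1, ref2, origin, size, mv1, mv2):
    pred1 = fetch_block(ref1, origin, mv1, size)               # S1020
    pred2 = fetch_block(ref2, origin, mv2, size)               # S1020
    boundary = split_by_mean(pred1)                            # S1030
    return [[a if m else b for a, b, m in zip(ra, rb, rm)]
            for ra, rb, rm in zip(pred1, pred2, boundary)]     # S1040


# Hypothetical reference frames and received motion vectors (S1010).
ref1 = [[0, 0, 0, 0], [0, 0, 90, 50], [0, 0, 90, 50], [0, 0, 0, 0]]
ref2 = [[0, 0, 0, 0], [0, 0, 0, 0], [70, 10, 0, 0], [70, 10, 0, 0]]
pred3 = decode_prediction(ref1, ref2, origin=(1, 1), size=(2, 2),
                          mv1=(0, 1), mv2=(1, -1))
```

No mask or boundary coordinates are transmitted in this sketch: the decoder recomputes the boundary from its own first prediction block, which is why both sides must use the same division method.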

Here, the predetermined division method may be a method of dividing the first prediction block into a plurality of regions based on pixel values of the plurality of pixels constituting the first prediction block.

Meanwhile, the dividing (S1030) may include dividing the first prediction block into a first region and a second region according to the predetermined division method, and the generating of the third prediction block (S1040) may include generating the third prediction block by merging the first region of the first prediction block and the region corresponding to the second region in the second prediction block.

Here, after generating the third prediction block, horizontal and vertical filtering may be applied to boundaries between regions corresponding to the first region and the second region.

In the above description, the prediction block is generated by dividing the target block into two regions, but this is only an example. For example, the encoding apparatus may divide the target block into three regions and generate a motion vector for each region.

Meanwhile, the methods according to the various embodiments of the present disclosure may be programmed and stored in various storage media. Accordingly, the methods according to the various embodiments described above may be implemented in various types of encoding apparatuses and decoding apparatuses that execute the programs stored in such storage media.

Specifically, a non-transitory computer readable medium may be provided in which a program for sequentially performing the above-described control method is stored.

The non-transitory readable medium refers to a medium that stores data semi-permanently and is readable by a device, not a medium storing data for a short time such as a register, a cache, a memory, and the like. Specifically, the various applications or programs described above may be stored and provided in a non-transitory readable medium such as a CD, a DVD, a hard disk, a Blu-ray disk, a USB, a memory card, a ROM, or the like.

In addition, although preferred embodiments of the present invention have been shown and described above, the present invention is not limited to the specific embodiments described above. Various modifications can of course be made by those skilled in the art to which the invention belongs without departing from the spirit of the invention as claimed in the claims, and such modifications should not be understood separately from the technical spirit or prospect of the present invention.

Claims

In the encoding method of the encoding device,

Dividing the target block of the current frame into a first region and a second region according to a preset division method;

Generating a first prediction block including a region corresponding to the first region by searching for a first motion vector for the first region in a first reference frame;

Dividing the first prediction block into a third region and a fourth region according to the preset division method, and generating boundary information;

Generating a second prediction block including a region corresponding to the fourth region by searching for a second motion vector of the fourth region corresponding to the second region in a second reference frame; And

And generating a third prediction block corresponding to the target block by merging the first prediction block and the second prediction block according to the boundary information.
The method of claim 1,

The predetermined division method,

And a method of dividing the target block into a plurality of areas based on pixel values of a plurality of pixels constituting the target block.
The method of claim 1,

Generating the third prediction block may include:

Generating the third prediction block by merging, based on the boundary information, the third region of the first prediction block and the region corresponding to the fourth region in the second prediction block.
The method of claim 3,

After generating the third prediction block, applying horizontal and vertical filtering to a boundary between a region corresponding to the third region and the fourth region.
The method of claim 1,

Generating the first prediction block may include:

Searching for the first motion vector for the first region in the first reference frame to generate a first prediction block corresponding to a region in which different weights are applied to pixel values constituting the first region and the second region, respectively, and

Generating the second prediction block may include:

Searching for the second motion vector for the fourth region corresponding to the second region in the second reference frame to generate a second prediction block corresponding to a region in which different weights are applied to pixel values constituting the third region and the fourth region, respectively.
The method of claim 5,

Determining a weight to be applied to the first region and the second region based on pixel values constituting the first region and the second region,

And determining a weight to be applied to the third region and the fourth region based on pixel values constituting the third region and the fourth region.
In the decoding method of the decoding device,

Receiving, for a target block to be decoded in a current frame, a first motion vector searched in a first reference frame and a second motion vector searched in a second reference frame;

Generating a first prediction block and a second prediction block based on the first motion vector and the second motion vector in each of the first reference frame and the second reference frame;

Dividing the first prediction block into a plurality of regions and generating boundary information according to a predetermined division method; And

And generating a third prediction block corresponding to the target block by merging the first prediction block and the second prediction block according to the boundary information.
The method of claim 7, wherein

The predetermined division method,

And a method of dividing the first prediction block into a plurality of areas based on pixel values of a plurality of pixels constituting the first prediction block.
The method of claim 7, wherein

The dividing step,

According to the predetermined division method, the first prediction block is divided into a first region and a second region,

Generating the third prediction block may include:

And generating the third prediction block by merging, based on the boundary information, the first region of the first prediction block and the region corresponding to the second region in the second prediction block.
The method of claim 9,

After generating the third prediction block, applying horizontal and vertical filtering to boundaries between regions corresponding to the first region and the second region.
In the encoding device,

An interface for communicating with a decoding apparatus; And

A processor configured to divide a target block of a current frame into a first region and a second region according to a preset division method, search for a first motion vector for the first region in a first reference frame to generate a first prediction block including a region corresponding to the first region, divide the first prediction block into a third region and a fourth region according to the preset division method and generate boundary information, search for a second motion vector for the fourth region corresponding to the second region in a second reference frame to generate a second prediction block including a region corresponding to the fourth region, merge the first prediction block and the second prediction block according to the boundary information to generate a third prediction block corresponding to the target block, and control the interface to transmit the first motion vector and the second motion vector to the decoding apparatus.
The encoding device of claim 11,

The predetermined division method,

And a method of dividing the target block into a plurality of areas based on pixel values of a plurality of pixels constituting the target block.
The encoding device of claim 11,

The processor,

Generates the third prediction block by merging, based on the boundary information, the third region of the first prediction block and the region corresponding to the fourth region in the second prediction block.
The encoding device of claim 13,

The processor,

And after generating the third prediction block, horizontal and vertical filtering are applied to boundaries between regions corresponding to the third region and the fourth region.
In the decoding device,

An interface for communicating with an encoding device; And

A processor configured to, when a first motion vector searched in a first reference frame and a second motion vector searched in a second reference frame are received from the encoding device for a target block to be decoded in a current frame, generate a first prediction block and a second prediction block in each of the first reference frame and the second reference frame based on the first motion vector and the second motion vector, divide the first prediction block into a plurality of regions according to a predetermined division method and generate boundary information, and merge the first prediction block and the second prediction block according to the boundary information to generate a third prediction block corresponding to the target block.