WO2012081949A2

WO2012081949A2 - Method and apparatus for inter prediction

Info

Publication number: WO2012081949A2
Application number: PCT/KR2011/009772
Authority: WO
Inventors: 임성창; 김휘용; 정세윤; 조숙희; 김종호; 이하현; 이진호; 최진수; 김진웅; 안치득
Original assignee: 한국전자통신연구원
Priority date: 2010-12-17
Filing date: 2011-12-19
Publication date: 2012-06-21
Also published as: WO2012081949A3

Abstract

An inter prediction method according to the present invention comprises: a step for deriving reference motion information related to a unit to be decoded in a current picture; and a step for performing motion compensation for the unit to be decoded, using the reference motion information that has been derived. According to the present invention, image encoding/decoding efficiency can be enhanced.

Description

Inter prediction method and apparatus

The present invention relates to image processing, and more particularly, to an inter prediction method and apparatus.

Recently, as broadcasting services having high definition (HD) resolution have been expanded not only in Korea but also in the world, many users are accustomed to high resolution and high quality images, and many organizations are accelerating the development of next generation video equipment. In addition, as interest in Ultra High Definition (UHD), which has four times the resolution of HDTV, is increasing along with HDTV, a compression technology for higher resolution and higher quality images is required.

For image compression, several techniques may be used: an inter prediction technique that predicts pixel values in the current picture from a temporally previous and/or subsequent picture; an intra prediction technique that predicts pixel values in the current picture using pixel information within the current picture; and an entropy encoding technique that allocates a short code to a symbol with a high frequency of appearance and a long code to a symbol with a low frequency of appearance.

An object of the present invention is to provide an image encoding method and apparatus for improving image encoding / decoding efficiency.

Another object of the present invention is to provide an image decoding method and apparatus for improving image encoding / decoding efficiency.

Another object of the present invention is to provide an inter prediction method and apparatus for improving image encoding / decoding efficiency.

Another object of the present invention is to provide a method and apparatus for encoding temporal motion information that can increase image encoding / decoding efficiency.

Another object of the present invention is to provide a method and apparatus for decoding temporal motion information that can increase image encoding / decoding efficiency.

1. An embodiment of the present invention is an inter prediction method. The method includes deriving reference motion information for a decoding target unit in a current picture, and performing motion compensation on the decoding target unit using the derived reference motion information. The reference motion information is motion information included in a reference picture for the current picture, and may include at least one of a reference picture list, a reference picture index, a motion vector, a prediction direction, and a motion vector predictor.

2. The method of claim 1, wherein the deriving of the reference motion information may further include extracting the reference motion information from the reference picture.

3. The method of claim 2, wherein the extracting of the reference motion information may further include counting the number of occurrences of each of a plurality of pieces of motion information in the reference picture to obtain count information, and selecting the reference motion information from among the plurality of pieces of motion information in the reference picture based on the obtained count information.

4. The method of claim 2, wherein the extracting of the reference motion information may further include performing a median operation on the motion information in the reference picture to derive a motion information median value, and extracting the motion information median value as the reference motion information.

5. The method of claim 2, wherein the extracting of the reference motion information may further include performing sub-sampling on the motion information in the reference picture.

6. The method of claim 5, wherein the performing of the sub-sampling may further include selecting a block at a predetermined position from among a plurality of blocks of a second size included in a block of a first size in the reference picture, and extracting the motion information corresponding to the selected block as the reference motion information, and each piece of motion information in the reference picture may be stored in units of blocks of the second size.

7. The method of claim 6, wherein the predetermined position may be the top-left position in the block of the first size.

8. The method of claim 2, wherein the extracting of the reference motion information may further include grouping the motion information in the reference picture into a plurality of groups, and selecting, in each of the plurality of groups, a predetermined number of pieces of motion information as the reference motion information based on a frequency of occurrence of the motion information, wherein, in the grouping step, the grouping may be performed based on at least one characteristic among a depth value of a unit included in the reference picture, a size of a unit included in the reference picture, and a partition form of a unit included in the reference picture.

9. The method of claim 2, wherein the extracting of the reference motion information may further include dividing the reference picture into a plurality of areas, and selecting, in each of the plurality of areas, a predetermined number of pieces of motion information as the reference motion information based on a frequency of occurrence of the motion information.

10. The method of claim 2, wherein, if the number of reference pictures is two or more, the extracting of the reference motion information may further include selecting, in each of the reference pictures, a predetermined number of pieces of motion information based on a frequency of occurrence of the motion information.

11. The method of claim 10, wherein the extracting of the reference motion information may further include deriving, for each of the reference pictures, a temporal distance from the current picture, and scaling the selected motion information based on the derived temporal distance.

12. The method of claim 1, wherein the performing of the motion compensation may further include receiving and decoding a motion vector difference for the decoding target unit, deriving a predicted motion vector for the decoding target unit, deriving a motion vector for the decoding target unit using the decoded motion vector difference and the derived predicted motion vector, and performing motion compensation on the decoding target unit using the derived motion vector.

13. The method of claim 12, wherein the deriving of the predicted motion vector may further include generating a motion vector candidate for the decoding target unit using the reference motion information, and deriving the predicted motion vector using the motion vector candidate.

14. The method of claim 1, wherein the performing of the motion compensation may further include receiving and decoding a merge index, generating a merge candidate list using the reference motion information, selecting the motion information indicated by the merge index from among the merge candidates included in the merge candidate list, and performing motion compensation on the decoding target unit using the selected motion information.

15. The method of claim 1, wherein, when the number of pieces of reference motion information is two or more, the performing of the motion compensation may further include receiving and decoding an encoded motion information index, selecting, from among the reference motion information, the motion information indicated by the motion information index, and performing motion compensation on the decoding target unit using the selected motion information.

16. The method of claim 1, wherein the deriving of the reference motion information may further include receiving encoded reference motion information and decoding the received reference motion information.

17. The method of claim 16, wherein, when the number of pieces of encoded reference motion information is two or more, the received reference motion information may be decoded in the decoding step using differential pulse code modulation (DPCM).

According to the image encoding method according to the present invention, image encoding / decoding efficiency can be improved.

According to the image decoding method according to the present invention, the image encoding / decoding efficiency can be improved.

According to the inter prediction method according to the present invention, image encoding / decoding efficiency can be improved.

According to the temporal motion information encoding method according to the present invention, image encoding / decoding efficiency can be improved.

According to the temporal motion information decoding method according to the present invention, image encoding / decoding efficiency can be improved.

FIG. 1 is a block diagram illustrating a configuration of an image encoding apparatus according to an embodiment of the present invention.

FIG. 2 is a block diagram illustrating a configuration of an image decoding apparatus according to an embodiment of the present invention.

FIG. 3 is a conceptual diagram schematically illustrating an embodiment in which one unit is divided into a plurality of sub-units.

FIG. 4 is a flowchart schematically illustrating an embodiment of an inter prediction method in an encoder.

FIG. 5 shows an embodiment of a reference picture division method.

FIG. 6 illustrates an embodiment of a reference picture used for inter prediction and / or motion compensation for an encoding target unit.

FIG. 7 is a flowchart schematically illustrating an embodiment of an inter prediction method in a decoder.

Hereinafter, embodiments of the present invention are described in detail with reference to the drawings. In describing the embodiments of the present specification, when it is determined that a detailed description of a related well-known configuration or function may obscure the gist of the present specification, the detailed description thereof will be omitted.

When a component is said to be “connected” or “coupled” to another component, it may be directly connected or coupled to that other component, but it should be understood that another component may exist in between. In addition, a description that the present invention “includes” a specific configuration does not exclude other configurations; it means that additional configurations may be included in the practice of the present invention or within the scope of its technical spirit.

Terms such as first and second may be used to describe various components, but the components should not be limited by the terms. The terms are used only for the purpose of distinguishing one component from another. For example, without departing from the scope of the present invention, the first component may be referred to as the second component, and similarly, the second component may also be referred to as the first component.

In addition, the components shown in the embodiments of the present invention are shown independently to represent different characteristic functions; this does not mean that each component is made of separate hardware or of a single software unit. In other words, the components are listed separately for convenience of description, and at least two of the components may be combined into one component, or one component may be divided into a plurality of components that each perform part of its function. Integrated and separate embodiments of the components are also included within the scope of the present invention, provided they do not depart from the spirit of the invention.

In addition, some of the components may not be essential components for performing essential functions in the present invention, but may be optional components for improving performance. The present invention can be implemented with only the components essential for realizing its essence, excluding the components used merely for improving performance, and a structure that includes only the essential components, excluding the optional components used for improving performance, is also included in the scope of the present invention.

Referring to FIG. 1, the image encoding apparatus 100 may include a motion predictor 111, a motion compensator 112, an intra predictor 120, a switch 115, a subtractor 125, a transform unit 130, a quantization unit 140, an entropy encoding unit 150, an inverse quantization unit 160, an inverse transform unit 170, an adder 175, a filter unit 180, and a reference picture buffer 190.

The image encoding apparatus 100 may encode an input image in an intra mode or an inter mode and output a bitstream. Here, intra prediction means prediction within a picture, and inter prediction means prediction between pictures. In the intra mode, the switch 115 may be switched to intra, and in the inter mode, the switch 115 may be switched to inter. The image encoding apparatus 100 may generate a prediction block for an input block of an input image and then encode a residual between the input block and the prediction block.

In the intra mode, the intra predictor 120 may generate a prediction block by performing spatial prediction using pixel values of blocks that are already encoded around the current block.

In the inter mode, the motion predictor 111 may obtain a motion vector by searching for a region that best matches an input block in the reference image stored in the reference picture buffer 190 during the motion prediction process. The motion compensator 112 may generate a prediction block by performing motion compensation using the motion vector. Here, the motion vector is a two-dimensional vector used for inter prediction, and may indicate an offset between the current encoding / decoding target picture and the reference picture.

The subtractor 125 may generate a residual block by the difference between the input block and the generated prediction block. The transform unit 130 may output a transform coefficient by performing a transform on the residual block. The quantization unit 140 may output the quantized coefficient by quantizing the input transform coefficient according to the quantization parameter.

The entropy encoder 150 may output a bit stream by performing entropy encoding based on the values calculated by the quantizer 140 or the encoding parameter values calculated in the encoding process.

When entropy encoding is applied, a small number of bits is assigned to a symbol having a high probability of occurrence and a large number of bits is assigned to a symbol having a low probability of occurrence, so that the size of the bit string for the symbols to be encoded can be reduced. Therefore, compression performance of image encoding may be increased through entropy encoding. The entropy encoder 150 may use an encoding method such as Exponential-Golomb coding, context-adaptive variable length coding (CAVLC), or context-adaptive binary arithmetic coding (CABAC) for entropy encoding.
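The bit-allocation idea can be illustrated with order-0 Exponential-Golomb coding, one of the methods named above. The following is a minimal sketch, not the encoder's actual implementation; it assumes symbols have already been mapped to non-negative integers so that more probable symbols receive smaller values, and the function name `exp_golomb_encode` is hypothetical.

```python
def exp_golomb_encode(value: int) -> str:
    """Return the order-0 Exp-Golomb codeword of a non-negative integer as a bit string."""
    if value < 0:
        raise ValueError("value must be non-negative")
    binary = bin(value + 1)[2:]        # binary representation of value + 1
    prefix = "0" * (len(binary) - 1)   # one leading zero per bit after the first
    return prefix + binary


# Smaller (assumed more probable) values receive shorter codewords.
codes = {v: exp_golomb_encode(v) for v in range(5)}
for v, code in codes.items():
    print(v, code)
```

Under this mapping the value 0 costs one bit while the values 3 and 4 cost five bits each, which is the short-code/long-code trade-off described in the paragraph above.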

Since the image encoding apparatus according to the embodiment of FIG. 1 performs inter prediction encoding, that is, predictive encoding between pictures, the currently encoded image needs to be decoded and stored to be used as a reference image. Accordingly, the quantized coefficients are inversely quantized by the inverse quantizer 160 and inversely transformed by the inverse transformer 170. The inverse quantized and inverse transformed coefficients are added to the prediction block by the adder 175 and a reconstruction block is generated.

The reconstruction block passes through the filter unit 180, and the filter unit 180 may apply at least one of a deblocking filter, a sample adaptive offset (SAO), and an adaptive loop filter (ALF) to the reconstruction block or the reconstructed picture. The filter unit 180 may be referred to as an adaptive in-loop filter. The deblocking filter can remove block distortion generated at the boundary between blocks. SAO can add an appropriate offset to the pixel value to compensate for coding errors. The ALF may perform filtering based on a value obtained by comparing the reconstructed image with the original image. The reconstructed block that has passed through the filter unit 180 may be stored in the reference picture buffer 190.

Referring to FIG. 2, the image decoding apparatus 200 may include an entropy decoder 210, an inverse quantizer 220, an inverse transformer 230, an intra predictor 240, a motion compensator 250, an adder 255, a filter unit 260, and a reference picture buffer 270.

The image decoding apparatus 200 may receive a bitstream output from the encoder, perform decoding in an intra mode or an inter mode, and output a reconstructed image. In the intra mode, the switch may be switched to intra, and in the inter mode, the switch may be switched to inter. The image decoding apparatus 200 may obtain a residual block from the input bitstream, generate a prediction block, and then add the residual block and the prediction block to generate a reconstructed block.

The entropy decoder 210 may entropy decode the input bitstream according to a probability distribution to generate symbols including symbols in the form of quantized coefficients. The entropy decoding method is similar to the entropy coding method described above.

When the entropy decoding method is applied, a small number of bits is allocated to a symbol having a high probability of occurrence and a large number of bits is allocated to a symbol having a low probability of occurrence, whereby the size of the bit string for each symbol can be reduced. Therefore, the compression performance of image decoding can be improved through an entropy decoding method.

The quantized coefficient is inversely quantized by the inverse quantizer 220 and inversely transformed by the inverse transformer 230, and as a result of the inverse quantization / inverse transformation of the quantized coefficient, a residual block may be generated.

In the intra mode, the intra predictor 240 may generate a prediction block by performing spatial prediction using pixel values of already decoded blocks around the current block. In the inter mode, the motion compensator 250 may generate a prediction block by performing motion compensation using a motion vector and the reference image stored in the reference picture buffer 270.

The residual block and the prediction block may be added through the adder 255, and the added block may pass through the filter unit 260. The filter unit 260 may apply at least one of the deblocking filter, SAO, and ALF to the reconstructed block or the reconstructed picture. The filter unit 260 may output a reconstructed image. The reconstructed picture may be stored in the reference picture buffer 270 and used for inter prediction.

Hereinafter, a unit means a unit of image encoding and decoding. When an image is divided and then encoded or decoded, a coding or decoding unit refers to one of the resulting divisions. Thus, a unit may be called a block, a coding unit (CU), a coding block, a prediction unit (PU), a prediction block, a transform unit (TU), a transform block, and so on. One unit may be further divided into smaller sub-units.

Here, the prediction unit refers to a basic unit that is a unit of performing prediction and / or motion compensation. The prediction unit may be divided into a plurality of partitions, and each partition may be called a prediction unit partition. When the prediction unit is divided into a plurality of partitions, each of the plurality of partitions may be a basic unit that is a unit of performing prediction and / or motion compensation. Hereinafter, in the embodiment of the present invention, each partition in which the prediction unit is divided may also be called a prediction unit.

Meanwhile, as described above, in the inter mode, the encoder and the decoder may perform inter prediction and/or motion compensation on the encoding/decoding target unit. Here, the encoding/decoding target unit may mean a prediction unit and/or a prediction unit partition. In this case, the encoder and the decoder may improve the encoding/decoding efficiency by using a motion vector of a reconstructed neighboring unit and/or a collocated unit. Here, the reconstructed neighboring unit is a unit that is adjacent to the encoding/decoding target unit or located at a corner of the encoding/decoding target unit, and that has already been encoded or decoded. The collocated unit is a unit that exists at the same spatial position as the encoding/decoding target unit in the reconstructed reference picture. Hereinafter, the motion vector of a unit included in the reference picture is referred to as a temporal motion vector. For example, the motion vector of the collocated unit may be called a temporal motion vector.

For example, the encoder and the decoder may use the motion vector of the reconstructed neighboring unit and/or the temporal motion vector as the motion vector of the encoding/decoding target unit. In this case, since the motion vector of the reconstructed neighboring unit or the temporal motion vector is used for the encoding/decoding target unit, the encoder need not encode a motion vector for that unit. Therefore, the amount of bits transmitted to the decoder can be reduced, and the coding efficiency can be improved. Inter prediction modes of this kind may include a skip mode and/or a direct mode.

In this case, the encoder may use an identifier and/or an index indicating which of the reconstructed neighboring units supplies the motion vector that is used. The inter prediction mode in which the identifier and/or index is used may be called a merge mode.

As another example, when the encoder and the decoder perform prediction and/or compensation using the motion vector of the encoding/decoding target unit and then encode that motion vector, a predicted motion vector for the encoding/decoding target unit may be used. Here, the predicted motion vector, also referred to below as the predictive motion vector, may be the motion vector of the reconstructed neighboring unit or a temporal motion vector. That is, the encoder and the decoder can efficiently encode the motion vector of the encoding/decoding target unit by using the motion vector of the reconstructed neighboring unit or the temporal motion vector as the predictive motion vector.

The encoder may generate a motion vector difference by the difference between the motion vector of the encoding target unit and the predictive motion vector. Here, the motion vector difference may mean a difference value between the motion vector of the encoding target unit and the predictive motion vector. The encoder may encode the generated motion vector difference and transmit the encoded motion vector difference to the decoder. In this case, the decoder may decode the motion vector difference and derive the motion vector of the decoding target unit through the sum of the decoded motion vector difference and the predictive motion vector. Such an inter prediction method may be referred to as a motion vector prediction (MVP). By using MVP, the amount of information transmitted from the encoder to the decoder can be reduced and the coding efficiency can be improved.
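The difference-and-sum relationship described above can be sketched as follows. This is a minimal illustration under assumed names (`encode_mvd`, `decode_mv`) and assumed integer vector components; how the predictive motion vector itself is chosen is outside the sketch.

```python
from typing import Tuple

MotionVector = Tuple[int, int]  # (x, y) components


def encode_mvd(mv: MotionVector, mvp: MotionVector) -> MotionVector:
    """Encoder side: motion vector difference = motion vector - predictive motion vector."""
    return (mv[0] - mvp[0], mv[1] - mvp[1])


def decode_mv(mvd: MotionVector, mvp: MotionVector) -> MotionVector:
    """Decoder side: motion vector = decoded difference + predictive motion vector."""
    return (mvd[0] + mvp[0], mvd[1] + mvp[1])


mv = (5, -3)    # hypothetical motion vector of the encoding target unit
mvp = (4, -1)   # hypothetical predictive motion vector
mvd = encode_mvd(mv, mvp)
print(mvd, decode_mv(mvd, mvp))
```

When the predictor is close to the actual motion vector, the difference has small components, which generally cost fewer bits to entropy-encode than the motion vector itself.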

In this case, the encoder may use an identifier and/or an index indicating which of the reconstructed neighboring units is used. MVP in which the identifier and/or index is additionally used may be called Advanced Motion Vector Prediction (AMVP).

In the above-described skip mode, direct mode, merge mode, MVP, AMVP, etc., the motion information in the reference picture may be used for prediction and / or motion compensation of the current encoding / decoding target unit. Motion information in the reference picture used for prediction and / or motion compensation of the current encoding / decoding target unit may be referred to as temporal motion information. The temporal motion information may include, for example, a temporal motion vector.

Here, motion information refers to coding parameters used for inter prediction and motion compensation. A coding parameter may include not only information that is encoded by the encoder and transmitted to the decoder, such as a syntax element, but also information that may be inferred in the encoding or decoding process, and refers to information required when encoding or decoding an image. The motion information may include at least one of a reference picture list, a reference picture index, a motion vector, a prediction direction, and a motion vector predictor.

Here, the reference picture list is a list consisting of a plurality of reference pictures used for inter prediction. Two reference picture lists may be used for inter prediction, one may be referred to as reference picture list 0 and the other may be referred to as reference picture list 1. In this case, the prediction direction included in the motion information may be information indicating which reference picture list is used for inter prediction. That is, the prediction direction may indicate whether reference picture list 0 is used, reference picture list 1 is used, or whether both reference picture list 0 and reference picture list 1 are used.

The reference picture index is an index indicating a reference picture used for inter prediction of the encoding/decoding target unit among the reference pictures included in the reference picture list. In addition, the motion vector predictor may mean a unit that is a prediction candidate and/or the motion vector of a unit that is a prediction candidate when the encoder and the decoder predict the motion vector.

The above-described encoding parameter may include not only motion information but also values and / or statistics such as an inter prediction mode, a coded block pattern (CBP), a block size, block partition information, and the like. Here, the block division information may include information about a depth of the unit. The depth information may indicate the number and / or degree of division of the unit.

One unit may be hierarchically divided with depth information based on a tree structure. Each divided subunit may have depth information. Since the depth information indicates the number and / or degree of division of the unit, the depth information may include information about the size of the sub-unit.

Referring to 310 of FIG. 3, the highest node may be called a root node and may have the smallest depth value. At this time, the highest node may have a depth of level 0 and may represent the first unit that is not divided.

A lower node having a depth of level 1 may indicate a unit in which the first unit is divided once, and a lower node having a depth of level 2 may indicate a unit in which the first unit is divided twice. For example, unit a corresponding to node a in 320 of FIG. 3 may be a unit obtained by dividing the first unit once and may have a depth of level 1.

A leaf node of level 3 may indicate a unit in which the first unit is divided three times. For example, the unit d corresponding to the node d in 320 of FIG. 3 may be a unit obtained by dividing the first unit three times and may have a depth of level 3. Thus, the leaf node at level 3, which is the lowest node, may have the deepest depth.
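The link between depth and sub-unit size can be sketched as below. The text does not state the split rule, so the sketch assumes, purely for illustration, that each division halves the width and height of a square unit (a quadtree-style split) and that the undivided unit is 64 × 64; the function name `unit_size_at_depth` is hypothetical.

```python
def unit_size_at_depth(root_size: int, depth: int) -> int:
    """Side length of a square unit after `depth` divisions, assuming each split halves the side."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return root_size >> depth


# Depth 0 is the undivided first unit; depth 3 is the level-3 leaf node.
sizes = [unit_size_at_depth(64, d) for d in range(4)]
print(sizes)
```

Under these assumptions, knowing the depth of a unit is enough to recover its size, which is why depth information can carry size information.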

As described above, when the encoder and the decoder perform inter prediction and/or motion compensation using skip mode, direct mode, merge mode, MVP, AMVP, and the like, temporal motion information (for example, a temporal motion vector) can be used. However, the inter prediction method using temporal motion information has a disadvantage in that a reference picture for a picture to be encoded/decoded must be stored in a memory. In addition, when the reference picture is lost, motion information (e.g., a temporal motion vector) in the reference picture cannot be used properly, so an error may propagate as the encoding/decoding process proceeds. Therefore, the inter prediction method using temporal motion information may also have disadvantages in terms of error resiliency. Accordingly, there is a need for an inter prediction method capable of efficiently encoding/decoding motion information in a reference picture and improving error resiliency.

Referring to FIG. 4, the encoder may extract motion information of a reference picture with respect to the current picture (S410). In this case, the encoder may extract N pieces of motion information among the motion information included in the reference picture. The extracted motion information may be used for inter prediction and/or motion compensation of an encoding target unit in the current picture. Here and in the embodiments described below, N denotes a positive integer.

Hereinafter, embodiments of a method of extracting motion information of a reference picture are described. The encoder may extract motion information of the reference picture by using at least one of the motion information extraction methods described below.

In an embodiment of the motion information extraction method, the encoder may extract the motion information according to the frequency of occurrence of the motion information in the reference picture. At this time, for example, the encoder may select and extract N pieces of motion information in order of occurrence frequency among the motion information in the reference picture. Table 1 below shows an embodiment of a method for extracting motion information according to the frequency of occurrence of motion information.

TABLE 1

Referring to Table 1, in the encoding process, the encoder may count the number of motion vector occurrences in the reference picture to obtain count information. In this case, the encoder may select N motion vectors in order of occurrence frequency. In the embodiment of Table 1, when N is 3, the extracted motion vector may be [0,0], [1,0], [0, -1].
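The counting and selection step can be sketched as follows. This is an illustrative sketch only: the function name `extract_most_frequent` is hypothetical, and the vectors and counts in the example are made-up inputs chosen so that the result matches the three vectors named above; they are not the values of Table 1.

```python
from collections import Counter
from typing import List, Tuple

MotionVector = Tuple[int, int]


def extract_most_frequent(motion_vectors: List[MotionVector], n: int) -> List[MotionVector]:
    """Count occurrences of each motion vector and return the n most frequent ones."""
    counts = Counter(motion_vectors)  # the count information
    return [mv for mv, _ in counts.most_common(n)]


# Hypothetical motion vectors found in a reference picture.
reference_mvs = [(0, 0)] * 4 + [(1, 0)] * 3 + [(0, -1)] * 2 + [(2, 2)]
print(extract_most_frequent(reference_mvs, 3))
```

`Counter.most_common` breaks ties by first occurrence, so a real encoder and decoder would need to agree on a tie-breaking rule; the text does not specify one.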

As another example, the encoder may obtain count information for each component of the motion vector, not the motion vector itself, in the encoding process. Here, since the motion vector has a two-dimensional vector form and can be expressed as [x, y], each component of the motion vector may mean an x component and a y component. In this case, the encoder may select N components in order of occurrence frequency for each motion vector component. The encoder may extract N motion vectors using the selected motion vector component.

In the above-described embodiment, the encoder may quantize the motion information of the reference picture and then count the frequency of occurrence of the quantized motion information to obtain count information. For example, the encoder may quantize a motion vector of 1/4 pixel unit into a motion vector of integer pixel unit and count the frequency of occurrence of the quantized motion vector in the encoding process. In this case, the encoder may select and / or extract N motion vectors from the quantized motion vectors in order of occurrence frequency.

In this case, the encoder may perform quantization on the motion information according to the quantization step size. The information about the quantization step size may be stored in the encoder and the decoder in the same manner. In this case, since the decoder may know the quantization step size used in the encoder, the encoder may not transmit information about the quantization step size to the decoder. If the decoder does not have information about the quantization step size, the encoder may encode the information about the quantization step size and transmit it to the decoder through a bitstream. The decoder can decode the transmitted quantization step size information and use it for quantization of the motion information.
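Quantizing before counting can be sketched as follows. The sketch assumes motion vectors stored in quarter-pixel units, a quantization step size of 4 (one integer pixel), and rounding to the nearest integer with ties away from zero; the rounding rule and the names `quantize_component` and `most_frequent_quantized` are assumptions, since the text does not fix them.

```python
from collections import Counter
from typing import List, Tuple

MotionVector = Tuple[int, int]


def quantize_component(value: int, step: int) -> int:
    """Quantize one component by `step`, rounding to nearest with ties away from zero (assumed rule)."""
    sign = -1 if value < 0 else 1
    return sign * ((abs(value) + step // 2) // step)


def most_frequent_quantized(mvs: List[MotionVector], n: int, step: int = 4) -> List[MotionVector]:
    """Quantize each motion vector, count the quantized vectors, and return the n most frequent."""
    quantized = [(quantize_component(x, step), quantize_component(y, step)) for x, y in mvs]
    return [mv for mv, _ in Counter(quantized).most_common(n)]


# Hypothetical quarter-pixel motion vectors.
quarter_pel_mvs = [(1, 0), (0, -1), (5, 3), (4, 4), (-1, 1)]
print(most_frequent_quantized(quarter_pel_mvs, 2))
```

Because nearby quarter-pixel vectors collapse to the same integer-pixel vector, the counts concentrate on fewer distinct values than they would without quantization.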

In another embodiment of the motion information extraction method, the encoder may extract temporal motion information by performing a predetermined process on the plurality of motion information included in the reference picture.

For example, the encoder may extract temporal motion information by performing a median operation on the plurality of motion information included in the reference picture. Assume, for instance, that there are three motion vectors [0, 0], [-3, 5], and [-4, 2] in the reference picture. In this case, the encoder may extract one motion vector [-3, 2] by performing a median operation on each component of the motion vectors.
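
The component-wise median in this example may be checked with the short sketch below. The function name median_mv is hypothetical; the input vectors are those given in the text.

```python
from statistics import median

def median_mv(mvs):
    """Take the median of the x components and of the y components separately."""
    xs = [mv[0] for mv in mvs]
    ys = [mv[1] for mv in mvs]
    return [median(xs), median(ys)]

result = median_mv([[0, 0], [-3, 5], [-4, 2]])
print(result)
```

The x components 0, -3, and -4 have a median of -3, and the y components 0, 5, and 2 have a median of 2, which gives [-3, 2].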

As another example, the encoder may extract N pieces of motion information by performing sub-sampling on a plurality of pieces of motion information included in a reference picture. For example, assume that motion vectors in a reference picture are arranged in a two-dimensional form as shown in Table 2 below.

TABLE 2

Referring to Table 2, the encoder may extract motion vectors existing in odd-numbered rows and odd-numbered columns in a two-dimensional motion vector array through subsampling. In this case, four motion vectors may be extracted, such as [-2, 4], [0, -1], [5, -1], and [-2, 1].

As another example of a method of extracting N pieces of motion information by performing subsampling on a plurality of pieces of motion information included in a reference picture, the encoder may extract a motion vector corresponding to a specific position in a two-dimensional motion vector array.

For example, when the motion vectors in the reference picture are arranged in a two-dimensional form as shown in Table 2, the encoder may extract a motion vector corresponding to a specific position in the two-dimensional motion vector array.

For example, the motion vectors may be stored in a two-dimensional motion vector array in units of 4 × 4 blocks. In this case, the encoder may select a block corresponding to a predetermined position from among 4x4 size blocks included in the 16x16 size block, and extract a motion vector corresponding to the selected block. Here, the predetermined position may be, for example, the leftmost upper position in the block of 16 × 16 size. In the embodiment of Table 2, the predetermined position may be a position corresponding to the motion vector of [-2, 4]. In this case, the encoder may extract a motion vector of [-2, 4].
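
Both subsampling variants may be sketched with a single stride parameter. The array below is hypothetical: only the values at the sampled positions follow the examples in the text, and the remaining entries are invented for illustration. The function name subsample is likewise an assumption.

```python
def subsample(mv_field, stride):
    """Keep the vector at the top-left of every stride x stride group."""
    return [row[::stride] for row in mv_field[::stride]]

# Hypothetical field with one motion vector per 4x4 block (4 rows x 4 columns,
# i.e. one 16x16 block)
field = [
    [(-2, 4), (1, 0), (0, -1), (3, 3)],
    [(0, 0), (2, 2), (1, 1), (0, 5)],
    [(5, -1), (0, 0), (-2, 1), (4, 0)],
    [(1, 1), (0, 2), (3, -3), (2, 2)],
]

# Odd-numbered rows and columns (1st and 3rd)
odd_rows_cols = subsample(field, 2)
# Leftmost upper 4x4 block of the 16x16 block
top_left_only = subsample(field, 4)
print(odd_rows_cols, top_left_only)
```

A stride of 2 keeps four vectors, and a stride of 4 keeps only the vector at the leftmost upper position, here [-2, 4].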

In another embodiment of the motion information extraction method, the encoder may classify and / or group motion information in the reference picture into a plurality of groups based on characteristics of a unit included in the reference picture. The characteristics of the unit may include the depth of the unit, the size of the unit, and / or the partition form of the unit. In this case, the encoder may extract M motion information for each group according to a frequency of occurrence (hereinafter, M is a positive integer) and extract a total of N motion information.

For example, when depth values of units exist from 0 to P-1 (hereinafter, P is a positive integer) in a reference picture, the encoder may classify the motion information in the reference picture into P groups based on the depth value of the unit. In this case, the encoder may obtain count information by counting the number of motion vector occurrences for each group in the encoding process. That is, the encoder may obtain count information of the motion vector for each depth of the unit. The encoder may extract N motion vectors by selecting M motion vectors in order of occurrence frequency for each group based on the count information. Here, N may be M * P.

As another example, when units of P different sizes exist in the reference picture, the encoder may classify motion information in the reference picture into P groups based on the size of the unit. In this case, the encoder may obtain count information by counting the number of motion vector occurrences for each group in the encoding process. That is, the encoder can obtain count information of the motion vector for each unit size. The encoder may extract N motion vectors by selecting M motion vectors in order of occurrence frequency for each group based on the count information. Here, N may be M * P.

As another example, when P partition types exist in the reference picture, the encoder may classify motion information in the reference picture into P groups based on the partition type. In this case, the encoder may obtain count information by counting the number of motion vector occurrences for each group in the encoding process. That is, the encoder can obtain count information of the motion vector for each partition type. The encoder may extract N motion vectors by selecting M motion vectors in order of occurrence frequency for each group based on the count information. Here, N may be M * P. Here, the partition of the unit may mean a basic unit used for inter prediction and motion compensation, and may have a size of L * K (L and K are positive integers).
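
The grouping embodiments above share one pattern: classify, count per group, then select M vectors per group. A minimal sketch is given below, assuming each unit is represented as a dictionary; the field names "depth" and "mv", the function name top_m_per_group, and the sample units are hypothetical. The same function could be applied to unit size or partition type by changing the key.

```python
from collections import Counter, defaultdict

def top_m_per_group(units, m, key):
    """Group units by unit[key], then pick the m most frequent vectors per group."""
    groups = defaultdict(Counter)
    for unit in units:
        groups[unit[key]][unit["mv"]] += 1
    return {g: [mv for mv, _ in c.most_common(m)]
            for g, c in sorted(groups.items())}

# Hypothetical units with depth values 0 and 1 (P = 2)
units = [
    {"depth": 0, "mv": (0, 0)},
    {"depth": 0, "mv": (0, 0)},
    {"depth": 0, "mv": (1, 0)},
    {"depth": 1, "mv": (2, -1)},
    {"depth": 1, "mv": (2, -1)},
    {"depth": 1, "mv": (0, 3)},
    {"depth": 1, "mv": (0, 3)},
    {"depth": 1, "mv": (0, 3)},
]
selected = top_m_per_group(units, m=1, key="depth")
total = sum(len(v) for v in selected.values())
print(selected, total)
```

With M equal to 1 and P equal to 2, a total of N = M * P = 2 motion vectors is selected.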

In another embodiment of the motion information extraction method, the encoder may split the reference picture into a plurality of regions. Here, each of the plurality of regions may be a region divided by a slice, and the plurality of regions may have different motion vector generation distributions. In this case, the encoder may extract M (hereinafter, M is a positive integer) motion information for each region according to a frequency of occurrence, and extract a total of N temporal motion information.

FIG. 5 shows an embodiment of a reference picture division method. Referring to FIG. 5, a reference picture may be divided into a first region 510 and a second region 520. Table 3 below shows an embodiment of a method for extracting motion information according to the frequency of occurrence of motion information when the reference picture is divided into the first region 510 and the second region 520.

TABLE 3

Referring to Table 3, the encoder may obtain count information by counting the number of motion vector occurrences for each region in the encoding process. That is, the encoder may obtain count information by counting the number of motion vector occurrences for each of the first region 510 and the second region 520. At this time, the encoder may extract N motion vectors in total by selecting M motion vectors in order of occurrence frequency for each region based on the count information. For example, when the reference picture is divided into two regions and M is 3, the number of extracted motion vectors may be six in total.

In another embodiment of the motion information extraction method, when a plurality of reference pictures are used for an encoding target unit and / or a current picture, the encoder may extract M pieces of motion information for each reference picture according to a frequency of occurrence, and thereby extract a total of N pieces of temporal motion information.

FIG. 6 illustrates an embodiment of a reference picture used for inter prediction and / or motion compensation for an encoding target unit. Referring to FIG. 6, the current picture 630 may include an encoding target unit. In this case, the first reference picture 610 and the second reference picture 620 may be used for inter prediction of the encoding target unit. Table 4 below shows an embodiment of a method for extracting motion information according to the frequency of occurrence of motion information when a plurality of reference pictures are used.

TABLE 4

Referring to Table 4, in the encoding process, the encoder may count the number of motion vector occurrences for each reference picture to obtain count information. That is, the encoder may obtain count information by counting the number of motion vector occurrences for each of the first reference picture 610 and the second reference picture 620. In this case, the encoder may extract N motion vectors in total by selecting M motion vectors in order of occurrence frequency, for each reference picture, based on the count information. For example, if two reference pictures are used and M is 3, the number of extracted motion vectors may be six in total.

In another embodiment of the motion information extraction method, when a plurality of reference pictures are used for the encoding target unit and / or the current picture, the encoder may select M pieces of motion information for each reference picture and then extract N pieces of temporal motion information by scaling the selected motion information.

In this case, for example, the encoder may calculate a temporal distance from the current picture for each reference picture, and perform scaling using the calculated temporal distance. The temporal distance may be a distance determined based on the display order when the current picture and the plurality of reference pictures are listed in a display order.

For example, the encoder may obtain count information by counting the number of motion vector occurrences for each reference picture in the encoding process. In this case, the encoder may select M motion vectors in order of occurrence frequency, for each reference picture, based on the count information, and thereby select a total of N motion vectors.

The encoder may calculate, for each reference picture, a temporal distance from the current picture. The encoder may perform scaling on the selected motion vectors using the calculated temporal distance, and extract the scaled motion vectors as temporal motion vectors for the current picture and / or the current encoding target unit. In this case, when two or more of the scaled motion vectors are identical, the encoder may extract only one of the identical motion vectors as a temporal motion vector.

For example, it is assumed that two reference pictures are used, and each of the reference pictures is a first reference picture and a second reference picture. In addition, it is assumed that the first temporal distance between the current picture and the first reference picture is 2 and the second temporal distance between the current picture and the second reference picture is 4.

If the motion vectors selected in the first reference picture are [0, 2] and [1, 1], and the motion vectors selected in the second reference picture are [0, 4] and [2, 4], the selected motion vectors are [0, 2], [1, 1], [0, 4], and [2, 4]. In this case, the encoder may scale the motion vectors selected from the first reference picture and / or the motion vectors selected from the second reference picture based on the first temporal distance and the second temporal distance. For example, when the motion vectors [0, 4] and [2, 4] selected in the second reference picture are scaled by the ratio of the first temporal distance to the second temporal distance (2/4), the scaled motion vectors may be [0, 2] and [1, 2]. At this time, since [0, 2] of the scaled motion vectors is the same as a motion vector selected from the first reference picture, the temporal motion vectors finally extracted are {[0, 2], [1, 1], [1, 2]}.
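
The worked example above may be reproduced with the following sketch. It assumes that all vectors are scaled to a common target temporal distance (here the first temporal distance, 2) and that integer division is used; the description does not specify the rounding rule, and the function names scale_mv and scale_and_merge are hypothetical.

```python
def scale_mv(mv, distance, target_distance):
    """Scale a vector from its own temporal distance to the target distance."""
    # Assumed rule: integer (floor) division; exact for the values used here
    return tuple(c * target_distance // distance for c in mv)

def scale_and_merge(selected, target_distance):
    """selected: list of (temporal_distance, [mv, ...]), one entry per reference picture."""
    result = []
    for distance, mvs in selected:
        for mv in mvs:
            scaled = scale_mv(mv, distance, target_distance)
            if scaled not in result:  # keep only one copy of identical vectors
                result.append(scaled)
    return result

selected = [
    (2, [(0, 2), (1, 1)]),  # first reference picture, temporal distance 2
    (4, [(0, 4), (2, 4)]),  # second reference picture, temporal distance 4
]
temporal_mvs = scale_and_merge(selected, target_distance=2)
print(temporal_mvs)
```

The result contains [0, 2], [1, 1], and [1, 2], in agreement with the example in the text.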

Meanwhile, the decoder may extract motion information of the reference picture by using the same method as the motion information extraction method used in the encoder. In this case, the encoder may not transmit the extracted motion information to the decoder. On the other hand, the above-described motion information extraction process may be performed only in the encoder, not the decoder. In this case, the encoder may encode the extracted motion information and transmit the encoded motion information to the decoder.

Referring to FIG. 4 again, the encoder may perform inter prediction and / or motion compensation on the encoding target unit in the current picture by using the extracted temporal motion information (S420).

For example, the encoder may use the extracted temporal motion information in performing a motion vector prediction (MVP) and / or an advanced motion vector prediction (AMVP) for the encoding target unit. In this case, the encoder may use the temporal motion vector extracted from the reference picture as one of motion vector candidates. When motion vector prediction and / or AMVP is applied, the encoder may perform inter prediction and / or motion compensation using a block matching algorithm, a skip mode, or a direct mode.

The block matching algorithm may mean an algorithm for determining a reference unit for the encoding target unit among the reconstructed units in the reference picture. The encoder may determine the reference unit for the encoding target unit from the reconstructed units in the reference picture by using the motion vector of the encoding target unit. In this case, the encoder may perform inter prediction and / or motion compensation on the encoding target unit by using the determined reference unit.

In the skip mode and the direct mode, the motion vector and temporal motion vector of the reconstructed neighboring unit may be used as the motion vector of the encoding target unit, and the reference picture index of the reconstructed neighboring unit may be used as the reference picture index of the encoding target unit. In the direct mode, a residual signal for the current encoding target unit may be encoded and transmitted to the decoder. However, since the residual signal may not exist in the skip mode, the encoder may not encode the residual signal.

As another example, the encoder may perform inter prediction and / or motion compensation using a merge mode. In the merge mode, the encoder may perform inter prediction and / or motion compensation by using at least one of a motion vector and a temporal motion vector of the reconstructed neighboring unit as the motion vector of the encoding target unit. In this case, the encoder may use the extracted temporal motion vector to derive the motion vector of the encoding target unit. For example, the encoder may use the extracted temporal motion vector as one of merge candidates included in a merge candidate list. That is, the encoder may generate a merge candidate list using the extracted temporal motion vector.

In this case, the encoder may encode the merge index and transmit the encoded index to the decoder. Here, the merge index may be an index indicating which candidate among merge candidates included in the merge candidate list is used for inter prediction and motion compensation of the encoding target unit. The decoder may receive and decode the merge index, and generate a merge candidate list in the same manner as the encoder. In this case, the decoder may derive motion information used for inter prediction and motion compensation of the decoding target unit by using the generated merge candidate list and the decoded merge index.
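
The role of the merge index may be sketched as follows. The candidate order (neighboring-unit vectors first, then temporal vectors) and the removal of duplicates are assumptions made for illustration, as are the function name build_merge_candidates and the sample vectors; the description only requires that the encoder and the decoder build the list in the same manner.

```python
def build_merge_candidates(neighbor_mvs, temporal_mvs):
    """Neighboring-unit vectors first, then temporal vectors, without duplicates."""
    candidates = []
    for mv in list(neighbor_mvs) + list(temporal_mvs):
        if mv not in candidates:
            candidates.append(mv)
    return candidates

# Hypothetical inputs available to both the encoder and the decoder
neighbor_mvs = [(1, 0), (0, 0)]
temporal_mvs = [(0, 0), (0, 2)]

# Encoder side: choose a candidate and signal only its index
encoder_list = build_merge_candidates(neighbor_mvs, temporal_mvs)
merge_index = encoder_list.index((0, 2))

# Decoder side: rebuild the same list and look up the signaled index
decoder_list = build_merge_candidates(neighbor_mvs, temporal_mvs)
decoded_mv = decoder_list[merge_index]
print(merge_index, decoded_mv)
```

Since both sides derive an identical list, transmitting the index alone is sufficient to identify the motion vector.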

In addition, the residual signal for the encoding target unit may not exist in the merge mode. In this case, the encoder may not encode the residual signal, and such an encoding mode may be called a merge skip mode.

Meanwhile, the temporal motion information extracted from the reference picture may be two or more. In this case, the encoder may select one of the extracted plurality of temporal motion information and use the selected temporal motion information for inter prediction and / or motion compensation for the encoding target unit. In this case, the encoder may select the optimal temporal motion information by a rate-distortion optimization (RDO) method. Here, the rate-distortion optimization method may mean a method of selecting an optimal coding scheme in terms of rate and distortion.

For example, the encoder may calculate a rate-distortion cost when encoding is performed for each of a plurality of temporal motion vectors. In this case, the encoder may select one temporal motion vector having a minimum rate-distortion cost value. The encoder may use the selected temporal motion vector in performing inter prediction and / or motion compensation. In addition, the encoder may encode a motion vector index for the selected temporal motion vector, and the encoded motion vector index may be included in a bitstream and transmitted to the decoder.
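
The selection by rate-distortion cost may be sketched as follows, using the commonly used Lagrangian cost J = D + lambda * R. The description does not state the cost formula, so this form is an assumption, and the distortion values, rate values, and lambda below are hypothetical numbers.

```python
def select_by_rd_cost(candidates, lam):
    """Return (index, cost) of the candidate with the minimum cost D + lam * R."""
    costs = [c["distortion"] + lam * c["rate"] for c in candidates]
    best_index = min(range(len(candidates)), key=costs.__getitem__)
    return best_index, costs[best_index]

# Hypothetical measurements for three temporal motion vectors
candidates = [
    {"mv": (0, 2), "distortion": 120, "rate": 10},
    {"mv": (1, 1), "distortion": 100, "rate": 16},
    {"mv": (1, 2), "distortion": 90, "rate": 30},
]
best_index, best_cost = select_by_rd_cost(candidates, lam=2.0)
print(best_index, best_cost)
```

The index returned here plays the role of the motion vector index that is encoded and transmitted to the decoder.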

The encoder may use temporal motion information extracted from the reference picture to encode motion information of the encoding target unit. In this case, the temporal motion information extracted from the reference picture may be used as a prediction value for the motion information of the encoding target unit. Hereinafter, the predicted value for the motion information of the encoding target unit is called predictive motion information, and the predicted value for the motion vector of the encoding target unit is called a predicted motion vector.

For example, the encoder may use the temporal motion vector extracted from the reference picture as a prediction motion vector for the encoding target unit. The encoder may obtain a motion vector difference based on the difference between the motion vector of the encoding target unit and the predicted motion vector. In this case, the encoder may encode the obtained motion vector difference, and the encoded motion vector difference may be included in a bitstream and transmitted to the decoder. Equation 1 below shows an embodiment of a motion vector difference calculation method.

[Equation 1]

motion_vector_difference = motion_vector - extracted_motion_vector

Here, motion_vector_difference may indicate a motion vector difference. In addition, motion_vector may represent a motion vector of the encoding target unit, and extracted_motion_vector may represent a temporal motion vector extracted from the reference picture.
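
Equation 1 and its inverse on the decoder side may be sketched as follows. The function names and the sample vectors are hypothetical; the subtraction is applied to each component separately.

```python
def mv_difference(motion_vector, extracted_motion_vector):
    """Equation 1: component-wise difference between the vector and its predictor."""
    return (motion_vector[0] - extracted_motion_vector[0],
            motion_vector[1] - extracted_motion_vector[1])

def reconstruct_mv(motion_vector_difference, extracted_motion_vector):
    """Inverse operation: add the predictor back to the difference."""
    return (motion_vector_difference[0] + extracted_motion_vector[0],
            motion_vector_difference[1] + extracted_motion_vector[1])

# Hypothetical motion vector of the encoding target unit and its predictor
mvd = mv_difference((3, -2), (1, 2))
restored = reconstruct_mv(mvd, (1, 2))
print(mvd, restored)
```

Only the difference needs to be transmitted when the decoder has the same predicted motion vector.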

Referring back to FIG. 4, the encoder may encode motion information extracted from a reference picture and / or information related to the motion information (S430). The encoded information may be included in a sequence parameter set (SPS), a picture parameter set (PPS), or a slice header of the encoding target unit and transmitted to the decoder.

According to an embodiment, the encoder may encode a motion information encoding indicator indicating whether motion information extracted from a reference picture is encoded and transmitted to the decoder. In this case, the motion information encoding indicator may indicate whether the extracted motion information is used for inter prediction and motion compensation. The encoded motion information encoding indicator may be included in the bitstream and transmitted to the decoder.

For example, the motion information encoding indicator may be represented by a syntax element called coded_motion_vector_present_flag. The syntax element may be encoded in a picture parameter set or slice header.

If the value of coded_motion_vector_present_flag is 1, the encoder may encode the extracted temporal motion information and transmit the encoded temporal motion information to the decoder. At this time, the decoder may decode the transmitted temporal motion information. The decoder may perform inter prediction and motion compensation on a decoding target unit by using the decoded temporal motion information. If the value of coded_motion_vector_present_flag is 0, the encoder may not code the extracted temporal motion information. In this case, the decoder may not use the extracted temporal motion information in performing inter prediction and motion compensation.

In addition, the encoder may encode the motion information value extracted from the reference picture. The encoded motion information value may be included in the bitstream and transmitted to the decoder.

When a plurality of motion vectors are extracted from the reference picture, the encoder may encode the extracted motion vector values using differential pulse code modulation (DPCM). In this case, the encoder may perform prediction for each motion vector by using the DPCM.

For example, assume that two motion vectors are extracted from the reference picture. The extracted motion vectors are called the first motion vector and the second motion vector, respectively. Here, the first motion vector may be represented by extracted_motion_vector_1 and the second motion vector may be represented by extracted_motion_vector_2. When DPCM is used, the first motion vector value may be used as a prediction value for the second motion vector value. In this case, when the encoder encodes the second motion vector value, the encoder may obtain a motion vector difference with respect to the second motion vector by using the prediction value (the first motion vector value). This can be represented by Equation 2 as an example.

[Equation 2]

motion_vector_difference_2 = extracted_motion_vector_2 - extracted_motion_vector_1

Here, motion_vector_difference_2 may represent a motion vector difference with respect to the second motion vector. When the motion vector difference is derived by the above-described method, the encoder may encode the derived motion vector difference value and transmit the encoded motion vector difference value to the decoder.
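
The DPCM encoding of the extracted motion vectors may be sketched as follows. The description covers the case of two vectors; extending the chain so that each vector is predicted from the preceding one is an assumption, as are the function name dpcm_encode and the sample values.

```python
def dpcm_encode(mvs):
    """Keep the first vector as is; replace each later vector by its difference
    from the preceding vector (Equation 2 for the second vector)."""
    encoded = [tuple(mvs[0])]
    for prev, cur in zip(mvs, mvs[1:]):
        encoded.append((cur[0] - prev[0], cur[1] - prev[1]))
    return encoded

# Hypothetical extracted_motion_vector_1 and extracted_motion_vector_2
coded = dpcm_encode([(0, 2), (1, 1)])
print(coded)
```

Here the second entry, (1, -1), corresponds to motion_vector_difference_2.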

When there is only one motion information extracted from the reference picture, the encoder may encode the extracted motion information value itself without performing prediction on the extracted motion information. In this case, the encoded motion information value may be included in the bitstream and transmitted to the decoder.

Meanwhile, as described above, the number of motion information extracted from the reference picture may be two or more. When the number of extracted motion information is two or more, the encoder may select one of the extracted plurality of motion information and use it for inter prediction and / or motion compensation for the encoding target unit. In this case, the encoder may encode a motion information index indicating which motion information of the extracted plurality of motion information is used. The encoded motion information index may be included in the bitstream and transmitted to the decoder.

Table 5 below shows an embodiment of the extracted motion vectors when the number of motion vectors extracted from the reference picture is two or more.

TABLE 5

Referring to Table 5, each motion vector may be assigned a motion information index. For example, when [0, -1] of the extracted plurality of motion vectors is used, the encoder may encode and transmit the motion information index value 2 to the decoder. In this case, the decoder may derive a motion vector used for inter prediction and motion compensation by using the transmitted motion information index.

When there is one piece of motion information extracted from the reference picture, the encoder may not encode the motion information index.

As will be described later, the decoder can extract N (N is a positive integer) temporal motion information in the same manner as the encoder. The extracted temporal motion information may be used for inter prediction and motion compensation for a decoding target unit. In this case, the encoder may not encode the temporal motion information value and / or the motion information index extracted from the reference picture.

According to the above-described inter prediction method, the encoder can efficiently encode temporal motion information during image encoding. In addition, since the encoder may not store the motion information of the reference picture in memory, the memory requirement and the memory bandwidth may be reduced and the error robustness may be improved in inter prediction and motion compensation for the encoding target unit. Therefore, the overall image coding efficiency can be improved.

Referring to FIG. 7, the decoder may extract motion information of a reference picture with respect to the current picture (S710).

The decoder may extract N pieces of motion information among the motion information included in the reference picture. The extracted motion information may be used for inter prediction and / or motion compensation of a decoding target unit in a current picture. Here, N represents a positive integer, and in the embodiments described below, N means a positive integer.

The decoder may extract motion information of the reference picture by using the same method as the motion information extraction method used in the encoder. In this case, the decoder may extract the same motion information as the temporal motion information extracted by the encoder. Embodiments of the motion information extraction method have been described above with reference to FIG.

Meanwhile, as described above, the encoder may encode the motion information value extracted from the reference picture, and the encoded motion information value may be included in the bitstream and transmitted to the decoder. In this case, since the decoder may derive motion information of the reference picture from the transmitted bitstream, the decoder may not perform the motion information extraction process.

Referring to FIG. 7 again, the decoder may decode motion information and / or information related to the motion information of the reference picture transmitted from the encoder (S720). The decoder may decode the information in a sequence parameter set (SPS), a picture parameter set (PPS), or a slice header.

According to an embodiment, the decoder may decode a motion information encoding indicator indicating whether temporal motion information extracted by the encoder is encoded and transmitted to the decoder. In this case, the motion information encoding indicator may indicate whether temporal motion information extracted by the encoder is used for inter prediction and motion compensation.

For example, the motion information encoding indicator may be represented by a syntax element called coded_motion_vector_present_flag. The syntax element may be decoded in a picture parameter set or slice header.

When the value of coded_motion_vector_present_flag is 1, the encoder may encode temporal motion information extracted from the reference picture and transmit the encoded motion information to the decoder. At this time, the decoder may decode the transmitted temporal motion information. The decoder may perform inter prediction and motion compensation on a decoding target unit by using the decoded temporal motion information. When the value of coded_motion_vector_present_flag is 0, the encoder may not code temporal motion information extracted from the reference picture. In this case, the decoder may not decode the extracted temporal motion information, and may not use the extracted temporal motion information in performing inter prediction and motion compensation.

In addition, as described above, the encoder may encode the motion information value extracted from the reference picture and transmit the encoded information to the decoder. At this time, the decoder may receive and decode the transmitted motion information value.

When a plurality of motion vectors are extracted from the reference picture, the encoder may predict the extracted motion vector value using DPCM, and then encode the difference between the predicted motion vector value and the extracted motion vector value, that is, the motion vector difference. The encoded motion vector difference may be transmitted to the decoder, and the decoder may derive temporal motion information using the transmitted motion vector difference.

For example, assume that two motion vectors are encoded in the encoder. The encoded motion vectors are called the first motion vector and the second motion vector, respectively. Here, the first motion vector may be represented by extracted_motion_vector_1 and the second motion vector may be represented by extracted_motion_vector_2. When DPCM is used, the first motion vector value may be used as a prediction value for the second motion vector value. At this time, as described above, the encoder may transmit the motion vector difference with respect to the second motion vector to the decoder. Here, the motion vector difference with respect to the second motion vector may be represented by motion_vector_difference_2.

The decoder may decode the first motion vector value. In addition, the decoder may decode a motion vector difference with respect to the second motion vector, and derive a second motion vector value by adding the decoded motion vector difference to the first motion vector value. This can be represented by Equation 3 below as an example.

[Equation 3]

extracted_motion_vector_2 = motion_vector_difference_2 + extracted_motion_vector_1
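
The decoder-side reconstruction of Equation 3 may be sketched as follows. Chaining the addition over more than two vectors is an assumption beyond the two-vector case described here, and the function name dpcm_decode and the sample values are hypothetical.

```python
def dpcm_decode(coded):
    """The first entry is a motion vector; each later entry is a difference that
    is added to the previously reconstructed vector (Equation 3)."""
    mvs = [tuple(coded[0])]
    for diff in coded[1:]:
        prev = mvs[-1]
        mvs.append((prev[0] + diff[0], prev[1] + diff[1]))
    return mvs

# Hypothetical decoded values: extracted_motion_vector_1 and motion_vector_difference_2
decoded = dpcm_decode([(0, 2), (1, -1)])
print(decoded)
```

The second reconstructed vector, (1, 1), corresponds to extracted_motion_vector_2.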

If there is only one motion information extracted from the reference picture, the encoder may encode the extracted motion information value itself and transmit the encoded motion information value itself to the decoder without performing prediction on the extracted motion information. In this case, the decoder may decode the transmitted motion information value itself without performing prediction on the motion information.

Meanwhile, the number of motion information extracted from the reference picture may be two or more. The motion information may be extracted by the encoder and transmitted to the decoder or may be extracted by the decoder. In this case, the decoder may select one of the plurality of motion information and use it for inter prediction and / or motion compensation for the decoding target unit.

As described above, when the number of motion information extracted from the reference picture is two or more, the encoder may encode the motion information index and transmit it to the decoder. Here, the motion information index is an index indicating which motion information of the extracted plurality of motion information is used. In this case, the decoder may decode the received motion information index. Since the decoder may have a plurality of motion information identical to the motion information extracted by the encoder, the decoder may select motion information used for inter prediction and motion compensation using the decoded motion information index.

Table 6 below shows an embodiment of the extracted motion vectors when the number of motion vectors extracted from the reference picture is two or more.

TABLE 6

Referring to Table 6, each motion vector may be assigned a motion information index. For example, if the motion information index value transmitted from the encoder is 2, the decoder may use the motion vector [0, -1] assigned the index value of 2 for inter prediction and motion compensation.

When there is one piece of motion information extracted from the reference picture, the encoder may not encode the motion information index. In this case, since the encoder does not transmit the motion information index, the decoder may not decode the motion information index.

In addition, as described above, the decoder may extract N pieces of temporal motion information in the same manner as the encoder. The extracted temporal motion information may be used for inter prediction and motion compensation for a decoding target unit. In this case, the encoder may not encode the temporal motion information value and / or the motion information index extracted from the reference picture. In this case, the decoder may not decode the motion information value and the motion information index. That is, the above-described decoding process such as the motion information value and the motion information index may be omitted.

By the above-described motion information extraction process S710 and / or motion information decoding process S720, the decoder may derive temporal motion information of the reference picture. The motion information of the reference picture derived by the decoder may be the same as the temporal motion information extracted by the encoder. Hereinafter, the motion information of the reference picture derived by the decoder is referred to as reference motion information, and the motion vector of the reference picture derived by the decoder is called a reference motion vector.

Referring to FIG. 7 again, the decoder may perform inter prediction and / or motion compensation on the decoding target unit in the current picture by using the reference motion information derived by the above-described method (S730).

For example, the decoder may use the reference motion information when performing a motion vector prediction (MVP) and / or an advanced motion vector prediction (AMVP) for a decoding target unit. In this case, the decoder may use the reference motion vector as one of motion vector candidates. When motion vector prediction and / or AMVP is applied, the decoder may perform inter prediction and / or motion compensation using the reference motion vector.

In the skip mode and the direct mode, the motion vector and temporal motion vector of a reconstructed neighboring unit may be used as the motion vector of the decoding target unit, and the reference picture index of the reconstructed neighboring unit may be used as the reference picture index of the decoding target unit. In the direct mode, a residual signal for the current decoding target unit may be decoded. In the skip mode, however, the residual signal may not exist, so the decoder need not decode it.

As another example, the decoder may perform inter prediction and / or motion compensation using a merge mode. In the merge mode, the decoder may perform inter prediction and / or motion compensation by using at least one of the reconstructed neighboring unit's motion vector and temporal motion vector as the motion vector of the decoding target unit. In this case, the decoder may use the reference motion vector to derive the motion vector of the decoding target unit. For example, the decoder may use the reference motion vector as one of the merge candidates included in the merge candidate list.

In the merge mode, the residual signal for the decoding target unit may not exist. In this case, the decoder may not decode the residual signal, and such an encoding mode may be called a merge skip mode.
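The merge-mode use of the reference motion vector can be sketched as follows. This is an illustration only: the candidate ordering (neighboring units first, reference motion vectors after), the duplicate pruning, the list size limit `max_candidates`, and the function names are assumptions of this sketch and are not specified in the text.

```python
from typing import List, Optional, Sequence, Tuple

MotionVector = Tuple[int, int]


def build_merge_candidates(neighbor_mvs: Sequence[Optional[MotionVector]],
                           reference_mvs: Sequence[MotionVector],
                           max_candidates: int = 5) -> List[MotionVector]:
    """Build a merge candidate list that includes reference motion vectors."""
    candidates: List[MotionVector] = []
    for mv in list(neighbor_mvs) + list(reference_mvs):
        if len(candidates) >= max_candidates:
            break
        # Skip unavailable neighbors (None) and duplicate candidates.
        if mv is None or mv in candidates:
            continue
        candidates.append(mv)
    return candidates


def merge_motion_vector(candidates: Sequence[MotionVector],
                        merge_index: int) -> MotionVector:
    """Select the candidate indicated by the decoded merge index."""
    return candidates[merge_index]


# Hypothetical inputs: two available neighbors with the same motion vector,
# one unavailable neighbor, and one reference motion vector.
neighbors = [(1, 0), None, (1, 0)]
reference = [(0, -1)]
merge_list = build_merge_candidates(neighbors, reference)
print(merge_list, merge_motion_vector(merge_list, 1))
```

In the merge skip case described above, the selected motion vector would be used for motion compensation without decoding a residual signal.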

Meanwhile, the decoder may derive two or more pieces of reference motion information. In this case, the decoder may select one of the derived pieces of reference motion information and use it for inter prediction and / or motion compensation for the decoding target unit. For example, the decoder may decode the motion information index transmitted from the encoder and use the decoded index to select the reference motion information used for inter prediction and motion compensation. Here, the motion information index indicates which of the derived pieces of reference motion information is used.

The decoder may use the reference motion information to derive the motion information of the decoding target unit. In this case, the reference motion information may be used as a prediction value for the motion information of the decoding target unit.

For example, the decoder may use the reference motion vector as the predicted motion vector for the decoding target unit. Here, the predicted motion vector means a predicted value for the motion vector of the decoding target unit. As described above, the encoder may derive the motion vector difference between the motion vector of the encoding target unit and the motion vector extracted from the reference picture and transmit the difference to the decoder. In this case, the decoder may receive and decode the motion vector difference, and may add the decoded motion vector difference and the reference motion vector to derive the motion vector of the decoding target unit. This can be represented, for example, by Equation 4 below.

[Equation 4]

motion_vector = motion_vector_difference + extracted_motion_vector

Here, motion_vector may represent a motion vector of the decoding target unit. In addition, motion_vector_difference may indicate a motion vector difference, and extracted_motion_vector may indicate a reference motion vector.
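As a worked illustration of Equation 4, the sketch below applies the addition component by component and pairs it with the encoder-side subtraction described above. The function names and the sample values are hypothetical.

```python
from typing import Tuple

MotionVector = Tuple[int, int]


def motion_vector_difference(motion_vector: MotionVector,
                             reference_motion_vector: MotionVector) -> MotionVector:
    """Encoder side: difference between the actual and reference motion vectors."""
    return (motion_vector[0] - reference_motion_vector[0],
            motion_vector[1] - reference_motion_vector[1])


def reconstruct_motion_vector(mvd: MotionVector,
                              reference_motion_vector: MotionVector) -> MotionVector:
    """Decoder side (Equation 4): motion_vector = mvd + reference motion vector."""
    return (mvd[0] + reference_motion_vector[0],
            mvd[1] + reference_motion_vector[1])


# Hypothetical values chosen only for illustration.
actual_mv = (3, -2)
reference_mv = (0, -1)

mvd = motion_vector_difference(actual_mv, reference_mv)
# The decoder recovers the encoder's motion vector from the difference.
assert reconstruct_motion_vector(mvd, reference_mv) == actual_mv
print(mvd)
```

Only the difference is transmitted. The closer the reference motion vector is to the actual one, the smaller the values that have to be coded.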

According to the above-described inter prediction method, the decoder can efficiently decode temporal motion information during image decoding. In addition, since the decoder need not store the motion information of the reference picture in memory, the memory requirement and the memory bandwidth for inter prediction and motion compensation of the decoding target unit can be reduced, and error robustness can be improved. Therefore, the overall image decoding efficiency can be improved.

In the above embodiments, the methods are described based on flowcharts as a series of steps or blocks, but the present invention is not limited to the order of the steps, and certain steps may occur in a different order from, or at the same time as, other steps described above. Also, one of ordinary skill in the art will understand that the steps shown in the flowcharts are not exclusive, that other steps may be included, and that one or more steps in the flowcharts may be deleted without affecting the scope of the present invention.

The above-described embodiments include examples of various aspects. While not all possible combinations may be described to represent the various aspects, one of ordinary skill in the art will recognize that other combinations are possible. Accordingly, the invention is intended to embrace all other replacements, modifications and variations that fall within the scope of the following claims.

Claims

An inter prediction method comprising:
deriving reference motion information for a decoding target unit in a current picture; and
performing motion compensation on the decoding target unit by using the derived reference motion information,
wherein the reference motion information is motion information included in a reference picture for the current picture and includes at least one of a reference picture list, a reference picture index, a motion vector, a prediction direction, and a motion vector predictor.
The method of claim 1, wherein the deriving of the reference motion information comprises
extracting the reference motion information from the reference picture.
The method of claim 2, wherein the extracting of the reference motion information comprises:
counting the number of occurrences of each of a plurality of pieces of motion information in the reference picture to obtain count information; and
selecting the reference motion information from among the plurality of pieces of motion information in the reference picture based on the obtained count information.
The method of claim 2, wherein the extracting of the reference motion information comprises:
deriving median motion information by performing a median operation on the motion information in the reference picture; and
extracting the median motion information as the reference motion information.
The method of claim 2, wherein the extracting of the reference motion information comprises
performing sub-sampling on the motion information in the reference picture.
The method of claim 5, wherein the sub-sampling comprises:
selecting a block at a predetermined position from among a plurality of blocks of a second size included in a block of a first size in the reference picture; and
extracting motion information corresponding to the selected block as the reference motion information.
The method of claim 6, wherein
the predetermined position is the top-left position in the block of the first size.
The method of claim 2, wherein the extracting of the reference motion information comprises:
grouping the motion information in the reference picture into a plurality of groups; and
selecting, in each of the plurality of groups, a predetermined number of pieces of motion information as the reference motion information based on a frequency of occurrence of the motion information,
wherein the grouping is performed based on at least one of a depth value of a unit included in the reference picture, a size of a unit included in the reference picture, and a partition form of a unit included in the reference picture.
The method of claim 2, wherein the extracting of the reference motion information comprises:
dividing the reference picture into a plurality of regions; and
selecting, in each of the plurality of regions, a predetermined number of pieces of motion information as the reference motion information based on a frequency of occurrence of the motion information.
The method of claim 2, wherein,
when the number of reference pictures is two or more,
the extracting of the reference motion information comprises
selecting, from each of the reference pictures, a predetermined number of pieces of motion information based on a frequency of occurrence of the motion information.
The method of claim 10, wherein the extracting of the reference motion information further comprises:
deriving, for each of the reference pictures, a temporal distance from the current picture; and
scaling the selected motion information based on the derived temporal distance.
The method of claim 1, wherein
the performing of the motion compensation comprises:
receiving and decoding a motion vector difference for the decoding target unit;
deriving a predicted motion vector for the decoding target unit;
deriving a motion vector for the decoding target unit by using the decoded motion vector difference and the derived predicted motion vector; and
performing motion compensation on the decoding target unit by using the derived motion vector.
The method of claim 12, wherein the deriving of the predicted motion vector comprises:
generating a motion vector candidate for the decoding target unit by using the reference motion information; and
deriving the predicted motion vector by using the motion vector candidate.
The method of claim 1, wherein
the performing of the motion compensation comprises:
receiving and decoding a merge index;
generating a merge candidate list by using the reference motion information;
selecting motion information indicated by the merge index from among merge candidates included in the merge candidate list; and
performing motion compensation on the decoding target unit by using the selected motion information.
The method of claim 1, wherein,
when the number of pieces of reference motion information is two or more,
the performing of the motion compensation comprises:
receiving and decoding an encoded motion information index;
selecting motion information indicated by the motion information index from among the reference motion information; and
performing motion compensation on the decoding target unit by using the selected motion information.
The method of claim 1, wherein the deriving of the reference motion information comprises:
receiving encoded reference motion information; and
decoding the received reference motion information.
The method of claim 16, wherein,
when the number of pieces of encoded reference motion information is two or more,
the decoding comprises
decoding the received reference motion information by using differential pulse code modulation (DPCM).