CN113455000B

CN113455000B - Bidirectional prediction method and video decoding apparatus

Info

Publication number: CN113455000B
Application number: CN201980092691.3A
Authority: CN
Inventors: 金在一; 李善暎; 罗太英; 孙世勋; 申在燮
Original assignee: SK Telecom Co Ltd
Current assignee: SK Telecom Co Ltd
Priority date: 2018-12-27
Filing date: 2019-12-26
Publication date: 2024-04-02
Anticipated expiration: 2039-12-26
Also published as: KR20200081201A; CN113455000A

Abstract

A bi-directional inter prediction method and an image decoding apparatus are disclosed. According to an embodiment of the present invention, there is provided a bi-prediction method for inter-predicting a current block using any one of a plurality of bi-prediction modes, including: decoding mode information from a bitstream, which indicates whether to apply a first mode included in a plurality of bi-prediction modes to a current block; when the mode information indicates that the first mode is applied to the current block, decoding first motion information including differential motion vector information and prediction motion vector information and second motion information excluding at least a portion of the differential motion vector information and the prediction motion vector information from the bitstream; deriving a first motion vector based on the first motion information, deriving a second motion vector based on at least a portion of the first motion information and the second motion information; and predicting the current block by using the reference block indicated by the first motion vector in the first reference picture and the reference block indicated by the second motion vector in the second reference picture.

Description

Bidirectional prediction method and video decoding apparatus

Technical Field

The present invention relates to encoding and decoding of video, and more particularly, to a bi-directional prediction method and a video decoding apparatus that improve encoding and decoding efficiency by effectively expressing motion information.

Background

Since the volume of video data is larger than that of voice data or still image data, storing or transmitting video data without compression processing requires a large amount of hardware resources including a memory.

Accordingly, when storing or transmitting video data, an encoder is generally used to compress the video data for storage or transmission. Then, the decoder receives the compressed video data, and decompresses and reproduces the video data. Compression techniques for such video include h.264/AVC and High Efficiency Video Coding (HEVC), which improves coding efficiency by about 40% over h.264/AVC.

However, the size, resolution and frame rate of video are gradually increasing, and accordingly the amount of data to be encoded is also increasing. Therefore, new compression techniques having better coding efficiency and higher image quality than existing compression techniques are needed.

Disclosure of Invention

Technical problem

It is an object of the present invention to provide improved video encoding and decoding techniques, and more particularly to techniques that improve the efficiency of encoding and decoding by using motion information in a particular direction to infer motion information in other directions.

Technical proposal

According to at least one aspect, the present disclosure provides a method of inter-predicting a current block using any one of a plurality of bi-prediction modes. The method comprises the following steps: mode information indicating whether to apply a first mode included in a plurality of bi-prediction modes to a current block is decoded from a bitstream. When the mode information indicates that the first mode is applied to the current block, the method further includes decoding, from the bitstream, first motion information including differential motion vector information and prediction motion vector information regarding the first motion vector and second motion information excluding at least a portion of the differential motion vector information and the prediction motion vector information regarding the second motion vector; and deriving a first motion vector based on the first motion information and deriving a second motion vector based on at least a portion of the first motion information and based on the second motion information. The method also includes predicting a current block using a reference block in the first reference picture indicated by the first motion vector and a reference block in the second reference picture indicated by the second motion vector.

According to another aspect, the present disclosure provides a video decoding apparatus. The apparatus includes a decoder configured to decode mode information from a bitstream, the mode information indicating whether a first mode included in a plurality of bi-predictive modes is applied to a current block. When the mode information indicates that the first mode is applied to the current block, the decoder decodes, from the bitstream, first motion information including differential motion vector information and prediction motion vector information regarding the first motion vector and second motion information excluding at least a portion of the differential motion vector information and the prediction motion vector information regarding the second motion vector. The apparatus comprises a prediction unit configured to derive a first motion vector based on the first motion information and to derive a second motion vector based on at least a portion of the first motion information and the second motion information. The predictor is configured to predict a current block using a reference block in the first reference picture indicated by the first motion vector and a reference block in the second reference picture indicated by the second motion vector.

Technical effects

As described above, according to an embodiment of the present invention, bit efficiency of motion representation can be improved by using motion in a specific direction to infer motion in other directions.

Drawings

Fig. 1 is an exemplary block diagram of a video encoding device capable of implementing the techniques of this disclosure.

Fig. 2 exemplarily shows a block partition structure using the QTBTTT structure.

Fig. 3 exemplarily illustrates a plurality of intra prediction modes.

Fig. 4 is an exemplary block diagram of a video decoding device capable of implementing the techniques of this disclosure.

Fig. 5 is a diagram for describing bi-prediction according to an embodiment of the present invention.

Fig. 6 is a diagram for describing the derivation of motion using a symmetrical relationship between differential motion vectors according to an embodiment of the present invention.

Fig. 7 and 8 are diagrams for describing deriving motion using a linear relationship according to an embodiment of the present invention.

Fig. 9 to 18 are diagrams for describing a deriving motion according to various embodiments of the present invention.

Fig. 19 and 20 are flowcharts for describing deriving motion using a reference picture determined at a high level according to an embodiment of the present invention.

Detailed Description

Hereinafter, some embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. It should be noted that when the reference numerals are added to constituent elements (constituent element) in the respective drawings, like reference numerals refer to like elements although these elements are shown in the different drawings. Furthermore, in the following description of the present disclosure, a detailed description of known functions and configurations incorporated herein will be omitted to avoid obscuring the subject matter of the present disclosure.

Fig. 1 is an exemplary block diagram of a video encoding device capable of implementing the techniques of this disclosure. Hereinafter, a video encoding apparatus and elements of the apparatus will be described with reference to fig. 1.

The video encoding apparatus includes a block divider 110, a predictor 120, a subtractor 130, a transformer 140, a quantizer 145, an encoder 150, an inverse quantizer 160, an inverse transformer 165, an adder 170, a filter unit 180, and a memory 190.

Each element of the video encoding device may be implemented in hardware or software, or a combination of hardware and software. The functions of the respective elements may be implemented in software, and a microprocessor may be implemented to perform the software functions corresponding to the respective elements.

A video is composed of a plurality of pictures. Each picture is divided into a plurality of regions, and encoding is performed on each region. For example, a picture is partitioned into one or more tiles and/or slices. Here, one or more tiles may be defined as a tile set. Each tile or slice is partitioned into one or more Coding Tree Units (CTUs). Each CTU is partitioned into one or more Coding Units (CUs) in a tree structure. Information applied to each CU is encoded as a syntax of the CU, and information commonly applied to the CUs contained in one CTU is encoded as a syntax of the CTU. Furthermore, information commonly applied to all blocks in one tile is encoded as a syntax of the tile or as a syntax of a tile group that is a set of a plurality of tiles, and information applied to all blocks constituting one picture is encoded in a Picture Parameter Set (PPS) or a picture header. Furthermore, information commonly referred to by a plurality of pictures is encoded in a Sequence Parameter Set (SPS). In addition, information commonly referenced by one or more SPS's is encoded in a Video Parameter Set (VPS).

The block divider 110 determines the size of a Coding Tree Unit (CTU). Information about the size of the CTU (CTU size) is encoded as a syntax of the SPS or PPS and transmitted to the video decoding apparatus.

The block divider 110 divides each picture constituting a video into a plurality of CTUs having a predetermined size, and then recursively divides the CTUs using a tree structure. In the tree structure, leaf nodes are used as Coding Units (CUs), which are the basic units of coding.

The tree structure may be a Quadtree (QT) in which a node (or a parent node) is divided into four child nodes (or child nodes) having the same size, a Binary Tree (BT) in which a node is divided into two child nodes, a Trigeminal Tree (TT) in which a node is divided into three child nodes at a ratio of 1:2:1, or a structure formed by a combination of two or more of a QT structure, a BT structure, and a TT structure. For example, QTBT (quadtree plus binary tree) structure or QTBTTT (quadtree plus binary tree trigeminal tree) structure may be used. The BTTT may be collectively referred to herein as a multi-type tree (MTT).

Fig. 2 shows a QTBTTT partition tree structure. As shown in fig. 2, CTUs may be initially partitioned in QT structure. QT segmentation may be repeated until the size of the segment reaches the minimum block size (MinQTSize) of leaf nodes allowed in QT. A first flag (qt_split_flag) indicating whether each node of the QT structure is partitioned into four lower-layer nodes is encoded and signaled to the video decoding apparatus by the encoder 150. When the leaf node of QT is not greater than the maximum block size (MaxBTSize) of the root node allowed in BT, the segmentation may be further performed in one or more of BT structure or TT structure. In the BT structure and/or the TT structure, there may be a plurality of division directions. For example, there may be two directions of horizontal division and vertical division as node blocks. As shown in fig. 2, when the MTT segmentation starts, a second flag (MTT _split_flag) indicating whether a node is segmented, a flag indicating a segmentation direction (vertical or horizontal), and/or a flag indicating a segmentation type (binary or trigeminal) are encoded and signaled to the video decoding apparatus by the encoder 150.

As another example of the tree structure, when a block is divided using the QTBTTT structure, information on a CU division flag (split_cu_flag) indicating that the block has been divided and a QT division flag (split_qt_flag) indicating that the division type is QT division is encoded and signaled to the video decoding apparatus by the encoder 150. When the value of split_cu_flag indicates that a block has not been partitioned, the block of the node becomes a leaf node in the partition tree structure and serves as a Coding Unit (CU) that is a basic unit of coding. When the value of split_cu_flag indicates that the block has not been split, whether the split type is QT or MTT is discriminated by the value of split_qt_flag. When the partition type is QT, there is no additional information. When the division type is MTT, a flag (MTT _split_cu_vertical_flag) indicating the MTT division direction (vertical or horizontal) and/or a flag (MTT _split_cu_binary_flag) indicating the MTT division type (binary or trigeminal) are encoded and signaled to the video decoding apparatus by the encoder 150.

As another example of the tree structure, when QTBT is used, there may be two partition types, which are a block horizontal partition (i.e., symmetric horizontal partition) and a vertical partition (i.e., symmetric vertical partition) of a node, divided into two blocks of the same size. A split flag (split_flag) indicating whether each node of the BT structure is split into lower blocks and split type information indicating a split type are encoded by the encoder 150 and transmitted to the video decoding apparatus. There may be additional types that divide the blocks of nodes into two asymmetric blocks. The asymmetric division type may include a type of dividing a block into two rectangular blocks at a size ratio of 1:3, and a type of diagonally dividing a block of a node.

A CU may have various sizes according to QTBT or QTBTTT partitions of the CTU. Hereinafter, a block corresponding to a CU to be encoded or decoded (i.e., a leaf node of QTBTTT) is referred to as a "current block".

The predictor 120 predicts a current block to generate a predicted block. Predictor 120 includes an intra predictor 122 and an inter predictor 124.

In general, each current block in a picture may be predictively encoded. The prediction of the current block may be performed using an intra prediction technique (performed based on data from a picture containing the current block) or an inter prediction technique (performed based on data from a picture encoded before the picture containing the current block). Inter prediction includes both unidirectional prediction and bi-directional prediction.

The intra predictor 122 predicts pixels in the current block using pixels (reference pixels) located around the current block in the current picture including the current block. Depending on the prediction direction, there are a variety of intra prediction modes. For example, as shown in fig. 3, the plurality of intra prediction modes may include a non-directional mode including a planar mode and a DC mode, and 65 directional modes. For each prediction mode, neighboring pixels and formulas to be used are defined differently.

The intra predictor 122 may determine an intra prediction mode to be used when encoding the current block. In some examples, intra predictor 122 may encode the current block using several intra prediction modes and select an appropriate intra prediction mode from among the modes under test to use. For example, the intra predictor 122 may calculate a rate distortion value using rate distortion analysis of several measured intra prediction modes, and may select an intra prediction mode having the best rate distortion characteristics among the measured modes.

The intra predictor 122 selects one intra prediction mode from among a plurality of intra prediction modes, and predicts the current block using neighboring pixels (reference pixels) and formulas determined according to the selected intra prediction mode. Information about the selected intra prediction mode is encoded by the encoder 150 and transmitted to the video decoding apparatus.

The inter predictor 124 generates a prediction block of the current block through a motion compensation process. The inter predictor searches for a block most similar to the current block among reference pictures encoded and decoded earlier than the current picture, and generates a prediction block of the current block based on the searched block. Then, the inter predictor generates a motion vector corresponding to a displacement between a current block in the current picture and a predicted block in the reference picture. In general, motion estimation is performed on a luminance component, and a motion vector calculated based on the luminance component is used for both the luminance component and the chrominance component. Motion information including information on a reference picture for predicting a current block and information on a motion vector is encoded by the encoder 150 and transmitted to a video decoding apparatus.

The subtractor 130 generates a residual block by subtracting the prediction block generated by the intra predictor 122 or the inter predictor 124 from the current block.

The transformer 140 transforms a residual signal in a residual block having pixel values in the spatial domain into transform coefficients in the frequency domain. The transformer 140 may transform the residual signal in the residual block using the total size of the current block as a transform unit. Alternatively, the transformer may divide the residual block into sub-blocks of the transform region and the non-transform region, and transform the residual signal using only the sub-blocks of the transform region as the transform unit. Here, the transform region sub-block may be one of two rectangular blocks having a size ratio of 1:1 based on a horizontal axis (or a vertical axis). In this case, a flag (cu_sbt_flag), direction (vertical/horizontal) information (cu_sbt_horizontal_flag), and/or position information (cu_sbt_pos_flag) indicating that only the sub-block has been transformed are encoded and signaled to the video decoding apparatus by the encoder 150. In addition, the transform region sub-block may have a size ratio of 1:3 based on the horizontal axis (or vertical axis). In this case, a flag (cu_sbt_quad_flag) for discriminating division is additionally encoded by the encoder 150 to signal the video decoding apparatus.

The quantizer 145 quantizes the transform coefficient output from the transformer 140 and outputs the quantized transform coefficient to the encoder 150.

The encoder 150 generates a bitstream by encoding the quantized transform coefficients using an encoding method such as context-based adaptive binary arithmetic coding (CABAC). The encoder 150 encodes information related to block division, such as CTU size, CU division flag, QT division flag, MTT division direction, and MTT division type, so that the video decoding apparatus divides blocks in the same manner as the video encoding apparatus.

Further, the encoder 150 encodes information on a prediction type indicating whether the current block is encoded by intra prediction or inter prediction, and encodes intra prediction information (i.e., information on an intra prediction mode) or inter prediction information (information on a reference picture and a motion vector) according to the prediction type.

The inverse quantizer 160 inversely quantizes the quantized transform coefficient output from the quantizer 145 to generate a transform coefficient. The inverse transformer 165 transforms the transform coefficients output from the inverse quantizer 160 from the frequency domain to the spatial domain and reconstructs a residual block.

The adder 170 adds the reconstructed residual block to the prediction block generated by the predictor 120 to reconstruct the current block. The pixels in the reconstructed current block are used as reference pixels for intra prediction of the next block.

The filter unit 180 filters the reconstructed pixels to reduce block artifacts, ringing artifacts, and blurring artifacts due to block-based prediction and transform/quantization. The filter unit 180 may include a deblocking filter 182 and a Sample Adaptive Offset (SAO) filter 184.

The deblocking filter 180 filters boundaries between reconstructed blocks to remove block artifacts caused by block-wise encoding/decoding, and the SAO filter 184 additionally filters the deblock filtered video. The SAO filter 184 is a filter for compensating for differences between reconstructed pixels and original pixels caused by lossy encoding.

The reconstructed block filtered by the deblocking filter 182 and the SAO filter 184 is stored in the memory 190. Once all blocks in a picture are reconstructed, the reconstructed picture is used as a reference picture for inter prediction of the next picture to be encoded.

Fig. 4 is an exemplary functional block diagram of a video decoding device capable of implementing the techniques of this disclosure. Hereinafter, a video decoding apparatus and elements of the apparatus will be described with reference to fig. 4.

The video decoding apparatus may include a decoder 410, an inverse quantizer 420, an inverse transformer 430, a predictor 440, an adder 450, a filter unit 460, and a memory 470.

Similar to the video encoding device of fig. 1, each element of the video decoding device may be implemented as hardware or software, or may be implemented as a combination of hardware and software. In addition, the function of each element may be implemented as software, and a microprocessor may be implemented to execute the function of the software corresponding to each element.

The decoder 410 determines a current block to be decoded by decoding a bitstream received from a video encoding apparatus and extracting information related to block division, and extracts prediction information required to reconstruct the current block and information about a residual signal.

The decoder 410 extracts information about the size of CTUs from a Sequence Parameter Set (SPS) or a Picture Parameter Set (PPS), determines the size of CTUs, and partitions the picture into CTUs of the determined size. Then, the decoder determines the CTU as the uppermost layer (i.e., the root node of the tree structure), and extracts partition information about the CTU to partition the CTU using the tree structure.

For example, when dividing CTUs using the QTBTTT structure, first a first flag (qt_split_flag) related to QT division is extracted, and each node is divided into four nodes of the lower layer. Then, for a node corresponding to a leaf node of QT, a second flag (mtt_split_flag) related to MTT split and information on a split direction (vertical/horizontal) and/or a split type (binary/trigeminal) are extracted, and the leaf node is split in an MTT structure. In this way, each node below the leaf node of QT is recursively split in the BT or TT structure.

As another example, when dividing CTUs using the QTBTTT structure, a CU division flag (split_cu_flag) indicating whether a CU is divided is first extracted. If the corresponding block is partitioned, a QT partition flag (split_qt_flag) is extracted. When the division type is not QT but MTT, a flag (MTT _split_cu_vertical_flag) indicating the MTT division direction (vertical or horizontal) and/or a flag (MTT _split_cu_binary_flag) indicating the MTT division type (binary or trigeminal) are additionally extracted. During the segmentation process, each node may undergo zero or more recursive QT segmentation and then undergo zero or more recursive MTT segmentation. For example, CTUs may be partitioned immediately by MTT, or may be partitioned only by QT multiple times.

As another example, when the CTU is partitioned using the QTBT structure and a first flag (qt_split_flag) related to QT partitioning, and each node is partitioned into four nodes of the lower layer. For a node corresponding to a leaf node of QT, split_flag and split direction information indicating whether the node is further BT split are extracted.

Once the current block to be decoded is determined through tree structure partitioning, the decoder 410 extracts information about a prediction type indicating whether the current block is subjected to intra prediction or inter prediction. When the prediction type information indicates intra prediction, the decoder 410 extracts syntax elements of intra prediction information (intra prediction mode) of the current block. When the prediction type information indicates inter prediction, the decoder 410 extracts syntax elements of inter prediction information, i.e., information indicating a motion vector and a reference picture to which the motion vector refers.

The decoder 410 extracts information on quantized transform coefficients of the current block as information on a residual signal.

The inverse quantizer 420 inversely quantizes the quantized transform coefficients and inversely transforms the inversely quantized transform coefficients from the frequency domain to the spatial domain, reconstructing a residual signal to generate a residual block of the current block.

In addition, when the inverse transformer 430 inversely transforms only a partial region (sub-block) of the transform block, a flag (cu_sbt_flag) indicating that only the sub-block of the transform block has been transformed, and direction information (vertical/horizontal) about the sub-block (cu_sbt_horizontal_flag) and/or sub-block position information (cu_sbt_pos_flag) are extracted. Then, the residual signal is reconstructed by inverse transforming transform coefficients of the sub-blocks from the frequency domain to the spatial domain. For regions that are not inverse transformed, the residual signal is padded with "0". Thus, a final residual block of the current block is created.

The predictor 440 may include an intra predictor 442 and an inter predictor 444. The intra predictor 442 is activated when the prediction type of the current block is intra prediction, and the inter predictor 444 is activated when the prediction type of the current block is inter prediction.

The intra predictor 442 determines an intra prediction mode of the current block among a plurality of intra prediction modes based on syntax elements of the intra prediction modes extracted from the decoder 410, and predicts the current block based on reference pixels around the current block according to the intra prediction mode.

The inter predictor 444 determines a motion vector of the current block and a reference picture to which the motion vector refers based on syntax elements of the intra prediction mode extracted from the decoder 410, and predicts the current block based on the motion vector and the reference picture.

The adder 450 reconstructs the current block by adding the residual block output from the inverse transformer and the prediction block output from the inter predictor or the intra predictor. The pixels in the reconstructed current block are used as reference pixels for intra prediction of the block to be decoded later.

The filter unit 460 may include a deblocking filter 462 and an SAO filter 464. Deblocking filter 462 performs deblocking filtering on boundaries between reconstructed blocks to remove block artifacts caused by block-by-block decoding. The SAO filter 464 performs additional filtering on the reconstructed block after deblocking filtering to compensate for differences between the reconstructed pixel and the original pixel caused by lossy encoding. The reconstructed block filtered by the deblocking filter 462 and the SAO filter 464 is stored in a memory 470. When all blocks in one picture are reconstructed, the reconstructed picture is used as a reference picture for inter prediction of the blocks in the picture to be encoded later.

The inter-picture prediction encoding/decoding method (inter prediction method) of the HEVC standard may be classified into a skip mode, a merge mode, and an adaptive (or Advanced) Motion Vector Predictor (AMVP) mode.

In the skip mode, an index value indicating one of motion information candidates of the neighboring block is signaled. In the merge mode, an index value indicating one of motion information candidates of neighboring blocks and information obtained by encoding a predicted residual are signaled. In the AMVP mode, motion information of the current block and information obtained by encoding the predicted residual are signaled. The motion information signaled in the AMVP mode includes motion information of neighboring blocks (motion vector predictor (mvp)) and a difference value (motion vector difference (mvd)) between the motion information (mvp) and the motion information (mv) of the current block.

Describing the motion information signaled in the AMVP mode in more detail, the motion information may include reference picture information (reference picture index), prediction motion vector (mvp) information, and differential motion vector (mvd) information. In the case of bi-prediction, the above information is signaled separately for each direction. Table 1 below shows syntax elements regarding reference picture information, mvp information, and mvd information signaled for each direction.

TABLE 1

In the above table 1, inter_pred_idc is a syntax element indicating a prediction direction (prediction direction information), and may indicate any one of uni-L0 (uni-L0), uni-L1 (uni-L1), and bi-prediction (bi-prediction). According to the present invention, inter_pred_idc indicates bi-prediction since motion information in a specific direction is derived from motion information in another direction. ref_idx_l0 is a syntax element (reference picture information) indicating a reference picture in the L0 direction, and a reference picture for predicting the current block among reference pictures included in the reference picture list 0 is specified by the syntax element. ref_idx_l1 is a syntax element (reference picture information) indicating a reference picture in the L1 direction, and a reference picture for predicting the current block among the reference pictures included in the reference picture list 1 is specified by the syntax element. mvp_l0_flag is a syntax element (mvp information) indicating mvp for the L0 direction, and mvp to be used for prediction of the current block in the L0 direction is specified by the syntax element. mvp_l1_flag is a syntax element (mvp information) indicating mvp for the L1 direction, and mvp to be used for prediction of the current block in the L1 direction is specified by the syntax element.

Syntax elements constituting mvd information are shown in table 2 below.

TABLE 2

In the above table 2, abs_mvd_greate0_flag is a syntax element indicating whether the absolute value (size) of mvd exceeds 0, and abs_mvd_greate1_flag is a syntax element indicating whether the absolute value of mvd exceeds 1. In addition, abs_mvd_minus2 is a syntax element indicating a value obtained by subtracting 2 from the absolute value of mvd, and mvd_sign_flag corresponds to a syntax element indicating the sign of mvd.

As shown in table 2, mvd is represented by syntax elements (abs_mvd_greate0_flag, abs_mvd_greate1_flag, abs_mvd_minus2) indicating absolute values of each of the x component and the y component and syntax elements (mvd_sign_flag) indicating a symbol.

Table 3 below summarizes information for bi-prediction of the conventional AMVP mode that is signaled from the video encoding device to the video decoding device based on what is described in tables 1 and 2.

TABLE 3

As shown in table 3 above, in the conventional AMVP mode, in order to perform bi-prediction on the current block, reference picture information, mvp information, mvd information, etc. are signaled separately for each direction, which may be inefficient in terms of bit efficiency.

The present invention relates to estimating motion information in other directions from among motion information in a specific direction by estimating a reference picture for predicting a current block or using correlation between motion information in each direction in order to improve bit efficiency of bi-prediction.

The "specific direction" indicates a direction in which motion information is inferred or inferred based on information signaled from the video encoding device, and the "other direction" indicates a direction in which motion information is inferred or inferred based on motion information in the specific direction. In inferring motion information in other directions, at least some of the motion information in a particular direction and/or information signaled from a video encoding device may be used. In this specification, it is described that the specific direction corresponds to the direction L0 and the other directions correspond to the direction L1, but the specific direction may correspond to any one of the directions L0 and L1 and the other directions may correspond to the remaining directions which do not correspond to the specific direction among the two directions. Hereinafter, a specific direction is referred to as a first direction, and other directions are referred to as second directions. In addition, a motion vector in the first direction is referred to as a first motion vector, and a motion vector in the second direction is referred to as a second motion vector.

The correlation between pieces of motion information may include a symmetrical relationship, a linear relationship, a proportional relationship, a Picture Order Count (POC) difference relationship between reference pictures based on a current picture, etc. established between pieces of motion information. Such correlation may be established for all pieces of motion information, and may be established separately for each element (at least one of reference picture information, mvp information, and mvd information) included in the motion information. For example, a symmetrical relationship may be established between pieces of mvd information in two directions, and a linear relationship may be established between mvp information in two directions (indicated by mvp_flag) and mvd information in two directions. Here, establishing a linear relationship between mvp information and mvd information in two directions can be understood as establishing a linear relationship between motion vectors (motions) in two directions.

In connection with the names of the motion information referred to in the present specification, the motion information in a specific direction (first direction) is referred to as first motion information, and the motion information in the other direction (second direction) is referred to as second motion information or third motion information depending on the number or type of the contained elements. The third motion information is motion information in the second direction, and may be motion information including mvd information in the second direction and mvp information in the second direction. Both the second motion information and the third motion information correspond to motion information in the second direction, but may be classified according to whether both mvd information and mvp information included in the second direction are included or not included.

An embodiment of the present invention for inferring motion in a second direction is illustrated in fig. 5.

The video encoding device may signal the mode information (mode_info) by including the mode information (mode_info) in the bitstream. The bi-prediction mode proposed by the present invention may include a first mode in which the second motion information (motion_info_l1) is derived from the first motion information (motion_info_l0), a second mode in which the third motion information (motion_info_l2) is derived using the signaled information, and so on.

mode_info may correspond to information indicating any one of a plurality of prediction modes included in the plurality of bi-prediction modes. The mode information may be implemented in various forms such as a flag or an index depending on the number of available bi-predictive modes. Hereinafter, description will be made on the premise that mode_info indicates a prediction mode used for bi-prediction of a current block among the first mode and the second mode. On this premise, the mode_info may correspond to information indicating whether the first mode is applied to the current block. In addition, the case where mode_info does not indicate that the first mode is applied may be the same as indicating that the first mode is not applied or indicating that the second mode is applied.

When the mode_info indicates that the first mode is applied, the video encoding apparatus may signal the motion_info_l0 and the motion_info_l1 by including the motion_info_l0 and the motion_info_l1 in the bitstream. The motion_info_l0 may include differential motion vector information (mvd_l0) in the first direction and prediction motion vector information (mvp_l0_flag) in the first direction. motion_info_l1 may include some of mvd_l1 and mvp_l1_flag (in other words, motion_info_l1 may not include at least some of mvd_l1 and mvp_l1_flag). On the other hand, when the mode_info does not indicate the application of the first mode (when the mode_info indicates the application of the second mode), the video encoding apparatus may signal the motion_info_l0 and the motion_info_l2 by including the motion_info_l0 and the motion_info_l2 in the bitstream. motion_info_l2 can be both mvd_l1 and mvp_l1_flag.

The video decoding apparatus (decoding unit) may decode mode_info from the bitstream (S530). When the mode_info indicates that the first mode is applied (S540), the video decoding apparatus may decode the motion_info_l0 and the motion_info_l1 from the bitstream since the motion_info_l1 is included in the bitstream (S550).

The video decoding apparatus (prediction unit) may derive a first motion vector mv_l0 based on the motion_info_l0 and derive a second motion vector mv_l1 based on at least a portion of the motion_info_l0 and the motion_info_l1 (S560). Since motion_info_l0 includes mvd_l0 and mvp_l0_flag, mv_l0 can be derived by summing mvd_l0 and mvp_l0 in the following equation 1.

[ formula 1]

In the above equation 1, the number of the blocks,mvx ₀ representing the x-component of mv_l0, mvy ₀ Representing the y component of mv_l0. mvpx ₀ Representing the x component of mvp_l0, mvpy ₀ Representing the y component of mvp_l0. mvdx ₀ The x component representing mvd_l0, mvdy ₀ Representing the y component of mvd_l0.

Since motion_info_l1 does not include at least a portion of mvd_l1 and mvp_l1_flag, mv_l1 can be derived based on the correlation of motion. The detailed method of deriving mv_l1 will be described below.

The video decoding apparatus may predict a current block (generate a predicted block of the current block) using a first reference block indicated by mv_l0 within a first reference picture (ref_l0) that is a reference picture in a first direction, and a second reference block (ref_l1) indicated by mv_l1 within a second reference picture that is a reference picture in a second direction (S570). Ref_l0 and ref_l1 may be specified according to reference picture information (ref_idx_l0 and ref_idx_l1) signaled from the video encoding apparatus, or ref_l0 and ref_l1 may be derived based on POC differences between reference pictures and current pictures included in the reference picture list. Specific embodiments thereof will be described below.

Meanwhile, when the mode_info does not indicate the application of the first mode (when the mode_info indicates the application of the second mode) in operation S540, since the motion_info_l2 is included in the bitstream, the video decoding apparatus may decode the motion_info_l0 and the motion_info_l2 from the bitstream (S590). In this case, the video decoding apparatus may derive mv_l0 based on the motion_info_l0 and mv_l1 based on the motion_info_l2 (S560). In addition, the video decoding apparatus may predict the current block by using the first reference block indicated by mv_l0 and the second reference block indicated by mv_l1 (S570).

According to an embodiment, the video encoding apparatus may signal the enable information (enabled_flag) by further including the enable information (enabled_flag) in the bitstream. The enabled_flag may correspond to information indicating whether the first mode is enabled. When the enabled_flag indicates that the first mode is enabled, the video encoding device may encode the enabled_flag into a high level syntax such as a sequence level, a picture level, a tile group level, and a slice level, and signal the mode_info of each prediction unit (block) by including the mode_info of each prediction unit (block) in the bitstream. In this way, whether to apply the embodiments proposed by the present invention can be set for each block.

When the enabled_flag is encoded as a high level syntax and mode_info is encoded in units of blocks, the video decoding apparatus may decode the enabled_flag from the high level syntax (S510), and when the enabled_flag indicates that the first mode is enabled (S520), decode the motion_info from the bitstream (S530). Meanwhile, when the enabled_flag indicates that the first mode is not enabled, the mode_info may not be decoded. In this case, the video decoding apparatus may not apply the first mode to the current block by setting or estimating mode_info to "0" or "off" to indicate that the first mode is not applied (S580).

Hereinafter, various embodiments proposed by the present invention will be described according to whether some of reference picture information (ref_idx_l0 and ref_idx_l1), prediction motion vector information (mvp_l0_flag and mvp_l1_flag), differential motion vector information (mvd_l0 and mvd_l1) are included in motion information.

In the embodiments described below, the motion_info_l0 may include mvd_l0 and mvp_l0_flag, and the motion_info_l1 may not include at least some of mvd_l1 and mvp_l1_flag. In other words, motion_info_l0 may not include ref_idx_l0, and motion_info_l1 may not include one or more of ref_idx_l1, mvd_l1, and mvp_l1_flag.

First embodiment

The first embodiment corresponds to a method of estimating motion information by estimating mvd_l1 when ref_idx_l0, mvd_l0, and mvp_l0 are all contained in motion_info_l0 and ref_idx_l1 and mvp_l1 are contained in motion_info_l1.

In the first embodiment, mvd_l1, which is not signaled, may be derived from mvd_l0. Mvd_l1 may be derived based on the symmetry established between mvd_l1 and mvd_l0. That is, mvd_l1 may be set or derived to a value symmetrical to mvd_l0 (mvd_l1= -mvd_l0), and mvl 1 may be derived using the derived mvd_l1 and the signaled mvp_l1 (equation 2).

[ formula 2]

(mvx ₁ ，mvy ₁ )＝(mvpx ₁ -mvdx ₀ ，mvpy ₁ -mvdy ₀ )

The video encoding apparatus may signal the motion_info_l0 and the motion_info_l1 (except mvd_l1) by including the motion_info_l0 and the motion_info_l1 (except mvd_l1) in the bitstream through the same procedure as described above. As shown in fig. 6, the video decoding apparatus may derive mv_l0 by using mvd_l0 and mvp_l0 included in motion_info_l0. In addition, the video decoding apparatus may derive mv_l1 by using mvd_l1 (-mvd_l0) derived from mvd_l0 and mvp_l1 included in motion_info_l1.

The video decoding apparatus may predict the current block 620 located within the current picture 610 using the first reference block 630 indicated by mv_l0 within ref_l0 indicated by ref_idx_l0 and the second reference block 640 indicated by mv_l1 within ref_l1 indicated by ref_idx_l1.

Second embodiment

The second embodiment corresponds to a method of estimating motion information by estimating ref_l0 and ref_l1 when ref_idx_l0 is not included in motion_info_l0 and ref_idx_l1 is not included in motion_info_l1.

In the second embodiment, ref_l0 and ref_l1 may be determined or derived as reference pictures having the 0 th index (located at the first position) among reference pictures included in the reference picture list, or ref_l0 and ref_l1 may be determined or derived based on POC differences between the reference pictures included in the reference picture list and the current picture. Hereinafter, a method of deriving ref_l0 and ref_l1 based on the POC difference from the current picture will be described.

The video decoding apparatus may select any one of the reference pictures included in the reference picture list of the first direction based on a difference in POC value between the reference picture included in the reference picture list 0 (reference picture list of the first direction) and the current picture, and set the selected reference picture to ref_l0. For example, the video decoding apparatus may set a reference picture (closest reference picture) having the smallest POC value difference from the current picture to ref_l0.

In addition, the video decoding apparatus may select any one of the selected reference pictures included in the reference picture list of the second direction based on a difference in POC value between the reference picture included in the reference picture list 1 (reference picture list of the second direction) and the current picture, and set the selected reference picture to ref_l1. For example, the video decoding apparatus may set a reference picture (closest reference picture) having the smallest POC value difference from the current picture to ref_l1.

The video decoding apparatus may sequentially or in parallel compare the POC value of the reference picture included in the reference picture list with the POC value of the current picture to select any one of the reference pictures. When the closest reference picture is selected by sequentially comparing reference pictures included in the reference picture list, the video decoding apparatus may virtually set an index value of the reference picture to an index value (e.g., -1) not assigned to the reference picture list and then sequentially compare the reference pictures.

The reference picture selected from the reference picture list in the first direction and the reference picture selected from the reference picture list in the second direction may have a forward POC value or a backward POC value with respect to the POC value of the current picture. That is, the reference picture selected from the reference picture list in the first direction and the reference picture selected from the reference picture list in the second direction may be composed of a pair of a forward reference picture and a backward reference picture.

When deriving ref_l0 and ref_l1, the video decoding apparatus may predict the current block using the first reference block 630 indicated by mv_l0 in ref_l0 and the second reference block 640 indicated by mv_l1 in ref_l1.

According to an embodiment, the process of determining ref_l0 and ref_l1 may be performed at a high level higher than that of the current block. That is, among the elements contained in the motion_info_l0 and motion_info_l1, the remaining elements other than ref_l0 and ref_l1 may be derived or determined in units of blocks, and ref_l0 and ref_l1 may be determined in units of high levels. Here, the high level may be a higher level than the block level, such as a picture level, a tile group level, a slice level, a tile level, and a Coding Tree Unit (CTU) level.

The second embodiment may be implemented in combination with the first embodiment described above or the embodiments to be described below. That is, although it has been described that ref_idx_l0 and ref_idx_l1 are signaled in the first embodiment, when the second embodiment is applied, ref_idx_l0 and ref_idx_l1 are not signaled in the first embodiment, and thus the video decoding apparatus itself can derive ref_l0 and ref_l1.

Third embodiment

The third embodiment corresponds to a method of deducing second motion information from first motion information based on a linear relationship established between motion in a first direction and motion in a second direction.

The video encoding apparatus may signal the video decoding apparatus motion_info_l0 by including motion_info_l0 in the bitstream. motion_info_l0 may include mvp_l0_flag, mvd_l0, and/or ref_idx_l0. The information included in the motion_info_l0 may be different for each embodiment to be described later.

The video decoding apparatus may decode motion_info_l0 from the bitstream (S710). The video decoding apparatus may infer or derive mvl 0 by using mvp_l0_flag and mvd_l0 (S720). Mv_l0 can be derived by adding mvp_l0 and mvd_l0 as in equation 1 described above. Here, mvp_l0 may correspond to a motion vector of a neighboring block indicated by the decoded mvp_l0_flag.

When deriving mv_l0, the video decoding apparatus may derive mv_l1 by using ref_l0, ref_l1, and mv_l0 (S730). The derived mv_l1 may correspond to a motion vector having a linear relationship with mv_l0. ref_l0 may be a reference picture indicated by ref_idx_l0 signaled from the video encoding device or a separately defined reference picture. In addition, ref_l1 may be a reference picture indicated by ref_idx_l1 signaled from the video encoding apparatus or a separately defined reference picture.

Mv_l1 may be derived by applying a proportional relationship between "the difference in POC value between the current picture 610 and ref_l0" and "the difference in POC value between the current picture 610 and ref_l1" to mv_l0 as shown in the following equation 3.

[ formula 3]

In equation 3, mvx ₁ Representing the x-component of mv_l1, mvy ₁ Representing the y component of mv_l1. POC (Point of care) ₀ POC value representing ref_l0, POC ₁ POC value of ref_l1, POC _curr Representing the POC value of the current picture 610 containing the current block 620. In addition, POC _curr -POC ₀ Representing the difference in POC value between ref_l0 and the current picture 610, and POC _curr -POC ₁ Representing the difference in POC value between ref_l1 and the current picture 610.

When deriving mv_l1, the video decoding apparatus may predict the current block 620 based on the first reference block 630 indicated by mv_l0 and the second reference block 640 indicated by mv_l1 (S740).

According to an embodiment, various embodiments presented by the present invention may use syntax elements (e.g., linear_mv_coding_enabled_flag) indicating enablement/disablement and/or syntax elements (e.g., linear_mv_coding_flag or linear_mv_coding_idc) indicating a linear relationship of motion to determine whether to apply to the current block 620. Here, the syntax element indicating enable/disable may correspond to the enable information described above, and the syntax element indicating a linear relationship may correspond to the mode information described above.

The linear_mv_coding_enabled_flag is a high level syntax and may be defined at one or more positions among a sequence level, a picture level, a tile group level, and a slice level. The linear_mv_coding_flag may be signaled for each block corresponding to a decoding object.

When linear_mv_coding_enabled_flag=1, whether to apply the proposed embodiment of the present invention can be set for each block by signaling linear_mv_coding_flag for each prediction unit. When linear_mv_coding_flag=1, some or all of the motion_info_l1 is not signaled, and the signaled motion_info_l0 may be used to derive the motion_info_l1 (first mode). When linear_mv_coding_flag=0, motion_info_l1 (second mode) may be signaled as in the conventional method.

Hereinafter, various embodiments of the present invention will be described on the premise that linear_mv_coding_enabled_flag is defined as activation of a high-level function and linear_mv_coding_flag is set for each block.

Embodiment 3-1

Embodiment 3-1 corresponds to the following method: motion_info_l1 does not signal mvp_l1_flag and mvd_l1 during bi-prediction and mvp_l1_flag and mvd_l1 are derived from motion_info_l0 using the linear relationship of motion.

When the second direction is the L0 direction, motion information in the L0 direction can be derived from mvd and mvp in the L1 direction and the bi-directional reference picture by a linear relationship of motion. That is, mvp information and mvd information in the direction L0 are not signaled. When the second direction is the L1 direction, motion information in the L1 direction can be derived from mvd and mvp in the L0 direction and the bi-directional reference picture by a linear relationship of motion. That is, mvp information and mvd information in the direction L1 are not signaled.

When a motion vector in the direction L1 is derived using a linear relationship (the latter case), information signaled from the video encoding apparatus to the video decoding apparatus is expressed in syntax as shown in table 4 below.

TABLE 4

As shown in table 4, the motion_info_l0 may be signaled from the video encoding device to the video decoding device by being included in the bitstream. The signaled motion_info_l0 may include ref_idx_l0, mvd_l0, and mvp_l0_flag. ref_idx_l1 may also be signaled by being included in the bitstream. In embodiment 3-1, the reference pictures (ref_l0 and ref_l1) used for deriving mv_l1 correspond to the reference pictures indicated by ref_idx_l0 and ref_idx_l0 signaled from the video encoding apparatus.

When motion_info_l0 is decoded (S910), the video decoding apparatus may infer or derive mv_l0 by using the decoded mvp_l0_flag and mvd_l0 (S920). Equation 1 may be used in this process. Also, ref_idx_l1 may be decoded from the bitstream (S930).

The video decoding apparatus may use the linear_mv_coding_enabled_flag to determine whether the motion vector derivation function is activated/deactivated (S940). When the linear_mv_coding_enabled_flag indicates that the motion vector derivation function is activated, the linear_mv_coding_flag may be decoded from the bitstream to determine whether to apply the derivation function proposed by the present invention (S950).

When the decoded linear_mv_coding_flag indicates that a linear relationship of motion is established (S960), the video decoding apparatus may derive mv_l1 on the premise that a linear relationship between mv_l0 and mv_l1 is established (S970). The process of deriving mv_l1 may be implemented by applying each of the reference pictures ref_l0 and ref_l1 and mv_l0 in each direction to equation 3.

Meanwhile, when it is indicated in the linear_mv_coding_enabled_flag that the motion vector derivation function is disabled in operation S940 or the linear_mv_coding_flag does not indicate that the linear relationship of motion is established in operation S960, mv_l1 may be derived through the second mode instead of the first mode. Specifically, the video decoding apparatus may decode mvp_l1_flag and mvd_l1 from the bitstream (S980 and S990), and derive mvl 1 by using mvp_l1_flag and mvd_l1 (S992).

The syntax elements of the above embodiment 3-1 are shown in table 5 below.

TABLE 5

Fig. 9 illustrates that the operation of determining the linear_mv_coding_enabled_flag (S940) and the operation of decoding and determining the linear_mv_coding_flag (S950 and S960) may be performed after the operation of decoding ref_idx_l1 (S930), but the operations S940 to S960 may be performed before the operation of decoding the motion_info_l0 (S910).

An example of deducing mv_l1 based on embodiment 3-1 is illustrated in fig. 10. Two types of current picture 610 and reference pictures ref_l0 and ref_l1 according to the size of POC value in bi-prediction are illustrated in fig. 10 (a) and 10 (B), respectively. The embodiments to be described below can be applied to two types illustrated in fig. 10.

In bi-prediction, as shown in fig. 10 (a), the current picture 610 may be based on a POC value (i.e., (POC) ₀ <POC _cur )&(POC _cur <POC ₁ ) Between the reference pictures (ref_l0 and ref_l1). In addition, as shown in fig. 10 (B), the value is based on the POC value (i.e., (POC) ₀ <POC _cur )&(POC ₁ <POC _cur ) Bi-prediction may include the case where the POC value of the current picture 610 is greater than the POC values of the reference pictures ref_l0 and ref_l1. Here, POC ₀ Indicating the POC value of ref_l0, POC ₁ Indicating the POC value of ref_l1, and POC _cur Indicating the POC value of the current picture 610.

In both bi-directional predictions, mv_l1 can be derived with the premise that a linear relationship is established between mv_l0 (solid arrow) and mv_l1 (dashed arrow). In this process, mv_l0 and reference pictures ref_l0 and ref_l1 in each direction may be used. When mv_l1 is derived, the current block 620 may be predicted based on the reference block 630 indicated by mv_l0 and the reference block 640 indicated by the derived mv_l1.

Embodiment 3-2

Embodiment 3-2 corresponds to a method of estimating mv_l1 based on a linear relation of motion and then correcting or adjusting mv_l1. Embodiment 3-2 is the same as embodiment 3-1 in that a motion vector is derived based on a linear relationship of motion, but is different from embodiment 3-1 in that mv_l1 is additionally corrected or adjusted using offset information.

The offset information for motion correction corresponds to information indicating a difference between mv_l1 and "adjusted mv_l1". In other words, the offset information corresponds to information indicating a difference between a motion vector (mv_l1) derived using a linear relation of motion and a measured (actual) motion vector (adjusted mv_l1) of the current block.

The offset information may include an offset vector or an offset index. The offset vector corresponds to information indicating a position indicated by "adjusted mv_l1" with respect to a position indicated by mv_l1. The offset index corresponds to information obtained by indexing candidates that may correspond to an offset vector. Hereinafter, each of the two types of offset information will be described by a separate embodiment.

Offset vector

In addition to motion_info_l0, the offset vector may be signaled by including the offset vector in the bitstream. As described above, since the offset vector corresponds to the difference between the adjusted mv_l1 and the (unadjusted) mv_l1, the offset vector may be expressed as a motion vector difference (mvd). In addition, since the offset vector corresponds to a difference between a motion vector derived using a motion linear relationship and a measured motion vector of the current block, the offset vector can be distinguished from mvd (a difference between mvp derived from a motion vector of a neighboring block and mv of the current block) used in the conventional method. In the present embodiment, information signaled from the video encoding apparatus to the video decoding apparatus for bi-prediction is expressed in syntax as shown in table 6 below.

TABLE 6

In table 6 above, mvd_l1 may be mvd or an offset vector used in the conventional method. For the current block 620, mvd used in the conventional method may be signaled as mvd_l1 when the linear relationship of motion is not established, and an offset vector may be signaled as mvd_l1 when the linear relationship of motion is established.

As shown in table 6, motion_info_l0 may be signaled from the video encoding device to the video decoding device. The signaled motion_info_l0 may include ref_idx_l0, mvd_l0, and mvp_l0_flag as shown in table 6. Ref_idx_l1 may also be signaled by including ref_idx_l1 in the bitstream.

The video decoding apparatus sets reference pictures indicated by the signaled reference picture information (ref_idx_l0 and ref_idx_l1) as reference pictures (ref_l0 and ref_l1) for inferring mv_l1 (for predicting the current block).

When motion_info_l0 is decoded (S1110), the video decoding apparatus may infer or derive mv_l0 by using mvp_l0_flag and mvd_l0 (S1120). Equation 1 may be used in this process. Also, the video decoding apparatus may decode ref_idx_l1 and mvd_l1 from the bitstream (S1130 and S1140). Here, mvd_l1 may correspond to any one of mvd and an offset vector of the conventional method depending on whether a linear relationship is established.

The video decoding apparatus may use the linear_mv_coding_enabled_flag to determine whether to activate/deactivate the motion vector derivation function (S1150). When the linear_mv_coding_enabled_flag indicates that the motion vector derivation function is activated, the linear_mv_coding_flag may be decoded from the bitstream (S1160).

When the linear_mv_coding_flag indicates that the linear relationship of motion is established (S1170), the video decoding apparatus may derive mv_l1 on the premise that the linear relationship of motion is established (S1180). This process can be implemented by applying the reference pictures (ref_l0 and ref_l1) and mv_l0 to equation 3.

The video decoding apparatus may adjust or correct mv_l1 by applying an offset vector (mvd_l1) to the derived mv_l1 (S1182). Specifically, mv_l1 may be adjusted such that the adjusted mv_l1 indicates a position shifted by the offset vector mvd_l1 with the position indicated by mv_l1 as the origin. The adjustment of mv_l1 may be understood as applying the offset vector (mvd_l1) to the assumed predicted motion vector (mvp) under the assumption that the derived mv_l1 is the predicted motion vector (mvp) in the second direction.

Meanwhile, when the linear_mv_coding_enabled_flag indicates that the motion vector derivation function is disabled in operation S1150, or the linear_mv_coding_flag does not indicate that the linear relationship of motion is established in operation S1170, the video decoding apparatus may derive mv_l1 by a conventional method instead of the derivation method proposed by the present invention. Specifically, the video decoding apparatus may decode mvp_l1_flag (S1190), and derive mv_l1 by summing mvp_l1 indicated by mvp_l1_flag with mvd_l1 decoded in S1140 (S1192). Here, mvd_l1 corresponds to mvd used in the conventional method.

Syntax elements for the above-described embodiments are shown in table 7 below.

TABLE 7

Fig. 11 illustrates an operation of determining the linear_mv_coding_enabled_flag (S1150) and an operation of decoding and determining the linear_mv_coding_flag (S1160 and S1170) performed after an operation of decoding mvd_l1 (S1140), but operations S1150 to S1170 may be performed before an operation of decoding motion_info_l0 (S1110).

An example of deriving mv_l1 based on the present embodiment is illustrated in fig. 12. As shown in fig. 12, mv_l1 may be derived on the premise that a linear relationship is established between mv_l0 (solid arrow) and mv_l1 (broken arrow).

Further, assuming that the derived mv_l1 is a predicted motion vector, mv_l1 may be adjusted by moving the position indicated by mv_l1 according to the direction and the magnitude indicated by the offset vector mvd_l1. May be based on the reference block 630 indicated by mv_l0 and the reference block indicated by the adjusted second motion vector (mv _A L 1) to predict the current block 620 with reference to block 640.

Offset index

In addition to motion_info_l0, the offset index may be signaled by including the offset index in the bitstream. As described above, the offset index corresponds to an index indicating any one of one or more preset offset vector candidates (candidates that may correspond to offset vectors).

In the present embodiment, information for bi-prediction signaled from the video encoding apparatus to the video decoding apparatus is expressed in syntax as shown in table 8 below.

TABLE 8

In table 8 above, mv_offset indicates a syntax element corresponding to an offset index. The motion_info_l0 may be signaled from the video encoding device to the video decoding device by including the motion_info_l0 in the bitstream. The signaled motion_info_l0 may include ref_idx_l0, mvd_l0, and mvp_l0_flag as shown in table 8. Ref_idx_l1 may also be signaled by including ref_idx_l1 in the bitstream. The video decoding apparatus sets the reference pictures indicated by the signaled reference picture information ref_idx_l0 and ref_idx_l1 to the reference pictures ref_l0 and ref_l1 used for inferring mv_l1.

When motion_info_l0 is decoded (S1310), the video decoding apparatus may infer or derive mv_l0 by using mvp_l0_flag and mvd_l0 included in motion_info_l0 (S1320). Equation 1 may be used in this process. Also, the video decoding apparatus may decode ref_idx_l1 (S1330).

The video decoding apparatus may determine whether to activate or deactivate the motion vector derivation function by analyzing the linear_mv_coding_enabled_flag (S1340). When the linear_mv_coding_enabled_flag indicates that the motion vector derivation function is activated, the linear_mv_coding_flag may be decoded from the bitstream (S1350).

When the linear_mv_coding_flag indicates that a linear relationship of motion is established (S1360), the video decoding apparatus decodes the offset index mv_offset (S1370), and can derive mv_l1 on the premise that a linear relationship between mv_l0 and mv_l1 is established (S1380). This process can be implemented by applying mv_l0 and bi-directional reference pictures (ref_l0 and ref_l1) to equation 3.

The video decoding apparatus may adjust or correct mv_l1 by applying an offset vector candidate indicated by an offset index (mv_offset) to the derived mv_l1 (S1382). Specifically, mv_l1 may be adjusted by adding an offset vector candidate indicated by an offset index (mv_offset) to mv_l1. In other words, the adjustment mv_l1 can be understood as applying the offset vector candidate indicated by the offset index (mv_offset) to the assumed predicted motion vector on the assumption that the derived mv_l1 is the predicted motion vector (mvp) in the second direction.

Meanwhile, when the linear_mv_coding_enabled_flag indicates that the motion vector derivation function is disabled in operation S1340 or the linear_mv_coding_flag does not indicate that the linear relationship of motion is established in operation S1360, mv_l1 may be derived by a conventional method instead of the derivation method proposed by the present invention. Specifically, the video decoding apparatus may decode mvd_l1 and mvp_l1_flag from the bitstream (S1390 and S1392), and derive mvl 1 by summing mvp_l1 and mvd_l1 indicated by mvp_l1_flag (S1394).

The syntax elements of the above embodiment are shown in table 9 below.

TABLE 9

Fig. 13 illustrates an operation of determining the linear_mv_coding_enabled_flag (S1340) and an operation of decoding and determining the linear_mv_coding_flag (S1350 and S1360) performed after an operation of decoding ref_idx_l1 (S1330), but operations S1340 to S1360 may be performed before an operation of decoding the motion_info_l0 (S1310).

Various types of offset vector candidates used in the present embodiment are illustrated in fig. 14. Fig. 14 (a) illustrates an offset vector candidate (circle with empty inside) when a motion of 4-point offset is allowed. The filled circles represent mv_l1 derived based on the linear relationship of motion. When motion of 4-point offset is allowed, an offset index of 2-bit Fixed Length (FL) may be used to indicate any one of the offset vector candidates.

The offset vector candidates when motion of 8-point offset is allowed are illustrated in fig. 14 (b). The 8-point offset vector candidate may be configured by adding four offset vector candidates (circles filled with vertical patterns) to the 4-point offset vector candidate. When allowing movement of 8-point offset, an offset index of 3-bit fixed length may be used to indicate any one of the offset vector candidates.

The offset vector candidates when motion of 16-point offset is allowed are illustrated in fig. 14 (c). The 16-point offset vector candidates may be configured by adding 8 offset vector candidates (circles filled with horizontal patterns) to the 8-point offset vector candidates. When motion of 16-point offset is allowed, an offset index of a fixed length of 4 bits may be used to indicate any one of the offset vector candidates.

Another example of the case of allowing the 16-point offset motion is illustrated in fig. 14 (d). The 16-point offset vector candidate may be configured by combining the 8-point offset vector candidate filled with the horizontal pattern and the 8-point offset vector candidate filled with the diagonal pattern. When motion of 16-point offset is allowed, an offset index of a fixed length of 4 bits may be used to indicate any one of the offset vector candidates.

Which of the various types of offset vector candidates described with reference to fig. 14 is set may be determined or defined at one or more locations of the picture level header, the tile group header, the tile header, and/or the CTU header. That is, the shape of the offset vector candidate may be determined using information (identification information) signaled from the video encoding apparatus, and the identification information may be defined at the above-described various positions. Since the identification information determines or identifies any one of various types of offset vector candidates, the number of offset vector candidates, the size of each candidate, and the direction of each candidate can be determined by the identification information.

In addition, which of the various types of offset vector candidates is set may be determined in advance by using the same rule at the video encoding apparatus and the video decoding apparatus.

Fourth embodiment

The fourth embodiment corresponds to the following method: motion_info_l0 is used without signaling to derive the motion of the direction that establishes a linear relationship among the horizontal and vertical motion directions, while additional signaled information (offset information) is used to adjust the motion of the direction that does not establish a linear relationship.

For example, when a linear relationship is established for only the horizontal axis component of motion, the horizontal axis component of derived mv_l1 may be used unmodified, but the vertical axis component for which no linear relationship is established is adjusted by using additional signaled offset information. As another example, when a linear relationship is established for only the vertical axis component of motion, the vertical axis component of the derived mv_l1 is used unmodified, but the horizontal axis component for which no linear relationship is established is adjusted by using additional signaled offset information.

The fourth embodiment may be implemented in a form of combining the above-described embodiments 3-1 or 3-2. Hereinafter, a form of combining the fourth embodiment with embodiment 3-1 and a form of combining the fourth embodiment with embodiment 3-2 will be described, respectively.

Embodiment 4-1

Embodiment 4-1 corresponds to a combination of the fourth embodiment and embodiment 3-1. In the present embodiment, information signaled from the video encoding apparatus to the video decoding apparatus for bi-prediction is expressed in syntax as shown in table 10 below.

TABLE 10

In table 10, mvd_l1 may be offset information (offset vector) or mvd of a conventional method. For example, mvd_l1 may be an offset vector of a horizontal axis component when no linear relationship is established for the horizontal axis component, and mvd_l1 may be an offset vector of a vertical axis component when no linear relationship is established for the vertical axis component. Further, when a linear relationship is not established for both the horizontal axis component and the vertical axis component, mvd_l1 may be mvd of the conventional method. When a linear relationship is established for both the horizontal axis component and the vertical axis component, mvd_l1 is not signaled.

motion_info_l0 may be signaled from the video encoding device to the video decoding device by being included in the bitstream. The signaled motion_info_l0 may include ref_idx_l0, mvd_l0, and mvp_l0_flag. Ref_idx_l1 may also be signaled by including ref_idx_l1 in the bitstream. The video decoding apparatus sets reference pictures indicated by the signaled reference picture information (ref_idx_l0 and ref_idx_l1) as reference pictures (ref_l0 and ref_l1) for inferring mv_l1.

When motion_info_l0 is decoded (S1510), the video decoding apparatus may infer or derive mv_l0 by using mvp_l0_flag and mvd_l0 (S1520). Equation 1 may be used in this process. Also, the video decoding apparatus may decode ref_idx_l1 from the bitstream (S1530).

When the linear_mv_coding_enabled_flag indicates that the motion vector derivation function is activated (S1540), the video decoding apparatus decodes the linear_mv_coding_idc from the bitstream (S1550). Here, linear_mv_coding_idc is information indicating whether the motion has a linear relationship, and a component in which a linear relationship is established among a horizontal axis component and a vertical axis component of the motion can be indicated by using the information.

When linear_mv_coding_idc=none (S1560), mvp_l1_flag and mvd_l1 are signaled as in the conventional method since no linear relationship is established for both components. Accordingly, the video decoding apparatus may decode mvp_l1_flag and mvd_l0 from the bitstream (S1562) and derive mvl 1 by using the decoded information (S1564). Also, when the linear_mv_coding_enabled_flag does not indicate that the motion vector derivation function is activated in operation S1540, the video decoding apparatus may derive mv_l1 by using the decoded mvp_l1_flag and mvd_l1 (S1562 and S1564).

When linear_mv_coding_idc=x (S1570), since a linear relationship is established only for the horizontal axis component (x), an offset vector (mvd_l1, y) for the vertical axis component (y) for which no linear relationship is established is signaled. Accordingly, the video decoding apparatus decodes an offset vector (mvd_l1, y) for the vertical axis component (S1572), and derives mv_l1 using a linear relationship. Also, the video decoding apparatus may adjust mv_l1 by applying an offset vector (mvd_l1, y) for the vertical axis component to the derived mv_l1 (S1576).

The video decoding device may use the "derived mv_l1" for the horizontal axis component and the adjusted second motion vector (mva_l1) for the vertical axis component without modification. The horizontal axis component of the derived mv_l1 and the horizontal axis component of the adjusted second motion vector (mva_l1) may be the same.

When linear_mv_coding_idc=y (S1580), since a linear relationship is established only for the vertical axis component, an offset vector (mvd_l1, x) for the horizontal axis component for which a linear relationship is not established is signaled. Accordingly, the video decoding apparatus may decode the offset vector (mvd_l1, x) for the horizontal axis component (S1582), and apply the offset vector (mvd_l1, x) for the horizontal axis component to the derived mv_l1 (S1584) by using the linear relationship to adjust mv_l1 (S1586).

The video decoding device may use the "derived mv_l1" for the vertical axis component without modification and the adjusted second motion vector (mva_l1) for the horizontal axis component. The vertical axis component of the derived mv_l1 may be the same as the vertical axis component of the adjusted second motion vector (mva_l1).

When linear_mv_coding_idc= (x & y) (S1580), mvd_l1 (offset information or mvd information in the second direction) is not signaled because a linear relationship is established for both the horizontal axis component and the vertical axis component. In this case, the video decoding apparatus derives mv_l1 by using motion_info_l0 and ref_idx_l1 (S1590).

The syntax elements of embodiment 4-1 are shown in table 11 below.

TABLE 11

Fig. 15 illustrates that the operation of determining the linear_mv_coding_enabled_flag (S1540) and the operation of decoding and determining the linear_mv_coding_idc (S1550 to S1580) may be performed after the operation of decoding ref_idx_l1 (S1530), but the operations S1540 to S1580 may be performed before the operation of decoding the motion_info_l0 (S1510).

Embodiment 4-2

Embodiment 4-2 corresponds to a combination of the fourth embodiment and embodiment 3-2. In this embodiment, the information for bi-prediction signaled from the video encoding device to the video decoding device is expressed in syntax as shown in table 10 above.

In table 10, mvd_l1 may be offset information (offset vector) or mvd of a conventional method. For example, mvd_l1 may be an offset vector for a horizontal axis component when no linear relationship is established for the horizontal axis component, and mvd_l1 may be an offset vector for a vertical axis component when no linear relationship is established for the vertical axis component. Also, when a linear relationship is not established for both the horizontal axis component and the vertical axis component, mvd_l1 may be mvd of the conventional method. When a linear relationship is established for both the horizontal axis component and the vertical axis component, mvd_l1 may be an offset vector for both components.

When motion_info_l0 is decoded (S1610), the video decoding apparatus may infer or derive mv_l0 by using mvp_l0_flag and mvd_l0 (S1620). Equation 1 may be used in this process. Further, the video decoding apparatus may decode ref_idx_l1 from the bitstream (S1630).

When the linear_mv_coding_enabled_flag indicates that the motion vector derivation function is activated (S1640), the video decoding apparatus decodes linear_mv_coding_idc from the bitstream (S1650).

When linear_mv_coding_idc=none (S1660), mvp_l1_flag and mvd_l1 are signaled as in the conventional method since no linear relationship is established for the two components. Accordingly, the video decoding apparatus may decode mvp_l1_flag and mvd_l1 from the bitstream (S1662) and derive mvl 1 by using the decoded information (S1664). Even when the linear_mv_coding_enabled_flag does not indicate that the motion vector derivation function is activated in operation S1640, the video decoding apparatus may derive mv_l1 by using the decoded mvp_l1_flag and mvd_l1 (S1662 and S1664).

When linear_mv_coding_idc=x (S1670), since a linear relationship is established only for the horizontal axis component, an offset vector (mvd_l1, y) for the vertical axis component for which no linear relationship is established is signaled. Accordingly, the video decoding apparatus decodes the offset vector (mvd_l1, y) for the vertical axis component (S1672), and derives mv_l1 using the linear relationship (1674). Then, the video decoding apparatus may adjust mv_l1 by applying an offset vector (mvd_l1, y) for the vertical axis component to the derived mv_l1 (S1676).

The video decoding device may use "derived mv_l1" for the horizontal axis component without change and use the adjusted second motion vector (mva_l1) for the vertical axis component. The horizontal axis component of the derived mv_l1 and the horizontal axis component of the adjusted second motion vector (mva_l1) may be the same.

When linear_mv_coding_idc=y (S1680), since a linear relationship is established only for the vertical axis component, an offset vector (mvd_l1, x) for the horizontal axis component for which a linear relationship is not established is signaled. Accordingly, the video decoding apparatus may decode the offset vector (mvd_l1, x) for the horizontal axis component (S1682), derive mv_l1 derived by using the linear relationship (S1684), and apply the offset vector (mvd_l1, x) for the horizontal axis component to the derived mv_l1 to adjust mv_l1 (S1686).

The video decoding apparatus may use the "derived mv_l1" for the vertical axis component without modification, and use the adjusted second motion vector (mva_l1) for the horizontal axis component. The vertical axis component of the derived mv_l1 may be the same as the vertical axis component of the adjusted second motion vector (mva_l1).

When linear_mv_coding_idc= (x & y) (S1680), since a linear relationship is established for both the horizontal axis component and the vertical axis component, offset vectors (mvd_l1, x and y) for both the horizontal axis component and the vertical axis component are signaled. Accordingly, the video decoding apparatus decodes offset vectors (mvd_l1, x, and y) for both the horizontal axis component and the vertical axis component from the bitstream (S1690), and can derive mv_l1 using a linear relationship by applying the offset vectors (mvd_l1, x, and y) (S1692) (S1694).

The syntax elements of embodiment 4-2 are shown in table 12 below.

TABLE 12

Fig. 16 illustrates that the operation of determining the linear_mv_coding_enabled_flag (S1640) and the operation of decoding and determining the linear_mv_coding_idc (S1650 to S1680) may be performed after the operation of decoding ref_idx_l1 (S1630), but the operations S1640 to S1680 may be performed before the operation of decoding the motion_info_l0 (S1610).

An example of deriving mv_l1 based on the fourth embodiment is illustrated in fig. 17. The example shown in fig. 17 corresponds to an example in which a linear relationship is established for the vertical axis component.

As shown in fig. 17, mv_l1 may be derived on the premise that a linear relationship is established between mv_l0 (solid arrow) and mv_l1 (broken arrow).

Since no linear relationship is established for the horizontal axis component, mv_l1 can be adjusted by moving the position indicated by the derived mv_l1 in the horizontal axis direction according to the magnitude indicated by the offset vector mvd_l1. By using the vertical axis component of mv_l1 without modification and the adjusted secondThe horizontal axis component of the motion vector mva_l1 to derive the final motion vector mv in the second direction _A L1. May be based on the reference block 630 indicated by mv_l0 and the reference block indicated by the adjusted second motion vector (mv _A L 1) to predict the current block 620 with reference to block 640.

Fifth embodiment

The fifth embodiment corresponds to a method of using a preset reference picture as a reference picture for deriving mv_l1. The preset reference picture refers to a reference picture to be used preset when the motion linearity relationship is established.

In the fifth embodiment, the reference picture information (ref_idx_l0 and ref_idx_l1) is not signaled in units of blocks, but may be signaled at a high level. Here, the high level may correspond to one or more of a picture level header, a tile group level header, a slice header, a tile header, and/or a CTU header. The preset reference picture may be referred to as a "representative reference picture" or a "linear reference picture", and the reference picture information signaled at a high level may be referred to as "representative reference picture information" or "linear reference picture information". When a linear relation of motion is established, a preset linear reference picture is used in units of blocks.

The linear reference picture information signaled in the tile set header is shown in table 13 below.

TABLE 13

In table 13, each of linear_ref_idx_l0 and linear_ref_idx_l1 represents linear reference picture information signaled for each direction.

Fig. 18 illustrates an example of a method of designating a reference picture by signaling reference picture information for each block in the conventional method or a method of designating a linear reference picture by the method proposed in the present invention.

The linear reference picture information (linear_ref_idx_l0 and linear_ref_idx_l1) may be signaled from the video encoding apparatus to the video decoding apparatus through the high level. The video decoding apparatus may set the linear reference pictures (linear_ref_l0 and linear_ref_l1) by selecting the reference pictures indicated by the signaled linear reference picture information (linear_ref_idx_l0 and linear_ref_idx_l1) within the reference picture list.

When the linear_mv_coding_enabled_flag indicates that the motion vector derivation function is activated (S1810), the video decoding apparatus decodes the linear_mv_coding_flag from the bitstream (S1820).

When the linear_mv_coding_flag indicates that the linear relationship of motion is established (S1830), the video decoding apparatus may derive reference pictures (ref_l0 and ref_l1) for deriving mv_l1 using preset linear reference pictures (linear_ref_l0 and linear_ref_l1) (S1840 and S1850). That is, preset linear reference pictures (linear_refl0 and linear_refl1) may be set as the reference pictures (ref_l0 and ref_l1).

Meanwhile, when the linear_mv_coding_enabled_flag does not indicate that the motion vector derivation function is activated in operation S1810 or the linear_mv_coding_flag does not indicate that the linear relationship of motion is established in operation S1830, the reference picture information (ref_idx_l0 and ref_idx_l1) may be signaled. The video decoding apparatus may decode the reference picture information (ref_idx_l0 and ref_idx_l1) (S1860 and S1870) and set the reference picture using the reference picture information.

The method of setting a reference picture proposed by the present invention may be implemented in combination with the above-described embodiments. Fig. 19 illustrates a form in which the method of setting a reference picture proposed by the present invention and the above-described embodiment 3-1 are combined.

For the first direction, when the linear_mv_coding_enabled_flag indicates that the motion vector derivation function is activated (S1910), the linear_mv_coding_flag is decoded (S1920). When the linear_mv_coding_flag indicates that the linear relationship of motion is established, a preset linear reference picture (linear_ref_l0) may be derived as the reference picture (ref_l0) (S1940). On the other hand, when the linear_mv_coding_enabled_flag does not indicate that the motion vector derivation function is activated or the linear_mv_coding_flag does not indicate that the linear relationship of motion is established, the reference picture (ref_l0) may be set by using the reference picture information (ref_idx_l0) decoded from the bitstream (S1962).

When the derivation or setting of the reference picture for the first direction is completed, mvd_l0 and mvp_l0_flag are decoded (S1950), and mv_l0 may be derived using the decoded information (S1960).

For the second direction, when the linear_mv_coding_flag indicates that the linear relationship of motion is established (S1970), a reference picture (ref_l1) may be derived or set using a preset linear reference picture (linear_ref_l1) (S1972). On the other hand, when the linear_mv_coding_flag does not indicate that the linear relationship of motion is established, the reference picture (ref_l1) may be set using the reference picture information (ref_idx_l1) decoded from the bitstream (S1974).

When the derivation or setting of the reference picture for the second direction is completed, mv_l1 having a linear relationship with mv_l0 may be derived when the linear_mv_coding_flag indicates that the linear relationship of motion is established (S1980) (S1982). On the other hand, when the linear_mv_coding_flag does not indicate that a linear relationship of motion is established (S1980), mvd_l1 and mvp_l1_flag decoded from the bitstream (S1990 and S1992) may be used to derive mvl 1 (S1994).

The syntax elements of the above embodiment are shown in table 14 below.

TABLE 14

Fig. 20 illustrates a combination of the method of setting a reference picture proposed by the present invention and the above-described embodiment 3-2.

For the first direction, when the linear_mv_coding_enabled_flag indicates that the motion vector derivation function is activated (S2010), the linear_mv_coding_flag is decoded (S2020). When the linear_mv_coding_flag indicates that the linear relationship of motion is established (S2030), a reference picture (ref_l0) may be derived or set using a preset linear reference picture (linear_ref_l0) (S2040). On the other hand, when the linear_mv_coding_enabled_flag does not indicate that the motion vector derivation function is activated (S2010) or the linear_mv_coding_flag does not indicate that the linear relationship of motion is established (S2030), the reference picture (ref_l0) may be set using the reference picture information (ref_idx_l0) decoded from the bitstream (S2062).

When the derivation or setting of the reference picture for the first direction is completed, mvd_l0 and mvp_l0_flag are decoded (S2050), and mv_l0 may be derived using the decoded information (S2060).

For the second direction, when the linear_mv_coding_flag indicates that the linear relationship of motion is established (S2070), a reference picture (ref_l1) may be derived or set using a preset linear reference picture (linear_ref_l1) (S2072). On the other hand, when the linear_mv_coding_flag does not indicate that the linear relationship of motion is established, the reference picture (ref_l1) may be set using the reference picture information (ref_idx_l1) decoded from the bitstream (S2074).

When the derivation or setting of the reference picture for the second direction is completed, mvd_l1 is decoded from the bitstream (S2080), and mvd_l1 corresponds to an offset vector or mvd of the conventional method, as in embodiment 3-2.

When the linear_mv_coding_flag indicates that the linear relationship of motion is established (S2090), mv_l1 having a linear relationship with mv_l0 is derived (S2092), and mv_l1 may be adjusted by applying an offset vector (mvd_l1) to the derived mv_l1 (S2094). On the other hand, when the linear_mv_coding_flag does not indicate that the linear relationship of motion is established (S2090), mvl 1 may be derived using mvp_l1_flag decoded from the bitstream (S2096 and S2098). In this process, mvp_l1 indicated by mvp_l1_flag and decoded mvd_l1 (mvd of the conventional method) may be used.

Although exemplary embodiments of the present invention have been described for illustrative purposes, those skilled in the art will appreciate that various modifications, adaptations, and variations are possible without departing from the spirit and scope of the present invention. For brevity and clarity, example embodiments have been described. Thus, it will be understood by those of ordinary skill in the art that the scope of the present invention is not limited by the embodiments explicitly described above, but includes the claims and their equivalents.

Cross Reference to Related Applications

The present application claims priority from patent application No. 10-2018-0171254 filed in korea on 12 months of 2018 and patent application No. 10-2019-0105769 filed in korea on 8 months of 2019, the contents of which are incorporated herein by reference in their entirety.

Claims

1. A method of inter-predicting a current block using any one of a plurality of bi-prediction modes, the method comprising the steps of:

decoding enabling information, the enabling information indicating whether a first mode of the plurality of bi-predictive modes is allowed;

decoding mode information at a block level of the current block in a bitstream when the enable information indicates that the first mode is allowed, the mode information indicating whether to apply the first mode to the current block;

when the mode information indicates that the first mode is applied to the current block,

decoding first motion information including differential motion vector information and predicted motion vector information for a first motion vector and second motion information excluding at least a portion of the differential motion vector information and predicted motion vector information for a second motion vector from the bitstream; and

Deriving the first motion vector based on the first motion information and deriving the second motion vector based on at least a portion of the first motion information and based on the second motion information; and

predicting the current block using a reference block in a first reference picture indicated by the first motion vector and a reference block in a second reference picture indicated by the second motion vector,

wherein in the first mode, the first reference picture and the second reference picture are determined at a high level higher than the block level.

2. The method of claim 1, further comprising the step of: when the mode information indicates that the first mode is not applied to the current block,

decoding the first motion information and third motion information including the differential motion vector information and the predicted motion vector information for the second motion vector from the bitstream; and

deriving the first motion vector based on the first motion information and the second motion vector based on the third motion information.

3. The method of claim 1, wherein when the enabling information indicates that the first mode is not activated, the mode information is not decoded from the bitstream and is set to indicate that the first mode is not applied.

4. The method of claim 1, wherein the enabling information is decoded at a sequence level, a picture level, a tile group level, or a slice level.

5. The method of claim 1, wherein the high level is a picture level, a tile group level, a slice level, a tile level, or a coding tree unit level.

6. The method of claim 1, wherein the first reference picture and the second reference picture are determined based on a picture order count, POC, difference between a reference picture and a current picture included in a reference picture list.

7. The method of claim 1, further comprising the step of: after deriving the second motion vector, adjusting the second motion vector by applying offset information included in the bitstream to the second motion vector,

wherein the current block is predicted by using a reference block in the second reference picture indicated by the adjusted second motion vector and a reference block in the first reference picture indicated by the first motion vector.

8. The method of claim 7, wherein the offset information is an offset vector having a position indicated by the second motion vector as an origin, and

The adjusting includes adjusting the second motion vector to a position indicated by the offset vector.

9. The method of claim 7, wherein the offset information is an offset index indicating any one of a plurality of preset offset vector candidates, and

the adjusting includes adjusting the second motion vector by applying an offset vector candidate indicated by the offset index to the second motion vector.

10. A video encoding method for inter-predicting a current block using any one of a plurality of bi-prediction modes, the method comprising the steps of:

encoding enablement information indicating whether a first mode of the plurality of bi-predictive modes is allowed;

when the enabling information indicates that the first mode is allowed, encoding mode information at a block level of the current block, the mode information indicating whether to apply the first mode to the current block; and

encoding and signaling first motion information including differential motion vector information and predictive motion vector information for a first motion vector and second motion information not including at least a portion of differential motion vector information and predictive motion vector information for a second motion vector, and

Encoding a residual block, which is a difference between the current block and a prediction block of the current block, wherein the prediction block is generated by using a reference block indicated by the first motion vector in a first reference picture and a reference block indicated by the second motion vector in a second reference picture,

11. A method for transmitting a bitstream containing encoded video data, the method comprising the steps of:

generating the bitstream by encoding the current block using any one of a plurality of bi-prediction modes; and

transmitting the bit stream to a video decoding device,

wherein generating the bitstream comprises:

encoding first motion information including differential motion vector information and predictive motion vector information for a first motion vector and second motion information excluding at least a part of differential motion vector information and predictive motion vector information for a second motion vector, and