CN117157976A - Video coding method and apparatus using block vectors with adaptive spatial resolution

Info

Publication number: CN117157976A
Application number: CN202280026553.7A
Authority: CN
Inventors: 全炳宇; 金范允; 李侑津; 朴胜煜
Original assignee: Sungkyunkwan University School Industry Cooperation; Hyundai Motor Co; Kia Corp
Current assignee: Sungkyunkwan University School Industry Cooperation; Hyundai Motor Co; Kia Corp
Priority date: 2021-04-02
Filing date: 2022-03-28
Publication date: 2023-12-01

Abstract

The present application relates to a method and apparatus for video coding using block vectors with adaptive spatial resolution. The present embodiment provides a video coding method and apparatus in which, in order to improve coding efficiency when Intra Block Copy (IBC) is applied to a current block, the spatial resolution of a Block Vector (BV) indicating the position of a reference block is adaptively signaled, or the sign of a block vector difference is adaptively signaled.

Description

Video coding method and apparatus using block vectors with adaptive spatial resolution

Cross Reference to Related Applications

The present application claims priority from Korean Patent Application No. 10-2021-0043574, filed on April 2, 2021, and Korean Patent Application No. 10-2022-0037102, filed on March 25, 2022, the entire contents of which are incorporated herein by reference.

Technical Field

The present disclosure relates to a video coding method and apparatus using block vectors with adaptive spatial resolution.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

Because video data is much larger than audio or still image data, storing or transmitting it without compression requires a large amount of hardware resources, including memory.

Thus, encoders are commonly used to compress video data before it is stored or transmitted. A decoder receives the compressed video data, decompresses it, and plays it back. Video compression techniques include H.264/AVC, High Efficiency Video Coding (HEVC), and Versatile Video Coding (VVC), which improves coding efficiency by about 30% or more compared to HEVC.

However, since the image size, resolution, and frame rate gradually increase, the amount of data to be encoded also increases. Thus, new compression techniques are needed that provide higher coding efficiency and improved image enhancement than existing compression techniques.

Unlike video generally acquired using an image sensor (hereinafter referred to as 'natural video'), video generated by a computer (hereinafter referred to as 'screen content') exhibits characteristics such as repetitive patterns, strong edges, and an absence of noise. Because of these characteristics, coding efficiency is low when screen content is encoded using conventional codecs such as HEVC and VVC, which focus on encoding natural video. This is because the characteristics of natural video and the characteristics of screen content are very different.

In order to solve this problem, technologies dedicated to encoding screen content have recently been developed. Intra Block Copy (IBC) technology searches the current picture for a region (i.e., a reference block) most similar to the region to be encoded or decoded (i.e., the current block), and then uses the found region as a prediction value. When a current block is predicted using the IBC mode, the reference block becomes the prediction value of the current block, and the displacement between the current block and the reference block may be expressed as a Block Vector (BV). The encoder encodes the block vector and transmits the encoded block vector to the decoder. In this case, in order to improve compression efficiency, the encoder does not transmit the block vector as it is. The encoder divides the BV into a Block Vector Predictor (BVP) and a Block Vector Difference (BVD). The encoder then encodes the BVP and the BVD.
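
The split of a block vector into a predictor and a difference can be illustrated with a minimal sketch. The function names and the numeric vectors below are hypothetical and chosen only for illustration; the only relation taken from the text is BV = BVP + BVD.

```python
from typing import Tuple

Vec = Tuple[int, int]  # (horizontal, vertical) displacement in integer samples


def split_block_vector(bv: Vec, bvp: Vec) -> Vec:
    """Encoder side: BVD = BV - BVP, component by component."""
    return (bv[0] - bvp[0], bv[1] - bvp[1])


def reconstruct_block_vector(bvp: Vec, bvd: Vec) -> Vec:
    """Decoder side: BV = BVP + BVD."""
    return (bvp[0] + bvd[0], bvp[1] + bvd[1])


bv = (-24, -8)   # hypothetical block vector pointing up and to the left
bvp = (-20, -8)  # hypothetical predictor taken from a neighboring block
bvd = split_block_vector(bv, bvp)
```

When the predictor is close to the actual block vector, the difference has small components, which is what makes it cheaper to code than the block vector itself.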

Meanwhile, in order to encode a block vector predictor, a merge mode and an Advanced Motion Vector Prediction (AMVP) mode used in encoding a motion vector may also be applied to encoding of a block vector. For example, in order to use a block vector used in a neighboring block as a block vector predictor, the encoder may signal an index indicating the selected block vector predictor to the decoder.

When signaling the block vector difference to the decoder, the encoder also transmits a flag indicating the spatial resolution of the block vector difference. In the prior art, the spatial resolution of the block vector difference is signaled as either 1-pel or 4-pel according to the corresponding flag, and the encoder and decoder determine the spatial resolution of the block vector from the signaled resolution of the block vector difference. Thus, the block vector cannot be represented in sub-pixel units smaller than one pixel. This is because the existing coding method is designed on the premise that, unlike natural video, conventional screen content is generated using a computer and thus does not require sub-pixel units. However, unlike conventional screen content, recently generated ultra-high-definition screen content often benefits from representing block vectors in sub-pixel units. Therefore, in order to improve the coding efficiency of screen content, a method for efficiently encoding the spatial resolution of the block vector needs to be considered.
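
As a rough illustration of the prior-art limitation, the following sketch expresses a block vector difference component in units of the signaled resolution. The helper names are hypothetical, and the rule that a value must be an exact multiple of the resolution is a simplifying assumption made for the example.

```python
def bvd_to_units(bvd_in_pels: int, resolution_pel: int) -> int:
    """Express a BVD component in units of the signaled resolution (1 or 4 pel)."""
    if resolution_pel not in (1, 4):
        raise ValueError("prior-art IBC resolution is assumed to be 1-pel or 4-pel")
    if bvd_in_pels % resolution_pel != 0:
        raise ValueError("value is not representable at this resolution")
    return bvd_in_pels // resolution_pel


def bvd_from_units(coded_value: int, resolution_pel: int) -> int:
    """Recover the BVD component in pels from the coded value."""
    return coded_value * resolution_pel
```

With only these two resolutions, the coded value is always a whole number of pixels, so a displacement such as half a pixel has no representation.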

Disclosure of Invention

[ problem ]

The present disclosure is directed to a video coding method and apparatus for adaptively signaling a spatial resolution of a Block Vector (BV) indicating a position of a reference block or adaptively signaling a sign of a block vector difference to improve coding efficiency when an Intra Block Copy (IBC) is applied to a current block.

[ technical solution ]

At least one aspect of the present disclosure provides a method, performed by a video decoding apparatus, for adaptively generating a block vector of a current block in an Intra Block Copy (IBC) mode. The method includes decoding a block vector predictor index, an absolute value of a block vector difference, and a block vector spatial resolution precision index from a bitstream. The method also includes generating a block vector predictor candidate list for the current block, and generating a block vector predictor from the block vector predictor candidate list using the block vector predictor index. The method also includes generating a block vector spatial resolution candidate list, and generating a block vector spatial resolution from the block vector spatial resolution candidate list using the block vector spatial resolution precision index. The method also includes deriving or decoding a sign of the block vector difference using the block vector spatial resolution, the block vector predictor, and the absolute value of the block vector difference. The method also includes generating the block vector difference by combining the absolute value of the block vector difference and the sign. The method also includes generating the block vector by combining the block vector predictor and the block vector difference.
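
The order of the decoding steps in this aspect can be sketched as follows. This is only an illustration under stated assumptions: the function and parameter names are hypothetical, the candidate lists are passed in ready-made, the sign is supplied as an already derived or decoded input, and scaling the absolute value by the selected resolution is an assumption of the sketch rather than something stated in the text above.

```python
from fractions import Fraction


def decode_block_vector(bvp_candidates, bvp_index,
                        resolution_candidates, resolution_index,
                        abs_bvd, signs):
    """Combine decoded syntax elements into a block vector (hypothetical sketch)."""
    # Select the predictor and the spatial resolution by their decoded indices.
    bvp = bvp_candidates[bvp_index]
    resolution = resolution_candidates[resolution_index]
    # Combine magnitude and sign; scaling by the resolution is an assumption.
    bvd = tuple(s * a * resolution for s, a in zip(signs, abs_bvd))
    # BV = BVP + BVD, component by component.
    return tuple(p + d for p, d in zip(bvp, bvd))


example_bv = decode_block_vector(
    bvp_candidates=[(-16, 0), (0, -16)], bvp_index=1,
    resolution_candidates=[Fraction(1, 4), Fraction(1, 2), 1, 4],  # hypothetical list
    resolution_index=1,
    abs_bvd=(3, 2), signs=(-1, 1),
)
```

Using `Fraction` keeps sub-pixel components exact, so a half-pel resolution yields a horizontal component of -3/2 in this example.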

Another aspect of the present disclosure provides a method, performed by a video decoding apparatus, for adaptively generating a block vector of a current block in an Intra Block Copy (IBC) mode. The method includes decoding a block vector predictor index and an absolute value of a block vector difference from a bitstream. The method also includes generating a block vector predictor candidate list for the current block, and generating a block vector predictor from the block vector predictor candidate list using the block vector predictor index. The method also includes deriving a block vector spatial resolution, and deriving or decoding a sign of the block vector difference using the block vector spatial resolution, the block vector predictor, and the absolute value of the block vector difference. The method also includes generating the block vector difference by combining the absolute value of the block vector difference and the sign. The method also includes generating the block vector by combining the block vector predictor and the block vector difference.

Yet another aspect of the present disclosure provides a method, performed by a video encoding apparatus, for adaptively generating a block vector of a current block in an Intra Block Copy (IBC) mode. The method includes obtaining, from a high level, a block vector predictor index and an absolute value of a block vector difference. The method also includes generating a block vector predictor candidate list for the current block, and generating a block vector predictor from the block vector predictor candidate list using the block vector predictor index. The method also includes deriving a block vector spatial resolution, and deriving or obtaining a sign of the block vector difference using the block vector spatial resolution, the block vector predictor, and the absolute value of the block vector difference. The method also includes generating the block vector difference by combining the absolute value of the block vector difference and the sign. The method also includes generating the block vector by combining the block vector predictor and the block vector difference.

[ beneficial effects ]

As described above, the present disclosure provides a video coding method and apparatus for adaptively signaling the spatial resolution of a Block Vector (BV) indicating the position of a reference block, or adaptively signaling the sign of a block vector difference, when Intra Block Copy (IBC) is applied to a current block, thereby improving coding efficiency.

Drawings

Fig. 1 is a block diagram illustrating a video encoding device that may implement the techniques of this disclosure.

Fig. 2 is a diagram illustrating a method for partitioning blocks using a quadtree plus binary tree ternary tree (QTBTTT) structure.

Fig. 3A and 3B are diagrams illustrating a plurality of intra prediction modes including a wide-angle intra prediction mode.

Fig. 4 is a view showing neighboring blocks of the current block.

Fig. 5 is a block diagram illustrating a video decoding device that may implement the techniques of this disclosure.

Fig. 6 is a diagram illustrating Intra Block Copy (IBC).

Fig. 7 is a flowchart showing a block vector transmission method for each IBC transmission method.

Fig. 8 is a flow chart illustrating a method for decoding a block vector difference.

Fig. 9 is a diagram illustrating a case of limiting the signs of a block vector difference according to an embodiment of the present disclosure.

Fig. 10 is a diagram illustrating block vector spatial resolution values of neighboring blocks according to an embodiment of the present disclosure.

Fig. 11 is a diagram illustrating the application of Adaptive Motion Vector Resolution (AMVR) to neighboring blocks according to an embodiment of the present disclosure.

Fig. 12 is a diagram illustrating block vector spatial resolution values of neighboring blocks according to another embodiment of the present disclosure.

Fig. 13A and 13B are diagrams illustrating positions of reference blocks according to block vectors according to another embodiment of the present disclosure.

Fig. 14 is a flowchart illustrating a method for adaptively generating a block vector of a current block, which is performed by a video encoding apparatus according to an embodiment of the present disclosure.

Fig. 15 is a flowchart illustrating a method performed by a video decoding apparatus for adaptively generating a block vector of a current block according to an embodiment of the present disclosure.

Fig. 16 is a flowchart illustrating a method performed by a video encoding apparatus for adaptively generating a block vector of a current block according to another embodiment of the present disclosure.

Fig. 17 is a flowchart illustrating a method performed by a video decoding apparatus for adaptively generating a block vector of a current block according to another embodiment of the present disclosure.

Detailed Description

Hereinafter, some embodiments of the present disclosure are described in detail with reference to the accompanying drawings. In the following description, like reference numerals denote like elements, although the elements are shown in different drawings. Furthermore, in the following description of some embodiments, detailed descriptions of related known components and functions may be omitted when it may be considered to obscure the subject matter of the present disclosure for the sake of clarity and conciseness.

Fig. 1 is a block diagram of a video encoding device in which the techniques of this disclosure may be implemented. Hereinafter, a video encoding apparatus and components of the apparatus are described with reference to the diagram of fig. 1.

The encoding apparatus may include a picture divider 110, a predictor 120, a subtractor 130, a transformer 140, a quantizer 145, a rearrangement unit 150, an entropy encoder 155, an inverse quantizer 160, an inverse transformer 165, an adder 170, a loop filter unit 180, and a memory 190.

Each component of the encoding apparatus may be implemented as hardware or software or as a combination of hardware and software. Further, the function of each component may be implemented as software, and a microprocessor may also be implemented to execute the function of the software corresponding to each component.

A video is made up of one or more sequences, each comprising a plurality of pictures. Each picture is divided into a plurality of regions, and encoding is performed on each region. For example, a picture is divided into one or more tiles and/or slices. Herein, one or more tiles may be defined as a tile group. Each tile and/or slice is divided into one or more Coding Tree Units (CTUs). In addition, each CTU is divided into one or more Coding Units (CUs) by a tree structure. Information applied to each CU is encoded as a syntax of the CU, and information commonly applied to the CUs included in one CTU is encoded as a syntax of the CTU. Further, information commonly applied to all blocks in one slice is encoded as a syntax of a slice header, and information applied to all blocks constituting one or more pictures is encoded in a Picture Parameter Set (PPS) or a picture header. Furthermore, information commonly referred to by a plurality of pictures is encoded in a Sequence Parameter Set (SPS). In addition, information commonly referred to by one or more SPSs is encoded in a Video Parameter Set (VPS). Furthermore, information commonly applied to one tile or tile group may also be encoded as a syntax of a tile or tile group header. The syntax included in the SPS, the PPS, the slice header, or the tile or tile group header may be referred to as high-level syntax.

The picture divider 110 determines the size of a Coding Tree Unit (CTU). Information about the size of the CTU (CTU size) is encoded as a syntax of the SPS or PPS and delivered to the video decoding device.

The picture divider 110 divides each picture constituting a video into a plurality of Coding Tree Units (CTUs) having a predetermined size, and then recursively divides the CTUs by using a tree structure. Leaf nodes in the tree structure become Coding Units (CUs), which are the basic units of encoding.

The tree structure may be a Quadtree (QT) in which a higher node (or parent node) is divided into four lower nodes (or child nodes) of the same size. The tree structure may also be a Binary Tree (BT) in which a higher node is divided into two lower nodes. The tree structure may also be a Ternary Tree (TT) in which a higher node is divided into three lower nodes at a ratio of 1:2:1. The tree structure may also be a structure in which two or more of the QT structure, the BT structure, and the TT structure are mixed. For example, a quadtree plus binary tree (QTBT) structure or a quadtree plus binary tree ternary tree (QTBTTT) structure may be used. Here, the BT and TT structures are collectively referred to as a multi-type tree (MTT).
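
The child block sizes produced by each split type can be sketched as follows. The mode names (`QT`, `BT_VER`, and so on) are hypothetical labels introduced for the example, and the binary splits are assumed to be symmetric.

```python
def split_block(width, height, mode):
    """Return the (width, height) of each child for one split of a parent block."""
    if mode == "QT":       # four children of equal size
        return [(width // 2, height // 2)] * 4
    if mode == "BT_VER":   # two children side by side (symmetric split assumed)
        return [(width // 2, height)] * 2
    if mode == "BT_HOR":   # two children stacked (symmetric split assumed)
        return [(width, height // 2)] * 2
    if mode == "TT_VER":   # three children side by side at a 1:2:1 ratio
        return [(width // 4, height), (width // 2, height), (width // 4, height)]
    if mode == "TT_HOR":   # three children stacked at a 1:2:1 ratio
        return [(width, height // 4), (width, height // 2), (width, height // 4)]
    raise ValueError(f"unknown split mode: {mode}")
```

For a 32x32 parent, for instance, a vertical ternary split gives children of widths 8, 16, and 8, each 32 samples tall.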

Fig. 2 is a diagram for describing a method of dividing blocks by using the QTBTTT structure.

As shown in fig. 2, a CTU may first be divided according to the QT structure. Quadtree partitioning may be repeated recursively until the size of the partitioned block reaches the minimum block size (MinQTSize) of leaf nodes allowed in QT. A first flag (qt_split_flag) indicating whether each node of the QT structure is divided into four nodes of a lower layer is encoded by the entropy encoder 155 and signaled to the video decoding apparatus. When a leaf node of the QT is not greater than the maximum block size (MaxBTSize) of the root node allowed in BT, the leaf node may be further divided into at least one of the BT structure and the TT structure. There may be a plurality of division directions in the BT structure and/or the TT structure. For example, there may be two directions, i.e., a direction in which the block of the corresponding node is divided horizontally and a direction in which the block of the corresponding node is divided vertically. As shown in fig. 2, when the MTT division starts, a second flag (mtt_split_flag) indicating whether the node is divided and, if the node is divided, a flag additionally indicating the division direction (vertical or horizontal) and/or a flag indicating the division type (binary or ternary) are encoded by the entropy encoder 155 and signaled to the video decoding device.
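
The recursive quadtree stage can be sketched as below. This is a simplified illustration: `should_split` is a hypothetical callback standing in for the signaled qt_split_flag, and the subsequent BT/TT stage is omitted.

```python
def quadtree_leaves(x, y, size, min_qt_size, should_split):
    """Recursively split a square block; return leaf blocks as (x, y, size)."""
    # Stop when the minimum QT leaf size is reached or the split flag is 0.
    if size <= min_qt_size or not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree_leaves(x + dx, y + dy, half, min_qt_size, should_split)
    return leaves


# Hypothetical decision rule: keep splitting only the block at the picture origin.
leaves = quadtree_leaves(0, 0, 64, 16, lambda x, y, size: x == 0 and y == 0)
```

In this example a 64x64 CTU yields four 16x16 leaves in its top-left quadrant and three 32x32 leaves elsewhere, and the leaves together cover the CTU exactly.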

Alternatively, a CU partition flag (split_cu_flag) indicating whether a node is partitioned may be encoded before a first flag (qt_split_flag) indicating whether each node is partitioned into four nodes of a lower layer is encoded. When the value of the CU partition flag (split_cu_flag) indicates that each node is not partitioned, the block of the corresponding node becomes a leaf node in the partition tree structure and becomes a CU as a basic unit of encoding. When the value of the CU partition flag (split_cu_flag) indicates that each node is partitioned, the video encoding apparatus first starts encoding the first flag through the above scheme.

When QTBT is used as another embodiment of the tree structure, there may be two types, i.e., a type in which the block of the corresponding node is horizontally divided into two blocks of the same size (i.e., symmetric horizontal division) and a type in which the block of the corresponding node is vertically divided into two blocks of the same size (i.e., symmetric vertical division). A partition flag (split_flag) indicating whether each node of the BT structure is partitioned into lower-layer blocks and partition type information indicating the partition type are encoded by the entropy encoder 155 and delivered to the video decoding apparatus. Meanwhile, a type in which the block of the corresponding node is divided into two blocks that are asymmetric to each other may additionally exist. The asymmetric form may include a form in which the block of the corresponding node is divided into two rectangular blocks having a size ratio of 1:3, or may also include a form in which the block of the corresponding node is divided in a diagonal direction.

A CU may have different sizes according to QTBT or QTBTTT partitioning from CTUs. Hereinafter, a block corresponding to a CU to be encoded or decoded (i.e., a leaf node of QTBTTT) is referred to as a "current block". Since QTBTTT division is adopted, the shape of the current block may be a rectangular shape in addition to a square shape.

The predictor 120 predicts a current block to generate a predicted block. Predictor 120 includes an intra predictor 122 and an inter predictor 124.

In general, each of the current blocks in a picture may be predictively coded. In general, prediction of a current block may be performed by using an intra prediction technique (using data from a picture including the current block) or an inter prediction technique (using data from a picture coded before the picture including the current block). Inter prediction includes both unidirectional prediction and bi-directional prediction.

The intra predictor 122 predicts pixels in the current block by using pixels (reference pixels) located on neighbors of the current block in the current picture including the current block. Depending on the prediction direction, there are multiple intra prediction modes. For example, as shown in fig. 3A, the plurality of intra prediction modes may include 2 non-directional modes including a planar mode and a DC mode, and may include 65 directional modes. The neighboring pixels and the operational equation to be used are differently defined according to each prediction mode.

In order to perform efficient directional prediction on a current block having a rectangular shape, directional modes (intra prediction modes #67 to #80 and #-1 to #-14) indicated by dotted arrows in fig. 3B may be additionally used. These directional modes may be referred to as "wide-angle intra prediction modes". In fig. 3B, the arrows indicate the corresponding reference samples used for prediction and do not represent the prediction direction. The prediction direction is opposite to the direction indicated by the arrow. When the current block has a rectangular shape, a wide-angle intra prediction mode is a mode in which prediction is performed in the direction opposite to a specific directional mode without additional bit transmission. In this case, the wide-angle intra prediction modes available for the current block may be determined by the ratio of the width to the height of the rectangular current block. For example, when the current block has a rectangular shape with a height smaller than its width, wide-angle intra prediction modes (intra prediction modes #67 to #80) having angles smaller than 45 degrees are available. When the current block has a rectangular shape with a height greater than its width, wide-angle intra prediction modes (intra prediction modes #-1 to #-14) having angles greater than -135 degrees are available.

The intra predictor 122 may determine an intra prediction mode to be used for encoding the current block. In some embodiments, the intra predictor 122 may encode the current block by using a plurality of intra prediction modes and then select an appropriate intra prediction mode from among the tested modes. For example, the intra predictor 122 may calculate rate-distortion values by using a rate-distortion analysis for the plurality of tested intra prediction modes, and select the intra prediction mode having the best rate-distortion characteristics among the tested modes.
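
A minimal sketch of such a rate-distortion selection is given below. It assumes the commonly used Lagrangian cost J = D + λ·R, which the text does not specify; the function name, mode labels, and numeric values are hypothetical.

```python
def select_mode(candidates, lagrange_multiplier):
    """Pick the mode with the lowest cost J = D + lambda * R.

    candidates: iterable of (mode, distortion, rate_in_bits) tuples.
    The Lagrangian cost is a common formulation assumed for this sketch.
    """
    return min(candidates, key=lambda c: c[1] + lagrange_multiplier * c[2])[0]


# Hypothetical measurements for three tested intra prediction modes.
tested = [("planar", 120.0, 10), ("dc", 150.0, 6), ("angular_34", 90.0, 30)]
```

With a large multiplier the cheaper-to-signal mode wins, while a small multiplier favors the mode with the lowest distortion.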

The intra predictor 122 selects one intra prediction mode among a plurality of intra prediction modes, and predicts the current block by using neighboring pixels (reference pixels) and an operation equation determined according to the selected intra prediction mode. Information about the selected intra prediction mode is encoded by the entropy encoder 155 and delivered to a video decoding device.

The inter predictor 124 generates a prediction block for the current block by using a motion compensation process. The inter predictor 124 searches for a block most similar to the current block in a reference picture encoded and decoded earlier than the current picture, and generates a prediction block for the current block by using the searched block. In addition, a Motion Vector (MV) is generated, which corresponds to a displacement between a current block in the current picture and a predicted block in the reference picture. In general, motion estimation is performed on a luminance component, and a motion vector calculated based on the luminance component is used for both the luminance component and the chrominance component. Motion information including information on a reference picture and information on a motion vector for predicting a current block is encoded by the entropy encoder 155 and delivered to a video decoding apparatus.
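
The search for the most similar block can be illustrated with an exhaustive block-matching sketch. The sum of absolute differences (SAD) as the similarity measure and the full search over a square window are assumptions chosen for simplicity; practical encoders typically use faster search strategies.

```python
def full_search(current, reference, bx, by, size, search_range):
    """Return ((dx, dy), sad) of the best match for the block at (bx, by)."""
    height, width = len(reference), len(reference[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate falls outside the reference picture
            cost = sum(
                abs(current[by + j][bx + i] - reference[ry + j][rx + i])
                for j in range(size)
                for i in range(size)
            )
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


# Synthetic pictures: the 2x2 block at (2, 2) in the current picture is a copy
# of the reference block at (3, 1), so the expected displacement is (1, -1).
reference = [[x + 10 * y for x in range(8)] for y in range(8)]
current = [[0] * 8 for _ in range(8)]
for j in range(2):
    for i in range(2):
        current[2 + j][2 + i] = reference[1 + j][3 + i]
mv, cost = full_search(current, reference, bx=2, by=2, size=2, search_range=2)
```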

The inter predictor 124 may also perform interpolation on reference pictures or reference blocks in order to increase the accuracy of prediction. In other words, sub-samples between two consecutive integer samples are interpolated by applying filter coefficients to a plurality of consecutive integer samples including the two integer samples. When the search for the block most similar to the current block is performed on the interpolated reference picture, the motion vector can be expressed with fractional-sample precision rather than integer-sample precision. The precision or resolution of the motion vector may be set differently for each target region to be encoded (e.g., units such as slices, tiles, CTUs, and CUs). When such Adaptive Motion Vector Resolution (AMVR) is applied, information on the motion vector resolution to be applied to each target region should be signaled for each target region. For example, when the target region is a CU, information about the resolution of the motion vector applied to each CU is signaled. The information on the resolution of the motion vector may be information representing the precision of a motion vector difference to be described below.
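
The idea of generating sub-samples between integer samples can be shown with a deliberately simple filter. The two-tap averaging filter below is an assumption made to keep the sketch short; it is not the interpolation filter of any particular codec, which would typically use more taps.

```python
def interpolate_half_samples(row):
    """Insert one half-sample between each pair of neighboring integer samples.

    A two-tap averaging filter with rounding is used as a simplified stand-in.
    """
    out = []
    for a, b in zip(row, row[1:]):
        out.append(a)
        out.append((a + b + 1) >> 1)  # rounded average of the two neighbors
    out.append(row[-1])
    return out
```

After interpolation, positions in the output row are addressed in half-sample units, so a displacement of one output position corresponds to a motion vector component of 1/2.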

Meanwhile, the inter predictor 124 may perform inter prediction by using bi-directional prediction. In the case of bi-prediction, two reference pictures and two motion vectors representing block positions most similar to the current block in each reference picture are used. The inter predictor 124 selects a first reference picture and a second reference picture from the reference picture list0 (RefPicList 0) and the reference picture list1 (RefPicList 1), respectively. The inter predictor 124 also searches for a block most similar to the current block in the corresponding reference picture to generate a first reference block and a second reference block. In addition, a prediction block of the current block is generated by averaging or weighted-averaging the first reference block and the second reference block. In addition, motion information including information on two reference pictures for predicting the current block and information on two motion vectors is delivered to the entropy encoder 155. Here, the reference picture list0 may be constituted by a picture preceding a current picture among the pre-restored pictures in the display order, and the reference picture list1 may be constituted by a picture following the current picture among the pre-restored pictures in the display order. However, although not particularly limited thereto, a pre-restored picture following the current picture in display order may be additionally included in the reference picture list 0. Conversely, a pre-restored picture preceding the current picture may be additionally included in the reference picture list 1.
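
Averaging or weighted-averaging of the two reference blocks can be sketched as follows. The integer rounding rule and the weight parameters `w0` and `w1` are assumptions made for the example.

```python
def bi_predict(block0, block1, w0=1, w1=1):
    """Combine two reference blocks by a rounded (weighted) average per sample."""
    total = w0 + w1
    return [
        [(w0 * a + w1 * b + total // 2) // total for a, b in zip(row0, row1)]
        for row0, row1 in zip(block0, block1)
    ]
```

With the default weights this is a plain average; unequal weights give a weighted average that leans toward one of the two reference blocks.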

In order to minimize the number of bits consumed for encoding motion information, various methods may be used.

For example, when a reference picture and a motion vector of a current block are identical to those of a neighboring block, information capable of identifying the neighboring block is encoded to deliver motion information of the current block to a video decoding apparatus. This approach is called merge mode.

In the merge mode, the inter predictor 124 selects a predetermined number of merge candidate blocks (hereinafter, referred to as "merge candidates") from neighboring blocks of the current block.

As shown in fig. 4, all or some of a left block A0, a lower left block A1, an upper block B0, an upper right block B1, and an upper left block B2 adjacent to the current block in the current picture may be used as neighboring blocks for deriving the merge candidates. Furthermore, in addition to the current picture in which the current block is located, a block located within a reference picture (which may be the same as or different from the reference picture used to predict the current block) may also be used as a merge candidate. For example, a block co-located with the current block within the reference picture, or a block adjacent to the co-located block, may be additionally used as a merge candidate. If the number of merge candidates selected by the method described above is smaller than the preset number, a zero vector is added to the merge candidates.
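
The construction of a candidate list with zero-vector padding can be sketched as below. The function name is hypothetical, `None` is used to mark an unavailable neighbor, and skipping duplicate candidates is an assumption of the sketch that the text above does not state.

```python
def build_merge_list(neighbor_motion, max_candidates):
    """Collect motion vectors from neighbors, then pad with zero vectors.

    neighbor_motion: motion vectors in scan order; None marks an unavailable
    neighbor. Skipping duplicates is an assumption of this sketch.
    """
    merge_list = []
    for mv in neighbor_motion:
        if mv is None or mv in merge_list:
            continue
        merge_list.append(mv)
        if len(merge_list) == max_candidates:
            break
    while len(merge_list) < max_candidates:
        merge_list.append((0, 0))  # zero vector fills the remaining slots
    return merge_list
```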

The inter predictor 124 configures a merge list including a predetermined number of merge candidates by using the neighboring blocks. From among the merge candidates included in the merge list, a merge candidate to be used as the motion information of the current block is selected, and merge index information for identifying the selected candidate is generated. The generated merge index information is encoded by the entropy encoder 155 and delivered to the video decoding device.

The merge skip mode is a special case of the merge mode. After quantization, when all transform coefficients used for entropy encoding are close to zero, only neighboring block selection information is transmitted without transmitting a residual signal. By using the merge skip mode, relatively high encoding efficiency can be achieved for images with slight motion, still images, screen content images, and the like.

Hereinafter, the merge mode and the merge skip mode are collectively referred to as a merge/skip mode.

Another method for encoding motion information is Advanced Motion Vector Prediction (AMVP) mode.

In AMVP mode, the inter predictor 124 derives a motion vector predictor candidate for a motion vector of a current block by using neighboring blocks of the current block. As neighboring blocks used to derive the motion vector predictor candidates, all or some of a left block A0, a lower left block A1, an upper block B0, an upper right block B1, and an upper left block B2, which are neighboring to the current block in the current picture shown in fig. 4, may be used. Furthermore, in addition to the current picture at which the current block is located, a block located within a reference picture (which may be the same as or different from a reference picture used to predict the current block) may also be used as a neighboring block used to derive a motion vector predictor candidate. For example, a block co-located with the current block within the reference picture or a block adjacent to the co-located block may be used. If the number of motion vector candidates selected by the above method is less than a preset number, a zero vector is added to the motion vector candidates.

The inter predictor 124 derives a motion vector predictor candidate by using motion vectors of neighboring blocks and determines a motion vector predictor for a motion vector of the current block by using the motion vector predictor candidate. In addition, a motion vector difference is calculated by subtracting a motion vector predictor from a motion vector of the current block.

The motion vector predictor may be obtained by applying a predefined function (e.g., a median or average value calculation) to the motion vector predictor candidates. In this case, the video decoding device also knows the predefined function. Furthermore, since the neighboring blocks used to derive the motion vector predictor candidates are blocks for which encoding and decoding have already been completed, the video decoding apparatus also already knows the motion vectors of the neighboring blocks. Therefore, the video encoding apparatus does not need to encode information for identifying the motion vector predictor candidates. Thus, in this case, information on the motion vector difference and information on the reference picture used for predicting the current block are encoded.

Meanwhile, the motion vector predictor may also be determined by a scheme of selecting any one of the motion vector predictor candidates. In this case, information for identifying the selected motion vector predictor candidate is additionally encoded together with the information about the motion vector difference and the information about the reference picture used to predict the current block.
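The candidate-selection scheme described above can be sketched as follows. This is a minimal illustration, not the method of any particular codec: the list size of 2, the duplicate pruning, and the "smallest difference" selection rule are assumptions made for the example (an actual encoder would typically decide by rate-distortion cost), and the function names are hypothetical.

```python
from typing import List, Tuple

MV = Tuple[int, int]  # (horizontal, vertical) vector components


def build_mvp_candidates(neighbor_mvs: List[MV], list_size: int = 2) -> List[MV]:
    """Collect distinct neighbor motion vectors, then pad with zero vectors."""
    candidates: List[MV] = []
    for mv in neighbor_mvs:
        if mv not in candidates and len(candidates) < list_size:
            candidates.append(mv)
    while len(candidates) < list_size:
        candidates.append((0, 0))  # zero vector fills the remaining slots
    return candidates


def encode_amvp(mv: MV, candidates: List[MV]) -> Tuple[int, MV]:
    """Return (predictor index, motion vector difference) for the given MV."""
    def cost(i: int) -> int:
        # Assumed selection rule: smallest sum of absolute component differences.
        return abs(mv[0] - candidates[i][0]) + abs(mv[1] - candidates[i][1])

    idx = min(range(len(candidates)), key=cost)
    mvp = candidates[idx]
    return idx, (mv[0] - mvp[0], mv[1] - mvp[1])  # MVD = MV - MVP


def decode_amvp(idx: int, mvd: MV, candidates: List[MV]) -> MV:
    """Rebuild the motion vector from the signaled index and difference."""
    mvp = candidates[idx]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])  # MV = MVP + MVD


candidates = build_mvp_candidates([(4, -2)])
idx, mvd = encode_amvp((5, -2), candidates)
restored = decode_amvp(idx, mvd, candidates)
```

Because both apparatuses build the same candidate list, only the index and the difference need to be signaled.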

The subtractor 130 generates a residual block by subtracting the prediction block generated by the intra predictor 122 or the inter predictor 124 from the current block.

The transformer 140 converts a residual signal in a residual block having pixel values of a spatial domain into transform coefficients of a frequency domain. The transformer 140 may transform the residual signal in the residual block by using the entire residual block as a transform unit, or may divide the residual block into a plurality of sub-blocks and perform the transform by using the sub-blocks as transform units. Alternatively, the residual block may be divided into two sub-blocks, a transform region and a non-transform region, and the residual signal may be transformed using only the transform region sub-block as a transform unit. Here, the transform region sub-block may be one of two rectangular blocks having a size ratio of 1:1 along the horizontal axis (or vertical axis). In this case, a flag (cu_sbt_flag) indicating that only the sub-block is transformed, direction (vertical/horizontal) information (cu_sbt_horizontal_flag), and/or position information (cu_sbt_pos_flag) are encoded by the entropy encoder 155 and signaled to the video decoding apparatus. Furthermore, the transform region sub-block may have a size ratio of 1:3 along the horizontal axis (or vertical axis). In this case, a flag (cu_sbt_quad_flag) distinguishing the corresponding division is additionally encoded by the entropy encoder 155 and signaled to the video decoding apparatus.

Meanwhile, the transformer 140 may perform transformation on the residual block separately in the horizontal direction and the vertical direction. For the transformation, different types of transformation functions or transformation matrices may be used. For example, a pair of transform functions for horizontal transforms and vertical transforms may be defined as a Multiple Transform Set (MTS). The transformer 140 may select one transform function pair having the highest transform efficiency in the MTS and may transform the residual block in each of the horizontal and vertical directions. The information (mts_idx) of the transform function pairs in the MTS is encoded by the entropy encoder 155 and signaled to the video decoding device.

The quantizer 145 quantizes the transform coefficients output from the transformer 140 using quantization parameters and outputs the quantized transform coefficients to the entropy encoder 155. For certain blocks or frames, the quantizer 145 may also quantize the relevant residual block directly, without a transform. The quantizer 145 may also apply different quantization coefficients (scaling values) according to the positions of the transform coefficients in the transform block. Quantization matrices applied to the quantized transform coefficients arranged in a two-dimensional manner may be encoded and signaled to the video decoding apparatus.

The rearrangement unit 150 may perform rearrangement of coefficient values for quantized residual values.

The rearrangement unit 150 may change the 2D coefficient array into a 1D coefficient sequence by using coefficient scanning. For example, the rearrangement unit 150 may output a 1D coefficient sequence by scanning from the DC coefficient toward the high-frequency coefficients using a zig-zag scan or a diagonal scan. Instead of the zig-zag scan, a vertical scan that scans the 2D coefficient array in the column direction or a horizontal scan that scans the 2D coefficient array in the row direction may also be used, depending on the size of the transform unit and the intra prediction mode. In other words, the scan method to be used may be selected from among the zig-zag scan, diagonal scan, vertical scan, and horizontal scan according to the size of the transform unit and the intra prediction mode.
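As an illustration of the scans described above, the following sketch converts a 2D coefficient array into a 1D sequence and back. The function names are hypothetical, and the traversal direction within each diagonal is an assumption chosen for simplicity; it is not taken from any specific standard.

```python
from typing import List, Tuple


def scan_order(width: int, height: int, method: str) -> List[Tuple[int, int]]:
    """Return the (row, column) visiting order for the chosen scan method."""
    if method == "horizontal":  # row by row
        return [(y, x) for y in range(height) for x in range(width)]
    if method == "vertical":  # column by column
        return [(y, x) for x in range(width) for y in range(height)]
    if method == "diagonal":  # anti-diagonals, starting at the DC position
        order = []
        for d in range(width + height - 1):
            for y in range(height):
                x = d - y
                if 0 <= x < width:
                    order.append((y, x))
        return order
    raise ValueError(f"unknown scan method: {method}")


def to_1d(block: List[List[int]], method: str) -> List[int]:
    """Encoder side: flatten a 2D coefficient array into a 1D sequence."""
    height, width = len(block), len(block[0])
    return [block[y][x] for y, x in scan_order(width, height, method)]


def to_2d(seq: List[int], width: int, height: int, method: str) -> List[List[int]]:
    """Decoder side: place the 1D sequence back at the scanned positions."""
    block = [[0] * width for _ in range(height)]
    for value, (y, x) in zip(seq, scan_order(width, height, method)):
        block[y][x] = value
    return block


coeffs = [[9, 3, 1],
          [2, 0, 0]]
diagonal_seq = to_1d(coeffs, "diagonal")
vertical_seq = to_1d(coeffs, "vertical")
```

Since the decoder applies the same visiting order in `to_2d`, the 2D array is recovered exactly as long as both sides agree on the scan method.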

The entropy encoder 155 generates a bitstream by encoding a sequence of 1D quantized transform coefficients output from the rearrangement unit 150 using various encoding schemes including context-based adaptive binary arithmetic coding (CABAC), exponential golomb, and the like.

Further, the entropy encoder 155 encodes information related to block division, such as a CTU size, a CTU division flag, a QT division flag, an MTT division type, an MTT division direction, etc., so that the video decoding apparatus can divide blocks in the same way as the video encoding apparatus. Further, the entropy encoder 155 encodes information on a prediction type indicating whether the current block is encoded by intra prediction or inter prediction. According to the prediction type, the entropy encoder 155 encodes intra prediction information (i.e., information about an intra prediction mode) or inter prediction information (a merge index in the case of the merge mode, and information about a reference picture index and a motion vector difference in the case of the AMVP mode). Further, the entropy encoder 155 encodes information related to quantization (i.e., information about quantization parameters and information about quantization matrices).

The inverse quantizer 160 dequantizes the quantized transform coefficients output from the quantizer 145 to generate transform coefficients. The inverse transformer 165 transforms the transform coefficients output from the inverse quantizer 160 from the frequency domain to the spatial domain to restore a residual block.

The adder 170 adds the restored residual block to the prediction block generated by the predictor 120 to restore the current block. When intra prediction is performed on the next sequential block, pixels in the restored current block may be used as reference pixels.

The loop filter unit 180 performs filtering on the restored pixels in order to reduce blocking artifacts, ringing artifacts, blurring artifacts, etc., which occur due to block-based prediction and transform/quantization. The loop filter unit 180, which is an in-loop filter, may include all or some of a deblocking filter 182, a Sample Adaptive Offset (SAO) filter 184, and an Adaptive Loop Filter (ALF) 186.

The deblocking filter 182 filters boundaries between restored blocks in order to remove blocking artifacts occurring due to block-unit encoding/decoding, and the SAO filter 184 and the ALF 186 perform additional filtering on the deblocking-filtered video. The SAO filter 184 and the ALF 186 are filters used to compensate for differences between restored pixels and original pixels that occur as a result of lossy coding. The SAO filter 184 applies offsets on a CTU basis to enhance subjective image quality and coding efficiency. On the other hand, the ALF 186 performs block-unit filtering and compensates for distortion by classifying the edges of each block and the degree of variation and applying different filters accordingly. Information about the filter coefficients to be used for the ALF may be encoded and signaled to the video decoding apparatus.

The restored blocks filtered by the deblocking filter 182, the SAO filter 184, and the ALF 186 are stored in the memory 190. When all blocks in one picture are restored, the restored picture may be used as a reference picture for inter-predicting blocks within a picture to be encoded later.

Fig. 5 is a functional block diagram of a video decoding device in which the techniques of this disclosure may be implemented. Hereinafter, with reference to fig. 5, a video decoding apparatus and components of the apparatus are described.

The video decoding apparatus may include an entropy decoder 510, a rearrangement unit 515, an inverse quantizer 520, an inverse transformer 530, a predictor 540, an adder 550, a loop filter unit 560, and a memory 570.

Similar to the video encoding apparatus of fig. 1, each component of the video decoding apparatus may be implemented as hardware or software or as a combination of hardware and software. Further, the function of each component may be implemented as software, and a microprocessor may also be implemented to execute the function of the software corresponding to each component.

The entropy decoder 510 extracts information related to block division by decoding a bitstream generated by a video encoding apparatus to determine a current block to be decoded, and extracts prediction information required to restore the current block and information on a residual signal.

The entropy decoder 510 determines the size of a CTU by extracting information about the CTU size from a Sequence Parameter Set (SPS) or a Picture Parameter Set (PPS), and divides a picture into CTUs having the determined size. In addition, the CTU is determined to be the highest layer of the tree structure, i.e., the root node, and division information of the CTU may be extracted to divide the CTU using the tree structure.

For example, when dividing CTUs using the QTBTTT structure, a first flag (qt_split_flag) related to the division of QT is first extracted to divide each node into four nodes of the lower layer. Further, for a node corresponding to a leaf node of QT, a second flag (MTT_split_flag), a division direction (vertical/horizontal), and/or a division type (binary/ternary) related to the division of MTT are extracted to divide the corresponding leaf node into an MTT structure. As a result, each node below the leaf node of QT is recursively divided into BT or TT structures.

As another embodiment, when CTUs are divided by using the QTBTTT structure, a CU division flag (split_cu_flag) indicating whether a CU is divided is extracted. When the corresponding block is divided, the first flag (qt_split_flag) may also be extracted. During the partitioning process, for each node, zero or more recursive MTT partitions may occur after zero or more recursive QT partitions. For example, for a CTU, MTT partitioning may occur immediately, or conversely, only QT partitioning may occur multiple times.

For another example, when dividing CTUs using the QTBT structure, a first flag (qt_split_flag) related to the division of QT is extracted to divide each node into four nodes of the lower layer. Further, a division flag (split_flag) indicating whether a node corresponding to a leaf node of QT is further divided into BT and division direction information are extracted.

Meanwhile, when the entropy decoder 510 determines a current block to be decoded by using the division of the tree structure, the entropy decoder 510 extracts information on a prediction type indicating whether the current block is intra predicted or inter predicted. When the prediction type information indicates intra prediction, the entropy decoder 510 extracts syntax elements for intra prediction information (an intra prediction mode) of the current block. When the prediction type information indicates inter prediction, the entropy decoder 510 extracts syntax elements for the inter prediction information, i.e., information representing a motion vector and a reference picture to which the motion vector refers.

Further, the entropy decoder 510 extracts quantization related information and extracts information on quantized transform coefficients of the current block as information on a residual signal.

The rearrangement unit 515 may change the sequence of the 1D quantized transform coefficients entropy-decoded by the entropy decoder 510 into a 2D coefficient array (i.e., block) again in an order reverse to the coefficient scan order performed by the video encoding apparatus.

The inverse quantizer 520 dequantizes the quantized transform coefficients by using quantization parameters. The inverse quantizer 520 may also apply different quantization coefficients (scaling values) to the quantized transform coefficients arranged in 2D. The inverse quantizer 520 may perform dequantization by applying a matrix of quantization coefficients (scaling values) from the video encoding apparatus to the 2D array of quantized transform coefficients.

The inverse transformer 530 generates a residual block for the current block by restoring a residual signal through inverse transforming the dequantized transform coefficients from the frequency domain to the spatial domain.

Further, when the inverse transformer 530 inversely transforms only a partial region (sub-block) of the transform block, the inverse transformer 530 extracts a flag (cu_sbt_flag) indicating that only the sub-block of the transform block is transformed, direction (vertical/horizontal) information (cu_sbt_horizontal_flag) of the sub-block, and/or position information (cu_sbt_pos_flag) of the sub-block. The inverse transformer 530 then inversely transforms the transform coefficients of the corresponding sub-block from the frequency domain to the spatial domain to restore a residual signal, and fills the region that is not inversely transformed with a value of "0" as the residual signal to generate a final residual block for the current block.
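The zero-filling step described above can be sketched as follows. This is a simplified illustration under stated assumptions: the parameters `horizontal_split` and `second_part` are hypothetical stand-ins for the decoded direction and position information, and their exact mapping to cu_sbt_horizontal_flag and cu_sbt_pos_flag is not asserted here. The sub-block residual is assumed to have been inversely transformed already.

```python
from typing import List


def place_sbt_residual(cu_width: int, cu_height: int,
                       sub_residual: List[List[int]],
                       horizontal_split: bool,
                       second_part: bool) -> List[List[int]]:
    """Embed the restored sub-block residual in a zero-filled CU-sized block."""
    residual = [[0] * cu_width for _ in range(cu_height)]  # untransformed region stays 0
    sub_h, sub_w = len(sub_residual), len(sub_residual[0])
    if horizontal_split:
        # Assumed meaning: sub-blocks are stacked vertically and span the full width.
        assert sub_w == cu_width
        x0, y0 = 0, (cu_height - sub_h if second_part else 0)
    else:
        # Assumed meaning: sub-blocks sit side by side and span the full height.
        assert sub_h == cu_height
        x0, y0 = (cu_width - sub_w if second_part else 0), 0
    for y in range(sub_h):
        for x in range(sub_w):
            residual[y0 + y][x0 + x] = sub_residual[y][x]
    return residual


# A 4x2 CU whose right half (1:1 split) carries the transformed residual.
final_residual = place_sbt_residual(4, 2, [[1, 2], [3, 4]],
                                    horizontal_split=False, second_part=True)
```

The same placement logic covers a 1:3 split, since only the sub-block dimensions change.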

In addition, when applying MTS, the inverse transformer 530 determines a transform function or a transform matrix to be applied in each of the horizontal direction and the vertical direction by using the MTS information (mts_idx) signaled from the video encoding apparatus. The inverse transformer 530 also performs inverse transformation on the transform coefficients in the transform block in the horizontal direction and the vertical direction by using the determined transform function.

The predictor 540 may include an intra predictor 542 and an inter predictor 544. The intra predictor 542 is activated when the prediction type of the current block is intra prediction, and the inter predictor 544 is activated when the prediction type of the current block is inter prediction.

The intra predictor 542 determines an intra prediction mode of the current block among a plurality of intra prediction modes according to a syntax element of the intra prediction mode extracted from the entropy decoder 510. The intra predictor 542 also predicts the current block by using neighboring reference pixels of the current block according to an intra prediction mode.

The inter predictor 544 determines a motion vector of the current block and a reference picture to which the motion vector refers by using syntax elements for the inter prediction mode extracted from the entropy decoder 510.

The adder 550 restores the current block by adding the residual block output from the inverse transformer 530 to the prediction block output from the inter predictor 544 or the intra predictor 542. In intra prediction of a block to be decoded later, pixels within the restored current block are used as reference pixels.

The loop filter unit 560, which is an in-loop filter, may include a deblocking filter 562, an SAO filter 564, and an ALF 566. The deblocking filter 562 performs deblocking filtering on boundaries between restored blocks to remove block artifacts occurring due to block unit decoding. The SAO filter 564 and ALF 566 perform additional filtering on the restored block after deblocking filtering to compensate for differences between restored pixels and original pixels that occur due to lossy coding. The filter coefficients of the ALF are determined by using information on the filter coefficients decoded from the bitstream.

The restored blocks filtered by the deblocking filter 562, the SAO filter 564, and the ALF 566 are stored in the memory 570. When all blocks in one picture are restored, the restored picture may be used as a reference picture for inter-predicting blocks within a picture to be decoded later.

In some implementations, the present disclosure relates to encoding and decoding video images as described above. More particularly, the present disclosure provides a video coding method and apparatus for adaptively signaling spatial resolution of a Block Vector (BV) indicating a position of a reference block or adaptively signaling a sign of a block vector difference when an Intra Block Copy (IBC) is applied to a current block.

The following embodiments are applicable to the intra predictor 122 in the video encoding apparatus. Further, the following embodiments are applicable to the entropy decoder 510 and the intra predictor 542 in the video decoding apparatus.

In the following description, the term "target block" to be encoded/decoded may be used in the same meaning as the current block or Coding Unit (CU) as described above, or the term "target block" may refer to a partial region of the coding unit.

Hereinafter, a statement that a specific flag is true indicates that the value of the corresponding flag is 1, and a statement that a specific flag is false indicates that the value of the corresponding flag is 0.

I. Intra Block Copy (IBC) techniques

When a current block is searched for and predicted in IBC mode, as shown in fig. 6, a reference block becomes the predicted value of the current block, and the displacement between the current block and the reference block is represented as a Block Vector (BV). In order to improve coding efficiency, a video encoding apparatus does not transmit the block vector as it is, but divides the block vector into a Block Vector Predictor (BVP) and a Block Vector Difference (BVD), encodes the BVP and the BVD, and then may transmit the encoded BVP and the encoded BVD to the video decoding apparatus.

Hereinafter, the spatial resolution of the BVD and the spatial resolution of the block vector are considered to be the same. In addition, a single indicator may be used to set the same spatial resolution value for the horizontal and vertical elements of the block vector.

Meanwhile, as shown in fig. 7, according to the block vector transmission method, IBC techniques may be classified into an IBC skip mode, an IBC merge mode, and an IBC AMVP mode. The video encoding apparatus uses the same block vector transmission method as the IBC merge mode in the IBC skip mode, but the video encoding apparatus may not transmit a residual block corresponding to a difference between the current block and the prediction block. Meanwhile, the illustration shown in fig. 7 may be similarly applied to a video decoding apparatus, but the video decoding apparatus may parse necessary flags from a bitstream.

The video encoding apparatus checks whether the current mode is the IBC skip mode (S700), and if not (no in S700), the video encoding apparatus checks whether the current mode is the IBC merge mode (S704).

In the case of the IBC skip mode or the IBC merge mode (yes in S700, or yes in S704), the video encoding apparatus obtains a merge index merge_idx indicating one block vector included in the IBC merge list (S702 and S706, respectively). However, the video encoding device does not obtain a BVD. The IBC merge list may be composed by the video encoding apparatus and the video decoding apparatus in the same manner. After selecting the Block Vector Predictor (BVP) indicated by the merge index, the video encoding apparatus may use the selected BVP as a block vector. Meanwhile, the video encoding device transmits the merge index to the video decoding device, but the video encoding device does not transmit the BVD.

In the case of the IBC AMVP mode (no in S704), the video encoding apparatus sequentially obtains mvp_l0_flag, the BVD, and amvr_precision_idx (S708, S710, and S712, respectively). Here, mvp_l0_flag is an index indicating a predicted value of a motion vector, and is also used as an index indicating the BVP for a block vector. Further, amvr_precision_idx is an index indicating the spatial resolution of a motion vector according to the application of Adaptive Motion Vector Resolution (AMVR), and is also used as an index indicating the spatial resolution of a block vector. The video encoding apparatus may select the block vector indicated by mvp_l0_flag as the BVP, and then may generate the block vector by adding the BVP to the BVD. Meanwhile, the video encoding apparatus transmits mvp_l0_flag, the BVD, and amvr_precision_idx to the video decoding apparatus.
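The mode-dependent derivation of the block vector described for fig. 7 can be sketched as follows. This is a simplified illustration: the mode strings, list contents, and function name are hypothetical, the handling of spatial resolution is omitted, and the second return value only records that the IBC skip mode carries no residual block.

```python
from typing import List, Optional, Tuple

BV = Tuple[int, int]  # (horizontal, vertical) block vector components


def derive_ibc_block_vector(mode: str,
                            merge_list: List[BV],
                            amvp_list: List[BV],
                            merge_idx: Optional[int] = None,
                            mvp_l0_flag: Optional[int] = None,
                            bvd: Optional[BV] = None) -> Tuple[BV, bool]:
    """Return (block vector, whether a residual block may be transmitted)."""
    if mode in ("skip", "merge"):
        # S702 / S706: the BVP selected by merge_idx is used directly; no BVD.
        return merge_list[merge_idx], mode == "merge"
    if mode == "amvp":
        # S708 to S712: BV = BVP + BVD, with the BVP chosen by mvp_l0_flag.
        bvp = amvp_list[mvp_l0_flag]
        return (bvp[0] + bvd[0], bvp[1] + bvd[1]), True
    raise ValueError(f"unknown IBC mode: {mode}")


merge_list = [(-16, 0), (0, -8)]   # hypothetical IBC merge list
amvp_list = [(-8, 0), (-4, -4)]    # hypothetical BVP candidates

skip_bv, skip_has_residual = derive_ibc_block_vector(
    "skip", merge_list, amvp_list, merge_idx=1)
amvp_bv, amvp_has_residual = derive_ibc_block_vector(
    "amvp", merge_list, amvp_list, mvp_l0_flag=0, bvd=(-3, 2))
```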

Hereinafter, mvp_l0_flag is denoted as a block vector predictor index, and amvr_precision_idx is denoted as a block vector spatial resolution precision index.

In the illustration of fig. 7, when the IBC AMVP mode is used, the video encoding apparatus may determine amvr_precision_idx, which indicates the spatial resolution of the block vector difference, in terms of rate distortion optimization. The video encoding apparatus signals amvr_flag and amvr_precision_idx to the video decoding apparatus to transmit the spatial resolution of the block vector. In other words, the video encoding apparatus transmits amvr_flag to signal whether the AMVR technique is applied to the block vector. In addition, the video encoding apparatus may signal the spatial resolution used for prediction by transmitting amvr_precision_idx, which indicates one of the candidates in the spatial resolution candidate list of the block vector. The video encoding apparatus and the video decoding apparatus share the same spatial resolution candidate list of block vectors. Meanwhile, when the AMVR technique is used in the existing IBC AMVP mode, amvr_flag is regarded as 1, so the video encoding apparatus may omit transmission of amvr_flag.

In the existing IBC AMVP mode, a block vector spatial resolution candidate list shared between a video encoding apparatus and a video decoding apparatus is {1-pel,4-pel }. When applying the AMVR technique based on the candidate list, the video encoding apparatus and the video decoding apparatus may determine the spatial resolution of the block vector as shown in table 1.

[ Table 1 ]

amvr_flag	amvr_precision_idx	Block vector spatial resolution
1	0	1-pel
1	1	4-pel

As described above, according to table 1, either 1-pel or 4-pel is selected as the spatial resolution of the block vector, and the spatial resolution of the block vector and that of the block vector difference are determined according to the selected resolution. In other words, the block vector cannot be represented in sub-pixel units smaller than one pixel. This is because conventional screen content, unlike natural images, is generated using a computer, so conventional encoding methods were designed without requiring sub-pixel units. However, unlike conventional screen content, recently generated ultra-high-definition screen content includes many cases in which it is advantageous to represent block vectors in sub-pixel units.

In general, to predict the motion of an object in a natural image, prediction is performed more effectively in sub-pixel units. In addition, in encoding natural video, because sub-pixel spatial resolution is not supported, the IBC mode is rarely selected, and even when it is selected, the coding efficiency is very low. On the other hand, the IBC mode is often selected when encoding screen content, and relatively high coding efficiency can be achieved even with pixel-unit spatial resolution. For these reasons, sub-pixel block vector spatial resolution has not been required.

However, as image rendering techniques have recently evolved, various techniques are used to generate moving or smooth images in screen content. Thus, cases in which a BV in sub-pixel units is advantageous, which have not been considered in existing video coding techniques, have been observed experimentally.

For example, in the case of a game graphic video including ray tracing, motion blur effect, deep learning supersampling (DLSS), antialiasing, and the like, various techniques are applied that make it very similar to natural video. While this video is computer generated screen content, it may have features that can be seen in real natural images, such as light blur and motion blur. If existing AMVR techniques with spatial resolution of integer pixels are applied to such graphical images, the IBC techniques may be very inefficient because detailed prediction is not performed. According to the present disclosure, this problem of the related art can be solved by using the spatial resolution of the block vector in units of sub-pixels.

In addition to the above-described spatial resolution problems, the prior art has additional inefficiencies. Additional inefficiency aspects are described below using the illustration of fig. 8.

Fig. 8 is a flowchart illustrating a method for decoding a block vector difference.

In IBC technology, in order to transmit a block vector, a video encoding apparatus divides the block vector into a block vector predictor and a block vector difference, and then transmits the block vector predictor and the block vector difference to a video decoding apparatus. The video decoding apparatus may decode the block vector difference as shown in fig. 8. Meanwhile, the illustration shown in fig. 8 can be similarly applied to a video encoding apparatus, however, the video encoding apparatus can obtain necessary flags from a high level.

The video decoding apparatus parses a flag abs_mvd_greater0_flag indicating whether the absolute value of the block vector difference is greater than 0 (S800), and then checks the absolute value (S802). When the absolute value of the BVD is not greater than 0, i.e., is 0 (no in S802), the BVD parsing is terminated.

When the absolute value of the BVD is greater than 0 (yes in S802), the video decoding apparatus parses a flag abs_mvd_greater1_flag indicating whether the absolute value of the block vector difference is greater than 1 (S804), and then checks the absolute value (S806). When the absolute value of the BVD is not greater than 1 (no in S806), the absolute value of the block vector difference is determined to be 1, and the BVD sign decoding step is performed (S810).

When the absolute value of the BVD is greater than 1 (yes in S806), the video decoding apparatus decodes the value of |BVD|-2 using Golomb-Rice coding (S808). The video decoding apparatus may use the value of |BVD|-2 to generate the absolute value of the BVD.

The video decoding apparatus decodes the sign of the BVD (S810). Thereafter, the video decoding apparatus may combine the absolute value of the BVD with the sign of the BVD to finally produce the block vector difference. The video decoding apparatus may perform the above-described processing for each of the horizontal and vertical elements of the BVD.
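The parsing flow of fig. 8 for one element of the BVD can be sketched as follows. This is a minimal illustration that assumes the symbols have already been entropy-decoded and are supplied in parsing order; the function name is hypothetical, and the convention that a sign symbol of 1 means negative is an assumption of the example.

```python
from typing import Iterator


def decode_bvd_component(symbols: Iterator[int]) -> int:
    """Rebuild one BVD element from already entropy-decoded symbols."""
    greater0 = next(symbols)          # S800: is |BVD| greater than 0?
    if not greater0:                  # S802: no, so the element is 0
        return 0
    greater1 = next(symbols)          # S804: is |BVD| greater than 1?
    if greater1:                      # S806: yes
        magnitude = next(symbols) + 2  # S808: decoded value is |BVD| - 2
    else:
        magnitude = 1
    sign = next(symbols)              # S810: assumed 1 = negative, 0 = positive
    return -magnitude if sign else magnitude


# Horizontal and vertical elements are decoded with the same procedure.
bvd = (decode_bvd_component(iter([1, 1, 3, 0])),   # |BVD| = 3 + 2, positive
       decode_bvd_component(iter([1, 0, 1])))      # |BVD| = 1, negative
```

Note that the sign symbol is consumed for every nonzero element, which is the behavior the following paragraphs identify as a potential inefficiency.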

Meanwhile, when the block vector predictor points near an edge of the region in which a reference block may exist, the sign of the block vector difference may be constrained depending on the magnitude of the block vector difference. The region where reference blocks may exist may be constructed based on pictures, slices, tiles, or CTUs, or based on an individual virtual buffer. As shown in fig. 9, assume that a reference block indicated by a block vector predictor is located near the upper boundary of the referenceable region. In this case, if the sign of the vertical element of the block vector difference is negative, the current block cannot use the region indicated by the block vector as the reference block. In other words, the sign of the vertical element of the block vector difference must be positive. However, in conventional techniques, the sign of the block vector difference is always signaled.

This problem of the prior art can be solved by pre-computing the block vectors before encoding or parsing the signs of the block vector differences. Meanwhile, the region where a reference block may exist, the block vector predictor, the magnitude of the block vector difference, the block vector spatial resolution, etc. may be considered in pre-calculating a block vector.
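One way to picture this pre-computation is the sketch below. It is an assumption-laden simplification: the referenceable region is reduced to a hypothetical valid range `[bv_min, bv_max]` for one block vector element, whereas an actual check would also involve the block position, block size, and spatial resolution. The function name is hypothetical.

```python
from typing import Optional


def infer_bvd_sign(bvp: int, abs_bvd: int, bv_min: int, bv_max: int) -> Optional[int]:
    """Return +1 or -1 when only one sign yields a valid block vector element.

    Returns None when the sign cannot be inferred and would still need to be
    signaled (both signs valid, or neither valid in this simplified model).
    """
    positive_ok = bv_min <= bvp + abs_bvd <= bv_max  # candidate BV with sign +
    negative_ok = bv_min <= bvp - abs_bvd <= bv_max  # candidate BV with sign -
    if positive_ok and not negative_ok:
        return 1
    if negative_ok and not positive_ok:
        return -1
    return None


# Vertical element: the BVP points 30 rows up and the region ends 32 rows up,
# so a difference of magnitude 4 can only be positive (as in the fig. 9 case).
near_upper_boundary = infer_bvd_sign(bvp=-30, abs_bvd=4, bv_min=-32, bv_max=-1)
well_inside = infer_bvd_sign(bvp=-16, abs_bvd=4, bv_min=-32, bv_max=-1)
```

When the function returns a sign, both apparatuses can derive it identically, so the sign need not be written to or parsed from the bitstream.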

Hereinafter, implementation embodiments for solving the above-described problems are described.

Hereinafter, the present implementation embodiment is described focusing on adaptive encoding of a block vector of a current block by a video encoding apparatus. Such adaptive encoding of the block vector may be performed by the intra predictor 122 in the video encoding apparatus. For convenience of description, reference is made to the video decoding apparatus where necessary. Nevertheless, most of the embodiments described below are equally or similarly applicable to video decoding apparatuses. Meanwhile, the video encoding apparatus determines information (flags and indexes described later) related to adaptive encoding of block vectors in terms of rate distortion optimization. Thereafter, the video encoding apparatus may encode the information to generate a bitstream, and then may transmit the bitstream to the video decoding apparatus. Furthermore, the video encoding apparatus may determine the spatial resolution of the block vector of the current block by obtaining information related to adaptive encoding of the block vector from a high level.

Block vector with adaptive spatial resolution

< embodiment 1> method of signaling spatial resolution of block vector

In this implementation embodiment, the video encoding apparatus selects one of one or more block vector spatial resolution candidate lists, and then selects a block vector spatial resolution from the selected list according to amvr_precision_idx. In this case, each block vector spatial resolution candidate list may be configured differently according to an embodiment, and the method of configuring the list may also be implemented in various ways according to an embodiment. For example, the video encoding apparatus may use at least one of the following as an element of the candidate list to construct the block vector spatial resolution candidate list.

First, the block vector spatial resolution of neighboring blocks located at the upper, upper right, upper left, lower left, etc. of the current block may be used. In addition, the block vector spatial resolution previously used in the encoding (or decoding) order may be used. In addition, a preset block vector spatial resolution may be used. Finally, a block vector spatial resolution determined based on the frequency of use of the block vector spatial resolution may be used.

When the block vector spatial resolution values of neighboring blocks are distributed as shown in fig. 10, the video encoding apparatus considers the frequency of use of the block vector spatial resolution values and places more frequently used resolution values nearer the front of the list. For example, the list may be constructed as {1-pel,1/2-pel,4-pel }.

Meanwhile, as described above, one block vector spatial resolution candidate list may be constituted, but a plurality of block vector spatial resolution candidate lists may be constituted according to an embodiment.

To use block vector spatial resolutions in integer pixel units and sub-pixel units, a video encoding apparatus may construct a candidate list, and then may transmit amvr_precision_idx indicating the block vector spatial resolution to a video decoding apparatus. Hereinafter, detailed implementation embodiments related thereto are described.

Example 1-1: method for using a block vector spatial resolution list

In this implementation embodiment, the video encoding device constructs a block vector spatial resolution list and then signals the spatial resolution of the block vector using amvr_precision_idx.

The block vector spatial resolution list may be constructed using all or some of the various components as described above. For example, based on the diagram of fig. 10, as described above, when one block vector spatial resolution candidate list {1-pel,1/2-pel,4-pel } is constructed, the video encoding apparatus may transmit amvr_precision_idx to the video decoding apparatus to indicate the spatial resolution. For example, when amvr_precision_idx is transmitted as 1, the corresponding block vector has a spatial resolution of 1/2-pel.
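The frequency-based list construction and the index lookup described above can be sketched as follows. The neighbor resolution values used here are hypothetical and are not taken from fig. 10; the function name is likewise an assumption, and ties are resolved by first occurrence only because that is how `Counter.most_common` orders equal counts.

```python
from collections import Counter
from typing import List


def build_resolution_list(neighbor_resolutions: List[str]) -> List[str]:
    """Order the distinct resolutions so that the most frequent comes first."""
    counts = Counter(neighbor_resolutions)
    return [resolution for resolution, _ in counts.most_common()]


# Hypothetical resolutions used by already-coded neighboring blocks.
neighbors = ["1-pel", "1/2-pel", "1-pel", "4-pel", "1/2-pel", "1-pel"]
candidate_list = build_resolution_list(neighbors)

# The signaled index then selects one entry of the shared list.
amvr_precision_idx = 1
selected_resolution = candidate_list[amvr_precision_idx]
```

Placing frequent resolutions first keeps the most likely index values small, which is generally cheaper to signal.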

Example 1-2: method for using multiple block vector spatial resolution lists

In this implementation embodiment, the video encoding apparatus uses signaling to determine both a block vector spatial resolution candidate list and the spatial resolution of the block vector. As described above, the video encoding apparatus classifies various block vector spatial resolution candidates into a plurality of groups according to preset conditions, and then constructs a plurality of block vector spatial resolution candidate lists.

In this case, as a criterion for classifying a plurality of groups, first, a type of spatial resolution of a block vector such as an integer pixel unit or a sub-pixel unit may be used. Next, the positions of neighboring blocks of the current block may be used. For example, one list may include block vector spatial resolutions of blocks located at an upper portion (top, upper right, upper left, etc.) of the current block, and another list may include block vector spatial resolutions of blocks located at a left side (left side, upper left, lower left, etc.) of the current block.

As an implementation, assume that the available spatial resolution is {1/4-pel,1-pel,2-pel,4-pel,1/2-pel }. When the block vector spatial resolution values are classified in integer pixel units and sub-pixel units, the video encoding apparatus may construct two spatial resolution candidate lists, as shown in table 2.

[ Table 2 ]

amvr_set_idx	Block vector spatial resolution candidate list
0	{1, 2, 4}
1	{1/2, 1/4}

According to table 2, the video encoding apparatus may select one of a plurality of block vector spatial resolution candidate lists using amvr_set_idx, and then may determine a block vector spatial resolution using amvr_precision_idx. For example, when amvr_set_idx is 1 and amvr_precision_idx is 1, the block vector spatial resolution is determined to be 1/4-pel.
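As a non-limiting illustration, the grouping of table 2 and the two-index selection can be sketched as follows. The names `classify_resolutions` and `select_resolution` are hypothetical, and the ordering within each list (ascending for integer units, coarser values first for sub-pixel units) is an assumption chosen so that the result matches table 2.

```python
from fractions import Fraction

def classify_resolutions(resolutions):
    """Split resolutions (in pel units) into integer-pixel and sub-pixel lists."""
    integer_units = sorted(r for r in resolutions if r.denominator == 1)
    sub_pixel_units = sorted((r for r in resolutions if r.denominator != 1),
                             reverse=True)
    # Key 0 / 1 plays the role of amvr_set_idx in table 2.
    return {0: integer_units, 1: sub_pixel_units}

def select_resolution(candidate_lists, amvr_set_idx, amvr_precision_idx):
    """Pick a list with amvr_set_idx, then an entry with amvr_precision_idx."""
    return candidate_lists[amvr_set_idx][amvr_precision_idx]

available = [Fraction(1, 4), Fraction(1), Fraction(2), Fraction(4), Fraction(1, 2)]
candidate_lists = classify_resolutions(available)
chosen = select_resolution(candidate_lists, amvr_set_idx=1, amvr_precision_idx=1)
```

Under these assumptions, `chosen` is 1/4-pel, which agrees with the example given for amvr_set_idx equal to 1 and amvr_precision_idx equal to 1.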

< embodiment 2> use of one spatial resolution in a subpixel Unit without separate Signal Transmission

In this implementation embodiment, the video encoding device uses one block vector spatial resolution without separate signal transmission. In the related art, for a CU using the IBC mode, the amvr_precision_idx is always transmitted. However, in the present embodiment, the video encoding apparatus may use block vectors having various spatial resolutions without transmitting amvr_precision_idx. In order to use various block vector spatial resolutions without separate signal transmission, a video encoding apparatus may consider at least the following.

First, the video encoding apparatus considers whether to use a block vector of a neighboring block. Further, the video encoding apparatus considers block vector spatial resolution values of neighboring blocks (upper, upper right, upper left, lower left, etc.). In addition, the video encoding apparatus may consider a preset block vector spatial resolution.

For example, when the preset block vector spatial resolution is 1/2-pel, the video encoding apparatus may use a block vector having 1/2-pel spatial resolution without transmitting amvr_flag and amvr_precision_idx.

As another embodiment, the distribution of block vector spatial resolutions of neighboring blocks may be considered as follows. When the AMVR is not applied to the upper block and the left block, the video encoding apparatus uses a preset spatial resolution. Here, the preset spatial resolution may be one of spatial resolution values, such as 4-pel, 2-pel, 1/2-pel, or 1/4-pel. Alternatively, when the AMVR is applied to one of the upper block and the left block, the video encoding apparatus may use a block vector spatial resolution of the block to which the AMVR is applied. Alternatively, when the AMVR is applied to the upper block and the left block, the video encoding apparatus may use a block vector spatial resolution of one of the two blocks according to a preset method. Here, the preset method may be one of a method using a more accurate spatial resolution, a method using a less accurate spatial resolution, and a method using a preset spatial resolution.

For example, as shown in fig. 11, when the AMVR is applied to the upper block and the spatial resolution of the corresponding block vector is 1/2-pel and the AMVR is not applied to the left block, the video encoding apparatus may determine the block vector spatial resolution of the current block to be 1/2-pel.
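As a non-limiting illustration, the neighbor-based derivation described above can be sketched as follows. The function name `derive_resolution`, the use of `None` to mark a neighbor to which the AMVR is not applied, and the rule names "finer", "coarser", and "preset" are all hypothetical labels for the preset methods mentioned in the text.

```python
from fractions import Fraction

def derive_resolution(upper, left, preset, rule="finer"):
    """Derive the block vector spatial resolution without any signaled index.

    upper / left: resolution (in pel units) of the neighbor, or None when
    the AMVR is not applied to that neighbor.
    """
    if upper is None and left is None:
        return preset                      # neither neighbor uses the AMVR
    if upper is None or left is None:
        return upper if upper is not None else left  # exactly one neighbor uses it
    # Both neighbors use the AMVR: apply the preset method.
    if rule == "finer":
        return min(upper, left)            # smaller pel value = more accurate
    if rule == "coarser":
        return max(upper, left)
    if rule == "preset":
        return preset
    raise ValueError(f"unknown rule: {rule}")

# Case of fig. 11: upper block uses 1/2-pel, left block does not use the AMVR.
resolution = derive_resolution(Fraction(1, 2), None, preset=Fraction(1))
```

Because the same inputs are available to the video decoding apparatus, the same function could be evaluated on both sides without transmitting amvr_precision_idx.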

As another embodiment, when there are a plurality of upper or left blocks, the video encoding apparatus may select a representative block from the plurality of blocks, and then may consider the spatial resolution of the block vector of the selected representative block, unlike the illustration of fig. 11. The representative block may be determined according to one of the following methods. Among the upper blocks, the leftmost block, the center block, or the rightmost block may be determined as a representative upper block. Alternatively, among the left blocks, the uppermost block, the center block, or the lowermost block may be determined as a representative left block.

Alternatively, the video encoding apparatus may consider the most frequently used block vector spatial resolution among the plurality of blocks without determining the representative block. For example, when the block vector spatial resolution distribution of the upper block is the same as the diagram of fig. 12, 1/4-pel, which is the most frequently used block vector spatial resolution, may be the upper block vector spatial resolution.
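As a non-limiting illustration, the two alternatives for handling a plurality of upper (or left) blocks can be sketched as follows. The function names and the sample values are hypothetical; the actual distribution of fig. 12 is not reproduced here.

```python
from collections import Counter
from fractions import Fraction

def representative_resolution(resolutions, position="center"):
    """Use one representative block from an ordered row (or column) of blocks."""
    index = {"first": 0, "center": len(resolutions) // 2, "last": -1}[position]
    return resolutions[index]

def most_frequent_resolution(resolutions):
    """Use the most frequently used resolution instead of a representative block."""
    return Counter(resolutions).most_common(1)[0][0]

# Hypothetical resolutions of five upper blocks, ordered left to right.
upper_blocks = [Fraction(1, 2), Fraction(1, 4), Fraction(1),
                Fraction(1, 4), Fraction(1, 4)]
by_center = representative_resolution(upper_blocks, "center")
by_frequency = most_frequent_resolution(upper_blocks)
```

With these assumed values the center block yields 1-pel, whereas the frequency-based choice yields 1/4-pel, which shows that the two alternatives can lead to different upper block vector spatial resolutions.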

< embodiment 3> determination of use of embodiment 1 or embodiment 2 based on a flag

In this implementation embodiment, the video encoding apparatus may determine whether to apply embodiment 1 or embodiment 2 using a flag. In other words, the video encoding apparatus may control use of the methods of embodiment 1 and embodiment 2 using abvr_enable_flag. For example, when abvr_enable_flag is 1, the video encoding apparatus may use the method of embodiment 1 or embodiment 2, and when abvr_enable_flag is 0, the video encoding apparatus may not apply these implementation embodiments. Here, "abvr" in the flag name stands for adaptive block vector resolution.

III. Adaptive signaling of the sign of the block vector difference

< embodiment 4> method of deriving the sign of the block vector difference

In this implementation embodiment, when the IBC technique is applied and the block vector predictor is near the edge of the region that the current block can reference, the video encoding apparatus derives the sign of the block vector difference and uses the derived sign. The non-referenceable region of the current block may include a region that has not yet been restored, a region of another slice, a region of another CTU, a region outside the virtual buffer, and the like. By deriving the sign of the block vector difference, signaling and parsing of the syntax element indicating the sign of the difference are omitted, and coding efficiency can be improved.

In the prior art, a video encoding apparatus first generates a block vector predictor and then obtains a block vector difference. The video encoding apparatus may calculate the final block vector by combining the block vector difference and the block vector spatial resolution, but in a specific case it is not necessary to transmit the sign of the block vector difference. This may occur more frequently when the block vector spatial resolution is obtained before the block vector difference.

For example, as in the illustration of fig. 13A, assume that the reference block indicated by the BVP is at the edge of the region that the current block can reference, the block vector spatial resolution is 4-pel, and the vertical element of the block vector difference is 2. When the sign of the vertical element of the block vector difference is negative (-), the reference block at the position indicated by the block vector may include an unrestored region, as shown in fig. 13A. Thus, the sign of the vertical element of the block vector difference can only be positive (+). On the other hand, as in the illustration of fig. 13B, if the vertical element of the block vector difference is 2 and the block vector spatial resolution is 1-pel, the sign of the vertical element of the block vector difference may be either positive (+) or negative (-). Thus, the video encoding apparatus may maximize the efficiency of this embodiment by obtaining the block vector spatial resolution before obtaining the sign of the block vector difference.
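As a non-limiting illustration, the effect of the spatial resolution on the set of admissible signs can be sketched for the vertical element alone. The geometry is hypothetical and is not taken from fig. 13A or fig. 13B: the referenceable rows are assumed to be [region_top, region_bottom), the reference block indicated by the BVP is assumed to start 4 rows inside that region, and the magnitude in samples is assumed to be the signaled value multiplied by the resolution in pel units.

```python
def allowed_vertical_signs(ref_top, block_height, abs_bvd_y, resolution_pel,
                           region_top, region_bottom):
    """Return the signs (+1 / -1) for which the shifted block stays referenceable."""
    offset = abs_bvd_y * resolution_pel   # magnitude after applying the resolution
    signs = []
    for sign in (+1, -1):
        top = ref_top + sign * offset
        if region_top <= top and top + block_height <= region_bottom:
            signs.append(sign)
    return signs

# Hypothetical layout: rows 0..63 are referenceable, 8-row block starting at row 4.
signs_4pel = allowed_vertical_signs(4, 8, abs_bvd_y=2, resolution_pel=4,
                                    region_top=0, region_bottom=64)
signs_1pel = allowed_vertical_signs(4, 8, abs_bvd_y=2, resolution_pel=1,
                                    region_top=0, region_bottom=64)
```

In this assumed layout only the positive sign survives at 4-pel, so the sign need not be signaled, while at 1-pel both signs remain possible and the sign has to be signaled.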

Meanwhile, the video encoding apparatus may derive the block vector difference according to the present implementation embodiment as follows.

Let $\mathrm{BVP} = (\mathrm{BVP}_x, \mathrm{BVP}_y)$ denote the block vector predictor, and let $\mathrm{BVD} = (\mathrm{BVD}_x, \mathrm{BVD}_y)$ denote the absolute value of the block vector difference to which the block vector spatial resolution has been applied. Here, $\mathrm{BVD}_x$ and $\mathrm{BVD}_y$ represent the horizontal and vertical elements of the BVD, respectively. The video encoding apparatus may generate four block vector candidates as shown in equation 1 by differently combining the signs of the horizontal and vertical elements of the BVD.

[ Eq.1 ]

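$$\mathrm{BV}_1 = (\mathrm{BVP}_x + \mathrm{BVD}_x,\ \mathrm{BVP}_y + \mathrm{BVD}_y)$$

$$\mathrm{BV}_2 = (\mathrm{BVP}_x + \mathrm{BVD}_x,\ \mathrm{BVP}_y - \mathrm{BVD}_y)$$

$$\mathrm{BV}_3 = (\mathrm{BVP}_x - \mathrm{BVD}_x,\ \mathrm{BVP}_y + \mathrm{BVD}_y)$$

$$\mathrm{BV}_4 = (\mathrm{BVP}_x - \mathrm{BVD}_x,\ \mathrm{BVP}_y - \mathrm{BVD}_y)$$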

The video encoding apparatus generates a reference block at the position indicated by each of the four block vector candidates shown in equation 1. In this case, when the reference block at the position indicated by a specific block vector candidate includes a region that cannot be referenced, the video encoding apparatus cannot use the corresponding block vector candidate. Once the available candidates among the four candidates are determined in this way, the video encoding apparatus may omit encoding of the sign of the block vector difference when that sign can be determined to be positive or negative. In addition, the video decoding apparatus may derive the sign of the block vector difference without parsing, and then may use the derived sign. For example, when only $\mathrm{BV}_1$ and $\mathrm{BV}_2$ among the four block vector candidates are available, the sign of the horizontal element of the block vector difference is not encoded because it is derived as positive, whereas the sign of the vertical element is encoded.
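As a non-limiting illustration, the per-element sign derivation from the available candidates can be sketched as follows. The function name `derive_signs` and the predicate `is_referenceable` are hypothetical; in particular, the predicate stands in for the check of whether the reference block at the candidate position includes a non-referenceable region, whose details depend on the implementation.

```python
from itertools import product

def derive_signs(bvp, abs_bvd, is_referenceable):
    """Return (sign_x, sign_y): +1 or -1 when derivable, None when it must be signaled."""
    # Sign combinations in the order BV1 (+,+), BV2 (+,-), BV3 (-,+), BV4 (-,-).
    available = [
        (sx, sy)
        for sx, sy in product((+1, -1), repeat=2)
        if is_referenceable((bvp[0] + sx * abs_bvd[0], bvp[1] + sy * abs_bvd[1]))
    ]

    def unique_or_none(values):
        distinct = set(values)
        return distinct.pop() if len(distinct) == 1 else None

    return (unique_or_none(sx for sx, _ in available),
            unique_or_none(sy for _, sy in available))

# Hypothetical constraint: candidates left of x = -16 are not referenceable,
# so only BV1 and BV2 remain available.
signs = derive_signs(bvp=(-16, -8), abs_bvd=(8, 4),
                     is_referenceable=lambda bv: bv[0] >= -16)
```

Here `signs` is `(1, None)`: the horizontal sign is derived as positive and is not coded, while the vertical sign remains ambiguous and is coded, which mirrors the example with $\mathrm{BV}_1$ and $\mathrm{BV}_2$ above.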

Meanwhile, when the horizontal element or the vertical element of the block vector difference is 0, two block vector candidates may be used instead of four block vector candidates. For example, when the horizontal element of the difference of the block vectors is 0, the video encoding apparatus may use the two candidates shown in equation 2, and when the vertical element is 0, the video encoding apparatus may use the two candidates shown in equation 3.

[ Eq.2 ]

$$\mathrm{BV}_1 = (\mathrm{BVP}_x,\ \mathrm{BVP}_y + \mathrm{BVD}_y)$$

$$\mathrm{BV}_2 = (\mathrm{BVP}_x,\ \mathrm{BVP}_y - \mathrm{BVD}_y)$$

[ Eq.3 ]

$$\mathrm{BV}_1 = (\mathrm{BVP}_x + \mathrm{BVD}_x,\ \mathrm{BVP}_y)$$

$$\mathrm{BV}_2 = (\mathrm{BVP}_x - \mathrm{BVD}_x,\ \mathrm{BVP}_y)$$
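As a non-limiting illustration, equations 1 to 3 can be produced by a single candidate generator that skips the sign of a zero element. The function name `block_vector_candidates` is hypothetical, and the numeric values are chosen only for the example.

```python
def block_vector_candidates(bvp, abs_bvd):
    """List candidates in the order of equation 1; a zero element has no sign."""
    x_signs = (+1, -1) if abs_bvd[0] != 0 else (+1,)
    y_signs = (+1, -1) if abs_bvd[1] != 0 else (+1,)
    return [
        (bvp[0] + sx * abs_bvd[0], bvp[1] + sy * abs_bvd[1])
        for sx in x_signs
        for sy in y_signs
    ]

four = block_vector_candidates((-16, -8), (8, 4))      # equation 1
vertical_only = block_vector_candidates((-16, -8), (0, 4))    # equation 2
horizontal_only = block_vector_candidates((-16, -8), (8, 0))  # equation 3
```

Treating a zero element as having a single sign keeps the candidate order consistent with equation 1 while avoiding duplicate candidates.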

Hereinafter, a method for adaptively generating a block vector of a current block by a video encoding apparatus or a video decoding apparatus based on embodiment 1 and embodiment 4 is described using the diagrams in fig. 14 and 15.

The video encoding apparatus obtains a block vector predictor index, an absolute value of a block vector difference, and a block vector spatial resolution precision index from a high level (S1400). In addition, the video encoding apparatus may encode the block vector predictor index, the absolute value of the block vector difference, and the block vector spatial resolution precision index, and then may signal them to the video decoding apparatus.

After generating the block vector predictor candidate list of the current block, the video encoding apparatus generates a block vector predictor from the block vector predictor candidate list using the block vector predictor index (S1402). The video encoding apparatus may generate a block vector predictor candidate list of the current block similar to a method for generating an AMVP candidate list of the inter prediction.

After generating the block vector spatial resolution candidate list, the video encoding apparatus generates a block vector spatial resolution from the block vector spatial resolution candidate list using the block vector spatial resolution precision index (S1404).

The video encoding apparatus may construct various block vector spatial resolution candidate lists according to an embodiment, and may use various methods of constructing the lists. For example, the video encoding apparatus may construct a block vector spatial resolution candidate list using at least one of the following as an element of the candidate list.

First, the block vector spatial resolution of neighboring blocks located at the upper, upper right, upper left, lower left, etc. of the current block may be used. In addition, the block vector spatial resolution previously used in the encoding order may be used. In addition, a preset block vector spatial resolution may be used. Finally, a block vector spatial resolution determined based on the frequency of use of the block vector spatial resolution may be used. When constructing a candidate list based on the frequencies of use, the spatial resolution of the block vector with the higher frequencies of use may be placed in front of the candidate list.

Meanwhile, the block vector spatial resolution candidate list may include at least one spatial resolution value in units of integer pixels and at least one spatial resolution value in units of sub-pixels.

The video encoding apparatus derives or obtains a sign of the block vector difference by using the block vector spatial resolution, the block vector predictor, and the absolute value of the block vector difference (S1406).

The video encoding apparatus may derive or obtain the sign of the block vector difference using the following steps.

The video encoding apparatus applies the block vector spatial resolution to the block vector predictor and the absolute value of the block vector difference (S1420).

The video encoding apparatus generates block vector candidates by differently combining the signs of the horizontal and vertical elements of the block vector difference with the block vector predictor (S1422).

The video encoding apparatus generates a reference block of the current block at a position indicated by each block vector candidate (S1424).

The video encoding apparatus determines whether a corresponding block vector candidate is included in the available block vector candidates according to whether the reference block includes a region that cannot be referenced (S1426).

The video encoding apparatus derives or obtains a sign of the block vector difference based on the available block vector candidates (S1428).

When determining available block vector candidates, the video encoding apparatus derives the sign of the block vector difference in the case where the sign of the block vector difference is determined to be positive or negative. Alternatively, in the case where the sign of the block vector difference is not determined to be positive or negative, the video encoding apparatus may obtain the sign of the block vector difference from a high level.

The video encoding apparatus generates a block vector difference by combining the absolute value of the block vector difference and the sign (S1408).

The video encoding apparatus generates a block vector by combining the block vector predictor and the block vector difference (S1410).

The video decoding apparatus decodes a block vector predictor index, an absolute value of a block vector difference, and a block vector spatial resolution precision index from a bitstream (S1500).

After generating the block vector predictor candidate list of the current block, the video decoding apparatus generates a block vector predictor from the block vector predictor candidate list using the block vector predictor index (S1502). The video decoding apparatus may generate a block vector predictor candidate list of the current block similar to the method for generating the AMVP candidate list of the inter prediction.

After generating the block vector spatial resolution candidate list, the video decoding apparatus generates a block vector spatial resolution from the block vector spatial resolution candidate list using the block vector spatial resolution precision index (S1504). Since the video decoding apparatus generates the block vector spatial resolution using the same method as the video encoding apparatus described above, an additional detailed description is omitted.

The video decoding apparatus derives or decodes a sign of the block vector difference by using the block vector spatial resolution, the block vector predictor, and the absolute value of the block vector difference (S1506).

The video decoding apparatus may derive or decode the sign of the block vector difference using the following steps.

The video decoding apparatus applies the block vector spatial resolution to the block vector predictor and the absolute value of the block vector difference (S1520).

The video decoding apparatus generates block vector candidates by differently combining the signs of the horizontal element and the vertical element of the block vector difference with the block vector predictor (S1522).

The video decoding apparatus generates a reference block of the current block at a position indicated by each block vector candidate (S1524).

The video decoding apparatus determines whether a corresponding block vector candidate is included among available block vector candidates according to whether the reference block includes an area that cannot be referenced (S1526).

The video decoding apparatus derives or decodes a sign of the block vector difference based on the available block vector candidates (S1528).

When determining available block vector candidates, the video decoding apparatus derives the sign of the block vector difference in the case where the sign of the block vector difference is determined to be positive or negative. Alternatively, in the case where the sign of the block vector difference is not determined to be positive or negative, the video decoding apparatus may decode the sign of the block vector difference from the bitstream.

The video decoding apparatus generates a block vector difference by combining the absolute value of the block vector difference and the sign (S1508).

The video decoding apparatus generates a block vector by combining the block vector predictor and the block vector difference (S1510).
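As a non-limiting illustration, steps S1500 to S1510 on the decoding side can be sketched end to end as follows. All names are hypothetical: `is_referenceable` stands in for the reference-region check, and `read_sign` stands in for parsing a sign from the bitstream. The sketch also assumes that the block vector predictor is already expressed in sample units, so that the resolution is applied only to the magnitude of the difference.

```python
from fractions import Fraction
from itertools import product

def decode_block_vector(bvp_idx, abs_bvd, precision_idx, bvp_list, res_list,
                        is_referenceable, read_sign):
    bvp = bvp_list[bvp_idx]                           # S1502
    res = res_list[precision_idx]                     # S1504
    mag = (abs_bvd[0] * res, abs_bvd[1] * res)        # S1520 (magnitude only, assumption)
    available = [                                     # S1522 to S1526
        s for s in product((+1, -1), repeat=2)
        if is_referenceable((bvp[0] + s[0] * mag[0], bvp[1] + s[1] * mag[1]))
    ]
    signs = []
    for axis in (0, 1):                               # S1528
        options = {s[axis] for s in available}
        if mag[axis] == 0:
            signs.append(+1)                 # a zero element carries no sign
        elif len(options) == 1:
            signs.append(options.pop())      # derived, nothing is parsed
        else:
            signs.append(read_sign(axis))    # decoded from the bitstream
    bvd = (signs[0] * mag[0], signs[1] * mag[1])      # S1508
    return (bvp[0] + bvd[0], bvp[1] + bvd[1])         # S1510

parsed_axes = []
def read_sign(axis):
    """Stand-in for bitstream parsing; records which signs were actually read."""
    parsed_axes.append(axis)
    return -1

bv = decode_block_vector(
    bvp_idx=0, abs_bvd=(2, 1), precision_idx=2,
    bvp_list=[(-16, -8)],
    res_list=[Fraction(1), Fraction(1, 2), Fraction(4)],
    is_referenceable=lambda v: v[0] >= -16,   # hypothetical region constraint
    read_sign=read_sign,
)
```

In this assumed case the 4-pel resolution scales the magnitude to (8, 4), the horizontal sign is derived as positive, and only the vertical sign is read, giving the block vector (-8, -12).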

Hereinafter, a method for adaptively generating a block vector of a current block by a video encoding apparatus or a video decoding apparatus based on embodiment 2 and embodiment 4 is described using the diagrams in fig. 16 and 17.

The video encoding apparatus obtains a block vector predictor index and an absolute value of a block vector difference from a high level (S1600). In addition, the video encoding apparatus may encode the block vector predictor index and the absolute value of the block vector difference, and then may signal them to the video decoding apparatus.

After generating the block vector predictor candidate list of the current block, the video encoding apparatus generates a block vector predictor from the block vector predictor candidate list using the block vector predictor index (S1602).

The video encoding apparatus derives a block vector spatial resolution (S1604).

The video encoding apparatus may use a preset block vector spatial resolution as the block vector spatial resolution. In this case, the preset block vector spatial resolution may be a spatial resolution in integer pixel units or sub-pixel units.

When the adaptive spatial resolution is not applied to the upper block and the left block of the current block, the video encoding apparatus may use a preset block vector spatial resolution as the block vector spatial resolution. In this case, the preset block vector spatial resolution may be a spatial resolution in integer pixel units or sub-pixel units. In addition, when the adaptive spatial resolution is applied to one of the upper block and the left block of the current block, the video encoding apparatus may use the spatial resolution of the block to which the adaptive spatial resolution is applied as the block vector spatial resolution. In addition, when the adaptive spatial resolution is applied to both the upper block and the left block of the current block, the video encoding apparatus may use the spatial resolution of one of the upper block and the left block as the block vector spatial resolution according to a preset method. Here, the preset method may be one of a method using a more accurate spatial resolution, a method using a less accurate spatial resolution, or a method using a preset spatial resolution.

The video encoding apparatus derives or obtains a sign of the block vector difference using the block vector spatial resolution, the block vector predictor, and the absolute value of the block vector difference (S1606). Since the step of deriving or acquiring the sign of the block vector difference by the video encoding apparatus is the same as in the diagram of fig. 14, an additional description is omitted.

The video encoding apparatus generates a block vector difference by combining the absolute value of the block vector difference and the sign (S1608).

The video encoding apparatus generates a block vector by combining the block vector predictor and the block vector difference (S1610).

The video decoding apparatus decodes the block vector predictor index and the absolute value of the block vector difference from the bitstream (S1700).

After generating the block vector predictor candidate list of the current block, the video decoding apparatus generates a block vector predictor from the block vector predictor candidate list using the block vector predictor index (S1702).

The video decoding apparatus derives a block vector spatial resolution (S1704). Since the video decoding apparatus derives the block vector spatial resolution using the same method as the video encoding apparatus described above, an additional detailed description is omitted.

The video decoding apparatus derives or decodes a sign of the block vector difference using the block vector spatial resolution, the block vector predictor, and the absolute value of the block vector difference (S1706). Since the step of deriving or decoding the sign of the block vector difference by the video decoding apparatus is the same as in the illustration of fig. 15, an additional description is omitted.

The video decoding apparatus generates a block vector difference by combining the absolute value of the block vector difference and the sign (S1708).

The video decoding apparatus generates a block vector by combining the block vector predictor and the block vector difference (S1710).

Although the steps in the various flowcharts are described as being performed sequentially, these steps merely exemplify the technical concepts of some embodiments of the present disclosure. Accordingly, one of ordinary skill in the art to which the present disclosure pertains may perform the steps by changing the order depicted in the various figures or by performing two or more steps in parallel. Therefore, the steps in the respective flowcharts are not limited to the time series order shown.

It should be understood that the above description presents illustrative embodiments that may be implemented in various other ways. The functionality described in some embodiments may be implemented by hardware, software, firmware, and/or combinations thereof. It should also be appreciated that the functional components described in this specification are labeled with a "… … unit" to strongly emphasize their independent implementation possibilities.

Meanwhile, various methods or functions described in some embodiments may be implemented as instructions stored in a non-transitory recording medium that can be read and executed by one or more processors. For example, the non-transitory recording medium may include various types of recording apparatuses in which data is stored in a form readable by a computer system. For example, the non-transitory recording medium may include a storage medium such as an erasable programmable read-only memory (EPROM), a flash memory drive, an optical disk drive, a magnetic hard disk drive, a Solid State Drive (SSD), and the like.

Although embodiments of the present disclosure have been described for illustrative purposes, those skilled in the art to which the present disclosure pertains will appreciate that various modifications, additions and substitutions are possible, without departing from the scope and spirit of the present disclosure. Accordingly, embodiments of the present disclosure have been described for brevity and clarity. The scope of the technical idea of the embodiments of the present disclosure is not limited by the drawings. Thus, it will be understood by those of ordinary skill in the art to which this disclosure pertains that the scope of this disclosure should not be limited by the embodiments explicitly described above, but rather by the claims and their equivalents.

(reference numerals)

124: inter predictor

510: entropy decoder

544: inter predictor

Claims

1. A method performed by a video decoding apparatus of adaptively generating a block vector of a current block in an Intra Block Copy (IBC) mode, the method comprising:

decoding a block vector predictor index, an absolute value of a block vector difference, and a block vector spatial resolution precision index from a bitstream;

generating a block vector predictor candidate list for the current block, and generating a block vector predictor from the block vector predictor candidate list using the block vector predictor index;

generating a block vector spatial resolution candidate list, and generating a block vector spatial resolution from the block vector spatial resolution candidate list using the block vector spatial resolution precision index;

deriving or decoding a sign of the block vector difference using the block vector spatial resolution, the block vector predictor and an absolute value of the block vector difference;

generating the block vector difference by combining an absolute value of the block vector difference and the sign; and

generating the block vector by combining the block vector predictor and the block vector difference.

2. The method of claim 1, wherein the block vector spatial resolution candidate list includes all or some of block vector spatial resolutions of neighboring blocks of the current block, previously used block vector spatial resolutions in decoding order, preset block vector spatial resolutions, and block vector spatial resolutions determined based on a use frequency, as candidates of the block vector spatial resolutions.

3. The method of claim 2, wherein the block vector spatial resolution candidate list places a block vector spatial resolution with a higher frequency of use in front of the block vector spatial resolution candidate list.

4. The method of claim 1, wherein the block vector spatial resolution candidate list comprises at least one spatial resolution value in integer pixels and at least one spatial resolution value in sub-pixels.

5. The method of claim 1, wherein deriving or decoding the sign comprises:

applying the block vector spatial resolution to the block vector predictor and the absolute value of the block vector difference;

generating block vector candidates by differently combining the signs of the block vector difference with the block vector predictor;

generating a reference block for the current block at a location indicated by each of the block vector candidates;

determining whether a corresponding block vector candidate is included in available block vector candidates according to whether the reference block includes an area that cannot be referred to; and

deriving or decoding the sign of the block vector difference based on the available block vector candidates.

6. The method of claim 5, wherein deriving or decoding the sign when the available block vector candidates are determined comprises:

deriving the sign of the block vector difference if the sign of the block vector difference is determined to be positive or negative; and

decoding the sign of the block vector difference from the bitstream if the sign of the block vector difference is not determined to be positive or negative.

7. A method performed by a video decoding apparatus of adaptively generating a block vector of a current block in an Intra Block Copy (IBC) mode, the method comprising:

decoding a block vector predictor index and an absolute value of a block vector difference from a bitstream;

deriving a block vector spatial resolution;

8. The method of claim 7, wherein deriving the block vector spatial resolution comprises:

using a preset block vector spatial resolution as the block vector spatial resolution, wherein the preset block vector spatial resolution is a spatial resolution in integer pixel units or sub-pixel units.

9. The method of claim 7, wherein deriving the block vector spatial resolution comprises:

when the adaptive spatial resolution is not applied to the upper block and the left block of the current block, a preset block vector spatial resolution is used as the block vector spatial resolution, wherein the preset block vector spatial resolution is a spatial resolution in integer pixel units or sub-pixel units.

10. The method of claim 9, wherein deriving the block vector spatial resolution comprises:

when the adaptive spatial resolution is applied to one of an upper block and a left block of the current block, a spatial resolution of a block to which the adaptive spatial resolution is applied is used as the block vector spatial resolution.

11. The method of claim 10, wherein deriving the block vector spatial resolution comprises:

when the adaptive spatial resolution is applied to an upper block and a left block of the current block, the spatial resolution of one of the upper block and the left block is used as the block vector spatial resolution according to a preset method.

12. A method performed by a video encoding apparatus of adaptively generating a block vector of a current block in an Intra Block Copy (IBC) mode, the method comprising:

obtaining a block vector predictor index and an absolute value of a block vector difference from a high level;

deriving a block vector spatial resolution;

deriving or obtaining a sign of the block vector difference using the block vector spatial resolution, the block vector predictor and an absolute value of the block vector difference;

13. The method of claim 12, wherein deriving the block vector spatial resolution comprises:

14. The method of claim 12, wherein deriving the block vector spatial resolution comprises:

when the adaptive spatial resolution is not applied to the upper and left blocks of the current block, a preset block vector spatial resolution is used as the block vector spatial resolution, wherein the preset block vector spatial resolution is a spatial resolution in integer pixel units or sub-pixel units.

15. The method of claim 14, wherein deriving the block vector spatial resolution comprises:

16. The method of claim 15, wherein deriving the block vector spatial resolution comprises:

17. The method of claim 12, wherein deriving or obtaining the sign comprises:

deriving or obtaining the sign of the block vector difference based on the available block vector candidates.

18. The method of claim 17, wherein deriving or obtaining the sign when the available block vector candidates are determined comprises:

obtaining the sign of the block vector difference from the high level in the case where the sign of the block vector difference is not determined to be positive or negative.