WO2022114669A2 - Image coding using a neural network - Google Patents

Image coding using a neural network

Info

Publication number
WO2022114669A2
WO2022114669A2 PCT/KR2021/016973 KR2021016973W WO2022114669A2 WO 2022114669 A2 WO2022114669 A2 WO 2022114669A2 KR 2021016973 W KR2021016973 W KR 2021016973W WO 2022114669 A2 WO2022114669 A2 WO 2022114669A2
Authority
WO
WIPO (PCT)
Prior art keywords
encoding
tree
target
test
block
Prior art date
Application number
PCT/KR2021/016973
Other languages
English (en)
Korean (ko)
Other versions
WO2022114669A3 (fr
Inventor
박상효
Original Assignee
경북대학교 산학협력단
Priority date
Filing date
Publication date
Priority claimed from KR1020210030727A external-priority patent/KR102466258B1/ko
Application filed by 경북대학교 산학협력단 filed Critical 경북대학교 산학협력단
Publication of WO2022114669A2 publication Critical patent/WO2022114669A2/fr
Publication of WO2022114669A3 publication Critical patent/WO2022114669A3/fr

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding

Definitions

  • the present disclosure relates to encoding of an image (video) using a neural network.
  • Since video data is much larger than audio data or still image data, storing or transmitting it without compression requires a large amount of hardware resources, including memory.
  • Accordingly, when video data is stored or transmitted, an encoder is generally used to compress the video data first, and a decoder receives, decompresses, and reproduces the compressed video data.
  • Video compression technologies include H.264/AVC and High Efficiency Video Coding (HEVC), as well as Versatile Video Coding (VVC), which improves coding efficiency by about 30% or more compared to HEVC.
  • MTT multi-type tree
  • BT binary tree
  • TT ternary tree
  • More specifically, the present disclosure relates to a video encoding method for performing versatile video coding (VVC) that, in order to reduce the computational complexity of intra prediction, uses a light-weight neural network based on explicit and derived features extracted in the coding unit (CU) encoding process.
  • An object of the present invention is to provide an image encoding method for determining a ternary tree among multi-type trees (MTT) using a light-weight neural network (LNN).
  • A method of encoding an image is provided for determining the split type and direction of a target coding unit (CU) corresponding to any one node of a tree structure. The method includes: performing an encoding test on the target CU to generate encoding costs; generating an explicit feature and a derived feature for the target CU, wherein the explicit feature includes at least part of the coding information generated by the encoding test on the target CU, and the derived feature is derived based on at least one of the encoding costs and the ternary tree (TT) split direction to be tested; and determining whether to skip the encoding test for TT splitting in the horizontal or vertical direction by inputting the explicit and derived features into one or more deep neural networks.
  • The method further includes generating a screening result by performing a screening test on the target CU based on the dual-tree structure and the multi-type tree (MTT) structure. When the screening result is true, the step of determining whether to skip the encoding test is performed; when the screening result is false, both horizontal TT splitting and vertical TT splitting are performed to generate the corresponding encoding costs.
  • An image encoding apparatus is provided that determines the split type and direction of a target coding unit (CU) corresponding to any one node of a tree structure. The apparatus includes: a cost estimator that performs an encoding test on the target CU to generate encoding costs for each of no split, quad tree (QT) split, and binary tree (BT) split; a feature extraction unit that generates an explicit feature and a derived feature for the target CU, wherein the explicit feature includes at least part of the coding information generated by the encoding test on the target CU, and the derived feature is derived based on at least one of the encoding costs and the ternary tree (TT) split direction to be tested; and a first TT division unit that determines whether to skip the encoding test for TT division in the horizontal or vertical direction by inputting the explicit and derived features into one or more deep neural networks.
  • The apparatus further includes a screening test unit that generates a screening result by performing a screening test on the target CU based on the dual-tree structure and the MTT structure, and a second TT division unit. When the screening result is true, the first TT division unit determines whether to skip the encoding test for TT division in the horizontal or vertical direction; when the screening result is false, the second TT division unit performs TT division in both the horizontal and vertical directions to generate the corresponding encoding costs.
  • According to the present embodiment, the ternary tree split within the multi-type tree (MTT) is determined using a lightweight neural network based on explicit and derived features extracted in the coding unit (CU) encoding process.
  • MTT Multi-Type Tree
  • CU coding unit
  • FIG. 1 is an exemplary block diagram of an image encoding apparatus that can implement techniques of the present disclosure.
  • FIG. 2 is a diagram for explaining a method of dividing a block using a QTBTTT structure.
  • FIG. 3 is a diagram illustrating a plurality of intra prediction modes including wide-angle intra prediction modes.
  • FIG. 4 is an exemplary diagram of a neighboring block of the current block.
  • FIG. 5 is a schematic flowchart of rate-distortion optimization in intra prediction among video encoding methods.
  • FIG. 6 is a schematic flowchart of rate-distortion optimization in intra prediction according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic illustration of a method for determining a ternary tree according to an embodiment of the present disclosure.
  • FIG. 8 is a schematic block diagram of an image encoding apparatus for determining a ternary tree according to an embodiment of the present disclosure.
  • FIG. 1 is an exemplary block diagram of an image encoding apparatus that can implement techniques of the present disclosure.
  • VVC versatile video coding
  • The image encoding apparatus may include a picture division unit 110, a prediction unit 120, a subtractor 130, a transform unit 140, a quantization unit 145, a reordering unit 150, an entropy encoding unit 155, an inverse quantization unit 160, an inverse transform unit 165, an adder 170, a loop filter unit 180, and a memory 190.
  • Each component of the image encoding apparatus may be implemented as hardware or software, or a combination of hardware and software.
  • In addition, the function of each component may be implemented as software, and a microprocessor may execute the software function corresponding to each component.
  • One image is composed of one or more sequences including a plurality of pictures.
  • Each picture is divided into a plurality of regions, and encoding is performed for each region.
  • one picture is divided into one or more tiles and/or slices.
  • one or more tiles may be defined as a tile group.
  • Each tile and/or slice is divided into one or more Coding Tree Units (CTUs).
  • CTUs Coding Tree Units
  • each CTU is divided into one or more CUs (Coding Units) by a tree structure.
  • Information applied to each CU is encoded as a syntax of the CU, and information commonly applied to CUs included in one CTU is encoded as a syntax of the CTU.
  • In addition, information commonly applied to all blocks in one slice is encoded as syntax of a slice header, and information applied to all blocks constituting one or more pictures is encoded in a picture parameter set (PPS) or a picture header.
  • PPS picture parameter set
  • information commonly referenced by a plurality of pictures is encoded in a sequence parameter set (SPS).
  • SPS sequence parameter set
  • VPS video parameter set
  • information commonly applied to one tile or tile group may be encoded as a syntax of a tile or tile group header. Syntax included in the SPS, PPS, slice header, tile or tile group header may be referred to as high-level syntax.
  • the picture divider 110 determines the size of a coding tree unit (CTU).
  • CTU size Information on the size of the CTU (CTU size) is encoded as a syntax of the SPS or PPS and transmitted to the video decoding apparatus.
  • The picture divider 110 divides each picture constituting an image into a plurality of coding tree units (CTUs) having a predetermined size, and then recursively divides each CTU using a tree structure.
  • a leaf node in the tree structure becomes a coding unit (CU), which is a basic unit of encoding.
  • CU coding unit
  • The tree structure may be a quadtree (QT) in which a parent node is divided into four child nodes of the same size, a binary tree (BT) in which a parent node is divided into two child nodes, a ternary tree (TT) in which a parent node is divided into three child nodes in a 1:2:1 ratio, or a structure in which two or more of the QT, BT, and TT structures are mixed.
  • a QuadTree plus BinaryTree (QTBT) structure may be used, or a QuadTree plus BinaryTree TernaryTree (QTBTTT) structure may be used.
  • Here, BT and TT together (BTTT) may be referred to as a Multiple-Type Tree (MTT).
  • MTT Multiple-Type Tree
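  • As a non-limiting illustration of the QT, BT, and TT split geometries described above, the child block sizes can be sketched as follows. The function name `split_block` and the mode labels are hypothetical and introduced only for this sketch. It assumes that a "horizontal" split uses horizontal split lines (and therefore divides the height), and that block dimensions are divisible by 4.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (width, height) in samples


def split_block(width: int, height: int, mode: str) -> List[Block]:
    """Return the child block sizes for one split of a width x height block.

    Assumption: a "horizontal" split uses horizontal split lines, so it
    divides the height; a "vertical" split divides the width.
    """
    if mode == "QT":    # four equally sized children
        return [(width // 2, height // 2)] * 4
    if mode == "BT_H":  # two children stacked top and bottom
        return [(width, height // 2)] * 2
    if mode == "BT_V":  # two children side by side
        return [(width // 2, height)] * 2
    if mode == "TT_H":  # three children in a 1:2:1 ratio along the height
        q = height // 4
        return [(width, q), (width, 2 * q), (width, q)]
    if mode == "TT_V":  # three children in a 1:2:1 ratio along the width
        q = width // 4
        return [(q, height), (2 * q, height), (q, height)]
    raise ValueError(f"unknown split mode: {mode}")


if __name__ == "__main__":
    for mode in ("QT", "BT_H", "BT_V", "TT_H", "TT_V"):
        print(mode, split_block(32, 32, mode))
```

  • Applying such a function recursively to each child yields a mixed QTBTTT partition of a CTU.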
  • FIG. 2 is a diagram for explaining a method of dividing a block using a QTBTTT structure.
  • the CTU may be first divided into a QT structure.
  • the quadtree splitting may be repeated until the size of a splitting block reaches the minimum block size of a leaf node (MinQTSize) allowed in QT.
  • a first flag (QT_split_flag) indicating whether each node of the QT structure is divided into four nodes of a lower layer is encoded by the entropy encoder 155 and signaled to the image decoding apparatus. If the leaf node of the QT is not larger than the maximum block size (MaxBTSize) of the root node allowed in the BT, it may be further divided into any one or more of the BT structure or the TT structure.
  • MaxBTSize maximum block size
  • a plurality of division directions may exist in the BT structure and/or the TT structure. For example, there may be two directions in which the block of the corresponding node is divided horizontally and vertically.
  • A second flag indicating whether a node is split and, if it is split, a flag indicating the split direction (vertical or horizontal) and/or a flag indicating the split type (binary or ternary) are encoded by the entropy encoder 155 and signaled to the video decoding apparatus.
  • Alternatively, a CU split flag (split_cu_flag) indicating whether the node is split may be encoded. When the value of the CU split flag (split_cu_flag) indicates that the node is not split, the block of the corresponding node becomes a leaf node in the split tree structure and becomes a coding unit (CU), which is the basic unit of coding. When the value of the CU split flag (split_cu_flag) indicates that the node is split, the image encoding apparatus starts encoding from the first flag in the manner described above.
  • A split flag (split_flag) indicating whether each node of the BT structure is split into blocks of a lower layer, and split type information indicating the split type, are encoded by the entropy encoder 155 and transmitted to the image decoding apparatus.
  • a type for dividing the block of the corresponding node into two blocks having an asymmetric shape may further exist.
  • the asymmetric form may include a form in which the block of the corresponding node is divided into two rectangular blocks having a size ratio of 1:3, or a form in which the block of the corresponding node is divided in a diagonal direction.
  • a CU may have various sizes depending on the QTBT or QTBTTT split from the CTU.
  • Hereinafter, a block corresponding to a CU to be encoded or decoded (i.e., a leaf node of the QTBTTT) is referred to as a 'current block'.
  • the shape of the current block may be not only a square but also a rectangle.
  • the prediction unit 120 generates a prediction block by predicting the current block.
  • the prediction unit 120 includes an intra prediction unit 122 and an inter prediction unit 124 .
  • each of the current blocks in a picture may be predictively coded.
  • Prediction of the current block can be performed using an intra prediction technique (using data from the picture containing the current block) or an inter prediction technique (using data from a picture coded before the picture containing the current block).
  • Inter prediction includes both uni-prediction and bi-prediction.
  • the intra prediction unit 122 predicts pixels in the current block by using pixels (reference pixels) located around the current block in the current picture including the current block.
  • a plurality of intra prediction modes exist according to a prediction direction.
  • the plurality of intra prediction modes may include two non-directional modes including a planar mode and a DC mode and 65 directional modes. According to each prediction mode, the neighboring pixels to be used and the calculation expression are defined differently.
  • In addition, directional modes numbered 67 to 80 and -1 to -14 may be used. These may be referred to as "wide-angle intra prediction modes".
  • Arrows in FIG. 3B indicate corresponding reference samples used for prediction, not prediction directions. The prediction direction is opposite to the direction indicated by the arrow.
  • the wide-angle intra prediction modes are modes in which a specific directional mode is predicted in the opposite direction without additional bit transmission when the current block is rectangular. In this case, among the wide-angle intra prediction modes, some wide-angle intra prediction modes available for the current block may be determined by the ratio of the width to the height of the rectangular current block.
  • For example, the wide-angle intra prediction modes having an angle smaller than 45 degrees are available when the current block has a rectangular shape with a height smaller than the width, and the wide-angle intra prediction modes having an angle greater than -135 degrees are available when the current block has a rectangular shape with a height greater than the width.
  • the intra prediction unit 122 may determine an intra prediction mode to be used for encoding the current block.
  • the intra prediction unit 122 may encode the current block using several intra prediction modes and select an appropriate intra prediction mode to use from the tested modes.
  • For example, the intra prediction unit 122 may calculate rate-distortion values using rate-distortion (RD) analysis for several tested intra prediction modes and select the intra prediction mode having the best rate-distortion characteristics among the tested modes.
  • RD rate-distortion
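  • As a non-limiting illustration, selecting the mode with the best rate-distortion characteristics can be sketched as follows. The Lagrangian cost form J = D + λ·R is a common convention assumed here, not a formula stated in this disclosure, and the names `rd_cost` and `select_best_mode` as well as the numeric values are hypothetical.

```python
from typing import Dict, Tuple


def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R (assumed form)."""
    return distortion + lam * rate_bits


def select_best_mode(tested: Dict[str, Tuple[float, float]], lam: float) -> str:
    """Return the tested mode with the smallest RD cost.

    `tested` maps a mode name to a (distortion, rate_bits) pair.
    """
    return min(tested, key=lambda mode: rd_cost(*tested[mode], lam))


if __name__ == "__main__":
    # Made-up measurements, for illustration only.
    tested = {
        "PLANAR": (120.0, 10.0),
        "DC": (150.0, 6.0),
        "ANGULAR_34": (90.0, 30.0),
    }
    print(select_best_mode(tested, lam=2.0))
```

  • A larger λ penalizes bit rate more heavily, so the selected mode can change with λ even when the measured distortions stay the same.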
  • the intra prediction unit 122 selects one intra prediction mode from among a plurality of intra prediction modes, and predicts the current block by using a neighboring pixel (reference pixel) determined according to the selected intra prediction mode and an equation.
  • Information on the selected intra prediction mode is encoded by the entropy encoder 155 and transmitted to an image decoding apparatus.
  • the inter prediction unit 124 generates a prediction block for the current block by using a motion compensation process.
  • the inter prediction unit 124 searches for a block most similar to the current block in the reference picture encoded and decoded before the current picture, and generates a prediction block for the current block using the searched block. Then, a motion vector (MV) corresponding to displacement between the current block in the current picture and the prediction block in the reference picture is generated.
  • MV motion vector
  • motion estimation is performed for a luma component, and a motion vector calculated based on the luma component is used for both the luma component and the chroma component.
  • Motion information including information on a reference picture and information on a motion vector used to predict the current block is encoded by the entropy encoder 155 and transmitted to the image decoding apparatus.
  • the inter prediction unit 124 may perform interpolation on a reference picture or reference block in order to increase prediction accuracy. That is, subsamples between two consecutive integer samples are interpolated by applying filter coefficients to a plurality of consecutive integer samples including the two integer samples.
  • When interpolation is performed on the reference picture, the motion vector can be expressed with fractional-sample precision rather than integer-sample precision.
  • the precision or resolution of the motion vector may be set differently for each unit of a target region to be encoded, for example, a slice, a tile, a CTU, or a CU.
  • AMVR adaptive motion vector resolution
  • information on the motion vector resolution to be applied to each target region should be signaled for each target region.
  • For example, when the target region is a CU, information on the motion vector resolution applied to each CU is signaled.
  • the information on the motion vector resolution may be information indicating the precision of a differential motion vector, which will be described later.
  • the inter prediction unit 124 may perform inter prediction using bi-prediction.
  • In bi-directional prediction, two reference pictures and two motion vectors, each indicating the position of the block most similar to the current block in the corresponding reference picture, are used.
  • The inter prediction unit 124 selects a first reference picture and a second reference picture from reference picture list 0 (RefPicList0) and reference picture list 1 (RefPicList1), respectively, and searches each reference picture for a block similar to the current block to generate a first reference block and a second reference block. The first reference block and the second reference block are then averaged or weighted-averaged to generate a prediction block for the current block.
  • Here, reference picture list 0 consists of pictures that precede the current picture in display order among the restored pictures, and reference picture list 1 consists of pictures that follow the current picture in display order among the restored pictures.
  • However, the present invention is not necessarily limited thereto: restored pictures that follow the current picture in display order may additionally be included in reference picture list 0, and, conversely, restored pictures that precede the current picture may additionally be included in reference picture list 1.
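  • As a non-limiting illustration of averaging or weighting the two reference blocks in bi-prediction, a minimal sketch follows. The function name `bi_predict`, the integer weights, and the round-half-up rule are assumptions made for this sketch; the disclosure does not specify them.

```python
from typing import List

Block = List[List[int]]  # rows of sample values


def bi_predict(ref0: Block, ref1: Block, w0: int = 1, w1: int = 1) -> Block:
    """Combine two equally sized reference blocks sample by sample.

    With w0 == w1 this is a plain average; other integer weights give a
    weighted average. Results are rounded half up (an assumed convention).
    """
    total = w0 + w1
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [
        [(w0 * a + w1 * b + total // 2) // total for a, b in zip(row0, row1)]
        for row0, row1 in zip(ref0, ref1)
    ]


if __name__ == "__main__":
    first_ref = [[100, 102], [98, 97]]
    second_ref = [[104, 103], [100, 99]]
    print(bi_predict(first_ref, second_ref))        # plain average
    print(bi_predict(first_ref, second_ref, 3, 1))  # weighted toward first_ref
```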
  • the motion information of the current block may be transmitted to the image decoding apparatus by encoding information for identifying the neighboring block. This method is called 'merge mode'.
  • the inter prediction unit 124 selects a predetermined number of merge candidate blocks (hereinafter referred to as 'merge candidates') from neighboring blocks of the current block.
  • a block located in a reference picture (which may be the same as or different from the reference picture used to predict the current block) other than the current picture in which the current block is located may be used as a merge candidate.
  • a block co-located with the current block in the reference picture or blocks adjacent to the co-located block may be further used as merge candidates.
  • the inter prediction unit 124 constructs a merge list including a predetermined number of merge candidates by using these neighboring blocks.
  • a merge candidate to be used as motion information of the current block is selected from among the merge candidates included in the merge list, and merge index information for identifying the selected candidate is generated.
  • The generated merge index information is encoded by the entropy encoder 155 and transmitted to the image decoding apparatus.
  • AMVP Advanced Motion Vector Prediction
  • the inter prediction unit 124 derives motion vector prediction candidates for the motion vector of the current block using neighboring blocks of the current block.
  • As the neighboring blocks used to derive the prediction motion vector candidates, all or some of the left block (L), the upper block (A), the upper right block (AR), the lower left block (BL), and the upper left block (AL) may be used.
  • In addition, a block located in a reference picture (which may be the same as or different from the reference picture used to predict the current block) other than the current picture in which the current block is located may be used as a neighboring block for deriving prediction motion vector candidates. For example, a block co-located with the current block in the reference picture, or blocks adjacent to the co-located block, may be used.
  • the inter prediction unit 124 derives prediction motion vector candidates by using the motion vectors of the neighboring blocks, and determines a predicted motion vector with respect to the motion vector of the current block by using the prediction motion vector candidates. Then, a differential motion vector is calculated by subtracting the predicted motion vector from the motion vector of the current block.
  • the prediction motion vector may be obtained by applying a predefined function (eg, a median value, an average value operation, etc.) to the prediction motion vector candidates.
  • In this case, the image decoding apparatus also knows the predefined function.
  • In addition, since the neighboring block used to derive the prediction motion vector candidate is a block that has already been encoded and decoded, the video decoding apparatus already knows the motion vector of the neighboring block. Therefore, the image encoding apparatus does not need to encode information for identifying the prediction motion vector candidate. Accordingly, in this case, information on the differential motion vector and information on the reference picture used to predict the current block are encoded.
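  • As a non-limiting illustration of the case described above, in which the predicted motion vector is obtained by a predefined function known to both sides, the following sketch uses a component-wise median and shows that the decoder can recover the motion vector from the differential motion vector alone. The function names and the restriction to an odd number of candidates are assumptions made for this sketch.

```python
from typing import List, Tuple

MV = Tuple[int, int]  # (horizontal, vertical) displacement


def median_predictor(candidates: List[MV]) -> MV:
    """Component-wise median of an odd number of candidate motion vectors."""
    if len(candidates) % 2 == 0:
        raise ValueError("this sketch expects an odd number of candidates")
    mid = len(candidates) // 2
    xs = sorted(mv[0] for mv in candidates)
    ys = sorted(mv[1] for mv in candidates)
    return (xs[mid], ys[mid])


def encode_mvd(mv: MV, candidates: List[MV]) -> MV:
    """Encoder side: differential MV = MV of current block - predicted MV."""
    mvp = median_predictor(candidates)
    return (mv[0] - mvp[0], mv[1] - mvp[1])


def decode_mv(mvd: MV, candidates: List[MV]) -> MV:
    """Decoder side: rebuild the MV from the same candidates and the MVD."""
    mvp = median_predictor(candidates)
    return (mvd[0] + mvp[0], mvd[1] + mvp[1])


if __name__ == "__main__":
    neighbors = [(4, -2), (6, 0), (5, 3)]  # made-up neighbor motion vectors
    mvd = encode_mvd((7, 1), neighbors)
    print(mvd, decode_mv(mvd, neighbors))
```

  • Because both sides derive the same predictor from already decoded neighbors, no candidate index needs to be transmitted in this case.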
  • the prediction motion vector may be determined by selecting any one of the prediction motion vector candidates.
  • information for identifying the selected prediction motion vector candidate is additionally encoded together with information on the differential motion vector and information on the reference picture used to predict the current block.
  • the subtractor 130 generates a residual block by subtracting the prediction block generated by the intra prediction unit 122 or the inter prediction unit 124 from the current block.
  • the transform unit 140 transforms the residual signal in the residual block having pixel values in the spatial domain into transform coefficients in the frequency domain.
  • The transform unit 140 may transform the residual signals in the residual block by using the entire residual block as a transform unit, or may divide the residual block into a plurality of sub-blocks and perform the transformation using the sub-blocks as transform units.
  • Alternatively, the residual block may be divided into two sub-blocks, a transform region and a non-transform region, and the residual signals may be transformed using only the transform-region sub-block as a transform unit.
  • the transform region subblock may be one of two rectangular blocks having a size ratio of 1:1 based on the horizontal axis (or vertical axis).
  • the flag (cu_sbt_flag) indicating that only the subblock has been transformed, the vertical/horizontal information (cu_sbt_horizontal_flag), and/or the position information (cu_sbt_pos_flag) are encoded by the entropy encoder 155 and signaled to the video decoding apparatus.
  • In addition, the transform region subblock may have a size ratio of 1:3 based on the horizontal axis (or vertical axis), which is likewise signaled to the decoding device.
  • the transform unit 140 may individually transform the residual block in a horizontal direction and a vertical direction.
  • various types of transformation functions or transformation matrices may be used.
  • a pair of transform functions for horizontal transformation and vertical transformation may be selected (Multiple Transform Selection: MTS).
  • MTS Multiple Transform Selection
  • the transform unit 140 may select one transform function pair having the best transform efficiency by using the MTS and transform the residual block in horizontal and vertical directions, respectively.
  • the information (mts_idx) on the transform function pair selected according to the MTS is encoded by the entropy encoder 155 and signaled to the image decoding apparatus.
  • the quantization unit 145 quantizes the transform coefficients output from the transform unit 140 using a quantization parameter, and outputs the quantized transform coefficients to the entropy encoding unit 155 .
  • the quantization unit 145 may directly quantize a related residual block for a certain block or frame without transformation.
  • the quantization unit 145 may apply different quantization coefficients (scaling values) according to positions of the transform coefficients in the transform block.
  • a quantization matrix applied to two-dimensionally arranged quantized transform coefficients may be encoded and signaled to an image decoding apparatus.
  • the rearrangement unit 150 may rearrange the coefficient values on the quantized residual values.
  • The reordering unit 150 may change a two-dimensional coefficient array into a one-dimensional coefficient sequence by using coefficient scanning. For example, the reordering unit 150 may output a one-dimensional coefficient sequence by scanning from the DC coefficient to coefficients in the high-frequency region using a zig-zag scan or a diagonal scan.
  • a vertical scan for scanning a two-dimensional coefficient array in a column direction and a horizontal scan for scanning a two-dimensional block shape coefficient in a row direction may be used instead of the zig-zag scan according to the size of the transform unit and the intra prediction mode. That is, a scanning method to be used among a zig-zag scan, a diagonal scan, a vertical scan, and a horizontal scan may be determined according to the size of the transform unit and the intra prediction mode.
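  • As a non-limiting illustration of turning a two-dimensional coefficient array into a one-dimensional sequence, the zig-zag, vertical, and horizontal scans can be sketched as follows. The function names are hypothetical, and the rule that picks a scan from the transform unit size and intra prediction mode is not reproduced here because the disclosure does not spell it out.

```python
from typing import List

Block = List[List[int]]  # rows of quantized coefficients


def zigzag_scan(block: Block) -> List[int]:
    """Scan anti-diagonals from the DC coefficient, alternating direction."""
    h, w = len(block), len(block[0])
    out: List[int] = []
    for s in range(h + w - 1):  # s = row + column of the anti-diagonal
        rows = range(max(0, s - w + 1), min(s, h - 1) + 1)
        if s % 2 == 0:  # even diagonals run from bottom-left to top-right
            rows = reversed(rows)
        out.extend(block[r][s - r] for r in rows)
    return out


def vertical_scan(block: Block) -> List[int]:
    """Scan column by column (column direction)."""
    h, w = len(block), len(block[0])
    return [block[r][c] for c in range(w) for r in range(h)]


def horizontal_scan(block: Block) -> List[int]:
    """Scan row by row (row direction)."""
    return [value for row in block for value in row]


if __name__ == "__main__":
    coeffs = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(zigzag_scan(coeffs))
    print(vertical_scan(coeffs))
    print(horizontal_scan(coeffs))
```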
  • The entropy encoding unit 155 generates a bitstream by encoding the sequence of one-dimensional quantized transform coefficients output from the reordering unit 150 using various encoding methods such as Context-based Adaptive Binary Arithmetic Code (CABAC) and Exponential Golomb.
  • The entropy encoding unit 155 also encodes block-splitting information such as the CTU size, CU split flag, QT split flag, MTT split type, and MTT split direction, so that the video decoding apparatus can split blocks in the same way as the video encoding apparatus.
  • In addition, the entropy encoder 155 encodes information on a prediction type indicating whether the current block is encoded by intra prediction or inter prediction and, according to the prediction type, encodes intra prediction information (i.e., information on the intra prediction mode) or inter prediction information (the merge index in the case of the merge mode; the reference picture index and information on the differential motion vector in the case of the AMVP mode).
  • the entropy encoder 155 encodes information related to quantization, that is, information about a quantization parameter and information about a quantization matrix.
  • the inverse quantization unit 160 inverse quantizes the quantized transform coefficients output from the quantization unit 145 to generate transform coefficients.
  • the inverse transform unit 165 reconstructs a residual block by transforming the transform coefficients output from the inverse quantization unit 160 from the frequency domain to the spatial domain.
  • the addition unit 170 restores the current block by adding the reconstructed residual block to the prediction block generated by the prediction unit 120 . Pixels in the reconstructed current block are used as reference pixels when intra-predicting the next block.
  • The loop filter unit 180 filters the reconstructed pixels to reduce blocking artifacts, ringing artifacts, blurring artifacts, and the like that arise from block-based prediction and transformation/quantization.
  • The loop filter unit 180 is an in-loop filter and may include all or some of a deblocking filter 182, a sample adaptive offset (SAO) filter 184, and an adaptive loop filter (ALF) 186.
  • The deblocking filter 182 filters the boundaries between reconstructed blocks to remove blocking artifacts caused by block-by-block encoding/decoding, and the SAO filter 184 and the ALF 186 perform additional filtering on the deblocking-filtered image.
  • the SAO filter 184 and the ALF 186 are filters used to compensate for a difference between a reconstructed pixel and an original pixel caused by lossy coding.
  • the SAO filter 184 improves encoding efficiency as well as subjective image quality by applying an offset in units of CTUs.
  • the ALF 186 performs block-by-block filtering, and the distortion is compensated by applying different filters by classifying the edge of the corresponding block and the degree of change.
  • Information on filter coefficients to be used in the ALF 186 may be encoded and signaled to an image decoding apparatus.
  • the restored block filtered through the deblocking filter 182 , the SAO filter 184 and the ALF 186 is stored in the memory 190 .
  • the reconstructed picture may be used as a reference picture for inter prediction of blocks in a picture to be encoded later.
  • This embodiment relates to encoding of an image (video) using a neural network. More specifically, it provides an image encoding method for performing versatile video coding (VVC) that, in order to reduce the computational complexity of intra prediction, determines a ternary tree among multi-type trees (MTT) using a light-weight neural network (LNN) based on the explicit and derived features extracted in the coding unit (CU) encoding process.
  • VVC versatile video coding
  • a block corresponding to one node of the tree structure is indicated as a CU.
  • a CU corresponds to a current block to be encoded.
  • FIG. 5 is a schematic flowchart of rate-distortion optimization in intra prediction among video encoding methods.
  • An apparatus for encoding an image performing VVC may determine a split type and direction of a CU based on rate-distortion (RD) optimization and split the CU into blocks of an MTT structure.
  • RD rate-distortion
  • the image encoding apparatus performs intra prediction on the CU (S500).
  • The image encoding apparatus calculates the rate-distortion cost J_cur for the current CU by performing intra prediction and transformation on the CU.
  • The process of calculating J_cur may be subdivided as follows based on a plurality of coding tools.
  • The image encoding apparatus performs a rate-distortion test on the 67 intra prediction modes shown in FIG. 3A (S520), and performs a rate-distortion test using Multiple Reference Line (MRL), which uses up to three neighboring reference lines for intra prediction (S522).
  • The video encoding apparatus performs a rate-distortion test by applying Intra Subblock Partition (ISP) with up to five division types (S524), performs a rate-distortion test by applying Transform Selection (TS) (S526), and performs a rate-distortion test by applying MTS with, for example, up to five transform pairs (S528).
  • ISP Intra Subblock Partition
  • TS Transform Selection
  • S526 Transform Selection
  • the image encoding apparatus may perform appropriate intra prediction for the CU based on the detailed procedures S520 to S528 and calculate the cost value J cur accordingly.
  • the image encoding apparatus may perform a test based on MTT division.
  • The video encoding apparatus performs a rate-distortion test by sequentially applying QT partitioning, BT_H (Binary Tree Horizontal) partitioning, BT_V (Binary Tree Vertical) partitioning, TT_H (Ternary Tree Horizontal) partitioning, and TT_V (Ternary Tree Vertical) partitioning (S502 to S510).
  • the video encoding apparatus may calculate costs J QT , J BT_H , J BT_V , J TT_H and J TT_V according to the MTT division application of QT, BT_H, BT_V, TT_H, and TT_V.
  • the image encoding apparatus determines an intra prediction mode and a partition structure for a CU based on the calculated costs (S512).
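  • The final decision in S512 can be sketched as picking the candidate with the lowest RD cost. This is a minimal illustration that assumes the decision is a plain minimum over the calculated costs; the labels in `CANDIDATES`, the function name `choose_partition`, and the numeric costs are hypothetical and do not come from the disclosure.

```python
import math

# Hypothetical labels for the candidates tested in S500 to S510.
CANDIDATES = ("NO_SPLIT", "QT", "BT_H", "BT_V", "TT_H", "TT_V")


def choose_partition(costs):
    """Return (label, cost) of the candidate with the lowest RD cost.

    `costs` maps a candidate label to its RD cost. A candidate that was
    not tested can be omitted or given math.inf so that it never wins.
    Ties are resolved in favor of the earlier entry in CANDIDATES.
    """
    best = min(CANDIDATES, key=lambda name: costs.get(name, math.inf))
    return best, costs.get(best, math.inf)


# Made-up cost values, for illustration only.
costs = {
    "NO_SPLIT": 1520.0,   # J cur
    "QT": 1490.5,         # J QT
    "BT_H": 1475.25,      # J BT_H
    "BT_V": 1502.0,       # J BT_V
    "TT_H": 1480.0,       # J TT_H
    "TT_V": math.inf,     # J TT_V, treated here as not tested
}
print(choose_partition(costs))
```

  • Representing an untested candidate by an infinite cost keeps the selection logic unchanged when a test is skipped.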
  • the video encoding apparatus additionally performs a new process (S600 to S616 in the example of FIG. 6 ) using LNN in addition to the rate-distortion optimization method as illustrated in FIG. 5 .
  • FIG. 6 is a schematic flowchart of rate-distortion optimization in intra prediction according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic illustration of a method for determining a ternary tree according to an embodiment of the present disclosure.
  • The process of determining the ternary tree includes feature extraction (S600 and S602), a screening test (S604), and, depending on the test result, either early TT division (S606 to S616) or the existing regular TT division (S508 and S510).
  • During the intra prediction, QT division, and BT division steps for the CU (S500 to S506), the video encoding apparatus extracts CU partitioning characteristics to be used as inputs to the LNN (S600 and S602).
  • These CU partitioning characteristics can be extracted based on (i) coding information according to intra prediction, (ii) parameters according to QT (Quad Tree) partitioning and MTT (Multi-Type Tree) partitioning, and (iii) the RD costs for the CU, QT partitioning, and BT (Binary Tree) partitioning.
  • the CU segmentation feature may include an explicit feature (EF) and a derived feature (DF).
  • The Coded Block Flag (CBF) is generated based on a flag indicating whether a transform tree structure exists for the CU, and the MTD is generated based on the depth of the MTT (i.e., BT/TT) division.
  • The Block Shape Ratio (BSR) is defined as lh/(lw+lh) in the case of TT_H division and as lw/(lw+lh) in the case of TT_V division, where lw and lh indicate the width and height of the luma block, respectively.
  • BTD denotes the BT Direction.
  • BTS denotes the BTs' Superiority. It is set to 1 if both J BT_H and J BT_V are less than J cur and J CU, and is set differently in the other cases.
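  • The BSR definition can be written as a small helper. This is a sketch only; the function name `block_shape_ratio` and the string labels `"TT_H"` and `"TT_V"` are assumptions introduced for illustration.

```python
def block_shape_ratio(lw, lh, tt_direction):
    """Block Shape Ratio for the TT direction under test.

    lw, lh: width and height of the luma block.
    tt_direction: "TT_H" or "TT_V" (hypothetical labels).
    """
    if lw <= 0 or lh <= 0:
        raise ValueError("block dimensions must be positive")
    if tt_direction == "TT_H":
        return lh / (lw + lh)
    if tt_direction == "TT_V":
        return lw / (lw + lh)
    raise ValueError("tt_direction must be 'TT_H' or 'TT_V'")


# A wide 32x8 luma block: low ratio for TT_H, high ratio for TT_V.
print(block_shape_ratio(32, 8, "TT_H"))
print(block_shape_ratio(32, 8, "TT_V"))
```

  • For any block, the two ratios sum to 1, so a block that scores high for one TT direction scores low for the other.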
  • The four characteristics IPM, QP, MRL, and MTS show a low correlation with the binary class, ranging from -0.07 to 0.02.
  • The PCC value between QTD and MTD is -0.65, a strong correlation, so only one of the two needs to be used. A similarly strong correlation exists between ISP and CBF. Therefore, in the present disclosure, five of the characteristics shown in Table 1 (BSR, BTD, BTS, MTD, and CBF) are selected as final characteristics, in order of the significance of their correlation with the binary class as shown in Table 2, and are used as inputs to the LNN.
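  • The correlation-based selection can be illustrated with a short sketch, assuming that PCC denotes the Pearson correlation coefficient. The function names and the toy data are hypothetical; the values are not measurements from the disclosure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def rank_by_correlation(features, labels):
    """Sort feature names by |PCC| with the binary class, strongest first."""
    scores = {name: pearson(values, labels) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Toy data: `labels` plays the role of the binary class (1 = TT was optimal).
labels = [1, 0, 1, 0]
features = {
    "feature_a": [1, 0, 1, 0],   # tracks the class exactly
    "feature_b": [1, 1, 0, 0],   # unrelated to the class
}
for name, score in rank_by_correlation(features, labels):
    print(name, round(score, 3))
```

  • The same `pearson` helper can also be applied to a pair of features to detect redundancy of the kind described for QTD and MTD.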
  • MTD and CBF can be calculated directly from a coding tool and a coding parameter after intra prediction for the current CU, and are therefore denoted as the explicit characteristic x EF. The remaining three characteristics can be calculated from the block shape and RD costs after BT division is performed, and are therefore denoted as the derived characteristic x DF.
  • The derived characteristic x DF may be generated in various situations.
  • The derived characteristic according to the present disclosure is applied when additional MTT splitting for a CU is allowed.
  • This assumption includes the special case where QT partitioning is not allowed for a CU but BT or TT partitioning is allowed. This special case covers the situations where the CU is a subtree of BT/TT, where the size of the CU does not satisfy the width/height condition for QT division, or where the depth of the CU does not satisfy the maximum depth of QT.
  • In this special case, J QT may be set to the maximum value of the RD cost.
  • The derived characteristics can then vary with the orientation of the TT structure being tested. For example, the BSR for the TT_H division increases as the height of the block grows relative to its width, and the BSR for the TT_V division increases as the width grows relative to the height. Therefore, in the case of TT_H partitioning, partitioning that involves horizontally thin blocks is not preferred.
  • The derived characteristic BTD can also vary with the orientation of the TT structure being tested. It can be assumed that the most appropriate direction of TT division is associated with the direction of BT division. For example, if the RD cost J BT_H is smaller than J BT_V, the RD cost J TT_H may also be smaller than J TT_V. Accordingly, the BTD value may be calculated differently depending on whether the TT_H or the TT_V division is being tested.
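  • One way such a direction-dependent BTD could be encoded is sketched below. The exact definition belongs to Table 1, which is not reproduced here, so the 0/1 encoding, the tie handling, and the name `bt_direction_feature` are all assumptions made purely for illustration.

```python
def bt_direction_feature(j_bt_h, j_bt_v, tt_direction):
    """Assumed BTD encoding: 1 if the cheaper BT direction matches the
    TT direction being tested, otherwise 0.

    j_bt_h, j_bt_v: RD costs of the BT_H and BT_V divisions.
    tt_direction: "TT_H" or "TT_V" (hypothetical labels).
    """
    if tt_direction not in ("TT_H", "TT_V"):
        raise ValueError("tt_direction must be 'TT_H' or 'TT_V'")
    # On a tie the vertical direction is treated as cheaper (arbitrary choice).
    cheaper = "H" if j_bt_h < j_bt_v else "V"
    return 1 if tt_direction.endswith(cheaper) else 0


# Horizontal BT was cheaper, so the feature favors TT_H over TT_V.
print(bt_direction_feature(100.0, 120.0, "TT_H"))
print(bt_direction_feature(100.0, 120.0, "TT_V"))
```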
  • the image encoding apparatus may perform a screening test (ST) (S604), and may perform early TT segmentation according to the result.
  • ST is defined as shown in Equation 1, and when it is 1, early TT division is performed, and when it is 0, regular TT division (S508 and S510) may be performed.
  • Here, TC is the tree structure channel, which includes a luma channel (LC) and a chroma channel (CC).
  • Pic is the area of the current picture.
  • TD is the tree depth for the MTT structure and takes a value in {0, 1, 2, 3}. When the CU violates the picture boundary (that is, when at least one pixel of the CU is not included in Pic), TD may be larger than 3.
  • The ST according to the present disclosure may be performed based on the dual tree concept and unique features of the MTT structure.
  • The ST may prevent an unexpected BT_H or BT_V division that is made regardless of the RD cost. Such unexpected division may occur at the boundary or corner of a picture when the CU violates the picture boundary, that is, when the size of the picture is not a multiple of the maximum CU size.
  • Because the LNN may make a suboptimal selection when reasonable input characteristics are lacking, the LNN is not used when the CU violates the picture boundary.
  • When TD is 3 or higher, the LNN is not used.
  • The maximum value of TD is 3, but when a CU violates a picture boundary, TD may exceed this value because of the unexpected division described above.
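  • The screening conditions described above can be sketched as a predicate. Equation 1 itself is not reproduced in this text, so the sketch is an assumption built from the surrounding description: it checks the picture boundary and the tree depth, and it additionally assumes that the LNN is applied only to the luma channel. The `CodingUnit` type, its fields, and the parameter names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CodingUnit:
    x: int            # top-left position in the picture
    y: int
    width: int
    height: int
    tree_depth: int   # TD, depth in the MTT structure
    channel: str      # TC: "LC" (luma) or "CC" (chroma)


def inside_picture(cu, pic_width, pic_height):
    """True if every pixel of the CU lies inside the picture area (Pic)."""
    return (cu.x >= 0 and cu.y >= 0
            and cu.x + cu.width <= pic_width
            and cu.y + cu.height <= pic_height)


def screening_test(cu, pic_width, pic_height, lnn_channels=("LC",), td_limit=3):
    """Return 1 to take the early TT path, 0 to take the regular TT path."""
    if cu.channel not in lnn_channels:   # assumed channel restriction
        return 0
    if not inside_picture(cu, pic_width, pic_height):
        return 0
    if cu.tree_depth >= td_limit:        # LNN not used at TD of 3 or higher
        return 0
    return 1


cu = CodingUnit(x=64, y=32, width=32, height=32, tree_depth=1, channel="LC")
print(screening_test(cu, pic_width=1920, pic_height=1080))
```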
  • the image encoding apparatus generates an output y h by using the LNN H based on the explicit characteristic and the derived characteristic (S606).
  • LNN H is a lightweight neural network, and may be implemented by including a small number of parameters so that the amount of computation by the neural network does not burden the entire encoding process.
  • The video encoding apparatus compares the output y h with a preset threshold (e.g., 0.5) (S608) and, if y h is greater than the threshold, calculates the RD cost J TT_H for the TT_H division (S610).
  • If y h is not greater than the threshold, the entire process of TT_H division is omitted.
  • The entire process of TT_H division also includes the partitioning of child nodes of the CU, but there is no restriction on the tree structure of the child nodes.
  • When the process is omitted, J TT_H may be set to the maximum value of the RD cost.
  • the image encoding apparatus generates an output y v by using the LNN V based on the explicit characteristic and the derived characteristic (S612).
  • LNN V is a lightweight neural network.
  • The video encoding apparatus compares the output y v with a preset threshold (e.g., 0.5) (S614) and, if y v is greater than the threshold, calculates the RD cost J TT_V for the TT_V division (S616).
  • If y v is not greater than the threshold, the entire process of TT_V division is omitted.
  • The entire process of TT_V division also includes the partitioning of child nodes of the CU, but there is no restriction on the tree structure of the child nodes.
  • When the process is omitted, J TT_V may be set to the maximum value of the RD cost.
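  • The early TT decision for one direction (S606 to S610 for TT_H, S612 to S616 for TT_V) can be sketched as follows. The callables `lnn` and `compute_tt_cost` are hypothetical stand-ins for the trained network and the encoder's RD search, and `math.inf` is used here as an assumed representation of "the maximum value of the RD cost".

```python
import math

MAX_RD_COST = math.inf  # assumed stand-in for the maximum RD cost


def early_tt_cost(lnn, features, compute_tt_cost, threshold=0.5):
    """Run the TT rate-distortion test only if the LNN output exceeds the threshold.

    lnn: callable mapping the feature vector to a value in [0, 1].
    compute_tt_cost: zero-argument callable that performs the (expensive)
        TT division test and returns its RD cost.
    """
    y = lnn(features)
    if y > threshold:
        return compute_tt_cost()
    # The whole TT test, including child-node partitioning, is skipped.
    return MAX_RD_COST


# Stub network and stub RD search, for illustration only.
features = [0.2, 1, 0, 2, 1]
j_tt_h = early_tt_cost(lambda f: 0.8, features, lambda: 1480.0)
j_tt_v = early_tt_cost(lambda f: 0.3, features, lambda: 1495.0)
print(j_tt_h, j_tt_v)
```

  • Passing the RD search as a callable makes the saving explicit: when the network output is at or below the threshold, the search is never invoked.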
  • the image encoding apparatus determines an intra prediction mode and a partition structure for the CU based on the calculated cost values (S512).
  • The cost values calculated here include J cur according to intra prediction; J QT, J BT_H, and J BT_V according to the QT, BT_H, and BT_V divisions; and J TT_H and J TT_V according to the early or regular TT division.
  • The threshold for the output y h and the threshold for y v are exemplified as the same value, but the present invention is not limited thereto, and the two thresholds may differ.
  • The threshold may be set to a value lower than 0.5 (for example, 0.25 or 0.125).
  • the lightweight neural networks LNN H and LNN V may be LNNs having the same structure, but may have different parameter values.
  • the LNN is a fully connected deep neural network, and may include an input layer, a hidden layer, and an output layer.
  • The input layer includes 5 nodes, and the hidden layer and the output layer include 30 nodes and 1 node, respectively. Accordingly, the number of parameters for each layer, including the bias, can be exemplified as shown in Table 3.
  • The number of parameters to be trained is 221 in total, which is small enough not to add a meaningful burden to the complexity of the image encoding apparatus.
  • the output of the LNN is made to exist in the range from 0 to 1. As described above, when the output of LNN is 1, it indicates that TT is the optimal choice for CU in terms of RD cost. On the other hand, when the output of the LNN is 0, it indicates that it is not an optimal choice for the CU.
  • the output of LNN located between 0 and 1 represents the probability of suitability of TT in terms of RD cost.
  • The structure shown in Table 3 is one example of the structure of the LNN, and the LNN is not necessarily limited thereto.
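  • A forward pass through a network with 5 input nodes, 30 hidden nodes, and 1 output node can be sketched in plain Python. The activation functions are not stated in this text, so the ReLU hidden activation and the sigmoid output (chosen because the output must lie between 0 and 1) are assumptions, as are the random weights, the feature ordering, and all names.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lnn_forward(x, w1, b1, w2, b2):
    """5 inputs -> 30 hidden nodes (ReLU, assumed) -> 1 output (sigmoid, assumed)."""
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    z = sum(w * h for w, h in zip(w2, hidden)) + b2
    return sigmoid(z)


# Untrained random parameters, for illustration only.
rng = random.Random(0)
w1 = [[rng.uniform(-0.5, 0.5) for _ in range(5)] for _ in range(30)]
b1 = [0.0] * 30
w2 = [rng.uniform(-0.5, 0.5) for _ in range(30)]
b2 = 0.0

# Assumed feature order: BSR, BTD, BTS, MTD, CBF.
x = [0.2, 1, 0, 2, 1]
y = lnn_forward(x, w1, b1, w2, b2)
print(y)
```

  • In the encoder, two parameter sets of this shape would be held, one for LNN H and one for LNN V, each evaluated once per CU that passes the screening test.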
  • An LNN (i.e., each of LNN H and LNN V) may be trained on pairs of training feature data and the target output class.
  • the learning characteristic data may be obtained using the rate-distortion optimization process illustrated in FIG. 6 , based on a QP value different from that used in the actual image encoding process.
  • the target output is a binary value, indicating whether the TT segmentation is optimal for the training feature data.
  • A loss function is defined based on the distance between the output of the LNN for the training feature data and the target output, and the LNN may be trained by updating its parameters in the direction that decreases the loss function.
  • The distance may be any metric capable of expressing the difference between the two objects being compared, such as the L1 metric or the L2 metric.
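  • The training principle can be illustrated with a toy example. To keep it short, the sketch uses a single sigmoid unit as a stand-in for the LNN and estimates gradients by finite differences; neither choice comes from the disclosure, and the training pairs are made up.

```python
import math


def l1_loss(outputs, targets):
    """Mean L1 distance between outputs and targets."""
    return sum(abs(o - t) for o, t in zip(outputs, targets)) / len(targets)


def l2_loss(outputs, targets):
    """Mean squared (L2) distance between outputs and targets."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(targets)


def model(params, x):
    """Stand-in for the LNN: one sigmoid unit; params = 5 weights + 1 bias."""
    z = sum(w * xi for w, xi in zip(params, x)) + params[-1]
    return 1.0 / (1.0 + math.exp(-z))


def dataset_loss(params, samples, loss_fn):
    outputs = [model(params, x) for x, _ in samples]
    targets = [t for _, t in samples]
    return loss_fn(outputs, targets)


def gradient_step(params, samples, loss_fn, lr=0.1, eps=1e-6):
    """One descent step; the gradient is estimated by central differences."""
    grads = []
    for i in range(len(params)):
        up = params[:i] + [params[i] + eps] + params[i + 1:]
        down = params[:i] + [params[i] - eps] + params[i + 1:]
        diff = dataset_loss(up, samples, loss_fn) - dataset_loss(down, samples, loss_fn)
        grads.append(diff / (2 * eps))
    return [p - lr * g for p, g in zip(params, grads)]


# Made-up (features, target) pairs; target 1 means TT division was optimal.
samples = [
    ([0.8, 1, 1, 1, 1], 1),
    ([0.2, 0, 0, 2, 0], 0),
    ([0.7, 1, 0, 1, 1], 1),
    ([0.3, 0, 1, 3, 0], 0),
]
params = [0.0] * 6
before = dataset_loss(params, samples, l2_loss)
for _ in range(500):
    params = gradient_step(params, samples, l2_loss)
after = dataset_loss(params, samples, l2_loss)
print(before, after)
```

  • Passing `l1_loss` instead of `l2_loss` switches the distance metric without changing the update loop; a practical implementation would use backpropagation rather than finite differences.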
  • FIG. 8 is a schematic block diagram of an image encoding apparatus for determining a ternary tree according to an embodiment of the present disclosure.
  • An image encoding apparatus determines a ternary tree using an LNN based on rate-distortion optimization.
  • In addition to existing components including an intra prediction unit 124, a QT/BT cost estimation unit 802, a second cost estimation unit 804, and an optimal division determining unit 806, the video encoding apparatus includes the following additional components: a feature extraction unit 810, a screening test unit 812, and a first cost estimation unit 814.
  • the intra prediction unit 124 calculates the cost value J cur for the current CU according to rate-distortion by performing intra prediction and transformation on the CU using a plurality of coding tools.
  • the image encoding apparatus may perform a test based on MTT division.
  • the QT/BT cost estimator 802 calculates costs J QT , J BT_H and J BT_V according to rate-distortion by sequentially applying the QT division, the BT_H division, and the BT_V division.
  • The second cost estimator 804 performs regular TT division according to the result of the screening test. That is, the second cost estimator 804 calculates the costs J TT_H and J TT_V according to rate-distortion by sequentially applying the TT_H division and the TT_V division.
  • the optimal partition determination unit 806 determines an intra prediction mode and a partition structure for the CU based on the calculated costs.
  • the feature extractor 810 extracts a feature of an image within the CU to be used as an input of the LNN in the intra prediction, QT segmentation, and BT segmentation steps for the CU.
  • the properties for the image may include explicit properties and derived properties, as shown in Table 1.
  • BSR, BTD, BTS, MTD, and CBF are selected as final characteristics in the order in which the correlation with the binary class is significant and used as an input to the LNN.
  • MTD and CBF can be calculated directly from a coding tool and a coding parameter after intra prediction for the current CU, so they are expressed as explicit characteristics. The remaining three characteristics can be calculated from the block shape and RD costs after BT division is performed, so they are expressed as derived characteristics.
  • The screening test unit 812 may determine whether to perform early TT division by generating a test result as shown in Equation 1.
  • The first cost estimator 814 performs early TT division according to the result of the screening test.
  • The first cost estimator 814 generates an output y h using LNN H based on the explicit and derived characteristics. If y h is greater than a preset threshold, the RD cost J TT_H for the TT_H division is calculated; otherwise, the entire process of TT_H division is omitted.
  • The first cost estimator 814 then generates an output y v using LNN V based on the explicit and derived characteristics. If y v is greater than a preset threshold, the RD cost J TT_V for the TT_V division is calculated; otherwise, the entire process of TT_V division is omitted.
  • According to the present embodiment, a ternary tree among multi-type trees (MTT) is determined using a lightweight neural network based on explicit and derived characteristics extracted in the coding unit (CU) encoding process.
  • The image encoding method for determining the ternary tree based on the lightweight neural network as described above can be applied to all types of frames in which intra prediction can be used for CUs, such as I frames (intra frames), P frames (predicted frames), and B frames (bi-predictive frames).
  • An image encoding method for determining a ternary tree based on a lightweight neural network may also be applied when inter prediction is used for a CU.
  • Although each flowchart according to the present embodiment describes the processes as being executed sequentially, the present invention is not limited thereto. Because the processes described in a flowchart may be reordered, or one or more of them may be executed in parallel, the flowcharts are not limited to a time-series order.
  • the non-transitory recording medium includes, for example, all kinds of recording devices in which data is stored in a form readable by a computer system.
  • the non-transitory recording medium includes a storage medium such as an erasable programmable read only memory (EPROM), a flash drive, an optical drive, a magnetic hard drive, and a solid state drive (SSD).

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The disclosure relates to a method for encoding an image (video) using a neural network. The present embodiment provides an image encoding method for performing versatile video coding (VVC), in which a ternary tree among multi-type trees (MTT) is determined using a light-weight neural network (LNN) on the basis of an explicit characteristic and a derived characteristic extracted in a coding unit (CU) encoding process, in order to reduce the computational complexity of intra prediction.
PCT/KR2021/016973 2020-11-25 2021-11-18 Codage d'image au moyen d'un réseau neuronal WO2022114669A2 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR20200159984 2020-11-25
KR10-2020-0159984 2020-11-25
KR10-2021-0030727 2021-03-09
KR1020210030727A KR102466258B1 (ko) 2020-11-25 2021-03-09 신경망을 이용한 영상 부호화

Publications (2)

Publication Number Publication Date
WO2022114669A2 true WO2022114669A2 (fr) 2022-06-02
WO2022114669A3 WO2022114669A3 (fr) 2022-07-21

Family

ID=81756163

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2021/016973 WO2022114669A2 (fr) 2020-11-25 2021-11-18 Codage d'image au moyen d'un réseau neuronal

Country Status (1)

Country Link
WO (1) WO2022114669A2 (fr)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160065959A1 (en) * 2014-08-26 2016-03-03 Lyrical Labs Video Compression Technology, LLC Learning-based partitioning for video encoding
KR102592721B1 (ko) * 2017-01-11 2023-10-25 한국전자통신연구원 이진 파라미터를 갖는 컨볼루션 신경망 시스템 및 그것의 동작 방법

Also Published As

Publication number Publication date
WO2022114669A3 (fr) 2022-07-21

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21898495

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21898495

Country of ref document: EP

Kind code of ref document: A2