CN114173120A - Video coding block division method and video coding block division prediction model training method - Google Patents

Info

Publication number
CN114173120A
Authority
CN
China
Prior art keywords
probability
ctu
prediction residual
video coding
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111464395.7A
Other languages
Chinese (zh)
Inventor
邵宇超
郭磊
陈宇聪
黄跃
张旭
赵明菲
闻兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202111464395.7A
Publication of CN114173120A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/119 Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/147 Data rate or code amount at the encoder output according to rate distortion criteria
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/96 Tree coding, e.g. quad-tree coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present disclosure relates to a video coding block division method in an inter-frame prediction mode and a training method for a video coding block division prediction model. The video coding block division method in the inter-frame prediction mode comprises: acquiring a first prediction residual and a second prediction residual; inputting the first prediction residual and the second prediction residual into a video coding block division prediction model to obtain a first probability, wherein the first probability is the probability that each edge of each minimum subblock of a CTU to be coded serves as a division boundary under all possible block division modes; and, for each CU of the CTU to be coded, performing the following operations: determining, based on the first probability, a second probability corresponding to the current CU being divided according to each possible block division mode; removing, from all the possible block division modes, the block division modes whose second probability does not meet a preset threshold; and executing a rate-distortion optimization decision on the remaining block division modes to obtain the block division mode of the current CU.

Description

Video coding block division method and video coding block division prediction model training method
Technical Field
The present disclosure relates to the field of audio and video processing, and more particularly, to a method and an apparatus for partitioning a video coding block in an inter-frame prediction mode, and a method and an apparatus for training a partition prediction model of a video coding block.
Background
At present, most video coding adopts a block-based hybrid coding scheme: each frame of a video is first divided into a plurality of block-based coding units for intra-frame or inter-frame prediction, the prediction residuals are then transformed and quantized, and finally the mode information, quantized residuals and the like are entropy coded to obtain the coded bit stream. To adapt to diverse video content and video characteristics, the latest video coding standard H.266/VVC partitions blocks using a combination of QT (Quad Tree) and MTT (Multi-Type Tree) partitioning. This significantly improves coding efficiency, but it also significantly increases computational complexity, so that partitioning video coding blocks takes too long.
Disclosure of Invention
The present disclosure provides a method and an apparatus for partitioning a video coding block in an inter-frame prediction mode and a method and an apparatus for training a partition prediction model of a video coding block, so as to at least solve the above-mentioned problems in the related art.
According to a first aspect of the embodiments of the present disclosure, there is provided a method for dividing a video coding block in an inter prediction mode, including: acquiring a first prediction residual and a second prediction residual, wherein the first prediction residual is a prediction residual of a CTU to be coded in a current video frame, the second prediction residual is obtained by performing preset processing on the first prediction residual according to a block division mode of a co-located CTU of the CTU to be coded, and the co-located CTU is the part of a reference frame of the current video frame that corresponds to the CTU to be coded; inputting the first prediction residual and the second prediction residual into a video coding block partition prediction model to obtain a first probability, wherein the first probability is the probability that each edge of each minimum subblock of the CTU to be coded serves as a partition boundary under all possible block partition modes; and, for each CU of the CTU to be coded, performing the following operations: determining, based on the first probability, a second probability corresponding to the current CU being divided according to each possible block division mode; removing, from all the possible block division modes, the block division modes whose second probability does not meet a preset threshold; and executing a rate-distortion optimization decision on the remaining block division modes to obtain the block division mode of the current CU.
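The per-CU pruning step of the first aspect can be sketched as follows. This is only an illustrative sketch under stated assumptions: the text here does not say how the second probability is derived from the first probability, so the arithmetic mean over the edges lying on a candidate split line is an assumption, as are the rule that a mode is kept when its second probability is at least the threshold, the `NO_SPLIT` candidate that is always retained, and the names `second_probability`, `select_mode` and `rd_cost`.

```python
from statistics import mean


def second_probability(edge_probs):
    """Aggregate the first probabilities of the sub-block edges that lie on
    a candidate split line. Using the mean is an assumption of this sketch."""
    return mean(edge_probs)


def select_mode(candidates, threshold, rd_cost, always_keep=("NO_SPLIT",)):
    """Prune candidate division modes, then run RDO only on the survivors.

    candidates: dict mapping a mode name to the list of first probabilities
                of the edges on that mode's split lines (hypothetical layout).
    rd_cost:    callable returning the rate-distortion cost of a mode
                (a stand-in for the encoder's real RDO evaluation).
    """
    kept = list(always_keep)  # assumption: not splitting is always allowed
    for mode, edge_probs in candidates.items():
        if second_probability(edge_probs) >= threshold:
            kept.append(mode)
    # The expensive RDO evaluation is executed only for the kept modes.
    best = min(kept, key=rd_cost)
    return best, kept


candidates = {"BT_H": [0.9, 0.8], "BT_V": [0.1, 0.2], "QT": [0.6, 0.7]}
costs = {"NO_SPLIT": 10.0, "BT_H": 7.0, "BT_V": 1.0, "QT": 8.0}
best, kept = select_mode(candidates, 0.5, costs.__getitem__)
```

In this toy input `BT_V` is pruned before its cost is ever evaluated, which is where the time saving comes from, and also why an inaccurate prediction can cost coding efficiency.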
Optionally, performing the preset processing on the first prediction residual according to the block division mode of the co-located CTU of the CTU to be coded includes: averaging the first prediction residual according to the block division mode of the co-located CTU of the CTU to be coded; and taking the averaged first prediction residual as the second prediction residual.
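A minimal sketch of this averaging step is given below. It assumes that "averaging according to the block division mode" means replacing every residual sample inside each block of the co-located CTU's partition with the arithmetic mean of the samples in that block; the function name `average_by_partition` and the `(x, y, w, h)` block layout are hypothetical.

```python
def average_by_partition(residual, blocks):
    """Build the second prediction residual from the first one.

    residual: 2-D list of residual samples (rows of equal length).
    blocks:   list of (x, y, w, h) rectangles describing how the co-located
              CTU was partitioned; assumed to tile the CTU without overlap.
    """
    out = [row[:] for row in residual]  # leave the first residual untouched
    for x, y, w, h in blocks:
        total = sum(residual[r][c]
                    for r in range(y, y + h)
                    for c in range(x, x + w))
        avg = total / (w * h)
        for r in range(y, y + h):
            for c in range(x, x + w):
                out[r][c] = avg
    return out


first = [[1, 3, 4, 4],
         [1, 3, 4, 4],
         [1, 3, 0, 8],
         [1, 3, 0, 8]]
# Toy co-located partition: a left half and two stacked right quarters.
second = average_by_partition(first, [(0, 0, 2, 4), (2, 0, 2, 2), (2, 2, 2, 2)])
```

The resulting map is piecewise constant over the co-located CTU's blocks, so it carries the reference frame's partition structure into the model input alongside the raw residual.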
Optionally, the block partitioning modes include a quadtree partitioning mode, a ternary tree partitioning mode, and a binary tree partitioning mode, and the binary tree partitioning mode and the ternary tree partitioning mode each include a horizontal partitioning mode and a vertical partitioning mode; removing, from all the possible block division modes, the block division modes whose second probability does not meet the preset threshold includes: removing, from the binary tree partitioning mode and the ternary tree partitioning mode respectively, the horizontal partitioning mode and/or the vertical partitioning mode whose second probability does not meet the preset threshold.
Optionally, the preset threshold includes a first preset threshold and a second preset threshold, the second probability corresponding to the horizontal partitioning mode is taken as a horizontal probability, and the second probability corresponding to the vertical partitioning mode is taken as a vertical probability. Removing, from the binary tree partitioning mode and the ternary tree partitioning mode, the horizontal partitioning mode and/or the vertical partitioning mode whose second probability does not meet the preset threshold includes, for the binary tree partitioning mode or the ternary tree partitioning mode, performing the following operations: comparing the horizontal probability and the vertical probability with the first preset threshold respectively; removing both the horizontal partitioning mode and the vertical partitioning mode when the horizontal probability and the vertical probability are both smaller than the first preset threshold; when one of the horizontal probability and the vertical probability is smaller than the first preset threshold and the other is greater than or equal to the first preset threshold, removing the partitioning mode whose probability is smaller than the first preset threshold; and when the horizontal probability and the vertical probability are both greater than or equal to the second preset threshold, computing the difference between the horizontal probability and the vertical probability and, if the absolute value of the difference is greater than or equal to the second preset threshold, removing the partitioning mode with the smaller probability.
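The two-threshold rule can be written out as a small decision function. The sketch below follows the wording of the paragraph literally, including the comparison of both probabilities against the second threshold in the last case; the function name `prune_directions`, the `"H"`/`"V"` labels and the example threshold values are illustrative assumptions, not values from the disclosure.

```python
def prune_directions(p_h, p_v, t1, t2):
    """Return the set of directions ("H", "V") to drop for one tree type
    (binary or ternary), given the horizontal and vertical second
    probabilities and the first (t1) and second (t2) preset thresholds."""
    removed = set()
    if p_h < t1 and p_v < t1:
        # Both directions are unlikely: drop both.
        removed.update({"H", "V"})
    elif (p_h < t1) != (p_v < t1):
        # Exactly one direction falls below the first threshold: drop it.
        removed.add("H" if p_h < t1 else "V")
    if p_h >= t2 and p_v >= t2 and abs(p_h - p_v) >= t2:
        # Both clear the second threshold but differ strongly: drop the
        # direction with the smaller probability.
        removed.add("H" if p_h < p_v else "V")
    return removed


# Hypothetical thresholds, applied separately to BT and to TT.
print(prune_directions(0.9, 0.4, t1=0.2, t2=0.3))
```

With these example thresholds the call drops the vertical direction, so only the horizontal split of that tree type would go on to the rate-distortion optimization decision.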
According to a second aspect of the embodiments of the present disclosure, there is provided a training method for a video coding block partition prediction model, including: obtaining a video training sample, wherein the video training sample comprises a first prediction residual, a second prediction residual and a truth vector, the first prediction residual is a prediction residual of a first CTU to be coded in a current video frame, the second prediction residual is obtained by performing preset processing on the first prediction residual according to a block division mode of a second CTU in a reference frame of the current video frame, the second CTU is a co-located CTU of the first CTU, and the truth vector indicates whether each edge of each minimum subblock in the first CTU serves as a division boundary under the real block division mode; inputting the first prediction residual and the second prediction residual into the video coding block division prediction model to obtain an estimation vector composed of the probabilities that each edge of each minimum subblock in the first CTU serves as a division boundary under all possible block division modes; superimposing corresponding weights on the probabilities of preset edges of a plurality of minimum subblocks in the first CTU; calculating a value of a loss function based on the truth vector and the estimation vector on which the corresponding weights are superimposed; and training the video coding block division prediction model by adjusting the model parameters of the video coding block division prediction model according to the value of the loss function.
Optionally, performing the preset processing on the first prediction residual according to the block division mode of the second CTU in the reference frame of the current video frame includes: averaging the first prediction residual according to the block division mode of the second CTU in the reference frame of the current video frame; and taking the averaged first prediction residual as the second prediction residual.
Optionally, superimposing the corresponding weights on the probabilities of the preset edges of the plurality of minimum subblocks in the first CTU includes: multiplying the probabilities of the edges of the plurality of minimum subblocks that serve as preset partition boundaries by the corresponding weights, so as to enhance the sensitivity of the video coding block partition prediction model to video coding loss, wherein each weight has a value greater than 1, the preset partition boundaries are obtained by quadtree partitioning of the first CTU, and the preset partition boundaries do not include the edges of the first CTU.
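The weighting of quadtree boundaries might look like the following sketch. Several details are assumptions made for illustration: the quadtree depth that defines the preset boundaries, the use of one shared weight, the `(orientation, offset, index)` key used to identify an edge, and the names `quadtree_lines` and `weight_edge_probs`. The loss function itself is not specified in this passage and is left out.

```python
def quadtree_lines(ctu_size, depth):
    """Offsets of the interior lines produced by splitting a CTU as a
    quadtree down to `depth` levels. Offsets 0 and ctu_size (the edges of
    the CTU itself) are never included."""
    step = ctu_size >> depth
    return {k * step for k in range(1, 1 << depth)}


def weight_edge_probs(edge_probs, ctu_size, depth, weight):
    """Multiply the predicted probability of every sub-block edge lying on
    a quadtree boundary by `weight` (expected to be greater than 1).

    edge_probs: dict keyed by (orientation, offset, index), where `offset`
                is the y coordinate of a horizontal edge or the x coordinate
                of a vertical edge inside the CTU (hypothetical layout).
    """
    lines = quadtree_lines(ctu_size, depth)
    return {key: p * weight if key[1] in lines else p
            for key, p in edge_probs.items()}


probs = {("h", 32, 0): 0.4, ("v", 4, 3): 0.4, ("h", 0, 0): 0.4}
weighted = weight_edge_probs(probs, ctu_size=64, depth=1, weight=2.0)
```

Only the edge on the line at offset 32 is scaled here; the edge at offset 4 is not on a quadtree boundary, and the edge at offset 0 belongs to the CTU border, which the paragraph excludes. A scaled value can exceed 1, so it is best read as a loss-shaping term rather than as a probability.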
Optionally, the video coding block partition prediction model is obtained by performing preset processing on an intra-frame prediction mode block partition prediction network structure, which is implemented with a ResNet structure; the preset processing of the intra-frame prediction mode block partition prediction network structure includes reducing the number of convolution layers of the network structure to one half of the original number.
According to a third aspect of the embodiments of the present disclosure, there is provided a video coding block dividing apparatus in an inter prediction mode, including: a residual acquisition unit configured to acquire a first prediction residual and a second prediction residual, wherein the first prediction residual is a prediction residual of a CTU to be coded in a current video frame, the second prediction residual is obtained by performing preset processing on the first prediction residual according to a block division mode of a co-located CTU of the CTU to be coded, and the co-located CTU is the part of a reference frame of the current video frame that corresponds to the CTU to be coded; a probability acquisition unit configured to input the first prediction residual and the second prediction residual into a video coding block partition prediction model to obtain a first probability, wherein the first probability is the probability that each edge of each minimum subblock of the CTU to be coded serves as a partition boundary under all possible block partition modes; and a processing unit configured to perform the following operations for each CU of the CTU to be coded: determining, based on the first probability, a second probability corresponding to the current CU being divided according to each possible block division mode; removing, from all the possible block division modes, the block division modes whose second probability does not meet a preset threshold; and executing a rate-distortion optimization decision on the remaining block division modes to obtain the block division mode of the current CU.
Optionally, the residual acquisition unit is configured to: average the first prediction residual according to the block division mode of the co-located CTU of the CTU to be coded; and take the averaged first prediction residual as the second prediction residual.
Optionally, the block partitioning modes include a quadtree partitioning mode, a ternary tree partitioning mode, and a binary tree partitioning mode, and the binary tree partitioning mode and the ternary tree partitioning mode each include a horizontal partitioning mode and a vertical partitioning mode; the processing unit is configured to: remove, from the binary tree partitioning mode and the ternary tree partitioning mode respectively, the horizontal partitioning mode and/or the vertical partitioning mode whose second probability does not meet the preset threshold.
Optionally, the preset threshold includes a first preset threshold and a second preset threshold, the second probability corresponding to the horizontal partitioning mode is taken as a horizontal probability, and the second probability corresponding to the vertical partitioning mode is taken as a vertical probability. The processing unit is configured to perform the following operations for the binary tree partitioning mode or the ternary tree partitioning mode: comparing the horizontal probability and the vertical probability with the first preset threshold respectively; removing both the horizontal partitioning mode and the vertical partitioning mode when the horizontal probability and the vertical probability are both smaller than the first preset threshold; when one of the horizontal probability and the vertical probability is smaller than the first preset threshold and the other is greater than or equal to the first preset threshold, removing the partitioning mode whose probability is smaller than the first preset threshold; and when the horizontal probability and the vertical probability are both greater than or equal to the second preset threshold, computing the difference between the horizontal probability and the vertical probability and, if the absolute value of the difference is greater than or equal to the second preset threshold, removing the partitioning mode with the smaller probability.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a training apparatus for a video coding block partition prediction model, including: a sample acquisition unit configured to obtain a video training sample, wherein the video training sample comprises a first prediction residual, a second prediction residual and a truth vector, the first prediction residual is a prediction residual of a first CTU to be coded in a current video frame, the second prediction residual is obtained by performing preset processing on the first prediction residual according to a block division mode of a second CTU in a reference frame of the current video frame, the second CTU is a co-located CTU of the first CTU, and the truth vector indicates whether each edge of each minimum subblock in the first CTU serves as a division boundary under the real block division mode; an estimation vector acquisition unit configured to input the first prediction residual and the second prediction residual into the video coding block division prediction model to obtain an estimation vector composed of the probabilities that each edge of each minimum subblock in the first CTU serves as a division boundary under all possible block division modes; a weight superimposing unit configured to superimpose corresponding weights on the probabilities of preset edges of a plurality of minimum subblocks in the first CTU; a loss function calculation unit configured to calculate a value of a loss function based on the truth vector and the estimation vector on which the corresponding weights are superimposed; and a model parameter adjustment unit configured to train the video coding block division prediction model by adjusting the model parameters of the video coding block division prediction model according to the value of the loss function.
Optionally, the sample acquisition unit is configured to: average the first prediction residual according to the block division mode of the second CTU in the reference frame of the current video frame; and take the averaged first prediction residual as the second prediction residual.
Optionally, the weight superimposing unit is configured to: multiply the probabilities of the edges of the plurality of minimum subblocks that serve as preset partition boundaries by the corresponding weights, so as to enhance the sensitivity of the video coding block partition prediction model to video coding loss, wherein each weight has a value greater than 1, the preset partition boundaries are obtained by quadtree partitioning of the first CTU, and the preset partition boundaries do not include the edges of the first CTU.
Optionally, the video coding block partition prediction model is obtained by performing preset processing on an intra-frame prediction mode block partition prediction network structure, which is implemented with a ResNet structure; the preset processing of the intra-frame prediction mode block partition prediction network structure includes reducing the number of convolution layers of the network structure to one half of the original number.
According to a fifth aspect of embodiments of the present disclosure, there is provided an electronic apparatus including: at least one processor; at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform a video coding block partitioning method in inter prediction mode or a training method of a video coding block partitioning prediction model according to the present disclosure.
According to a sixth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform a video coding block partitioning method in inter prediction mode or a training method of a video coding block partitioning prediction model according to the present disclosure.
According to a seventh aspect of embodiments of the present disclosure, there is provided a computer program product including instructions that are executable by a processor of a computer device to perform a video coding block partitioning method in inter prediction mode or a training method of a video coding block partitioning prediction model according to the present disclosure.
The technical solutions provided by the embodiments of the present disclosure bring at least the following beneficial effects:
According to the video coding block division method and apparatus in the inter-frame prediction mode of the present disclosure, based on the coding information of the current CTU and the coding information of the co-located CTU, the boundary division probabilities of all subblocks in the current CTU are predicted at the CTU level by a video coding block division prediction model; the probability of each CU division mode is then calculated at the CU level from these division probabilities; some of the possible block division modes are removed according to the CU division mode probabilities; and the rate-distortion optimization decision is executed only over the remaining possible block division modes. This avoids the large amount of computation incurred by traversing all possible block division modes to execute the rate-distortion optimization decision, and greatly shortens the time needed to divide video coding blocks.
In addition, according to the training method and apparatus for the video coding block partition prediction model of the present disclosure, when the model is trained, corresponding weights can be superimposed on the predicted boundary probabilities of the subblocks according to their sensitivity to coding loss, so that block division modes that are more sensitive to coding loss are retained to a greater extent when the model performs video coding block partition prediction, thereby reducing the coding loss.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
Fig. 1 is a schematic structural diagram illustrating a block division mode according to an exemplary embodiment of the present disclosure.
Fig. 2 is a flowchart illustrating a video coding block division method in an inter prediction mode according to an exemplary embodiment of the present disclosure.
Fig. 3 is a schematic diagram illustrating a process of obtaining a second prediction residual according to an exemplary embodiment of the present disclosure.
Fig. 4(a) is a schematic diagram illustrating probability vectors according to an exemplary embodiment of the present disclosure.
Fig. 4(b) is a schematic diagram illustrating a process of acquiring a truth vector according to an exemplary embodiment of the present disclosure.
Fig. 5 is a diagram illustrating a determination of a second probability corresponding to a horizontal block partitioning pattern based on a first probability according to an exemplary embodiment of the present disclosure.
Fig. 6 is a flowchart illustrating a training method of a video coding block partition prediction model according to an exemplary embodiment of the present disclosure.
Fig. 7 is a schematic diagram illustrating a structure of a video coding block partition prediction model according to an exemplary embodiment of the present disclosure.
Fig. 8 is a schematic diagram illustrating a partition boundary resulting from quad-tree partitioning of a first CTU according to an exemplary embodiment of the present disclosure.
Fig. 9 is a block diagram illustrating a video coding block division apparatus in an inter prediction mode according to an exemplary embodiment of the present disclosure.
Fig. 10 is a block diagram illustrating a training apparatus of a video coding block partition prediction model according to an exemplary embodiment of the present disclosure.
Fig. 11 is a block diagram illustrating an electronic device 1100 according to an example embodiment of the present disclosure.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The embodiments described in the following examples do not represent all embodiments consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Here, the expression "at least one of the items" in the present disclosure covers three parallel cases: "any one of the items", "a combination of any several of the items", and "all of the items". For example, "including at least one of A and B" covers the following three parallel cases: (1) including A; (2) including B; (3) including A and B. As another example, "performing at least one of step one and step two" covers the following three parallel cases: (1) performing step one; (2) performing step two; (3) performing step one and step two.
In the video coding standard H.265/HEVC, each CTU (Coding Tree Unit) is iteratively divided into a series of CUs (Coding Units) using a QT (quadtree) structure. To adapt to varied video content and features, the latest video coding standard, H.266/VVC, adopts a joint QT and MTT (multi-type tree) partitioning scheme, where MTT includes two partitioning types, BT (Binary Tree) and TT (Ternary Tree). After a CTU is partitioned according to the QT structure, its leaf nodes are further partitioned through MTT, with four partition types: horizontal BT, vertical BT, horizontal TT, and vertical TT. Fig. 1 is a schematic structural diagram showing the block partitioning modes according to an exemplary embodiment of the present disclosure; the specific structure of each partitioning mode can be understood with reference to fig. 1.
More complex partitioning schemes significantly improve coding efficiency, but they also significantly increase computational complexity, making the video coding block partitioning process very time-consuming. This is because the additional partitioning types greatly increase the number of block partitioning modes that must be tried for each CTU, and an RDO (Rate-Distortion Optimization) decision is performed every time a block partitioning mode is tried. Statistics show that recursive block partitioning of CTUs can account for more than 90% of the total video coding time.
To address the time consumed by video coding block division, acceleration methods have been proposed for both the intra-frame prediction mode and the inter-frame prediction mode. For example, in a recent acceleration scheme for the intra-frame prediction mode, a neural network takes the original pixel values of a 64x64 CTU and predicts the probability that each edge of each 4x4 sub-block in the CTU is a division edge. The division probability of the current CU under each block division mode is then calculated from these edge probabilities, and if a division probability is lower than a preset value, the RDO decision process for the corresponding division mode is skipped, which shortens the time for dividing the video coding block. For a video sequence with lower precision, however, it is difficult to control coding loss with this method. Moreover, current block division acceleration methods for the inter-frame prediction mode mainly rely on conventional acceleration schemes.
To shorten the block division time in video coding while reducing coding loss, the present disclosure provides a new video coding block division method and apparatus for the inter-frame prediction mode, together with a training method and apparatus for a video coding block division prediction model. Specifically, based on the coding information of the current CTU and the coding information of its co-located CTU, the video coding block division prediction model predicts, at the CTU level, the boundary division probabilities of the sub-blocks in the current CTU. These probabilities are then used at the CU level to calculate a probability value for each CU division mode, some of the possible block division modes are removed according to these values, and the rate-distortion optimization decision is performed by traversing only the remaining block division modes. This avoids the large amount of computation incurred by traversing all possible block division modes and greatly shortens the time for dividing video coding blocks. In addition, when the video coding block division prediction model is trained according to the training method and apparatus of the present disclosure, corresponding weights can be superimposed on the predicted boundary probabilities of the sub-blocks according to their sensitivity to coding loss, so that block division modes that are more sensitive to coding loss are retained to a greater extent during prediction, which reduces coding loss.
Hereinafter, a video coding block division method and apparatus in an inter prediction mode and a training method and apparatus of a video coding block division prediction model according to exemplary embodiments of the present disclosure will be described in detail with reference to fig. 2 to 11.
Fig. 2 is a flowchart illustrating a video coding block division method in an inter prediction mode according to an exemplary embodiment of the present disclosure.
Referring to fig. 2, in step 201, a first prediction residual and a second prediction residual may be obtained, where the first prediction residual is a prediction residual of a CTU to be coded in a current video frame, and the second prediction residual is obtained by performing a preset process on the first prediction residual according to a block partition mode of a co-located CTU of the CTU to be coded, where the co-located CTU is a portion of a reference frame of the current video frame corresponding to the CTU to be coded. Here, the size of the CTU to be encoded may be 64 × 64, 32 × 32, etc., without limitation. In the following description, CTUs to be encoded having a size of 64 × 64 are explained.
The first prediction residual is the prediction residual of the CTU to be coded in the current video frame and can be obtained through two steps, motion estimation and motion compensation. Specifically, in the motion estimation step, a suitable matching region is found for the CTU currently to be coded in a reference frame of the current video frame (for example, a frame before or after the current video frame); in the motion compensation step, the difference between the CTU currently to be coded and the matching region is computed (i.e., the prediction residual, obtained, for example, by subtracting the pixel values of the current CTU from the pixel values of the matching region). To make full use of the information of already coded CTUs, the second prediction residual may be obtained by performing a preset process on the prediction residual of the CTU to be coded according to the block partition mode of its co-located CTU. The acquisition process of the second prediction residual can be understood with reference to fig. 3, which is a schematic diagram illustrating a process of obtaining the second prediction residual according to an exemplary embodiment of the present disclosure. Fig. 3(a) illustrates the prediction residual of a 64 × 64 CTU, where each small square represents an 8 × 8 sub-block and the number in each small square represents the difference (i.e., the prediction residual) between the pixel values of that region and the pixel values of the corresponding region in its co-located CTU. Fig. 3(b) illustrates the block division mode of the co-located CTU of the 64 × 64 CTU (shown as blocks of different gray levels). Referring to fig. 3(b), the prediction residual in fig. 3(a) may be averaged within each illustrated block. For example, after the prediction residuals at the positions in fig. 3(a) corresponding to the 16 × 16 block at the upper left corner of fig. 3(b) are averaged, the values 3, 4, 3, 3 become 3, 3, 3, 3.
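The averaging step described for fig. 3 can be sketched as follows. This is a minimal illustration only: the function name `average_within_blocks`, the representation of the co-located CTU's partition as a list of `(row, col, height, width)` rectangles, and the rounding of the mean to the nearest integer (suggested by the 3, 4, 3, 3 to 3, 3, 3, 3 example, but not stated in the text) are all assumptions.

```python
def average_within_blocks(residual, blocks):
    """Return a copy of `residual` where each block is filled with its mean.

    residual: 2-D list of numbers, one entry per sub-block (as in fig. 3(a)).
    blocks:   list of (row, col, height, width) rectangles, a hypothetical
              encoding of the co-located CTU's block partition (fig. 3(b)).
    """
    out = [row[:] for row in residual]
    for r0, c0, h, w in blocks:
        cells = [residual[r][c]
                 for r in range(r0, r0 + h)
                 for c in range(c0, c0 + w)]
        # Assumption: the mean is rounded to the nearest integer.
        mean = round(sum(cells) / len(cells))
        for r in range(r0, r0 + h):
            for c in range(c0, c0 + w):
                out[r][c] = mean
    return out


# Two 2x2 blocks side by side; the left one holds the values 3, 4, 3, 3.
first_residual = [[3, 4, 1, 1],
                  [3, 3, 1, 1]]
partition = [(0, 0, 2, 2), (0, 2, 2, 2)]
second_residual = average_within_blocks(first_residual, partition)
```

In this sketch the left block's values 3, 4, 3, 3 all become 3, matching the example in the text, while the first prediction residual itself is left unchanged so that both residuals remain available as model inputs.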
In step 202, the first prediction residual and the second prediction residual may be input into a video coding block partition prediction model and a first probability may be obtained, where the first probability is a probability that each edge of each minimum sub-block of the CTU to be coded serves as a partition boundary of all possible block partition modes.
Here, the video coding block partition prediction model may be implemented by an artificial neural network (e.g., a Deep Neural Network (DNN), a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), etc.). In a specific implementation, the model may be trained in advance; its structure and training process are described in detail below with reference to fig. 6 to 8 and are not repeated here. The minimum subblock of the CTU to be encoded may be an 8 × 8 subblock, a 4 × 4 subblock, etc., without limitation; in the following description, the 8 × 8 subblock is taken as the minimum subblock. The first probability is a probability vector composed of the probabilities that each edge of each minimum subblock of the CTU to be encoded serves as a partition boundary of all possible block partition modes. The correspondence between each value in the probability vector and the edges of the minimum subblocks can be understood with reference to fig. 4(a), which is a schematic diagram illustrating a probability vector according to an exemplary embodiment of the present disclosure. In fig. 4(a), each edge of each 8 × 8 minimum subblock corresponds to a probability value representing the probability that the edge serves as a partition boundary of a video coding block partition; for example, the probability corresponding to the bottom edge of the minimum subblock at the upper left corner in fig. 4(a) is 0.6.
In step 203, the following operations may be performed for each CU of the CTU to be encoded: and based on the first probability, determining a second probability corresponding to the current CU when the current CU is divided according to each possible block division mode, removing the block division modes with the second probability not meeting a preset threshold value from all the possible block division modes, and executing rate distortion optimization decision on the rest block division modes to obtain the block division mode of the current CU.
Here, the block division modes include a quadtree division mode (QT), a ternary tree division mode (TT), and a binary tree division mode (BT). The binary tree division mode and the ternary tree division mode each include a horizontal division mode (BTH, the horizontal binary tree division mode, and TTH, the horizontal ternary tree division mode) and a vertical division mode (BTV, the vertical binary tree division mode, and TTV, the vertical ternary tree division mode). That is, for each CU, all possible block division modes are QT, BTH, BTV, TTH, and TTV. According to an exemplary embodiment of the present disclosure, since QT is more sensitive to coding loss, to better control coding loss the present disclosure always performs the RDO decision on QT during the iteration of video coding block division (i.e., the QT division mode is never removed), and removes those of BTH, BTV, TTH, and TTV whose second probability does not meet a preset threshold, i.e., removes the horizontal division mode and/or the vertical division mode whose second probability does not meet the preset threshold from the binary tree division modes and the ternary tree division modes, respectively. The second probability is determined differently for each block division mode, as described with reference to fig. 5. Fig. 5 is a diagram illustrating the determination of the second probability corresponding to a horizontal block division mode based on the first probability according to an exemplary embodiment of the present disclosure. Referring to fig. 5, S1, S2, S3, and S4 denote the average probabilities of the partition boundaries produced when the CU is divided by BTH and TTH (each may be obtained by adding the first probabilities of the minimum subblocks included in the partition boundary and dividing the sum by the number of those minimum subblocks). The second probability corresponding to BTH is the smaller of S2 and S3, and the second probability corresponding to TTH is the smaller of S1 and S4. The second probability corresponding to a vertical block division mode is determined similarly, and the second probability corresponding to QT is the smaller of the second probabilities of BTH and BTV. In some embodiments, after the first probability is obtained, the second probability corresponding to each block division mode may also be obtained by a conventional machine learning method such as a decision tree, or by inputting the first probability into a neural network (e.g., a ResNet structure) including one convolutional layer.
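The rule above for turning boundary averages into second probabilities can be sketched as follows. This is a hedged illustration: the function names are hypothetical, and because fig. 5 is not reproduced here, the sketch takes S1 to S4 as given inputs for each direction rather than asserting which boundary each one belongs to.

```python
def boundary_average(edge_probs):
    """Average the first probabilities of the edges lying on one boundary."""
    return sum(edge_probs) / len(edge_probs)


def second_probabilities(s_horizontal, s_vertical):
    """Map (S1, S2, S3, S4) averages per direction to second probabilities.

    s_horizontal, s_vertical: 4-tuples of boundary averages for the
    horizontal and vertical division modes (hypothetical input format).
    """
    h1, h2, h3, h4 = s_horizontal
    v1, v2, v3, v4 = s_vertical
    bth = min(h2, h3)   # BTH: the smaller of S2 and S3
    tth = min(h1, h4)   # TTH: the smaller of S1 and S4
    btv = min(v2, v3)   # vertical modes are handled analogously
    ttv = min(v1, v4)
    qt = min(bth, btv)  # QT: the smaller of the BTH and BTV values
    return {"QT": qt, "BTH": bth, "BTV": btv, "TTH": tth, "TTV": ttv}


# Illustrative values only; they are not taken from the disclosure.
probs = second_probabilities((0.2, 0.6, 0.8, 0.4), (0.1, 0.5, 0.3, 0.9))
```

Taking the minimum means a division mode is only considered likely when every boundary it would create is itself likely, which is a conservative way to combine the per-boundary averages.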
According to an exemplary embodiment of the present disclosure, the preset threshold may include a first preset threshold and a second preset threshold. For convenience of description, the second probability corresponding to the horizontal division mode is referred to as the horizontal probability, and the second probability corresponding to the vertical division mode is referred to as the vertical probability. To decide whether to remove a block division mode, the following operations may be performed for either the binary tree division mode or the ternary tree division mode: the horizontal probability and the vertical probability are each compared with the first preset threshold; if both the horizontal probability and the vertical probability are smaller than the first preset threshold, both the horizontal division mode and the vertical division mode are removed; if one of the two probabilities is smaller than the first preset threshold and the other is greater than or equal to it, the block division mode whose probability is smaller than the first preset threshold is removed; and if the horizontal probability and the vertical probability are both greater than or equal to the second preset threshold, the difference between them is computed, and if the absolute value of the difference is greater than or equal to the second preset threshold, the block division mode with the smaller probability is removed.
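The removal rule can be sketched in a few lines. This is an illustration under stated assumptions, not the disclosed implementation: the function names and the threshold values are hypothetical, and the final branch is entered here whenever neither probability is below the first threshold, which is a simplifying reading of how the cases combine. QT is always kept, as described earlier.

```python
def prune_direction_pair(horizontal_p, vertical_p, t1, t2):
    """Return the directions ('H', 'V') kept for one tree type (BT or TT)."""
    if horizontal_p < t1 and vertical_p < t1:
        return set()                      # both below t1: remove both
    if horizontal_p < t1:
        return {"V"}                      # only horizontal below t1
    if vertical_p < t1:
        return {"H"}                      # only vertical below t1
    # Assumption: reached when neither probability is below t1.
    if abs(horizontal_p - vertical_p) >= t2:
        return {"H"} if horizontal_p > vertical_p else {"V"}
    return {"H", "V"}


def remaining_modes(second, thresholds):
    """List the division modes that still go through the RDO decision."""
    modes = ["QT"]                        # QT is never removed
    for tree, (t1, t2) in thresholds.items():
        kept = prune_direction_pair(second[tree + "H"], second[tree + "V"], t1, t2)
        modes += [tree + d for d in sorted(kept)]
    return modes


# Illustrative second probabilities and per-tree (t1, t2) thresholds.
second = {"BTH": 0.6, "BTV": 0.3, "TTH": 0.2, "TTV": 0.02}
thresholds = {"BT": (0.2, 0.1), "TT": (0.05, 0.05)}
kept_modes = remaining_modes(second, thresholds)
```

With these example numbers only QT, BTH, and TTH remain, so the RDO decision would be run for three modes instead of five.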
In some embodiments, since BT and TT differ in their sensitivity to video coding loss (BT is more sensitive), to better control the video coding loss the first preset threshold may be further divided into a first preset threshold for BT (for example, a value of 0.1 to 0.3) and a first preset threshold for TT (for example, a value less than 0.1), and the second preset threshold may be further divided into a second preset threshold for BT (for example, a value of 0.1 to 0.15) and a second preset threshold for TT (for example, a value less than 0.1). When deciding whether to remove a block division mode, for BT the horizontal probability corresponding to BTH and the vertical probability corresponding to BTV are compared with the first and second preset thresholds for BT; for TT, the horizontal probability corresponding to TTH and the vertical probability corresponding to TTV are compared with the first and second preset thresholds for TT. This avoids the large coding loss that a single uniform preset threshold might cause.
According to an exemplary embodiment of the present disclosure, the video coding block partition prediction model may be further improved so that it has a plurality of parallel outputs, for example one output giving the first probability that each edge of each 16x16 sub-block serves as a partition boundary of all possible block partition modes and another giving the same for each edge of each 8x8 sub-block. When the block partition mode of each CU of the CTU to be encoded is determined, the second probability corresponding to each possible block partition mode may first be determined based on the first probability corresponding to each edge of the 16x16 sub-blocks, some block partition modes may be removed according to the foregoing method, and a rate-distortion optimization decision may be performed on the remaining block partition modes to obtain the block partition mode of the current CU. Then, for a CU of size 16x16, the second probability for each possible block partition mode may be determined based on the first probability for each edge of the 8x8 sub-blocks, and some block partition modes may again be removed according to the foregoing method. Such a model can shorten the block division time to some extent while better controlling coding loss.
Fig. 6 is a flowchart illustrating a training method of a video coding block partition prediction model according to an exemplary embodiment of the present disclosure.
Referring to fig. 6, in step 601, a video training sample may be obtained, wherein the video training sample includes a first prediction residual, a second prediction residual, and a truth vector. The first prediction residual is the prediction residual of a first CTU to be coded in the current video frame; the second prediction residual is obtained by performing a preset process on the first prediction residual according to the block division mode of a second CTU in a reference frame of the current video frame, wherein the second CTU is the co-located CTU of the first CTU; the truth vector indicates whether each edge of each minimum subblock in the first CTU serves as a partition boundary of the real block division mode. Here, the size of the first CTU and the second CTU may be 64 × 64, 32 × 32, etc., without limitation; in the following description, CTUs of size 64 × 64 are taken as an example.
According to an exemplary embodiment of the present disclosure, a desired number of video sequences (e.g., 100 video sequences) may be provided, and for each video sequence, one frame is extracted every 4-5 frames, the prediction residual of a CTU to be coded (i.e., the first CTU) in that frame is calculated, and this prediction residual is used as the first prediction residual. To make full use of the information of already coded CTUs, the second prediction residual may be obtained by performing a preset process on the first prediction residual according to the block division mode of the second CTU (i.e., the co-located CTU of the first CTU) in a reference frame of the extracted video frame. According to an exemplary embodiment of the present disclosure, the first prediction residual may be averaged according to the block division mode of the second CTU, and the averaged first prediction residual may be used as the second prediction residual. For the specific process of obtaining the second prediction residual, refer to the foregoing description of fig. 3(a) and fig. 3(b), which is not repeated here. The truth vector is the training label of the video coding block partition prediction model and can be obtained from the real block division mode of the first CTU in the extracted video frame, as can be understood with reference to fig. 4(b), which is a schematic diagram illustrating the process of obtaining the truth vector according to an exemplary embodiment of the present disclosure. Fig. 4(b) illustrates a 64 × 64 CTU whose minimum subblock size is 8 × 8, where the bold solid black lines represent the partition boundaries of the CTU after block division. An edge of a minimum subblock that lies on a partition boundary is recorded as 1, and every other edge is recorded as 0, which yields a truth vector reflecting whether each edge of each minimum subblock in the CTU serves as a partition boundary of the real block division mode.
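The labeling rule for the truth vector can be sketched as follows. The edge layout is an assumption made for illustration: only interior edges of the 8 × 8 grid are stored (two grids of 0/1 flags), leaf blocks are given as hypothetical `(x, y, width, height)` tuples in pixels, and the ordering of entries in the flattened vector is arbitrary, since the text does not specify it.

```python
def truth_edges(leaf_blocks, ctu_size=64, min_size=8):
    """Mark interior minimum-subblock edges that lie on a partition boundary.

    Returns (horizontal, vertical):
      horizontal[i][j] is the edge between subblock rows i and i+1 in column j,
      vertical[i][j]   is the edge between subblock columns j and j+1 in row i.
    """
    n = ctu_size // min_size
    horizontal = [[0] * n for _ in range(n - 1)]
    vertical = [[0] * (n - 1) for _ in range(n)]
    for x, y, w, h in leaf_blocks:
        c0, r0 = x // min_size, y // min_size
        c1, r1 = (x + w) // min_size, (y + h) // min_size
        for c in range(c0, c1):
            if r0 > 0:
                horizontal[r0 - 1][c] = 1   # top side of the block
            if r1 < n:
                horizontal[r1 - 1][c] = 1   # bottom side of the block
        for r in range(r0, r1):
            if c0 > 0:
                vertical[r][c0 - 1] = 1     # left side of the block
            if c1 < n:
                vertical[r][c1 - 1] = 1     # right side of the block
    return horizontal, vertical


# A 64x64 CTU split once by quadtree into four 32x32 blocks.
blocks = [(0, 0, 32, 32), (32, 0, 32, 32), (0, 32, 32, 32), (32, 32, 32, 32)]
horizontal, vertical = truth_edges(blocks)
truth_vector = [e for row in horizontal for e in row] + \
               [e for row in vertical for e in row]
```

For this single quadtree split, the flattened vector has 112 entries, of which the 16 edges along the central cross are 1 and the rest are 0.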
In step 602, the first prediction residual and the second prediction residual may be input into a video coding block partition prediction model to obtain an estimated vector consisting of probabilities that each edge of each minimum sub-block in the first CTU is a partition boundary of all possible block partition modes.
According to an exemplary embodiment of the present disclosure, the video coding block partition prediction model may be obtained by performing a preset process on an intra prediction mode block partition prediction network structure, and the intra prediction mode block partition prediction network structure may be implemented by a ResNet structure. Specifically, the video coding block partition prediction model of the present disclosure can be obtained by reducing the number of convolutional layers of the intra prediction mode block partition prediction network structure to one half of the original number, which reduces the amount of computation at run time while still taking the prediction effect into account. Fig. 7 is a schematic diagram illustrating a structure of a video coding block partition prediction model according to an exemplary embodiment of the present disclosure. Referring to fig. 7, the video coding block partition prediction model performs feature extraction and prediction on a 64 × 64 CTU through 5 convolutional layers and finally outputs an estimated vector consisting of 112 probability values, each being the probability that an edge of an 8 × 8 minimum subblock inside the 64 × 64 CTU serves as a partition boundary of all possible block partition modes.
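The figure of 112 output values is consistent with counting only the edges between neighboring minimum subblocks inside the CTU, with the CTU's own outer edges excluded. A small check under that assumption (the function name is hypothetical):

```python
def interior_edge_count(ctu_size, min_size):
    """Count the edges shared by neighboring minimum subblocks in a CTU.

    Assumption: edges on the outer border of the CTU are not counted.
    """
    n = ctu_size // min_size          # subblocks per row and per column
    horizontal = (n - 1) * n          # n-1 interior lines, n segments each
    vertical = n * (n - 1)            # same count in the other direction
    return horizontal + vertical


edges_64_8 = interior_edge_count(64, 8)   # 8 x 8 grid of subblocks
```

For a 64 × 64 CTU with 8 × 8 minimum subblocks this gives 2 × 8 × 7 = 112, matching the length of the estimated vector stated above.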
In step 603, corresponding weights may be superimposed on the probabilities of the preset edges of the plurality of smallest subblocks in the first CTU.
According to an exemplary embodiment of the present disclosure, the probabilities of the edges of the minimum subblocks that lie on a preset partition boundary may be multiplied by corresponding weights to enhance the sensitivity of the video coding block partition prediction model to video coding loss, wherein the value of each weight is greater than 1, and the preset partition boundary may be obtained by quadtree partitioning of the first CTU and does not include the edges of the first CTU. The partition boundary obtained by quadtree partitioning is shown as the bold solid black lines in fig. 8, which is a schematic diagram illustrating the partition boundary obtained by the quadtree partition of the first CTU according to an exemplary embodiment of the present disclosure. Specifically, in the video encoding process, the larger the block produced by a division mode, the greater the coding loss that may result if that division mode is removed. Therefore, to enhance the sensitivity of the video coding block partition prediction model to video coding loss and thereby better control the coding loss in the video encoding process, the probability of each minimum subblock edge that lies on a quadtree partition boundary may be multiplied by a corresponding weight after the estimated vector is obtained; the value of the weight is, for example but not limited to, 1.05.
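The weighting step can be sketched as follows. Several details are assumptions made for illustration: the estimated probabilities are stored in two grids of interior edges, the quadtree boundaries are passed in as a set of grid-line indices in subblock units (`qt_lines`), and only the first-level quadtree split of a 64 × 64 CTU (line 4, i.e., the 32-pixel position) is used in the example, since fig. 8 is not reproduced here.

```python
def apply_quadtree_weights(horizontal, vertical, qt_lines, weight=1.05):
    """Multiply probabilities of edges on quadtree boundaries by `weight`.

    horizontal[i][j]: edge on grid line i+1 (between subblock rows i and i+1).
    vertical[i][j]:   edge on grid line j+1 (between subblock columns j and j+1).
    qt_lines:         set of grid-line indices treated as quadtree boundaries
                      (hypothetical parameter; CTU outer edges are excluded).
    """
    weighted_h = [[p * weight if (i + 1) in qt_lines else p for p in row]
                  for i, row in enumerate(horizontal)]
    weighted_v = [[p * weight if (j + 1) in qt_lines else p
                   for j, p in enumerate(row)]
                  for row in vertical]
    return weighted_h, weighted_v


n = 8                                        # 64 / 8 subblocks per side
est_h = [[0.5] * n for _ in range(n - 1)]    # placeholder estimates
est_v = [[0.5] * (n - 1) for _ in range(n)]
w_h, w_v = apply_quadtree_weights(est_h, est_v, qt_lines={4})
```

Only the 16 edges on the central cross are scaled; all other probabilities pass through unchanged. Note that a weighted value can exceed 1, so a loss function that expects probabilities may need to clip its inputs.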
In step 604, the value of the loss function may be calculated based on the estimation vector and the truth vector on which the corresponding weights are superimposed. Here, the loss Function may be a BCE (Binary Cross Entropy loss Function), MSE (Mean-Square Error Function), or the like, which is not limited thereto.
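As an illustration of the two loss functions named above, the following sketch computes BCE and MSE between a weighted estimated vector and a truth vector. The function names are hypothetical, and the clipping of estimates into the open interval (0, 1) before taking logarithms is an added assumption: it keeps BCE finite when a weighted probability reaches or exceeds 1, which the text does not address.

```python
import math


def bce_loss(estimates, truths, eps=1e-7):
    """Mean binary cross entropy between estimates and 0/1 labels."""
    total = 0.0
    for p, t in zip(estimates, truths):
        p = min(max(p, eps), 1.0 - eps)   # assumption: clip for stability
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(estimates)


def mse_loss(estimates, truths):
    """Mean squared error between estimates and 0/1 labels."""
    return sum((p - t) ** 2 for p, t in zip(estimates, truths)) / len(estimates)


# Illustrative values: the last estimate was weighted above 1.
weighted_estimates = [0.5, 0.2, 1.05]
truth = [1, 0, 1]
loss_bce = bce_loss(weighted_estimates, truth)
loss_mse = mse_loss(weighted_estimates, truth)
```

In an actual training setup the same quantities would normally come from a deep learning framework's built-in loss functions so that gradients can be back-propagated; the plain-Python version here only shows the arithmetic.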
In step 605, the video coding block partition prediction model may be trained by adjusting model parameters of the video coding block partition prediction model according to the value of the loss function. That is, the parameters of the video coding block partition prediction model may be adjusted by back propagation of the loss calculated by the loss function. Furthermore, in the model training process, a batch of video training samples (e.g., 100 video sequences) may be used to adjust (or update) parameters of the video coding block partition prediction model, and the parameters of the video coding block partition prediction model are iteratively adjusted (or updated) with the goal of minimizing the value of the loss function until the video coding block partition prediction model converges.
The video coding block division prediction model trained by the above training method can output, in the inter-frame prediction mode, the probability that each edge of each minimum subblock in the CTU to be coded serves as a division boundary of all possible block division modes. The probability of each possible block division mode can then be derived from these probabilities, and the RDO decision can be skipped for block division modes with low probability, which greatly shortens the block division time of video coding. In addition, because the number of convolutional layers of the video coding block division prediction model is reduced, the amount of computation at run time is reduced, which further shortens the block division time. Moreover, because corresponding weights are superimposed during training on the probabilities of the minimum subblock edges that are more sensitive to coding loss, the output of the model better reflects the sensitivity of each minimum subblock edge to coding loss (that is, block division modes that are more sensitive to coding loss are more likely to be retained), so that the coding loss in the video coding process can be better controlled.
Fig. 9 is a block diagram illustrating a video coding block division apparatus in an inter prediction mode according to an exemplary embodiment of the present disclosure.
Referring to fig. 9, a video coding block division apparatus 900 in an inter prediction mode according to an exemplary embodiment of the present disclosure may include a residual acquisition unit 901, a probability acquisition unit 902, and a processing unit 903.
The residual acquisition unit 901 may obtain a first prediction residual and a second prediction residual, where the first prediction residual is the prediction residual of a CTU to be coded in a current video frame, and the second prediction residual is obtained by performing a preset process on the first prediction residual according to the block division mode of the co-located CTU of the CTU to be coded, the co-located CTU being the portion of a reference frame of the current video frame that corresponds to the CTU to be coded. The probability acquisition unit 902 may input the first prediction residual and the second prediction residual into the video coding block partition prediction model and obtain a first probability, where the first probability is the probability that each edge of each minimum subblock of the CTU to be coded serves as a partition boundary of all possible block partition modes. The processing unit 903 may perform the following for each CU of the CTU to be coded: based on the first probability, determine a second probability corresponding to the current CU when the current CU is divided according to each possible block division mode, remove the block division modes whose second probability does not meet a preset threshold from all the possible block division modes, and perform a rate-distortion optimization decision on the remaining block division modes to obtain the block division mode of the current CU.
Since the video coding block division method in the inter prediction mode shown in fig. 2 can be performed by the video coding block division apparatus 900 in the inter prediction mode shown in fig. 9, and the residual acquisition unit 901, the probability acquisition unit 902, and the processing unit 903 can respectively perform operations corresponding to step 201, step 202, and step 203 in fig. 2, any relevant details of the operations performed by the units in fig. 9 can be found in the corresponding description of fig. 2 and are not repeated here.
Fig. 10 is a block diagram illustrating a training apparatus of a video coding block partition prediction model according to an exemplary embodiment of the present disclosure.
Referring to fig. 10, a training apparatus 1000 of a video coding block partition prediction model according to an exemplary embodiment of the present disclosure may include a sample acquisition unit 1001, an estimation vector acquisition unit 1002, a weight superposition unit 1003, a loss function calculation unit 1004, and a model parameter adjustment unit 1005.
The sample acquisition unit 1001 may obtain video training samples, where the video training samples include a first prediction residual, a second prediction residual, and a truth vector. The first prediction residual is the prediction residual of a first CTU to be coded in the current video frame; the second prediction residual is obtained by performing a preset process on the first prediction residual according to the block division mode of a second CTU in a reference frame of the current video frame, wherein the second CTU is the co-located CTU of the first CTU; the truth vector indicates whether each edge of each minimum subblock in the first CTU serves as a partition boundary of the real block division mode. The estimation vector acquisition unit 1002 may input the first prediction residual and the second prediction residual into the video coding block partition prediction model to obtain an estimated vector composed of the estimated probabilities that each edge of each minimum subblock in the first CTU serves as a partition boundary of all possible block partition modes. The weight superposition unit 1003 may superimpose corresponding weights on the probabilities of preset edges of a plurality of minimum subblocks in the first CTU. The loss function calculation unit 1004 may calculate the value of a loss function based on the truth vector and the estimated vector on which the corresponding weights are superimposed. The model parameter adjustment unit 1005 may train the video coding block partition prediction model by adjusting the model parameters of the video coding block partition prediction model according to the value of the loss function.
Since the training method of the video coding block partition prediction model shown in fig. 6 can be performed by the training apparatus 1000 of the video coding block partition prediction model shown in fig. 10, and the sample obtaining unit 1001, the estimated vector obtaining unit 1002, the weight superimposing unit 1003, the loss function calculating unit 1004, and the model parameter adjusting unit 1005 can respectively perform operations corresponding to step 601, step 602, step 603, step 604, and step 605 in fig. 6, any relevant details involved in the operations performed with respect to the units in fig. 10 can be referred to the corresponding description with respect to fig. 6, and are not repeated here.
Fig. 11 is a block diagram of an electronic device 1100 according to an example embodiment of the disclosure.
Referring to fig. 11, an electronic device 1100 includes at least one memory 1101 and at least one processor 1102, the at least one memory 1101 having stored therein a set of computer-executable instructions that, when executed by the at least one processor 1102, perform a video coding block partitioning method in an inter prediction mode or a training method of a video coding block partitioning prediction model according to an exemplary embodiment of the present disclosure.
By way of example, the electronic device 1100 may be a PC, a tablet device, a personal digital assistant, a smartphone, or another device capable of executing the set of instructions described above. Here, the electronic device 1100 need not be a single electronic device, but can be any collection of devices or circuits that can execute the above instructions (or sets of instructions) individually or in combination. The electronic device 1100 may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (e.g., via wireless transmission).
In the electronic device 1100, the processor 1102 may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a special purpose processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, processors may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like.
The processor 1102 may execute instructions or code stored in the memory 1101, wherein the memory 1101 may also store data. The instructions and data may also be transmitted or received over a network via a network interface device, which may employ any known transmission protocol.
The memory 1101 may be integrated with the processor 1102, for example, by having RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, memory 1101 may comprise a stand-alone device, such as an external disk drive, storage array, or any other storage device usable by a database system. The memory 1101 and the processor 1102 may be operatively coupled or may communicate with each other, such as through I/O ports, network connections, etc., so that the processor 1102 can read files stored in the memory.
In addition, the electronic device 1100 may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the electronic device 1100 may be connected to each other via a bus and/or a network.
According to an exemplary embodiment of the present disclosure, there may also be provided a computer readable storage medium storing instructions which, when executed by at least one processor, cause the at least one processor to perform a video coding block partitioning method in an inter prediction mode or a training method of a video coding block partitioning prediction model according to the present disclosure. Examples of the computer-readable storage medium herein include: read-only memory (ROM), programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or compact disc memory, Hard Disk Drive (HDD), solid-state drive (SSD), card-type memory (such as a multimedia card, a Secure Digital (SD) card or an eXtreme Digital (xD) card), magnetic tape, a floppy disk, a magneto-optical data storage device, an optical data storage device, a hard disk, a solid state disk, and any other device configured to store and provide a computer program and any associated data, data files, and data structures to a processor or computer in a non-transitory manner such that the processor or computer can execute the computer program.
The computer program in the computer-readable storage medium described above can be run in an environment deployed in a computer apparatus, such as a client, a host, a proxy device, or a server. Further, in one example, the computer program and any associated data, data files, and data structures are distributed across a networked computer system such that they are stored, accessed, and executed in a distributed fashion by one or more processors or computers.
According to an exemplary embodiment of the present disclosure, there may also be provided a computer program product including instructions that are executable by a processor of a computer device to perform a video coding block partitioning method in an inter prediction mode or a training method of a video coding block partitioning prediction model according to an exemplary embodiment of the present disclosure.
According to the video coding block division method and device in the inter-frame prediction mode of the present disclosure, the boundary division probabilities of all subblocks in the current CTU are predicted at the CTU level by a video coding block division prediction model, based on the coding information of the current CTU and the coding information of the co-located CTU. The division probabilities are then used at the CU level to calculate the probability value of each CU division mode, some of the possible block division modes are removed according to these probability values, and the rate-distortion optimization decision is performed only over the remaining block division modes. This avoids the large amount of computation required to traverse all possible block division modes in the rate-distortion optimization decision and substantially shortens the time needed to divide the video coding block.
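The pruning step described above can be sketched as follows. The way a CU-level mode probability is derived from edge probabilities is not given in this passage, so `split_line_probability` (a plain mean over the edges on a candidate split line) is a hypothetical aggregation. The comparison rule follows the one recited in the claims for the binary and ternary tree modes, with one labeled assumption: the claim text gates its last case on the second threshold, whereas this sketch treats it as the remaining case after the first-threshold checks. The function names and threshold values are illustrative only.

```python
def split_line_probability(edge_probs):
    """Hypothetical aggregation: mean probability of the edges on one split line."""
    return sum(edge_probs) / len(edge_probs)

def prune_directions(p_hor, p_ver, t1, t2):
    """Return the split directions kept for one tree type (binary or ternary).

    p_hor, p_ver: CU-level probabilities of the horizontal and vertical modes.
    t1: first preset threshold (absolute cut-off).
    t2: second preset threshold (cut-off on the gap), assumed to be > 0.
    """
    if p_hor < t1 and p_ver < t1:
        return set()                      # both unlikely: drop both
    if p_hor < t1:
        return {"vertical"}               # only vertical passes t1
    if p_ver < t1:
        return {"horizontal"}             # only horizontal passes t1
    # Assumed reading: both pass t1, so compare their gap with t2.
    if abs(p_hor - p_ver) >= t2:
        return {"horizontal"} if p_hor > p_ver else {"vertical"}
    return {"horizontal", "vertical"}     # too close to call: keep both

# Toy usage: probabilities of the edges on each candidate split line of a CU.
p_hor = split_line_probability([0.9, 0.9, 0.8, 1.0])
p_ver = split_line_probability([0.4, 0.5, 0.3, 0.4])
kept = prune_directions(p_hor, p_ver, t1=0.3, t2=0.4)
```

Only the directions in `kept` would then be passed to the rate-distortion optimization decision, which is where the time saving comes from.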
In addition, according to the training method and device for the video coding block division prediction model of the present disclosure, when the model is trained, corresponding weights can be superposed on the predicted boundary probabilities of the subblocks according to their coding loss sensitivity. As a result, block division modes that are more sensitive to coding loss are retained to a greater extent when the model performs video coding block division prediction, which reduces the coding loss.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method for partitioning a video coding block in an inter-prediction mode, comprising:
acquiring a first prediction residual and a second prediction residual, wherein the first prediction residual is a prediction residual of a CTU to be coded in a current video frame, the second prediction residual is obtained by performing preset processing on the first prediction residual according to a block division mode of a co-located CTU of the CTU to be coded, and the co-located CTU is a part corresponding to the CTU to be coded in a reference frame of the current video frame;
inputting the first prediction residual and the second prediction residual into a video coding block partition prediction model and obtaining a first probability, wherein the first probability is the probability that each edge of each minimum subblock of the CTU to be coded is used as a partition boundary of all possible block partition modes;
for each CU of the CTU to be encoded, performing the following operations:
determining, based on the first probability, a second probability corresponding to when the current CU is partitioned according to each possible block partitioning mode,
and removing the block division modes with the second probability not meeting the preset threshold value from all the possible block division modes, and executing rate distortion optimization decision on the rest block division modes to obtain the block division mode of the current CU.
2. The method of claim 1, wherein performing the preset processing on the first prediction residual according to the block division mode of the co-located CTU of the CTU to be coded comprises:
averaging the first prediction residual according to the block division mode of the co-located CTU of the CTU to be coded;
and taking the averaged first prediction residual as the second prediction residual.
3. The video coding block division method of claim 1,
the block division mode comprises a quadtree division mode, a ternary tree division mode and a binary tree division mode, and the binary tree division mode and the ternary tree division mode respectively comprise a horizontal division mode and a vertical division mode;
the removing, from the all possible block division modes, the block division mode with the second probability that does not meet the preset threshold includes:
removing the horizontal division mode and/or the vertical division mode with the second probability not meeting a preset threshold from the binary tree division mode and the ternary tree division mode respectively.
4. The video coding block division method of claim 3, wherein said preset thresholds comprise a first preset threshold and a second preset threshold, said second probability corresponding to a horizontal division mode is taken as a horizontal probability, and said second probability corresponding to a vertical division mode is taken as a vertical probability;
the removing, from the binary tree splitting pattern and the ternary tree splitting pattern, the horizontal splitting pattern and/or the vertical splitting pattern of which the second probability does not meet a preset threshold includes:
for the binary tree partitioning pattern or the ternary tree partitioning pattern, performing the following operations:
comparing the horizontal probability and the vertical probability with the first preset threshold respectively;
removing the horizontal division mode and the vertical division mode under the condition that the horizontal probability and the vertical probability are both smaller than the first preset threshold;
in a case where one of the horizontal probability and the vertical probability is smaller than the first preset threshold and the other is greater than or equal to the first preset threshold, removing the block division mode whose probability is smaller than the first preset threshold;
and under the condition that the horizontal probability and the vertical probability are both larger than or equal to the second preset threshold, making a difference between the horizontal probability and the vertical probability, and under the condition that the absolute value of the difference is larger than or equal to the second preset threshold, removing the block division mode with the smaller probability value in the horizontal probability and the vertical probability.
5. A method for training a video coding block partition prediction model is characterized by comprising the following steps:
obtaining a video training sample, wherein the video training sample comprises a first prediction residual, a second prediction residual and a truth vector, the first prediction residual is a prediction residual of a first CTU to be coded in a current video frame, the second prediction residual is obtained by performing preset processing on the first prediction residual according to a block division mode of a second CTU in a reference frame of the current video frame, the second CTU is a co-located CTU of the first CTU, and the truth vector represents a condition that each edge of each minimum subblock in the first CTU is used as a division boundary of a real block division mode;
inputting the first prediction residual and the second prediction residual into the video coding block division prediction model to obtain an estimated vector which is formed by the probability that each edge of each minimum subblock in the first CTU is used as the division boundary of all possible block division modes;
superposing corresponding weights on the probabilities of preset edges of a plurality of minimum subblocks in the first CTU;
calculating a value of a loss function based on the estimation vector and the truth vector on which the respective weights are superimposed;
and training the video coding block division prediction model by adjusting the model parameters of the video coding block division prediction model according to the value of the loss function.
6. An apparatus for partitioning a video coding block in an inter prediction mode, comprising:
a residual acquisition unit configured to: acquiring a first prediction residual and a second prediction residual, wherein the first prediction residual is a prediction residual of a CTU to be coded in a current video frame, the second prediction residual is obtained by performing preset processing on the first prediction residual according to a block division mode of a co-located CTU of the CTU to be coded, and the co-located CTU is a part corresponding to the CTU to be coded in a reference frame of the current video frame;
a probability acquisition unit configured to: inputting the first prediction residual and the second prediction residual into a video coding block partition prediction model and obtaining a first probability, wherein the first probability is the probability that each edge of each minimum subblock of the CTU to be coded is used as a partition boundary of all possible block partition modes;
a processing unit configured to: for each CU of the CTU to be encoded, performing the following operations:
determining, based on the first probability, a second probability corresponding to when the current CU is partitioned according to each possible block partitioning mode,
and removing the block division modes with the second probability not meeting the preset threshold value from all the possible block division modes, and executing rate distortion optimization decision on the rest block division modes to obtain the block division mode of the current CU.
7. An apparatus for training a video coding block partition prediction model, comprising:
a sample acquisition unit configured to: obtaining a video training sample, wherein the video training sample comprises a first prediction residual, a second prediction residual and a truth vector, the first prediction residual is a prediction residual of a first CTU to be coded in a current video frame, the second prediction residual is obtained by performing preset processing on the first prediction residual according to a block division mode of a second CTU in a reference frame of the current video frame, the second CTU is a co-located CTU of the first CTU, and the truth vector represents a condition that each edge of each minimum subblock in the first CTU is used as a division boundary of a real block division mode;
an estimation vector acquisition unit configured to: inputting the first prediction residual and the second prediction residual into the video coding block division prediction model to obtain an estimated vector which is formed by the probability that each edge of each minimum subblock in the first CTU is used as the division boundary of all possible block division modes;
a weight superimposing unit configured to: superposing corresponding weights on the probabilities of preset edges of a plurality of minimum subblocks in the first CTU;
a loss function calculation unit configured to: calculating a value of a loss function based on the estimation vector and the truth vector on which the respective weights are superimposed;
a model parameter adjustment unit configured to: and training the video coding block division prediction model by adjusting the model parameters of the video coding block division prediction model according to the value of the loss function.
8. An electronic device, comprising:
at least one processor;
at least one memory storing computer-executable instructions,
wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the method of video coding block partitioning in inter-prediction mode as claimed in any one of claims 1 to 4 or the method of training of video coding block partitioning prediction models as claimed in claim 5.
9. A computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform the method of video coding block partitioning in inter prediction mode as claimed in any one of claims 1 to 4 or the method of training a video coding block partitioning prediction model as claimed in claim 5.
10. A computer program product comprising computer instructions which, when executed by at least one processor, implement the method of video coding block partitioning in inter prediction mode as claimed in any of claims 1 to 4 or the method of training of video coding block partitioning prediction model as claimed in claim 5.
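As an illustration of the averaging recited in claim 2, the following minimal sketch replaces every block of the first prediction residual with that block's mean, using the block layout of the co-located CTU. The rectangle representation `(top, left, height, width)`, the use of a plain (signed) arithmetic mean, and the name `average_by_partition` are assumptions made for this example and are not taken from the disclosure.

```python
def average_by_partition(residual, blocks):
    """Return a copy of `residual` in which each block holds its own mean.

    residual: 2-D list (rows of numbers) holding the first prediction residual
              of one CTU.
    blocks:   list of (top, left, height, width) rectangles that tile the CTU,
              standing in for the co-located CTU's block division mode.
    """
    out = [row[:] for row in residual]  # leave the input untouched
    for top, left, height, width in blocks:
        values = [residual[r][c]
                  for r in range(top, top + height)
                  for c in range(left, left + width)]
        mean = sum(values) / len(values)
        for r in range(top, top + height):
            for c in range(left, left + width):
                out[r][c] = mean
    return out

# Toy 4x4 "CTU" whose co-located CTU was split into four 2x2 blocks.
first = [[1, 3, 10, 10],
         [5, 7, 10, 10],
         [0, 0, 2, 4],
         [0, 0, 6, 8]]
blocks = [(0, 0, 2, 2), (0, 2, 2, 2), (2, 0, 2, 2), (2, 2, 2, 2)]
second = average_by_partition(first, blocks)
```

In this toy case the top-left block becomes 4.0 everywhere and the bottom-right block becomes 5.0, so `second` carries the co-located partition structure as piecewise-constant regions alongside the unmodified `first` residual.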
CN202111464395.7A 2021-12-03 2021-12-03 Video coding block division method and video coding block division prediction model training method Pending CN114173120A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111464395.7A CN114173120A (en) 2021-12-03 2021-12-03 Video coding block division method and video coding block division prediction model training method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111464395.7A CN114173120A (en) 2021-12-03 2021-12-03 Video coding block division method and video coding block division prediction model training method

Publications (1)

Publication Number Publication Date
CN114173120A true CN114173120A (en) 2022-03-11

Family

ID=80482690

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111464395.7A Pending CN114173120A (en) 2021-12-03 2021-12-03 Video coding block division method and video coding block division prediction model training method

Country Status (1)

Country Link
CN (1) CN114173120A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116610960A (en) * 2023-07-20 2023-08-18 北京万界数据科技有限责任公司 Monitoring management system for artificial intelligence training parameters
CN117676171A (en) * 2024-01-31 2024-03-08 腾讯科技(深圳)有限公司 Three-tree division processing method, equipment and storage medium for coding unit
CN117692663A (en) * 2024-01-31 2024-03-12 腾讯科技(深圳)有限公司 Binary tree partitioning processing method, equipment and storage medium for coding unit
WO2024103987A1 (en) * 2022-11-18 2024-05-23 中兴通讯股份有限公司 Inter-frame prediction method, decoding method, electronic device, and storage medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024103987A1 (en) * 2022-11-18 2024-05-23 中兴通讯股份有限公司 Inter-frame prediction method, decoding method, electronic device, and storage medium
CN116610960A (en) * 2023-07-20 2023-08-18 北京万界数据科技有限责任公司 Monitoring management system for artificial intelligence training parameters
CN116610960B (en) * 2023-07-20 2023-10-13 北京万界数据科技有限责任公司 Monitoring management system for artificial intelligence training parameters
CN117676171A (en) * 2024-01-31 2024-03-08 腾讯科技(深圳)有限公司 Three-tree division processing method, equipment and storage medium for coding unit
CN117692663A (en) * 2024-01-31 2024-03-12 腾讯科技(深圳)有限公司 Binary tree partitioning processing method, equipment and storage medium for coding unit
CN117676171B (en) * 2024-01-31 2024-05-07 腾讯科技(深圳)有限公司 Three-tree division processing method, equipment and storage medium for coding unit
CN117692663B (en) * 2024-01-31 2024-05-24 腾讯科技(深圳)有限公司 Binary tree partitioning processing method, equipment and storage medium for coding unit

Similar Documents

Publication Publication Date Title
CN114173120A (en) Video coding block division method and video coding block division prediction model training method
CN111466115B (en) Intra prediction mode concept for block-wise slice coding
CN108737841B (en) Coding unit depth determination method and device
TWI554086B (en) Decoder, encoder and associated method and computer program
US11070803B2 (en) Method and apparatus for determining coding cost of coding unit and computer-readable storage medium
CN108124154A (en) Fast selecting method, device and the electronic equipment of inter-frame forecast mode
CN108989799B (en) Method and device for selecting reference frame of coding unit and electronic equipment
CN109769119B (en) Low-complexity video signal coding processing method
CN110024397B (en) Method and apparatus for encoding video
KR20190107944A (en) Image processing apparatus for performing filtering on restored images and filtering method thereof
Li et al. Run-time deep learning enhanced fast coding unit decision for high efficiency video coding
CN116489386A (en) VVC inter-frame rapid coding method based on reference block
WO2022022013A1 (en) Mode selection method and apparatus, computer-readable storage medium and electronic device
CN109688411B (en) Video coding rate distortion cost estimation method and device
CN115278235B (en) Video coding method and device, electronic equipment and storage medium
CN103168465A (en) Parametric loop filter
CN114666579A (en) Video coding method and device, electronic equipment and storage medium
CN114143536B (en) Video coding method of SHVC (scalable video coding) spatial scalable frame
WO2023024115A1 (en) Encoding method, decoding method, encoder, decoder and decoding system
CN105208382A (en) Sampling point self-adaptation compensation mode judging method and device
CN105554503B (en) A kind of HEVC coding unit level bit rate control method
CN115086678B (en) Video encoding method and device, and video decoding method and device
CN103841422B (en) The intra-frame prediction method and device of depth image
KR101711894B1 (en) Method and apparatus for encoding video using coding information in upper depth
KR102650523B1 (en) Method and apparatus for end-to-end neural compression by deep reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination