CN114257810B - Context model selection method, device, equipment and storage medium

Publication number: CN114257810B
Application number: CN202011009881.5A
Authority: CN (China)
Prior art keywords: coding unit, target, syntax element, prediction, block
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN114257810A (application publication)
Inventors: 朱晗, 王英彬
Assignee (original and current): Tencent Technology Shenzhen Co Ltd
Priority application: CN202011009881.5A
Related filing: PCT/CN2021/118832 (published as WO2022063035A1)

Classifications

    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals, in particular:
    • H04N19/119 Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/122 Selection of transform size, e.g. 8x8 or 2x4x8 DCT; selection of sub-band transforms of varying structure or type
    • H04N19/124 Quantisation
    • H04N19/13 Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H04N19/147 Data rate or code amount at the encoder output according to rate distortion criteria
    • H04N19/176 Adaptive coding characterised by the coding unit, the unit being an image region that is a block, e.g. a macroblock
    • H04N19/593 Predictive coding involving spatial prediction techniques
    • H04N19/625 Transform coding using discrete cosine transform [DCT]
    • H04N19/70 Characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/96 Tree coding, e.g. quad-tree coding


Abstract

The present application discloses a context model selection method, apparatus, device, and storage medium, belonging to the field of audio and video technology. The method comprises the following steps: determining a reference coding unit of a target coding unit; predicting the block division structure of the target coding unit according to the reference coding unit to obtain the block division prediction structure of the target coding unit; and determining, based on the block division prediction structure of the target coding unit, the context model adopted by each of at least one syntax element involved in the block division of the target coding unit. By optimizing the selection conditions of the context model, the method improves entropy coding efficiency, reduces the number of bits in the code stream, and thereby improves video compression efficiency.

Description

Context model selection method, device, equipment and storage medium
Technical Field
The embodiments of the present application relate to the field of audio and video technology, and in particular to a method, apparatus, device, and storage medium for selecting a context model.
Background
A video signal refers to a sequence of images comprising a plurality of frames. Because the data bandwidth of a digitized video signal is very high, it is difficult for a computer device to store and process the video signal directly, so video compression techniques are needed to reduce its data bandwidth.
Video compression is implemented by video coding, and some mainstream video coding techniques adopt a hybrid coding framework that performs a series of operations and processes on the input original video signal. At the encoding end, an encoder applies block division, predictive coding, transform coding and quantization, and entropy coding (statistical coding) to the input original video signal (video sequence) to obtain a video code stream; the code stream is encapsulated into a video track, and the video track is further encapsulated into a video file, so that the encoded result is stored in a structure that is easier to parse. At the decoding end, the decoder performs the inverse operations, such as decapsulation and decoding, on the encoded image to present the video content.
In the related art, in order to further compress the video, the block division structure of the video is indicated by syntax elements, and entropy coding is applied to these syntax elements before they are written into the code stream, so that higher-order information between syntax elements can be exploited to further improve coding efficiency. The coded symbol information used as the condition is called the context. The probability distribution characteristics of each syntax element are different, and the more conditions a context model selects on, the smaller the resulting conditional entropy, so better coding performance can be obtained by increasing the context order. However, as the context order increases, the complexity of storing and updating the probability models also increases greatly; conversely, reducing the number of probability models may prevent the encoder from estimating probabilities accurately, degrading coding performance.
Therefore, when designing a context model, both the coding efficiency and the implementation complexity of the probability models must be considered.
Disclosure of Invention
The embodiments of the present application provide a method, apparatus, device, and storage medium for selecting a context model, which can improve the efficiency of entropy coding, reduce the number of bits in the code stream, and improve the video compression effect. The technical solution is as follows:
in one aspect, an embodiment of the present application provides a method for selecting a context model, where the method includes:
determining a reference coding unit of a target coding unit;
predicting the block division structure of the target coding unit according to the reference coding unit to obtain the block division prediction structure of the target coding unit;
determining context models respectively adopted by at least one syntax element related to the block division of the target coding unit based on the block division prediction structure of the target coding unit;
wherein the syntax element is used for indicating a block division structure of the coding unit, and the context model is used for probability estimation of the syntax element.
In another aspect, an embodiment of the present application provides an apparatus for selecting a context model, where the apparatus includes:
a unit determining module for determining a reference coding unit of a target coding unit;
the structure prediction module is used for predicting the block division structure of the target coding unit according to the reference coding unit to obtain the block division prediction structure of the target coding unit;
a model determining module, configured to determine, based on a block partition prediction structure of the target coding unit, context models respectively adopted by at least one syntax element involved in block partition of the target coding unit;
wherein the syntax element is used for indicating a block division structure of the coding unit, and the context model is used for performing probability estimation on the syntax element.
In yet another aspect, embodiments of the present application provide a computer device, which includes a processor and a memory, where at least one instruction, at least one program, a set of codes, or a set of instructions is stored in the memory, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by the processor to implement the selection method of the context model as described above.
In yet another aspect, embodiments of the present application provide a computer-readable storage medium having at least one instruction, at least one program, a set of codes, or a set of instructions stored therein, which is loaded and executed by a processor to implement a selection method of a context model as described above.
In yet another aspect, embodiments of the present application provide a computer program product or a computer program, where the computer program product or the computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium. The computer instructions are read by a processor of a computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the method of selecting a context model as described above.
The technical scheme provided by the embodiment of the application can bring the following beneficial effects:
the block division structure of a certain coding unit is predicted according to a reference coding unit of the coding unit, and then the prediction result of the block division structure of the coding unit is added in the selection process of the context model of the syntax element, so that the selection condition of the context model can be increased or optimized, the entropy coding efficiency is improved, and the bit number of a code stream is reduced. In addition, as the prediction result of the block division structure is only added, more accurate probability estimation can be obtained, and the improvement of the video compression efficiency is facilitated.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a video encoding process provided by an embodiment of the present application;
FIG. 2 is a block partitioning flow diagram provided by an embodiment of the present application;
FIG. 3 is a block partitioning scheme according to an embodiment of the present application;
FIG. 4 is a schematic diagram illustrating a spatial relationship of coding units according to an embodiment of the present application;
FIG. 5 is a block diagram of a communication system provided by one embodiment of the present application;
fig. 6 is a block diagram of a streaming system provided by an embodiment of the present application;
FIG. 7 is a flow diagram of a method for selecting a context model provided by one embodiment of the present application;
FIG. 8 is a schematic diagram of a structure prediction model provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of a prediction process of a structure prediction model provided by an embodiment of the present application;
FIG. 10 is a schematic diagram of an output vector of a structural prediction model provided in one embodiment of the present application;
FIG. 11 is a diagram illustrating a selection process of a context model provided by one embodiment of the present application;
FIG. 12 is a block diagram of a selection device of a context model provided in one embodiment of the present application;
FIG. 13 is a block diagram of a selection apparatus for a context model provided in another embodiment of the present application;
fig. 14 is a block diagram of a computer device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
First, video encoding technology is briefly described with reference to fig. 1, which illustrates a schematic diagram of a video encoding process according to an embodiment of the present application.
A video signal refers to a sequence of images comprising one or more frames. A frame is a representation of the spatial information of a video signal. Taking the YUV mode as an example, one frame includes one luminance sample matrix (Y) and two chrominance sample matrices (Cb and Cr). In terms of how the video signal is acquired, there are two categories: signals captured by a camera and signals generated by a computer. Because their statistical characteristics differ, the corresponding compression encoding modes may also differ.
Some mainstream video coding technologies, such as the H.265/HEVC (High Efficiency Video Coding) standard, the H.266/VVC (Versatile Video Coding) standard, and AVS (Audio Video coding Standard, such as AVS3), adopt a hybrid coding framework that performs the following series of operations and processes on the input original video signal:
1. Block Partition Structure: the input image is divided into several non-overlapping processing units, each of which undergoes a similar compression operation. This processing unit is called a CTU (Coding Tree Unit) or LCU (Largest Coding Unit). Further down, a CTU can be partitioned more finely into one or more basic coding units, each called a CU (Coding Unit). Each CU is the most basic element of the coding process; when performing prediction, a CU may be further divided into different PUs (Prediction Units). The various possible encoding schemes for each CU are described below.
2. Predictive Coding: this includes intra-frame prediction, inter-frame prediction, and other modes; a residual video signal is obtained after the original video signal is predicted from a selected reconstructed video signal. The encoding side needs to decide, for the current CU, the most suitable mode among the many possible predictive coding modes and inform the decoding side. Intra-frame prediction means that the predicted signal comes from an already encoded and reconstructed region in the same image. Inter-frame prediction means that the predicted signal comes from an already coded image (called a reference picture) different from the current image.
3. Transform Coding and Quantization: the residual video signal is converted into a transform domain by a transform operation such as the DFT (Discrete Fourier Transform) or DCT (Discrete Cosine Transform); the resulting values are called transform coefficients. The signal in the transform domain then undergoes a lossy quantization operation that discards some information, so that the quantized signal is better suited to compressed representation. In some video coding standards more than one transform mode may be available, so the encoding side also needs to select one of the transforms for the current CU and inform the decoding side. The degree of refinement of the quantization is generally determined by the quantization parameter: a larger QP (Quantization Parameter) value means that coefficients within a larger value range are quantized to the same output, which usually brings larger distortion and a lower code rate; conversely, a smaller QP value means that coefficients within a smaller value range are quantized to the same output, which usually causes less distortion and corresponds to a higher code rate (a small numerical sketch of this trade-off follows this overview).
4. Entropy Coding or statistical coding: the quantized transform-domain signal is statistically compressed according to the frequency of occurrence of each value, and finally a binarized (0 or 1) compressed code stream (hereinafter referred to as a "video code stream") is output. Encoding also produces other information, such as the selected mode and motion vectors, which must likewise be entropy encoded to reduce the code rate. Statistical coding is a lossless coding mode that can effectively reduce the code rate required to express the same signal. Common statistical coding methods include Variable Length Coding (VLC) and Context-based Adaptive Binary Arithmetic Coding (CABAC).
5. Loop Filtering: the already encoded image undergoes inverse quantization, inverse transform, and prediction compensation (the inverse of operations 2 to 4 above) to obtain a reconstructed decoded image. Compared with the original image, some information of the reconstructed image differs from the original because of quantization, producing distortion. Applying filtering operations to the reconstructed image, such as deblocking, SAO (Sample Adaptive Offset), ALF (Adaptive Loop Filter), or other filters, can effectively reduce the degree of distortion caused by quantization. Since these filtered reconstructed pictures are used as references for subsequently coded pictures, i.e., for the prediction of future signals, the above filtering operation is also called loop filtering, a filtering operation within the coding loop.
As can be seen from the above description, at the decoding end, after obtaining the compressed code stream for each CU, the decoder performs entropy decoding on the one hand to obtain various mode information and quantized transform coefficients, and then performs inverse quantization and inverse transform on each transform coefficient to obtain a residual signal; on the other hand, a prediction signal corresponding to the CU can be obtained based on the known coding mode information. And adding the residual signal of the CU to the predicted signal to obtain a reconstructed signal of the CU. The reconstructed values of the decoded image need to undergo loop filtering operations to produce the final output signal.
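To make the quantization trade-off in step 3 above concrete, the following is a minimal numerical sketch. It uses a plain uniform scalar quantizer whose step size grows with QP; the step-doubling-every-6-QP convention and all values are illustrative assumptions, not the exact scaling of HEVC, VVC, or AVS3.

```python
# Illustrative only: a uniform scalar quantizer whose step size grows with QP.
# Larger QP -> coarser steps -> coefficients over a wider range collapse to the
# same output (lower rate, higher distortion); smaller QP does the opposite.

def quantize(coeffs, qp):
    step = 2 ** (qp / 6)  # assumed convention: step roughly doubles every 6 QP
    return [round(c / step) for c in coeffs]

def dequantize(levels, qp):
    step = 2 ** (qp / 6)
    return [lv * step for lv in levels]

coeffs = [100.0, 37.5, -12.0, 3.2]
for qp in (10, 30, 50):
    levels = quantize(coeffs, qp)
    recon = dequantize(levels, qp)
    print(qp, levels, [round(r, 1) for r in recon])
# At qp=50 most coefficients quantize to 0: low rate, high distortion.
```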
Please refer to fig. 2, which illustrates a block partitioning procedure provided in an embodiment of the present application, where the block partitioning procedure can be applied to AVS (e.g. AVS 3) or the next generation video codec standard, and the embodiment of the present application does not limit this. As can be seen from fig. 2, the block division process involves three division modes, which are respectively: QT (Quad Tree), BT (Binary Tree), EQT (Extended Quad-Tree). BT and EQT are differentiated by Horizontal and Vertical partitions, and thus BT is further classified into HBT (Horizontal Binary Tree) and VBT (Vertical Binary Tree), and EQT is further classified into HEQT (Horizontal Extended Quad Tree) and VEQT (Vertical Extended Quad Tree). These three division methods will be described below with reference to fig. 3.
A CTU, acting as the root node, may be divided into several leaf nodes. One node corresponds to one image region. If a node is not divided further, it is called a leaf node, and the image region corresponding to it forms a CU; if the node continues to be divided, it can be split into several sub-regions using one of the above division methods or a combination of several of them, each sub-region corresponds to a sub-node, and it must then be determined whether each of those sub-nodes continues to be divided. The division level of a child node is the division level of its parent node plus 1; for example, if the division level of the root node is 0, the division levels of its children are 1. In the video coding process, the encoder generally sets a minimum block size for a CU; during division, if the block size of a node equals the minimum block size, the node is by default not divided further. For convenience of description, "the image region corresponding to the node" is hereinafter simply referred to as "the node".
1. QT division.
For a certain node, the node can be divided into 4 sub-nodes by adopting a QT division mode. As shown in fig. 3 (a), the node may be divided into four sub-nodes with the same block size according to the QT division (each sub-node has the same width and height, and the width is half of the width of the node before division and the height is half of the height of the node before division).
For example, for a node with a block size of 64 × 64, if the node is not divided, the node directly becomes 1 CU with a block size of 64 × 64; if the node continues to be divided, the node may be divided into 4 nodes having a block size of 32 × 32 in the QT division manner. If the division of a certain node among the 4 nodes having the block size of 32 × 32 is continued, 4 nodes having the block size of 16 × 16 are generated.
2. BT division.
For a certain node, the node can be divided into 2 sub-nodes using the BT division manner. Optionally, the BT division manner includes two types: the HBT division manner and the VBT division manner. As shown in fig. 3 (b), the HBT division manner divides the node into upper and lower sub-nodes with the same block size (each sub-node has a width equal to the width of the node before division and a height equal to half of the height of the node before division); as shown in fig. 3 (c), the VBT division manner divides the node into left and right sub-nodes with the same block size (each sub-node has a width equal to half of the width of the node before division and a height equal to the height of the node before division).
For example, if a node having a block size of 64 × 64 is not divided any more, the node directly becomes 1 CU having a block size of 64 × 64; if the node continues to be divided, the node may be divided into 2 nodes with a block size of 64 × 32 according to the HBT division manner, or divided into 2 nodes with a block size of 32 × 64 according to the VBT division manner.
3. EQT division.
For a certain node, the node may be divided into 4 sub-nodes using the EQT division manner. Optionally, the EQT division manner includes two types: the HEQT division manner and the VEQT division manner. As shown in fig. 3 (d), the HEQT division manner divides the node into an upper sub-region, a middle sub-region, and a lower sub-region, and further divides the middle sub-region into a middle-left sub-node and a middle-right sub-node (the upper and lower sub-nodes have a width equal to the width of the node before division and a height equal to a quarter of the height of the node before division; the middle-left and middle-right sub-nodes have a width equal to half of the width of the node before division and a height equal to half of the height of the node before division). As shown in fig. 3 (e), the VEQT division manner divides the node into a left sub-region, a middle sub-region, and a right sub-region, and further divides the middle sub-region into a middle-upper sub-node and a middle-lower sub-node (the left and right sub-nodes have a width equal to a quarter of the width of the node before division and a height equal to the height of the node before division; the middle-upper and middle-lower sub-nodes have a width equal to half of the width of the node before division and a height equal to half of the height of the node before division).
For example, if a node having a block size of 64 × 64 is not divided any more, the node directly becomes 1 CU with a block size of 64 × 64. If the node continues to be divided, it can be divided into 4 sub-nodes in the HEQT division manner, with block sizes of 64 × 16, 32 × 32, 32 × 32, and 64 × 16 respectively, or into 4 sub-nodes in the VEQT division manner, with block sizes of 16 × 64, 32 × 32, 32 × 32, and 16 × 64 respectively.
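The child block sizes produced by the five division manners described above can be summarized in a short sketch. This is not code from the patent; the mode names and the (width, height) representation are illustrative.

```python
# Child block sizes (width, height) for each division manner described above.

def split(w, h, mode):
    if mode == "QT":    # four children, each half width and half height
        return [(w // 2, h // 2)] * 4
    if mode == "HBT":   # upper and lower halves
        return [(w, h // 2)] * 2
    if mode == "VBT":   # left and right halves
        return [(w // 2, h)] * 2
    if mode == "HEQT":  # upper, middle-left, middle-right, lower
        return [(w, h // 4), (w // 2, h // 2), (w // 2, h // 2), (w, h // 4)]
    if mode == "VEQT":  # left, middle-upper, middle-lower, right
        return [(w // 4, h), (w // 2, h // 2), (w // 2, h // 2), (w // 4, h)]
    raise ValueError(f"unknown mode: {mode}")

for mode in ("QT", "HBT", "VBT", "HEQT", "VEQT"):
    print(mode, split(64, 64, mode))
# HEQT -> [(64, 16), (32, 32), (32, 32), (64, 16)]
# VEQT -> [(16, 64), (32, 32), (32, 32), (16, 64)]
```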
In one example, to further compress the video, the block partition structure of the video is indicated by a syntax element. Corresponding to the three partitioning modes shown in fig. 3 above, the following four syntax elements are defined:
1. qt_split_flag (QT split flag): if the value of qt_split_flag is "1", it indicates that QT division should be used in the division process; if the value of qt_split_flag is "0", it indicates that QT division should not be used in the division process.
2. beta_split_flag (BT/EQT division flag): if the value of beta_split_flag is "1", it indicates that BT/EQT division should be used in the division process; if the value of beta_split_flag is "0", it indicates that BT/EQT division should not be used in the division process.
3. beta_split_type_flag (BT/EQT division type flag): if the value of beta_split_type_flag is "0", it indicates that BT division is used when BT/EQT division is performed; if the value of beta_split_type_flag is "1", it indicates that EQT division is used when BT/EQT division is performed.
4. beta_split_dir_flag (BT/EQT division direction flag): if the value of beta_split_dir_flag is "1", it indicates that vertical division is used when BT/EQT division is performed; if the value of beta_split_dir_flag is "0", it indicates that horizontal division is used when BT/EQT division is performed.
It should be noted that the names of the above syntax elements and the meanings represented by their values are only examples; after understanding the technical solutions of the present application, those skilled in the art will easily conceive of other implementations, and it should be understood that these all fall within the protection scope of the present application. For example, for qt_split_flag, it could instead be defined that the value "1" indicates that QT division should not be used in the division process and that the value "0" indicates that QT division should be used.
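As a hedged illustration of how the four flags combine, the following sketch resolves them into a concrete division decision. It assumes the default flag semantics given above (not the alternative definitions just mentioned); the function and return-value names are illustrative, not from the patent.

```python
# Resolve the four division flags described above into one division decision.

def resolve_division(qt_split_flag, beta_split_flag=0,
                     beta_split_type_flag=0, beta_split_dir_flag=0):
    if qt_split_flag == 1:
        return "QT"
    if beta_split_flag == 0:
        return "NO_SPLIT"  # the node is not divided further and becomes a CU
    base = "BT" if beta_split_type_flag == 0 else "EQT"
    direction = "V" if beta_split_dir_flag == 1 else "H"
    return direction + base  # one of HBT, VBT, HEQT, VEQT

assert resolve_division(1) == "QT"
assert resolve_division(0, 1, 0, 0) == "HBT"
assert resolve_division(0, 1, 1, 1) == "VEQT"
```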
Because the probability distribution characteristics of different syntax elements are different, in order to further compress the video, a plurality of context models are defined for each syntax element in the entropy coding process, and probability estimation of a syntax element is realized through its context models. Table 1 below illustrates the correspondence between syntax elements and context models provided by an embodiment of the present application.
Table 1: Correspondence between syntax elements and context models
[Table 1 appears only as an image in the original publication and is not reproduced here.]
The context model corresponding to a syntax element can be located through ctxIdxInc (context index increment) and ctxIdxStart (context index start); for example, for qt_split_flag, if ctxIdxInc is 1, the index of the context model is 11.
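The lookup just described amounts to an addition. In the sketch below, the ctxIdxStart value of 10 for qt_split_flag is inferred from the worked example above (increment 1 giving index 11) and should be treated as an assumption, since table 1 itself is available only as an image.

```python
# The model index for a syntax element is its start index plus the increment.

QT_SPLIT_FLAG_CTX_IDX_START = 10  # assumed from the example: 10 + 1 == 11

def context_model_index(ctx_idx_start, ctx_idx_inc):
    return ctx_idx_start + ctx_idx_inc

assert context_model_index(QT_SPLIT_FLAG_CTX_IDX_START, 1) == 11
```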
In one example, the selection of the context model may be made according to various local information, such as the size of the current coding unit, the sizes of the neighboring coding units, and the division depth. Fig. 4 shows the spatial position relationship between the current coding unit (E) and its adjacent coding units, as provided by an embodiment of the present application. In one example, with reference to fig. 4, the ctxIdxInc of each syntax element is determined as follows:
1. ctxIdxInc of qt_split_flag.
The ctxIdxInc of qt_split_flag is determined as follows:
(1) If the current image is an intra-prediction image and the width of E is 128, ctxIdxInc is equal to 3;
(2) Otherwise (i.e., "the current image is an intra-prediction image" and "the width of E is 128" are not both satisfied), if A exists and the height of A is less than the height of E, and B exists and the width of B is less than the width of E, ctxIdxInc is equal to 2;
(3) Otherwise, if A exists and the height of A is less than the height of E, or B exists and the width of B is less than the width of E, ctxIdxInc is equal to 1;
(4) Otherwise, ctxIdxInc is equal to 0.
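Rules (1) to (4) translate directly into the following sketch. The dictionary-based representation of E, A, and B and the is_intra_picture input are illustrative assumptions; absent neighbors are passed as None.

```python
# ctxIdxInc for qt_split_flag, following rules (1)-(4) above.

def qt_split_flag_ctx_idx_inc(E, A, B, is_intra_picture):
    if is_intra_picture and E["w"] == 128:
        return 3
    a_smaller = A is not None and A["h"] < E["h"]
    b_smaller = B is not None and B["w"] < E["w"]
    if a_smaller and b_smaller:
        return 2
    if a_smaller or b_smaller:
        return 1
    return 0

E = {"w": 64, "h": 64}
A = {"w": 64, "h": 32}  # a neighbor whose height is less than E's
print(qt_split_flag_ctx_idx_inc(E, A, None, is_intra_picture=False))  # -> 1
```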
2. ctxIdxInc of beta_split_flag.
The ctxIdxInc of beta_split_flag is determined in two stages, as follows:
First:
(1) If A exists and the height of A is less than the height of E, and B exists and the width of B is less than the width of E, ctxIdxInc is equal to 2;
(2) Otherwise, if A exists and the height of A is less than the height of E, or B exists and the width of B is less than the width of E, ctxIdxInc is equal to 1;
(3) Otherwise, ctxIdxInc is equal to 0.
Then:
(4) If the product of the width of E and the height of E is greater than 1024, ctxIdxInc is unchanged;
(5) Otherwise, if the product of the width of E and the height of E is greater than 256, ctxIdxInc is increased by 3;
(6) Otherwise, ctxIdxInc is increased by 6.
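A sketch of this two-stage rule, using the same assumed representation as in the qt_split_flag sketch: a neighbor-based base value of 0 to 2, plus a size-based offset of 0, 3, or 6.

```python
# ctxIdxInc for beta_split_flag: neighbor-based base plus area-based offset.

def beta_split_flag_ctx_idx_inc(E, A, B):
    a_smaller = A is not None and A["h"] < E["h"]
    b_smaller = B is not None and B["w"] < E["w"]
    if a_smaller and b_smaller:
        inc = 2
    elif a_smaller or b_smaller:
        inc = 1
    else:
        inc = 0

    area = E["w"] * E["h"]
    if area > 1024:
        return inc       # rule (4): unchanged
    if area > 256:
        return inc + 3   # rule (5)
    return inc + 6       # rule (6)
```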
3. ctxIdxInc of beta_split_type_flag.
The ctxIdxInc of beta_split_type_flag is determined as follows:
(1) If A exists and the height of A is less than the height of E, and B exists and the width of B is less than the width of E, ctxIdxInc is equal to 2;
(2) Otherwise, if A exists and the height of A is less than the height of E, or B exists and the width of B is less than the width of E, ctxIdxInc is equal to 1;
(3) Otherwise, ctxIdxInc is equal to 0.
4. ctxIdxInc of beta_split_dir_flag.
The ctxIdxInc of beta_split_dir_flag is determined as follows:
(1) If E has a width of 128 and a height of 64, ctxIdxInc is equal to 4;
(2) Otherwise, if E has a width of 64 and a height of 128, ctxIdxInc is equal to 3;
(3) Otherwise, if the height of E is greater than the width of E, ctxIdxInc is equal to 2;
(4) Otherwise, if the width of E is greater than the height of E, ctxIdxInc is equal to 1;
(5) Otherwise, ctxIdxInc is equal to 0.
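Sketches of the remaining two rules, under the same assumptions as above: beta_split_type_flag reuses the neighbor test from the first stage of beta_split_flag, while beta_split_dir_flag depends only on the shape of E.

```python
# ctxIdxInc for beta_split_type_flag and beta_split_dir_flag.

def beta_split_type_flag_ctx_idx_inc(E, A, B):
    a_smaller = A is not None and A["h"] < E["h"]
    b_smaller = B is not None and B["w"] < E["w"]
    if a_smaller and b_smaller:
        return 2
    return 1 if (a_smaller or b_smaller) else 0

def beta_split_dir_flag_ctx_idx_inc(E):
    if E["w"] == 128 and E["h"] == 64:
        return 4
    if E["w"] == 64 and E["h"] == 128:
        return 3
    if E["h"] > E["w"]:
        return 2
    if E["w"] > E["h"]:
        return 1
    return 0  # E is square and not one of the 128x64 / 64x128 cases
```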
Optionally, the more precise the selection conditions of the context model of a syntax element, the better the compression effect of the video. Based on this, an embodiment of the present application provides a method for selecting a context model: the block division structure of a coding unit is predicted from a reference coding unit of that coding unit, and the prediction result is then added to the selection process of the context models of the syntax elements, so that the selection conditions of the context models can be extended or optimized, thereby improving entropy coding efficiency and reducing the number of bits in the code stream. In addition, because only the prediction result of the block division structure is added, more accurate probability estimation can be obtained, which helps improve video compression efficiency.
It should be noted that the method for selecting a context model provided in the embodiment of the present application may be applied to AVS (e.g., AVS 3) or a next generation video codec standard, and the embodiment of the present application does not limit this.
Referring to fig. 5, a simplified block diagram of a communication system is shown according to an embodiment of the present application. Communication system 200 includes a plurality of devices that may communicate with each other over, for example, network 250. By way of example, the communication system 200 includes a first device 210 and a second device 220 interconnected by a network 250. In the embodiment of fig. 5, the first device 210 and the second device 220 perform unidirectional data transfer. For example, the first apparatus 210 may encode video data, such as a video picture stream captured by the first apparatus 210, for transmission over the network 250 to the second apparatus 220. The encoded video data is transmitted in the form of one or more encoded video streams. The second device 220 may receive the encoded video data from the network 250, decode the encoded video data to recover the video data, and display a video picture according to the recovered video data. Unidirectional data transmission is common in applications such as media services.
In another embodiment, the communication system 200 includes a third device 230 and a fourth device 240 that perform bidirectional transmission of encoded video data, which may occur, for example, during a video conference. For bidirectional data transmission, each of the third device 230 and the fourth device 240 may encode video data (e.g., a stream of video pictures captured by the device) for transmission over the network 250 to the other of the two. Each of the third device 230 and the fourth device 240 may also receive the encoded video data transmitted by the other, decode the encoded video data to recover the video data, and display video pictures on an accessible display device according to the recovered video data.
In the embodiment of fig. 5, the first device 210, the second device 220, the third device 230, and the fourth device 240 may be computer devices such as a server, a personal computer, and a smart phone, but the principles disclosed herein may not be limited thereto. The embodiment of the application is suitable for a Personal Computer (PC), a mobile phone, a tablet Computer, a media player and/or a special video conference device. Network 250 represents any number of networks that communicate encoded video data between first device 210, second device 220, third device 230, and fourth device 240, including, for example, wired and/or wireless communication networks. The communication network 250 may exchange data in circuit-switched and/or packet-switched channels. The network may include a telecommunications network, a local area network, a wide area network, and/or the internet. For purposes of this application, the architecture and topology of network 250 may be immaterial to the operation of the present disclosure, unless explained below.
As an example, fig. 6 shows the placement of a video encoder and a video decoder in a streaming environment. The subject matter disclosed herein is equally applicable to other video-enabled applications including, for example, video conferencing, digital TV (television), storage of compressed video on Digital media including CD (Compact Disc), DVD (Digital Versatile Disc), memory sticks, and the like.
The streaming system may include an acquisition subsystem 313, which may include a video source 301, such as a digital camera, that creates an uncompressed video picture stream 302. In an embodiment, the video picture stream 302 includes samples taken by a digital camera. The video picture stream 302 is depicted as a thick line to emphasize its high data volume compared to the encoded video data 304 (or encoded video code stream); it may be processed by an electronic device 320 that comprises a video encoder 303 coupled to the video source 301. The video encoder 303 may comprise hardware, software, or a combination thereof to implement or perform aspects of the disclosed subject matter as described in more detail below. The video encoder 303 may be a computer device, i.e., an electronic device with data computation, processing, and storage capabilities, such as a PC, a mobile phone, a tablet computer, a media player, a dedicated video conferencing device, or a server. A video encoder 303 based on the methods provided herein may be implemented by one or more processors or one or more integrated circuits.
The encoded video data 304 (or encoded video codestream 304) is depicted as a thin line compared to the video picture stream 302 to emphasize the lower data amount of the encoded video data 304 (or encoded video codestream 304), which may be stored on the streaming server 305 for future use. One or more streaming client subsystems, such as client subsystem 306 and client subsystem 308 in fig. 6, may access streaming server 305 to retrieve copies 307 and 309 of encoded video data 304. The client subsystem 306 may include, for example, a video decoder 310 in an electronic device 330. Video decoder 310 decodes incoming copies 307 of the encoded video data and generates an output video picture stream 311 that may be presented on a display 312, such as a display screen, or another presentation device (not depicted). In some streaming systems, encoded video data 304, copy 307, and copy 309 (e.g., video codestreams) may be encoded according to some video encoding/compression standard.
It should be noted that electronic devices 320 and 330 may include other components (not shown). For example, the electronic device 320 may include a video decoder (not shown), and the electronic device 330 may also include a video encoder (not shown). Wherein the video decoder is configured to decode the received encoded video data; a video encoder is used to encode video data.
The technical solution of the present application will be described below by means of several embodiments.
Referring to fig. 7, a flowchart of a method for selecting a context model according to an embodiment of the present application is shown. The method may be applied in a device for encoding a video sequence, such as the first device 210 in the communication system shown in fig. 5; it can also be applied to a device that decodes encoded video data to recover a video sequence, such as the second device 220 in the communication system shown in fig. 5. The method can comprise the following steps (steps 710-730):
Step 710: determine a reference coding unit of the target coding unit.
The target coding unit refers to an image unit to be processed in a video coding and decoding process, and may be a current image unit to be processed or an image unit to be processed after the current image unit to be processed. As can be known from the introduction of the above-mentioned "block division structure", in the video compression process, a frame of picture in the video signal may be divided into mutually non-overlapping CTUs, and generally, the CTUs may be referred to as a picture unit to be processed, i.e., a target coding unit. The shape of the target coding unit is not limited in the embodiments of the present application, and optionally, the target coding unit is square, that is, the width and the height of the target coding unit are equal; alternatively, the target coding unit is rectangular, i.e., the width and height of the target coding unit are not equal. The size of the block size of the target coding unit is not limited in the embodiments of the present application, and optionally, the size of the target coding unit is 64 × 64 or 128 × 128 or 128 × 64, and in practical applications, the size of the target coding unit may be determined according to the block size of the maximum coding unit allowed by the video encoder, for example, the block size of the maximum coding unit allowed by the video encoder is 128 × 128, and then the block size of the target coding unit is smaller than or equal to 128 × 128.
The reference coding unit is used for providing reference for block division of the target coding unit, wherein the block division refers to division of a structure of the coding unit. The embodiment of the present application does not limit the relationship between the block sizes of the reference coding unit and the target coding unit, and optionally, the block size of the reference coding unit may be equal to the block size of the target coding unit, or may not be equal to the block size of the target coding unit, for example, smaller than the block size of the target coding unit.
As can be seen from the above description, in the video compression process, a frame of image in the video signal may be divided into mutually non-overlapping CTUs, a CTU may be further finely divided to obtain one or more basic coding units, and video coding is then performed based on these basic coding units. In the embodiments of the present application, the process of dividing a CTU into one or more basic coding units is referred to as the structural division process of the coding unit, that is, block division. Adding the block division structure of the target coding unit to the probability estimation of its syntax elements, that is, to the selection process of the context model, can improve the accuracy of probability estimation while keeping the number of added selection conditions small. However, when interpreting the video code stream, the video decoder cannot obtain the content information of the current coding unit or of the coding units after it, and therefore cannot obtain the actual block division structure of the target coding unit. For this reason, in the embodiments of the present application, during the selection of the context model, the block division structure of the target coding unit is predicted from the block division structure of a coding unit that has already completed the encoding process or the reconstruction process (the reference coding unit), and the context model is selected using the predicted block division structure.
It should be noted that, when the technical solution of the present application is applied to a video encoding process, the reference coding unit refers to a coding unit that has completed the encoding process; when the technical solution of the present application is applied to a video decoding process, the reference coding unit refers to a coding unit that has completed the reconstruction process. To ensure that the video decoder and the video encoder produce the same prediction of the block division structure of the target coding unit, the reference coding units they use must also be consistent, that is, the video decoder and the video encoder need to use reference coding units with the same position information. Optionally, the video encoder may write the position information of the reference coding unit into the video stream, and the video decoder determines the reference coding unit from the decoded position information when decoding the video stream; or the position information of the reference coding unit is predefined, and both the video encoder and the video decoder determine the reference coding unit from the predefined position information; or the video encoder and the video decoder employ the same determination condition to determine the reference coding unit.
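As a hedged illustration of the first consistency option, the following sketch writes a reference unit's position into the stream on the encoder side and reads it back on the decoder side. The byte layout (two big-endian 16-bit coordinates) is purely illustrative and is not the patent's or any standard's bitstream syntax.

```python
import struct

# Encoder side: append the reference coding unit's position to the stream.
def write_reference_position(stream: bytearray, x: int, y: int) -> None:
    stream += struct.pack(">HH", x, y)

# Decoder side: read the position back so both sides pick the same unit.
def read_reference_position(stream: bytes, offset: int = 0) -> tuple:
    return struct.unpack_from(">HH", stream, offset)

buf = bytearray()
write_reference_position(buf, 256, 128)
assert read_reference_position(bytes(buf)) == (256, 128)
```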
In one example, the number of reference coding units is a positive integer greater than or equal to 2; after the step 710, the method further includes: determining a rate-distortion cost of each reference coding unit; and selecting a preferred coding unit from the at least two reference coding units according to the rate distortion cost of each reference coding unit, wherein the block division structure of the preferred coding unit is used for predicting the block division structure of the target coding unit.
In the process of block division of the target coding unit, the Rate-Distortion (RD) cost of the target coding unit under various division modes is also compared. For a lossy video encoding process, the code rate and distortion are usually inversely related, and a higher compression rate results in a lower code rate and also increases the distortion, and vice versa. For this reason, a compromise is chosen, taking into account the rate-distortion cost. In general, a target coding unit is divided in a block division structure in which a rate-distortion cost is minimum. Therefore, in order to predict the block partition structure of the target coding unit more accurately, in the embodiment of the present application, after the reference coding units are determined, the rate-distortion costs of the respective reference coding units are further compared, and a reference coding unit with a smaller rate-distortion cost is selected as a preferred coding unit, so as to predict the block partition structure of the target coding unit by using the block partition structure of the preferred coding unit.
Optionally, the number of preferred coding units is one; alternatively, there are a plurality of preferred coding units. The selection manner of the preferred coding unit is not limited in the embodiments of the present application. Optionally, after the rate-distortion cost of each reference coding unit is determined, the reference coding units may be sorted in ascending order of rate-distortion cost and the first S reference coding units taken as the preferred coding units, where S is a positive integer; alternatively, after the rate-distortion cost of each reference coding unit is determined, the reference coding units whose rate-distortion cost is less than a preset threshold may be taken as the preferred coding units.
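Both selection strategies just described are a few lines each. The sketch below assumes each reference unit carries a precomputed rd_cost value, which is an illustrative representation rather than the patent's data structure.

```python
# Keep the S reference units with the smallest rate-distortion cost...
def select_preferred_top_s(reference_units, s):
    return sorted(reference_units, key=lambda u: u["rd_cost"])[:s]

# ...or keep every unit whose cost is below a preset threshold.
def select_preferred_by_threshold(reference_units, threshold):
    return [u for u in reference_units if u["rd_cost"] < threshold]

units = [{"id": 0, "rd_cost": 9.1}, {"id": 1, "rd_cost": 4.2}, {"id": 2, "rd_cost": 6.0}]
print(select_preferred_top_s(units, 2))           # ids 1 and 2
print(select_preferred_by_threshold(units, 5.0))  # id 1 only
```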
Since the reference coding unit is a coding unit that has completed the encoding process or a coding unit that has completed the reconstruction process, the content information of the reference coding unit can be known by the video encoder and the video decoder. After the reference coding unit is determined, content information of the reference coding unit, such as Y/U/V components of respective pixels of the reference coding unit, is further acquired to provide a reference for prediction of the block division structure of the target coding unit.
Optionally, in the embodiments of the present application, a certain memory space is allocated for each coding unit to store its content information; when the content information of a coding unit needs to be accessed, the information stored at the memory location corresponding to that coding unit is read. Optionally, to make the memory location of each coding unit explicit, an index value may be determined for each coding unit and associated with its memory location, so that the content information stored at the corresponding memory location can subsequently be read according to a coding unit's index value.
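A minimal sketch of this index-to-memory association, with a plain dictionary standing in for the allocated memory space; the key and value shapes are illustrative assumptions.

```python
# Store and retrieve per-unit content information (e.g. Y/U/V samples) by index.

content_store = {}

def store_content(cu_index: int, yuv_samples) -> None:
    content_store[cu_index] = yuv_samples

def load_content(cu_index: int):
    return content_store[cu_index]

store_content(7, {"Y": [16, 17], "U": [128], "V": [128]})
assert load_content(7)["Y"] == [16, 17]
```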
Step 720: predict the block division structure of the target coding unit according to the reference coding unit to obtain the block division prediction structure of the target coding unit.
After the reference coding unit is obtained, the block division structure of the target coding unit can be predicted from the reference coding unit to obtain the block division prediction result of the target coding unit. The prediction method is not limited in the embodiments of the present application. Optionally, the reference coding unit is processed by a trained deep learning model to obtain the block division prediction structure of the target coding unit; predicting the block division structure through a trained model allows fast prediction and improves compatibility, avoiding the need to execute a complex prediction process for every image unit to be processed. Alternatively, similarity matching is performed between the reference coding units and the target coding unit, and the block division structure of the reference coding unit with the highest similarity is used as the block division prediction structure; or the rate-distortion costs of the reference coding units are compared, and the block division structure of the reference coding unit with the smallest rate-distortion cost is used as the block division prediction structure. For the process of predicting the block division structure of the target coding unit by a deep learning model, please refer to the following method embodiments, which are not repeated here.
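The two non-learning alternatives translate into the short sketch below. The similarity function is an assumed helper (for example, a pixel-domain match on the units' Y/U/V content); the patent does not fix a particular measure, and the field names are illustrative.

```python
# Use the block division structure of the most similar reference unit...
def predict_by_similarity(target, reference_units, similarity):
    best = max(reference_units, key=lambda u: similarity(u, target))
    return best["block_division_structure"]

# ...or of the reference unit with the smallest rate-distortion cost.
def predict_by_rd_cost(reference_units):
    best = min(reference_units, key=lambda u: u["rd_cost"])
    return best["block_division_structure"]
```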
Step 730: determine, based on the block division prediction structure of the target coding unit, the context models respectively adopted by at least one syntax element involved in the block division of the target coding unit.
As can be seen from the above description, a syntax element is used to indicate the block division structure of a coding unit, and a context model is used to perform probability estimation on a syntax element. Adding the block division structure of the target coding unit to the process of selecting context models for its syntax elements can save the number of code stream bits required to transmit the syntax elements while keeping the number of added selection conditions small, thereby improving coding efficiency. Therefore, after the block division prediction structure of the target coding unit is obtained by prediction, the block division prediction structure is added to the context model selection process of the syntax elements so as to improve entropy coding efficiency and reduce the number of bits in the code stream.
The manner in which the block division prediction structure is added to the context model selection process is not limited. Optionally, the block division prediction structure may be used as a further selection condition added to the context model selection process, or it may be fused with the selection conditions of the original context model to optimize those selection conditions. For an introduction to the manner in which the block partition prediction structure is added in the context model selection process, please refer to the following method embodiments, which are not described again here.
In one example, the at least one syntax element involved in the block partitioning of the target coding unit includes: a first syntax element indicating whether the target coding unit is block-divided in a first division manner, such as "qt_split_flag" in the above embodiments; a second syntax element indicating whether the target coding unit is block-divided in a second division manner or a third division manner, such as "bet_split_flag" in the above embodiments; a third syntax element indicating whether the second division manner or the third division manner is adopted for block division of the target coding unit, in a case where the second syntax element indicates that the target coding unit is block-divided in the second division manner or the third division manner, such as "bet_split_type_flag" in the above embodiments; and a fourth syntax element indicating the dividing direction of the second division manner or the third division manner when the target coding unit is block-divided in the second division manner or the third division manner, such as "bet_split_dir_flag" in the above embodiments. For the meaning of the value of each syntax element, reference may be made to the description of the above embodiments, which is not repeated here. Optionally, the first division manner includes QT division; and/or the second division manner includes BT division and/or TT (Ternary Tree) division; and/or the third division manner includes EQT division. It should be noted that the three conditions — that the first division manner includes QT division, the second division manner includes BT division and/or TT division, and the third division manner includes EQT division — need not all hold at the same time; for example, when the first division manner includes QT and the second division manner includes BT, the third division manner may not include EQT. It should be understood that these solutions all fall within the protection scope of the present application.
It should be noted that, during video encoding or video decoding, a selection mode flag (flag), such as spf_flag, may be predefined to indicate whether to use the context model selection method described in the embodiments of the present application. Taking spf_flag as an example: if spf_flag = 1, the context model selection method described in the embodiments of the present application is executed; if spf_flag = 0, it is not executed, and, for example, the context model selection method in the original or related art is adopted instead. Defining such a selection mode flag improves the compatibility of the technical solution provided by the embodiments of the present application, and avoids context model selection failures on video encoders or video decoders that do not support this technical solution, thereby avoiding any impact on video compression efficiency.
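A minimal sketch of how such a selection mode flag might gate the two selection paths; both selector callables are assumed placeholders, not standard APIs:

```python
# Hedged sketch of gating context model selection on spf_flag.
def select_context_model(syntax_element, target_cu, spf_flag,
                         predicted_selector, original_selector):
    # spf_flag == 1: use the prediction-based selection of this application;
    # spf_flag == 0: fall back to the original selection method.
    selector = predicted_selector if spf_flag == 1 else original_selector
    return selector(syntax_element, target_cu)
```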
In summary, according to the technical solution provided in the embodiment of the present application, the block partition structure of a coding unit is predicted according to its reference coding unit, and the prediction result is then added to the selection process of the context models of the syntax elements, so that the selection conditions of the context models can be increased or optimized, improving entropy coding efficiency and reducing the number of bits of the code stream. In addition, since only the prediction result of the block division structure needs to be added, more accurate probability estimation can be obtained, which facilitates improvement of video compression efficiency.
The embodiment of the present application provides a plurality of ways of determining the reference coding unit. These determination manners are described below.
In one example, the step 710 includes: acquiring coding units meeting target conditions in a target video frame, wherein the target video frame is a video frame where the target coding unit is located; a coding unit adjacent to the target coding unit is selected as a reference coding unit from among coding units satisfying the target condition.
Coding units spatially adjacent to the target coding unit may serve as reference coding units. Optionally, a coding unit spatially adjacent to the target coding unit is located in the same video frame as the target coding unit; therefore, the video frame where the target coding unit is located, i.e., the target video frame, is determined first. Then, coding units satisfying a target condition are selected from the coding units included in the target video frame, where for the video encoding process the target condition is that the encoding process has been completed, and for the video decoding process the target condition is that the reconstruction process has been completed. A coding unit adjacent to the target coding unit is then selected as the reference coding unit from among the coding units satisfying the target condition. Optionally, the coding units adjacent to the target coding unit include the coding unit on the left of the target coding unit, the coding unit on the top, and the coding unit on the top left. For example, as shown in fig. 4, assuming that the target coding unit is coding unit E, the coding units adjacent to the target coding unit include coding unit A and coding unit B.
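A hedged sketch of this spatial selection, assuming helper predicates encoded() (the target condition) and is_adjacent() (left / top / top-left adjacency); both names are assumptions for illustration:

```python
# Sketch of spatial reference CU selection within the target video frame.
def spatial_reference_cus(target_cu, frame_cus, encoded, is_adjacent):
    # Keep only CUs in the target frame that satisfy the target condition
    # (encoding completed when encoding, reconstruction completed when decoding).
    candidates = [cu for cu in frame_cus if encoded(cu)]
    # Among those, keep the left / top / top-left neighbours of the target.
    return [cu for cu in candidates if is_adjacent(cu, target_cu)]
```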
In another example, the step 710 includes: determining the position information of a target coding unit in a target video frame, wherein the target video frame is a video frame where the target coding unit is located; acquiring at least one adjacent video frame of a target video frame; and determining the coding unit satisfying the position information in at least one adjacent video frame as a reference coding unit.
Coding units temporally adjacent to the target coding unit may serve as reference coding units. Optionally, a coding unit temporally adjacent to the target coding unit is located in a different video frame from the target coding unit, but its position within its own video frame is consistent with the position of the target coding unit within the target video frame. Therefore, the video frame where the target coding unit is located, i.e., the target video frame, is determined first. Then, the position information of the target coding unit in the target video frame is determined, and at least one video frame adjacent to the target video frame, i.e., at least one adjacent video frame, is acquired. The coding unit in the at least one adjacent video frame whose position is consistent with the position information is then determined as a reference coding unit; that is, the coding unit located at the position corresponding to the position information of the target coding unit in the at least one adjacent video frame is determined as a reference coding unit.
Optionally, for a video encoding process, the adjacent video frames include video frames that have completed the encoding process; for the video decoding process, the adjacent video frames include video frames that have completed the reconstruction process. By using such frames as the adjacent video frames of the target video frame, it can be ensured that a coding unit corresponding to the position information of the target coding unit exists in the adjacent video frames. In the embodiment of the present application, an adjacent video frame may be a video frame before or after the target video frame. In practical applications, the temporal precedence relationship between the adjacent video frame and the target video frame may follow the reference mode of inter-frame prediction: when inter-frame prediction uses forward reference, the adjacent video frame is a video frame before the target video frame; when inter-frame prediction uses backward reference, the adjacent video frame is a video frame after the target video frame. The number of adjacent video frames is not limited in the embodiment of the present application; optionally, it is the same as the number of video frames referenced by inter-frame prediction, or it is a preset number.
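A hedged sketch of this temporal selection, assuming each frame object exposes its coding units and each coding unit exposes a position attribute (both assumptions for illustration):

```python
# Sketch of temporal reference CU selection: in each neighbouring frame,
# pick the CU whose position matches that of the target CU.
def temporal_reference_cus(target_cu, neighbour_frames):
    refs = []
    for frame in neighbour_frames:  # frames that finished encoding/reconstruction
        for cu in frame.cus:
            if cu.position == target_cu.position:
                refs.append(cu)
    return refs
```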
In yet another example, the step 710 includes: acquiring at least one coding unit stored in a cache; and determining the obtained at least one coding unit as a reference coding unit.
For a video encoding process, at least one coding unit that has recently completed the encoding process is typically stored in a cache of the video encoder; for a video decoding process, at least one coding unit that has recently completed the reconstruction process is typically stored in a cache of the video decoder. Since a coding unit that has recently completed the encoding process or the reconstruction process is usually adjacent to the target coding unit in spatial position, the coding units stored in the cache may also be used as reference coding units of the target coding unit. Based on this, at least one coding unit stored in the cache is first acquired, and the acquired at least one coding unit is determined as the reference coding unit.
It should be noted that the above determination manners of the reference coding unit are described separately only for convenience of description; in practical applications, the reference coding unit may be determined by combining at least two of these determination manners. For example, the reference coding unit may be determined by combining coding units spatially adjacent to the target coding unit with coding units temporally adjacent to the target coding unit; alternatively, it may be determined by combining coding units spatially adjacent to the target coding unit with coding units stored in the cache. It should be understood that these all fall within the protection scope of the present application.
In summary, according to the technical solution provided by the embodiment of the present application, the accuracy of predicting the block partition structure of the target coding unit can be improved by obtaining the coding units having reference values to the block partition structure of the target coding unit and determining the reference coding units according to the coding units. In addition, in the embodiment of the present application, the reference coding unit is a coding unit that has completed a coding process or completed a reconstruction process, and thus, in the process of predicting the block division structure of the target coding unit, known information and priori knowledge can be fully utilized, waste of these information resources is avoided, and the utilization rate of the information resources is improved. In addition, the embodiment of the application provides various determination modes of the reference coding unit, and the flexibility of determining the reference coding unit is improved.
Next, description will be made regarding a process of predicting the block division structure of the target coding unit by the deep learning model.
In one example, the step 720 includes: and calling a structure prediction model to process the reference coding unit to obtain a block division prediction structure of the target coding unit.
The structure prediction model is used for predicting the block division structure of a coding unit; in the embodiment of the present application, the structure prediction model is a deep learning model, such as a convolutional neural network model. Referring to fig. 8, a schematic diagram of a structure prediction model according to an embodiment of the present application is shown. As can be seen from fig. 8, the structure prediction model converts the prediction of the block division structure of the target coding unit into a plurality of binary classification problems: determining whether the value of a certain syntax element at a certain division level is 1 or 0 can be regarded as one binary classification problem.
In the embodiment of the present application, the structure prediction model may predict the block division structure of the target coding unit either directly or indirectly. For example, the structure prediction model shown in fig. 8 may directly predict, according to an input reference coding unit, the values of the syntax elements involved at each level (different partition depths) of the target coding unit, thereby obtaining the values of the syntax elements related to the partitioning of the target coding unit. Indirectly predicting the block division structure of the target coding unit means that the output vector of the structure prediction model has another physical meaning, and the output vector needs to be further converted to determine the values of the syntax elements involved at each division level. As shown in fig. 9, the structure prediction model may predict the probability of each CU edge partition in the target coding unit according to the input reference coding unit, and the block partition prediction structure of the target coding unit may then be further inferred from these probabilities.
It should be noted that, in the embodiment of the present application, only the case where the input of the structure prediction model includes the reference coding unit is described as an example. In practical applications, the input of the structure prediction model may further include other coding information, such as QP (Quantization Parameter) information and prediction mode (intra prediction or inter prediction) information. It should be understood that these all fall within the protection scope of the present application.
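A hedged PyTorch sketch of such a structure prediction model, treating each syntax element value as one binary classification output; the layer sizes and the number of output flags are illustrative assumptions, and extra inputs such as QP or prediction mode could be concatenated to the pooled features:

```python
# Minimal sketch: map the reference CU's Y/U/V components to per-syntax-
# element probabilities (one binary output per split flag per depth).
import torch
import torch.nn as nn

class StructurePredictionModel(nn.Module):
    def __init__(self, num_flags=16):  # num_flags is an assumption
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # One sigmoid output per syntax element value to predict.
        self.head = nn.Linear(64, num_flags)

    def forward(self, yuv):  # yuv: (N, 3, H, W) reference CU samples
        x = self.features(yuv).flatten(1)
        return torch.sigmoid(self.head(x))
```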
In the following, the training process of the structure prediction model will be described by taking an example in which the input of the structure prediction model includes a reference coding unit (e.g., Y/U/V components of each pixel of the reference coding unit).
In one example, the training process of the structure prediction model includes the following steps:
(1) At least one training sample is obtained.
The training samples are used in the training process of the structure prediction model, the number of the training samples is not limited in the embodiment of the application, and in practical application, the number of the training samples can be determined by combining the processing capacity of the training equipment of the structure prediction model, the prediction accuracy of the structure prediction model and other factors. In this embodiment of the present application, each training sample includes a block division structure of a first coding unit and a second coding unit, where the second coding unit is a reference coding unit of the first coding unit. The content information of the second coding unit (such as Y/U/V components of each pixel of the second coding unit) and the block division structure of the first coding unit are both information that can be obtained, that is, before the training samples are obtained, the block division process for the first coding unit and the second coding unit has been completed.
Optionally, in the process of obtaining the training samples, a determination manner of the reference coding unit of the first coding unit is consistent with a determination manner of the reference coding unit of the target coding unit in a subsequent process of predicting the block division structure of the target coding unit by using the structure prediction model. The reference coding units are determined in the same determination mode in the model training process and the model using process, so that the prediction accuracy of the structure prediction model can be improved.
(2) And calling a structure prediction model to process the second coding unit to obtain the block division prediction structure of the first coding unit.
In the training process of the structure prediction model, various parameters of the structure prediction model may be preset to construct the structure prediction model, and then the structure prediction model is called to process the second coding unit (for example, Y/U/V components of each pixel of the second coding unit) to predict the block division structure of the first coding unit, so as to obtain the block division prediction structure of the first coding unit.
The ground truth (Ground Truth) required in the training process of the structure prediction model, that is, the block division structure of each coding unit, requires that each coding unit traverses all division manners and that the final block division structure is determined after comparing the rate distortion costs of the various division manners; the format of the block division structure needs to conform to the output format of the structure prediction model. Taking direct prediction of the block division structure of the target coding unit by the structure prediction model as an example, the ground truth construction process is shown in fig. 10. Since the number of channels output by the structure prediction model needs to be fixed, the partition depth predicted by the structure prediction model needs to be defined in advance, and each syntax element value needs to be filled. As shown in fig. 10, the value 100 is an invalid syntax element value added when the syntax element values are filled. The output of the structure prediction model includes these invalid syntax element values; they can be removed according to the dividing flow of the coding unit shown in fig. 2, so that a result conforming to the preset format can be obtained by inference for the subsequent context model selection process.
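A minimal sketch of this padding and pruning; only the invalid value 100 comes from the description above, the helper names are assumptions:

```python
# Pad ground-truth flag lists to a fixed length with the invalid value 100,
# and strip those invalid entries again after inference.
INVALID = 100

def pad_ground_truth(flag_values, fixed_length):
    # Fill unused positions so the model's output channel count stays fixed.
    return flag_values + [INVALID] * (fixed_length - len(flag_values))

def strip_invalid(predicted_values):
    # Remove invalid entries (following the CU partition flow) before the
    # context model selection step.
    return [v for v in predicted_values if v != INVALID]
```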
(3) A prediction loss value of the structure prediction model is calculated based on the block division prediction structure of the first coding unit and the block division structure of the first coding unit.
After the block division prediction structure of the first coding unit is output by the structure prediction model, the prediction loss value of the structure prediction model, indicating the error between the block division prediction structure of the first coding unit and the block division structure of the first coding unit, can be calculated from these two structures. Optionally, the prediction loss value of the structure prediction model may be obtained using a classification network loss function, such as cross entropy.
(4) And adjusting parameters of the structure prediction model according to the prediction loss value.
As can be seen from the above description, in the training process of the structure prediction model, various parameters of the structure prediction model are predefined, and then after the prediction loss value is calculated, various parameters of the structure prediction model can be adjusted according to the prediction loss value, so that the prediction loss value enters the convergence range, thereby completing the training process of the structure prediction model and obtaining the trained structure prediction model.
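A hedged sketch of steps (2)–(4) as a training loop, using binary cross entropy as the classification loss and omitting the invalid-value masking discussed above for brevity; the dataset format and tensor shapes are assumptions:

```python
# Minimal training loop for the structure prediction model sketched earlier.
import torch

def train(model, samples, epochs=10, lr=1e-3):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.BCELoss()
    for _ in range(epochs):
        for second_cu_yuv, first_cu_flags in samples:
            pred = model(second_cu_yuv)           # step (2): predict partition
            loss = loss_fn(pred, first_cu_flags)  # step (3): prediction loss
            optimizer.zero_grad()
            loss.backward()                       # step (4): adjust parameters
            optimizer.step()
```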
In summary, according to the technical scheme provided by the embodiment of the present application, the reference coding unit is processed by the deep learning model to predict the block partition structure of the coding unit to be processed, and the block partition structure of the coding unit to be processed can be obtained by directly inputting the reference coding unit into the deep learning model in the prediction process, so that the speed of the prediction process of the block partition structure can be effectively increased. In addition, the deep learning model can be continuously used for multiple times and can be used for predicting the block division structures of various different coding units, so that the compatibility of the technical scheme provided by the embodiment of the application is improved.
Next, a description will be given of how the block division prediction structure is added in the selection process of the context model. In the embodiment of the present application, the block division prediction structure may be used as a further selection condition added in the context model selection process, and may also be fused with the selection condition of the original context model to optimize the selection condition of the original context model.
First, the manner in which the block partition prediction structure is added as a further selection condition in the context model selection process is described.
In one example, step 730 includes: for a target syntax element in the at least one syntax element, determining a predicted value of the target syntax element according to a block partition prediction structure of a target coding unit; obtaining an initial index increment value of a context model adopted by a target syntax element; and determining the index increment value of the context model adopted by the target syntax element according to the predicted value of the target syntax element and the initial index increment value of the context model adopted by the target syntax element.
On one hand, a predicted value (allowSplitRef) of the target syntax element can be obtained by inference from the block partition prediction structure of the target coding unit; on the other hand, through the above-described determination process of the index increment value of the context model adopted by the relevant syntax element, the initial index increment value of the context model adopted by the target syntax element can be obtained. The index increment value (ctxIdxInc) of the context model finally adopted by the target syntax element is then determined from the predicted value of the target syntax element and the initial index increment value.
Next, with reference to fig. 4 and the selection method of the context model described in the embodiment of the present application, an exemplary description is given for the ctxIdxInc of qt_split_flag, bet_split_flag, bet_split_type_flag, and bet_split_dir_flag, respectively.
1. ctxIdxInc of qt_split_flag.
ctxIdxInc of qt_split_flag is determined according to the following method:
Firstly:
(1) If the current image is an intra-prediction image and the width of E is 128, ctxIdxInc is equal to 3;
(2) Otherwise (the two conditions "the current image is an intra-prediction image" and "the width of E is 128" are not both satisfied), if A exists and the height of A is less than the height of E, and B exists and the width of B is less than the width of E, ctxIdxInc is equal to 2;
(3) Otherwise, if A exists and the height of A is less than the height of E, or B exists and the width of B is less than the width of E, ctxIdxInc is equal to 1;
(4) Otherwise, ctxIdxInc is equal to 0.
Secondly:
ctxIdxInc is increased by 4 × allowSplitRef.
2. ctxIdxInc of bet_split_flag.
ctxIdxInc of bet_split_flag is determined according to the following method:
Firstly:
(1) If A exists and the height of A is less than the height of E, and B exists and the width of B is less than the width of E, ctxIdxInc equals 2;
(2) Otherwise, if A exists and the height of A is less than the height of E, or B exists and the width of B is less than the width of E, ctxIdxInc is equal to 1;
(3) Otherwise, ctxIdxInc equals 0.
Secondly:
(4) If the product of the width of E multiplied by the height of E is greater than 1024, ctxIdxInc is unchanged;
(5) Otherwise, if the product of the width of E multiplied by the height of E is greater than 256, ctxIdxInc is increased by 3;
(6) Otherwise, ctxIdxInc is increased by 6.
And finally:
ctxIdxInc is increased by 9 × allowSplitRef.
3. ctxIdxInc of bet_split_type_flag.
ctxIdxInc of bet_split_type_flag is determined according to the following method:
Firstly:
(1) If A exists and the height of A is less than the height of E, and B exists and the width of B is less than the width of E, ctxIdxInc equals 2;
(2) Otherwise, if A exists and the height of A is less than the height of E, or B exists and the width of B is less than the width of E, ctxIdxInc is equal to 1;
(3) Otherwise, ctxIdxInc equals 0.
Secondly:
ctxIdxInc is increased by 3 × allowSplitRef.
4. ctxIdxInc of bet_split_dir_flag.
ctxIdxInc of bet_split_dir_flag is determined according to the following method:
Firstly:
(1) If E has a width of 128 and a height of 64, ctxIdxInc equals 4;
(2) Otherwise, if E has a width of 64 and a height of 128, ctxIdxInc equals 3;
(3) Otherwise, if the height of E is greater than the width of E, ctxIdxInc is equal to 2;
(4) Otherwise, if the width of E is greater than the height of E, ctxIdxInc is equal to 1;
(5) Otherwise, ctxIdxInc equals 0.
Secondly:
ctxIdxInc is increased by 5 × allowSplitRef.
As can be seen from the above examples, by using the block partition prediction structure as a further selection condition added in the context model selection process, the context models available to each original syntax element can be expanded; for example, for the syntax element qt_split_flag, 4 context models are added, namely the context models with ctxIdxStart of 10 and ctxIdxInc of 4, 5, 6, and 7. By expanding the number of context models, the probability estimation of the syntax elements can be made more accurate, thereby reducing the number of bits of the code stream.
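A hedged Python transcription of the four expanded rules above; A is the left neighbour and B is the above neighbour of E, allow_split_ref is the predicted value (0 or 1), and the attribute names (width, height) are assumptions for illustration:

```python
# Shared first-stage rule based on the neighbours A (left) and B (above).
def _neighbour_inc(A, B, E):
    a_small = A is not None and A.height < E.height
    b_small = B is not None and B.width < E.width
    if a_small and b_small:
        return 2
    if a_small or b_small:
        return 1
    return 0

def qt_split_flag_inc(E, A, B, is_intra_picture, allow_split_ref):
    inc = 3 if (is_intra_picture and E.width == 128) else _neighbour_inc(A, B, E)
    return inc + 4 * allow_split_ref   # expanded selection condition

def bet_split_flag_inc(E, A, B, allow_split_ref):
    inc = _neighbour_inc(A, B, E)
    area = E.width * E.height
    if area <= 1024:                   # area > 1024 leaves inc unchanged
        inc += 3 if area > 256 else 6
    return inc + 9 * allow_split_ref   # expanded selection condition

def bet_split_type_flag_inc(E, A, B, allow_split_ref):
    return _neighbour_inc(A, B, E) + 3 * allow_split_ref

def bet_split_dir_flag_inc(E, allow_split_ref):
    if E.width == 128 and E.height == 64:
        inc = 4
    elif E.width == 64 and E.height == 128:
        inc = 3
    elif E.height > E.width:
        inc = 2
    elif E.width > E.height:
        inc = 1
    else:
        inc = 0
    return inc + 5 * allow_split_ref   # expanded selection condition
```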
Next, the block partition prediction structure is described to be fused with the selection conditions of the original context model, so as to optimize the selection conditions of the original context model.
In another example, step 730 includes: for a target syntax element in the at least one syntax element, determining a predicted value of the target syntax element according to the block partition prediction structure of the target coding unit; obtaining a determination condition of the index increment value of the context model adopted by the target syntax element; and determining the index increment value of the context model adopted by the target syntax element according to the predicted value of the target syntax element and the determination condition.
The predicted value (allowSplitRef) of the target syntax element can be obtained by inference from the block division prediction structure of the target coding unit, and it is then fused with the determination condition of the index increment value of the original context model to determine the index increment value (ctxIdxInc) of the context model adopted by the target syntax element. Fusing the predicted value of the syntax element into the selection conditions of the original context model optimizes those selection conditions while avoiding the introduction of too many additional selection conditions; this improves entropy coding efficiency, keeps the complexity of the video compression process from becoming too high, and facilitates improvement of video compression efficiency.
Next, with reference to fig. 4 and the selection method of the context model described in the embodiment of the present application, an exemplary description is given for the ctxIdxInc of qt_split_flag, bet_split_flag, bet_split_type_flag, and bet_split_dir_flag, respectively.
1. ctxIdxInc of qt_split_flag.
ctxIdxInc of qt_split_flag is determined according to the following method:
(1) If the current image is an intra-prediction image and the width of E is 128, or allowSplitRef is 1, ctxIdxInc is equal to 3;
(2) Otherwise (neither the condition "the current image is an intra-prediction image and the width of E is 128" nor the condition "allowSplitRef is 1" is satisfied), if A exists and the height of A is less than the height of E, and B exists and the width of B is less than the width of E, ctxIdxInc is equal to 2;
(3) Otherwise, if A exists and the height of A is less than the height of E, or B exists and the width of B is less than the width of E, ctxIdxInc is equal to 1;
(4) Otherwise, ctxIdxInc equals 0.
2. ctxIdxInc of bet_split_flag.
ctxIdxInc of bet_split_flag is determined according to the following method:
Firstly:
(1) If A exists and the height of A is less than the height of E, and B exists and the width of B is less than the width of E, ctxIdxInc equals 2;
(2) Otherwise, if A exists and the height of A is less than the height of E, or B exists and the width of B is less than the width of E, ctxIdxInc is equal to 1;
(3) Otherwise, ctxIdxInc equals 0.
Secondly:
(4) If the product of the width of E multiplied by the height of E is greater than 1024, or allowSplitRef is 1, ctxIdxInc is unchanged;
(5) Otherwise (neither condition in (4) is satisfied), if the product of the width of E multiplied by the height of E is greater than 256, ctxIdxInc is increased by 3;
(6) Otherwise, ctxIdxInc is increased by 6.
3. ctxIdxInc of bet_split_type_flag.
ctxIdxInc of bet_split_type_flag is determined according to the following method:
(1) If A exists and the height of A is less than the height of E, and B exists and the width of B is less than the width of E, or allowSplitRef is 1, ctxIdxInc equals 2;
(2) Otherwise (neither condition in (1) is satisfied), if A exists and the height of A is less than the height of E, or B exists and the width of B is less than the width of E, ctxIdxInc is equal to 1;
(3) Otherwise, ctxIdxInc equals 0.
4. ctxIdxInc of bet_split_dir_flag.
ctxIdxInc of bet_split_dir_flag is determined according to the following method:
(1) If E has a width of 128 and a height of 64, or allowSplitRef is 1, ctxIdxInc equals 4;
(2) Otherwise (neither condition in (1) is satisfied), if E has a width of 64 and a height of 128, ctxIdxInc equals 3;
(3) Otherwise, if the height of E is greater than the width of E, ctxIdxInc is equal to 2;
(4) Otherwise, if the width of E is greater than the height of E, ctxIdxInc is equal to 1;
(5) Otherwise, ctxIdxInc is equal to 0.
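A hedged Python transcription of the four fused rules above, reusing the _neighbour_inc() helper from the previous sketch; again, allow_split_ref and the attribute names are assumptions for illustration:

```python
# Fused variant: the predicted value allow_split_ref is merged into the
# existing conditions instead of adding new context models.
def qt_split_flag_inc_fused(E, A, B, is_intra_picture, allow_split_ref):
    if (is_intra_picture and E.width == 128) or allow_split_ref == 1:
        return 3
    return _neighbour_inc(A, B, E)

def bet_split_flag_inc_fused(E, A, B, allow_split_ref):
    inc = _neighbour_inc(A, B, E)
    area = E.width * E.height
    if area > 1024 or allow_split_ref == 1:
        return inc                       # condition (4): unchanged
    return inc + (3 if area > 256 else 6)

def bet_split_type_flag_inc_fused(E, A, B, allow_split_ref):
    a_small = A is not None and A.height < E.height
    b_small = B is not None and B.width < E.width
    if (a_small and b_small) or allow_split_ref == 1:
        return 2
    return 1 if (a_small or b_small) else 0

def bet_split_dir_flag_inc_fused(E, allow_split_ref):
    if (E.width == 128 and E.height == 64) or allow_split_ref == 1:
        return 4
    if E.width == 64 and E.height == 128:
        return 3
    if E.height > E.width:
        return 2
    if E.width > E.height:
        return 1
    return 0
```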
In summary, according to the technical scheme provided by the embodiment of the present application, using the block division prediction structure as a further selection condition added in the context model selection process expands the context models available to the original syntax elements; since the number of context models is expanded, context model selection becomes more accurate, the probability estimation of the syntax elements becomes more accurate, and the number of bits of the code stream is reduced. In addition, fusing the predicted value of the syntax element into the selection conditions of the original context model optimizes those selection conditions, avoids introducing too many selection conditions for the context model, saves the code stream bits required for transmitting the syntax elements, keeps the complexity of the video compression process from becoming too high, and facilitates improvement of video compression efficiency.
Referring to fig. 11, a schematic diagram of a selection process of a context model according to an embodiment of the present application is shown. The method may be applied in a device for encoding a video sequence, such as the first device 210 in the communication system shown in fig. 5; it can also be applied to a device that decodes encoded video data to recover a video sequence, such as the second device 220 in the communication system shown in fig. 5.
As shown in fig. 11, a reference coding unit of the target coding unit is determined before context model selection is performed for the syntax elements involved in block division of the target coding unit. The reference coding unit of the target coding unit may include the following coding units: a coding unit spatially adjacent to the target coding unit (a coding unit adjacent to the target coding unit within the video frame where the target coding unit is located), a coding unit temporally adjacent to the target coding unit (a coding unit at the position corresponding to the target coding unit in a video frame adjacent to the target video frame), and a coding unit stored in the cache.
The reference coding units are all coding units that have completed the encoding process or completed the reconstruction process, and therefore, the content information of the reference coding units can be obtained by a video encoder or a video decoder. After the reference coding unit is determined, the Y/U/V components of each pixel of the reference coding unit may be acquired from the content information of the reference coding unit.
In the embodiment of the present application, a deep learning model (structure prediction model) is employed to predict the block division structure of the target coding unit. After the reference coding unit is obtained, calling a structure prediction model to process Y/U/V components of each pixel of the reference coding unit, so as to obtain a block division prediction structure of the target coding unit.
Then, in the selection process of the context model adopted by the syntax element related to the block division of the target coding unit, the block division prediction structure of the target coding unit is added to determine a more accurate context model for the syntax element.
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Referring to fig. 12, a block diagram of a device for selecting a context model according to an embodiment of the present application is shown. The device has the function of implementing the above example of the context model selection method, and the function can be implemented by hardware or by hardware executing corresponding software. The apparatus may be a device that encodes a video sequence, a device that decodes encoded video data to recover a video sequence, or may be provided in the above-mentioned device. The apparatus 1200 may include: a unit determination module 1210, a structure prediction module 1220, and a model determination module 1230.
A unit determining module 1210 for determining a reference coding unit of the target coding unit.
The structure prediction module 1220 is configured to predict the block partition structure of the target coding unit according to the reference coding unit, so as to obtain the block partition prediction structure of the target coding unit.
A model determining module 1230, configured to determine context models respectively adopted by at least one syntax element involved in block partitioning of the target coding unit based on a block partitioning prediction structure of the target coding unit; wherein the syntax element is used for indicating a block division structure of the coding unit, and the context model is used for performing probability estimation on the syntax element.
In one example, the model determining module 1230 is configured to: for a target syntax element of the at least one syntax element, determining a predicted value of the target syntax element according to a block partition prediction structure of the target coding unit; obtaining an initial index increment value of a context model adopted by the target syntax element; and determining the index increment value of the context model adopted by the target syntax element according to the predicted value of the target syntax element and the initial index increment value of the context model adopted by the target syntax element.
In one example, the model determining module 1230 is configured to: for a target syntax element of the at least one syntax element, determining a predicted value of the target syntax element according to a block partition prediction structure of the target coding unit; obtaining a determining condition of an index increment value of a context model adopted by the target syntax element; and determining an index increment value of a context model adopted by the target syntax element according to the predicted value of the target syntax element and the determination condition.
In an example, the structure prediction module 1220 is configured to: and calling a structure prediction model to process the reference coding unit to obtain a block division prediction structure of the target coding unit, wherein the structure prediction model is used for predicting the block division structure of the coding unit.
In one example, as shown in fig. 13, the apparatus 1200 further includes: a sample acquiring module 1232 configured to acquire at least one training sample; each training sample comprises a block division structure of a first coding unit and a second coding unit, wherein the second coding unit is a reference coding unit of the first coding unit; a structure processing module 1234, configured to invoke the structure prediction model to process the second coding unit, so as to obtain a block partition prediction structure of the first coding unit; a loss value calculating module 1236, configured to calculate a prediction loss value of the structure prediction model according to the block division prediction structure of the first coding unit and the block division structure of the first coding unit, where the prediction loss value is used to indicate an error between the block division prediction structure of the first coding unit and the block division structure of the first coding unit; and a parameter adjusting module 1238, configured to adjust a parameter of the structure prediction model according to the predicted loss value.
In one example, the unit determining module 1210 is configured to: acquiring a coding unit which meets a target condition in a target video frame, wherein the target video frame is a video frame where the target coding unit is located; and selecting a coding unit adjacent to the target coding unit from the coding units meeting the target condition as the reference coding unit.
In one example, the unit determining module 1210 is configured to: determining the position information of the target coding unit in a target video frame, wherein the target video frame is a video frame where the target coding unit is located; acquiring at least one adjacent video frame of the target video frame; determining a coding unit of the at least one neighboring video frame that satisfies the position information as the reference coding unit.
In one example, the unit determining module 1210 is configured to: acquiring at least one coding unit stored in a cache; determining the obtained at least one coding unit as the reference coding unit.
In one example, the number of reference coding units is a positive integer greater than or equal to 2; as shown in fig. 13, the apparatus 1200 further includes: a rate-distortion cost determining module 1242, configured to determine a rate-distortion cost of each of the reference coding units; a preferred unit determining module 1244, configured to select a preferred coding unit from at least two of the reference coding units according to the rate distortion cost of each of the reference coding units, where a block partition structure of the preferred coding unit is used to predict a block partition structure of the target coding unit.
In one example, the at least one syntax element comprises: a first syntax element for indicating whether the target coding unit is block partitioned in a first partition manner; a second syntax element for indicating whether a second partition manner or a third partition manner is adopted to perform block partition on the target coding unit; a third syntax element for indicating whether the second partitioning manner or the third partitioning manner is adopted for block partitioning of the target coding unit, if the second syntax element indicates that the second partitioning manner or the third partitioning manner is adopted for block partitioning of the target coding unit; a fourth syntax element, configured to indicate a dividing direction of the second dividing manner or the third dividing manner when the target coding unit is block-divided in the second dividing manner or the third dividing manner.
In one example, the first partition comprises a QT partition; and/or the second division mode comprises BT division and/or TT division; and/or the third division mode comprises EQT division.
In summary, according to the technical solution provided in the embodiment of the present application, the block partition structure of a coding unit is predicted according to its reference coding unit, and the prediction result is then added to the selection process of the context models of the syntax elements, so that the selection conditions of the context models can be increased or optimized, improving entropy coding efficiency and reducing the number of bits of the code stream. In addition, since more accurate probability estimation can be obtained merely by adding the prediction result of the block division structure, the embodiment of the present application avoids adding excessive selection conditions for the context model, which is beneficial to improving video compression efficiency.
It should be noted that, when the apparatus provided in the foregoing embodiment implements the functions thereof, only the division of the functional modules is illustrated, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure of the apparatus may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus and method embodiments provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments for details, which are not described herein again.
Referring to fig. 14, a block diagram of a computer device according to an embodiment of the present application is shown. The computer device may be the device for encoding video sequences introduced above, such as the first device 210 in the communication system shown in fig. 5; or may be the device introduced above for decoding encoded video data to recover a video sequence, such as the second device 220 in the communication system shown in fig. 5. The computer device 140 may include: processor 141, memory 142, communication interface 143, encoder/decoder 144, and bus 145.
The processor 141 includes one or more processing cores, and the processor 141 executes various functional applications and information processing by running software programs and modules.
The memory 142 may be used to store a computer program for execution by the processor 141 to implement the above-described selection method of the context model.
Communication interface 143 may be used to communicate with other devices, such as to receive and transmit audio-visual data.
The encoder/decoder 144 may be used to perform encoding and decoding functions, such as encoding and decoding audio-visual data.
The memory 142 is connected to the processor 141 through a bus 145.
Further, memory 142 may be implemented by any type or combination of volatile or non-volatile storage devices, including, but not limited to: magnetic or optical disk, EEPROM (Electrically Erasable Programmable Read-Only Memory), EPROM (Erasable Programmable Read-Only Memory), SRAM (Static Random-Access Memory), ROM (Read-Only Memory), magnetic Memory, flash Memory, PROM (Programmable Read-Only Memory).
Those skilled in the art will appreciate that the configuration shown in FIG. 14 is not intended to be limiting of the computer device 140 and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.
In an exemplary embodiment, a computer readable storage medium is also provided, having stored therein at least one instruction, at least one program, code set, or set of instructions which, when executed by a processor of a computer device, implements the above-described method of selecting a context model.
Optionally, the computer-readable storage medium may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a Solid State Drive (SSD), or an optical disc. The Random Access Memory may include a resistive Random Access Memory (ReRAM) and a Dynamic Random Access Memory (DRAM).
In an exemplary embodiment, a computer program product or computer program is also provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method for selecting a context model.
It should be understood that reference to "a plurality" herein means two or more. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. In addition, the step numbers described herein only exemplarily show one possible execution sequence among the steps, and in some other embodiments, the steps may also be executed out of the numbering sequence, for example, two steps with different numbers are executed simultaneously, or two steps with different numbers are executed in a reverse order to the order shown in the figure, which is not limited by the embodiment of the present application.
The above description is only exemplary of the application and should not be taken as limiting the application, and any modifications, equivalents, improvements and the like that are made within the spirit and principle of the application should be included in the protection scope of the application.

Claims (14)

1. A method for selecting a context model, the method comprising:
determining a reference coding unit of a target coding unit;
predicting the block division structure of the target coding unit according to the reference coding unit to obtain the block division prediction structure of the target coding unit;
determining context models respectively adopted by at least one syntax element related to the block division of the target coding unit based on the block division prediction structure of the target coding unit;
wherein the syntax element is used for indicating a block division structure of the coding unit, and the context model is used for performing probability estimation on the syntax element.
2. The method of claim 1, wherein the determining, based on the prediction structure of the block partition of the target coding unit, the context models respectively adopted by the at least one syntax element involved in the block partition of the target coding unit comprises:
for a target syntax element of the at least one syntax element, determining a predicted value of the target syntax element according to a block partition prediction structure of the target coding unit;
obtaining an initial index increment value of a context model adopted by the target syntax element;
and determining the index increment value of the context model adopted by the target syntax element according to the predicted value of the target syntax element and the initial index increment value of the context model adopted by the target syntax element.
3. The method of claim 1, wherein the determining, based on the prediction structure of the block partition of the target coding unit, the context models respectively adopted by the at least one syntax element involved in the block partition of the target coding unit comprises:
for a target syntax element of the at least one syntax element, determining a predicted value of the target syntax element according to a block partition prediction structure of the target coding unit;
obtaining a determining condition of an index increment value of a context model adopted by the target syntax element;
and determining an index increment value of a context model adopted by the target syntax element according to the predicted value of the target syntax element and the determination condition.
4. The method of claim 1, wherein the predicting the block partition structure of the target coding unit according to the reference coding unit to obtain the block partition prediction structure of the target coding unit comprises:
and calling a structure prediction model to process the reference coding unit to obtain a block division prediction structure of the target coding unit, wherein the structure prediction model is used for predicting the block division structure of the coding unit.
5. The method of claim 4, wherein the structure prediction model is trained as follows:
obtaining at least one training sample; each training sample comprises a block division structure of a first coding unit and a second coding unit, wherein the second coding unit is a reference coding unit of the first coding unit;
calling the structure prediction model to process the second coding unit to obtain a block division prediction structure of the first coding unit;
calculating a prediction loss value of the structure prediction model according to the block division prediction structure of the first coding unit and the block division structure of the first coding unit, wherein the prediction loss value is used for indicating an error between the block division prediction structure of the first coding unit and the block division structure of the first coding unit;
and adjusting parameters of the structure prediction model according to the prediction loss value.
6. The method of claim 1, wherein determining the reference coding unit of the target coding unit comprises:
acquiring a coding unit which meets a target condition in a target video frame, wherein the target video frame is a video frame where the target coding unit is located;
and selecting a coding unit adjacent to the target coding unit from the coding units meeting the target condition as the reference coding unit.
7. The method of claim 1, wherein determining the reference coding unit of the target coding unit comprises:
determining the position information of the target coding unit in a target video frame, wherein the target video frame is the video frame where the target coding unit is located;
acquiring at least one adjacent video frame of the target video frame;
determining a coding unit of the at least one neighboring video frame that satisfies the position information as the reference coding unit.
8. The method of claim 1, wherein determining the reference coding unit of the target coding unit comprises:
acquiring at least one coding unit stored in a cache;
determining the obtained at least one coding unit as the reference coding unit.
9. The method of claim 1, wherein the number of reference coding units is a positive integer greater than or equal to 2; after the determining the reference coding unit of the target coding unit, the method further includes:
determining a rate-distortion cost for each of the reference coding units;
and selecting a preferred coding unit from at least two reference coding units according to the rate distortion cost of each reference coding unit, wherein the block division structure of the preferred coding unit is used for predicting the block division structure of the target coding unit.
10. The method according to any of claims 1 to 9, wherein the at least one syntax element comprises:
a first syntax element for indicating whether the target coding unit is block-divided in a first division manner;
a second syntax element for indicating whether a second partition manner or a third partition manner is adopted to perform block partition on the target coding unit;
a third syntax element for indicating whether the second partitioning manner or the third partitioning manner is adopted for block partitioning of the target coding unit, if the second syntax element indicates that the second partitioning manner or the third partitioning manner is adopted for block partitioning of the target coding unit;
a fourth syntax element, configured to indicate a dividing direction of the second dividing manner or the third dividing manner when the target coding unit is block-divided in the second dividing manner or the third dividing manner.
11. The method of claim 10, wherein the first partition comprises a quadtree, QT, partition; and/or the second division mode comprises binary tree BT and/or ternary tree TT division; and/or the third division mode comprises the expanded quad-tree EQT division.
12. An apparatus for selecting a context model, the apparatus comprising:
a unit determining module for determining a reference coding unit of a target coding unit;
the structure prediction module is used for predicting the block division structure of the target coding unit according to the reference coding unit to obtain the block division prediction structure of the target coding unit;
a model determining module, configured to determine, based on a block partition prediction structure of the target coding unit, context models respectively adopted by at least one syntax element involved in block partition of the target coding unit;
wherein the syntax element is used for indicating a block division structure of the coding unit, and the context model is used for performing probability estimation on the syntax element.
13. A computer device comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by the processor to implement a method of selecting a context model according to any one of claims 1 to 11.
14. A computer-readable storage medium, having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the method of selecting a context model according to any one of claims 1 to 11.
CN202011009881.5A 2020-09-23 2020-09-23 Context model selection method, device, equipment and storage medium Active CN114257810B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011009881.5A CN114257810B (en) 2020-09-23 2020-09-23 Context model selection method, device, equipment and storage medium
PCT/CN2021/118832 WO2022063035A1 (en) 2020-09-23 2021-09-16 Context model selection method and apparatus, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011009881.5A CN114257810B (en) 2020-09-23 2020-09-23 Context model selection method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114257810A (en) 2022-03-29
CN114257810B (en) 2023-01-06

Family

ID=80788599

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011009881.5A Active CN114257810B (en) 2020-09-23 2020-09-23 Context model selection method, device, equipment and storage medium

Country Status (2)

Country Link
CN (1) CN114257810B (en)
WO (1) WO2022063035A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115442602A (en) * 2022-08-12 2022-12-06 阿里巴巴(中国)有限公司 Partition decision method, decoding and transcoding method, device and medium for coding unit
CN115883835B (en) * 2023-03-03 2023-04-28 腾讯科技(深圳)有限公司 Video coding method, device, equipment and storage medium
CN116170594B (en) * 2023-04-19 2023-07-14 中国科学技术大学 Coding method and device based on rate distortion cost prediction

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7469070B2 (en) * 2004-02-09 2008-12-23 Lsi Corporation Method for selection of contexts for arithmetic coding of reference picture and motion vector residual bitstream syntax elements
AU2012278484B2 (en) * 2011-07-01 2016-05-05 Samsung Electronics Co., Ltd. Method and apparatus for entropy encoding using hierarchical data unit, and method and apparatus for decoding
WO2018174617A1 (en) * 2017-03-22 2018-09-27 한국전자통신연구원 Block form-based prediction method and device
WO2019240493A1 (en) * 2018-06-12 2019-12-19 한국전자통신연구원 Context adaptive binary arithmetic coding method and device
TW202304207A (en) * 2018-07-13 2023-01-16 弗勞恩霍夫爾協會 Partitioned intra coding concept
CN111435993B (en) * 2019-01-14 2022-08-26 华为技术有限公司 Video encoder, video decoder and corresponding methods

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111316642A (en) * 2017-10-27 2020-06-19 华为技术有限公司 Method and apparatus for signaling image coding and decoding partition information
CN109361920A (en) * 2018-10-31 2019-02-19 南京大学 A kind of interframe quick predict algorithm of the adaptive decision-making tree selection towards more scenes
TW202032995A (en) * 2019-01-02 2020-09-01 弗勞恩霍夫爾協會 Encoding and decoding a picture

Also Published As

Publication number Publication date
WO2022063035A1 (en) 2022-03-31
CN114257810A (en) 2022-03-29

Similar Documents

Publication Publication Date Title
JP7483035B2 (en) Video decoding method and video encoding method, apparatus, computer device and computer program thereof
CN112425166B (en) Intra-prediction in video coding using cross-component linear model
CN114257810B (en) Context model selection method, device, equipment and storage medium
CN113766249B (en) Loop filtering method, device, equipment and storage medium in video coding and decoding
CN112533000B (en) Video decoding method and device, computer readable medium and electronic equipment
WO2022116836A1 (en) Video decoding method and apparatus, video coding method and apparatus, and device
CN111316642B (en) Method and apparatus for signaling image coding and decoding partition information
CN111770345B (en) Motion estimation method, device and equipment of coding unit and storage medium
CN113315967B (en) Video encoding method, video encoding device, video encoding medium, and electronic apparatus
CN111741299A (en) Method, device and equipment for selecting intra-frame prediction mode and storage medium
US20230082386A1 (en) Video encoding method and apparatus, video decoding method and apparatus, computer-readable medium, and electronic device
WO2022022299A1 (en) Method, apparatus, and device for constructing motion information list in video coding and decoding
CN111770338B (en) Method, device and equipment for determining index value of coding unit and storage medium
CN114286095B (en) Video decoding method, device and equipment
CN114286096B (en) Video decoding method, device and equipment
JP7483029B2 (en) VIDEO DECODING METHOD, VIDEO ENCODING METHOD, DEVICE, MEDIUM, AND ELECTRONIC APPARATUS
WO2022116854A1 (en) Video decoding method and apparatus, readable medium, electronic device, and program product
US20240064298A1 (en) Loop filtering, video encoding, and video decoding methods and apparatus, storage medium, and electronic device
EP4124036A1 (en) Video coding/decoding method, apparatus, and device
US20230104359A1 (en) Video encoding method and apparatus, video decoding method and apparatus, computer-readable medium, and electronic device
CN114079772B (en) Video decoding method and device, computer readable medium and electronic equipment
WO2022037458A1 (en) Method, apparatus and device for constructing motion information list in video coding and decoding
WO2022037464A1 (en) Video decoding method and apparatus, video coding method and apparatus, device, and storage medium
CN114979656A (en) Video encoding and decoding method and device, computer readable medium and electronic equipment
CN113891076A (en) Method and apparatus for filtering decoded blocks of video data and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (country: HK; legal event code: DE; document number: 40065638)

GR01 Patent grant