CN117692663A - Binary tree partitioning processing method, equipment and storage medium for coding unit


Info

Publication number
CN117692663A
CN117692663A
Authority
CN
China
Prior art keywords
binary tree
prediction model
partition
network prediction
network
Prior art date
Legal status
Pending
Application number
CN202410131178.3A
Other languages
Chinese (zh)
Inventor
张宏顺
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202410131178.3A priority Critical patent/CN117692663A/en
Publication of CN117692663A publication Critical patent/CN117692663A/en
Pending legal-status Critical Current

Abstract

The application relates to a binary tree partitioning processing method, device, and storage medium for a coding unit, in the technical field of video compression. The method comprises the following steps: acquiring a feature vector of a coding unit CU, the feature vector representing image features of the CU and neighborhood image features of the CU; acquiring a first predicted value and a second predicted value based on the feature vector, the first predicted value indicating the probability that the binary tree partition type of the CU is horizontal partition, and the second predicted value indicating the probability that the binary tree partition type of the CU is vertical partition; and skipping at least one of the binary tree horizontal partition flow and the binary tree vertical partition flow of the CU based on the first predicted value and the second predicted value. The method can skip unnecessary BT partition modes in advance, accelerate CU partitioning without affecting video quality, reduce the computational load of video encoding and decoding, and thereby improve video coding efficiency.

Description

Binary tree partitioning processing method, equipment and storage medium for coding unit
Technical Field
The present invention relates to the field of video compression technologies, and in particular, to a binary tree partitioning method, apparatus, and storage medium for a coding unit.
Background
h.266/VVC is a video coding standard that can provide higher compression efficiency and better video quality.
In the related art, block division may divide a video frame into a plurality of blocks, each of which may be independently encoded and decoded to improve video quality.
However, H.266/VVC introduces multi-type tree partitioning, which increases the complexity of searching for the optimal block partitioning mode, adds a large amount of computation to video encoding and decoding, and seriously affects video encoding efficiency.
Disclosure of Invention
The application provides a binary tree partitioning processing method, device, and storage medium for a coding unit, in which binary tree partitions can be skipped through prediction, thereby improving video encoding efficiency. The technical scheme comprises the following steps.
According to an aspect of the present application, there is provided a binary tree partitioning processing method for an encoding unit, the method including:
acquiring a feature vector of a coding unit CU; the feature vector is used for representing the image features of the CU and the neighborhood image features of the CU;
based on the feature vector, a first predicted value and a second predicted value are obtained; the first predicted value is used for indicating the probability that the binary tree division type of the CU is horizontal division; the second predicted value is used for indicating the probability that the binary tree partition type of the CU is vertical partition;
And skipping at least one partition flow of binary tree horizontal partition and binary tree vertical partition of the CU based on the first predicted value and the second predicted value.
According to an aspect of the present application, there is provided a binary tree partitioning processing method for an encoding unit, the method including:
obtaining a feature vector sample of a CU sample and an optimal binary tree division mode of the CU sample;
inputting the feature vector sample into a network prediction model to obtain a binary tree division mode prediction result of the CU sample, wherein the binary tree division mode prediction result is output by the network prediction model; the binary tree partition mode prediction result comprises a first partition mode prediction result and a second partition mode prediction result; the first partition mode prediction result is used for indicating the probability that the binary tree partition type of the CU sample is horizontally partitioned, and the second partition mode prediction result is used for indicating the probability that the binary tree partition type of the CU sample is vertically partitioned;
acquiring a loss function value of the network prediction model based on an optimal binary tree partitioning mode of the CU sample and a prediction result of the binary tree partitioning mode;
based on the loss function value of the network prediction model, carrying out parameter updating on the network prediction model;
The network prediction model is used for outputting a first predicted value and a second predicted value based on an input feature vector of a CU; the first predicted value is used for indicating the probability that the binary tree partition type of the CU is horizontal partition; the second predicted value is used for indicating the probability that the binary tree partition type of the CU is vertical partition.
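Illustratively, the training procedure above can be condensed into the following sketch, assuming a small fully connected network with two sigmoid outputs trained with binary cross-entropy against labels derived from the optimal binary tree partition mode; the layer sizes, feature dimension, loss, and optimizer are assumptions for illustration, not the application's specified implementation.

```python
# Hypothetical training sketch for the network prediction model described above.
# Layer sizes, loss, and optimizer are assumptions, not the patent's specification.
import torch
import torch.nn as nn

class BTPartitionPredictor(nn.Module):
    def __init__(self, feature_dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, 32), nn.ReLU(),
            nn.Linear(32, 16), nn.ReLU(),
            nn.Linear(16, 2),  # logits: [horizontal split, vertical split]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Two probabilities: the first and second partition mode prediction results.
        return torch.sigmoid(self.net(x))

model = BTPartitionPredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

# Dummy CU samples: feature vector samples and labels derived from the optimal
# binary tree partition mode (1 if that split was optimal, else 0).
features = torch.randn(64, 16)
labels = torch.randint(0, 2, (64, 2)).float()

pred = model(features)            # binary tree partition mode prediction result
loss = loss_fn(pred, labels)      # loss function value of the network prediction model
optimizer.zero_grad()
loss.backward()
optimizer.step()                  # parameter update of the network prediction model
```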
According to an aspect of the present application, there is provided a binary tree partitioning processing apparatus for an encoding unit, the apparatus comprising:
the first acquisition module is used for acquiring the feature vector of the coding unit CU; the feature vector is used for representing the image features of the CU and the neighborhood image features of the CU;
the second obtaining module is used for obtaining a first predicted value and a second predicted value based on the feature vector; the first predicted value is used for indicating the probability that the binary tree division type of the CU is horizontal division; the second predicted value is used for indicating the probability that the binary tree partition type of the CU is vertical partition;
and the execution module is used for skipping at least one partition flow of binary tree horizontal partition and binary tree vertical partition of the CU based on the first predicted value and the second predicted value.
According to an aspect of the present application, there is provided a binary tree partitioning processing apparatus for an encoding unit, the apparatus comprising:
the sample acquisition module is used for acquiring the feature vector samples of the CU samples and the optimal binary tree division mode of the CU samples;
the input/output module is used for inputting the feature vector sample into a network prediction model to obtain a binary tree division mode prediction result of the CU sample, wherein the binary tree division mode prediction result is output by the network prediction model; the binary tree partition mode prediction result comprises a first partition mode prediction result and a second partition mode prediction result; the first partition mode prediction result is used for indicating the probability that the binary tree partition type of the CU sample is horizontally partitioned, and the second partition mode prediction result is used for indicating the probability that the binary tree partition type of the CU sample is vertically partitioned;
the loss acquisition module is used for acquiring a loss function value of the network prediction model based on an optimal binary tree division mode of the CU sample and a binary tree division mode prediction result;
the parameter updating module is used for updating parameters of the network prediction model based on the loss function value of the network prediction model;
The network prediction model is used for outputting a first predicted value and a second predicted value based on an input feature vector of a CU; the first predicted value is used for indicating the probability that the binary tree partition type of the CU is horizontal partition; the second predicted value is used for indicating the probability that the binary tree partition type of the CU is vertical partition.
According to another aspect of the present application, there is provided a computer device comprising a processor and a memory, the memory storing at least one instruction, at least one program, a set of codes or a set of instructions, the at least one instruction, the at least one program, the set of codes or the set of instructions being loaded and executed by the processor to implement the binary tree partitioning method for an encoding unit as described in the above aspect.
According to another aspect of the present application, there is provided a computer readable storage medium having stored therein at least one instruction, at least one program, a set of codes or a set of instructions, the at least one instruction, the at least one program, the set of codes or the set of instructions being loaded and executed by a processor to implement the binary tree partitioning method for an encoding unit as described in the above aspect.
According to another aspect of the present application, there is provided a computer program product comprising computer instructions stored in a computer readable storage medium, from which a processor reads and executes the computer instructions to implement the binary tree partitioning method for an encoding unit as described in the above aspect.
The technical scheme provided by the embodiment of the application can comprise the following beneficial effects:
according to the obtained feature vector of the CU, a first predicted value and a second predicted value are acquired so as to skip the binary tree horizontal division and/or the binary tree vertical division of the CU; in this way, unnecessary BT division modes can be skipped in advance, CU division is accelerated without affecting video quality, the computational load of video encoding and decoding is reduced, and video coding efficiency is thereby improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a basic flow chart of a video encoding process illustratively shown herein;
FIG. 2 is a schematic view of CU partitioning according to the present application;
FIG. 3 is a schematic diagram of an inter prediction mode provided by one embodiment of the present application;
FIG. 4 is a schematic diagram of candidate motion vectors provided by one embodiment of the present application;
FIG. 5 is a schematic diagram of an intra block copy mode provided by one embodiment of the present application;
FIG. 6 is a schematic diagram of an intra-string copy mode provided by one embodiment of the present application;
FIG. 7 is a simplified block diagram of a communication system provided in one embodiment of the present application;
FIG. 8 is a schematic diagram of the placement of video encoders and video decoders in a streaming environment as exemplarily shown herein;
FIG. 9 is a flowchart of a binary tree partitioning process for coding units provided in one exemplary embodiment of the present application;
FIG. 10 is a schematic diagram of the current CU versus neighboring block location related to the present application;
FIG. 11 is a flowchart of a binary tree partitioning method for coding units provided in another exemplary embodiment of the present application;
FIG. 12 is a flowchart of a binary tree partitioning processing method for an encoding unit provided in yet another exemplary embodiment of the present application;
FIG. 13 is a flowchart of a binary tree partitioning processing method for an encoding unit provided in yet another exemplary embodiment of the present application;
FIG. 14 is a flowchart of a method of training a network prediction model provided in one exemplary embodiment of the present application;
FIG. 15 is a schematic diagram of a fully connected network architecture according to the present application;
FIG. 16 is a flowchart of a method of training a network prediction model provided in another exemplary embodiment of the present application;
FIG. 17 is an exemplary training and application flow diagram of a network prediction model in accordance with the present application;
FIG. 18 is an exemplary training and application flow diagram of a first network prediction model and a second network prediction model according to the present application;
FIG. 19 is a schematic view of various CU partition types referred to herein;
FIG. 20 is a diagram illustrating an exemplary CU partitioning sequence provided by one exemplary embodiment of the present application;
FIG. 21 is a flowchart of an implementation of a binary tree partitioning method for coding units provided in an exemplary embodiment of the present application;
FIG. 22 is a network training flow diagram provided by an exemplary embodiment of the present application;
FIG. 23 is a schematic diagram of reference relationships corresponding to different frame types referred to herein;
FIG. 24 is a reference relationship diagram of the GOP16 related to the present application;
FIG. 25 is an application flow diagram provided by an exemplary embodiment of the present application;
FIG. 26 is a block diagram of a binary tree partitioning processing device for an encoding unit, as shown in an exemplary embodiment of the present application;
FIG. 27 is a block diagram of a binary tree partitioning processing device for an encoding unit, as shown in another exemplary embodiment of the present application;
fig. 28 is a block diagram of a computer device according to an exemplary embodiment of the present application.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the accompanying claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
In the embodiment of the present application, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data are required to comply with the related laws and regulations and standards of the related country and region. For example, the object behaviors such as attack operations referred to in the present application are all acquired under the condition of sufficient authorization.
It will be understood that, although the terms first, second, etc. may be used in this disclosure to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first parameter may also be referred to as a second parameter, and similarly, a second parameter may also be referred to as a first parameter, without departing from the scope of the present disclosure. The word "if" as used herein may be interpreted as "when", "upon", or "in response to determining", depending on the context.
Some definitions of terms referred to in this application are presented below.
Second generation video coding standard h.264: also known as the advanced video coding (Advanced Video Coding, AVC) standard.
Third generation video coding standard h.265: also known as the high efficiency video coding (High Efficiency Video Coding, HEVC) standard.
Fourth generation video coding international standard H.266: also known as the Versatile Video Coding (VVC) standard. Although the coding algorithm in H.266/VVC contains no fundamentally new measures, and its technical means are basically similar to those of the previous generations of video coding standards, still within a block-based hybrid coding framework, H.266/VVC improves almost every coding stage and squeezes out information redundancy that had not been completely removed, so as to meet the requirement of doubling overall coding efficiency. The H.266/VVC standard is oriented towards a variety of applications, such as high definition and Ultra High Definition Video (UHDV) with 3840 x 2160 or 7680 x 4320 image resolution, 10-bit precision, High Dynamic Range (HDR), and wide color gamut; and such as immersive media and 360° panoramic video projected using a common projection format (omnidirectional video). Furthermore, H.266/VVC also supports the applications addressed by the previous standards.
Advanced motion vector prediction (Advanced Motion Vector Prediction, AMVP) mode: a Motion Vector (MV) prediction technique proposed in H.265/HEVC. H.266/VVC retains this technique and improves upon the H.265/HEVC design. AMVP uses the spatial and temporal correlation of motion vectors to build a candidate prediction MV list for the current PU; the encoder selects the optimal motion vector predictor (Motion Vector Prediction, MVP) from the candidate list and differentially encodes the MV against it; the decoder builds the same list and needs only the index of the MVP in the list and the motion vector difference (Motion Vector Difference, MVD) to compute the MV of the current PU.
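Illustratively, the AMVP idea can be condensed into the following sketch; the candidate values and the cost measure used to pick the MVP are assumptions for the example, not taken from the standard text.

```python
# Minimal AMVP sketch: the encoder picks the candidate MVP closest to the
# actual MV (cheapest MVD to code) and signals only the index plus the MVD.
def amvp_encode(mv, candidates):
    # |MVD| in the L1 norm stands in for the real rate cost of coding the MVD.
    idx = min(range(len(candidates)),
              key=lambda i: abs(mv[0] - candidates[i][0]) + abs(mv[1] - candidates[i][1]))
    mvp = candidates[idx]
    return idx, (mv[0] - mvp[0], mv[1] - mvp[1])   # index, MVD

def amvp_decode(idx, mvd, candidates):
    # The decoder builds the same candidate list and reconstructs MV = MVP + MVD.
    mvp = candidates[idx]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])

cands = [(4, -2), (3, 0)]                  # AMVP list length is 2 in HEVC
idx, mvd = amvp_encode((5, -1), cands)
assert amvp_decode(idx, mvd, cands) == (5, -1)
```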
Before describing the embodiments of the present application, a brief description of video encoding techniques is first provided in connection with fig. 1. Fig. 1 illustrates a basic flow chart of a video encoding process.
A video signal refers to an image sequence comprising a plurality of frames. A frame is a representation of the spatial information of a video signal. Taking the YUV format as an example, a frame includes one luminance sample matrix (Y) and two chrominance sample matrices (Cb and Cr). In terms of how the video signal is acquired, content can be divided into camera-captured and computer-generated; because their statistical properties differ, the corresponding compression coding modes may also differ.
In some mainstream video coding technologies, such as the H.265/HEVC (High Efficiency Video Coding), H.266/VVC (Versatile Video Coding), and AVS (Audio Video coding Standard, e.g., AVS3) standards, a hybrid coding framework is used to perform the following series of operations and processes on the input original video signal.
1. Block Partition Structure: the input image is divided into several non-overlapping processing units, each of which undergoes a similar compression operation. This processing unit is called a CTU (Coding Tree Unit) or LCU (Largest Coding Unit). The CTU may be divided further and more finely to obtain one or more basic coding units, called CUs (Coding Unit). Each CU is the most basic element in an encoding pass.
CTUs may be divided down into different CUs in a quadtree manner. For example, fig. 2 shows a CU partitioning diagram related to the present application. As shown in fig. 2, CTUs in VVC are first divided into different CUs according to a quadtree, and the CUs at the quadtree leaf nodes may then be divided according to a multi-type tree, which includes four division types: vertical binary tree partitioning (SPLIT_BT_VER), horizontal binary tree partitioning (SPLIT_BT_HOR), vertical ternary tree partitioning (SPLIT_TT_VER), and horizontal ternary tree partitioning (SPLIT_TT_HOR), where the ternary tree partitions in the ratio 1:2:1. The leaf nodes of the multi-type tree are also called CUs.
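The geometry of the four multi-type tree split types can be illustrated with the following sketch; it is a simplified illustration only, since the real VVC rules additionally enforce minimum CU sizes and other constraints.

```python
# Sub-CU rectangles (x, y, w, h) produced by each multi-type tree split type.
def mtt_split(x, y, w, h, mode):
    if mode == "SPLIT_BT_HOR":   # horizontal binary: two (w, h/2) halves
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "SPLIT_BT_VER":   # vertical binary: two (w/2, h) halves
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode == "SPLIT_TT_HOR":   # horizontal ternary: 1:2:1 split of the height
        return [(x, y, w, h // 4),
                (x, y + h // 4, w, h // 2),
                (x, y + 3 * h // 4, w, h // 4)]
    if mode == "SPLIT_TT_VER":   # vertical ternary: 1:2:1 split of the width
        return [(x, y, w // 4, h),
                (x + w // 4, y, w // 2, h),
                (x + 3 * w // 4, y, w // 4, h)]
    raise ValueError(mode)

print(mtt_split(0, 0, 32, 32, "SPLIT_TT_VER"))
# [(0, 0, 8, 32), (8, 0, 16, 32), (24, 0, 8, 32)]
```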
For each CU, both intra prediction and inter prediction are evaluated. Different prediction modes within the same prediction type are compared to find the best one, and intra modes are compared against inter modes to find the optimal prediction mode for the current CU; meanwhile, the CU is transformed by transform units (TUs) to find the optimal transform type. Finally, a frame of image is divided into CUs.
Intra prediction modes include the DC mode, the Planar mode, and 65 angular prediction modes, as well as intra sub-partitioning (ISP), cross-component linear model prediction (Cross-Component Linear Model Prediction, CCLM), the most probable mode (Most Probable Mode, MPM) for the luma component, the derived mode (DM) for the chroma component, multiple reference line intra prediction (Multiple Reference Line Intra Prediction, MRLP), and the like.
Inter prediction builds on H.265 and introduces combined inter and intra prediction (Combined Inter and Intra Prediction, CIIP) and the geometric partitioning mode (Geometric Partition Mode, GPM); on the basis of the bidirectional AMVP mode, symmetric motion vector difference coding (Symmetric MVD Coding, SMVD), decoder-side motion vector refinement (Decoder-Side Motion Vector Refinement, DMVR), bi-directional optical flow (Bi-Directional Optical Flow, BDOF), affine transformation, and the like are added.
2. Predictive coding (Predictive Coding): the method comprises modes of intra-frame prediction, inter-frame prediction and the like, and the original video signal is predicted by the selected reconstructed video signal to obtain a residual video signal. The encoding side needs to decide one of the most suitable prediction coding modes among many possible prediction coding modes for the current CU and inform the decoding side. Intra prediction refers to the fact that the predicted signal comes from a region that has been encoded and reconstructed within the same image. Inter prediction refers to a predicted signal from an already encoded other picture (referred to as a reference picture) than the current picture.
3. Transform & Quantization: the residual video signal is subjected to a transform operation such as the DFT (Discrete Fourier Transform) or DCT (Discrete Cosine Transform), converting the signal into the transform domain, where it is represented as transform coefficients. The signal in the transform domain then undergoes a lossy quantization operation that discards certain information, so that the quantized signal is favorable for compact expression. In some video coding standards there may be more than one transform to choose from, so the encoder also needs to select one of the transforms for the current CU and inform the decoder. The coarseness of quantization is usually determined by the quantization parameter (QP): a larger QP value means that coefficients over a larger value range are quantized to the same output, which usually brings greater distortion and a lower code rate; conversely, a smaller QP value means that coefficients over a smaller range are quantized to the same output, which usually gives less distortion but corresponds to a higher code rate.
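The relation between QP and quantization coarseness can be sketched with the common approximation that the quantization step roughly doubles every 6 QP; the sketch below is a simplified scalar quantizer that ignores the rounding offsets and scaling lists real codecs use.

```python
# Simplified scalar quantization: the step size roughly doubles every 6 QP.
def q_step(qp: int) -> float:
    return 2.0 ** ((qp - 4) / 6.0)

def quantize(coeff: float, qp: int) -> int:
    return round(coeff / q_step(qp))       # lossy: information is discarded

def dequantize(level: int, qp: int) -> float:
    return level * q_step(qp)

for qp in (22, 37):
    level = quantize(100.0, qp)
    # higher QP -> coarser step -> more distortion, fewer bits
    print(qp, level, dequantize(level, qp))
```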
4. Entropy Coding or statistical coding: the quantized transform-domain signal is statistically compressed according to the frequency of occurrence of each value, and finally a binary (0 or 1) compressed code stream is output. Meanwhile, encoding produces other information, such as the selected mode and motion vectors, which must also be entropy encoded to reduce the code rate. Statistical coding is a lossless coding mode that can effectively reduce the code rate required to express the same signal. Common statistical coding methods are Variable Length Coding (VLC) or Context-based Adaptive Binary Arithmetic Coding (CABAC).
5. Loop Filtering: the encoded image undergoes inverse quantization, inverse transform, and prediction compensation (the inverse of operations 2-4 above) to obtain a reconstructed decoded image. Because of the quantization, part of the information of the reconstructed image differs from the original image, resulting in distortion. Applying filtering operations to the reconstructed image, such as deblocking, SAO (Sample Adaptive Offset), or ALF (Adaptive Loop Filter), can effectively reduce the degree of distortion produced by quantization. Since these filtered reconstructed images serve as references for subsequently encoded images to predict future signals, the above filtering operations are also called loop filtering, i.e., filtering operations within the encoding loop.
That is, during encoding, a frame of image is fed to the encoder and divided into CTUs, which are depth-divided to obtain CUs; each CU includes multiple prediction modes and TUs. Each CU is predicted to obtain a predicted value; the predicted value is subtracted from the input data to obtain a residual; the residual is transformed and quantized to obtain residual coefficients, which are sent to the entropy coding module to output the code stream. Meanwhile, the residual coefficients are inverse-quantized and inverse-transformed to obtain the residual of the reconstructed image, which is added to the predicted value to obtain the reconstructed image; after filtering, the reconstructed image enters the reference frame queue to serve as a reference image for the next frame, and encoding thus proceeds frame by frame.
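This per-CU signal flow (predict, transform, quantize, then reconstruct for reference) can be condensed into the following sketch; the 2-D DCT transform, the placeholder predictor, and the simplified quantizer are assumptions for illustration.

```python
# Sketch of the per-CU coding loop: residual -> transform -> quantize,
# then dequantize -> inverse transform -> reconstruction used as reference.
import numpy as np
from scipy.fft import dctn, idctn

def code_cu(block: np.ndarray, pred: np.ndarray, qp: int):
    qstep = 2.0 ** ((qp - 4) / 6.0)           # simplified quantization step
    residual = block - pred
    coeffs = dctn(residual, norm="ortho")     # transform to frequency domain
    levels = np.round(coeffs / qstep)         # lossy quantization (to entropy coder)
    recon_res = idctn(levels * qstep, norm="ortho")
    recon = pred + recon_res                  # reconstructed CU (future reference)
    return levels, recon

rng = np.random.default_rng(0)
block = rng.integers(0, 256, (8, 8)).astype(float)
pred = np.full((8, 8), block.mean())          # placeholder prediction signal
levels, recon = code_cu(block, pred, qp=32)
print(np.abs(block - recon).mean())           # distortion introduced by quantization
```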
According to the above encoding process, at the decoding end, for each CU the decoder obtains the compressed bitstream and performs entropy decoding to obtain the various mode information and the quantized transform coefficients. The coefficients are inverse-quantized and inverse-transformed to obtain the residual signal. On the other hand, according to the known coding mode information, the prediction signal corresponding to the CU can be obtained; adding the residual signal and the prediction signal yields the reconstructed signal. Finally, the reconstructed values of the decoded image undergo a loop filtering operation to produce the final output signal.
Some mainstream video coding standards, such as HEVC, VVC, AVS3, use a block-based hybrid coding framework. The method divides the original video data into a series of coding blocks, and combines video coding methods such as prediction, transformation, entropy coding and the like to realize the compression of the video data. Among them, motion compensation is a type of prediction method commonly used for video coding, and motion compensation derives a prediction value of a current coding block from a coded region based on redundancy characteristics of video content in a time domain or a space domain. Such prediction methods include: inter prediction, intra block copy prediction, intra string copy prediction, etc., these prediction methods may be used alone or in combination in a particular coding implementation. For coded blocks using these prediction methods, it is often necessary to explicitly or implicitly encode one or more two-dimensional displacement vectors in the code stream, indicating the displacement of the current block (or co-located blocks of the current block) relative to its one or more reference blocks.
It should be noted that, in different prediction modes and different implementations, the displacement vectors may have different names, and this application is described in the following manner: 1) The displacement Vector in the inter prediction mode is called a Motion Vector (MV); 2) The displacement Vector in the IBC (Intra Block Copy) prediction mode is called a Block Vector (BV); 3) The displacement Vector in the ISC (Intra String Copy, intra-frame String copy) prediction mode is called a String Vector (SV). Intra-string replication is also known as "string prediction" or "string matching", etc.
MV refers to a displacement vector for an inter prediction mode, which points from a current picture to a reference picture, with a value of a coordinate offset between the current block and the reference block, where the current block and the reference block are in two different pictures. In the inter prediction mode, motion vector prediction can be introduced, and a predicted motion vector corresponding to the current block is obtained by predicting the motion vector of the current block, and the difference value between the predicted motion vector corresponding to the current block and the actual motion vector is coded and transmitted, so that compared with the direct coding and transmitting of the actual motion vector corresponding to the current block, the bit cost is saved. In the embodiment of the present application, the predicted motion vector refers to a predicted value of a motion vector of a current block obtained by a motion vector prediction technique.
BV refers to a displacement vector for IBC prediction mode, which has a value of a coordinate offset between a current block and a reference block, wherein the current block and the reference block are both in a current picture. In the IBC prediction mode, block vector prediction may be introduced, and a block vector of the current block is predicted to obtain a predicted block vector corresponding to the current block, and a difference value between the predicted block vector corresponding to the current block and an actual block vector is encoded and transmitted, which is advantageous for saving bit overhead compared with directly encoding and transmitting the actual block vector corresponding to the current block. In the embodiment of the present application, the predicted block vector refers to a predicted value of a block vector of a current block obtained by a block vector prediction technique.
SV refers to a displacement vector for the ISC prediction mode, which has a value of a coordinate offset between a current string and a reference string, both of which are in a current image. In the ISC prediction mode, string vector prediction can be introduced, a predicted string vector corresponding to the current string is obtained by predicting the string vector of the current string, and the difference value between the predicted string vector corresponding to the current string and the actual string vector is coded and transmitted, so that compared with the direct coding and transmitting of the actual string vector corresponding to the current string, the bit cost is saved. In the embodiment of the present application, the predicted string vector refers to a predicted value of a string vector of the current string obtained by a string vector prediction technique.
Several different prediction modes are described below.
1. Inter prediction mode
As shown in fig. 3, inter prediction predicts pixels of a current image using pixels adjacent to an encoded image using correlation of video time domains, so as to achieve the purpose of effectively removing video time domain redundancy, and effectively saving bits of encoded residual data. Wherein P is the current frame, pr is the reference frame, B is the current block to be encoded, and Br is the reference block of B. The coordinate position of B 'and B in the image is the same, br coordinates are (xr, yr), and B' coordinates are (x, y). The displacement between the current block to be coded and its reference block, called Motion Vector (MV), is a vector that marks the positional relationship between the current block and the reference block when inter prediction is performed, namely:
Considering that the temporal or spatial neighboring blocks have a strong correlation, MV prediction techniques can be used to further reduce the bits required to encode MVs. In h.265/HEVC, inter prediction includes two MV prediction techniques, merge and AMVP (advanced motion vector prediction).
The Merge mode creates an MV candidate list for the current PU (Prediction Unit), containing 5 candidate MVs (and their corresponding reference pictures). The 5 candidates are traversed, and the one with the minimum rate distortion cost is selected as the optimal MV. Since the encoder and decoder build the candidate list in the same way, the encoder only needs to transmit the index of the optimal MV in the candidate list. Note that HEVC also has a skip mode, which is a special case of the Merge mode: after the Merge mode finds the optimal MV, if the current block is essentially identical to the reference block, no residual data need be transmitted, only the index of the MV and a skip flag.
The rate distortion cost (Rate Distortion Cost, rdcost) is used to choose among multiple options and is calculated by the following formula:

rdcost = dist + λ × bit

where dist denotes the distortion, recording the difference between the original input pixels and the predicted pixels; bit denotes the number of bits needed to encode the current mode; and λ is a Lagrangian constant. dist may be obtained via the sum of absolute transformed differences (Sum of Absolute Transformed Difference, SATD), the sum of squared errors (Sum of Square Error, SSE), and so on. SATD computes the sum of absolute values after a Hadamard transform, i.e., the residual signal is Hadamard-transformed and the absolute values of all its elements are summed; it is one way of measuring distortion. SSE is the sum of squared errors between the original and reconstructed pixels; it requires the residual signal to be transformed, quantized, inverse-quantized, and inverse-transformed, so the estimate matches the true coding result.
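Illustratively, the rdcost computation with SATD as the distortion term can be sketched as follows; the Hadamard normalization factor and the λ value are simplified assumptions.

```python
# rdcost = dist + lambda * bit, with SATD (Hadamard-domain absolute sum)
# as the distortion term.
import numpy as np
from scipy.linalg import hadamard

def satd(orig: np.ndarray, pred: np.ndarray) -> float:
    residual = orig - pred                  # residual signal
    n = residual.shape[0]                   # n must be a power of two
    h = hadamard(n)
    transformed = h @ residual @ h.T        # Hadamard transform
    return np.abs(transformed).sum() / n    # sum of absolute values (scaled)

def rdcost(dist: float, bits: float, lam: float) -> float:
    return dist + lam * bits

orig = np.random.default_rng(1).integers(0, 256, (8, 8)).astype(float)
pred = np.roll(orig, 1, axis=1)             # a stand-in prediction
print(rdcost(satd(orig, pred), bits=120, lam=30.0))
```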
The MV candidate list established by the Merge mode covers both the spatial and temporal cases, and additionally a combined list for B Slice (B frame pictures). The spatial part provides at most 4 candidate MVs, established as shown in part (a) of fig. 4. The spatial list is built in the order A1-B1-B0-A0-B2, where B2 is an alternate: the motion information of B2 is used only when one or more of A1, B1, B0, and A0 is unavailable. The temporal part provides at most 1 candidate MV, established as shown in part (b) of fig. 4 by scaling the MV of the co-located PU according to the following formula:

curMV = (td / tb) × colMV

where curMV denotes the MV of the current PU, colMV denotes the MV of the co-located PU, td denotes the distance between the current picture and its reference picture, and tb denotes the distance between the co-located picture and its reference picture. If the PU at position D0 of the co-located block is unavailable, the co-located PU at position D1 is used instead. For a PU in a B Slice, since there are two MVs, its MV candidate list must also provide two MVPs (Motion Vector Predictor, predicted motion vectors), which represent the initial MV positions derived from neighboring blocks. HEVC generates the combined list for B Slice by pairwise combining the first 4 candidate MVs in the MV candidate list.
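A small sketch of this temporal scaling, using the definitions of td and tb given above:

```python
# Temporal MV scaling sketch: curMV = (td / tb) * colMV, where td is the
# current picture's reference distance and tb the co-located picture's.
def scale_temporal_mv(col_mv, td: int, tb: int):
    return (round(col_mv[0] * td / tb), round(col_mv[1] * td / tb))

# Co-located PU moved (8, -4) over distance tb=4; current distance td=2.
print(scale_temporal_mv((8, -4), td=2, tb=4))  # (4, -2)
```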
Similarly, the AMVP mode uses the MV correlation of spatial and temporal neighboring blocks to build an MV candidate list for the current PU. Unlike the Merge mode, the optimal predicted MV is selected from the AMVP candidate list and differentially encoded against the optimal MV found by motion search for the current block, i.e., MVD = MV - MVP is encoded, where MVD is the motion vector difference (Motion Vector Difference); the decoder builds the same list and needs only the MVD and the index of the MVP in the list to compute the MV of the current block. The MV candidate list in AMVP mode also covers both the spatial and temporal cases, except that its length is only 2.
As described above, in the AMVP mode of HEVC, the MVD must be encoded. In HEVC, the resolution of the MVD is controlled by use_integer_mv_flag in the slice header: when the value of the flag is 0, the MVD is encoded at 1/4 (luma) pixel resolution; when the value is 1, the MVD is encoded at full (luma) pixel resolution. VVC uses a method called Adaptive Motion Vector Resolution (AMVR), which allows each CU to adaptively select the resolution at which its MV is coded. In the normal AMVP mode, the selectable resolutions are 1/4, 1/2, 1, and 4 pixels. For a CU with at least one non-zero MVD component, a first flag is encoded to indicate whether quarter-luma-sample MVD precision is used for the CU. If this flag is 0, the MVD of the current CU is encoded at 1/4 pixel resolution. Otherwise, a second flag is encoded to indicate whether the CU uses 1/2 pixel resolution or some other MVD resolution; if not 1/2, a third flag is encoded to indicate whether 1 pixel or 4 pixel resolution is used for the CU.
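The flag cascade just described can be sketched as follows; the flag polarities follow the description above, while the actual bitstream syntax is more involved.

```python
# AMVR flag cascade sketch for the normal AMVP mode.
def amvr_resolution(flag1: int, flag2: int = 0, flag3: int = 0) -> float:
    if flag1 == 0:
        return 0.25              # quarter-luma-sample MVD precision
    if flag2 == 0:
        return 0.5               # half-pixel resolution
    return 1.0 if flag3 == 0 else 4.0

assert amvr_resolution(0) == 0.25
assert amvr_resolution(1, 0) == 0.5
assert amvr_resolution(1, 1, 1) == 4.0
```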
2. IBC prediction mode
IBC is an intra-frame coding tool adopted in the HEVC Screen Content Coding (SCC) extension, and it significantly improves the coding efficiency of screen content. IBC technology is also adopted in AVS3 and VVC to improve the performance of screen content encoding. IBC uses the spatial correlation of screen content video to predict the pixels of the current block from pixels of the already encoded part of the current image, which effectively saves the bits needed to encode those pixels. As shown in fig. 5, the displacement between the current block and its reference block in IBC is called the BV (block vector). H.266/VVC employs BV prediction techniques similar to inter prediction to further save the bits required to encode the BV: VVC predicts the BV using an AMVP mode similar to that of inter prediction and allows the BVD to be encoded at 1 or 4 pixel resolution.
3. ISC prediction mode
The ISC technique divides a coding block into a series of pixel strings or unmatched pixels according to some scanning order (e.g., raster scan, round-trip scan, Zig-Zag scan). Similar to IBC, each string searches the already encoded region of the current image for a reference string of the same shape and derives a predicted value of the current string; by encoding the residual between the pixel values of the current string and the predicted values instead of encoding the pixel values directly, bits can be effectively saved. Fig. 6 gives a schematic representation of intra-string copy: it shows the encoded region, string 1 (28 pixels), string 2 (35 pixels), and an unmatched pixel (1 pixel). The displacement between string 1 and its reference string is string vector 1 in fig. 6; the displacement between string 2 and its reference string is string vector 2 in fig. 6.
The intra-frame string copy technique requires coding of SV, string length, and a flag of whether there is a matching string or not for each string in the current coding block. Where SV represents the displacement of the string to be encoded to its reference string. The string length represents the number of pixels contained in the string. In different implementations, there are several ways of encoding the string length, several examples (some examples may be used in combination) are given below:
1) Directly encoding the length of the string in the code stream;
2) Encoding in the code stream the number of pixels still to be processed after the current string; the decoder derives the length of the current string from the size N of the current block, the number of already processed pixels N1, and the decoded number N2, giving the length L = N - N1 - N2;
3) Encoding a flag in the code stream to indicate whether the string is the last string; if it is, the length of the current string is computed from the size N of the current block and the number of processed pixels N1 as L = N - N1. If a pixel finds no corresponding reference in the referenceable region, the pixel value of that unmatched pixel is encoded directly.
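A small sketch of the length derivations in ways 2) and 3), using the string sizes of fig. 6 (which total 64 pixels) as an example:

```python
# String length derivation for signalling ways 2) and 3) above.
def string_length_way2(n: int, n1: int, n2: int) -> int:
    # N pixels in the block, N1 already processed, N2 still to be processed.
    return n - n1 - n2

def string_length_way3(n: int, n1: int, is_last: bool) -> int:
    # If the "last string" flag is set, the string covers the remainder.
    assert is_last
    return n - n1

# Fig. 6: a 64-pixel block; string 1 has 28 pixels and 1 pixel is unmatched,
# so string 2 has 64 - 28 - 1 = 35 pixels.
print(string_length_way2(64, 28, 1))        # 35
print(string_length_way3(64, 63, True))     # 1 (the final unmatched pixel)
```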
As shown in fig. 7, a simplified block diagram of a communication system provided in one embodiment of the present application is shown. Communication system 200 includes a plurality of devices that may communicate with each other through, for example, network 250. For example, the communication system 200 includes a first device 210 and a second device 220 interconnected by a network 250. In the embodiment of fig. 7, the first device 210 and the second device 220 perform unidirectional data transmission. For example, the first apparatus 210 may encode video data, such as a stream of video pictures acquired by the first apparatus 210, for transmission to the second apparatus 220 over the network 250. The encoded video data is transmitted in one or more encoded video code streams. The second device 220 may receive the encoded video data from the network 250, decode the encoded video data to recover the video data, and display the video pictures according to the recovered video data. Unidirectional data transmission is common in applications such as media services.
In another embodiment, the communication system 200 includes a third device 230 and a fourth device 240 that perform bi-directional transmission of encoded video data, which may occur, for example, during a video conference. For bi-directional data transmission, each of the third device 230 and the fourth device 240 may encode video data (e.g., a stream of video pictures collected by the device) for transmission over the network 250 to the other of the third device 230 and the fourth device 240. Each of the third device 230 and the fourth device 240 may also receive encoded video data transmitted by the other of the third device 230 and the fourth device 240, and may decode the encoded video data to recover the video data, and may display video pictures on an accessible display device according to the recovered video data.
In the embodiment of fig. 7, the first device 210, the second device 220, the third device 230, and the fourth device 240 may be computer devices such as a server, a personal computer, and a smart phone, but the principles disclosed herein may not be limited thereto. The embodiments of the present application are applicable to PCs (Personal Computer, personal computers), cell phones, tablet computers, media players and/or dedicated video conferencing equipment. Network 250 represents any number of networks that transfer encoded video data between first device 210, second device 220, third device 230, and fourth device 240, including, for example, wired and/or wireless communication networks. Communication network 250 may exchange data in circuit-switched and/or packet-switched channels. The network may include a telecommunications network, a local area network, a wide area network, and/or the internet. For the purposes of this application, the architecture and topology of network 250 may be irrelevant to the operation disclosed herein, unless explained below.
As an example, fig. 8 shows the placement of video encoders and video decoders in a streaming environment. The subject matter disclosed herein is equally applicable to other video-enabled applications including, for example, video conferencing, digital TV (television), storing compressed video on digital media including CD (Compact Disc), DVD (Digital Versatile Disc ), memory sticks, and the like.
The streaming system may include an acquisition subsystem 313, which may include a video source 301, such as a digital camera, that creates an uncompressed video picture stream 302. In an embodiment, the video picture stream 302 includes samples taken by the digital camera. The video picture stream 302 is depicted as a bold line, compared with the encoded video data 304 (or encoded video code stream), to emphasize its high data volume. The video picture stream 302 may be processed by an electronic device 320, which comprises a video encoder 303 coupled to the video source 301. The video encoder 303 may include hardware, software, or a combination thereof to implement aspects of the disclosed subject matter as described in more detail below. The encoded video data 304 (or encoded video stream 304) is depicted as a thin line, compared with the video picture stream 302, to emphasize its lower data volume, and may be stored on the streaming server 305 for future use. One or more streaming client subsystems, such as client subsystem 306 and client subsystem 308 in fig. 8, may access the streaming server 305 to retrieve copies 307 and 309 of the encoded video data 304. Client subsystem 306 may include, for example, a video decoder 310 in an electronic device 330. The video decoder 310 decodes the incoming copy 307 of the encoded video data and generates an output video picture stream 311 that can be presented on a display 312 (e.g., a display screen) or another presentation device (not depicted). In some streaming systems, the encoded video data 304, 307, and 309 (e.g., video streams) may be encoded according to certain video encoding/compression standards.
It should be noted that electronic device 320 and electronic device 330 may include other components (not shown). For example, electronic device 320 may include a video decoder (not shown), and electronic device 330 may also include a video encoder (not shown). Wherein the video decoder is configured to decode the received encoded video data; video encoders are used to encode video data.
h.266/VVC is a video coding standard intended to provide higher compression efficiency and better video quality. In H.266/VVC, block partitioning is a key technique for partitioning video frames into smaller blocks for encoding. Block partitioning refers to the division of a video frame into blocks, each of which can be encoded and decoded independently. This division may be adaptive according to the characteristics of the image content to improve the encoding efficiency. h.266/VVC introduces more block size and shape options to accommodate different types of image content.
The block division in H.266/VVC can be flexibly adjusted according to the spatial and temporal characteristics of the image. For example, for static or relatively smooth areas, larger blocks may be used for higher compression efficiency, while smaller blocks may provide better image quality for dynamic or detail-rich regions.
Compared with the earlier H.265/HEVC and H.264/AVC technologies, the advantages of H.266/VVC are more obvious: compression performance is higher, saving more than 40% of bandwidth at the same quality. However, the encoding protocol is far more complex and requires greater computational resources for encoding and decoding, which can pose challenges for low-power devices or resource-constrained environments.
After traversing all possible partition modes, CU partitioning must select the optimal mode as the one with the minimum rdcost and discard the rest, which greatly increases the encoding computation. However, directly disabling some candidate partition modes without trying them degrades compression performance, so it is important to design a method that eliminates unnecessary partitions in advance.
Referring to fig. 9, a flowchart of a binary tree partitioning method for an encoding unit according to an exemplary embodiment of the present application is shown. The method is performed by a computer device, as shown in fig. 9, and may include steps 910, 920, and 930.
Step 910: acquiring a feature vector of a coding unit CU; the feature vector is used to represent the image features of the CU and the neighborhood image features of the CU.
In an embodiment of the application, a computer device obtains a feature vector, composed of a plurality of feature values, capable of representing the image features of the CU as well as its neighborhood image features.
The CU partitioning method has correlation with image features (such as texture features, gradient information, sub-block variance, etc.) of the CU. Specifically, for example, in a region having a relatively uniform texture, a large-sized CU is generally used for the division, and in a region having a relatively rich texture, a smaller-sized CU is generally used for the division. For another example, the horizontal and vertical partitions have a correlation with the texture direction, with horizontal partitions being generally used in areas where the texture is more horizontal and vertical partitions being generally used in directions where the texture is more vertical.
The CU partitioning method also has correlation with the neighboring image features of the CU (i.e., texture features of neighboring blocks, partitioning method, etc.). Specifically, for example, when the neighboring block is divided into large sizes, the current CU also generally adopts a large size as an optimal size; when the neighboring blocks are divided horizontally or vertically, the current CU is also generally divided in the same manner.
Therefore, the present application can comprehensively utilize the above correlation to skip some unnecessary CU partitioning modes in advance.
Step 920: based on the feature vector, acquiring a first predicted value and a second predicted value; the first predicted value is used for indicating the probability that the binary tree partition type of the CU is horizontal partition; the second predictor is used to indicate a probability that the binary tree partition type of the CU is a vertical partition.
In the embodiment of the present application, the computer device obtains, according to the feature vector obtained in step 910, a first prediction value for indicating a probability that the CU performs binary tree horizontal division, and a second prediction value for indicating a probability that the CU performs binary tree vertical division.
Step 930: at least one partition flow of binary tree horizontal partition and binary tree vertical partition of the CU is skipped based on the first predicted value and the second predicted value.
In this embodiment of the present application, the computer device may make a determination according to the first predicted value and the second predicted value obtained in step 920: if the probability of selecting binary tree horizontal division is high and the probability of selecting binary tree vertical division is low, binary tree vertical division may be skipped in combination with a threshold; conversely, if the probability of selecting binary tree vertical division is high and the probability of selecting binary tree horizontal division is low, binary tree horizontal division may be skipped in combination with a threshold.
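A sketch of this decision logic is given below; the threshold values T_HIGH and T_LOW are assumptions for illustration, since the application leaves the concrete thresholds to the implementation.

```python
# Skip decision sketch: compare the two predicted probabilities against
# thresholds. T_HIGH / T_LOW values are illustrative assumptions.
T_HIGH, T_LOW = 0.8, 0.2

def bt_partitions_to_skip(p_hor: float, p_ver: float):
    skip = []
    if p_hor >= T_HIGH and p_ver <= T_LOW:
        skip.append("SPLIT_BT_VER")   # horizontal very likely -> skip vertical
    if p_ver >= T_HIGH and p_hor <= T_LOW:
        skip.append("SPLIT_BT_HOR")   # vertical very likely -> skip horizontal
    if p_hor <= T_LOW and p_ver <= T_LOW:
        skip = ["SPLIT_BT_HOR", "SPLIT_BT_VER"]  # neither likely -> skip both
    return skip

print(bt_partitions_to_skip(0.9, 0.1))   # ['SPLIT_BT_VER']
print(bt_partitions_to_skip(0.1, 0.05))  # skip both BT flows
```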
In some embodiments, the binary tree partitioning flow may be followed by other partitioning flows, such as a ternary tree partitioning flow, a quadtree partitioning flow, and the like.
Illustratively, if step 930 skips binary tree horizontal partitioning and binary tree vertical partitioning of the CU, and there are other partitioning flows after the binary tree partitioning flow, then the computer device continues to perform other partitioning of the CU.
If step 930 skips the binary tree horizontal partition and the binary tree vertical partition of the CU, and no other partition flow exists after the binary tree partition flow, the computer device does not need to perform processing in other partition modes on the CU.
For example, if step 930 skips only one of the binary tree horizontal partitioning and the binary tree vertical partitioning of the CU, and there are other partitioning flows after the binary tree partitioning flow, the computer device first processes the CU with the non-skipped binary tree partition, and then performs the other partitioning flows.
If step 930 skips only one of the binary tree horizontal partitioning and the binary tree vertical partitioning of the CU, and no other partitioning flow exists after the binary tree partitioning flow, the computer device processes the CU with the non-skipped binary tree partition only.
Specifically, for example, if a ternary tree partitioning flow follows the binary tree partitioning flow, and step 930 skips only the binary tree horizontal partitioning of the CU, the computer device sequentially performs the binary tree vertical partitioning flow and the ternary tree partitioning flow on the CU.
In summary, according to the scheme shown in this embodiment of the present application, the first predicted value and the second predicted value are obtained from the acquired feature vector of the CU, so as to skip the binary tree horizontal division and/or the binary tree vertical division of the CU; in this way, unnecessary BT division modes can be skipped in advance, CU division is accelerated without affecting video quality, the computational load of video encoding and decoding is reduced, and video coding efficiency is thereby improved.
In some embodiments, the feature vector includes:
information of neighboring blocks of the CU, and information of the CU;
the information of the neighboring blocks includes at least one of: CU depth of the neighboring block, QT depth of the neighboring block, BT depth of the neighboring block, MT depth of the neighboring block, and optimal partition type of the neighboring block; the neighboring block is an encoded block that is neighboring the CU and has been encoded;
the information of the CU includes at least one of: the depth of a CU, the frame level of a CU, the quantization coefficients of the coding unit CU, the depth of the co-located blocks of a CU, gradient information of a CU, variance information of a CU, and shape information of a CU.
Wherein the feature vector is composed of a plurality of feature values. The information of a CU may be divided into two parts, including known information of the CU and pre-analysis information of the CU.
Specifically, for example, the above feature values include the following 3 aspects.
(1) Optimal information of neighboring blocks: including CU depth, BT depth, QT depth, TT depth, MT depth. Since the adjacent blocks are already encoded, the optimal information can be stored in advance, and no calculation or judgment is needed.
Please refer to fig. 10, which illustrates a schematic diagram of the current CU and neighboring block location relationship related to the present application.
As shown in fig. 10, the neighboring blocks include A0, A1, B0, B1, and B2, where A0 represents the lower left corner position of the current CU, A1 represents the left edge position of the current CU, B0 represents the upper right corner position of the current CU, B1 represents the upper edge position of the current CU, and B2 represents the upper left corner position of the current CU.
The developer can configure the computer device so that it adaptively adjusts the selected positions of the neighboring blocks according to the specific situation, to cope with cases where the positional relationship shown in fig. 10 is not satisfied.
If the current CU is in a special position, such as at the bottom right corner of a frame of image, the neighboring blocks of B0 and A0 may be skipped, or other neighboring blocks may be used instead, such as a neighboring block between A1 and B2 (not shown in the figure), a neighboring block between B1 and B2 (not shown in the figure), or the like.
(2) Known information of CU: including the depth at which the current CU is located, BT depth, QT depth, TT depth, MT depth, quantization coefficient QP of the current CU, frame type weight of the current CU, etc., and no further calculation is required.
(3) Pre-analysis information of the CU: for example, gradient information of the current CU, where the current CU is divided into 4 sub-blocks and the gradient information of each sub-block is counted; and variance information, where after the NONE division is completed the current CU is divided into 4 sub-blocks and the distortion of each sub-block is counted; secondary processing is then performed on the calculation results.
The embodiment of the application provides specific content related to the feature vector, including information of adjacent blocks of the CU and information of the CU, wherein the information of the adjacent blocks of the CU is used for representing neighborhood image features of the CU, and the information of the CU is used for representing image features of the CU. The information of the adjacent blocks of the CU and the information of the CU have correlation with the CU dividing mode, and unnecessary CU dividing modes can be skipped in advance by comprehensively utilizing the correlation, so that the acceleration processing of CU dividing is realized.
Referring to fig. 11, a flowchart of a binary tree partitioning method for an encoding unit according to another exemplary embodiment of the present application is shown. The method is performed by a computer device, as shown in fig. 11, and step 930 in the embodiment shown in fig. 9 described above may be implemented as step 930a and step 930b.
Step 930a: in response to the first predictor being greater than a first threshold and the second predictor being less than a second threshold, skipping binary tree vertical partitioning of the CU; the first threshold is greater than the second threshold.
The first threshold and the second threshold may be preset in the computer device by the developer according to the test result, where the first threshold is greater than the second threshold.
In this embodiment of the present application, when the first predicted value obtained in step 920 is greater than a preset first threshold value, and the second predicted value obtained in step 920 is less than a preset second threshold value, the computer device skips the binary tree vertical partition procedure for the CU.
Step 930b: in response to the second predictor being greater than the first threshold and the first predictor being less than the second threshold, binary tree horizontal partitioning of the CU is skipped.
In this embodiment of the present application, when the second predicted value obtained in step 920 is greater than a preset first threshold, and the first predicted value obtained in step 920 is less than a preset second threshold, the computer device skips the binary tree horizontal partitioning procedure for the CU.
Wherein the first predictor is associated with a horizontal partition of the binary tree and the second predictor is associated with a vertical partition of the binary tree. If the first predicted value is larger and the second predicted value is smaller, the probability of performing binary tree horizontal division on the current CU is higher, and at the moment, the binary tree vertical division can be skipped; conversely, if the second predicted value is larger and the first predicted value is smaller, it is indicated that the probability of performing binary tree vertical partitioning on the current CU is higher, and at this time, binary tree horizontal partitioning may be skipped.
In the embodiment of the present application, after obtaining the first prediction value of the current CU and the second prediction value of the current CU based on the feature vector of the current CU:
if the first predicted value of the current CU is larger than the first threshold value and the second predicted value of the current CU is smaller than the second threshold value, the computer equipment carries out binary tree horizontal division on the current CU and skips binary tree vertical division of the current CU; if the second predicted value of the current CU is larger than the first threshold value and the first predicted value of the current CU is smaller than the second threshold value, the computer equipment vertically divides the current CU into binary trees and skips the binary tree horizontal division of the current CU;
in cases other than the above two, for example when the first predicted value of the current CU and the second predicted value of the current CU are both greater than the first threshold, the computer device cannot reliably skip the binary tree horizontal division flow or the binary tree vertical division flow according to the prediction result.
According to the technical scheme shown above, the binary tree horizontal division flow or the binary tree vertical division flow of the CU is skipped based on the first predicted value and the second predicted value: the first predicted value and the second predicted value are compared with the preset first threshold and second threshold to judge whether they fall within the ideal ranges, and thereby whether to skip the binary tree horizontal division flow or the binary tree vertical division flow of the CU, so that the accelerated processing of CU division is realized.
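The decision of steps 930a and 930b can be sketched as follows (a minimal C++ sketch; all names are illustrative, not taken from any encoder):

// Minimal sketch of the two-threshold decision of steps 930a/930b.
// pred_hor/pred_ver stand for the first/second predicted values.
struct SkipDecision {
    bool skip_bt_hor = false;
    bool skip_bt_ver = false;
};

SkipDecision decide_skip(double pred_hor, double pred_ver,
                         double first_threshold, double second_threshold) {
    SkipDecision d;
    if (pred_hor > first_threshold && pred_ver < second_threshold)
        d.skip_bt_ver = true;   // horizontal very likely: skip vertical
    else if (pred_ver > first_threshold && pred_hor < second_threshold)
        d.skip_bt_hor = true;   // vertical very likely: skip horizontal
    // Otherwise neither binary tree mode can be safely skipped (e.g. both
    // predicted values above the first threshold).
    return d;
}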
Based on the method shown in the above embodiments of the present application, please refer to fig. 12, which shows a flowchart of a binary tree partitioning method for an encoding unit according to yet another exemplary embodiment of the present application. The method is performed by a computer device; as shown in fig. 12, step 920 in the embodiment shown in fig. 9 or fig. 11 described above may be implemented as step 920a.
Step 920a: inputting the feature vector into a network prediction model to obtain a first predicted value and a second predicted value which are output by the network prediction model;
the network prediction model is a machine learning model obtained by training a feature vector sample of the CU sample and an optimal binary tree division mode of the CU sample.
In this embodiment of the present application, the computer device may input the feature vector obtained in step 910 into a network prediction model trained in advance through machine learning, where the network prediction model processes the feature vector, and outputs a first predicted value and a second predicted value.
The computer equipment can conduct supervised training on the machine learning model according to the characteristic vector samples of the CU samples and the optimal binary tree partitioning mode of the CU samples, so that a network prediction model capable of predicting the CU binary tree partitioning probability through the input CU characteristic vector is obtained.
The embodiment of the application provides a technical scheme for acquiring the first predicted value and the second predicted value based on the feature vector, specifically by means of a network prediction model obtained by training a machine learning model: the feature vector is used as the input of the network prediction model, and the output of the network prediction model is the first predicted value and the second predicted value. The network prediction model obtained through iterative training can quickly skip unnecessary binary tree division in a complex division flow, saving division processing.
Based on the method shown in the above embodiments of the present application, please refer to fig. 13, which shows a flowchart of a binary tree partitioning processing method for an encoding unit according to yet another exemplary embodiment of the present application. The method is performed by a computer device; as shown in fig. 13, step 912, step 914, and step 916 are further included before step 920 in the embodiment shown in fig. 9 or fig. 11.
Step 912: obtaining the block size of the CU; the block size is used to divide the CUs into a plurality of categories and train the network prediction model according to the categories, respectively.
Step 914: based on the block size, the category of the CU is acquired.
Step 916: based on the category, a network prediction model is obtained.
In the embodiment of the present application, the computer device may obtain the block size of the CU before step 920 described above.
In H.266/VVC, to meet the requirements of video coding at 4K, 8K, etc., the maximum CTU size is increased to 128x128, while the minimum CU size is 4x4. Accordingly, there are many possibilities for the block size of a CU.
Thus, the computer device may divide CUs into a plurality of categories according to their block sizes. During training, a corresponding network prediction model is trained for each category; during application, the network prediction model for a CU is obtained according to the category corresponding to the block size of the CU.
The embodiment of the application provides a technical scheme for classifying CUs according to the sizes of the CUs and training and applying a network prediction model according to the classification; the scheme can reduce the complexity of machine learning to a certain extent, and improves the prediction accuracy of the network prediction model so as to accurately skip unnecessary binary tree division.
In some embodiments, the network prediction model includes a first network prediction model and a second network prediction model;
the step 920a may be implemented as:
and respectively inputting the feature vector into a first network prediction model and a second network prediction model, and obtaining a first predicted value output by the first network prediction model and a second predicted value output by the second network prediction model.
Because the binary tree partition includes a binary tree horizontal partition and a binary tree vertical partition, the computer device may construct two network prediction models (i.e., a first network prediction model and a second network prediction model) for determining whether the current CU may skip the binary tree horizontal partition and the binary tree vertical partition, respectively.
In the embodiment of the application, the feature vector is input into a first network prediction model, and the first network prediction model outputs a first prediction value for indicating the binary tree horizontal division probability; and inputting the feature vector into a second network prediction model, the second network prediction model outputting a second prediction value for indicating a binary tree vertical partition probability.
The embodiment of the application provides a technical scheme for respectively obtaining a first predicted value and a second predicted value according to a first network predicted model and a second network predicted model; accordingly, the first network prediction model and the second network prediction model are obtained through training through the feature vector samples of the CU samples and the optimal binary tree division mode of the CU samples, and therefore the efficiency of obtaining the first prediction value and the second prediction value can be improved.
Based on the method shown in the above embodiments of the present application, please refer to fig. 14, which shows a flowchart of a method for training a network prediction model according to an exemplary embodiment of the present application. The method may be performed by a computer device, which may be implemented as a model training device; the model training device may be the same device as the computer device performing steps 910 to 930, or a different device. For example, the computer device performing steps 910 to 930 may be a user terminal, and the model training device may be a developer terminal or a server. As shown in fig. 14, the training process of the network prediction model includes steps 1410, 1420, and 1430.
Step 1410: inputting the feature vector sample into a network prediction model to obtain a binary tree division mode prediction result of the CU sample, which is output by the network prediction model; the binary tree partition mode prediction result comprises a first partition mode prediction result and a second partition mode prediction result; the first partition mode prediction result is used for indicating the probability that the binary tree partition type of the CU sample is a horizontal partition, and the second partition mode prediction result is used for indicating the probability that the binary tree partition type of the CU sample is a vertical partition.
In one possible implementation, before performing step 1410, feature vector samples of CU samples may also be obtained, along with an optimal binary tree partitioning of CU samples.
That is, the sample data obtained by the computer device for training may include feature vector samples of CU samples, and an optimal binary tree partitioning of CU samples.
In the embodiment of the application, during initialization, a developer can set parameters of the network prediction model to random initial values. The computer equipment inputs the feature vector samples for training into an initialized network prediction model; and the network prediction model processes the feature vector samples and outputs a binary tree division mode prediction result of the CU samples.
The prediction result of the binary tree division mode of the CU sample comprises the following steps: the first division mode prediction result is used for indicating the probability of the BT_HOR division of the CU sample; and a second partition mode prediction result for indicating the probability of the CU sample to carry out BT_VER partition.
Step 1420: and obtaining a loss function value of the network prediction model based on the optimal binary tree division mode of the CU sample and a binary tree division mode prediction result.
In this embodiment of the present application, the computer device compares the prediction result of the binary tree partitioning method of the CU sample obtained in the above step 1410 with the optimal binary tree partitioning method of the CU sample, and calculates the loss function value of the network prediction model.
Step 1430: and updating parameters of the network prediction model based on the loss function value of the network prediction model.
In the embodiment of the present application, the computer device performs parameter updating on the network prediction model according to the loss function value obtained in the step 1420, so as to improve the prediction accuracy of the network prediction model.
In the embodiment of the present application, the computer device repeats steps 1410, 1420, and 1430, and performs iterative updating and training until the preset condition of the network prediction model is reached.
The developer may set the preset condition as a certain training iteration number or a certain convergence condition.
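To make the loop of steps 1410 to 1430 concrete, the following is a minimal self-contained C++ sketch of one training step. For brevity a single sigmoid neuron stands in for the fully connected network, and binary cross-entropy with gradient descent is one assumed choice of loss function and parameter update; the patent text does not prescribe either, and all names here are illustrative.

#include <cmath>
#include <vector>

struct TinyModel {
    std::vector<double> w;  // weights
    double b = 0.0;         // bias
};

// Step 1410: compute the prediction from the feature vector.
static double predict(const TinyModel& m, const std::vector<double>& x) {
    double s = m.b;
    for (size_t j = 0; j < x.size(); ++j) s += m.w[j] * x[j];
    return 1.0 / (1.0 + std::exp(-s));
}

// Steps 1420-1430: binary cross-entropy loss against the optimal-partition
// label (1 if the BT mode was optimal, else 0), then a gradient descent
// parameter update. For BCE on a sigmoid output, dLoss/dScore = pred - target.
static void train_step(TinyModel& m, const std::vector<double>& x,
                       double target, double lr) {
    double pred = predict(m, x);
    double grad = pred - target;
    for (size_t j = 0; j < x.size(); ++j) m.w[j] -= lr * grad * x[j];
    m.b -= lr * grad;
}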
In some embodiments, a fully connected network is employed to construct a network prediction model.
Fully connected networks, also known as multi-layer perceptrons (Multilayer Perceptron, MLP), are a common model of artificial neural networks. The MLP is made up of multiple fully connected layers, each neuron being connected to all neurons of the previous layer.
Please refer to fig. 15, which illustrates a schematic diagram of a fully connected network structure related to the present application. As shown in fig. 15, a simple 3-layer fully connected network structure comprises an input layer, an hidden layer and an output layer.
Principle of fully connected network: each neuron receives input from all neurons of the previous layer based on the connection weights and activation functions between the neurons, and performs a weighted summation by the connection weights. The result of the weighted summation is then input into the activation function, producing the output of the neuron. This process is repeated in each layer until the output layer is reached.
The prediction function of a fully connected network can be expressed as:

y = f(weight × x + bias)

where y is the prediction result, x is the input data, weight is the weight matrix, bias is the bias vector, and f is the activation function.

Specifically, let the input data x be a vector of dimension n, weight be a weight matrix of dimension m×n, and bias be a bias vector of dimension m. The prediction function is calculated as follows:

multiplying the input data x by the weight matrix weight: weight × x, obtaining a vector of dimension m;

adding the bias vector bias to the above result: weight × x + bias, obtaining a vector of dimension m;

and carrying out nonlinear transformation on this result through the activation function f to obtain the final prediction result y.
Common activation functions include the Sigmoid function, the linear rectification (Rectified Linear Unit, ReLU) function, the hyperbolic tangent (Tanh) function, etc.; the specific choice of activation function depends on the requirements of the task and the design of the network.
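For reference, the three activation functions can be written as follows (a minimal C++ sketch; tanh_act is named to avoid colliding with std::tanh):

#include <algorithm>
#include <cmath>

double sigmoid(double x)  { return 1.0 / (1.0 + std::exp(-x)); }
double relu(double x)     { return std::max(0.0, x); }   // max(0, x)
double tanh_act(double x) { return std::tanh(x); }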
The embodiment of the application provides a training scheme for the network prediction model: the feature vector sample is input into the network prediction model to obtain the binary tree division mode prediction result of the CU sample; the loss function value is acquired according to the optimal binary tree division mode of the CU sample and the binary tree division mode prediction result, so as to update the parameters of the network prediction model. Two prediction results, respectively indicating the probabilities of binary tree horizontal division and binary tree vertical division, are obtained based on the network prediction model, so that the network prediction model is obtained through the feature vector samples of the CU samples and the optimal binary tree division modes of the CU samples, and unnecessary binary tree division can be skipped.
Based on the method shown in the above embodiments of the present application, please refer to fig. 16, which shows a flowchart of a method for training a network prediction model according to another exemplary embodiment of the present application. As shown in fig. 16, step 1410 in the embodiment shown in fig. 14 described above may be implemented as step 1410a, step 1420 may be implemented as step 1420a and step 1420b, and step 1430 may be implemented as step 1430a and step 1430b.
Step 1410a: and respectively inputting the feature vector samples into a first network prediction model and a second network prediction model to obtain a first division mode prediction result output by the first network prediction model and a second division mode prediction result output by the second network prediction model.
In this embodiment of the present application, during initialization, parameters of the first network prediction model and the second network prediction model are random initial values.
The computer equipment inputs the feature vector sample for training into the initialized first network prediction model; the first network prediction model processes the feature vector samples, and outputs a prediction result of a first partition mode of the CU samples so as to predict the probability that the partition type of the CU samples is BT_HOR partition.
Accordingly, the computer device inputs the feature vector samples for training into the initialized second network prediction model; and the second network prediction model processes the feature vector samples, and outputs a second partition mode prediction result of the CU samples so as to predict the probability that the partition type of the CU samples is BT_VER partition.
Step 1420a: and calculating a first loss function value in the loss function values of the network prediction model based on the difference between the optimal binary tree partitioning mode of the CU sample and the prediction result of the first partitioning mode.
In this embodiment of the present application, the computer device compares the prediction result of the first partition manner of the CU sample obtained in the above step 1410a with the optimal binary tree partition manner of the CU sample, and calculates the first loss function value.
Step 1420b: and calculating a second loss function value in the loss function values of the network prediction model based on the difference between the optimal binary tree partitioning mode and the second partitioning mode prediction result of the CU sample.
In this embodiment of the present application, the computer device compares the prediction result of the second partition mode of the CU sample obtained in the above step 1410a with the optimal binary tree partition mode of the CU sample, and calculates the second loss function value.
Step 1430a: and updating parameters of the first network prediction model based on the first loss function value.
In this embodiment of the present application, the computer device performs parameter updating on the first network prediction model according to the first loss function value obtained in the above step 1420a, so as to improve the accuracy of the bt_hor partition probability predicted by the first network prediction model.
Step 1430b: and updating parameters of the second network prediction model based on the second loss function value.
In this embodiment of the present application, the computer device performs parameter updating on the second network prediction model according to the second loss function value obtained in the above step 1420b, so as to improve the accuracy of the bt_ver partition probability predicted by the second network prediction model.
The embodiment of the application provides a training scheme for the first network prediction model and the second network prediction model: the same group of feature vector samples is input into the first network prediction model and the second network prediction model respectively, the respective prediction results are obtained, the corresponding loss function values are calculated respectively, and parameter updating is performed respectively, so that two network prediction models are obtained, which respectively predict the probability of binary tree horizontal division and of binary tree vertical division.
Referring to FIG. 17, an exemplary training and application flow diagram of a network prediction model is shown in accordance with the present application. As shown in fig. 17, an exemplary training procedure of the network prediction model related to the present application is as follows:
step A1: acquiring a feature vector sample 1701 of a CU sample and an optimal binary tree partitioning mode 1702 of the CU sample;
Step A2: inputting the feature vector samples 1701 into the network prediction model 1710;
step A3: after the network prediction model 1710 processes the feature vector samples 1701, a first division mode prediction result 1703 and a second division mode prediction result 1704 are output;
step A4: comparing the first partition mode prediction result 1703 and the second partition mode prediction result 1704 with the optimal binary tree partition mode 1702 to calculate a loss function 1705;
step A5: updating the network prediction model 1710 according to the loss function 1705;
step A6: and (3) repeating the steps A2 to A5 until the preset training conditions are reached, and storing the current trained network prediction model 1710 for subsequent application.
As shown in fig. 17, an exemplary reasoning flow of the network prediction model related to the present application is as follows:
step B1: acquiring a feature vector 1706 of the CU;
step B2: inputting the feature vector 1706 into the network prediction model 1710;
step B3: the network prediction model 1710 processes the feature vector 1706 and outputs a first predicted value 1707 and a second predicted value 1708;
step B4: in response to the first predictor 1707 being greater than a first threshold and the second predictor 1708 being less than a second threshold, skipping binary tree vertical partitioning of the CU;
Step B5: in response to the second predictor 1708 being greater than the first threshold and the first predictor 1707 being less than the second threshold, binary tree horizontal partitioning of the CU is skipped.
Referring to fig. 18, an exemplary training and application flow diagram of the first and second network prediction models involved in the present application is shown. As shown in fig. 18, an exemplary training procedure for the first network prediction model and the second network prediction model according to the present application is as follows:
step C1: acquiring a feature vector sample 1801 of a CU sample and an optimal binary tree partitioning mode 1802 of the CU sample;
step C2: inputting the feature vector samples 1801 into a first network prediction model 1810;
step C3: after the first network prediction model 1810 processes the feature vector sample 1801, a first division mode prediction result 1803 is output;
step C4: comparing the first partition mode prediction result 1803 with the optimal binary tree partition mode 1802, and calculating a first loss function 1805;
step C5: updating the first network prediction model 1810 according to the first loss function 1805;
step C6: and repeating the steps C2 to C5 until the preset training conditions are reached, and storing the first network prediction model 1810 which is trained currently for subsequent application.
Step C7: inputting the feature vector samples 1801 into a second network prediction model 1820;
step C8: after the second network prediction model 1820 processes the feature vector sample 1801, a second partition mode prediction result 1804 is output;
step C9: comparing the second partition mode prediction result 1804 with the optimal binary tree partition mode 1802, and calculating a second loss function 1806;
step C10: updating the second network prediction model 1820 according to the second loss function 1806;
step C11: and repeating the steps C7 to C10 until the preset training condition is reached, and storing the second network prediction model 1820 which is trained currently for the subsequent application.
As shown in fig. 18, an exemplary reasoning flow of the network prediction model related to the present application is as follows:
step D1: acquiring a feature vector 1807 of the CU;
step D2: the feature vector 1807 is input into the first network prediction model 1810 and the second network prediction model 1820, respectively;
step D3: after the feature vector 1807 is processed by the first network prediction model 1810 and the second network prediction model 1820, a first predicted value 1808 and a second predicted value 1809 are output, respectively;
step D4: in response to the first predictor 1808 being greater than a first threshold and the second predictor 1809 being less than a second threshold, skipping binary tree vertical partitioning of the CU;
Step D5: in response to the second predictor 1809 being greater than the first threshold and the first predictor 1808 being less than the second threshold, binary tree horizontal partitioning of the CU is skipped.
In summary, the scheme introduces machine learning based on texture features, gradient information, sub-block variances and other technologies, utilizes the mutual exclusion rule of divisions of the same CU, extracts one feature vector for the two network models to use, and, combined with the predicted values (i.e., confidence levels), can well skip binary tree horizontal division or binary tree vertical division to realize further acceleration of CU division.
In the video coding process, a frame of image is divided into CTUs, and each CTU is divided layer by layer into CUs according to the division rules. Here a layer corresponds to a depth: relative to the CTU layer, the depth increases by one for each further division.
The following division types exist in the division process: NONE division (no division), QT division (quadtree division), BT division (binary tree division) and TT division (ternary tree division), wherein BT and TT are collectively called MT (multi-type tree) division. After the various divisions, the optimal depths of BT division, QT division and TT division can be obtained. Specific partition shapes are shown in fig. 19, which shows the various CU partition types related to the present application.
Referring to fig. 20, a CU partitioning sequence example diagram provided in an exemplary embodiment of the present application is shown. In the encoding process, the judgment may be performed sequentially according to the sequence of fig. 20, as shown in fig. 20, the division sequence is: non-split (NONE split), quadtree (QT) split, horizontal binary tree (bt_hor) split, vertical binary tree (bt_ver) split, horizontal trigeminal tree (tt_hor) split, vertical trigeminal tree (tt_ver) split.
Fig. 20 is only a specific example; the order is not fixed and can be freely adjusted according to the prediction results. For example, the division order may be: NONE division→QT division→TT_VER division→BT_HOR division; or, for example: NONE division→BT_HOR division→BT_VER division→TT_VER division.
Taking fig. 20 as an example, after completing bt_ver (position 1 in fig. 20), or tt_hor (position 2 in fig. 20), or tt_ver (position 3 in fig. 20), it can be known whether the current CU finally selects bt_ver or bt_hor, i.e., obtains the prediction target, and thus the prediction target can be extracted at these three positions (positions 1, 2, 3 in fig. 20).
During the network training process, the extraction of feature vectors and predicted targets is separated. Wherein, extracting the feature vector is before BT division, such as position 0 in fig. 20; the extraction prediction target is after BT partitioning, such as positions 1, 2, 3 in fig. 20.
Only after the BT partition is performed can it be known whether the BT partition is selected, so the specific position for extracting the prediction target may be after the BT_VER partition or the BT_HOR partition, such as position 1 in fig. 20; or, for example, position 2 or 3 in fig. 20 (i.e., after TT_HOR division or TT_VER division); or a position further back (not shown).
When the network is applied, the network prediction is placed before the BT partition, that is, at position 0 in fig. 20, the feature vector is extracted first, and whether bt_hor and bt_ver are partitioned is predicted according to the trained network, and if the partition is not needed, the corresponding partition prediction is skipped.
The feature value extraction is the same in the network prediction process and in the training process, but the position of extracting the feature values is different from the position of extracting the prediction target, because the prediction target can only be acquired at the three positions, namely position 1, position 2 and position 3.
The feature value extraction and network training processes are performed offline: feature vectors are first extracted on the encoder, then the network is trained, and finally the trained network is saved. During application, the feature vector is likewise extracted, then the trained network is loaded, network prediction is performed before binary tree division is calculated, and whether the corresponding binary tree division is skipped is guided according to the prediction result.
After NONE division, feature vectors are extracted, and a selected binary tree horizontal division network and a binary tree vertical division network are trained by machine learning respectively. In the encoding process, for the same feature vector, if the selected binary tree horizontal division probability is high and the selected binary tree vertical division probability is low, combining the threshold value can skip binary tree vertical division; conversely, if the selected binary tree vertical partition probability is high and the selected binary tree horizontal partition probability is low, the binary tree horizontal partition may be skipped in conjunction with the threshold.
The implementation process is divided into two stages: network training and application. Wherein, the network training includes: extracting characteristic values, setting a network and acquiring network parameters; applications include feature value extraction, loading network parameters, network prediction, and applications. In network training and application, the feature value extraction method is the same.
According to the scheme, machine learning is applied to CU division, and the prediction accuracy is greatly improved by combining other specific conditions, so that good acceleration performance can be realized. It should be noted that the technical solution provided in the embodiments of the present application may be applied to all video compression protocols, such as the H.266/VVC standard, the H.265/HEVC standard, AVS (e.g. AVS3) or next-generation video codec standards; implementation details may vary, for example different thresholds, different prediction networks, different activation functions, and the like, which are not limited in the embodiments of the present application.
Referring to fig. 21, a flowchart of an implementation of a binary tree partitioning method for coding units according to an exemplary embodiment of the present application is shown. As shown in fig. 21, the implementation flow of the method includes the following steps.
S211, data preparation:
input feature vectors and training targets are prepared.
S212, network structure definition:
the number of layers of the network, the number of neurons per layer, the activation function, etc. are determined. Wherein the number of neurons of the input layer is related to the number of eigenvalues; the number of hidden layer neurons can be set by the user; the number of output layer neurons is related to the output target.
S213, initializing weight and bias:
an initial value is set for each connection weight and the bias of each neuron, typically initialized to a random number.
S214, forward propagation:
starting from the input layer, the input data are weighted and summed through the neurons of each layer, and then are processed by an activation function, so that an output result is obtained.
S215, loss function calculation:
and comparing the output result of the network with the label, and calculating the value of the loss function.
S216, back propagation:
the connection weights and offsets are updated by gradient descent according to the value of the loss function to reduce the value of the loss function.
S217, repeating S214 to S216:
Training stops when the preset number of training iterations is reached or the convergence condition is met, and the network parameters are saved.
S218, prediction:
and predicting the new input data by using the trained model.
In the implementation process: s211, extracting corresponding characteristic values; s212 and S213 correspond to the network settings of the first stage; s214 to S217 correspond to the network training in the first stage, and after S217 is finished, the trained network parameters are obtained and stored, that is, the network parameters are obtained in the first stage. During network prediction, firstly extracting characteristic values through S211, then loading network parameters stored in S217, and finally calling a trained model to perform second-stage prediction.
The first stage: and (5) training a network.
Referring to fig. 22, a network training flowchart provided in an exemplary embodiment of the present application is shown. As shown in fig. 22, the network training flow is divided into S221 feature value extraction, S222 network setting, S223 network training, and S224 acquisition of network parameters, wherein the network parameters include weights and offsets.
S221, extracting characteristic values:
a total of 26 feature values, plus 2 training target values, are stored in an array Features[28].
The feature values include the following 3 aspects.
Eigenvalue 1 st aspect: optimal information of neighboring blocks.
Wherein, Features[0]~Features[11] represent the optimal information of the current block and its neighboring blocks:

Features[0] = curr_depth/6.0; where curr_depth represents the current CU depth;

Features[1] = (5+left_depth-curr_depth)/10.0; where left_depth represents the left block CU depth;

Features[2] = (5+above_depth-curr_depth)/10.0; where above_depth represents the upper block CU depth;

Features[3] = (5+left_qtDepth-qt_depth)/10.0; where left_qtDepth represents the left block QT depth, and qt_depth represents the current block QT depth;

Features[4] = (5+above_qtDepth-qt_depth)/10.0; where above_qtDepth represents the upper block QT depth;

Features[5] = (5+left_btDepth-bt_depth)/10.0; where left_btDepth represents the left block BT depth, and bt_depth represents the current block BT depth;

Features[6] = (5+above_btDepth-bt_depth)/10.0; where above_btDepth represents the upper block BT depth;

Features[7] = (5+above_mtDepth-mt_depth)/10.0; where above_mtDepth represents the upper block MT depth, and mt_depth represents the current block MT depth;

Features[8] = left_mtDepth-mt_depth; where left_mtDepth-mt_depth represents the left block MT depth minus the current block MT depth;

Features[9] = (above_par+above_right_par)/4.0; where above_par and above_right_par respectively represent the division types of the upper block and the upper right block;

Features[10] = (above_depth+above_right_depth)/10.0; where above_depth and above_right_depth respectively represent the CU depths at the 2 positions of the upper block and the upper right corner;

Features[11] = log10(0.001f+(left_depth+top_left_depth+left_down_depth)/15.0); where left_depth, top_left_depth, and left_down_depth respectively represent the CU depths at the 3 positions of the left block, the upper left corner, and the lower left corner.
Aspect 2 of the eigenvalue: information of the current block.
Wherein, Features[12]~Features[14] represent the known information of the current block:

Features[12] = slice_level/6.0; where slice_level represents the current frame level;

Features[13] = qp/64.0; where qp is the quantization coefficient of the current CU block;

Features[14] = col_depth/6.0; where col_depth represents the depth of the corresponding co-located block.
Please refer to fig. 23, which illustrates a schematic diagram of reference relationships corresponding to different frame types related to the present application. As shown in fig. 23, the frame types are ordered from large to small in importance by reference relationship I frame > P frame > B frame > non-reference B frame.
In the embodiment of the present application, different GOP structures are possible.
Taking GOP16 as an example, please refer to fig. 24, which shows a reference relationship diagram of GOP16 related to the present application.
As shown in fig. 24, the reference relationships are as follows: poc16 references poc0; poc8 refers to poc0 and poc16; poc4 references poc0 and poc8; poc2 references poc0 and poc4; while poc1, poc3, poc5, poc7, poc9, poc11, poc13 and poc15 are not referenced.
Thus, the weights are shown in table 1 below.

TABLE 1

Weight class | Corresponding poc | Frame type
0 | poc0 | I frame
1 | poc16 | P frame/GPB frame
2 | poc8 | B frame
3 | poc4, poc12 | B frame
4 | poc2, poc6, poc10, poc14 | B frame
5 | poc1, poc3, poc5, poc7, poc9, poc11, poc13, poc15 | Non-reference B frame
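The table 1 mapping for GOP16 can be sketched as a small helper (a hypothetical illustration, not from the patent text; the pattern is that the class is determined by the number of trailing zero bits of the poc within one GOP16):

// Sketch: weight class for a poc in [0, 16] under the GOP16 structure of
// table 1. poc0 -> 0, poc16 -> 1, poc8 -> 2, poc4/12 -> 3, poc2/6/10/14 -> 4,
// odd poc -> 5. Other GOP structures would need other tables.
int weight_class_gop16(int poc) {
    if (poc == 0) return 0;          // I frame
    int tz = 0;                      // number of trailing zero bits
    while ((poc & 1) == 0) { poc >>= 1; ++tz; }
    return tz >= 4 ? 1 : 5 - tz;     // tz: 0->5, 1->4, 2->3, 3->2, 4->1
}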
Aspect 3 of the eigenvalue: pre-analysis calculation of the current block.

Wherein, Features[15]~Features[19] represent the gradient information of the current block:

Features[15] = min(grad_hor×1.0/grad_dup, 10)/10.0; where grad_hor and grad_dup respectively represent the horizontal gradient and one diagonal gradient;

Features[16] = min(grad_ver×1.0/grad_dup, 10)/10.0; where grad_ver represents the vertical gradient;

Features[17] = min(grad_hor×1.0/grad_dow, 10)/10.0; where grad_dow represents the other diagonal gradient;

Features[18] = min(grad_ver×1.0/grad_dow, 10)/10.0;

Features[19] = min(grad_ratio, 20)/20.0; where grad_ratio represents the ratio of the horizontal gradient to the vertical gradient.
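The patent does not specify how these gradients are computed; a common choice, assumed here, is the sum of absolute differences between neighboring pixels along each direction. The following C++ sketch illustrates this for a w×h luma block (all names illustrative):

#include <cstdlib>
#include <vector>

// Hypothetical gradient statistics for a block stored row-major in pix;
// grad_dup/grad_dow correspond to the two diagonal directions.
struct Grads { long hor, ver, dup, dow; };

Grads block_grads(const std::vector<int>& pix, int w, int h) {
    Grads g{0, 0, 0, 0};
    for (int y = 1; y < h; ++y) {
        for (int x = 1; x + 1 < w; ++x) {
            int c = pix[y * w + x];
            g.hor += std::abs(c - pix[y * w + (x - 1)]);       // horizontal
            g.ver += std::abs(c - pix[(y - 1) * w + x]);       // vertical
            g.dup += std::abs(c - pix[(y - 1) * w + (x + 1)]); // diagonal up
            g.dow += std::abs(c - pix[(y - 1) * w + (x - 1)]); // diagonal down
        }
    }
    return g;
}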
Wherein, Features[20]~Features[24] represent the variance information after the NONE division of the current block is completed:

Features[20] = min(pixel_var, 20)/20.0; where pixel_var represents the variance of the current block.

After the NONE division of the current block, the residual is divided into 4 sub-blocks, and sub_var[i] corresponds to the variance of each sub-block.

sum_var = sub_var[0] + sub_var[1] + sub_var[2] + sub_var[3];

If sum_var = 0, extract:

Features[21] = Features[22] = Features[23] = Features[24] = 0;

otherwise, extract:

Features[21] = sub_var[0]×1.0/sum_var;

Features[22] = sub_var[1]×1.0/sum_var;

Features[23] = sub_var[2]×1.0/sum_var;

Features[24] = sub_var[3]×1.0/sum_var;
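A sketch of these sub-block variance statistics, assuming the residual block is split into four equal quadrants (a minimal C++ illustration; names are not from the patent text):

#include <vector>

static double variance(const std::vector<int>& v) {
    double mean = 0.0, var = 0.0;
    for (int x : v) mean += x;
    mean /= v.size();
    for (int x : v) var += (x - mean) * (x - mean);
    return var / v.size();
}

// Computes Features[21..24]: each quadrant's share of the total variance
// of the w x h residual block (row-major), with the zero-sum special case.
void subblock_variance_features(const std::vector<int>& resid, int w, int h,
                                double features[4]) {
    double sub_var[4], sum_var = 0.0;
    for (int k = 0; k < 4; ++k) {
        int x0 = (k % 2) * (w / 2), y0 = (k / 2) * (h / 2);
        std::vector<int> sub;
        for (int y = y0; y < y0 + h / 2; ++y)
            for (int x = x0; x < x0 + w / 2; ++x)
                sub.push_back(resid[y * w + x]);
        sum_var += (sub_var[k] = variance(sub));
    }
    for (int k = 0; k < 4; ++k)
        features[k] = (sum_var == 0.0) ? 0.0 : sub_var[k] / sum_var;
}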
Wherein, Features[25] represents the shape information of the current block:

Features[25] = min(cu_luma_w×1.0/cu_luma_h, 8.0)/8.0; where cu_luma_w represents the width of the current block and cu_luma_h represents the height of the current block.

Features[26] = target0; the target value target0 is stored, where target0 is 1 if the current optimal division mode is binary tree horizontal division, and 0 otherwise.

Features[27] = target1; the target value target1 is stored, where target1 is 1 if the current optimal division mode is binary tree vertical division, and 0 otherwise.
When extracting the feature values, the extraction is performed separately according to the block size.
S222, network setting:
the number of layers of the network, the number of neurons per layer, the activation function, etc. are determined.
Wherein the number of neurons of the input layer is related to the number of eigenvalues; the number of hidden layer neurons can be set by the user; the number of output layer neurons is related to the output target.
In the embodiment of the present application, a 3-layer fully connected network as shown in fig. 15 is used, including an input layer, an hidden layer, and an output layer.
The number of neurons of the input layer is 26, the number of neurons of the hidden layer is 40, and the number of neurons of the output layer is 1.
The kernel activation function uses the ReLU function, i.e., the following formula:

f(x) = max(0, x)
S223, network training:

Features[0]~Features[25] are taken as the feature vector, and Features[26] is taken as the prediction target of BT_HOR division for training; Features[27] is taken as the prediction target of BT_VER division for training; the BT_HOR division network and the BT_VER division network share the same group of feature vectors and are trained respectively.
The blocks are classified into 5 classes by size, according to the following rule:
class 1: 64x64 blocks or more;
class 2: 64x32, 32x64, 32x32 blocks;
class 3: 32x16, 16x32, 16x16 blocks;
class 4: 16x8, 8x16, 8x8 blocks;
Class 5: other blocks.
Each class independently extracts feature values and is trained independently, as sketched below.
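The 5-class rule above can be written as a small helper (an illustrative C++ sketch; "64x64 blocks or more" is assumed here to mean both dimensions at least 64):

// Returns the class (1..5) of a block of size w x h; sizes not listed
// explicitly fall into class 5.
int block_class(int w, int h) {
    if (w >= 64 && h >= 64) return 1;                   // 64x64 and above
    if ((w == 64 && h == 32) || (w == 32 && h == 64) ||
        (w == 32 && h == 32)) return 2;
    if ((w == 32 && h == 16) || (w == 16 && h == 32) ||
        (w == 16 && h == 16)) return 3;
    if ((w == 16 && h == 8) || (w == 8 && h == 16) ||
        (w == 8 && h == 8)) return 4;
    return 5;                                           // other blocks
}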
The network is set to iterate 3000 times, and training also stops once the prediction accuracy exceeds 70%.
S224, acquiring network parameters:
after the network training meets the preset conditions, the training is terminated, and the trained parameters weight and bias are saved for use in prediction.
The network parameters are related to the preset network structure, and include the hidden layer weight hidden_weight and offset hidden_bias, and the output layer weight output_weight and offset output_bias.

Wherein, the hidden layer weight is a one-dimensional array of size 26x40; the hidden layer offset is a one-dimensional array of size 40; the output layer weight is a one-dimensional array of size 40; the output layer offset is a one-dimensional array of size 1.
Specific data may be obtained through training, and in the embodiment of the present application, taking a class 2 block as an example, the weights and offsets are as follows:
Hidden layer weights:
static const double input_weights []= {0.04760, -0.03647, -0.10104, -0.07907, 0.01049, 0.16327, -0.03622, 0.08991, 0.16108, 0.36200, 0.16145, -0.21879, -0.49739, 0.50523, 0.35837, 0.33373, -0.69866, 0.29655, -1.09732, 0.10920, -0.70387, 0.38070, 0.04859, 0.28567, -0.48115, 0.06174, -0.20432, 0.22820, -0.22743, -0.06041, -0.04055, -0.24876, 0.06157, 0.06158, 0.27322, 0.02066, -0.08252, 0.29567, 0.45072, -0.10309, -0.10968, 0.41083, 0.16058, 0.11033, -0.04076, -0.15501, -0.23484, 0.33383, 0.00822, -0.37877, -0.05320, 0.29072, -0.07172, -0.13401, -0.42713, 0.10317, -1.15190, 0.03944, -0.42386, -0.11117, 0.04519, 0.25985, 0.73584, 0.35122, -0.39767, 0.04125, 0.00534, -0.84173, 0.38044, -0.54883, 0.81034, -0.54770, 0.37716, -0.04578, 0.15733, 0.07246, 0.06513, 0.36784, 0.05780, -0.18343, -0.09454, 0.05180, 0.02493, -0.10948, -0.03806, -0.01572, -0.17459, -0.09895, 0.12878, 0.11201, 0.01240, -0.00031, -0.08884, 0.05982, 0.08162, 0.00265, -0.08367, -0.18883, -0.18849, -0.19499, -0.01002, 0.07683, -0.17548, 0.00012, 0.26017, -0.08461, -0.22506, 0.46334, 0.38601, 0.24870, -0.72013, -0.51037, -0.19903, -0.35210, -0.32294, -0.00511, -0.32072, -0.23067, -0.45318, 1.05682, 0.74720, 0.68890, 0.25756, -0.88745, -0.93780, 0.01837, 0.37825, 0.19392, -0.01311, 0.75573, 0.11638, -0.03966, -0.24912, -0.61898, -0.71968, 0.11227, -0.00881, 0.18293, -0.19987, 0.33330, 0.56147, 0.05090, -0.09890, 0.33675, 0.14805, -1.07460, -0.28573, -1.06477, 0.11057, 0.59421, 1.44287, -0.00595, -0.30727, -0.24584, 0.34241, 0.21160, -0.41796, 0.08333, -0.50816, 0.39642, -0.40606, -0.24906, -0.68565, 0.05162, -0.16754, -0.35203, 0.04575, 0.03911, -0.05410, 1.31715, -0.21610, -0.37107, 0.88715, -0.62470, 0.96371, -0.27272, 0.70255, 0.25653, 0.03809, 0.19279, 0.23180, -0.18487, -0.05263, -0.14196, -0.16434, -0.11875, 0.04599, 0.20261, 0.27684, -0.16826, -0.03455, -0.14689, 0.09332, -0.10545, -0.09217, -0.01769, 0.19546, -0.04754, -0.02555, -0.05274, 0.01808, -0.06730, -0.09835, 0.02890, 0.19857, -0.12077, 0.13080, 0.08109, -0.18216, -0.07773, 0.07601, -0.07410, 0.37668, -0.12389, -0.93333, -0.04569, 0.07719, -0.10938, -0.01018, 0.37297, 0.15589, 0.03450, -0.18821, -0.19953, -0.05840, -0.09981, -0.35305, 0.42295, -0.20853, 0.35999, 1.06022, -0.52197, -0.23480, 0.36116, -0.01999, 0.10757, -0.00468, 0.08761, 0.18705, 0.01958, -0.06577, 0.01123, 0.00317, -0.21835, 0.17420, -0.14475, 0.14789, -0.08369, -0.07598, 0.00387, -0.18346, 0.04961, 0.18490, -0.00809, 0.05356, -0.06251, -0.13455, 0.15201, 0.18737, 0.05239, 0.10133, -0.31787, -0.26668, -0.09922, 0.42248, 0.08348, 0.28222, -0.45658, -0.09959, 0.19574, -0.03565, -0.20378, 0.29226, -0.51944, -0.04053, -0.10016, -0.33089, 0.11978, -0.64116, -1.18684, -0.45069, 0.04378, 0.51212, -0.55573, 1.01261, -0.24068, -0.05533, 0.14926, -0.04840, 0.01683, 0.15168, 0.09575, -0.14698, 0.04320, 0.18180, 0.11509, -0.09088, 0.07791, 0.25678, -0.16156, 0.03509, -0.12920, 0.10865, 0.16629, 0.21100, -0.19050, -0.14992, -0.13119, 0.07837, -0.04250, -0.03228, 0.12708, 0.07128, -0.00816, 0.19389, -0.05792, 0.14158, -0.16758, -0.05202, -0.19466, -0.14720, -0.18937, -0.10283, 0.04271, -0.01911, -0.04545, 0.00959, -0.03610, 0.07671, -0.16251, -0.11264, 0.15900, -0.01981, -0.01017, 0.11101, -0.16035, -0.04945, -0.16915, 0.17462, -0.46449, -0.12944, -0.18419, 0.22181, -0.19341, -1.03948, 0.25446, 0.24272, 0.14222, -0.23281, -0.07330, -0.00616, 0.08312, -0.08990, 0.41195, 1.06754, 0.19642, 0.90615, -0.45216, -1.03210, 0.31971, 0.37961, -0.02528, -0.17478, 0.09856, 0.04934, -0.27992, 0.25879, -0.77417, 0.54613, 
-0.03051, 0.74489, -0.13106, 0.35704, 0.14777, -0.24547, -0.15893, -0.12315, 0.24757, 0.15321, 0.61416, 0.59648, -0.10639, 0.70786, 0.44613, 0.57965, 0.55758, 0.09410, -0.68465, -0.10358, -0.81610, -0.15443, -0.05684, -0.06949, 0.00159, -0.09392, 0.11547, -0.17315, 0.10908, -0.16384, -0.17779, 0.12739, 0.18310, -0.09363, -0.08880, 0.10614, 0.14317, -0.12913, 0.16317, 0.13601, 0.03363, 0.18589, 0.06349, -0.17134, -0.04415, -0.01579, 0.15000, -0.80758, 0.37233, 0.49112, 0.72428, -0.11614, 0.13128, 0.17396, -0.14351, 0.56652, -0.58502, 0.41877, -0.38723, -0.97285, 0.83027, 0.03739, -0.34084, 0.35583, -0.24664, 0.26045, 0.37646, 0.18766, -0.22604, -0.20127, -0.14182, -0.18578, 0.28692, -0.24560, 0.04553, 0.21472, 0.20301, 0.01370, -0.00587, 0.79476, -0.11023, 0.25067, -0.33597, -0.17191, -0.43860, -0.22309, 0.12400, -0.06540, 0.57425, -0.18913, 0.12232, -0.30972, 0.31309, -0.03151, 0.06225, 0.06008, 0.47963, -0.57675, -0.12080, 0.29570, -0.31297, 0.04083, -0.06865, 0.08459, 0.11889, 0.02706, -0.16589, 0.05407, -0.16040, 0.42108, -0.08077, -0.32177, 0.14174, -0.14697, -0.37091, 0.32372, -0.59014, 0.29479, -0.97553, -0.20057, 0.21497, 0.13354, 0.19295, 0.04106, -3.45083, -0.13773, -0.00518, -0.13434, -0.18538, -0.24256, 0.51066, 0.38237, 0.48322, -0.10555, 0.20421, 0.10690, 0.02424, -0.27003, -0.23207, -0.28995, 0.55867, -0.19528, 0.31399, -0.10269, -0.39086, -0.23559, -0.40344, -0.18389, -0.09583, 0.41200, 0.27083, 0.58885, -0.27986, 0.14488, -0.63326, 0.50426, -0.19862, -0.42787, -0.17754, 0.05555, 0.15126, 0.17615, -0.11758, -0.32264, -1.33270, -0.11077, -0.19344, -0.07854, 0.33231, 0.42338, 0.06876, -0.26895, 0.41235, -0.03240, -0.03065, 0.39798, -0.01567, -0.19140, 0.36862, -0.62475, 0.23160, -0.19279, 0.13537, 1.04618, -0.01667, -0.55001, -0.04237, -0.29501, 0.06151, -0.46286, 0.18899, -0.20024, 0.13467, -0.44504, -0.01020, -0.54095, 0.54657, 0.44897, -0.07541, -0.22953, 0.06171, -0.31137, -0.34704, 0.08565, -0.03217, 0.14172, -0.18791, 0.09542, 0.07110, 0.21889, 0.12372, 0.03618, 0.17532, -0.19688, 0.12242, 0.44215, -1.15293, -0.30010, 0.09620, 0.94306, -0.39402, 1.31146, 0.03955, 0.79284, -0.36249, 0.34764, -0.18840, 0.50619, -0.03156, 0.69526, -0.89954, 0.30294, -1.35902, 0.18244, 0.05656, 0.31639, -0.19152, -0.12969, 0.74703, 0.49449, -0.01640, 0.08275, -1.42506, 0.01248, 0.42484, 0.10259, 0.38655, -0.68024, 0.21444, -0.37933, 0.34536, 0.19402, 0.64728, -0.38382, 0.33298, 0.05526, -0.05961, -0.09757, 0.12523, -0.16179, -0.00724, 0.17719, -0.09389, 0.06343, 0.00228, -0.10013, 0.08673, 0.08562, -0.13591, -0.09240, 0.16017, 0.10558, 0.00202, -0.12715, 0.08542, 0.03103, -0.13239, 0.17299, 0.08035, -0.12890, -0.01340, -0.15675, -0.11221, -0.16642, 0.18511, -0.17120, 0.17106, -0.10721, -0.13473, 0.07994, -0.17551, -0.15389, -0.08902, 0.07608, -0.09933, 0.06955, 0.01058, -0.12180, -0.16627, -0.02240, -0.00889, 0.02697, -0.08092, -0.15135, -0.09582, 0.08550, -0.17719, -0.19396, -0.13129, 0.01746, -0.13852, 0.10985, -0.02542, 0.13931, -0.18050, -0.06131, -0.12801, 0.08454, -0.13640, 0.04764, 0.05124, -0.08648, -0.18627, 0.16935, -0.09059, 0.10356, -0.08924, -0.11810, 0.15300, -0.18737, -0.12739, 0.17431, 0.08218, -0.19264, 0.23869, 0.08060, -0.28332, -0.18890, 0.18871, 0.19284, 0.04769, -0.05957, -0.11046, 0.17683, 0.14709, -0.20498, 0.03274, 0.11269, -0.31240, -0.00250, -0.03111, -0.31514, -0.16157, 0.72389, 0.12819, 0.80841, -0.08480, -1.19084, 0.07514, -0.38341, 0.11757, 0.09460, -0.14778, 0.05796, -0.01100, -0.01657, -0.10209, -0.04309, 0.04993, 0.01769, 0.02706, -0.02499, -0.07421, 
0.09528, -0.09313, 0.46343, -0.06097, 0.61984, -0.14329, -0.06388, 0.00761, 1.22202, -0.01042, -1.33522, 0.21833, 0.03128, 0.10748, 0.20988, 0.02513, 0.42270, 0.61083, -0.07267, 0.09068, -0.42178, -0.05420, -0.50084, -0.01951, 0.14768, -0.28950, 0.25145, -0.17120, 0.18452, -0.12168, 0.56126, 0.20966, 0.40942, 0.17493, 0.72921, -0.72384, -0.42633, -0.00928, -0.04223, 0.08723, -0.12447, 0.09616, 0.11134, 0.08946, 0.06235, -0.00216, 0.02692, 0.09565, 0.06527, -0.06120, -0.19545, 0.28526, 0.12610, 0.02485, -0.24686, 0.01505, -0.18337, 0.17922, -0.22882, 0.24161, 0.06305, -0.00941, -0.03963, 0.06657, -0.08778, 0.02306, -0.07568, -0.44057, -0.02187, 0.01269, 0.00047, 0.46093, 0.00053, 0.14052, -0.01136, 0.09048, 0.02697, 0.10940, -0.10914, 0.29490, -0.48298, 0.13201, -0.36853, 0.66412, 0.34504, -0.85159, -0.59096, 1.08483, -0.04399, -0.21786, -0.03471, 0.26464, -0.23305, -0.02498, 0.03526, 0.26375, 0.14783, 0.34781, -0.08904, 0.20431, 0.29050, 0.00515, -0.68633, 0.36539, 0.02222, 0.16630, -0.65813, 0.13030, -0.93291, -0.08679, -0.60195, 0.00167, -0.07015, 0.21024, -0.07327, -0.07604, -0.06634, 0.04399, -0.32850, -0.11399, -0.22686, 0.11699, 0.85337, -0.19981, 0.10073, -0.04051, 0.09898, -0.55663, -0.07765, -0.02340, 0.20614, 0.33593, -0.02591, 0.41097, -0.48349, -0.00521, -0.05976, 0.16761, 0.02667, -0.18094, -0.15205, 0.86197, -0.27232, 0.03400, 0.07707, 0.01885, 0.31753, 0.06489, 0.18272, -0.13177, -0.08914, 0.10711, -0.24754, 0.49023, -0.33077, -0.49683, 0.22859, -0.07681, 0.07996, 0.07153, -0.03930, -0.65425, 0.25223, 0.19076, 0.30282, -0.45070, 0.28126, -0.58153, -0.17906, 0.00714, -0.00011, 0.14388, 0.12977, 0.06349, 0.13489, -0.11073, 0.15896, 0.11212, -0.16403, -0.19486, -0.03981, 0.02534, 0.03628, -0.19770, -0.09480, 0.11529, 0.08618, -0.05851, -0.04927, -0.18698, -0.03119, 0.15585, 0.03342, -0.17333, 0.63606, -0.07486, -0.16375, 0.24231, -0.60277, -0.07415, 0.17920, -0.34517, 0.00834, -0.31800, 0.05503, -0.06555, 0.01556, 0.91247, -0.00988, -0.09255, -0.57756, 0.73445, -0.65966, 0.45647, 0.13118, -0.06159, -0.01303, -0.10327, -0.14559, 3.07420, 0.04326, -0.17263, -0.08769, 0.06322, -0.13695, 0.02142, -0.15202, -0.05089, -0.01839, -0.18187, 0.07607, -0.17917, -0.05866, 0.11383, 0.11826, -0.04722, -0.15570, -0.15826, 0.19583, 0.13311, -0.07741, -0.15912, 0.00298, 0.06985, -0.10757, -0.13796, 0.07895, 0.14776, -0.16065, 0.02721, 0.19718, -0.13357, -0.08039, 0.10543, -0.11383, -0.17659, 0.10317, -0.14896, -0.01410, -0.11459, -0.09734, 0.08240, 0.01062, -0.02052, -0.03532, 0.08966, -0.01393, -0.11822, 0.15044, -0.13362, -0.14914, -0.17725, -0.32301, 0.48353, 0.15588, -0.11787, -0.02686, -0.02420, -0.14384, 0.40574, 0.31438, 0.58118, 0.44178, 0.09752, 0.08094, 0.28604, -0.64433, -0.26022, -0.42784, -0.03032, -0.62485, 0.65217, -0.29910, -0.54750, -0.23918, -0.09485, 0.39482, 0.03712};
output layer weights:
static const double output_weights [] = { 1.09488, -0.93884, -1.77023, 0.01392, 2.32947, -1.79913, -1.90265, -0.31672, 1.45659, -0.10612, 1.49255, -0.23891, 0.13934, -2.18159, -1.46394, -0.08207, -1.67509, -1.35871, 3.37578, 1.13655, -1.77573, -1.81435, -1.51022, 1.97785, 0.12707, -0.04356, -0.06310, 1.70617, -1.64525, -1.18029, 0.26660, -1.49411, 0.99013, 1.19943, 1.08996, 0.04481, 2.57918, 0.11002, 0.12663, 1.26044 };
hidden layer offsets:
static const double input_bias [] = { 0.37278, -0.13274, 0.38478, -0.03453, 0.79225, -0.15214, 0.37416, -0.12674, 0.67264, 0.04300, 0.59819, 0.03999, -0.19433, 0.61860, -0.34357, -0.18109, -0.53503, -0.35872, 0.32299, -0.09930, 0.54637, -0.10248, 0.28151, 0.68674, -0.15636, 0.10470, -0.12352, -0.54218, 0.02838, -0.12596, 0.00768, -0.02299, 0.19139, -0.01603, 0.13173, 0.07038, -0.15892, 0.15038, -0.02363, -0.34877 };
output layer offset:
static const double output_bias [] = { 0.23259 };
The second stage: application.
Referring to fig. 25, a flowchart of the application stage provided by an exemplary embodiment of the present application is shown. As shown in fig. 25, the application flow includes S251 feature value extraction, S252 loading network parameters, S253 network prediction, and S254 application.
S251, extracting characteristic values:
The feature value extraction method is the same as that used in the first-stage network training; only the first 26 feature values need to be recorded, so the description is not repeated here.
S252, loading network parameters:
The network parameters include the number of neurons in each layer of the network, the activation function, the hidden layer weights hidden_weight and offsets hidden_bias, and the output layer weights output_weight and offset output_bias. The number of neurons and the activation functions of each layer are consistent with the parameters used in network training.
The respective network parameters are loaded according to the block type.
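For illustration, one possible organization of this per-block-type loading is one parameter set per block-size category and partition direction; the struct and lookup below are hypothetical assumptions, since the patent gives the parameter arrays but not a storage or selection scheme.

/* Hypothetical grouping of one model's parameters; array sizes follow the
 * network described below (26 inputs, 40 hidden neurons, 1 output). */
typedef struct {
    const double *hidden_weights;  /* 40 x 26 values, one row per hidden neuron */
    const double *hidden_bias;     /* 40 values */
    const double *output_weights;  /* 40 values */
    const double *output_bias;     /* 1 value */
} NetParams;

/* Select the parameter set for a CU: one table per partition direction
 * (BT_HOR / BT_VER), indexed by the CU's block-size category. */
static const NetParams *load_params(const NetParams *table, int category)
{
    return &table[category];
}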
S253, network prediction:
The prediction function of the fully connected network is computed in the following two steps:
step1, hidden layer calculation:
For each hidden neuron i, the elements Feature[j] of the feature vector are multiplied by the corresponding hidden layer weights hidden_weight and accumulated, and the offset hidden_bias[i] is added to obtain the hidden layer output, recorded as value[i]; each hidden neuron produces one result.
Here i ranges over [0, 39] and j ranges over [0, 25].
Each result is then rectified with the ReLU activation function, namely value[i] = max(0, value[i]).
step2, output layer calculation:
The hidden layer outputs value[i] are multiplied by the output layer weights output_weight[i] and accumulated, and the offset output_bias[0] is added to obtain the output layer result, recorded as the final prediction result y.
The same group of feature vectors is used to perform BT_HOR partition prediction and BT_VER partition prediction, obtaining predicted values for the BT_HOR partition and the BT_VER partition, recorded as score_h and score_v respectively.
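A minimal C sketch of the two-step prediction described above follows; it assumes 26 input features and 40 hidden neurons (matching the value ranges of i and j given earlier) and assumes the hidden layer weights are stored row-major, one row of 26 weights per hidden neuron. The function name predict is illustrative, not taken from the patent.

#define NUM_FEATURES 26   /* j ranges over [0, 25] */
#define NUM_HIDDEN   40   /* i ranges over [0, 39] */

/* Illustrative forward pass: step1 computes the hidden layer with ReLU,
 * step2 computes the scalar output y. */
static double predict(const double feature[NUM_FEATURES],
                      const double hidden_weights[NUM_HIDDEN * NUM_FEATURES],
                      const double hidden_bias[NUM_HIDDEN],
                      const double output_weights[NUM_HIDDEN],
                      const double output_bias[1])
{
    double value[NUM_HIDDEN];

    /* step1: value[i] = ReLU(sum_j feature[j] * W[i][j] + hidden_bias[i]) */
    for (int i = 0; i < NUM_HIDDEN; i++) {
        double sum = hidden_bias[i];
        for (int j = 0; j < NUM_FEATURES; j++)
            sum += feature[j] * hidden_weights[i * NUM_FEATURES + j];
        value[i] = sum > 0.0 ? sum : 0.0;   /* ReLU */
    }

    /* step2: y = sum_i value[i] * output_weights[i] + output_bias[0] */
    double y = output_bias[0];
    for (int i = 0; i < NUM_HIDDEN; i++)
        y += value[i] * output_weights[i];
    return y;
}

Calling predict once with the BT_HOR parameter set and once with the BT_VER parameter set on the same feature vector yields score_h and score_v.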
S254, application:
After the NONE (no-split) mode has been evaluated and before BT partitioning, it is first judged whether BT partitioning is to be performed; if so, the following decisions are made:
If score_v is greater than threshold thr0 and score_h is less than threshold thr1, binary tree horizontal partitioning is skipped;
if score_h is greater than threshold thr0 and score_v is less than threshold thr1, binary tree vertical partitioning is skipped.
In the embodiment of the present application, the threshold thr0 is 1.5, and thr1 is 0.3.
The thresholds may be different for different levels.
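A minimal sketch of this skip decision, assuming score_h and score_v come from a prediction function like the one sketched above; the function and flag names are illustrative.

/* Illustrative early-skip decision; thr0 = 1.5 and thr1 = 0.3 in this
 * embodiment, and the thresholds may differ per level. */
static void decide_bt_skip(double score_h, double score_v,
                           int *skip_bt_hor, int *skip_bt_ver)
{
    const double thr0 = 1.5;
    const double thr1 = 0.3;

    *skip_bt_hor = 0;
    *skip_bt_ver = 0;

    /* vertical split much more likely: skip the horizontal split */
    if (score_v > thr0 && score_h < thr1)
        *skip_bt_hor = 1;

    /* horizontal split much more likely: skip the vertical split */
    if (score_h > thr0 && score_v < thr1)
        *skip_bt_ver = 1;
}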
Fig. 26 is a block diagram of a binary tree partitioning processing apparatus for a coding unit according to an exemplary embodiment of the present application. The apparatus may be used to perform all or part of the steps performed by the computer device in the methods of fig. 9, 11, 12, 13, 14, or 16. As shown in fig. 26, the apparatus includes:
a first obtaining module 2601, configured to obtain a feature vector of the coding unit CU; the feature vector is used for representing the image features of the CU and the neighborhood image features of the CU;
a second obtaining module 2602, configured to obtain a first predicted value and a second predicted value based on the feature vector; the first predicted value is used for indicating the probability that the binary tree partition type of the CU is horizontal partition; the second predicted value is used for indicating the probability that the binary tree partition type of the CU is vertical partition;
an execution module 2603 is configured to skip at least one partition procedure of binary tree horizontal partition and binary tree vertical partition of the CU based on the first predicted value and the second predicted value.
In some embodiments, the execution module 2603 is configured to:
in response to the first predictor being greater than a first threshold and the second predictor being less than a second threshold, skipping binary tree vertical partitioning of the CU; the first threshold is greater than the second threshold;
in response to the second predictor being greater than the first threshold and the first predictor being less than the second threshold, binary tree horizontal partitioning of the CU is skipped.
In some embodiments, the second obtaining module 2602 is configured to:
inputting the feature vector into a network prediction model to obtain a first predicted value and a second predicted value which are output by the network prediction model;
the network prediction model is a machine learning model obtained by training a feature vector sample of the CU sample and an optimal binary tree division mode of the CU sample.
In some embodiments, the apparatus shown in fig. 26 above further comprises: a third obtaining module, configured to:
obtaining the block size of the CU; the block size is used for dividing the CUs into a plurality of categories, and respectively training a network prediction model according to the categories;
acquiring a category of the CU based on the block size;
based on the category, a network prediction model is obtained.
In some embodiments, the network prediction model includes a first network prediction model and a second network prediction model;
the second obtaining module 2602 is further configured to:
respectively input the feature vector into the first network prediction model and the second network prediction model, and obtain the first predicted value output by the first network prediction model and the second predicted value output by the second network prediction model.
In some embodiments, the feature vector comprises:
information of neighboring blocks of the CU, and information of the CU;
the information of the neighboring blocks includes at least one of: CU depth of the neighboring block, QT depth of the neighboring block, BT depth of the neighboring block, MT depth of the neighboring block, and optimal partition type of the neighboring block; the neighboring block is an encoded block that is neighboring the CU and has been encoded;
the information of the CU includes at least one of: the depth of a CU, the frame level of a CU, the quantization coefficients of the coding unit CU, the depth of the co-located blocks of a CU, gradient information of a CU, variance information of a CU, and shape information of a CU.
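For illustration, the fields listed above could be collected into a feature container like the following; the struct layout and field names are hypothetical, the actual composition and ordering of the 26-element feature vector are not fully specified here, and more than one neighboring block may contribute in practice.

/* Hypothetical container for the CU features listed above; field names
 * and their mapping onto the 26-element vector are assumptions. */
typedef struct {
    /* information of a neighboring (already encoded) block */
    double neighbor_cu_depth;
    double neighbor_qt_depth;
    double neighbor_bt_depth;
    double neighbor_mt_depth;
    double neighbor_best_partition_type;
    /* information of the current CU */
    double cu_depth;
    double frame_level;
    double quantization_coefficient;
    double colocated_block_depth;
    double gradient_info;
    double variance_info;
    double shape_info;
} CuFeatures;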
In some embodiments, the apparatus shown in fig. 26 above further comprises: a training module, configured to:
inputting the feature vector sample into a network prediction model to obtain a binary tree division mode prediction result of the CU sample, which is output by the network prediction model; the binary tree partition mode prediction result comprises a first partition mode prediction result and a second partition mode prediction result; the first partition mode prediction result is used for indicating the probability that the binary tree partition type of the CU sample is horizontal partition, and the second partition mode prediction result is used for indicating the probability that the binary tree partition type of the CU sample is vertical partition;
acquiring a loss function value of a network prediction model based on an optimal binary tree division mode of the CU sample and a binary tree division mode prediction result;
and updating parameters of the network prediction model based on the loss function value of the network prediction model.
In some embodiments, the training module is further configured to:
respectively inputting the feature vector sample into a first network prediction model and a second network prediction model, and obtaining a first division mode prediction result output by the first network prediction model and a second division mode prediction result output by the second network prediction model;
Calculating a first loss function value in the loss function values of the network prediction model based on the difference between the optimal binary tree partitioning mode of the CU sample and the prediction result of the first partitioning mode;
calculating a second loss function value in the loss function values of the network prediction model based on the difference between the optimal binary tree partitioning mode of the CU sample and the second partitioning mode prediction result;
based on the first loss function value, carrying out parameter updating on the first network prediction model;
and updating parameters of the second network prediction model based on the second loss function value.
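As an illustration of the per-model loss described above: the patent does not name a loss function, so the sketch below assumes squared error against a 0/1 label (1.0 when the sample's optimal binary tree partition matches the model's direction, 0.0 otherwise).

/* Illustrative per-sample loss for one of the two models; squared error
 * against a 0/1 label is an assumption, not specified by the patent. */
static double partition_loss(double prediction, double label)
{
    double diff = prediction - label;
    return diff * diff;
}

Each model's parameters would then be updated from its own loss value, for example by gradient descent.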
Fig. 27 is a block diagram of a binary tree partitioning processing apparatus for a coding unit according to another exemplary embodiment of the present application. The apparatus may be used to perform all or part of the steps performed by the computer device in the method of fig. 17 or 18. As shown in fig. 27, the apparatus includes:
a sample obtaining module 2701, configured to obtain a feature vector sample of the CU sample and an optimal binary tree partitioning manner of the CU sample;
an input-output module 2702, configured to input the feature vector sample into the network prediction model to obtain a binary tree partitioning mode prediction result of the CU sample output by the network prediction model; the binary tree partition mode prediction result comprises a first partition mode prediction result and a second partition mode prediction result; the first partition mode prediction result is used for indicating the probability that the binary tree partition type of the CU sample is horizontal partition, and the second partition mode prediction result is used for indicating the probability that the binary tree partition type of the CU sample is vertical partition;
The loss acquisition module 2703 is configured to acquire a loss function value of the network prediction model based on an optimal binary tree partitioning method of the CU samples and a binary tree partitioning method prediction result;
a parameter updating module 2704, configured to update parameters of the network prediction model based on the loss function value of the network prediction model;
the network prediction model is used for outputting a first predicted value and a second predicted value based on the input characteristic vector of the CU; the first predicted value is used for indicating the probability that the division type of the CU is horizontal division; the second prediction value is used to indicate a probability that the partition type of the CU is a vertical partition.
In some embodiments, the input-output module 2702 is configured to:
respectively input the feature vector samples into a first network prediction model and a second network prediction model to obtain a first division mode prediction result output by the first network prediction model and a second division mode prediction result output by the second network prediction model;
the loss acquisition module 2703 is configured to:
calculating a first loss function value in the loss function values of the network prediction model based on the difference between the optimal binary tree partitioning mode of the CU sample and the prediction result of the first partitioning mode;
Calculating a second loss function value in the loss function values of the network prediction model based on the difference between the optimal binary tree partitioning mode of the CU sample and the second partitioning mode prediction result;
the parameter updating module 2704 is configured to:
based on the first loss function value, carrying out parameter updating on the first network prediction model;
and updating parameters of the second network prediction model based on the second loss function value.
Fig. 28 is a block diagram of a computer device 2800 according to an exemplary embodiment of the present application. The computer device may be implemented as a server in the above-described aspects of the present application. The computer device 2800 includes a central processing unit (Central Processing Unit, CPU) 2801, a system memory 2804 including a random access memory (Random Access Memory, RAM) 2802 and a read-only memory (Read-Only Memory, ROM) 2803, and a system bus 2805 connecting the system memory 2804 and the central processing unit 2801. The computer device 2800 also includes a mass storage device 2806 for storing an operating system 2809, application programs 2810, and other program modules 2811.
The mass storage device 2806 is connected to the central processing unit 2801 by a mass storage controller (not shown) connected to the system bus 2805. The mass storage device 2806 and its associated computer-readable media provide non-volatile storage for the computer device 2800. That is, the mass storage device 2806 may include a computer readable medium (not shown) such as a hard disk or a compact disk-Only (CD-ROM) drive.
Without loss of generality, the computer readable medium may include computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other solid state memory technology, CD-ROM, Digital Versatile Disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will recognize that the computer storage medium is not limited to the ones described above. The system memory 2804 and mass storage device 2806 described above may be collectively referred to as memory.
According to various embodiments of the present disclosure, the computer device 2800 can also operate by connecting to a remote computer on a network such as the Internet. That is, the computer device 2800 can be connected to the network 2808 through a network interface unit 2807 connected to the system bus 2805, or the network interface unit 2807 can be used to connect to other types of networks or remote computer systems (not shown).
The memory further stores at least one computer program, and the central processing unit 2801 implements all or part of the steps of the methods shown in the above embodiments by executing the at least one computer program.
In an exemplary embodiment, a chip is also provided, the chip comprising programmable logic circuits and/or program instructions for implementing the binary tree partitioning method for an encoding unit of the above aspect when the chip is run on a computer device.
In an exemplary embodiment, a computer program product is also provided, the computer program product comprising computer instructions stored in a computer readable storage medium. The processor of a computer device reads and executes the computer instructions from the computer readable storage medium to implement the binary tree partitioning processing method for the coding unit provided in the above method embodiments.
In an exemplary embodiment, there is also provided a computer readable storage medium having stored therein a computer program that is loaded and executed by a processor to implement the binary tree partitioning processing method for an encoding unit provided by the above-described method embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Those skilled in the art will appreciate that in one or more of the examples described above, the functions described in the embodiments of the present application may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
The foregoing description of the preferred embodiments is merely exemplary in nature and is in no way intended to limit the invention; all modifications, equivalents, improvements, etc. that fall within the spirit and scope of the invention are intended to be included within the scope of the invention.

Claims (15)

1. A binary tree partitioning method for an encoding unit, the method comprising:
acquiring a characteristic vector of a coding unit CU; the feature vector is used for representing the image features of the CU and the neighborhood image features of the CU;
based on the feature vector, a first predicted value and a second predicted value are obtained; the first predicted value is used for indicating the probability that the binary tree division type of the CU is horizontal division; the second predicted value is used for indicating the probability that the binary tree partition type of the CU is vertical partition;
and skipping at least one partition flow of binary tree horizontal partition and binary tree vertical partition of the CU based on the first predicted value and the second predicted value.
2. The method of claim 1, wherein the skipping at least one of a binary tree horizontal partition and a binary tree vertical partition for the CU based on the first predictor and the second predictor comprises:
in response to the first predictor being greater than a first threshold and the second predictor being less than a second threshold, skipping binary tree vertical partitioning of the CU; the first threshold is greater than the second threshold;
In response to the second predictor being greater than the first threshold and the first predictor being less than the second threshold, binary tree horizontal partitioning of the CU is skipped.
3. The method according to claim 1 or 2, wherein the obtaining a first predicted value and a second predicted value based on the feature vector comprises:
inputting the feature vector into a network prediction model to obtain the first predicted value and the second predicted value output by the network prediction model;
the network prediction model is a machine learning model obtained by training a feature vector sample of a CU sample and an optimal binary tree division mode of the CU sample.
4. The method of claim 3, wherein prior to obtaining the first predicted value and the second predicted value based on the feature vector, further comprising:
acquiring the block size of the CU; the block size is used for dividing the CU into a plurality of categories, and training the network prediction model according to the categories respectively;
acquiring a category of the CU based on the block size;
and acquiring the network prediction model based on the category.
5. The method of claim 3, wherein the network prediction model comprises a first network prediction model and a second network prediction model;
The step of inputting the feature vector into a network prediction model to obtain the first predicted value and the second predicted value output by the network prediction model includes:
and respectively inputting the feature vector into the first network prediction model and the second network prediction model, and acquiring the first predicted value output by the first network prediction model and the second predicted value output by the second network prediction model.
6. The method of claim 1, wherein the feature vector comprises:
information of neighboring blocks of the CU, and information of the CU;
the information of the neighboring blocks includes at least one of: the CU depth of the neighboring block, the QT depth of the neighboring block, the BT depth of the neighboring block, the MT depth of the neighboring block, and the optimal partition type of the neighboring block; the neighboring block is an encoded block that is neighboring the CU and has been encoded;
the information of the CU includes at least one of: the depth of the CU, the frame level of the CU, the quantization coefficients of the CU, the depth of the co-located blocks of the CU, the gradient information of the CU, the variance information of the CU, and the shape information of the CU.
7. The method of claim 5, wherein the method further comprises:
inputting the feature vector sample into the network prediction model to obtain a binary tree partitioning mode prediction result of the CU sample, wherein the binary tree partitioning mode prediction result is output by the network prediction model; the binary tree partition mode prediction result comprises a first partition mode prediction result and a second partition mode prediction result; the first partition mode prediction result is used for indicating the probability that the binary tree partition type of the CU sample is horizontally partitioned, and the second partition mode prediction result is used for indicating the probability that the binary tree partition type of the CU sample is vertically partitioned;
acquiring a loss function value of the network prediction model based on an optimal binary tree partitioning mode of the CU sample and a prediction result of the binary tree partitioning mode;
and updating parameters of the network prediction model based on the loss function value of the network prediction model.
8. The method according to claim 7, wherein the inputting the feature vector samples into the network prediction model to obtain the binary tree partitioning mode prediction result of the CU samples output by the network prediction model includes:
Respectively inputting the feature vector sample into the first network prediction model and the second network prediction model, and obtaining the first division mode prediction result output by the first network prediction model and the second division mode prediction result output by the second network prediction model;
the obtaining the loss function value of the network prediction model based on the optimal binary tree partitioning method of the CU samples and the binary tree partitioning method prediction result includes:
calculating a first loss function value in the loss function values of the network prediction model based on the difference between the optimal binary tree partitioning mode of the CU sample and the prediction result of the first partitioning mode;
calculating a second loss function value in the loss function values of the network prediction model based on the difference between the optimal binary tree partitioning mode of the CU sample and the second partitioning mode prediction result;
the parameter updating of the network prediction model based on the loss function value of the network prediction model comprises the following steps:
based on the first loss function value, updating parameters of the first network prediction model;
And updating parameters of the second network prediction model based on the second loss function value.
9. A binary tree partitioning method for an encoding unit, the method comprising:
obtaining a feature vector sample of a CU sample and an optimal binary tree division mode of the CU sample;
inputting the feature vector sample into a network prediction model to obtain a binary tree division mode prediction result of the CU sample, wherein the binary tree division mode prediction result is output by the network prediction model; the binary tree partition mode prediction result comprises a first partition mode prediction result and a second partition mode prediction result; the first partition mode prediction result is used for indicating the probability that the binary tree partition type of the CU sample is horizontally partitioned, and the second partition mode prediction result is used for indicating the probability that the binary tree partition type of the CU sample is vertically partitioned;
acquiring a loss function value of the network prediction model based on an optimal binary tree partitioning mode of the CU sample and a prediction result of the binary tree partitioning mode;
based on the loss function value of the network prediction model, carrying out parameter updating on the network prediction model;
The network prediction model is used for outputting a first prediction value and a second prediction value based on the input characteristic vector of the CU; the first prediction value is used for indicating the probability that the partition type of the CU is horizontal partition; the second predictor is for indicating a probability that a partition type of the CU is a vertical partition.
10. The method according to claim 9, wherein the inputting the feature vector samples into the network prediction model to obtain the binary tree partitioning mode prediction result of the CU samples output by the network prediction model includes:
respectively inputting the feature vector sample into a first network prediction model and a second network prediction model, and obtaining the first division mode prediction result output by the first network prediction model and the second division mode prediction result output by the second network prediction model;
the obtaining the loss function value of the network prediction model based on the optimal binary tree partitioning method of the CU samples and the binary tree partitioning method prediction result includes:
calculating a first loss function value in the loss function values of the network prediction model based on the difference between the optimal binary tree partitioning mode of the CU sample and the prediction result of the first partitioning mode;
Calculating a second loss function value in the loss function values of the network prediction model based on the difference between the optimal binary tree partitioning mode of the CU sample and the second partitioning mode prediction result;
the parameter updating of the network prediction model based on the loss function value of the network prediction model comprises the following steps:
based on the first loss function value, updating parameters of the first network prediction model;
and updating parameters of the second network prediction model based on the second loss function value.
11. A binary tree partitioning processing apparatus for an encoding unit, said apparatus comprising:
the first acquisition module is used for acquiring the characteristic vector of the coding unit CU; the feature vector is used for representing the image features of the CU and the neighborhood image features of the CU;
the second obtaining module is used for obtaining a first predicted value and a second predicted value based on the feature vector; the first predicted value is used for indicating the probability that the binary tree division type of the CU is horizontal division; the second predicted value is used for indicating the probability that the binary tree partition type of the CU is vertical partition;
and the execution module is used for skipping at least one partition flow of binary tree horizontal partition and binary tree vertical partition of the CU based on the first predicted value and the second predicted value.
12. A binary tree partitioning processing apparatus for an encoding unit, said apparatus comprising:
the sample acquisition module is used for acquiring the feature vector samples of the CU samples and the optimal binary tree division mode of the CU samples;
the input/output module is used for inputting the feature vector sample into a network prediction model to obtain a binary tree division mode prediction result of the CU sample, wherein the binary tree division mode prediction result is output by the network prediction model; the binary tree partition mode prediction result comprises a first partition mode prediction result and a second partition mode prediction result; the first partition mode prediction result is used for indicating the probability that the binary tree partition type of the CU sample is horizontally partitioned, and the second partition mode prediction result is used for indicating the probability that the binary tree partition type of the CU sample is vertically partitioned;
the loss acquisition module is used for acquiring a loss function value of the network prediction model based on an optimal binary tree division mode of the CU sample and a binary tree division mode prediction result;
the parameter updating module is used for updating parameters of the network prediction model based on the loss function value of the network prediction model;
The network prediction model is used for outputting a first prediction value and a second prediction value based on the input characteristic vector of the CU; the first prediction value is used for indicating the probability that the partition type of the CU is horizontal partition; the second predictor is for indicating a probability that a partition type of the CU is a vertical partition.
13. A computer device comprising a processor and a memory storing at least one computer instruction that is loaded and executed by the processor to implement a binary tree partitioning method for an encoding unit according to any one of claims 1 to 10.
14. A computer readable storage medium having stored therein at least one computer instruction that is loaded and executed by a processor to implement a binary tree partitioning method for coding units according to any one of claims 1 to 10.
15. A computer program product, the computer program product comprising computer instructions stored in a computer readable storage medium; the computer instructions are read and executed by a processor of a computer device to implement a binary tree partitioning method for an encoding unit as claimed in any one of claims 1 to 10.
CN202410131178.3A 2024-01-31 2024-01-31 Binary tree partitioning processing method, equipment and storage medium for coding unit Pending CN117692663A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410131178.3A CN117692663A (en) 2024-01-31 2024-01-31 Binary tree partitioning processing method, equipment and storage medium for coding unit

Publications (1)

Publication Number Publication Date
CN117692663A true CN117692663A (en) 2024-03-12

Family

ID=90135638

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410131178.3A Pending CN117692663A (en) 2024-01-31 2024-01-31 Binary tree partitioning processing method, equipment and storage medium for coding unit

Country Status (1)

Country Link
CN (1) CN117692663A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113781588A (en) * 2021-07-01 2021-12-10 杭州未名信科科技有限公司 Intra-frame coding unit size dividing method based on neural network
CN114173120A (en) * 2021-12-03 2022-03-11 北京达佳互联信息技术有限公司 Video coding block division method and video coding block division prediction model training method
CN115695802A (en) * 2022-10-24 2023-02-03 杭州师范大学 Coding unit division method and device for accelerating video coding
CN115941960A (en) * 2022-09-28 2023-04-07 南华大学 Method for skipping CU partition between VVC frames in advance based on lightweight neural network
CN116634183A (en) * 2023-05-19 2023-08-22 上海大学 Fast inter-frame block dividing method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination