CN115243042A - Quantization parameter determination method and related device - Google Patents


Info

Publication number
CN115243042A
Authority
CN
China
Prior art keywords
frame
target
quantization parameter
sample
coding
Prior art date
Legal status
Pending
Application number
CN202210883402.5A
Other languages
Chinese (zh)
Inventor
张贤国
王诗淇
陈易
Current Assignee
Tencent Technology Shenzhen Co Ltd
City University of Hong Kong CityU
Original Assignee
Tencent Technology Shenzhen Co Ltd
City University of Hong Kong CityU
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd, City University of Hong Kong CityU filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202210883402.5A
Publication of CN115243042A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124 Quantisation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/149 Data rate or code amount at the encoder output by estimating the code amount by means of a model, e.g. mathematical model or statistical model
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/157 Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/159 Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Algebra (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application relates to the field of computer technology and provides a quantization parameter determination method and a related device for improving the accuracy of quantization parameters and improving coding performance. The method may be applied to a cloud server and includes the following steps: obtaining a target feature vector corresponding to a frame to be coded based on the content attribute of the frame to be coded, obtaining bit allocation information corresponding to the frame to be coded, and determining corresponding target coding bits based on the bit allocation information and a specified code rate; inputting the target feature vector into a trained coefficient prediction model to obtain a target correction coefficient corresponding to the frame to be coded; and obtaining a target quantization parameter based on the target correction coefficient and the target coding bits, and coding the frame to be coded based on the target quantization parameter to obtain a target coded frame.

Description

Quantization parameter determination method and related device
Technical Field
The application relates to the technical field of video coding and provides a quantization parameter determination method and a related device.
Background
Code rate control is an important component of video coding: it optimally allocates the currently remaining bits according to the channel state and dynamically adjusts quantization parameters according to the current state of the encoder, thereby providing the best video quality under limited bandwidth.
In the related art, the quantization parameter corresponding to a target frame is usually determined as follows: first, a target correction coefficient corresponding to the target frame is determined from the candidate correction coefficients according to the frame type and block level of the target frame; next, the quantization parameter corresponding to the target frame is calculated from the target allocated bits assigned to the target frame and the target correction coefficient; the target frame is then encoded with the calculated quantization parameter, and the candidate correction coefficients are updated according to the encoding result.
However, on one hand, since the target correction coefficient is fixed before encoding starts, it cannot adapt to different images, making the calculated quantization parameter inaccurate. On the other hand, although the candidate correction coefficients are updated according to the encoding result during encoding, the update lags behind and cannot be applied in advance when a scene change occurs, which also makes the calculated quantization parameter inaccurate; this reduces overall coding performance, and more bits are consumed at the same video quality.
Disclosure of Invention
The embodiment of the application provides a quantization parameter determination method and a related device, which are used for improving the accuracy of quantization parameters and improving the coding performance.
In a first aspect, an embodiment of the present application provides a quantization parameter determining method, including:
acquiring a frame to be coded, and obtaining a target feature vector corresponding to the frame to be coded and bit allocation information corresponding to the frame to be coded based on the content attribute of the frame to be coded;
determining corresponding target coding bits based on the bit allocation information and a specified code rate;
inputting the target feature vector into a trained coefficient prediction model to obtain a target correction coefficient corresponding to the frame to be encoded, wherein the coefficient prediction model is obtained by training on a training data set, each piece of training data comprises a sample frame and a corresponding reference correction coefficient, and the reference correction coefficient is determined after multi-round encoding is performed on the sample frame;
and obtaining a target quantization parameter based on the target correction coefficient and the target coding bits, and coding the frame to be coded based on the target quantization parameter to obtain a target coded frame.
In a second aspect, an embodiment of the present application provides a quantization parameter determining apparatus, including:
the device comprises an acquisition unit, a coding unit and a decoding unit, wherein the acquisition unit is used for acquiring a frame to be coded, acquiring a target characteristic vector corresponding to the frame to be coded based on the content attribute of the frame to be coded, and acquiring bit distribution information corresponding to the frame to be coded;
a bit allocation unit for determining a corresponding target coded bit based on the bit allocation information and a specified code rate;
a coefficient determining unit, configured to input the target feature vector into a trained coefficient prediction model to obtain a target correction coefficient corresponding to the frame to be encoded, where the coefficient prediction model is obtained by training based on a training data set, each training data includes a sample frame and a corresponding reference correction coefficient, and the reference correction coefficient is determined after performing multiple rounds of encoding on the sample frame;
and the coding unit is used for obtaining a target quantization parameter based on the target correction coefficient and the target coding bit, and coding the frame to be coded based on the quantization parameter to obtain a target coding frame.
As a possible implementation manner, when determining a target offset from the multiple candidate offsets based on the multiple encoding results, the training unit is specifically configured to:
determining a coding error corresponding to each of the candidate offsets based on the plurality of coding results;
and determining, from the plurality of candidate offsets, a candidate offset whose corresponding coding error meets a preset error condition, and taking the determined candidate offset as the target offset.
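Taking the preset error condition to be the smallest coding error (one natural reading of the condition above), the selection step can be sketched as follows; the function name and data shapes are illustrative assumptions, not the patent's interface:

```python
# Minimal sketch of selecting the target offset from measured coding errors,
# assuming the preset error condition is "smallest coding error".
def select_target_offset(coding_errors: dict[int, float]) -> int:
    # coding_errors maps each candidate offset to the error measured after
    # encoding the sample frame with that offset.
    return min(coding_errors, key=coding_errors.get)

# Hypothetical errors for candidate offsets -2..2; -1 would be selected.
print(select_target_offset({-2: 0.31, -1: 0.18, 0: 0.22, 1: 0.27, 2: 0.35}))
```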
As a possible implementation manner, when the target feature vector corresponding to the frame to be encoded is obtained based on the content attribute of the frame to be encoded, the training unit is specifically configured to:
dividing the frame to be coded into N non-overlapping blocks according to a preset block size, where N is a positive integer;
performing intra-frame prediction on each of the N blocks, and obtaining the target intra-frame prediction cost corresponding to each of the N blocks based on the intra-frame prediction results;
performing inter-frame prediction on each of the N blocks, and obtaining the target inter-frame prediction cost corresponding to each of the N blocks based on the inter-frame prediction results;
obtaining a plurality of feature attributes based on the target inter-frame prediction costs and target intra-frame prediction costs corresponding to the N blocks;
and obtaining the target feature vector corresponding to the frame to be coded based on the plurality of feature attributes.
As a possible implementation, the training unit is specifically configured to:
dividing the training data set into a training set, a verification set and a test set;
training the coefficient prediction model based on the training set to obtain a first error index;
training the coefficient prediction model based on the verification set to obtain a second error index;
if the second error index is larger than the first error index, performing model parameter adjustment on the coefficient prediction model based on the second error index, and training the adjusted coefficient prediction model based on the verification set until the second error index is not larger than the first error index;
and inputting the test set into the coefficient prediction model to obtain a third error index, and outputting the trained coefficient prediction model when the third error index is smaller than a preset threshold value.
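A minimal sketch of this train/verify/test procedure is given below, with a least-squares linear model standing in for the unspecified coefficient prediction model; the 60/20/20 split, the bounded retraining loop, and all names are assumptions of the sketch rather than the patent's specification:

```python
import numpy as np

def mse(w, X, y):
    # Mean squared error of a linear model with weights w.
    return float(np.mean((X @ w - y) ** 2))

def train_coefficient_predictor(X, y, threshold=0.05, seed=0):
    # X: (n_samples, n_features) feature matrix; y: reference correction
    # coefficients (the labels obtained from multi-round encoding).
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    tr = idx[: int(0.6 * len(X))]                    # training set
    va = idx[int(0.6 * len(X)): int(0.8 * len(X))]   # verification set
    te = idx[int(0.8 * len(X)):]                     # test set
    # Train on the training set and record the first error index.
    w, *_ = np.linalg.lstsq(X[tr], y[tr], rcond=None)
    first_err = mse(w, X[tr], y[tr])
    # While the verification (second) error exceeds the first, adjust the
    # model and retrain, as the procedure above describes (bounded here).
    second_err = mse(w, X[va], y[va])
    for _ in range(10):
        if second_err <= first_err:
            break
        w, *_ = np.linalg.lstsq(np.vstack([X[tr], X[va]]),
                                np.hstack([y[tr], y[va]]), rcond=None)
        second_err = mse(w, X[va], y[va])
    # The test set yields the third error index; output the model only if
    # that error is below the preset threshold.
    third_err = mse(w, X[te], y[te])
    return w if third_err < threshold else None
```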
In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the memory stores a computer program, and when the computer program is executed by the processor, the processor is caused to execute the steps of the quantization parameter determination method.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, which includes a computer program, and when the computer program runs on an electronic device, the computer program is configured to enable the electronic device to execute the steps of the quantization parameter determination method.
In a fifth aspect, the present application provides a computer program product, where the program product includes a computer program, where the computer program is stored in a computer-readable storage medium, and a processor of an electronic device reads the computer program from the computer-readable storage medium and executes the computer program, so that the electronic device executes the steps of the quantization parameter determination method.
In the embodiment of the application, a target feature vector corresponding to a frame to be coded is obtained based on the content attribute of the frame to be coded, bit allocation information corresponding to the frame to be coded is obtained, and the corresponding target coding bits are determined based on the bit allocation information and the specified code rate; the target feature vector is input into a trained coefficient prediction model to obtain a target correction coefficient corresponding to the frame to be coded; and a target quantization parameter is obtained based on the target correction coefficient and the target coding bits, and the frame to be coded is coded based on the target quantization parameter to obtain a target coded frame.
In this way, the machine learning algorithm solves the problem of the correction coefficient lagging behind, improving the accuracy of the quantization parameter and the coding performance. At the same time, the method adapts to different images, and the more accurate correction coefficient further improves the accuracy of the quantization parameter. In addition, obtaining training data by encoding the samples over multiple rounds improves the accuracy of the model labels and thus the prediction accuracy of the model.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application;
FIG. 2 is a schematic flow chart illustrating a method for determining a quantization parameter provided in an embodiment of the present application;
fig. 3 is a schematic flowchart of determining a target feature vector according to an embodiment of the present disclosure;
fig. 4 is a schematic flowchart of a process for acquiring a training data set provided in an embodiment of the present application;
FIG. 5 is a schematic flow chart illustrating a method for determining a quantization parameter provided in an embodiment of the present application;
fig. 6 is a schematic flowchart of a method for determining an offset of a target quantization parameter provided in an embodiment of the present application;
FIG. 7 is a schematic diagram of a candidate offset provided in an embodiment of the present application;
FIG. 8 is a schematic flow chart of a model training method provided in an embodiment of the present application;
FIG. 9A is a schematic diagram of a pending sequence provided in an embodiment of the present application;
FIG. 9B is a schematic illustration of a hierarchy provided in an embodiment of the present application;
FIG. 10 is a schematic diagram of another process for acquiring a training data set provided in an embodiment of the present application;
FIG. 11 is a schematic view of another process for determining a target feature vector in the embodiment of the present application;
FIG. 12 is a schematic structural diagram of a quantization parameter determination apparatus provided in an embodiment of the present application;
fig. 13 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments, but not all embodiments, of the technical solutions of the present application. All other embodiments obtained by a person skilled in the art without any inventive step based on the embodiments described in the present application are within the scope of the protection of the present application.
First, terms related to the present application are explained:
code rate control: and controlling the quantization parameter to make the output code rate of the output video equal to a given code rate in the video coding process.
Quantization parameter: an important parameter in the video encoding process that affects the number of coded bits and the coding quality.
Code rate: the number of bits transmitted per second during video transmission; for example, the code rate may be the total number of bits of the video divided by the duration of the video.
Bit-quantization parameter model: the statistically derived relationship between the number of coded bits and the quantization parameter, which can be represented by formula (1). In formula (1), E is a constant whose value is 2000000 when the frame type of the current frame is KEY_FRAME and 1500000 otherwise, R_tar is the target number of bits, q is the quantization parameter, and c is the correction coefficient:
q = c × E / R_tar    formula (1)
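As a small illustration, formula (1) can be evaluated as below, using the E values stated above; the function and argument names are assumptions for the sketch:

```python
# Bit-quantization parameter model of formula (1): q = c * E / R_tar.
def model_qp(c: float, target_bits: float, is_key_frame: bool) -> float:
    E = 2_000_000 if is_key_frame else 1_500_000  # constants given above
    return c * E / target_bits

print(model_qp(c=0.5, target_bits=40_000, is_key_frame=False))  # 18.75
```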
frame: an image.
Group of pictures (GOP): a set composed of multiple adjacent frames arranged in display order.
Key frame (KEY_FRAME): a frame encoded entirely in intra mode.
AOM-AV1: reference software of the AV1 standard.
Coding performance: the efficiency of video compression. At the same quality, the fewer the bits occupied, the higher the performance; at the same number of bits, the higher the quality, the higher the performance. Coding performance can be measured by the Bjøntegaard delta bitrate (BDBR), whose physical meaning is the proportion of code rate saved at the same quality; a negative value represents a performance improvement.
Code stream: a string of binary characters representing the compressed video.
Intra-frame prediction: a prediction mode that predicts the current block using pixels located above and to the left of the current block.
Inter-frame prediction: a prediction mode that searches a reference frame for the block most similar to the current block (the reference block) and uses it to predict the current block.
Motion vector: a vector (x, y), where x and y represent the number of pixels by which the current block is offset in the horizontal and vertical directions, respectively, relative to the reference block.
Cloud technology refers to a hosting technology for unifying series of resources such as hardware, software, and network in a wide area network or a local area network to realize calculation, storage, processing, and sharing of data.
Cloud technology is a general term for the network technology, information technology, integration technology, management platform technology, application technology, and the like applied in the cloud computing business model; it can form a resource pool that is used on demand, flexibly and conveniently. Cloud computing technology will become an important support: background services of technical network systems, such as video websites, picture websites, and many portal websites, require a large amount of computing and storage resources. With the development of the internet industry, each item may come to have its own identification mark that needs to be transmitted to a background system for logical processing; data at different levels are processed separately, and all kinds of industry data need strong system background support, which can only be realized through cloud computing.
Cloud computing is a computing model that distributes computing tasks across a large pool of computers, enabling various application systems to acquire computing power, storage space, and information services as needed. The network that provides the resources is referred to as the "cloud". To the user, resources in the "cloud" appear infinitely expandable, available at any time, used on demand, and paid for by use.
As a basic capability provider of cloud computing, a cloud computing resource pool (referred to as a cloud platform for short, and generally referred to as an Infrastructure as a Service (IaaS) platform) is established, and multiple types of virtual resources are deployed in the resource pool and are selected by external clients for use.
According to the logic function division, a Platform as a Service (PaaS) layer can be deployed on an Infrastructure as a Service (IaaS) layer, a Software as a Service (SaaS) layer is deployed on the PaaS layer, and the SaaS layer can be directly deployed on the IaaS layer. PaaS is a platform on which software runs, such as a database, a web container, etc. SaaS is a variety of business software, such as web portal, sms group sender, etc. Generally speaking, saaS and PaaS are upper layers relative to IaaS.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
The artificial intelligence technology is a comprehensive subject, and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
Machine Learning (ML) is a multi-domain cross subject, and relates to multi-domain subjects such as probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and the like. The method specially studies how a computer simulates or realizes the learning behavior of human beings so as to acquire new knowledge or skills and reorganize the existing knowledge structure to continuously improve the performance of the computer. Machine learning is the core of artificial intelligence, is the fundamental approach for computers to have intelligence, and is applied to all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and formal education learning. The scheme provided by the embodiment of the application relates to the machine learning technology of artificial intelligence, a coefficient prediction model is constructed through the machine learning technology, and then the coefficient prediction model is adopted to predict the correction coefficient.
Code rate control is an important component of video coding: it optimally allocates the currently remaining bits according to the channel state and dynamically adjusts quantization parameters according to the current state of the encoder, thereby providing the best video quality under limited bandwidth.
The rate control in AV1 usually includes the stages of pre-analysis, bit allocation, quantization parameter calculation, encoding, etc., where the quantization parameter calculation stage is usually implemented as follows:
First, a target correction coefficient corresponding to the target frame is determined from the candidate correction coefficients according to the frame type and block level of the target frame; next, the quantization parameter corresponding to the target frame is calculated with formula (1) from the target allocated bits assigned to the target frame and the target correction coefficient; the target frame is then encoded with the calculated quantization parameter. After encoding, the candidate correction coefficients are updated according to the encoding result.
However, on one hand, since the correction coefficients are fixed before encoding starts, they cannot adapt to different images, making the calculated quantization parameter inaccurate. On the other hand, although the candidate correction coefficients are updated according to the encoding result during encoding, the update lags behind and cannot be applied in advance when a scene change occurs, which also makes the calculated quantization parameter inaccurate; this reduces overall coding performance, and more bits are consumed at the same video quality.
In the embodiment of the application, a target feature vector corresponding to a frame to be coded is obtained based on the content attribute of the frame to be coded, bit allocation information corresponding to the frame to be coded is obtained, and the corresponding target coding bits are determined based on the bit allocation information and the specified code rate; the target feature vector is input into a trained coefficient prediction model to obtain the target correction coefficient corresponding to the frame to be coded; the target quantization parameter is obtained based on the target correction coefficient and the target coding bits, and the frame to be coded is coded based on the target quantization parameter to obtain the target coded frame.
Fig. 1 is a schematic diagram of an application scenario provided in the embodiment of the present application. The application scenario includes at least a terminal device 110 and a server 120. The number of the terminal devices 110 may be one or more, and the number of the servers 120 may also be one or more, and the number of the terminal devices 110 and the number of the servers 120 are not particularly limited in this application.
In this embodiment of the application, the terminal device 110 may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, an internet of things device, a smart home appliance, a vehicle-mounted terminal, and the like, but is not limited thereto.
The server 120 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud service, a cloud database, cloud computing, a cloud function, cloud storage, network service, cloud communication, middleware service, domain name service, security service, content Delivery Network (CDN), big data, and an artificial intelligence platform. The terminal device 110 and the server 120 may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.
Illustratively, a client corresponding to a video-related application is installed in the terminal device 110, and the video-related application includes, but is not limited to, a conference application, a video application, a live application, and the like.
It should be noted that the quantization parameter determination method mentioned in the present application may be applied to the terminal device 110, applied to the server 120, or executed jointly by the terminal device 110 and the server 120. Taking application to the server 120 as an example: the server 120 obtains a frame to be encoded, which may be any frame of a video to be encoded; the video to be encoded may be, but is not limited to, a live stream, a conference, or a TV series. The server 120 obtains the target feature vector corresponding to the frame to be encoded and the bit allocation information corresponding to the frame to be encoded, and determines the corresponding target coding bits based on the bit allocation information and a specified code rate. It then inputs the target feature vector into a trained coefficient prediction model to obtain the target correction coefficient corresponding to the frame to be encoded, obtains the target quantization parameter based on the target correction coefficient and the target coding bits, encodes the frame to be encoded based on the target quantization parameter to obtain the target encoded frame, and sends the target encoded frame to the terminal device 110.
Referring to fig. 2, which is a schematic flowchart of the quantization parameter determination method provided in this embodiment of the application, the method is applied to an electronic device, and the electronic device may be a terminal device or a server; the specific flow of the method is as follows:
s201, obtaining a frame to be coded, obtaining a target characteristic vector corresponding to the frame to be coded based on the content attribute of the frame to be coded, and obtaining bit distribution information corresponding to the frame to be coded.
In the embodiment of the present application, the frame to be encoded may be any frame in a video to be encoded.
Specifically, referring to fig. 3, based on the content attribute of the frame to be encoded, the target feature vector corresponding to the frame to be encoded is obtained, and the following steps may be adopted, but are not limited to:
s301, dividing a frame to be coded into non-overlapping N blocks according to a preset block size, wherein the value of N is a positive integer.
A frame to be coded may be divided into one or more maximum coding units, where the maximum coding unit is usually 128 × 128; in practice, each maximum coding unit may be further partitioned by depth into one or more coding units. The size of a coding unit is a preset block size.
The coding unit size includes, but is not limited to, the following: 4 × 4, 4 × 8, 8 × 4, 8 × 8, 8 × 16, 16 × 8, 16 × 16, 16 × 32, 32 × 16, 32 × 32, 32 × 64, 64 × 32, 64 × 128, 128 × 64, 128 × 128, 4 × 16, 16 × 4, 8 × 32, 32 × 8, 16 × 64, and 64 × 16. For example, if the preset block size is 16 × 16 and the size of the frame to be encoded is 128 × 128, the frame to be encoded is divided into 64 blocks of 16 × 16 each.
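As an illustration of S301 with the 16 × 16 example above, the sketch below splits a frame into N non-overlapping blocks; the treatment of frames whose dimensions are not multiples of the block size is an assumption here (partial blocks are simply dropped):

```python
import numpy as np

def split_into_blocks(frame: np.ndarray, bh: int = 16, bw: int = 16):
    # Walk the frame in block-size strides, collecting full blocks only.
    H, W = frame.shape[:2]
    return [frame[r:r + bh, c:c + bw]
            for r in range(0, H - bh + 1, bh)
            for c in range(0, W - bw + 1, bw)]

blocks = split_into_blocks(np.zeros((128, 128), dtype=np.uint8))
print(len(blocks))  # 64 blocks, matching the 128x128 example above
```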
S302, intra-frame prediction is respectively carried out on the N blocks, and target intra-frame prediction costs corresponding to the N blocks are obtained based on intra-frame prediction results.
Taking block x as an example (block x is any one of the N blocks): block x is predicted with each intra prediction mode to obtain the intra prediction results, and the intra prediction cost corresponding to each intra prediction mode is calculated from those results. The intra prediction mode whose calculated cost meets a preset cost condition is then taken as the target intra prediction mode of block x, and the intra prediction cost corresponding to the target intra prediction mode is taken as the target intra prediction cost corresponding to block x. The preset cost condition may be, but is not limited to, the minimum cost value. The intra prediction cost characterizes the difference between the original value of block x and the intra prediction result, which may also be called the predicted pixel value.
In the embodiment of the present application, the encoding strategy may include an intra prediction mode and an inter prediction mode. Taking AV1 as an example, the intra prediction modes include a directional prediction mode and a non-directional prediction mode.
The directional prediction modes may specifically include: angle prediction mode 1 (e.g., the V_PRED mode, for prediction in the vertical direction), angle prediction mode 2 (e.g., H_PRED, for prediction in the horizontal direction), angle prediction mode 3 (e.g., D45_PRED, for prediction along the 45-degree direction), angle prediction mode 4 (e.g., D135_PRED, for prediction along the 135-degree direction), angle prediction mode 5 (e.g., D113_PRED, for prediction along the 113-degree direction), angle prediction mode 6 (e.g., the D157_PRED mode, for prediction along the 157-degree direction), angle prediction mode 7 (e.g., the D203_PRED mode, for prediction along the 203-degree direction), and angle prediction mode 8 (e.g., the D67_PRED mode, for prediction along the 67-degree direction). Each base angle has 6 angle offsets: ±3 degrees, ±6 degrees, and ±9 degrees.
The non-directional prediction modes may include several modes, such as intra prediction mode 1 (e.g., the DC_PRED mode, suitable for large flat regions, predicting from the average of the left and/or upper reference pixels), intra prediction mode 2 (e.g., the SMOOTH_PRED mode, predicting with quadratic interpolation in the horizontal and vertical directions), intra prediction mode 3 (e.g., the SMOOTH_V_PRED mode, predicting with quadratic interpolation in the vertical direction), intra prediction mode 4 (e.g., the SMOOTH_H_PRED mode, predicting with quadratic interpolation in the horizontal direction), and intra prediction mode 5 (PAETH_PRED, predicting along the direction of minimum gradient). In addition, the intra prediction modes may further include a palette prediction mode and an intra block copy prediction mode.
It should be noted that, during intra prediction, the video encoder provided in the embodiment of the present application further refines the granularity of directional prediction, incorporates gradient and correlation into non-directional prediction, and fully exploits the uniformity of the luminance and chrominance signals.
Illustratively, for any intra-prediction mode, the intra-prediction cost can be calculated by using the following formula (2):
intra_error = ∑_i (intra_ori_i − intra_pred_i)    formula (2)
where intra_error represents the intra prediction cost, intra_ori_i represents the original value of the i-th pixel in block x, and intra_pred_i represents the predicted value of the i-th pixel in block x obtained with the intra prediction mode. Here, the original value denotes the original pixel value and the predicted value denotes the predicted pixel value.
Taking angle prediction mode 1 as an example: for a pixel P in block x, the position of the reference pixel is determined, according to the vertical prediction angle, from the already-coded pixel row above block x; the value of the reference pixel is then used as the predicted value of pixel P, and the intra prediction cost of pixel P is obtained from the original value of pixel P and its predicted value. In this way, for angle prediction mode 1, the intra prediction cost corresponding to each pixel in block x can be obtained, and the intra prediction cost corresponding to block x is then obtained from the per-pixel costs.
Suppose there are 10 intra prediction modes. Block x is predicted with each of the 10 modes to obtain the corresponding intra prediction results, and the intra prediction costs calculated from those results are: 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, and 1. The intra prediction cost 0.1 is then taken as the target intra prediction cost corresponding to block x.
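The per-mode cost computation and minimum-cost selection can be sketched as follows. Formula (2) as printed sums signed differences; the sketch assumes the conventional sum of absolute differences (SAD), and the mode names and toy data are illustrative:

```python
import numpy as np

def intra_cost(block: np.ndarray, pred: np.ndarray) -> float:
    # SAD interpretation of the summed difference in formula (2).
    return float(np.abs(block.astype(np.int64) - pred.astype(np.int64)).sum())

def target_intra_cost(block, mode_predictions):
    # mode_predictions: mapping from intra mode name to its predicted block.
    costs = {mode: intra_cost(block, p) for mode, p in mode_predictions.items()}
    best = min(costs, key=costs.get)  # preset cost condition: minimum cost
    return best, costs[best]

# Toy usage with two hypothetical modes on a flat 4x4 block.
blk = np.full((4, 4), 128, dtype=np.uint8)
preds = {"DC_PRED": np.full((4, 4), 127, dtype=np.uint8),
         "V_PRED": np.full((4, 4), 120, dtype=np.uint8)}
print(target_intra_cost(blk, preds))  # ('DC_PRED', 16.0)
```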
And S303, performing inter-frame prediction on the N blocks respectively, and obtaining target inter-frame prediction costs corresponding to the N blocks respectively based on inter-frame prediction results.
Inter-frame prediction mainly exploits the temporal correlation of video, using pixels in adjacent, already-coded images to predict the pixels of the current image, so as to effectively remove temporal redundancy and save the bits spent coding residual data.
The inter prediction mode may include a single reference frame mode and a combined reference frame mode. The single reference frame mode may include a motion estimation mode and a non-motion estimation mode, among others.
The motion estimation mode may be the NEWMV mode, which requires transmission of a motion vector difference (MVD) between the unit to be encoded and the prediction unit during encoding.
The non-motion estimation modes may include non-motion estimation mode 1 (e.g., the NEARESTMV mode), non-motion estimation mode 2 (e.g., the NEARMV mode), and non-motion estimation mode 3 (e.g., the GLOBALMV mode). The motion vectors (MVs) of prediction blocks in the NEARESTMV and NEARMV modes are derived from surrounding block information, so no residual transmission is required; the motion vectors of prediction blocks in the GLOBALMV mode are derived from the global motion.
The combined reference frame modes may include combined reference frame mode 1 (e.g., the NEAREST_NEARESTMV mode), combined reference frame mode 2 (e.g., the NEAR_NEARMV mode), combined reference frame mode 3 (e.g., the NEAREST_NEWMV mode), combined reference frame mode 4 (e.g., the NEW_NEARESTMV mode), combined reference frame mode 5 (e.g., the NEAR_NEWMV mode), combined reference frame mode 6 (e.g., the NEW_NEARMV mode), combined reference frame mode 7 (e.g., the GLOBAL_GLOBALMV mode), and combined reference frame mode 8 (e.g., the NEW_NEWMV mode).
In this embodiment of the application, the inter prediction cost is used to represent the difference between the original value of block x and the inter prediction result, and the target inter prediction cost includes one or both of a first target inter prediction cost and a second target inter prediction cost.
Specifically, the first target inter-frame prediction cost may be obtained by:
and taking LAST _ FRAME of the FRAME to be coded as a target reference FRAME, and performing inter-FRAME prediction on the block x, wherein the LAST _ FRAME represents the FRAME which is closest to the FRAME to be coded in the video to be coded, and the FRAME number of the LAST _ FRAME is smaller than that of the reference FRAME of the FRAME to be coded. Specifically, for each inter-frame prediction mode, the block x is predicted respectively to obtain a corresponding inter-frame prediction result, first inter-frame prediction costs corresponding to each inter-frame prediction mode are calculated according to the inter-frame prediction results, then, the inter-frame prediction mode corresponding to the first inter-frame prediction cost with the smallest value in the calculated first inter-frame prediction costs is used as a first target inter-frame prediction mode of the block x, and the first inter-frame prediction cost corresponding to the first target inter-frame prediction mode is used as a first target inter-frame prediction cost corresponding to the block x.
It should be noted that the first inter-FRAME prediction cost is used to represent a difference between an original value of the block x and an inter-FRAME prediction result when the block x is inter-predicted by using LAST _ FRAME of a FRAME to be encoded as a reference FRAME.
After the first target inter-frame prediction mode corresponding to the block x is determined, the motion vector mv corresponding to the block x is recorded last ,mv last Indicates the motion vector corresponding to the block x in the first target inter prediction mode, i.e., the number of pixels of the block x in the first target inter prediction mode that are shifted in the horizontal and vertical directions from the reference frame.
For example, for each inter prediction mode, the first inter prediction cost may be calculated by the following formula (3):
coded_error = ∑_i (coded_ori_i − coded_pred_i)    formula (3)
where coded_error represents the first inter prediction cost, coded_ori_i represents the original value of the i-th pixel in block x, and coded_pred_i represents the predicted value of the i-th pixel in block x obtained with the inter prediction mode when LAST_FRAME is used as the reference frame.
Specifically, the second target inter prediction cost may be obtained as follows:
GOLDEN_FRAME of the frame to be coded is taken as the target reference frame, and inter prediction is performed on block x; GOLDEN_FRAME denotes the I frame or generalized B frame (GPB) whose frame number is smaller than that of the frame to be coded. Specifically, block x is predicted with each inter prediction mode, the second inter prediction cost corresponding to each inter prediction mode is calculated from the inter prediction results, the inter prediction mode corresponding to the smallest calculated second inter prediction cost is taken as the second target inter prediction mode of block x, and the second inter prediction cost corresponding to the second target inter prediction mode is taken as the second target inter prediction cost corresponding to block x.
It should be noted that the second inter prediction cost represents the difference between the original value of block x and the inter prediction result when GOLDEN_FRAME of the frame to be encoded is used as the reference frame.
After the second target inter prediction mode corresponding to block x is determined, the motion vector mv_golden corresponding to block x is recorded; mv_golden denotes the motion vector corresponding to block x in the second target inter prediction mode, i.e., the number of pixels by which block x is offset in the horizontal and vertical directions relative to the reference frame in that mode.
Illustratively, for each inter prediction mode, the second inter prediction cost may be calculated by the following formula (4):
sr_coded_error = ∑_i (sr_coded_ori_i − sr_coded_pred_i)    formula (4)
where sr_coded_error represents the second inter prediction cost, sr_coded_ori_i is the original value of the i-th pixel in block x, and sr_coded_pred_i is the predicted value of the i-th pixel in block x obtained with the inter prediction mode when GOLDEN_FRAME is used as the reference frame.
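Under the same SAD assumption as above, a sketch of S303 follows: the first and second target inter prediction costs are the minimum per-mode costs with LAST_FRAME and GOLDEN_FRAME as the reference frame, respectively; motion search itself is omitted:

```python
import numpy as np

def inter_cost(block: np.ndarray, pred: np.ndarray) -> float:
    # SAD interpretation of the summed differences in formulas (3) and (4).
    return float(np.abs(block.astype(np.int64) - pred.astype(np.int64)).sum())

def target_inter_costs(block, preds_last, preds_golden):
    # preds_last / preds_golden: per-mode predicted blocks obtained with
    # LAST_FRAME and GOLDEN_FRAME as the reference frame, respectively.
    coded_error = min(inter_cost(block, p) for p in preds_last.values())
    sr_coded_error = min(inter_cost(block, p) for p in preds_golden.values())
    return coded_error, sr_coded_error
```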
It should be noted that in the implementation of the present application there are 7 reference frames across the 4 single-reference-frame prediction modes: LAST_FRAME, LAST2_FRAME, LAST3_FRAME, GOLDEN_FRAME, BWDREF_FRAME, ALTREF2_FRAME, and ALTREF_FRAME. LAST2_FRAME denotes the second-closest frame to the frame to be encoded whose frame number is smaller than it (forward reference); LAST3_FRAME denotes the third-closest such frame (forward reference); BWDREF_FRAME denotes the closest frame whose frame number is greater than that of the frame to be encoded (backward reference); ALTREF2_FRAME denotes the second-closest such frame (backward reference); and ALTREF_FRAME denotes the third-closest such frame (backward reference).
There are 16 reference frame combinations across the 8 combined-reference-frame prediction modes: {LAST_FRAME, ALTREF_FRAME}, {LAST2_FRAME, ALTREF_FRAME}, {LAST3_FRAME, ALTREF_FRAME}, {GOLDEN_FRAME, ALTREF_FRAME}, {LAST_FRAME, BWDREF_FRAME}, {LAST2_FRAME, BWDREF_FRAME}, {LAST3_FRAME, BWDREF_FRAME}, {GOLDEN_FRAME, BWDREF_FRAME}, {LAST_FRAME, ALTREF2_FRAME}, {LAST2_FRAME, ALTREF2_FRAME}, {LAST3_FRAME, ALTREF2_FRAME}, {GOLDEN_FRAME, ALTREF2_FRAME}, {LAST_FRAME, LAST2_FRAME}, {LAST_FRAME, LAST3_FRAME}, {LAST_FRAME, GOLDEN_FRAME}, and {BWDREF_FRAME, ALTREF_FRAME}.
The above uses LAST_FRAME and GOLDEN_FRAME as the target reference frames for illustration; in practical applications, any reference frame or reference frame combination may be used when computing the first target inter prediction cost or the second target inter prediction cost.
S304, obtaining a plurality of characteristic attributes based on the target inter-frame prediction cost and the target intra-frame prediction cost corresponding to the N blocks respectively.
Taking a vector dimension of 12 for the target feature vector corresponding to the frame to be encoded as an example, 12 feature attributes are correspondingly obtained based on the target inter prediction cost and target intra prediction cost corresponding to each of the N blocks.
The target feature vector is denoted by the feature vector X. As shown in Table 1, the physical meanings of the 12 feature attributes X[0] to X[11] are as follows:
TABLE 1 Feature attributes
X[0]: the number of blocks N
X[1]: a sum of the target intra prediction costs of the N blocks (formula (5))
X[2]: the ratio of the summed minimum of the intra and first inter prediction costs to X[1] (formula (6))
X[3]: the ratio of the summed minimum of the intra and second inter prediction costs to X[1] (formula (7))
X[4]: num1/X[1], where num1 counts blocks whose intra cost exceeds an inter cost (formula (8))
X[5]: num2/X[1], where num2 counts blocks whose intra cost exceeds an inter cost with a nonzero motion vector (formula (9))
X[6]: num3/X[1], where num3 counts blocks whose second inter cost is the smallest cost (formula (10))
X[7]: num4/X[1], where num4 counts blocks whose intra cost is 0 (formula (11))
X[8]: the mean of the horizontal components of the motion vectors in set s (formula (12))
X[9]: the mean of the vertical components of the motion vectors in set s (formula (13))
X[10]: the mean of the absolute horizontal components of the motion vectors in set s (formula (14))
X[11]: the mean of the absolute vertical components of the motion vectors in set s (formula (15))
It should be noted that, in the embodiment of the present application, when the feature vector X is initialized, each of its components is set to a first value, for example 0.
The calculation procedures of the 12 feature attributes X[0] to X[11] are described below.
X[0] is N.
X[1] can be calculated using formula (5):
X[1] = ∑_{i=1}^{N} intra_error_i    formula (5)
where intra_error_i represents the target intra prediction cost of the i-th block among the N blocks.
X[2] can be calculated using formula (6):
X[2] = ∑_{i=1}^{N} min(intra_error_i, coded_error_i) / X[1]    formula (6)
where intra_error_i represents the target intra prediction cost of the i-th block among the N blocks, coded_error_i represents the first target inter prediction cost of the i-th block, and min() is the minimum function.
X[3] can be calculated using formula (7):
X[3] = ∑_{i=1}^{N} min(intra_error_i, sr_coded_error_i) / X[1]    formula (7)
where intra_error_i represents the target intra prediction cost of the i-th block among the N blocks, sr_coded_error_i represents the second target inter prediction cost of the i-th block, and min() is the minimum function.
X[4] can be calculated using formula (8):
X[4] = num1 / X[1]    formula (8)
where num1 represents the number of blocks among the N blocks satisfying condition 1 or condition 2, condition 1 being intra_error > coded_error and condition 2 being intra_error > sr_coded_error.
X[5] can be calculated using formula (9):
X[5] = num2 / X[1]    formula (9)
where num2 represents the number of blocks among the N blocks satisfying condition 3 or condition 4, condition 3 being intra_error > coded_error with ||mv_last|| > 0, and condition 4 being intra_error > sr_coded_error with ||mv_golden|| > 0.
X[6] can be calculated using formula (10):
X[6] = num3 / X[1]    formula (10)
where num3 represents the number of blocks among the N blocks satisfying condition 5, condition 5 being intra_error > sr_coded_error and coded_error > sr_coded_error.
X[7] can be calculated using formula (11):
X[7] = num4 / X[1]    formula (11)
where num4 represents the number of blocks among the N blocks satisfying condition 6, condition 6 being intra_error = 0.
In the embodiment of the application, a set s of motion vectors, initially empty, is established; when the inter prediction cost is less than the intra prediction cost, the inter prediction motion vector is stored. Specifically, the mv_last of each block satisfying condition 7 is added to the set s, and for blocks not satisfying condition 7, the mv_golden of each block satisfying condition 8 is added to the set s.
X[8] can be calculated using formula (12):
X[8] = (1/M) ∑_{i=1}^{M} mv_{i,x}    formula (12)
where mv_{i,x} is the value of x in the i-th motion vector (x, y) in s, and M is the number of motion vectors in s.
X[9] can be calculated using formula (13):
X[9] = (1/M) ∑_{i=1}^{M} mv_{i,y}    formula (13)
where mv_{i,y} is the value of y in the i-th motion vector (x, y) in s, and M is the number of motion vectors in s.
X[10] can be calculated using formula (14):
X[10] = (1/M) ∑_{i=1}^{M} |mv_{i,x}|    formula (14)
where mv_{i,x} is the value of x in the i-th motion vector (x, y) in s, and M is the number of motion vectors in s.
X[11] can be calculated using formula (15):
X[11] = (1/M) ∑_{i=1}^{M} |mv_{i,y}|    formula (15)
where mv_{i,y} is the value of y in the i-th motion vector (x, y) in s, and M is the number of motion vectors in s.
S305, obtaining a target feature vector corresponding to the frame to be coded based on the multiple feature attributes.
Specifically, the target feature vector is represented by the feature vector X and, as shown in Table 1, can be expressed as {X[0], X[1], X[2], X[3], X[4], X[5], X[6], X[7], X[8], X[9], X[10], X[11]}.
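Putting formulas (5) through (15) together, the sketch below assembles the 12-dimensional feature vector from per-block costs and motion vectors. The exact forms of formulas (5)-(7) and (12)-(15), and of conditions 7 and 8, appear only as images or references in the source, so the sums, ratios, means, and "inter cost below intra cost" conditions used here are assumptions:

```python
import numpy as np

def build_feature_vector(intra, coded, sr_coded, mv_last, mv_golden):
    # intra / coded / sr_coded: per-block costs (length N); mv_*: (N, 2)
    # motion vectors. Assumes X[1] > 0.
    intra, coded, sr_coded = map(np.asarray, (intra, coded, sr_coded))
    mv_last, mv_golden = np.asarray(mv_last), np.asarray(mv_golden)
    X = np.zeros(12)
    X[0] = len(intra)
    X[1] = intra.sum()                                       # formula (5)
    X[2] = np.minimum(intra, coded).sum() / X[1]             # formula (6)
    X[3] = np.minimum(intra, sr_coded).sum() / X[1]          # formula (7)
    c1, c2 = intra > coded, intra > sr_coded                 # conditions 1, 2
    X[4] = np.sum(c1 | c2) / X[1]                            # formula (8)
    nz_last = np.linalg.norm(mv_last, axis=1) > 0
    nz_golden = np.linalg.norm(mv_golden, axis=1) > 0
    X[5] = np.sum((c1 & nz_last) | (c2 & nz_golden)) / X[1]  # formula (9)
    X[6] = np.sum(c2 & (coded > sr_coded)) / X[1]            # formula (10)
    X[7] = np.sum(intra == 0) / X[1]                         # formula (11)
    # Set s: mv_last for blocks meeting condition 7, else mv_golden for
    # blocks meeting condition 8 (both conditions assumed to mean "inter
    # cost below intra cost" for the respective reference frame).
    s = np.vstack([mv_last[c1], mv_golden[~c1 & c2]])
    if len(s):
        X[8], X[9] = s.mean(axis=0)               # formulas (12), (13)
        X[10], X[11] = np.abs(s).mean(axis=0)     # formulas (14), (15)
    return X
```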
It should be noted that, in the embodiment of the present application, S302 may be executed first, and then S303 is executed, or S303 may be executed first, and then S302 is executed, which is not limited herein.
S202, determining the corresponding target coding bits based on the bit allocation information and the specified code rate.
In the embodiment of the present application, the bit allocation information refers to the variables relevant to bit allocation. In AV1, bit allocation includes bit allocation for the key-frame group of pictures, bit allocation for key frames, bit allocation for groups of pictures, and bit allocation for each frame in a group of pictures. The embodiment of the application mainly concerns the bit allocation of each frame in a group of pictures; specifically, the bit allocation information includes the frame rate, and the target coding bits allocated to the frame to be coded are determined based on the bit allocation information and the specified code rate. The frame rate refers to the number of frames transmitted per second.
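At its simplest, the per-frame target of S202 follows from the specified code rate and the frame rate carried in the bit allocation information, as sketched below; real AV1 allocation additionally weights frames by their role in the group of pictures:

```python
# Average bits available per frame given a specified code rate (bits/s)
# and the frame rate (frames/s) from the bit allocation information.
def target_coding_bits(code_rate_bps: float, frame_rate: float) -> float:
    return code_rate_bps / frame_rate

print(target_coding_bits(2_000_000, 30.0))  # about 66667 bits per frame
```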
S203, inputting the target feature vector into the trained coefficient prediction model to obtain the target correction coefficient corresponding to the frame to be coded, where the coefficient prediction model is obtained by training on a training data set, each piece of training data comprises a sample frame and a corresponding reference correction coefficient, and the reference correction coefficient is determined after multi-round coding is performed on the sample frame.
S204, obtaining a target quantization parameter based on the target correction coefficient and the target coding bits, and coding the frame to be coded based on the target quantization parameter to obtain a target coded frame.
Specifically, the target quantization parameter may be calculated using formula (1), that is, target quantization parameter = target correction coefficient × E / target coding bits.
In the process of coding the frame to be coded based on the target quantization parameter to obtain the target coded frame, the frame to be coded is first quantized based on the target quantization parameter to obtain quantized data, and the quantized data is then coded to obtain the target coded frame.
The principle of quantization is to divide the transformed matrix by a constant, which may be called the quantization parameter; the quantization parameter indicates the degree of quantization refinement in the quantization processing stage. When the QP value is large, coefficients over a larger value range are quantized to the same output, which generally brings larger distortion and a lower code rate; conversely, when the QP value is small, coefficients over a smaller value range are quantized to the same output, so the distortion is usually smaller and the code rate correspondingly higher.
After the frame to be coded is quantized with the target quantization parameter, the resulting quantized data is coded. The coding process may use, but is not limited to, entropy coding or statistical coding.
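Conceptually, the two stages of S204 look like the sketch below; the step size passed in and the byte serialization standing in for entropy coding are placeholders, not AV1's actual quantization tables or coder:

```python
import numpy as np

def quantize(coeffs: np.ndarray, qstep: float) -> np.ndarray:
    # A larger step maps a wider range of coefficients to the same output,
    # matching the distortion/rate behavior described above.
    return np.round(coeffs / qstep).astype(np.int32)

def encode_frame(coeffs: np.ndarray, qstep: float) -> bytes:
    quantized = quantize(coeffs, qstep)
    return quantized.tobytes()  # stand-in for entropy/statistical coding
```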
In the implementation of the application, the machine learning algorithm solves the problem of the correction coefficient lagging behind, improving the accuracy of the quantization parameter and the coding performance. At the same time, the method adapts to different images, and the more accurate correction coefficient further improves the accuracy of the quantization parameter. In addition, obtaining training data by encoding the samples over multiple rounds improves the accuracy of the model labels and thus improves the model prediction accuracy.
Next, a model training process according to the present application is described, which includes a sample acquisition phase and a model training phase.
In the sample collection stage, in order to obtain better-performing model labels, in the implementation of the application, for each sample frame the target quantization parameter offset corresponding to the sample frame can be determined through multiple encodings, and the corresponding reference correction coefficient is then determined according to the target quantization parameter offset.
Specifically, referring to fig. 4, a specific flow of a process for acquiring a training data set provided in the embodiment of the present application is as follows:
S401, acquiring a sample sequence including the sample frames, and, based on the position of each sample frame in a Group of Pictures (GOP), determining the layer corresponding to each sample frame according to the preset correspondence between positions and layers.
In this embodiment of the application, the sample sequence may be any sequence in a standard test set. For example, 6 sequences are selected from the standard test set: FoodMarket4, CatRobot, BasketballDrive, PartyScene, BQSquare, and KristenAndSara; any one of these 6 sequences can be used as a sample sequence. Thus, each sequence selected from the standard test set can be used as a sample sequence and processed through S401 to S404 to obtain the training data set.
The position of each sample frame in the GOP can be understood as its display order in the GOP. The positions can be represented by sequence numbers; since the GOP is usually composed of 17 frames, the sequence numbers run from 0 to 16.
In the embodiment of the present application, 6 layers are taken as an example for description: layer 0, layer 1, layer 2, layer 3, layer 4, and layer 5 (also referred to as the 0th through 5th layers).
For example, referring to fig. 5, the frames included in the GOP are, in order, IBBBBBBBBBBBBBBBB; the I frame has sequence number 0 and is located at layer 0, frame 16 is at layer 1, frame 8 is at layer 2, frames 4 and 12 are at layer 3, frames 2, 6, 10, and 14 are at layer 4, and frames 1, 3, 5, 7, 9, 11, 13, and 15 are at layer 5.
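For illustration, the position-to-layer correspondence of fig. 5 can be written as a lookup table; the names below are illustrative:

```python
# Display-order position in the 17-frame GOP -> layer, mirroring fig. 5
POSITION_TO_LAYER = {
    0: 0,                                   # I frame
    16: 1,
    8: 2,
    4: 3, 12: 3,
    2: 4, 6: 4, 10: 4, 14: 4,
    1: 5, 3: 5, 5: 5, 7: 5, 9: 5, 11: 5, 13: 5, 15: 5,
}

def layer_of(position_in_gop: int) -> int:
    """Layer of a sample frame, given its display position in the GOP."""
    return POSITION_TO_LAYER[position_in_gop]
```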
S402, performing multiple rounds of coding on each sample frame based on the initial quantization parameter preset for the sample sequence and the preset initial quantization parameter offset corresponding to each layer, to obtain the target quantization parameter offset corresponding to each layer.
In the embodiment of the present application, for each layer, a sample frame may be encoded according to multiple candidate offsets, and the quantization parameter offset with the best coding performance is determined. Specifically, as shown in fig. 6, when S402 is executed, the following steps are performed for each layer:
S601, determining multiple candidate offsets corresponding to layer L based on the initial quantization parameter offset corresponding to layer L, a preset offset step length, and a preset number of offsets.
Here, layer L may be any one of layer 0 through layer 5. The initial quantization parameter offsets corresponding to the layers may be the same or different; this is not limited here, and the description below takes the case where they are the same as an example.
Assuming the preset offset step is 1, the preset number of offsets is 10, and the initial quantization parameter offset corresponding to layer L is -5, then 11 candidate offsets corresponding to layer L are determined, namely: -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5. Hereinafter, Δq_L denotes a candidate offset corresponding to layer L; the candidate offsets corresponding to layers 0 through 5 may be written as Δq0, Δq1, Δq2, Δq3, Δq4, and Δq5.
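Enumerating the candidate offsets from the initial offset, the offset step length, and the number of offsets can be sketched as follows (the helper name is illustrative):

```python
def candidate_offsets(initial_offset=-5, step=1, times=10):
    """Initial offset -5, step 1, 10 offsets -> the 11 candidates -5..5."""
    return [initial_offset + k * step for k in range(times + 1)]

assert candidate_offsets() == [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5]
```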
And S602, respectively coding at least one sample frame corresponding to the layer L based on an initial quantization parameter preset for the sample sequence and multiple candidate offsets to obtain multiple coding results.
Specifically, when S602 is executed, the following steps may be adopted:
And S6021, determining the initial quantization parameter corresponding to each of the at least one sample frame based on the initial quantization parameter preset for the sample sequence.
The initial quantization parameter preset for the sample sequence is denoted quantization parameter Q, and the initial quantization parameter corresponding to a sample frame is denoted quantization parameter q. Illustratively, the quantization parameter Q has a value of 128.
The quantization parameter q corresponding to each sample frame is then calculated from the quantization parameter Q in the manner of AOM-AV1.
And S6022, respectively coding at least one sample frame based on the multiple candidate offsets and the initial quantization parameters corresponding to the at least one sample frame to obtain multiple coding results.
In particular, for a sample frame y, when the sample frame y is coded based on a candidate offset Δq_L, the quantization parameter used can be determined as q + Δq_L, where q is the initial quantization parameter corresponding to y; the sample frame y is then encoded based on the quantization parameter q + Δq_L. The sample frame y is any one of the at least one sample frame corresponding to layer L.
For example, referring to fig. 7, the 11 candidate offsets are -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, and 5. In the 1st coding pass, Δq_L is -5 and the sample frame y is coded with quantization parameter q - 5 to obtain coding result 1; in the 2nd pass, Δq_L is -4 and y is coded with q - 4 to obtain coding result 2; in the 3rd pass, Δq_L is -3 and y is coded with q - 3 to obtain coding result 3; in the 4th pass, Δq_L is -2 and y is coded with q - 2 to obtain coding result 4; and similarly, the sample frame y is coded 11 times in total to obtain the coding result of each of the 11 passes.
Through this implementation, the sequence can be coded 11 times according to the candidate offset values -5 to 5, and determining the target offset from the coding results improves the accuracy of the target offset, so that the target offset is the offset with the best performance. In addition, in the embodiment of the present application, to improve data processing efficiency, the offsets of the sample frames belonging to the other layers may be held fixed while the offset of the current layer is varied.
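S601 and S602 for one layer can be sketched as below, reusing the candidate_offsets helper above; `encode` and `coding_error` stand in for the actual encoder and its error measure, which this sketch does not implement, and the offsets of the other layers are assumed to stay at their current fixed values during the sweep.

```python
def sweep_layer(layer_frames, q_of, encode, coding_error):
    """One sweep of S601-S602 for a single layer: encode the layer's frames
    once per candidate offset and record the coding error per candidate.

    q_of(frame) returns the frame's initial quantization parameter q;
    the offsets of all other layers stay fixed during the sweep."""
    errors = {}
    for dq in candidate_offsets():  # -5 .. 5
        coded = [encode(frame, q_of(frame) + dq) for frame in layer_frames]
        errors[dq] = coding_error(coded)
    return errors
```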
And S603, determining a target offset from the multiple candidate offsets based on the multiple coding results, and taking the target offset as the target quantization parameter offset corresponding to layer L.
Specifically, when S603 is executed, the coding error corresponding to each candidate offset may be determined based on the multiple coding results; a candidate offset whose coding error meets a preset error condition is then determined from the multiple candidate offsets and used as the target offset.
The preset error condition may be any one of the following conditions:
Condition A: the smallest coding error among all the coding errors.
For example, suppose the coding errors corresponding to the 11 candidate offsets include 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, and 0.9; the candidate offset with the smallest coding error is determined from the 11 candidate offsets. If the candidate offset corresponding to the coding error 0.1 is 5, the candidate offset 5 is taken as the target offset.
Condition B: the coding error closest to the average of all the coding errors.
For example, suppose the coding errors corresponding to the 11 candidate offsets include 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, and 0.9, whose average is 0.5; the coding error closest to the average is 0.5. The corresponding candidate offset is determined from the 11 candidate offsets; if the candidate offset corresponding to the coding error 0.5 is 4, the candidate offset 4 is taken as the target offset.
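Both error conditions can be expressed compactly; `errors` is the per-candidate mapping produced by a sweep such as the one sketched earlier:

```python
def select_target_offset(errors, condition="A"):
    """errors: {candidate_offset: coding_error}.

    Condition A: the candidate with the smallest coding error.
    Condition B: the candidate whose error is closest to the mean error."""
    if condition == "A":
        return min(errors, key=errors.get)
    mean_error = sum(errors.values()) / len(errors)
    return min(errors, key=lambda dq: abs(errors[dq] - mean_error))
```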
Through this implementation, the target offset is the offset with the best coding performance, and when the correction coefficient of a sample frame is further determined, the resulting sample correction coefficient is likewise the correction coefficient under the best coding performance. Therefore, the correction coefficient output by the model can, to a certain extent, ensure better coding performance for the frame to be coded, which improves the model prediction accuracy.
In the embodiment of the present application, while layer L is being processed, the frames corresponding to the layers other than layer L are also encoded in each coding pass; when a frame of a layer other than layer L is encoded, its quantization parameter is q + Δq_other, where Δq_other is the fixed offset currently assigned to that other layer.
And S403, determining sample quantization parameters of corresponding sample frames based on initial quantization parameters preset for the sample sequence and target quantization parameter offsets corresponding to the respective layers.
Specifically, based on the quantization parameter Q, the quantization parameter q corresponding to each sample frame is calculated in the manner of AOM-AV1. Then, the target quantization parameter offset corresponding to each layer is taken as the target quantization parameter offset of the sample frames of that layer, and the sample quantization parameter corresponding to each sample frame is determined from its quantization parameter q and its target quantization parameter offset.
Taking the sample frame y as an example: the layer corresponding to the sample frame y is layer L, so the target quantization parameter offset Δq_L corresponding to layer L is taken as the target quantization parameter offset of the sample frame y; the sample quantization parameter corresponding to the sample frame y is then determined as q + Δq_L from the quantization parameter q corresponding to y and the target quantization parameter offset Δq_L.
S404, obtaining reference correction coefficients corresponding to the sample frames based on the sample quantization parameters corresponding to the sample frames, and obtaining a training data set based on the sample frames and the corresponding reference correction coefficients.
Still taking the sample frame y as an example: when S404 is executed, the sample frame y is encoded based on its sample quantization parameter, the sample quantization parameter q and the coding bit number R of the sample frame y are recorded, the reference correction coefficient c corresponding to the sample frame y is then calculated according to formula (16), and a piece of training data is obtained based on the sample frame y and the corresponding reference correction coefficient c.
$$c = \frac{q \times R}{E} \qquad (16)$$
where E is a constant: for example, when the frame type of the sample frame y is KEY_FRAME, E takes the value 2000000; otherwise E takes the value 1500000.
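Assuming the reconstruction of formula (16) above (the inverse of formula (1)), the reference correction coefficient can be computed as follows; the function name is illustrative:

```python
KEY_FRAME_E = 2_000_000    # E when the sample frame is a KEY_FRAME
OTHER_FRAME_E = 1_500_000  # E otherwise

def reference_correction_coefficient(q, coded_bits, is_key_frame):
    """Formula (16): c = q * R / E, the coefficient for which formula (1)
    reproduces the sample quantization parameter q given the recorded
    coding bit number R."""
    E = KEY_FRAME_E if is_key_frame else OTHER_FRAME_E
    return q * coded_bits / E
```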
It should be noted that, in the embodiment of the present application, the feature acquisition process for the sample frame y is the same as that in S301 to S305 and is not repeated here; a piece of training data (X, c) is formed from the feature vector X corresponding to the sample frame y and the reference correction coefficient c corresponding to the sample frame y.
The model training phase of the present application is described by taking a support vector regression algorithm as an example. It should be noted that, in the embodiment of the present application, other machine learning or deep learning methods, such as random forests, may be used instead for predicting the correction coefficient.
The purpose of model training is, given the training samples D = {(X₁, c₁), (X₂, c₂), …, (Xₙ, cₙ)}, to find a relation f between input and output such that the prediction f(X) obtained from the features is as close as possible to c. Following the principle of support vector regression, f(X) = wᵀφ(X) + b is sought in a high-dimensional feature space by solving

$$\min_{w,b}\ \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \ell_{\varepsilon}\big(f(X_i) - c_i\big),$$

where φ(·) maps the features into the high-dimensional space and ℓ_ε is the ε-insensitive loss.
By introducing Lagrange multipliers, converting to the dual problem, and then introducing a kernel function, the solution f of the support vector regression is expressed as:
$$f(X) = \sum_{i=1}^{n} (\hat{\alpha}_i - \alpha_i)\, k(X, X_i) + b$$
where b is a parameter to be determined (w having been absorbed into the Lagrange multipliers αᵢ and α̂ᵢ), k(·,·) denotes the kernel function, kernel denotes the kernel function type, C denotes the penalty factor, and ε denotes the termination condition.
Referring to fig. 8, in the embodiment of the present application, the model training process is as follows:
S801, dividing the training data set into a training set, a verification set, and a test set.
It should be noted that, in the embodiment of the present application, the training data set may be divided into a training set, a verification set, and a test set according to a set proportion, or the training data set may be divided into the training set, the verification set, and the test set by adopting a random division manner.
For example, in a random division manner, 60% of the training data in the training data set is divided into the training set, 20% of the training data is divided into the validation set, and the remaining 20% of the training data is divided into the test set.
S802, training the coefficient prediction model based on the training set to obtain a first error index.
In the embodiment of the present application, when the coefficient prediction model is initialized, its parameters are as follows: the kernel function is the radial basis function (RBF) kernel, the penalty factor C = 3, the termination condition ε = 2e-3, and the RBF kernel coefficient γ = 0.1.
The input of the coefficient prediction model is a feature vector formed from a subset of the set consisting of the feature quantities and the results of mathematical operations on the feature quantities, and its output is the predicted correction coefficient. The first error indicator may be, but is not limited to, the mean absolute error (MAE).
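A minimal scikit-learn sketch of S801 and S802 under the hyperparameters above follows; the file paths are placeholders, and mapping the patent's termination condition ε = 2e-3 to scikit-learn's `tol` is an assumption.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error

# X: feature vectors of the sample frames; c: reference correction coefficients
X = np.load("features.npy")        # placeholder path
c = np.load("coefficients.npy")    # placeholder path

# S801: 60% / 20% / 20% random split into training, validation, and test sets
X_train, X_rest, c_train, c_rest = train_test_split(X, c, test_size=0.4, random_state=0)
X_val, X_test, c_val, c_test = train_test_split(X_rest, c_rest, test_size=0.5, random_state=0)

# RBF kernel, penalty factor C = 3, RBF kernel coefficient gamma = 0.1;
# the termination condition 2e-3 is mapped to `tol` here (an assumption)
model = SVR(kernel="rbf", C=3.0, gamma=0.1, tol=2e-3)
model.fit(X_train, c_train)

# S802: first error index = MAE on the training set
first_error = mean_absolute_error(c_train, model.predict(X_train))
```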
And S803, evaluating the coefficient prediction model on the verification set to obtain a second error index.
In the embodiment of the present application, the second error indicator, the first error indicator, and the third error indicator below may be of the same type; for example, all three may be MAE.
S804, if the second error index is larger than the first error index, model parameters of the coefficient prediction model are adjusted based on the second error index, and the adjusted coefficient prediction model is trained based on the verification set until the second error index is not larger than the first error index.
When the model parameters of the coefficient prediction model are adjusted, the adjusted parameters are C, ε, and γ.
And S805, inputting the test set into the coefficient prediction model to obtain a third error index, and outputting the trained coefficient prediction model when the third error index is smaller than a preset threshold value.
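S803 to S805 can then be sketched as the loop below, continuing the snippet above; the concrete adjustment rule for C, ε, and γ and the preset threshold are not specified in the text, so the values used here are illustrative.

```python
# S803: second error index = MAE on the verification set
second_error = mean_absolute_error(c_val, model.predict(X_val))

# S804: adjust model parameters until the second error index no longer
# exceeds the first; halving gamma is an illustrative rule only
while second_error > first_error:
    model.set_params(gamma=model.gamma * 0.5)
    model.fit(X_train, c_train)
    first_error = mean_absolute_error(c_train, model.predict(X_train))
    second_error = mean_absolute_error(c_val, model.predict(X_val))

# S805: third error index on the test set, compared against a preset threshold
third_error = mean_absolute_error(c_test, model.predict(X_test))
PRESET_THRESHOLD = 0.2  # illustrative value
if third_error < PRESET_THRESHOLD:
    print("trained coefficient prediction model accepted:", third_error)
```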
Finally, on the training set and validation set, the MAE was 0.0511 and the correlation between the predicted values and the label values was 0.9192. On the test set, the MAE was 0.1184 and the correlation between the predicted value and the tag value was 0.7359.
In the embodiment of the present application, only the AV1 coding standard is taken as an example; in actual applications, the quantization parameter determination method may be applied to other coding standards, in which case the rate control process of the other coding standard is replaced with the rate control process proposed in the present application.
On AOM-AV1, referring to Table 2, compared with the prior art, the present application improves the coding performance on different videos, by 2.01% on average; the training sequences are the video sequences used for machine-learning model training, and the test sequences are the video sequences used for testing.
TABLE 2 test results
(Table 2 is reproduced as an image in the original publication; the per-sequence values are not recoverable from this text.)
The present application will be described with reference to specific examples.
Referring to fig. 9A, the sequence to be processed includes a plurality of frames; referring to fig. 9B, 17 frames are taken as an example. Layer 0 contains frame 0, layer 1 contains frame 16, layer 2 contains frame 8, layer 3 contains frames 4 and 12, layer 4 contains frames 2, 6, 10, and 14, and layer 5 contains frames 1, 3, 5, 7, 9, 11, 13, and 15. The offsets Δq0 to Δq5 of the layers are initialized to -5, and the optimal offsets Δqb0 to Δqb5 of the layers are all initialized to -5; it should be noted that the optimal offsets can be understood as the target quantization parameter offsets described above. The current layer is set to i = 0, and the quantization parameter used for encoding the current sequence is set to Q = 128.
Referring to fig. 10, in the acquisition process of the training data set, for each sequence to be processed, the following operations are performed:
S1001, acquiring a sequence to be processed;
and S1002, judging whether i is equal to 5, if so, executing S1013, otherwise, executing S1003.
And S1003, judging whether Δqi is greater than 5; if so, executing S1012, otherwise executing S1004.
S1004, judging whether all frames included in the sequence to be processed have been encoded; if so, executing S1009, otherwise executing S1005.
S1005, reading a frame as a current frame.
And S1006, calculating the quantization parameter q corresponding to the current frame according to the given Q.
And S1007, if the current frame belongs to layer i, setting the quantization parameter of the current frame to q + Δqi; otherwise, setting it to q + Δqbj, where j is the layer of the current frame.
S1008, encoding the current frame based on the quantization parameter set in S1007, and returning to S1004.
And S1009, if the performance of the current layer i when using Δqi is better than that when using Δqbi, executing S1010; otherwise executing S1011.
And S1010, setting the value of Δqbi to Δqi.
S1011, Δqi = Δqi + 1.
S1012, i = i + 1.
S1013, coding the sequence to be processed according to the Δqb0 to Δqb5 obtained in the above process;
S1014, recording the quantization parameter q and the coding bit number R of each frame;
S1015, calculating the correction coefficient c of each frame based on the quantization parameter q and the coding bit number R of each frame.
And S1016, forming training data (X, c) based on the feature vector X of each frame and the correction coefficient c of each frame.
Referring to fig. 11, the feature vector X of each frame in the video to be processed can be obtained by the following steps:
S1101, reading the video to be processed.
And S1102, judging whether all frames contained in the video to be processed have been pre-analyzed; if so, ending; otherwise, executing S1103.
S1103, reading a frame that has not been pre-analyzed from the current video, and initializing a frame-level feature vector X, where the value of each feature attribute in X is initially 0;
S1104, dividing the current frame into non-overlapping blocks, each with a width and height of 16, assigning the number of blocks to feature attribute X[0], and establishing an initially empty motion vector set S for storing the inter-prediction motion vectors of blocks whose inter-prediction cost is less than their intra-prediction cost.
S1105, pre-analyzing each block contained in the current frame to obtain the feature vector X corresponding to the current frame; see S302 to S305 for details, which are not repeated here.
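The initialization of S1103 and S1104 can be sketched as follows; the feature-vector length is illustrative, since the patent derives it from the feature attributes of S302 to S305, and the frame is assumed to be a numpy array whose dimensions are multiples of 16.

```python
import numpy as np

def init_frame_features(frame, num_features=8, block=16):
    """S1103-S1104 sketch: zero-initialize the frame-level feature vector X,
    store the count of non-overlapping 16x16 blocks in X[0], and create the
    initially empty motion-vector set S for blocks whose inter-prediction
    cost is lower than their intra-prediction cost."""
    height, width = frame.shape[:2]
    X = np.zeros(num_features)
    X[0] = (height // block) * (width // block)
    motion_vectors = set()  # set S, filled during the per-block pre-analysis
    return X, motion_vectors
```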
Based on the same inventive concept, the embodiment of the application provides a quantization parameter determination device. As shown in fig. 12, which is a schematic structural diagram of a quantization parameter determining apparatus 1200, the apparatus may include:
an obtaining unit 1201, configured to obtain a frame to be encoded, obtain a target feature vector corresponding to the frame to be encoded based on a content attribute of the frame to be encoded, and obtain bit allocation information corresponding to the frame to be encoded;
a bit allocation unit 1202 for determining a corresponding target coded bit based on the bit allocation information and a specified code rate;
a coefficient determining unit 1203, configured to input the target feature vector into a trained coefficient prediction model, so as to obtain a target correction coefficient corresponding to the frame to be encoded, where the coefficient prediction model is obtained by training based on a training data set, each piece of training data includes a sample frame and a corresponding reference correction coefficient, and the reference correction coefficient is determined after performing multiple rounds of encoding on the sample frame;
an encoding unit 1204, configured to obtain a target quantization parameter based on the target correction coefficient and the target encoding bit, and encode the frame to be encoded based on the target quantization parameter to obtain a target encoded frame.
As a possible implementation manner, the method further includes a training unit 1205, where the training unit 1205 is configured to:
acquiring a sample sequence containing each sample frame, and determining the corresponding layers of each sample frame according to the preset corresponding relation between each position and each layer based on the position of each sample frame in a GOP (group of pictures);
performing multi-round coding on each sample frame based on an initial quantization parameter preset for the sample sequence and a preset initial quantization parameter offset corresponding to each layer to obtain a target quantization parameter offset corresponding to each layer;
determining sample quantization parameters of corresponding sample frames based on the initial quantization parameters preset for the sample sequence and based on target quantization parameter offsets corresponding to the respective layers;
and obtaining reference correction coefficients corresponding to the sample frames based on the sample quantization parameters corresponding to the sample frames, and obtaining the training data set based on the sample frames and the corresponding reference correction coefficients.
As a possible implementation manner, when performing multiple rounds of encoding on the sample frames based on the initial quantization parameter preset for the sample sequence and based on the preset initial quantization parameter offset corresponding to each layer, to obtain the target quantization parameter offset corresponding to each layer, the training unit 1205 is specifically configured to:
for each of the layers, respectively performing the following operations:
determining multiple candidate offsets corresponding to one layer based on the initial quantization parameter offset corresponding to the one layer, and a preset offset step length and offset times;
respectively coding at least one sample frame corresponding to the one layer based on an initial quantization parameter preset for the sample sequence and the multiple candidate offsets to obtain multiple coding results;
and determining a target offset from the multiple candidate offsets based on the multiple coding results, and taking the target offset as a target quantization parameter offset corresponding to the one hierarchy.
As a possible implementation manner, when the at least one sample frame corresponding to the one layer is encoded based on the initial quantization parameter preset for the sample sequence and the multiple candidate offsets, and multiple encoding results are obtained, the training unit 1205 is specifically configured to:
determining an initial quantization parameter corresponding to each of the at least one sample frame based on an initial quantization parameter preset for the sample sequence;
and respectively coding the at least one sample frame based on the multiple candidate offsets and the initial quantization parameters corresponding to the at least one sample frame to obtain multiple coding results.
As a possible implementation manner, when determining a target offset from the multiple candidate offsets based on the multiple coding results, the training unit 1205 is specifically configured to:
determining a coding error corresponding to each of the candidate offsets based on the plurality of coding results;
and determining a candidate offset of which the corresponding coding error meets a preset error condition from the plurality of candidate offsets, and taking the determined candidate offset as a target offset.
As a possible implementation manner, when obtaining the target feature vector corresponding to the frame to be encoded based on the content attribute of the frame to be encoded, the obtaining unit 1201 is specifically configured to:
dividing the frame to be coded into non-overlapping N blocks according to a preset block size, wherein the value of N is a positive integer;
respectively carrying out intra-frame prediction on the N blocks, and obtaining target intra-frame prediction costs corresponding to the N blocks based on intra-frame prediction results;
respectively carrying out inter-frame prediction on the N blocks, and obtaining target inter-frame prediction costs corresponding to the N blocks based on inter-frame prediction results;
obtaining a plurality of characteristic attributes based on the inter-frame prediction cost and the intra-frame prediction cost corresponding to the N blocks respectively;
and obtaining a target feature vector corresponding to the frame to be coded based on the multiple feature attributes.
As a possible implementation manner, the training unit 1205 is specifically configured to:
dividing the training data set into a training set, a verification set and a test set;
training the coefficient prediction model based on the training set to obtain a first error index;
training the coefficient prediction model based on the verification set to obtain a second error index;
if the second error index is larger than the first error index, performing model parameter adjustment on the coefficient prediction model based on the second error index, and training the adjusted coefficient prediction model based on the verification set until the second error index is not larger than the first error index;
and inputting the test set into the coefficient prediction model to obtain a third error index, and outputting the trained coefficient prediction model when the third error index is smaller than a preset threshold value.
For convenience of description, the above parts are described separately as modules (or units) according to functions. Of course, the functionality of the various modules (or units) may be implemented in the same one or more pieces of software or hardware when implementing the present application.
With regard to the apparatus in the above embodiment, the specific manner in which each unit performs operations has been described in detail in the embodiments of the method, and will not be elaborated here.
In the embodiment of the application, a target feature vector corresponding to a frame to be coded is obtained based on the content attribute of the frame to be coded, bit allocation information corresponding to the frame to be coded is obtained, and the corresponding target coding bits are determined based on the bit allocation information and the specified code rate; the target feature vector is input into the trained coefficient prediction model to obtain the target correction coefficient corresponding to the frame to be coded; the target quantization parameter is obtained based on the target correction coefficient and the target coding bits, and the frame to be coded is coded based on the target quantization parameter to obtain the target coded frame.
As will be appreciated by one skilled in the art, aspects of the present application may be embodied as a system, method, or program product. Accordingly, various aspects of the present application may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system."
Based on the same inventive concept, the embodiment of the application also provides the electronic equipment. In one embodiment, the electronic device may be a server or a terminal device. Referring to fig. 13, which is a schematic structural diagram of a possible electronic device provided in an embodiment of the present application, in fig. 13, an electronic device 1300 includes: a processor 1310 and a memory 1320.
The memory 1320 stores a computer program executable by the processor 1310, and the processor 1310 may execute the steps of the quantization parameter determination method by executing the instructions stored in the memory 1320.
The memory 1320 may be a volatile memory (volatile memory), such as a random-access memory (RAM); the memory 1320 may also be a non-volatile memory (non-volatile memory), such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or the memory 1320 may be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, without being limited thereto. The memory 1320 may also be a combination of the above.
Processor 1310 may include one or more Central Processing Units (CPUs), or be a digital processing unit, or the like. A processor 1310 for implementing the quantization parameter determination method described above when executing a computer program stored in the memory 1320.
In some embodiments, processor 1310 and memory 1320 may be implemented on the same chip, or in some embodiments, they may be implemented separately on separate chips.
The specific connection medium between the processor 1310 and the memory 1320 is not limited in the embodiment of the present application. In the embodiment of the present application, the processor 1310 and the memory 1320 are connected by a bus, which is depicted by a thick line in fig. 13; the connection manner between other components is merely illustrative and not limiting. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of description, only one thick line is depicted in fig. 13, but this does not mean that there is only one bus or only one type of bus.
Based on the same inventive concept, embodiments of the present application provide a computer-readable storage medium, which includes a computer program for causing an electronic device to perform the steps of the quantization parameter determination method described above when the computer program runs on the electronic device. In some possible embodiments, the various aspects of the quantization parameter determination method provided in the present application may also be implemented in the form of a program product including a computer program for causing an electronic device to perform the steps of the quantization parameter determination method described above when the program product is run on the electronic device, for example, the electronic device may perform the steps as shown in fig. 2.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable Disk, a hard Disk, a RAM, a ROM, an erasable programmable Read-Only Memory (EPROM or flash Memory), an optical fiber, a portable Compact Disk Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product of the embodiments of the present application may be a CD-ROM and include a computer program, and may be run on an electronic device. However, the program product of the present application is not so limited, and in this document, a readable storage medium may be any tangible medium that can contain, or store a computer program for use by or in connection with a command execution system, apparatus, or device.
A readable signal medium may include a propagated data signal with a readable computer program embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a computer program for use by or in connection with a command execution system, apparatus, or device.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (15)

1. A quantization parameter determination method, characterized in that the method comprises:
acquiring a frame to be coded, and acquiring a target feature vector corresponding to the frame to be coded and bit distribution information corresponding to the frame to be coded based on the content attribute of the frame to be coded;
determining a corresponding target encoding bit based on the bit allocation information and a specified code rate;
inputting the target characteristic vector into a trained coefficient prediction model to obtain a target correction coefficient corresponding to the frame to be coded, wherein the coefficient prediction model is obtained by training based on a training data set, each training data comprises a sample frame and a corresponding reference correction coefficient, and the reference correction coefficient is determined after multi-round coding is carried out on the sample frame;
and obtaining a target quantization parameter based on the target correction coefficient and the target coding bit, and coding the frame to be coded based on the target quantization parameter to obtain a target coding frame.
2. The method of claim 1, wherein the training data set is obtained by:
acquiring a sample sequence containing various sample frames, and determining respective corresponding layers of the various sample frames according to preset corresponding relations between various positions and the layers based on the positions of the various sample frames in a group of pictures (GOP);
performing multi-round coding on each sample frame based on an initial quantization parameter preset for the sample sequence and based on a preset initial quantization parameter offset corresponding to each layer to obtain a target quantization parameter offset corresponding to each layer;
determining sample quantization parameters of corresponding sample frames based on the initial quantization parameters preset for the sample sequence and based on target quantization parameter offsets corresponding to the respective layers;
and obtaining reference correction coefficients corresponding to the sample frames based on the sample quantization parameters corresponding to the sample frames, and obtaining the training data set based on the sample frames and the corresponding reference correction coefficients.
3. The method of claim 2, wherein the performing multiple rounds of encoding on the sample frames based on the preset initial quantization parameter for the sample sequence and based on the preset initial quantization parameter offset corresponding to each of the layers to obtain the target quantization parameter offset corresponding to each of the layers comprises:
for each of the layers, respectively performing the following operations:
determining multiple candidate offsets corresponding to one layer based on the initial quantization parameter offset corresponding to the one layer, and a preset offset step length and offset times;
respectively coding at least one sample frame corresponding to the one layer based on an initial quantization parameter preset for the sample sequence and the multiple candidate offsets to obtain multiple coding results;
and determining a target offset from the multiple candidate offsets based on the multiple coding results, and taking the target offset as a target quantization parameter offset corresponding to the one layer.
4. The method of claim 3, wherein the encoding at least one sample frame corresponding to the one layer based on the initial quantization parameter preset for the sample sequence and the candidate offsets respectively to obtain a plurality of encoding results comprises:
determining an initial quantization parameter corresponding to each of the at least one sample frame based on an initial quantization parameter preset for the sample sequence;
and respectively coding the at least one sample frame based on the multiple candidate offsets and the initial quantization parameters corresponding to the at least one sample frame to obtain multiple coding results.
5. The method of claim 3, wherein determining a target offset from the plurality of candidate offsets based on the plurality of coding results comprises:
determining a coding error corresponding to each of the candidate offsets based on the plurality of coding results;
and determining a candidate offset of which the corresponding coding error meets a preset error condition from the plurality of candidate offsets, and taking the determined candidate offset as a target offset.
6. The method according to any one of claims 1 to 5, wherein the obtaining the target feature vector corresponding to the frame to be encoded based on the content attribute of the frame to be encoded comprises:
dividing the frame to be coded into non-overlapping N blocks according to a preset block size, wherein the value of N is a positive integer;
respectively carrying out intra-frame prediction on the N blocks, and obtaining target intra-frame prediction costs corresponding to the N blocks based on intra-frame prediction results;
respectively carrying out inter-frame prediction on the N blocks, and obtaining target inter-frame prediction costs corresponding to the N blocks based on inter-frame prediction results;
obtaining a plurality of characteristic attributes based on the inter-frame prediction cost and the intra-frame prediction cost corresponding to the N blocks respectively;
and obtaining a target feature vector corresponding to the frame to be coded based on the multiple feature attributes.
7. The method of any one of claims 1-5, wherein the coefficient prediction model is trained by:
dividing the training data set into a training set, a verification set and a test set;
training the coefficient prediction model based on the training set to obtain a first error index;
training the coefficient prediction model based on the verification set to obtain a second error index;
if the second error index is larger than the first error index, performing model parameter adjustment on the coefficient prediction model based on the second error index, and training the adjusted coefficient prediction model based on the verification set until the second error index is not larger than the first error index;
and inputting the test set into the coefficient prediction model to obtain a third error index, and outputting the trained coefficient prediction model when the third error index is smaller than a preset threshold value.
8. A quantization parameter determination apparatus, comprising:
the device comprises an acquisition unit, a coding unit and a decoding unit, wherein the acquisition unit is used for acquiring a frame to be coded, acquiring a target characteristic vector corresponding to the frame to be coded based on the content attribute of the frame to be coded, and acquiring bit distribution information corresponding to the frame to be coded;
a bit allocation unit for determining a corresponding target coded bit based on the bit allocation information and a specified code rate;
a coefficient determining unit, configured to input the target feature vector into a trained coefficient prediction model to obtain a target correction coefficient corresponding to the frame to be encoded, where the coefficient prediction model is obtained by training based on a training data set, each piece of training data includes a sample frame and a corresponding reference correction coefficient, and the reference correction coefficient is determined after performing multiple rounds of encoding on the sample frame;
and the coding unit is used for obtaining a target quantization parameter based on the target correction coefficient and the target coding bit, and coding the frame to be coded based on the target quantization parameter to obtain a target coding frame.
9. The apparatus of claim 8, further comprising a training unit to:
acquiring a sample sequence containing various sample frames, and determining respective corresponding layers of the various sample frames according to preset corresponding relations between various positions and the layers based on the positions of the various sample frames in a group of pictures (GOP);
performing multi-round coding on each sample frame based on an initial quantization parameter preset for the sample sequence and based on a preset initial quantization parameter offset corresponding to each layer to obtain a target quantization parameter offset corresponding to each layer;
determining sample quantization parameters of corresponding sample frames based on the initial quantization parameters preset for the sample sequence and based on target quantization parameter offsets corresponding to the respective layers;
and obtaining reference correction coefficients corresponding to the sample frames based on the sample quantization parameters corresponding to the sample frames, and obtaining the training data set based on the sample frames and the corresponding reference correction coefficients.
10. The apparatus according to claim 9, wherein the training unit is specifically configured to, when performing multiple rounds of encoding on the sample frames based on an initial quantization parameter preset for the sample sequence and based on a preset initial quantization parameter offset corresponding to each of the respective layers to obtain a target quantization parameter offset corresponding to each of the respective layers:
for each of the layers, respectively performing the following operations:
determining multiple candidate offsets corresponding to one layer based on the initial quantization parameter offset corresponding to the one layer, and a preset offset step length and offset times;
respectively coding at least one sample frame corresponding to the one layer based on an initial quantization parameter preset for the sample sequence and the multiple candidate offsets to obtain multiple coding results;
and determining a target offset from the multiple candidate offsets based on the multiple coding results, and taking the target offset as a target quantization parameter offset corresponding to the one layer.
11. The apparatus according to claim 10, wherein, when the at least one sample frame corresponding to the one layer is encoded based on an initial quantization parameter preset for the sample sequence and the candidate offsets, respectively, and a plurality of encoding results are obtained, the training unit is specifically configured to:
determining an initial quantization parameter corresponding to each of the at least one sample frame based on an initial quantization parameter preset for the sample sequence;
and respectively coding the at least one sample frame based on the multiple candidate offsets and the initial quantization parameters corresponding to the at least one sample frame to obtain multiple coding results.
12. The apparatus according to claim 10, wherein, when determining the target offset from the plurality of candidate offsets based on the plurality of coding results, the training unit is specifically configured to:
determining a coding error corresponding to each of the candidate offsets based on the plurality of coding results;
and determining a candidate offset of which the corresponding coding error meets a preset error condition from the plurality of candidate offsets, and taking the determined candidate offset as a target offset.
13. An electronic device, characterized in that it comprises a processor and a memory, wherein the memory stores a computer program which, when executed by the processor, causes the processor to carry out the steps of the method according to any one of claims 1 to 7.
14. A computer-readable storage medium, characterized in that it comprises a computer program for causing an electronic device to carry out the steps of the method according to any one of claims 1 to 7, when said computer program is run on said electronic device.
15. A computer program product, characterized in that it comprises a computer program, which is stored in a computer-readable storage medium, from which a processor of an electronic device reads and executes the computer program, causing the electronic device to perform the steps of the method of any one of claims 1 to 7.
CN202210883402.5A 2022-07-26 2022-07-26 Quantization parameter determination method and related device Pending CN115243042A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210883402.5A CN115243042A (en) 2022-07-26 2022-07-26 Quantization parameter determination method and related device


Publications (1)

Publication Number Publication Date
CN115243042A true CN115243042A (en) 2022-10-25

Family

ID=83675176

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210883402.5A Pending CN115243042A (en) 2022-07-26 2022-07-26 Quantization parameter determination method and related device

Country Status (1)

Country Link
CN (1) CN115243042A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination