WO2023134523A1 - Content adaptive video coding method and apparatus, device and storage medium - Google Patents

Content adaptive video coding method and apparatus, device and storage medium Download PDF

Info

Publication number
WO2023134523A1
Authority
WO
WIPO (PCT)
Prior art keywords
encoding
video
coding
rate control
parameter
Prior art date
Application number
PCT/CN2023/070555
Other languages
French (fr)
Chinese (zh)
Inventor
刘芳
袁子逸
洪旭东
崔同兵
Original Assignee
百果园技术(新加坡)有限公司
刘芳
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 百果园技术(新加坡)有限公司 and 刘芳
Publication of WO2023134523A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/149 Data rate or code amount at the encoder output by estimating the code amount by means of a model, e.g. mathematical model or statistical model
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/80 Responding to QoS
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/114 Adapting the group of pictures [GOP] structure, e.g. number of B-frames between two anchor frames
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/189 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
    • H04N19/192 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding, the adaptation method, adaptation tool or adaptation type being iterative or recursive

Definitions

  • The embodiments of the present application relate to the technical field of video processing, and in particular to a content-adaptive video coding method, apparatus, device, and storage medium.
  • Video codec technology aims to achieve the highest possible compression ratio and video reconstruction quality within the available computing resources, so that the reconstructed video meets storage capacity and bandwidth requirements.
  • Early video service providers usually processed almost all video content with a single, predetermined encoding configuration. As a result, high-motion videos may suffer from an insufficient bit rate and low encoding quality, while low-motion videos may waste bit rate.
  • Content-adaptive encoding sets a different encoding configuration for each video according to its content, finding for each video or video segment the minimum bit rate that still meets clarity and subjective-quality requirements, thereby saving bandwidth.
  • In one approach, training video data is pre-encoded, features are extracted from the encoded data, and a machine learning model is trained on these features together with the corresponding constant rate factor values.
  • Using this model to predict encoding parameters from video features, and then encoding with the predicted values, achieves a balance between encoding bit rate and encoding quality and improves the viewing experience for most viewers.
  • However, this encoding method extracts features by encoding the entire video and then uses a machine learning model to predict a single constant rate factor for the whole video. For long videos with complex, mixed content, the complex parts are encoded at poor quality while bit rate is wasted on the simple parts.
  • Moreover, encoding the entire video first to extract features and predict the constant rate factor, and only then encoding with the predicted value, consumes a great deal of time and is therefore unsuitable for live broadcast scenarios.
  • Embodiments of the present application provide a content-adaptive video coding method, apparatus, device, and storage medium, which solve the problem in the related art of unsatisfactory video coding in complex scenes, improve video coding efficiency, and are applicable to live video scenarios.
  • the embodiment of the present application provides a content-adaptive video coding method, the method including:
  • the embodiment of the present application also provides a content-adaptive video coding device, including:
  • An image set determination module configured to acquire video data to be encoded, and divide the video data into multiple image sets comprising continuous frame images
  • the code rate parameter determination module is configured to determine the encoding characteristics of the image set, and input the encoding characteristics and the set video picture evaluation parameters to the pre-trained machine learning model to output the code rate control parameters;
  • An encoding module configured to encode the set of images according to the encoding feature and the rate control parameter.
  • the embodiment of the present application also provides a content-adaptive video coding device, which includes:
  • one or more processors;
  • a storage device configured to store one or more programs
  • when the one or more programs are executed by the one or more processors, the one or more processors implement the content-adaptive video coding method described in the embodiments of the present application.
  • The embodiment of the present application further provides a storage medium storing computer-executable instructions, where the computer-executable instructions, when executed by a computer processor, are configured to execute the content-adaptive video coding method described in the embodiments of the present application.
  • the embodiment of the present application further provides a computer program product, including a computer program.
  • In the embodiments of the present application, the video data is divided into multiple image sets containing continuous frame images, the encoding features of each image set are determined, and the encoding features and the set video picture evaluation parameter are input into a pre-trained machine learning model to output a rate control parameter; the image set is then encoded according to the encoding features and the rate control parameter. This solves the problem in the related art of unsatisfactory video coding in complex scenes, improves video coding efficiency, and is suitable for real-time video scenarios.
  • FIG. 1 is a flowchart of a content-adaptive video coding method provided by an embodiment of the present application
  • FIG. 2 is a flow chart of a method for performing secondary encoding based on a primary encoding result provided in an embodiment of the present application
  • FIG. 3 is a flow chart of another content-adaptive video coding method provided by an embodiment of the present application.
  • FIG. 4 is a flow chart of another content-adaptive video coding method provided by an embodiment of the present application.
  • FIG. 5 is a structural block diagram of a content-adaptive video coding device provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a content-adaptive video coding device provided by an embodiment of the present application.
  • Fig. 1 is a flow chart of a content-adaptive video coding method provided by an embodiment of the present application. The method can be applied to the encoding of video data and can be executed by computing devices such as notebooks, desktops, smartphones, servers, and tablet computers. It specifically includes the following steps:
  • Step S101 Acquire video data to be encoded, and divide the video data into multiple image sets including continuous frame images.
  • the video data to be encoded includes recorded video data and real-time generated video data that needs to be transmitted and displayed, such as live video data.
  • When encoding a piece of video data, the video data is first divided into multiple image sets containing continuous frame images; that is, video coding is then performed separately for each subdivided image set.
  • the video data may be divided into multiple consecutive GOPs (Group of pictures, a group of pictures), and each GOP represents a group of continuous pictures in a coded video stream.
  • For example, each GOP contains 15 or 20 frames; that is, the video data to be encoded is divided into multiple consecutive image sets, each containing 15 to 20 frames, so that the video data is encoded using the GOP as the encoding unit.
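The GOP division described above can be sketched as follows (a minimal illustration; the frame list and the default `gop_size` of 15 are assumptions, not values mandated by the method):

```python
def split_into_gops(frames, gop_size=15):
    """Divide a sequence of frames into consecutive GOPs (groups of pictures).

    Each GOP holds `gop_size` consecutive frames; the last GOP may be shorter.
    """
    return [frames[i:i + gop_size] for i in range(0, len(frames), gop_size)]

# Example: 35 frames with a GOP size of 15 yield GOPs of 15, 15 and 5 frames.
gops = split_into_gops(list(range(35)), gop_size=15)
print([len(g) for g in gops])  # [15, 15, 5]
```

Each resulting GOP is then encoded independently, which is what makes per-GOP rate control parameters possible.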
  • Step S102 Determine the coding features of the image set, and input the coding features and the set video picture evaluation parameters into the pre-trained machine learning model to output code rate control parameters.
  • The encoding features of the image set may be determined by pre-encoding the image set.
  • an encoder is used to encode an image set to obtain corresponding encoding features.
  • the encoding features of the image set are obtained by performing feature extraction and analysis on each frame of image in the image set.
  • the encoding features include motion vector features, distortion degree parameters, complexity parameters, etc. used to describe each frame of images in the image set.
  • the motion vector feature is used to characterize the degree of change of the image, and the more severe the changes between the frames of images, the larger the motion vector is.
  • The distortion degree parameter represents the degree of distortion of the image: the greater the image distortion, the higher the value of this parameter, and the lower the distortion, the lower the value.
  • The complexity parameter characterizes the complexity of the image; for example, an image containing many different objects with large pixel differences between them has high complexity.
  • the identification and determination of the above-mentioned encoding features can be realized through existing encoder modules and image processing algorithms.
  • the video picture evaluation parameter is a comprehensive evaluation index used to characterize the image quality.
  • the video frame evaluation parameters may be represented by VMAF (Video Multimethod Assessment Fusion, video multimethod assessment fusion).
  • VMAF is an objective evaluation index proposed by Netflix that combines human visual modeling and machine learning.
  • Netflix uses a large amount of subjective data as a training set, and integrates algorithms of different evaluation dimensions by means of machine learning, which is currently a relatively mainstream objective evaluation index.
  • Targeting a set VMAF value makes it possible to save encoding bit rate without changing the subjective quality of the video.
  • The determined encoding features of the image set and the set video picture evaluation parameter are input into the pre-trained machine learning model to output the rate control parameter. The set video picture evaluation parameter can be customized according to different picture quality requirements, different playback devices, and so on, and the set value can also be adjusted.
  • the input machine learning model is a pre-trained neural network model, which can output corresponding bit rate control parameters based on the encoding characteristics of the image set and the set video picture evaluation parameter input.
  • the rate control parameter may be CRF (Constant Rate Factor, constant rate factor) or CQF (Constant Quality Factor, constant quality factor).
  • CRF is a kind of bit rate control.
  • The smaller the CRF value, the higher the video quality and the larger the file size; the larger the CRF value, the higher the video compression rate but the lower the video quality.
  • CRF values correspond to different code rates.
  • a mapping table may be used to record different CRF values and corresponding code rates, or a function curve may be used to characterize the relationship between CRF and code rates.
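The mapping between CRF values and code rates can be illustrated with a small lookup table plus linear interpolation standing in for the "function curve"; the specific kbps figures below are invented for demonstration and in practice depend on the content and encoder:

```python
# Illustrative CRF-to-bitrate mapping (the kbps values are assumptions,
# not measured data; real curves are content- and encoder-dependent).
CRF_TO_BITRATE_KBPS = {18: 4500, 23: 2200, 28: 1100, 33: 550}

def bitrate_for_crf(crf):
    """Look up the bitrate for a CRF value, interpolating linearly
    between recorded entries and clamping outside the table's range."""
    keys = sorted(CRF_TO_BITRATE_KBPS)
    if crf <= keys[0]:
        return CRF_TO_BITRATE_KBPS[keys[0]]
    if crf >= keys[-1]:
        return CRF_TO_BITRATE_KBPS[keys[-1]]
    for lo, hi in zip(keys, keys[1:]):
        if lo <= crf <= hi:
            frac = (crf - lo) / (hi - lo)
            b_lo, b_hi = CRF_TO_BITRATE_KBPS[lo], CRF_TO_BITRATE_KBPS[hi]
            return b_lo + frac * (b_hi - b_lo)

print(bitrate_for_crf(23))    # 2200 (exact table entry)
print(bitrate_for_crf(25.5))  # 1650.0 (interpolated between 23 and 28)
```

The monotone decreasing table reflects the trade-off stated above: a larger CRF gives a lower bitrate at lower quality.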
  • Step S103 encoding the image set according to the encoding feature and the rate control parameter.
  • the image set is finally re-encoded based on the code rate control parameters and the coding features determined in step S101 to output code stream data.
  • FIG. 2 is a flow chart of a method for performing secondary encoding based on the primary encoding result provided in the embodiment of the present application, as shown in FIG. 2 , specifically including:
  • Step S1031. Determine frame type information and scene information according to the encoding feature.
  • the encoding feature records the frame type of each frame, such as different frame types divided by I frame, P frame and B frame.
  • different frame type information requires encoding and compression with different qualities due to their different reference relations.
  • The I frame represents a key frame, which is a fully retained frame.
  • A B frame is a bidirectional difference frame: it records the differences between the current frame and both the preceding and following frames. When decoding a B frame, the decoder needs not only the previously cached picture but also the following decoded picture, and obtains the final picture by superimposing the preceding and following pictures with the current frame data.
  • The scene information can be divided, for example, into moving scenes and static scenes, and can be determined from the encoding features by an integrated scene discrimination module. The encoding features record image features related to motion and displacement changes, such as the motion vector and motion compensation of each frame, and the scene information of the image is determined by analyzing these data.
  • Step S1032 perform prediction analysis according to the frame type information, the scene information and the code rate control parameters to obtain coding parameters.
  • Taking HEVC (High Efficiency Video Coding) as an example, the encoding parameter corresponds to the quantization parameter QP.
  • The quantization parameter QP is the index of the quantization step Qstep.
  • The quantization step Qstep has 52 values in total, so QP ranges from 0 to 51; in some embodiments a narrower range, such as 0 to 39, is used.
  • The encoding parameter, taking the quantization parameter QP as an example, reflects the compression of spatial detail.
  • With QP values from 0 to 51, the minimum value 0 corresponds to the finest quantization, and the maximum value 51 to the coarsest. Quantization reduces the coded length of the image by discarding information unnecessary for visual reconstruction, without reducing the visual effect.
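The relationship between QP and Qstep can be illustrated with the commonly cited H.264/HEVC rule that the quantization step roughly doubles for every 6 QP increments (a sketch of the approximation Qstep ≈ 2^((QP − 4) / 6); the exact step values are defined by the codec specification):

```python
def qstep(qp):
    """Approximate quantization step for a given QP in H.264/HEVC-style
    codecs: the step size doubles every 6 QP increments."""
    return 2 ** ((qp - 4) / 6)

# The 52 QP values (0..51) therefore span a very wide range of step sizes:
print(round(qstep(4), 3))               # 1.0
print(round(qstep(10), 3))              # 2.0 (6 QP later, the step doubles)
print(round(qstep(22) / qstep(16), 3))  # 2.0
```

This exponential spacing is why small QP changes near the fine end affect quality much less than the same change near the coarse end.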
  • The process of obtaining encoding parameters by predictive analysis based on frame type information, scene information, and rate control parameters can be implemented using an integrated encoder module; that is, the frame type information (I frame, B frame, P frame), the scene information (static scene, dynamic scene), and the rate control parameter (CRF) jointly determine the final encoding parameter (the frame-level QP).
  • For example, the frame type (e.g., a key frame), the scene information (e.g., a dynamic scene), and the rate control parameter CRF together determine the frame-level QP.
  • Step S1033 encode the image set based on the encoding parameters.
  • Taking the frame-level QP parameter in HEVC as an example, HEVC encoding is performed to output the code stream.
  • The above process of performing predictive analysis to obtain encoding parameters and encoding the image set based on them includes: performing predictive analysis to obtain a first encoding parameter; determining a second encoding parameter according to the first encoding parameter, encoding feedback information, cache information, frame type information, and scene information; adjusting a quantization offset parameter according to the first encoding parameter; and encoding the image set according to the second encoding parameter and the adjusted quantization offset parameter to output code stream data.
  • The first encoding parameter can be understood as the base QP; the frame-level QP is determined from the first encoding parameter together with the encoding feedback information, cache information, frame type information, and scene information.
  • The cache information represents the state of the buffer memory during video encoding: the larger the cache occupation, the larger the corresponding QP value, which reduces the computation and storage required for video encoding.
  • The encoding feedback information can be obtained during the pre-encoding process, or it can be information fed back after encoding a previous round of this image set or video, such as the degree of distortion.
  • the quantization offset parameter is further adjusted according to the first encoding parameter.
  • The quantization offset parameter can be represented by the strength of the cutree, which indicates the quantization offset adjustment performed according to the degree to which the current block is referenced.
  • If the current block is referenced, it is further determined how many of the subsequent blocks refer to it; if it is referenced by many subsequent image blocks, the current block belongs to a slowly changing scene, and the QP value is correspondingly lowered to improve the image quality.
  • Finally, the image set is encoded using the determined second encoding parameter and the adjusted quantization offset parameter to output code stream data, ensuring an optimal balance between image quality and compression rate.
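The frame-level QP derivation described above (a base QP adjusted by frame type, scene, buffer occupancy, and the cutree quantization offset) can be sketched as follows; every offset magnitude here is an illustrative assumption, not a value from the application:

```python
def frame_level_qp(base_qp, frame_type, scene, buffer_fullness, cutree_offset):
    """Sketch of deriving a frame-level QP from the base QP (the first
    encoding parameter). All offsets are assumed values for illustration.
    """
    qp = base_qp
    # Key frames get higher quality (lower QP) since later frames reference them.
    if frame_type == "I":
        qp -= 3
    elif frame_type == "B":
        qp += 2
    # Dynamic scenes tolerate coarser quantization than static ones.
    if scene == "dynamic":
        qp += 1
    # A fuller encoder buffer pushes QP up to reduce the bits produced.
    if buffer_fullness > 0.8:
        qp += 2
    # Heavily referenced blocks receive a negative (quality-raising) offset.
    qp += cutree_offset
    return max(0, min(51, qp))  # clamp to the valid QP range

print(frame_level_qp(30, "I", "dynamic", 0.5, -1))  # 27
```

The clamp at the end mirrors the valid QP range of 0 to 51 discussed earlier.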
  • When performing video encoding, the video is first divided into image sets; a first encoding pass obtains the encoding features, the trained machine learning model then outputs an accurate rate control parameter, and a second encoding pass encodes the image set based on the rate control parameter and the encoding features from the first pass to produce the final video encoding result.
  • This method uses a two-pass, machine-learning-based content-adaptive encoding technique for live video: through HEVC multi-pass encoding and a machine learning prediction model, the encoding configuration is dynamically adjusted according to the complexity of the video content, realizing content-adaptive encoding and a better balance between video fluency and clarity. It can be applied to real-time live video scenes with good encoding results.
  • Fig. 3 is a flow chart of another content-adaptive video coding method provided by the embodiment of the present application, which provides a method for determining the coding characteristics of an image set, as shown in Fig. 3 , specifically including:
  • Step S201 Acquire video data to be encoded, and divide the video data into multiple image sets including continuous frame images.
  • Step S202 Obtain a preset number of frame images in the image set, encode the preset number of frame images to obtain encoding features, and determine the encoding features as the encoding features of the image set.
  • The preset number of frame images may be a miniGOP within a GOP; for example, for an image set that is a 15-frame GOP, the preset number of frame images may be 5 of those frames.
  • The preset number of frame images may be pre-encoded by an encoder to obtain encoding features, which are then taken as the encoding features of the whole image set.
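The idea of step S202, pre-encoding only a few frames of a GOP and treating their statistics as the features of the whole image set, can be sketched as follows (the dictionary-based frame representation and the `analyze_frame` statistics are assumptions for illustration; a real encoder would report motion vectors, distortion, and complexity):

```python
def gop_features(gop, pre_encode_frames=5):
    """Pre-encode only the first few frames of a GOP and use their averaged
    statistics as the GOP's encoding features."""
    def analyze_frame(frame):
        # Stand-in for a real encoder's per-frame feature extraction.
        return {"motion": frame["motion"], "complexity": frame["complexity"]}

    subset = gop[:pre_encode_frames]
    stats = [analyze_frame(f) for f in subset]
    n = len(stats)
    return {k: sum(s[k] for s in stats) / n for k in stats[0]}

gop = [{"motion": m, "complexity": c}
       for m, c in [(1, 4), (2, 6), (3, 8), (2, 6), (2, 6), (9, 9), (9, 9)]]
print(gop_features(gop))  # averages over the first 5 frames only
```

Because only a fraction of each GOP is pre-encoded, feature extraction stays cheap enough for the live-streaming setting the method targets.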
  • Step S203 input the coding features and the set evaluation parameters of the video picture into the pre-trained machine learning model to output the code rate control parameters.
  • Step S204 encoding the image set according to the encoding feature and the rate control parameter.
  • A two-pass, machine-learning-based content-adaptive encoding technique for live video is adopted, and the encoding configuration is dynamically adjusted according to the complexity of the video content.
  • Fig. 4 is a flow chart of another content-adaptive video coding method provided by an embodiment of the present application, showing a method for outputting rate control parameters through a machine learning model, where the machine learning model is a joint model formed by a first training model and a second training model. As shown in Fig. 4, the method specifically includes:
  • Step S301 Acquire video data to be encoded, and divide the video data into multiple image sets including continuous frame images.
  • Step S302 Determine the encoding features of the image set, input the encoding features and the set video frame evaluation parameters into the first training model and the second training model respectively, and obtain the first training model output by the first training model A code rate control parameter, and a second code rate control parameter output by the second training model.
  • the first training model is an XGBoost model
  • the second training model is a LightGBM model, both of which are decision tree-based machine learning algorithms.
  • the first rate control parameter output by the first training model is denoted as CRF1
  • the second rate control parameter output by the second training model is denoted as CRF2.
  • Step S303 performing weighted average calculation on the first rate control parameter and the second rate control parameter to obtain a rate control parameter.
  • CRF = α1·CRF1 + α2·CRF2, where α1 + α2 = 1, α1 ∈ [0, 1], α2 ∈ [0, 1].
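The weighted fusion of the two models' outputs in step S303 can be sketched as follows; the default 0.5/0.5 weight split is an assumption, since the application only constrains the weights to sum to 1 and lie in [0, 1]:

```python
def fuse_crf(crf1, crf2, alpha1=0.5):
    """Weighted fusion of the two rate control predictions:
    CRF = alpha1 * CRF1 + alpha2 * CRF2, with alpha1 + alpha2 = 1.
    The 0.5/0.5 default split is an assumed value.
    """
    if not 0.0 <= alpha1 <= 1.0:
        raise ValueError("alpha1 must lie in [0, 1]")
    alpha2 = 1.0 - alpha1
    return alpha1 * crf1 + alpha2 * crf2

# Suppose XGBoost predicts CRF1 = 24 and LightGBM predicts CRF2 = 26:
print(fuse_crf(24, 26))                       # 25.0
print(round(fuse_crf(24, 26, alpha1=0.7), 3))  # 24.6
```

Averaging two differently biased tree ensembles is a standard way to reduce the variance of either predictor alone.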
  • Step S304 encode the image set according to the encoding feature and the rate control parameter.
  • Before inputting the encoding features and the set video picture evaluation parameter into the first training model and the second training model respectively, the solution further includes: acquiring video sample data of different scene types and different resolutions; and dividing the data into training set samples, test set samples, and validation set samples, which are respectively input into the first training model and the second training model for training.
  • This solution first distinguishes the scene types of the video images, for example into dynamic scenes and static scenes, and also uses video images of different resolutions as sample data for training.
  • The video sample data is divided into training set samples, test set samples, and validation set samples to obtain a final training model with good prediction performance.
  • Fig. 5 is a structural block diagram of a content-adaptive video coding device provided in an embodiment of the present application.
  • the device is configured to execute the content-adaptive video coding method provided in the above embodiment, and has corresponding functional modules and beneficial effects for executing the method.
  • the device specifically includes: an image set determination module 101, a code rate parameter determination module 102 and an encoding module 103, wherein,
  • the image set determination module 101 is configured to acquire video data to be encoded, and divide the video data into a plurality of image sets comprising continuous frame images;
  • the code rate parameter determination module 102 is configured to determine the encoding characteristics of the image set, and input the encoding characteristics and the set video picture evaluation parameters to the pre-trained machine learning model to output the code rate control parameters;
  • the coding module 103 is configured to code the image set according to the coding feature and the code rate control parameter.
  • When performing video encoding, the video is first divided into image sets; a first encoding pass obtains the encoding features, the trained machine learning model then outputs an accurate rate control parameter, and a second encoding pass encodes the image set based on the rate control parameter and the encoding features from the first pass to produce the final video encoding result.
  • This apparatus uses a two-pass, machine-learning-based content-adaptive encoding technique for live video: through HEVC multi-pass encoding and a machine learning prediction model, the encoding configuration is dynamically adjusted according to the complexity of the video content, realizing content-adaptive encoding and a better balance between video fluency and clarity. It can be deployed in real-time live video scenes with good encoding results.
  • the code rate parameter determination module 102 is specifically configured as:
  • the machine learning model includes a joint model composed of a first training model and a second training model
  • the code rate parameter determination module 102 is specifically configured as:
  • a code rate control parameter is obtained by performing weighted average calculation on the first code rate control parameter and the second code rate control parameter.
  • the code rate parameter determining module 102 is further configured to:
  • the video sample data is divided into training set samples, test set samples and verification set samples, and are respectively input to the first training model and the second training model for training.
  • the encoding module 103 is specifically configured as:
  • the set of images is encoded based on the encoding parameters.
  • the encoding module 103 is specifically configured as:
  • the encoding module 103 is specifically configured as:
  • FIG. 6 is a schematic structural diagram of a content-adaptive video coding device provided in an embodiment of the present application.
  • the device includes a processor 201, a memory 202, an input device 203, and an output device 204;
  • The number of processors may be one or more; one processor 201 is taken as an example in Fig. 6.
  • The processor 201, memory 202, input device 203, and output device 204 in the device can be connected by a bus or in other ways; connection by bus is taken as an example in Fig. 6.
  • the memory 202 can be configured to store software programs, computer-executable programs and modules, such as program instructions/modules corresponding to the content-adaptive video coding method in the embodiment of the present application.
  • the processor 201 executes various functional applications and data processing of the device by running the software programs, instructions and modules stored in the memory 202, that is, implements the above-mentioned content adaptive video coding method.
  • the input device 203 can be configured to receive input numbers or character information, and generate key signal input related to user settings and function control of the device.
  • the output device 204 may include a display device such as a display screen.
  • the embodiment of the present application also provides a storage medium containing computer-executable instructions, and the computer-executable instructions are configured to execute a content-adaptive video coding method described in the above-mentioned embodiments when executed by a computer processor, specifically including:
  • each unit and module included are only divided according to functional logic, but are not limited to the above-mentioned division, as long as the corresponding functions can be realized;
  • the specific names of the functional units are only for the convenience of distinguishing each other, and are not used to limit the protection scope of the embodiments of the present application.
  • Various aspects of the method provided in this application can also be implemented in the form of a program product, which includes program code; when the program product is run on a computer device, the program code causes the computer device to execute the steps in the methods described above in this specification according to the various exemplary implementations of the present application.
  • the computer device may execute the content adaptive video coding method described in the embodiments of the present application.
  • the program product can be implemented using any combination of one or more readable media.

Abstract

Embodiments of the present application provide a content adaptive video coding method and apparatus, a device and a storage medium. The method comprises: obtaining video data to be coded, and dividing the video data into a plurality of image sets containing continuous frame images; determining coding features of the image sets, and inputting the coding features and set video picture evaluation parameters into a pre-trained machine learning model to output code rate control parameters; and coding the image sets according to the coding features and the code rate control parameters. The present solution improves the video coding efficiency, and is suitable for a real-time video scene.

Description

Content Adaptive Video Coding Method, Apparatus, Device and Storage Medium
This application claims priority to Chinese Patent Application No. 202210043241.9, filed with the China Patent Office on January 14, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the technical field of video processing, and in particular to a content-adaptive video coding method, apparatus, device, and storage medium.
Background
With the rapid development of mobile Internet technology, video has become the mainstream medium for users; live streaming, video on demand, short video, and video chat have become part of daily life. However, compared with text and images, the amount of video data is enormous, so video transmission and storage face great challenges. Video codec technology aims to achieve the highest possible compression ratio and the highest possible reconstructed video quality within the available computing resources, so as to meet storage capacity and bandwidth requirements. Early video service providers usually processed almost all video content with a single predetermined general encoding configuration; as a result, high-motion videos could suffer from insufficient bit rate and low encoding quality, while low-motion videos could waste bit rate. Content-adaptive encoding sets different encoding configurations for different videos according to their content, finding for each video or video segment the lowest bit rate that satisfies clarity and subjective-sensitivity requirements, thereby saving bandwidth.
When performing video encoding, encoded data is extracted as features by pre-encoding training video data and, combined with the corresponding constant rate factor values, a machine learning model is trained. In a production environment, the model predicts encoding parameters from video features, and encoding is then performed with the predicted values, striking a balance between encoding bit rate and encoding quality to improve the viewing experience of most viewers. However, this method extracts features by encoding the entire video and then uses the machine learning model to predict a single constant rate factor for the whole video. For long videos with complex and mixed content, this leads to poor encoding quality in the complex parts and wasted bit rate in the simple parts. Moreover, encoding the entire video first to extract features and predict the constant rate factor, and then encoding again with the predicted value, consumes a great deal of time and is therefore unsuitable for live-streaming scenarios.
Summary
The embodiments of the present application provide a content-adaptive video coding method, apparatus, device, and storage medium, which solve the problem in the related art that video coding performs poorly in complex scenes, improve video coding efficiency, and are applicable to real-time video scenarios.
In a first aspect, an embodiment of the present application provides a content-adaptive video coding method, including:
acquiring video data to be encoded, and dividing the video data into a plurality of image sets containing consecutive frame images;
determining coding features of the image sets, and inputting the coding features and set video picture evaluation parameters into a pre-trained machine learning model to output rate control parameters;
encoding the image sets according to the coding features and the rate control parameters.
In a second aspect, an embodiment of the present application further provides a content-adaptive video coding apparatus, including:
an image set determination module, configured to acquire video data to be encoded and divide the video data into a plurality of image sets containing consecutive frame images;
a rate parameter determination module, configured to determine coding features of the image sets, and to input the coding features and set video picture evaluation parameters into a pre-trained machine learning model to output rate control parameters;
an encoding module, configured to encode the image sets according to the coding features and the rate control parameters.
In a third aspect, an embodiment of the present application further provides a content-adaptive video coding device, including:
one or more processors;
a storage apparatus, configured to store one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the content-adaptive video coding method described in the embodiments of the present application.
In a fourth aspect, an embodiment of the present application further provides a storage medium storing computer-executable instructions which, when executed by a computer processor, are configured to perform the content-adaptive video coding method described in the embodiments of the present application.
In a fifth aspect, an embodiment of the present application further provides a computer program product including a computer program which, when executed, implements the steps of the content-adaptive video coding method described above.
In the embodiments of the present application, video data to be encoded is acquired and divided into a plurality of image sets containing consecutive frame images; coding features of the image sets are determined; the coding features and set video picture evaluation parameters are input into a pre-trained machine learning model to output rate control parameters; and the image sets are encoded according to the coding features and the rate control parameters. This solves the problem in the related art that video coding performs poorly in complex scenes, improves video coding efficiency, and is applicable to real-time video scenarios.
Brief Description of the Drawings
FIG. 1 is a flowchart of a content-adaptive video coding method provided by an embodiment of the present application;
FIG. 2 is a flowchart of a method for performing secondary encoding based on a primary encoding result provided by an embodiment of the present application;
FIG. 3 is a flowchart of another content-adaptive video coding method provided by an embodiment of the present application;
FIG. 4 is a flowchart of another content-adaptive video coding method provided by an embodiment of the present application;
FIG. 5 is a structural block diagram of a content-adaptive video coding apparatus provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a content-adaptive video coding device provided by an embodiment of the present application.
Detailed Description
The embodiments of the present application are described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the embodiments of the present application, not to limit them. It should also be noted that, for ease of description, the drawings show only the parts related to the embodiments of the present application rather than all structures.
The terms "first", "second", and the like in the specification and claims of this application are used to distinguish similar objects, not to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the application can be practiced in sequences other than those illustrated or described here. Objects distinguished by "first", "second", and the like are generally of one type, and the number of objects is not limited; for example, there may be one or more first objects. In addition, "and/or" in the specification and claims denotes at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.
FIG. 1 is a flowchart of a content-adaptive video coding method provided by an embodiment of the present application, which can be applied to encoding video data. The method may be executed by a computing device such as a notebook, desktop computer, smartphone, server, or tablet, and specifically includes the following steps:
Step S101: Acquire video data to be encoded, and divide the video data into a plurality of image sets containing consecutive frame images.
The video data to be encoded includes recorded video data as well as video data generated in real time that needs to be transmitted and displayed, such as live-streaming video data.
In one embodiment, when encoding video data, a piece of video data is first divided into a plurality of image sets containing consecutive frame images; that is, during video encoding, each subdivided image set is encoded separately. For example, the video data may be divided into a plurality of consecutive GOPs (Groups of Pictures), where each GOP represents a group of consecutive pictures in a coded video stream. For instance, if each GOP contains 15 or 20 frames, the video data to be encoded is divided into a plurality of consecutive image sets each containing 15 to 20 frames, i.e., the video data is encoded with the GOP as the encoding unit.
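The GOP-based partitioning described above can be sketched as follows. This is an illustrative helper, not part of the application; the 15-frame GOP size is simply the example value used above.

```python
def split_into_gops(frames, gop_size=15):
    """Divide a frame sequence into consecutive image sets (GOPs) of at
    most gop_size frames each; each set is then encoded separately."""
    if gop_size <= 0:
        raise ValueError("gop_size must be positive")
    return [frames[i:i + gop_size] for i in range(0, len(frames), gop_size)]

# 50 frames with a 15-frame GOP yield image sets of 15, 15, 15 and 5 frames.
gops = split_into_gops(list(range(50)), gop_size=15)
```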
Step S102: Determine coding features of the image set, and input the coding features and set video picture evaluation parameters into a pre-trained machine learning model to output rate control parameters.
In one embodiment, the coding features of the image set may be obtained by pre-encoding, for example by encoding the image set with an encoder to obtain the corresponding coding features.
In one embodiment, feature extraction and analysis are performed on each frame image in the image set to obtain the coding features of the image set. Optionally, the coding features include motion vector features, distortion degree parameters, complexity parameters, and the like describing each frame image in the image set. The motion vector features characterize the degree of change of the images: the more drastically the frames change, the larger the motion vectors; conversely, if the frames describe a still picture, the motion vectors are relatively small. The distortion degree parameter characterizes the degree of image distortion: the greater the distortion, the higher the parameter value; if the distortion is low, the parameter value is correspondingly low. The complexity parameter characterizes the complexity of the image; for example, if an image contains many different objects, the greater the pixel differences between the objects, the higher the complexity. Optionally, the above coding features may be identified and determined by existing encoder modules, image processing algorithms, and the like.
The video picture evaluation parameter is a comprehensive evaluation index used to characterize image quality. Optionally, it may be expressed by VMAF (Video Multimethod Assessment Fusion). VMAF is an objective evaluation metric proposed by Netflix that combines human visual modeling and machine learning: it uses a large amount of subjective data as a training set and fuses algorithms covering different evaluation dimensions by means of machine learning, and is currently one of the mainstream objective evaluation metrics. A higher VMAF score generally indicates better video quality; however, from the perspective of human perception, once the VMAF score of a given video rises above a certain threshold, the human eye cannot perceive further improvement in picture quality. Therefore, different VMAF targets can be designed for different videos to save encoding bit rate without changing the subjective quality of the video.
In one embodiment, the determined coding features of the image set and the set video picture evaluation parameter are input into a pre-trained machine learning model to output a rate control parameter. The set video picture evaluation parameter can be customized according to different picture quality requirements, different playback devices, and so on, and the set value can also be adjusted. The machine learning model is a pre-trained neural network model that outputs the corresponding rate control parameter based on the coding features of the image set and the set video picture evaluation parameter. Optionally, the rate control parameter may be a CRF (Constant Rate Factor) or a CQF (Constant Quality Factor). CRF is a form of rate control: the smaller the CRF value, the higher the video quality and the larger the file size; the larger the CRF value, the higher the video compression rate but the lower the video quality. Optionally, different CRF values correspond to different bit rates; a mapping table may be used to record the CRF values and their corresponding encoding bit rates, or a function curve may be used to characterize the relationship between CRF and bit rate.
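The mapping-table option mentioned above (recording CRF values against encoding bit rates) might be sketched as below. The table values are invented for illustration only; real CRF-to-bitrate relations depend on the codec and the content and would be measured empirically.

```python
import bisect

# Hypothetical CRF -> bitrate (kbps) measurements; illustrative values only.
CRF_BITRATE_TABLE = [(18, 4200), (23, 2100), (28, 1050), (33, 520)]

def bitrate_for_crf(crf):
    """Look up, and linearly interpolate between, table entries to estimate
    the bitrate for a CRF value; lower CRF means higher quality/bitrate."""
    crfs = [c for c, _ in CRF_BITRATE_TABLE]
    rates = [r for _, r in CRF_BITRATE_TABLE]
    if crf <= crfs[0]:
        return float(rates[0])
    if crf >= crfs[-1]:
        return float(rates[-1])
    i = bisect.bisect_left(crfs, crf)
    c0, c1 = crfs[i - 1], crfs[i]
    r0, r1 = rates[i - 1], rates[i]
    return r0 + (r1 - r0) * (crf - c0) / (c1 - c0)
```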
Step S103: Encode the image set according to the coding features and the rate control parameter.
In one embodiment, after the rate control parameter is obtained through the machine learning model, a final secondary encoding is performed on the image set based on the rate control parameter and the coding features determined in step S102, so as to output bitstream data.
In one embodiment, FIG. 2 is a flowchart of a method for performing secondary encoding based on a primary encoding result provided by an embodiment of the present application. As shown in FIG. 2, the method specifically includes:
Step S1031: Determine frame type information and scene information according to the coding features.
The coding features record the frame type of each frame, such as the different frame types I-frame, P-frame, and B-frame. Different frame types require encoding compression of different quality because of their different reference relationships. An I-frame is a key frame, a completely self-contained picture: only the data of that frame is needed to decode the image, without reference to other frames. A P-frame represents the difference between the current frame and a previous key frame or P-frame: during decoding, the previously cached picture is superimposed with the difference defined by this frame to generate the final picture. A B-frame is a bidirectional difference frame: it records the differences between the current frame and both the preceding and following frames, so decoding a B-frame requires not only the previously cached picture but also the decoded subsequent picture, and the final picture is obtained by superimposing the preceding and following pictures with the data of this frame.
The scene information can, for example, be divided into moving scenes and static scenes, and can be determined from the coding features by an integrated scene discrimination module. The coding features record image features related to motion displacement, such as the motion vectors and motion compensation of each frame; by analyzing data such as motion vectors and motion compensation, the scene information of the image is determined.
Step S1032: Perform predictive analysis according to the frame type information, the scene information, and the rate control parameter to obtain encoding parameters.
Taking HEVC (High Efficiency Video Coding) as an example, the encoding parameter corresponds to the quantization parameter QP. The quantization parameter QP is the index of the quantization step Qstep: for luma coding, Qstep takes 52 values and QP ranges from 0 to 51; for chroma coding, QP ranges from 0 to 39.
Taking the quantization parameter QP as an example, the encoding parameter reflects the compression of spatial detail. The smaller the value, the finer the quantization, the higher the image quality, and the longer the generated bitstream. When the QP value is small, most of the detail in the image is preserved; as the QP value increases, some detail is correspondingly lost and the bit rate decreases. With QP ranging from 0 to 51 as above, the minimum value 0 represents the finest quantization, while the maximum value 51 represents the coarsest. The purpose of quantization is to reduce the image coding length without degrading the visual effect, removing information unnecessary for visual reconstruction.
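For orientation, in H.264/HEVC-style quantizers the step size roughly doubles for every increase of 6 in QP; a commonly used approximation (an assumption added here, not stated in the application) is Qstep ≈ 2^((QP−4)/6):

```python
def approx_qstep(qp):
    """Approximate luma quantization step for a given QP; the step size
    doubles for every increase of 6 in QP, with Qstep = 1 at QP = 4."""
    return 2.0 ** ((qp - 4) / 6.0)
```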
In one embodiment, the process of obtaining encoding parameters by predictive analysis based on the frame type information, scene information, and rate control parameter, taking HEVC as an example, may be implemented using its integrated encoder module. That is, the frame type information (I-frame, B-frame, P-frame), scene information (static scene, dynamic scene), and rate control parameter (CRF) jointly determine the final encoding parameter (frame-level QP). Exemplarily, when the frame type is a key frame, the scene information is a dynamic scene, and the value of the rate control parameter is higher, the resulting frame-level QP value is lower.
Step S1033: Encode the image set based on the encoding parameters.
In one embodiment, after the encoding parameters are obtained, taking the frame-level QP parameter in HEVC as an example, HEVC high-efficiency video encoding is performed to produce the bitstream output.
In another embodiment, to improve the accuracy of the secondary encoding, the above process of performing predictive analysis to obtain encoding parameters and encoding the image set based on the encoding parameters includes: performing predictive analysis to obtain a first encoding parameter; determining a second encoding parameter based on the first encoding parameter, encoding feedback information, buffer information, the frame type information, and the scene information; adjusting a quantization offset parameter according to the first encoding parameter; and encoding the image set according to the second encoding parameter and the adjusted quantization offset parameter to output bitstream data. Taking HEVC encoding as an example, the first encoding parameter can be understood as the base QP, from which the frame-level QP is determined together with the encoding feedback information, buffer information, frame type information, and scene information. The buffer information characterizes the parameters of the buffer memory during video encoding: the larger the buffer occupancy, the larger the corresponding QP value, so as to reduce the computation and storage required for encoding. The encoding feedback information may be obtained during pre-encoding or fed back after the previous round of encoding of this image set or video, such as the degree of distortion; the higher the distortion, the more the QP value needs to be lowered to improve encoding quality. While the second encoding parameter is determined from the first encoding parameter, the quantization offset parameter is further adjusted according to the first encoding parameter. Taking HEVC video encoding as an example, the quantization offset parameter can be characterized by the cu-tree strength, which represents a quantization offset adjustment made according to the degree to which the current block is referenced. In one embodiment, if the current block is referenced, it is further determined whether a certain number of subsequent blocks reference the current block; the more it is referenced by subsequent image blocks, the more this indicates that the current block belongs to a slowly changing scene, and the QP value is correspondingly lowered to improve picture quality. Finally, the image set is encoded using the determined second encoding parameter together with the adjusted quantization offset parameter to output bitstream data, ensuring an optimal balance between image quality and compression rate.
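The reference-based quantization offset described above might be caricatured as follows. This is a hand-made heuristic for illustration only; real encoders (for example, x265's cutree) propagate estimated costs through a lookahead rather than simply counting references, and the parameter values here are invented.

```python
def qp_offset_from_references(ref_count, max_refs=8, max_offset=4.0):
    """Heuristic: the more subsequent blocks reference the current block
    (suggesting a slowly changing scene), the larger the negative QP
    offset, i.e. the finer the quantization applied to that block."""
    ratio = min(max(ref_count, 0), max_refs) / max_refs
    return -max_offset * ratio
```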
As the above scheme shows, when performing video encoding, the video is first divided into image sets; after a first encoding pass obtains the coding features, the trained machine learning model outputs accurate rate control parameters, and the image sets are then encoded a second time based on the rate control parameters and the coding features obtained in the first pass to produce the final encoding result. This approach is a two-pass, machine-learning-based content-adaptive encoding technique for live video: by leveraging HEVC's multi-pass encoding and a machine-learning prediction model, the encoding configuration is dynamically adjusted according to the complexity of the video content, achieving content-adaptive encoding and a better balance between video smoothness and clarity. It can be applied to real-time live video scenarios with good encoding results.
FIG. 3 is a flowchart of another content-adaptive video coding method provided by an embodiment of the present application, which presents a method for determining the coding features of an image set. As shown in FIG. 3, the method specifically includes:
Step S201: Acquire video data to be encoded, and divide the video data into a plurality of image sets containing consecutive frame images.
Step S202: Acquire a preset number of frame images from the image set, encode the preset number of frame images to obtain coding features, and determine these coding features as the coding features of the image set.
In one embodiment, taking the image set as a GOP, the preset number of frame images may be a mini-GOP within the GOP; for example, for an image set that is a 15-frame GOP, the preset number of frame images may be 5 of those frames. The preset number of frame images may be pre-encoded by an encoder to obtain coding features, and the coding features of the preset number of frame images are then determined as the coding features of the image set.
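A toy sketch of pre-encoding only a mini-GOP to stand in for the whole set: frames are reduced to scalar luma means here, and the mean absolute frame difference is an invented stand-in for the encoder's real feature extraction.

```python
def mini_gop_complexity(gop_luma_means, mini_gop_size=5):
    """Use only the first mini_gop_size frames of a GOP and compute a
    simple motion/complexity proxy (mean absolute difference between
    consecutive frame luma means) over that mini-GOP."""
    mini = gop_luma_means[:mini_gop_size]
    diffs = [abs(b - a) for a, b in zip(mini, mini[1:])]
    return sum(diffs) / len(diffs) if diffs else 0.0
```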
Step S203: Input the coding features and the set video picture evaluation parameters into the pre-trained machine learning model to output rate control parameters.
Step S204: Encode the image set according to the coding features and the rate control parameters.
As the above scheme shows, the video encoding process adopts a two-pass, machine-learning-based content-adaptive encoding technique for live video, dynamically adjusting the encoding configuration according to the complexity of the video content. By acquiring a preset number of frame images from the image set, encoding that preset number of frames to obtain coding features, and determining them as the coding features of the whole image set, the encoding speed can be significantly increased. The scheme performs well for video encoding with real-time requirements while reducing the amount of computation, achieves content-adaptive encoding, and better balances video smoothness and clarity. It can be applied to real-time live video scenarios with good encoding results.
FIG. 4 is a flowchart of another content-adaptive video coding method provided by an embodiment of the present application, which presents a method of outputting the rate control parameter through a machine learning model in one embodiment, where the machine learning model is a joint model composed of a first training model and a second training model. As shown in FIG. 4, the method specifically includes:
Step S301: Acquire video data to be encoded, and divide the video data into a plurality of image sets containing consecutive frame images.
Step S302: Determine the coding features of the image set, input the coding features and the set video picture evaluation parameters into the first training model and the second training model respectively, and obtain a first rate control parameter output by the first training model and a second rate control parameter output by the second training model.
在一个实施例中,该第一训练模型是XGBoost模型,第二训练模型是LightGBM模型,二者均为基于决策树的机器学习算法。示例性的,第一训练模型输出的第一码率控制参数记为CRF1,第二训练模型输出的第二码率控制参数记为CRF2。In one embodiment, the first training model is an XGBoost model, and the second training model is a LightGBM model, both of which are decision tree-based machine learning algorithms. Exemplarily, the first rate control parameter output by the first training model is denoted as CRF1, and the second rate control parameter output by the second training model is denoted as CRF2.
步骤S303、对所述第一码率控制参数和所述第二码率控制参数进行加权平均计算得到码率控制参数。Step S303, performing weighted average calculation on the first rate control parameter and the second rate control parameter to obtain a rate control parameter.
The finally calculated rate control parameter is denoted CRF3. Optionally, it is computed by the formula CRF3 = λ1·CRF1 + λ2·CRF2, where λ1 + λ2 = 1, λ1 ∈ [0, 1], and λ2 ∈ [0, 1].
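The weighted combination of the two models' outputs can be written directly from the formula above. The default weight λ1 = 0.5 is an assumed value; the embodiment leaves the weights as tunable constants subject only to λ1 + λ2 = 1:

```python
def combine_crf(crf1, crf2, lam1=0.5):
    """Weighted average of the two models' rate-control outputs.

    Implements CRF3 = lam1*CRF1 + lam2*CRF2 with lam2 = 1 - lam1 and
    lam1 in [0, 1]. The default lam1 = 0.5 is an assumption.
    """
    if not 0.0 <= lam1 <= 1.0:
        raise ValueError("lam1 must lie in [0, 1]")
    lam2 = 1.0 - lam1
    return lam1 * crf1 + lam2 * crf2

crf3 = combine_crf(23.0, 27.0)                     # equal weights -> 25.0
crf3_biased = combine_crf(23.0, 27.0, lam1=0.75)   # favors the first model -> 24.0
```

Constraining the weights to sum to one keeps CRF3 inside the interval spanned by CRF1 and CRF2, so the combined parameter stays in the valid CRF range whenever both model outputs do.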
Step S304: Encode the image set according to the encoding features and the rate control parameter.
As can be seen from the above, when the rate control parameter is output through the machine learning model, two different decision-tree-based models each output a rate control parameter, and the final parameter is obtained by weighted averaging. This yields a more accurate rate control parameter and, in turn, a better final video encoding result.
In one embodiment, before the encoding features and the set video picture evaluation parameter are input into the first and second training models, the method further includes: acquiring video sample data of different scene types and different resolutions; dividing the video sample data into training-set, test-set and validation-set samples; and inputting them into the first training model and the second training model respectively for training. During model training, this scheme first distinguishes the scene type of the video picture, for example dynamic versus static scenes, and trains separately on video pictures of different resolutions as sample data. Dividing the sample data into training, test and validation sets yields a final trained model with good predictive performance.
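A minimal sketch of the sample-set division, assuming a 70/15/15 split (the embodiment does not specify the proportions, the seed, or the helper name):

```python
import random

def split_samples(samples, train=0.7, test=0.15, seed=0):
    """Shuffle and split video sample data into training-set,
    test-set and validation-set samples.

    The 70/15/15 ratio and the fixed seed are assumed choices made
    for reproducibility of this sketch.
    """
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_test = int(len(shuffled) * test)
    return (shuffled[:n_train],                       # training set
            shuffled[n_train:n_train + n_test],       # test set
            shuffled[n_train + n_test:])              # validation set

train_set, test_set, val_set = split_samples(list(range(100)))
# 70 / 15 / 15 samples, covering all 100 inputs exactly once
```

In practice the same split would be applied per scene type and per resolution, since the embodiment trains on those subsets separately.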
Fig. 5 is a structural block diagram of a content-adaptive video encoding apparatus provided by an embodiment of the present application. The apparatus is configured to execute the content-adaptive video encoding method provided by the above embodiments, and has the corresponding functional modules and beneficial effects. As shown in Fig. 5, the apparatus includes an image set determination module 101, a rate parameter determination module 102 and an encoding module 103, where:
the image set determination module 101 is configured to acquire video data to be encoded and divide the video data into multiple image sets each containing consecutive frame images;
the rate parameter determination module 102 is configured to determine the encoding features of the image set and input the encoding features together with the set video picture evaluation parameter into a pre-trained machine learning model, which outputs a rate control parameter;
the encoding module 103 is configured to encode the image set according to the encoding features and the rate control parameter.
From the above scheme, when performing video encoding, the video is first divided into image sets; a first-pass encoding is performed to obtain encoding features; a trained machine learning model then outputs an accurate rate control parameter; and the image set is encoded a second time based on the rate control parameter and the features obtained in the first pass to produce the final encoding result. This approach uses a content-adaptive encoding technique for live video based on two-pass encoding and machine learning: by combining HEVC multi-pass encoding with a machine-learning prediction model, the encoding configuration is dynamically adjusted according to the complexity of the video content, achieving content-adaptive encoding and a better balance between video fluency and clarity. It can be applied to real-time live-streaming scenarios with good encoding results.
In a possible embodiment, the rate parameter determination module 102 is specifically configured to:
acquire a preset number of frame images from the image set;
encode the preset number of frame images to obtain encoding features, and determine those features as the encoding features of the image set.
In a possible embodiment, the machine learning model is a joint model composed of a first training model and a second training model, and the rate parameter determination module 102 is specifically configured to:
input the encoding features and the set video picture evaluation parameter into the first training model and the second training model respectively, obtaining a first rate control parameter output by the first training model and a second rate control parameter output by the second training model;
perform a weighted average of the first rate control parameter and the second rate control parameter to obtain the rate control parameter.
In a possible embodiment, the rate parameter determination module 102 is further configured to:
before the encoding features and the set video picture evaluation parameter are input into the first and second training models, acquire video sample data of different scene types and different resolutions;
divide the video sample data into training-set, test-set and validation-set samples, and input them into the first training model and the second training model respectively for training.
In a possible embodiment, the encoding module 103 is specifically configured to:
determine frame type information and scene information according to the encoding features;
perform prediction analysis according to the frame type information, the scene information and the rate control parameter to obtain encoding parameters;
encode the image set based on the encoding parameters.
In a possible embodiment, the encoding module 103 is specifically configured to:
perform prediction analysis to obtain a first encoding parameter;
determine a second encoding parameter based on the first encoding parameter, encoding feedback information, buffer information, the frame type information and the scene information.
In a possible embodiment, the encoding module 103 is specifically configured to:
adjust a quantization offset parameter according to the first encoding parameter;
encode the image set according to the second encoding parameter and the adjusted quantization offset parameter, so as to output bitstream data.
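One plausible form of the quantization-offset adjustment is to shift the offset in proportion to how far the first-pass rate-control parameter deviates from a quality target, then clamp it to a typical QP-offset range. Every name, the target value, the gain, and the [-12, 12] clamp are assumptions; the embodiment states only that the offset is adjusted according to the first encoding parameter:

```python
def adjust_qp_offset(base_offset, first_pass_crf, target_crf=26.0, gain=0.5):
    """Hypothetical quantization-offset adjustment: move the offset
    toward coarser quantization when the first-pass CRF exceeds the
    target, toward finer quantization when it falls below, and clamp
    to an assumed [-12, 12] QP-offset range.
    """
    offset = base_offset + gain * (first_pass_crf - target_crf)
    return max(-12.0, min(12.0, offset))

adjusted = adjust_qp_offset(0.0, 30.0)  # first pass coarser than target -> 2.0
```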
Fig. 6 is a schematic structural diagram of a content-adaptive video encoding device provided by an embodiment of the present application. As shown in Fig. 6, the device includes a processor 201, a memory 202, an input apparatus 203 and an output apparatus 204. There may be one or more processors 201 in the device; one processor 201 is taken as an example in Fig. 6. The processor 201, memory 202, input apparatus 203 and output apparatus 204 in the device may be connected by a bus or in other ways; a bus connection is taken as an example in Fig. 6. The memory 202, as a computer-readable storage medium, may be configured to store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the content-adaptive video encoding method in the embodiments of the present application. The processor 201 runs the software programs, instructions and modules stored in the memory 202 to execute the various functional applications and data processing of the device, that is, to implement the above content-adaptive video encoding method. The input apparatus 203 may be configured to receive input digit or character information and to generate key signal inputs related to user settings and function control of the device. The output apparatus 204 may include a display device such as a display screen.
An embodiment of the present application further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are configured to execute the content-adaptive video encoding method described in the above embodiments, specifically including:
acquiring video data to be encoded, and dividing the video data into multiple image sets each containing consecutive frame images;
determining the encoding features of the image set, and inputting the encoding features and the set video picture evaluation parameter into a pre-trained machine learning model, which outputs a rate control parameter;
encoding the image set according to the encoding features and the rate control parameter.
It should be noted that, in the above embodiment of the content-adaptive video encoding apparatus, the units and modules are divided merely according to functional logic; the division is not limited thereto as long as the corresponding functions can be realized. In addition, the specific names of the functional units are only for the convenience of distinguishing them from one another and are not intended to limit the protection scope of the embodiments of the present application.
In some possible implementations, aspects of the method provided by the present application may also be implemented in the form of a program product, which includes program code. When the program product runs on a computer device, the program code is configured to cause the computer device to execute the steps of the methods described above according to the various exemplary implementations of the present application; for example, the computer device may execute the content-adaptive video encoding method described in the embodiments of the present application. The program product may be implemented using any combination of one or more readable media.

Claims (10)

  1. A content-adaptive video encoding method, comprising:
    acquiring video data to be encoded, and dividing the video data into multiple image sets each containing consecutive frame images;
    determining encoding features of the image set, and inputting the encoding features and a set video picture evaluation parameter into a pre-trained machine learning model, which outputs a rate control parameter;
    encoding the image set according to the encoding features and the rate control parameter.
  2. The content-adaptive video encoding method according to claim 1, wherein determining the encoding features of the image set comprises:
    acquiring a preset number of frame images from the image set;
    encoding the preset number of frame images to obtain encoding features, and determining those features as the encoding features of the image set.
  3. The content-adaptive video encoding method according to claim 1 or 2, wherein the machine learning model comprises a joint model composed of a first training model and a second training model, and inputting the encoding features and the set video picture evaluation parameter into the pre-trained machine learning model to output the rate control parameter comprises:
    inputting the encoding features and the set video picture evaluation parameter into the first training model and the second training model respectively, to obtain a first rate control parameter output by the first training model and a second rate control parameter output by the second training model;
    performing a weighted average of the first rate control parameter and the second rate control parameter to obtain the rate control parameter.
  4. The content-adaptive video encoding method according to claim 3, before inputting the encoding features and the set video picture evaluation parameter into the first training model and the second training model respectively, further comprising:
    acquiring video sample data of different scene types and different resolutions;
    dividing the video sample data into training-set samples, test-set samples and validation-set samples, and inputting them into the first training model and the second training model respectively for training.
  5. The content-adaptive video encoding method according to any one of claims 1-4, wherein encoding the image set according to the encoding features and the rate control parameter comprises:
    determining frame type information and scene information according to the encoding features;
    performing prediction analysis according to the frame type information, the scene information and the rate control parameter to obtain encoding parameters;
    encoding the image set based on the encoding parameters.
  6. The content-adaptive video encoding method according to claim 5, wherein performing prediction analysis to obtain encoding parameters comprises:
    performing prediction analysis to obtain a first encoding parameter;
    determining a second encoding parameter based on the first encoding parameter, encoding feedback information, buffer information, the frame type information and the scene information.
  7. The content-adaptive video encoding method according to claim 5 or 6, wherein encoding the image set based on the encoding parameters comprises:
    adjusting a quantization offset parameter according to the first encoding parameter;
    encoding the image set according to the second encoding parameter and the adjusted quantization offset parameter, so as to output bitstream data.
  8. A content-adaptive video encoding apparatus, comprising:
    an image set determination module configured to acquire video data to be encoded and divide the video data into multiple image sets each containing consecutive frame images;
    a rate parameter determination module configured to determine encoding features of the image set and input the encoding features and a set video picture evaluation parameter into a pre-trained machine learning model, which outputs a rate control parameter;
    an encoding module configured to encode the image set according to the encoding features and the rate control parameter.
  9. A content-adaptive video encoding device, comprising: one or more processors; and a storage apparatus configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the content-adaptive video encoding method according to any one of claims 1-7.
  10. A storage medium storing computer-executable instructions which, when executed by a computer processor, are configured to execute the content-adaptive video encoding method according to any one of claims 1-7.
PCT/CN2023/070555 2022-01-14 2023-01-04 Content adaptive video coding method and apparatus, device and storage medium WO2023134523A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210043241.9 2022-01-14
CN202210043241.9A CN114554211A (en) 2022-01-14 2022-01-14 Content adaptive video coding method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2023134523A1 true WO2023134523A1 (en) 2023-07-20

Family

ID=81671210

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/070555 WO2023134523A1 (en) 2022-01-14 2023-01-04 Content adaptive video coding method and apparatus, device and storage medium

Country Status (2)

Country Link
CN (1) CN114554211A (en)
WO (1) WO2023134523A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114554211A (en) * 2022-01-14 2022-05-27 百果园技术(新加坡)有限公司 Content adaptive video coding method, device, equipment and storage medium
CN116320429B (en) * 2023-04-12 2024-02-02 瀚博半导体(上海)有限公司 Video encoding method, apparatus, computer device, and computer-readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111083473A (en) * 2019-12-28 2020-04-28 杭州当虹科技股份有限公司 Content self-adaptive video coding method based on machine learning
CN112383777A (en) * 2020-09-28 2021-02-19 北京达佳互联信息技术有限公司 Video coding method and device, electronic equipment and storage medium
US20210168408A1 (en) * 2018-08-14 2021-06-03 Huawei Technologies Co., Ltd. Machine-Learning-Based Adaptation of Coding Parameters for Video Encoding Using Motion and Object Detection
CN114554211A (en) * 2022-01-14 2022-05-27 百果园技术(新加坡)有限公司 Content adaptive video coding method, device, equipment and storage medium


Also Published As

Publication number Publication date
CN114554211A (en) 2022-05-27

Similar Documents

Publication Publication Date Title
WO2023134523A1 (en) Content adaptive video coding method and apparatus, device and storage medium
TWI743919B (en) Video processing apparatus and processing method of video stream
WO2021068598A1 (en) Encoding method and device for screen sharing, and storage medium and electronic equipment
US8718145B1 (en) Relative quality score for video transcoding
WO2018234860A1 (en) Real-time screen sharing
WO2021129007A1 (en) Method and device for determining video bitrate, computer apparatus, and storage medium
WO2023016155A1 (en) Image processing method and apparatus, medium, and electronic device
US20140254688A1 (en) Perceptual Quality Of Content In Video Collaboration
CN116440501B (en) Self-adaptive cloud game video picture rendering method and system
JP2022500901A (en) Data processing methods, devices, and computer programs to be coded
WO2022000298A1 (en) Reinforcement learning based rate control
CN111385577B (en) Video transcoding method, device, computer equipment and computer readable storage medium
CN112437301B (en) Code rate control method and device for visual analysis, storage medium and terminal
WO2024017106A1 (en) Code table updating method, apparatus, and device, and storage medium
CN116471262A (en) Video quality evaluation method, apparatus, device, storage medium, and program product
CN110740316A (en) Data coding method and device
TWI749676B (en) Image quality assessment apparatus and image quality assessment method thereof
Li et al. Perceptual quality assessment of face video compression: A benchmark and an effective method
Huang et al. Semantic video adaptation using a preprocessing method for mobile environment
JP2018514133A (en) Data processing method and apparatus
US10848772B2 (en) Histogram-based edge/text detection
WO2024051299A1 (en) Encoding method and apparatus, and decoding method and apparatus
CN112073724B (en) Video information processing method and device, electronic equipment and storage medium
US11272185B2 (en) Hierarchical measurement of spatial activity for text/edge detection
CN114430501B (en) Content adaptive coding method and system for file transcoding

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23739867

Country of ref document: EP

Kind code of ref document: A1