CN117676156A - Video coding data prediction method, video coding method and related equipment

Info

Publication number
CN117676156A
Authority
CN
China
Prior art keywords
video, data, sample, parameter, processed
Prior art date
Legal status
Pending
Application number
CN202311560447.XA
Other languages
Chinese (zh)
Inventor
宁沛荣
曲建峰
陈靖
Current Assignee
Shuhang Technology Beijing Co ltd
Original Assignee
Shuhang Technology Beijing Co ltd
Priority date
Filing date
Publication date
Application filed by Shuhang Technology Beijing Co ltd
Priority to CN202311560447.XA
Publication of CN117676156A
Legal status: Pending

Abstract

The embodiment of the application discloses a video coding data prediction method, a video coding method and related equipment. The method comprises: acquiring a sample video data set; and training a data prediction model to be trained by using the sample video data set, so that after the compressed domain feature data and the sample video quality parameter in each sample video data are input into the trained data prediction model, the loss function value between the sample coding parameter prediction data and the corresponding sample coding parameter labeling data of each sample video under the corresponding sample video quality parameter is smaller than a function threshold. The trained data prediction model helps to obtain more accurate coding parameter prediction data of the video to be processed under the target video quality parameter, so that the video to be processed can be encoded with the coding parameter prediction data to obtain a target video code stream of the video to be processed under the target video quality parameter, thereby improving the video encoding quality.

Description

Video coding data prediction method, video coding method and related equipment
Technical Field
The present disclosure relates to the field of video coding technologies, and in particular, to a video coding data prediction method, a video coding method, and related devices.
Background
In the video encoding process, the resolution of the video is generally set in coordination with rate control. Common rate control modes include constant bit rate (CBR), variable bit rate (VBR), constant rate factor (Constant Rate Factor, CRF) and the like. In the short video and live broadcast fields, the most commonly used rate control mode is to encode all videos with the same encoding parameter CRF, i.e., constant quality. Because videos differ in characteristics, sources and complexity, encoding with the same encoding parameters causes the quality of these videos to fluctuate, so how to better encode videos while ensuring video quality is an important problem.
Disclosure of Invention
The embodiment of the application provides a video coding data prediction method, a video coding method and related equipment, which can determine coding parameter prediction data of a video to be processed under a target video quality parameter by utilizing a trained data prediction model, and further encode the video to be processed by utilizing the coding parameter prediction data so as to improve the video coding quality.
In a first aspect, an embodiment of the present application provides a method for predicting video encoded data, including:
acquiring a sample video data set, wherein the sample video data set comprises a plurality of sample video data, and each sample video data comprises compressed domain feature data of a sample video, sample video quality parameters and sample coding parameter labeling data corresponding to the sample video quality parameters for labeling the sample video;
training a data prediction model to be trained by using the sample video data set, so that after compressed domain feature data and sample video quality parameters in each sample video data in the sample video data set are input into the data prediction model obtained through training, the loss function value between the sample coding parameter prediction data of each sample video under the corresponding sample video quality parameters and the sample coding parameter labeling data of each sample video under the corresponding sample video quality parameters is smaller than a function threshold;
the data prediction model obtained through training is used for outputting and obtaining coding parameter prediction data of the video to be processed under the target video quality parameter according to the compressed domain characteristic data of the video to be processed and the target video quality parameter;
The coding parameter prediction data is used for coding the video to be processed according to the coding parameter prediction data, and a target video code stream of the video to be processed under the target video quality parameter is obtained.
In a second aspect, an embodiment of the present application provides a video encoding method, including:
acquiring video data to be processed, wherein the video data to be processed comprises compressed domain characteristic data of video to be processed and target video quality parameters, and the video to be processed is obtained by decoding a video code stream to be processed;
inputting the compressed domain characteristic data of the video to be processed and the target video quality parameter into a data prediction model to obtain coding parameter prediction data of the video to be processed under the target video quality parameter;
and encoding the video to be processed according to the encoding parameter prediction data to obtain a target video code stream of the video to be processed under the target video quality parameter.
In a third aspect, an embodiment of the present application provides a video coding data prediction apparatus, including:
the system comprises an acquisition unit, a storage unit and a processing unit, wherein the acquisition unit is used for acquiring a sample video data set, the sample video data set comprises a plurality of sample video data, and each sample video data comprises compressed domain characteristic data of a sample video, sample video quality parameters and sample coding parameter marking data corresponding to the sample video quality parameters for marking the sample video;
The training unit is used for training the data prediction model to be trained by using the sample video data set, so that the loss function value between the sample coding parameter prediction data of each sample video under the corresponding sample video quality parameter and the sample coding parameter labeling data of each sample video under the corresponding sample video quality parameter is smaller than a function threshold value after the compressed domain feature data and the sample video quality parameter in each sample video data in the sample video data set are input into the data prediction model obtained through training;
the data prediction model obtained through training is used for outputting and obtaining coding parameter prediction data of the video to be processed under the target video quality parameter according to the compressed domain characteristic data of the video to be processed and the target video quality parameter;
the coding parameter prediction data are used for determining target parameter data by combining coding parameter data of the video to be processed under the target video quality parameters, and recoding the video to be processed by utilizing the target parameter data.
In a fourth aspect, embodiments of the present application provide a video encoding apparatus, including:
The device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring video data to be processed, the video data to be processed comprises compressed domain characteristic data of video to be processed, target video quality parameters and coding parameter data of the video to be processed under the target video quality parameters, and the video to be processed is obtained by decoding a video code stream to be processed;
the prediction unit is used for inputting the compressed domain characteristic data of the video to be processed and the target video quality parameter into a data prediction model to obtain coding parameter prediction data of the video to be processed under the target video quality parameter;
and the encoding unit is used for encoding the video to be processed according to the encoding parameter prediction data to obtain a target video code stream of the video to be processed under the target video quality parameter.
In a fifth aspect, embodiments of the present application provide a computer device, the computer device comprising: a processor and a memory, the processor being configured to perform the method according to the first or second aspect.
In a sixth aspect, embodiments of the present application further provide a computer readable storage medium, where program instructions are stored, the program instructions when executed implement the method according to the first or second aspect.
In a seventh aspect, embodiments of the present application further provide a computer program product comprising program instructions which, when executed by a processor, implement the method of the first or second aspect described above.
The embodiment of the application can acquire a sample video data set; training a data prediction model to be trained by using a sample video data set, so that after compressed domain feature data and sample video quality parameters in each sample video data in the sample video data set are input into the data prediction model obtained through training, the loss function value between the sample coding parameter prediction data of each sample video under the corresponding sample video quality parameters and the sample coding parameter labeling data of each sample video under the corresponding sample video quality parameters is smaller than a function threshold; the data prediction model obtained through training is used for outputting and obtaining coding parameter prediction data of the video to be processed under the target video quality parameter according to the compressed domain characteristic data of the video to be processed and the target video quality parameter; the coding parameter prediction data are used for coding the video to be processed according to the coding parameter prediction data, and a target video code stream of the video to be processed under the target video quality parameters is obtained. And training a sample video data set to obtain a data prediction model, determining coding parameter prediction data of the video to be processed under the target video quality parameter by using the trained data prediction model, and further coding the video to be processed by using the coding parameter prediction data so as to improve the video coding quality.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a video coding data prediction method according to an embodiment of the present application;
fig. 2 is a schematic flow chart of a video encoding method according to an embodiment of the present application;
fig. 3 is a schematic diagram of a relationship between coding parameters, code rate and video quality parameters according to an embodiment of the present application;
fig. 4 is a flowchart of another video encoding method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a PSNR profile for encoding video using the same CRF value;
FIG. 6 is a schematic diagram of PSNR distribution provided in an embodiment of the present application;
fig. 7 is a schematic structural diagram of a video coding data prediction apparatus according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a video encoding device according to an embodiment of the present application;
Fig. 9 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
Technical solutions in the embodiments of the present application will be clearly described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
The application provides a video coding data prediction method and a video coding method, which can be applied to various video coding scenes. Because of the source, characteristics, etc. of each video, video quality may be unstable if the same coding parameters are used for coding the video during the coding of the video. Therefore, the application provides a video coding data prediction method, by acquiring a sample video data set, wherein the sample video data set comprises a plurality of sample video data, and each sample video data comprises compressed domain characteristic data of a sample video, sample video quality parameters and sample coding parameter labeling data which are labeled for the sample video and correspond to the sample video quality parameters; training the data prediction model to be trained by using a sample video data set, so that after the compressed domain feature data and the sample video quality parameters in each sample video data in the sample video data set are input into the data prediction model obtained through training, the loss function value between the sample coding parameter prediction data of each sample video under the corresponding sample video quality parameters and the sample coding parameter labeling data of each sample video under the corresponding sample video quality parameters is smaller than a function threshold value.
Furthermore, the application also provides a video coding method, wherein the data prediction model obtained through training is used for outputting and obtaining coding parameter prediction data of the video to be processed under the target video quality parameter according to the compressed domain characteristic data and the target video quality parameter of the video to be processed, and coding the video to be processed according to the coding parameter prediction data, so as to obtain a target video code stream of the video to be processed under the target video quality parameter. In some embodiments, the target video quality parameter may include, but is not limited to, peak signal-to-noise ratio (PSNR); the encoding parameters may include, but are not limited to, CRF.
An embodiment of the present application provides a method of finding, for a given target video quality parameter (e.g., PSNR), the coding parameter (e.g., CRF) of a video encoder for different videos at that target video quality parameter. Specifically, a data prediction model is trained by means of machine learning, and the data prediction model is used to predict the coding parameter prediction data (such as a CRF value) corresponding to the target encoder of the video to be processed under the given target video quality parameter; in some embodiments, the target encoder may include, but is not limited to, an H264 encoder. According to the method and the device, the data prediction model obtained through machine learning training is used to predict the coding parameter prediction data of the video to be processed under the target video quality parameter, which improves the accuracy of the coding parameter prediction data; the video to be processed is further encoded by utilizing the coding parameter prediction data, which alleviates the problem that the video quality is unstable when different videos are encoded with the same coding parameter and improves the stability of the video quality.
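For reference, PSNR is computed from the mean squared error between an original frame and its reconstruction; the following is a minimal sketch of the standard definition and is not a component disclosed by the application:

import numpy as np

def psnr(original: np.ndarray, reconstructed: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio (dB) between two frames of the same shape."""
    mse = np.mean((original.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10((max_value ** 2) / mse)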
The video coding data prediction method provided by the embodiment of the application can be applied to a video coding data prediction device, and the video coding data prediction device can be arranged in computer equipment. The video encoding method provided in the embodiments of the present application may be applied to a video encoding device, where the video encoding device may be disposed in a computer device, and in some embodiments, the computer device may include, but is not limited to, a smart terminal device such as a smart phone, a tablet computer, a notebook computer, a desktop computer, an on-vehicle smart terminal, a smart watch, and the like.
The video coding data prediction method and the video coding method provided in the embodiments of the present application are schematically described below with reference to the accompanying drawings.
Referring to fig. 1 specifically, fig. 1 is a flowchart of a video coding data prediction method provided in an embodiment of the present application, where the video coding data prediction method in the embodiment of the present application may be executed by a video coding data prediction device, and the video coding data prediction device may be disposed in a computer apparatus.
S101: a sample video dataset is obtained, the sample video dataset comprising a plurality of sample video data, each sample video data comprising compressed domain feature data of a sample video, and sample coding parameter annotation data corresponding to the sample video quality parameter for a sample video annotation.
In an embodiment of the present application, the computer device may obtain a sample video data set, where the sample video data set includes a plurality of sample video data, and each sample video data includes compressed domain feature data of a sample video, a sample video quality parameter, and sample coding parameter labeling data corresponding to the sample video quality parameter for labeling the sample video. In some embodiments, the sample videos are videos of different sources, different types, different features, different complexities, etc. In some embodiments, the sample coding parameter labeling data is used to indicate the sample coding parameters of the sample video corresponding to each sample video quality parameter, and the sample coding parameter labeling data may include, but is not limited to, any one or more characters such as letters, numbers, etc. In some embodiments, the sample encoding parameters may include, but are not limited to, CRF, CBR, VBR, etc. In some embodiments, the sample video quality parameters may include, but are not limited to, PSNR, structural similarity (Structural Similarity, SSIM), Video Multimethod Assessment Fusion (VMAF), and the like.
In some embodiments, the compressed domain feature data of the sample video is feature data of a compressed domain in a target encoder obtained during encoding of the sample video with the target encoder, wherein the target encoder may include, but is not limited to, an H264 encoder. In some embodiments, the compressed domain feature data may include, but is not limited to, one or more of block partition information, transform coefficients of different partition blocks, macroblock information, etc. of the target encoder, wherein the macroblock information may include, but is not limited to, skip and/or direct macroblock information. For example, assuming that the target encoder is an H264 encoder, in the process of encoding the sample video by the H264 encoder, macroblock (i.e., the basic processing unit of the encoding standard) information of the H264 encoder may include a macroblock size of 16×16 pixels, and a 16×16 macroblock may be divided into smaller sub-blocks, where the sub-block sizes may include, but are not limited to, 8×16, 16×8, 8×8, 4×8, 8×4, 4×4, and the like.
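As a rough illustration of how such compressed-domain statistics could be flattened into a model input, the following sketch concatenates a few illustrative feature groups; the argument names and groupings are assumptions for illustration, not the feature set actually used by the application.

import numpy as np

def build_compressed_domain_features(block_partition_hist, transform_coeff_stats, macroblock_type_hist):
    """Concatenate compressed-domain statistics into a single feature vector.
    Arguments are illustrative: a histogram of block partition sizes, summary
    statistics of transform coefficients per partition size, and a histogram of
    macroblock types (e.g. counts of skip/direct macroblocks), all collected
    while encoding the video with the target encoder (e.g. H264)."""
    return np.concatenate([
        np.asarray(block_partition_hist, dtype=np.float32).ravel(),
        np.asarray(transform_coeff_stats, dtype=np.float32).ravel(),
        np.asarray(macroblock_type_hist, dtype=np.float32).ravel(),
    ])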
S102: and training the data prediction model to be trained by using the sample video data set, so that the loss function value between the sample coding parameter prediction data of each sample video under the corresponding sample video quality parameter and the sample coding parameter labeling data of each sample video under the corresponding sample video quality parameter is smaller than a function threshold value after the compressed domain feature data and the sample video quality parameter in each sample video data in the sample video data set are input into the data prediction model obtained through training.
In this embodiment of the present invention, the computer device may train the data prediction model to be trained using the sample video data set, so that after the compressed domain feature data and the sample video quality parameters in each sample video data in the sample video data set are input into the data prediction model obtained through training, the loss function value between the sample coding parameter prediction data of each sample video under the corresponding sample video quality parameters and the sample coding parameter labeling data of each sample video under the corresponding sample video quality parameters is smaller than the function threshold. The data prediction model obtained through training is used for outputting and obtaining coding parameter prediction data of the video to be processed under the target video quality parameter according to the compressed domain characteristic data of the video to be processed and the target video quality parameter; the coding parameter prediction data are used for coding the video to be processed according to the coding parameter prediction data, and a target video code stream of the video to be processed under the target video quality parameters is obtained. In some embodiments, the data prediction model to be trained may be a neural network model.
In one embodiment, when the computer device trains the data prediction model to be trained by using the sample video data set, compressed domain feature data and sample video quality parameters in each sample video data in the sample video data set can be input into the data prediction model to be trained to obtain sample coding parameter prediction data of each sample video under the corresponding sample video quality parameters; calculating a loss function value between sample coding parameter prediction data of each sample video under the corresponding sample video quality parameters and sample coding parameter labeling data of each sample video under the corresponding sample video quality parameters, and adjusting model parameters of a data prediction model to be trained according to the loss function value when the loss function value is greater than or equal to a function threshold; and inputting the compressed domain characteristic data and the sample video quality parameters in each sample video data into the data prediction model to be trained after the model parameters are adjusted to retrain, and determining to obtain the data prediction model when the loss function value obtained by retraining is smaller than the function threshold value.
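A minimal sketch of this training procedure is given below, assuming a PyTorch-style model (such as the structure sketched after the next paragraph) and a dataloader yielding (compressed-domain features, sample video quality parameter, labelled coding parameter); the loss choice, learning rate and function threshold values are illustrative assumptions.

import torch
from torch import nn

def train_crf_predictor(model, dataloader, function_threshold=0.5, lr=1e-4, max_epochs=100):
    """Train until the average loss between predicted and labelled coding
    parameters falls below the function threshold (values are illustrative)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()
    for _ in range(max_epochs):
        epoch_loss = 0.0
        for features, quality_param, crf_label in dataloader:
            pred = model(features, quality_param)  # sample coding parameter prediction data
            loss = criterion(pred, crf_label)      # vs. sample coding parameter labeling data
            optimizer.zero_grad()
            loss.backward()                        # adjust model parameters according to the loss
            optimizer.step()
            epoch_loss += loss.item()
        if epoch_loss / len(dataloader) < function_threshold:
            break                                  # loss below function threshold: model obtained
    return model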
In one embodiment, the data prediction model to be trained may include a convolutional layer, a pooling layer, and a fully-connected layer; the computer equipment inputs compressed domain feature data and sample video quality parameters in each sample video data in a sample video data set into a data prediction model to be trained, so that when sample coding parameter prediction data of each sample video under the corresponding sample video quality parameters is obtained, the compressed domain feature data of each sample video data and the sample video quality parameters of each sample video can be input into a convolution layer of the data prediction model to be trained for feature extraction, and sample compressed domain feature data vectors and sample video quality parameter feature vectors of each sample video data are obtained; performing feature fusion processing on the compressed domain feature data vector of each sample video data and the corresponding sample video quality parameter feature vector by using a convolution layer to obtain a fusion feature vector of each sample video; and inputting the fusion feature vectors of the sample videos into a pooling layer in the data prediction model to be trained to carry out pooling treatment, and inputting the target feature vectors obtained after pooling treatment into a full-connection layer in the data prediction model to be trained, so that the full-connection layer is utilized to carry out full-connection treatment on the target feature vectors of the sample videos, and sample coding parameter prediction data corresponding to the sample videos under corresponding sample video quality parameters are obtained.
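One possible realization of this convolution / pooling / fully-connected structure is sketched below; the channel widths, kernel sizes and fusion scheme are assumptions rather than the exact design of the application.

import torch
from torch import nn

class CRFPredictor(nn.Module):
    """Sketch of a conv / pooling / fully-connected predictor (assumed design)."""

    def __init__(self, hidden: int = 64):
        super().__init__()
        self.feature_conv = nn.Sequential(  # extracts the compressed-domain feature vector
            nn.Conv1d(1, hidden, kernel_size=3, padding=1), nn.ReLU())
        self.quality_conv = nn.Sequential(  # extracts the video-quality-parameter feature vector
            nn.Conv1d(1, hidden, kernel_size=1), nn.ReLU())
        self.fusion_conv = nn.Conv1d(2 * hidden, hidden, kernel_size=1)  # feature fusion
        self.pool = nn.AdaptiveAvgPool1d(1)  # pooling layer -> target feature vector
        self.fc = nn.Linear(hidden, 1)       # fully-connected layer -> predicted coding parameter

    def forward(self, features: torch.Tensor, quality_param: torch.Tensor) -> torch.Tensor:
        # features: (batch, feature_len); quality_param: (batch,) or (batch, 1), e.g. target PSNR
        f = self.feature_conv(features.unsqueeze(1))                     # (batch, hidden, feature_len)
        q = self.quality_conv(quality_param.reshape(-1, 1, 1))           # (batch, hidden, 1)
        q = q.expand(-1, -1, f.size(-1))                                 # broadcast to feature length
        fused = torch.relu(self.fusion_conv(torch.cat([f, q], dim=1)))   # fusion feature vector
        target_vec = self.pool(fused).squeeze(-1)                        # pooled target feature vector
        return self.fc(target_vec).squeeze(-1)                           # coding parameter prediction data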
In one embodiment, when calculating the loss function value between the sample coding parameter prediction data of each sample video under the corresponding sample video quality parameter and the sample coding parameter labeling data of each sample video under the corresponding sample video quality parameter, the computer device may compare the sample coding parameter prediction data of each sample video under the corresponding sample video quality parameter with the sample coding parameter labeling data of each sample video under the corresponding sample video quality parameter, and determine the loss function value between the sample coding parameter prediction data of each sample video under the corresponding sample video quality parameter and the sample coding parameter labeling data of each sample video under the corresponding sample video quality parameter according to the comparison result.
The computer device may compare the sample coding parameter prediction data of each sample video under the corresponding sample video quality parameter with the sample coding parameter labeling data of each sample video under the corresponding sample video quality parameter to obtain a difference value between each sample coding parameter prediction data and the corresponding sample coding parameter labeling data, and determine a loss function value between the sample coding parameter prediction data of each sample video under the corresponding sample video quality parameter and the sample coding parameter labeling data of each sample video under the corresponding sample video quality parameter according to the difference value.
When obtaining the difference value between each sample coding parameter prediction data and the corresponding sample coding parameter labeling data, the computer device may calculate the sum of squared differences between each sample coding parameter prediction data and the corresponding sample coding parameter labeling data, and determine this sum of squared differences as the loss function value between each sample coding parameter prediction data and the corresponding sample coding parameter labeling data. In some embodiments, the loss function may include, but is not limited to, cross entropy loss, mean square error, and the like.
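A minimal sketch of such a loss, assuming tensors of predicted and labelled coding parameters:

import torch

def coding_parameter_loss(pred: torch.Tensor, label: torch.Tensor) -> torch.Tensor:
    """Sum of squared differences between predicted and labelled coding parameters."""
    return torch.sum((pred - label) ** 2)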
In one embodiment, when the computer device adjusts the model parameters of the data prediction model to be trained according to the loss function value, the computer device may determine a loss function graph according to the loss function value, and adjust the model parameters of the data prediction model to be trained according to the loss function graph, the model measurement standard, the learning rate parameter, the batch size parameter, the optimizer parameter, the iteration number parameter, the activation function parameter, and the like; the model measurement standard may be accuracy, recall, F1 score (F-Measure, a comprehensive evaluation index), ROC curve (Receiver Operating Characteristic curve), AUC (Area Under the ROC Curve), and the like.
According to the method and the device for predicting the coding parameter prediction data of the video to be processed, the data prediction model for predicting the coding parameter prediction data of the video to be processed under the target video quality parameter is obtained through training in a machine learning mode, reliability of the data prediction model is improved, and accuracy of the coding parameter prediction data obtained through prediction of the data prediction model is improved.
In one embodiment, the computer device may obtain video data to be processed, where the video data to be processed includes compressed domain feature data of the video to be processed and a target video quality parameter of the video to be processed, where the compressed domain feature data of the video to be processed is feature data of a compressed domain in a target encoder obtained during encoding of the video to be processed with the target encoder, and the compressed domain feature data of the sample video is feature data of the compressed domain in the target encoder obtained during encoding of the sample video with the target encoder; inputting the compressed domain characteristic data of the video to be processed and the target video quality parameter into a data prediction model to obtain the coding parameter prediction data of the video to be processed under the target video quality parameter.
In one embodiment, the trained data prediction model may be further used to output the encoded parameter prediction data of the video to be processed under the target video quality parameter according to the compressed domain feature data of the video to be processed and the target video quality parameter obtained in the short video scene; the coding parameter prediction data are used for coding the video to be processed according to the coding parameter prediction data, so as to obtain a target video code stream of the video to be processed under the target video quality parameter, and the target video code stream is sent to the terminal equipment, so that the terminal equipment decodes the target video code stream, and the video obtained by decoding is output and displayed on a screen of the terminal equipment in a short video scene.
In the embodiment of the application, the computer device may acquire a sample video data set; training the data prediction model to be trained by using a sample video data set, so that after the compressed domain feature data and the sample video quality parameters in each sample video data in the sample video data set are input into the data prediction model obtained through training, the loss function value between the sample coding parameter prediction data of each sample video under the corresponding sample video quality parameters and the sample coding parameter labeling data of each sample video under the corresponding sample video quality parameters is smaller than a function threshold value. And training a sample video data set to obtain a data prediction model, determining coding parameter prediction data of the video to be processed under the target video quality parameter by using the trained data prediction model, and being beneficial to further encoding the video to be processed by using the coding parameter prediction data, so that the video encoding quality is improved.
Referring to fig. 2 in detail, fig. 2 is a schematic flow chart of a video encoding method according to an embodiment of the present application, where the video encoding method according to the embodiment of the present application may be performed by a video encoding apparatus, and the video encoding apparatus may be disposed in a computer device.
S201: and acquiring video data to be processed, wherein the video data to be processed comprises compressed domain characteristic data of the video to be processed and target video quality parameters, and the video to be processed is obtained by decoding a video code stream to be processed.
In this embodiment of the present application, the computer device may obtain to-be-processed video data, where the to-be-processed video data includes compressed domain feature data of a to-be-processed video and a target video quality parameter, and the to-be-processed video is obtained by decoding a to-be-processed video code stream. In some embodiments, the target video quality parameter may include, but is not limited to PSNR, SSIM, VMAF, etc.
In one embodiment, when obtaining the compressed domain feature data of the video to be processed, the computer device may obtain the feature data of the compressed domain in the target encoder during encoding of the video to be processed with the target encoder (i.e., the compressed domain feature data). In some embodiments, the compressed domain feature data may include, but is not limited to, one or more of block partition information, transform coefficients of different partition blocks, macroblock information, etc. of the target encoder, wherein the macroblock information may include, but is not limited to, skip and/or direct macroblock information. For example, assuming that the target encoder is an H264 encoder, in the process of encoding the video to be processed by the H264 encoder, macroblock (i.e., the basic processing unit of the encoding standard) information of the H264 encoder may include a macroblock size of 16×16 pixels, and a 16×16 macroblock may be divided into smaller sub-blocks, where the sub-block sizes may include, but are not limited to, 8×16, 16×8, 8×8, 4×8, 8×4, 4×4, and the like.
S202: inputting the compressed domain characteristic data of the video to be processed and the target video quality parameter into a data prediction model to obtain the coding parameter prediction data of the video to be processed under the target video quality parameter.
In this embodiment of the present application, the computer device may input the compressed domain feature data of the video to be processed and the target video quality parameter into a data prediction model to obtain encoded parameter prediction data of the video to be processed under the target video quality parameter. In one embodiment, when the computer device inputs the compressed domain feature data of the video to be processed and the target video quality parameter into the data prediction model to obtain the encoding parameter prediction data of the video to be processed under the target video quality parameter, the computer device may perform feature extraction on the compressed domain feature data of the video to be processed and the convolution layer of the target video quality parameter input data prediction model to obtain the compressed domain feature data vector and the target video quality parameter feature vector of the video to be processed; performing feature fusion processing on the compressed domain feature data vector of the video data to be processed and the target video quality parameter feature vector by using a convolution layer to obtain a fusion feature vector of the video to be processed; and inputting the fusion feature vector of the video to be processed into a pooling layer of the data prediction model for pooling processing, and inputting the target feature vector obtained after pooling processing into a full-connection layer of the data prediction model, so as to perform full-connection processing on the target feature vector of the video to be processed by using the full-connection layer, thereby obtaining the corresponding coding parameter prediction data of the video to be processed under the target video quality parameters.
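Inference with the trained model could then look like the sketch below, where the helper name and tensor shapes are illustrative and the model follows the structure sketched earlier:

import torch

@torch.no_grad()
def predict_crf(model, compressed_domain_features, target_psnr: float) -> float:
    """Predict the coding parameter (e.g. a CRF value) of one video to be
    processed for a given target video quality parameter."""
    model.eval()
    features = torch.as_tensor(compressed_domain_features, dtype=torch.float32).unsqueeze(0)
    quality = torch.tensor([[float(target_psnr)]])
    return model(features, quality).item()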
According to the method and the device for predicting the coding parameter, the coding parameter prediction data corresponding to the video to be processed under the target video quality parameter is obtained through the trained data prediction model prediction, and accuracy of the coding parameter prediction data obtained through prediction is improved.
S203: and encoding the video to be processed according to the encoding parameter prediction data to obtain a target video code stream of the video to be processed under the target video quality parameter.
In this embodiment of the present application, the computer device may encode the video to be processed according to the encoding parameter prediction data, to obtain a target video code stream of the video to be processed under the target video quality parameter.
In one embodiment, the video data to be processed further includes encoding parameter data of the video to be processed, and when the computer device encodes the video to be processed according to the encoding parameter prediction data to obtain a target video code stream of the video to be processed under the target video quality parameter, the computer device may determine the target parameter data according to the encoding parameter prediction data and the encoding parameter data; and coding the video to be processed by utilizing the target parameter data to obtain a target video code stream of the video to be processed under the target video quality parameter. It should be noted that, the encoding parameter data of the video to be processed under the target video quality parameter is the original encoding parameter data of the video to be processed under the target video quality parameter.
The computer device may detect whether the encoding parameter prediction data is greater than the encoding parameter data before determining the target parameter data from the encoding parameter prediction data and the encoding parameter data; and when the coding parameter prediction data is detected to be larger than the coding parameter data, determining that the video to be processed meets the recoding condition, and executing the step of determining target parameter data according to the coding parameter prediction data and the coding parameter data.
For example, assuming that the encoding parameter data of the video to be processed is CRF value 23, the encoding parameter prediction data of the video to be processed at the target video quality parameter PSNR value of 35dB is CRF value 25, the computer device may detect that the encoding parameter prediction data CRF value 25 is greater than the encoding parameter data CRF value 23 of the video to be processed, and thus may perform the step of determining the target parameter data based on the encoding parameter prediction data CRF value 25 and the encoding parameter data CRF value 23.
And when the computer equipment detects that the coding parameter prediction data is smaller than or equal to the coding parameter data, determining that the video to be processed does not meet the recoding condition, namely, the video to be processed does not need to be coded.
For example, assuming that the encoding parameter data of the video to be processed is CRF value 23, and the encoding parameter prediction data of the video to be processed at the target video quality parameter PSNR value of 45dB is CRF value 21, the computer device may detect that the encoding parameter prediction data CRF value 21 is smaller than the encoding parameter data CRF value 23 of the video to be processed, and thus may determine that the video to be processed does not satisfy the re-encoding condition, that is, does not need to encode the video to be processed.
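The re-encoding condition illustrated by the two examples above reduces to a single comparison; a minimal sketch (the function name is illustrative):

def should_reencode(predicted_crf: float, original_crf: float) -> bool:
    """Re-encode only when the predicted coding parameter exceeds the original one."""
    return predicted_crf > original_crf

With the figures above, should_reencode(25, 23) is True, while should_reencode(21, 23) is False.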
The computer device may detect whether the encoding parameter prediction data is greater than a parameter threshold when determining target parameter data from the encoding parameter prediction data and the encoding parameter data; when the coding parameter prediction data is detected to be smaller than or equal to the parameter threshold value, determining the coding parameter prediction data as target parameter data; and when the coding parameter prediction data is detected to be larger than the parameter threshold value, determining the parameter threshold value as target parameter data.
For example, assuming that the encoded parameter data of the video to be processed is CRF value 23, the encoded parameter prediction data of the video to be processed at the target video quality parameter PSNR value of 35dB is CRF value 25, and the parameter threshold value is 26, the computer device may detect that the encoded parameter prediction data CRF value 25 is greater than the encoded parameter data CRF value 23 of the video to be processed, and determine that the encoded parameter prediction data CRF value 25 is the target parameter data when detecting that the encoded parameter prediction data CRF value 25 is less than the parameter threshold value 26.
For another example, assuming that the encoded parameter data of the video to be processed is CRF value 23, the encoded parameter prediction data of the video to be processed at the target video quality parameter PSNR value of 35dB is CRF value 25, and the parameter threshold value is 24, the computer device may detect that the encoded parameter prediction data CRF value 25 is greater than the encoded parameter data CRF value 23 of the video to be processed, and detect that the encoded parameter prediction data CRF value 25 is greater than the parameter threshold value 24, and may determine that the parameter threshold value 24 is the target parameter data.
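Clamping the predicted value against the parameter threshold can be sketched as follows (illustrative helper):

def determine_target_crf(predicted_crf: float, crf_threshold: float) -> float:
    """Use the predicted coding parameter unless it exceeds the parameter threshold."""
    return predicted_crf if predicted_crf <= crf_threshold else crf_threshold

For the examples above, determine_target_crf(25, 26) returns 25, while determine_target_crf(25, 24) returns 24.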
By comparing the set parameter threshold value with the predicted coding parameter prediction data of the predicted video to be processed, the method and the device help to avoid the fact that the quality of the coded video is obviously reduced due to the fact that the coding parameter prediction data are too large, and influence on user experience is avoided.
The method comprises the steps that when a computer device encodes a video to be processed by utilizing target parameter data to obtain a target video code stream of the video to be processed under a target video quality parameter, the code rate of the video code stream obtained by encoding the video to be processed by utilizing the target parameter data and the code rate of the video code stream data to be processed can be obtained; calculating the rate change rate between the code rate of the video code stream and the code rate of the video code stream data to be processed; when the rate of change of the code rate is detected to be smaller than the code rate threshold value, determining the video code stream as a target video code stream of the video to be processed under the target video quality parameter; and stopping encoding the video to be processed when the rate of change of the code rate is detected to be greater than or equal to the code rate threshold value.
Specifically, reference may be made to fig. 3, which is a schematic diagram of the relation between coding parameters, code rate and video quality parameters. As shown in fig. 3, at a code rate saving of about -2.5%, raising the code rate change limit (i.e., the code rate threshold) to n=33 yields the smallest impact on PSNR; at a code rate saving of about -4%, raising the code rate change limit to n=34 yields the best PSNR (i.e., video quality parameter) impact. According to the statistical results, different coding parameters can be used for different code rate saving targets.
According to the method and the device for processing the video, the rate threshold is set to limit the rate change rate between the rate of the video code stream obtained by encoding the video to be processed and the rate of the video code stream data to be processed, so that the encoding parameter prediction data can be further and better limited, and the influence on the video quality caused by overlarge rate change after the video to be processed is encoded by the encoding parameter prediction data is avoided.
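A sketch of this code rate check, assuming the change rate is measured as the relative difference between the new and the original code rate (the exact definition is an assumption):

def accept_reencoded_stream(new_bitrate: float, original_bitrate: float, rate_threshold: float) -> bool:
    """Accept the re-encoded code stream only if the relative code rate change stays below the threshold."""
    rate_change = abs(new_bitrate - original_bitrate) / original_bitrate
    return rate_change < rate_threshold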
In one embodiment, when the computer device encodes the video to be processed according to the encoding parameter prediction data to obtain a target video code stream of the video to be processed under the target video quality parameter, the computer device may encode the video to be processed obtained in the short video scene according to the encoding parameter prediction data to obtain a target video code stream of the video to be processed under the target video quality parameter, and send the target video code stream to the terminal device, so that the terminal device decodes the target video code stream, and outputs the video obtained by decoding in the short video scene to be displayed on a screen of the terminal device, so that a user can view the video with better quality, and user experience is improved.
According to the embodiment of the application, the video data to be processed can be obtained, the video data to be processed comprises compressed domain feature data and target video quality parameters of the video to be processed, the video to be processed is obtained by decoding a video code stream to be processed, the compressed domain feature data of the video to be processed and the target video quality parameters are input into a data prediction model to obtain coding parameter prediction data of the video to be processed under the target video quality parameters, and the video to be processed is coded according to the coding parameter prediction data to obtain a target video code stream of the video to be processed under the target video quality parameters. By the method, under the condition of given target video quality parameters, proper coding parameter prediction data can be accurately and effectively found, and the video to be processed is coded by using the coding parameter prediction data, so that the quality of video coding is improved.
Referring specifically to fig. 4, fig. 4 is a flowchart of another video encoding method provided in an embodiment of the present application, where the video encoding method of the embodiment of the present application may be performed by a video encoding apparatus, where the video encoding apparatus is disposed in a computer device, and a specific explanation of the computer device is as described above. Specifically, the method of the embodiment of the application comprises the following steps.
S401: and acquiring compressed domain characteristic data of the video to be processed in the target encoder and a target video quality parameter PSNR.
S402: inputting the compressed domain characteristic data of the video to be processed and the target video quality parameter PSNR into a data prediction model to obtain the coding parameter prediction data, namely the prediction CRF value, of the video to be processed under the target video quality parameter PSNR.
S403: and acquiring coding parameter data of the video to be processed, namely an original CRF value, detecting whether the predicted CRF value is larger than the original CRF value of the video to be processed, if so, executing step S404, and if not, executing step S410.
S404: whether the predicted CRF value is greater than the parameter threshold is detected, if the detection result is yes, step S405 is executed, and if the detection result is no, step S406 is executed.
S405: the parameter threshold is determined as target parameter data, and step S407 is performed.
S406: the predicted CRF value is determined as target parameter data.
S407: obtaining the code rate of a video code stream obtained by encoding the video to be processed by utilizing the target parameter data, obtaining the code rate of the video code stream data to be processed, and calculating the code rate change rate between the code rate of the video code stream and the code rate of the video code stream data to be processed.
S408: and detecting whether the rate of change of the code rate is smaller than a code rate threshold, if so, executing step S409, and if not, executing step S410.
S409: and determining a video code stream obtained by encoding the video to be processed by using the target parameter data as a target video code stream of the video to be processed under the target video quality parameter.
S410: the encoding operation is ended.
In one embodiment, the effect of the video encoding method of the embodiments of the present application may be described with reference to fig. 5 and 6. Fig. 5 is a PSNR distribution diagram of encoding videos using the same CRF value, and fig. 6 is a PSNR distribution diagram provided in the embodiment of the present application. Fig. 5 shows the PSNR distribution of a currently online high-popularity video (i.e., code stream 8) with its original encoding; it can be seen that using the same CRF value for different videos of different sources and types makes the PSNR distribution of the current encoder uneven. As shown in fig. 6, after the method described in the embodiment of the present application is used, the PSNR of different videos with different sources and types can be stabilized to fluctuate within a preset error range around the target PSNR of 41.
Aiming at a given target video quality parameter PSNR, the embodiment of the application predicts a predicted CRF value corresponding to a target encoder of a video to be processed under the given target video quality parameter by using a data prediction model obtained through training in a machine learning mode. The method is beneficial to finding out the target CRF value under the given target PSNR, and encoding the video to be processed by using the target CRF value, thereby being beneficial to improving the quality of video encoding.
Referring to fig. 7, fig. 7 is a schematic structural diagram of a video coding data prediction apparatus according to an embodiment of the present application. Specifically, the video coding data prediction apparatus is disposed in a computer device, and the video coding data prediction apparatus includes: an acquisition unit 701 and a training unit 702;
an obtaining unit 701, configured to obtain a sample video data set, where the sample video data set includes a plurality of sample video data, and each sample video data includes compressed domain feature data of a sample video, a sample video quality parameter, and sample coding parameter labeling data corresponding to the sample video quality parameter for labeling the sample video;
the training unit 702 is configured to train the data prediction model to be trained using the sample video data set, so that after the compressed domain feature data and the sample video quality parameters in each sample video data in the sample video data set are input into the data prediction model obtained through training, a loss function value between the sample coding parameter prediction data of each sample video under the corresponding sample video quality parameters and the sample coding parameter labeling data of each sample video under the corresponding sample video quality parameters is smaller than a function threshold; the data prediction model obtained through training is used for outputting and obtaining coding parameter prediction data of the video to be processed under the target video quality parameter according to the compressed domain characteristic data of the video to be processed and the target video quality parameter; the coding parameter prediction data are used for determining target parameter data by combining coding parameter data of the video to be processed under the target video quality parameters, and recoding the video to be processed by utilizing the target parameter data.
Further, when the training unit 702 uses the sample video data set to train the data prediction model to be trained, the training unit is specifically configured to:
inputting compressed domain feature data and sample video quality parameters in each sample video data in the sample video data set into a data prediction model to be trained to obtain sample coding parameter prediction data of each sample video under the corresponding sample video quality parameters;
calculating a loss function value between sample coding parameter prediction data of each sample video under the corresponding sample video quality parameters and sample coding parameter labeling data of each sample video under the corresponding sample video quality parameters, and adjusting model parameters of the data prediction model to be trained according to the loss function value when the loss function value is greater than or equal to a function threshold;
and inputting the compressed domain characteristic data and the sample video quality parameters in the sample video data into the data prediction model to be trained after adjusting the model parameters, retraining, and determining to obtain the data prediction model when the loss function value obtained by retraining is smaller than the function threshold value.
Further, the data prediction model to be trained comprises a convolution layer, a pooling layer and a full connection layer; the training unit 702 inputs the compressed domain feature data and the sample video quality parameters in each sample video data in the sample video data set into a data prediction model to be trained, and is specifically configured to:
inputting the compressed domain feature data of each sample video data and the sample video quality parameters of each sample video into a convolution layer of the data prediction model to be trained for feature extraction to obtain sample compressed domain feature data vectors and sample video quality parameter feature vectors of each sample video data;
performing feature fusion processing on the compressed domain feature data vector of each sample video data and the corresponding sample video quality parameter feature vector by using the convolution layer to obtain a fusion feature vector of each sample video;
and inputting the fusion feature vectors of the sample videos into a pooling layer in the data prediction model to be trained for pooling treatment, and inputting the target feature vectors obtained after pooling treatment into a full-connection layer in the data prediction model to be trained, so that the full-connection layer is utilized to carry out full-connection treatment on the target feature vectors of the sample videos, and sample coding parameter prediction data corresponding to the sample videos under corresponding sample video quality parameters are obtained.
Further, the video encoding data prediction apparatus further includes: a prediction unit 703, the prediction unit 703 being configured to:
acquiring video data to be processed, wherein the video data to be processed comprises compressed domain characteristic data of the video to be processed and target video quality parameters of the video to be processed, the compressed domain characteristic data of the video to be processed are characteristic data of a compressed domain in a target encoder, which is acquired in the process of encoding the video to be processed by using the target encoder, and the compressed domain characteristic data of the sample video are characteristic data of a compressed domain in the target encoder, which is acquired in the process of encoding the sample video by using the target encoder;
and inputting the compressed domain characteristic data of the video to be processed and the target video quality parameter into the data prediction model to obtain the coding parameter prediction data of the video to be processed under the target video quality parameter.
Further, the trained data prediction model is used for outputting and obtaining coding parameter prediction data of the video to be processed under the target video quality parameter according to the compressed domain characteristic data of the video to be processed and the target video quality parameter obtained in the short video scene;
The coding parameter prediction data is used for coding the video to be processed according to the coding parameter prediction data, obtaining a target video code stream of the video to be processed under the target video quality parameter, and sending the target video code stream to a terminal device, so that the terminal device decodes the target video code stream, and the video obtained by decoding is output and displayed on a screen of the terminal device in the short video scene.
In the embodiment of the application, the video coding data prediction device can acquire a sample video data set and train the data prediction model to be trained by using the sample video data set, so that after the compressed domain feature data and the sample video quality parameters in each sample video data in the sample video data set are input into the data prediction model obtained through training, the loss function value between the sample coding parameter prediction data of each sample video under the corresponding sample video quality parameters and the sample coding parameter labeling data of each sample video under the corresponding sample video quality parameters is smaller than a function threshold. By training on the sample video data set to obtain the data prediction model, the coding parameter prediction data of the video to be processed under the target video quality parameter can be determined by using the trained data prediction model, which helps to further encode the video to be processed by using the coding parameter prediction data, thereby improving the video encoding quality.
Referring to fig. 8, fig. 8 is a schematic structural diagram of a video encoding apparatus according to an embodiment of the present application. Specifically, the video encoding device is disposed in a computer device, and the video encoding device includes: acquisition unit 801, prediction unit 802, and encoding unit 803;
an obtaining unit 801, configured to obtain video data to be processed, where the video data to be processed includes compressed domain feature data of a video to be processed, a target video quality parameter, and encoding parameter data of the video to be processed under the target video quality parameter, where the video to be processed is obtained by decoding a video code stream to be processed;
a prediction unit 802, configured to input the compressed domain feature data of the video to be processed and the target video quality parameter into a data prediction model, so as to obtain coding parameter prediction data of the video to be processed under the target video quality parameter;
and the encoding unit 803 is configured to encode the video to be processed according to the encoding parameter prediction data, so as to obtain a target video code stream of the video to be processed under the target video quality parameter.
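As one possible illustration, and assuming the predicted coding parameter is applied as an x264 constant rate factor (CRF) in line with the rate-control background discussed earlier, the encoding step could be carried out with FFmpeg as sketched below; the command line is an example only, not the claimed encoding procedure.

import subprocess

def encode_with_predicted_crf(input_path, output_path, predicted_crf):
    # Hypothetical encoding step: re-encode the video to be processed with the
    # predicted coding parameter, treated here as an x264 CRF value.
    cmd = [
        "ffmpeg", "-y",
        "-i", input_path,
        "-c:v", "libx264",
        "-crf", str(round(predicted_crf)),   # predicted coding parameter applied as constant rate factor
        output_path,
    ]
    subprocess.run(cmd, check=True)          # produces the target video code stream

The target video code stream produced in this way could then be subject to the recoding condition and code rate checks described further below.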
Further, the video data to be processed further comprises encoding parameter data of the video to be processed; when encoding the video to be processed according to the encoding parameter prediction data, the encoding unit 803 is specifically configured to:
Determining target parameter data according to the coding parameter prediction data and the coding parameter data;
and encoding the video to be processed by utilizing the target parameter data to obtain a target video code stream of the video to be processed under the target video quality parameter.
Further, before the encoding unit 803 determines target parameter data according to the encoding parameter prediction data and the encoding parameter data, it is further configured to:
detecting whether the coding parameter prediction data is larger than the coding parameter data;
and when the coding parameter prediction data is detected to be larger than the coding parameter data, determining that the video to be processed meets a recoding condition, and executing the step of determining target parameter data according to the coding parameter prediction data and the coding parameter data.
Further, the encoding unit 803 is specifically configured to, when determining target parameter data according to the encoding parameter prediction data and the encoding parameter data:
detecting whether the coding parameter prediction data is greater than a parameter threshold;
when the coding parameter prediction data is detected to be smaller than or equal to the parameter threshold value, determining the coding parameter prediction data as the target parameter data;
And when the coding parameter prediction data is detected to be larger than the parameter threshold value, determining the parameter threshold value as the target parameter data.
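The recoding condition and the parameter-threshold clamp described above might be combined as in the following sketch; returning None when the recoding condition is not met is an assumption of this example.

def determine_target_parameter(predicted_param, current_param, param_threshold):
    # Hypothetical decision logic for the target parameter data.
    # Recoding condition: only proceed when the predicted coding parameter is
    # larger than the coding parameter data of the video to be processed.
    if predicted_param <= current_param:
        return None                               # recoding condition not met
    # Clamp: predictions above the parameter threshold are replaced by the threshold.
    return min(predicted_param, param_threshold)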
Further, when encoding the video to be processed by using the target parameter data to obtain a target video code stream of the video to be processed under the target video quality parameter, the encoding unit 803 is specifically configured to:
acquiring the code rate of the video code stream obtained by encoding the video to be processed with the target parameter data, and acquiring the code rate of the to-be-processed video code stream;
calculating the code rate change rate between the code rate of the video code stream and the code rate of the to-be-processed video code stream;
and when the code rate change rate is detected to be smaller than a code rate threshold, determining the video code stream as the target video code stream of the video to be processed under the target video quality parameter.
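The code rate check might be expressed as in the sketch below; defining the code rate change rate relative to the code rate of the to-be-processed video code stream is an assumption of this example, since the application does not give an exact formula.

def accept_reencoded_stream(new_bitrate, original_bitrate, rate_change_threshold):
    # Hypothetical bitrate check: accept the re-encoded stream as the target video
    # code stream only when the relative code rate change stays below the threshold.
    rate_change = abs(new_bitrate - original_bitrate) / original_bitrate  # original_bitrate assumed > 0
    return rate_change < rate_change_threshold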
Further, when encoding the video to be processed according to the encoding parameter prediction data to obtain a target video code stream of the video to be processed under the target video quality parameter, the encoding unit 803 is specifically configured to:
and encoding the video to be processed acquired in a short video scene according to the encoding parameter prediction data to obtain a target video code stream of the video to be processed under the target video quality parameter, and sending the target video code stream to terminal equipment, so that the terminal equipment decodes the target video code stream, and outputting and displaying the video obtained by decoding on a screen of the terminal equipment in the short video scene.
In this embodiment of the present application, a video encoding device may acquire to-be-processed video data, where the to-be-processed video data includes compressed domain feature data and a target video quality parameter of a to-be-processed video, and the to-be-processed video is obtained by decoding a to-be-processed video code stream; the compressed domain feature data of the to-be-processed video and the target video quality parameter are input into a data prediction model to obtain coding parameter prediction data of the to-be-processed video under the target video quality parameter, and the to-be-processed video is encoded according to the coding parameter prediction data to obtain a target video code stream of the to-be-processed video under the target video quality parameter. In this way, given the target video quality parameter, suitable coding parameter prediction data can be found more accurately, and encoding the video to be processed with the coding parameter prediction data improves the quality of video encoding.
Referring to fig. 9, fig. 9 is a schematic structural diagram of a computer device according to an embodiment of the present application. Specifically, the computer device includes a memory 901 and a processor 902.
In one embodiment, the computer device further comprises a data interface 903, the data interface 903 being used to transfer data information between the computer device and other devices.
The memory 901 may include volatile memory (volatile memory); memory 901 may also include non-volatile memory (nonvolatile memory); memory 901 may also include a combination of the above types of memory. The processor 902 may be a central processing unit (central processing unit, CPU). The processor 902 may further comprise a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (programmable logic device, PLD), or a combination thereof. The PLD may be a complex programmable logic device (complex programmable logic device, CPLD), a field-programmable gate array (field-programmable gate array, FPGA), or any combination thereof.
The memory 901 is used for storing a program, and the processor 902 may call the program stored in the memory 901, for performing the following steps:
acquiring a sample video data set, wherein the sample video data set comprises a plurality of sample video data, and each sample video data comprises compressed domain feature data of a sample video, sample video quality parameters and sample coding parameter labeling data corresponding to the sample video quality parameters for labeling the sample video;
Training a data prediction model to be trained by using the sample video data set, so that after compressed domain feature data and sample video quality parameters in each sample video data in the sample video data set are input into the data prediction model obtained through training, the loss function value between the sample coding parameter prediction data of each sample video under the corresponding sample video quality parameters and the sample coding parameter labeling data of each sample video under the corresponding sample video quality parameters is smaller than a function threshold;
the data prediction model obtained through training is used for outputting and obtaining coding parameter prediction data of the video to be processed under the target video quality parameter according to the compressed domain characteristic data of the video to be processed and the target video quality parameter;
the coding parameter prediction data is used for coding the video to be processed according to the coding parameter prediction data, and a target video code stream of the video to be processed under the target video quality parameter is obtained.
Further, when the processor 902 trains the data prediction model to be trained using the sample video data set, the processor is specifically configured to:
Inputting compressed domain feature data and sample video quality parameters in each sample video data in the sample video data set into a data prediction model to be trained to obtain sample coding parameter prediction data of each sample video under the corresponding sample video quality parameters;
calculating a loss function value between sample coding parameter prediction data of each sample video under the corresponding sample video quality parameters and sample coding parameter labeling data of each sample video under the corresponding sample video quality parameters, and adjusting model parameters of the data prediction model to be trained according to the loss function value when the loss function value is greater than or equal to a function threshold;
and inputting the compressed domain feature data and the sample video quality parameters in each sample video data into the data prediction model to be trained after the model parameters are adjusted, performing retraining, and determining that the data prediction model is obtained when the loss function value obtained by retraining is smaller than the function threshold.
Further, the data prediction model to be trained comprises a convolution layer, a pooling layer and a full connection layer; when inputting the compressed domain feature data and the sample video quality parameters in each sample video data in the sample video data set into the data prediction model to be trained, the processor 902 is specifically configured to:
Inputting the compressed domain feature data of each sample video data and the sample video quality parameters of each sample video into a convolution layer of the data prediction model to be trained for feature extraction to obtain sample compressed domain feature data vectors and sample video quality parameter feature vectors of each sample video data;
performing feature fusion processing on the compressed domain feature data vector of each sample video data and the corresponding sample video quality parameter feature vector by using the convolution layer to obtain a fusion feature vector of each sample video;
and inputting the fusion feature vectors of the sample videos into a pooling layer in the data prediction model to be trained for pooling processing, and inputting the target feature vectors obtained after the pooling processing into a full-connection layer in the data prediction model to be trained, so that the full-connection layer is utilized to perform full-connection processing on the target feature vectors of the sample videos, and sample coding parameter prediction data corresponding to the sample videos under the corresponding sample video quality parameters are obtained.
Further, the processor 902 is further configured to:
acquiring video data to be processed, wherein the video data to be processed comprises compressed domain characteristic data of the video to be processed and target video quality parameters of the video to be processed, the compressed domain characteristic data of the video to be processed are characteristic data of a compressed domain in a target encoder, which is acquired in the process of encoding the video to be processed by using the target encoder, and the compressed domain characteristic data of the sample video are characteristic data of a compressed domain in the target encoder, which is acquired in the process of encoding the sample video by using the target encoder;
And inputting the compressed domain characteristic data of the video to be processed and the target video quality parameter into the data prediction model to obtain the coding parameter prediction data of the video to be processed under the target video quality parameter.
Further, the trained data prediction model is used for outputting and obtaining coding parameter prediction data of the video to be processed under the target video quality parameter according to the compressed domain characteristic data of the video to be processed and the target video quality parameter obtained in the short video scene;
the coding parameter prediction data is used for coding the video to be processed according to the coding parameter prediction data, obtaining a target video code stream of the video to be processed under the target video quality parameter, and sending the target video code stream to a terminal device, so that the terminal device decodes the target video code stream, and the video obtained by decoding is output and displayed on a screen of the terminal device in the short video scene.
In the embodiment of the application, the computer device may acquire a sample video data set and train the data prediction model to be trained by using the sample video data set, so that after the compressed domain feature data and the sample video quality parameters in each sample video data in the sample video data set are input into the data prediction model obtained through training, the loss function value between the sample coding parameter prediction data of each sample video under the corresponding sample video quality parameters and the sample coding parameter labeling data of each sample video under the corresponding sample video quality parameters is smaller than a function threshold. By training on the sample video data set to obtain the data prediction model, the coding parameter prediction data of the video to be processed under the target video quality parameter can be determined by using the trained data prediction model, which helps to further encode the video to be processed by using the coding parameter prediction data, thereby improving the video encoding quality.
Embodiments of the present application further provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method described in the embodiments corresponding to fig. 1, fig. 2, or fig. 3 of the present application, and may also implement the apparatus of the embodiments corresponding to fig. 6 or fig. 7 of the present application, which is not described herein again.
The computer-readable storage medium may be an internal storage unit of the device according to any of the foregoing embodiments, for example, a hard disk or a memory of the device. The computer-readable storage medium may also be an external storage device of the device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash card provided on the device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the device. The computer-readable storage medium is used to store the computer program and other programs and data required by the computer device, and may also be used to temporarily store data that has been output or is to be output.
Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the methods provided in the above-described various embodiments.
Those skilled in the art will appreciate that implementing all or part of the above-described methods in accordance with the embodiments may be accomplished by way of a computer program stored on a computer readable storage medium, which when executed may comprise the steps of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), or the like.
The above disclosure describes only some embodiments of the present application and is not intended to limit the scope of the claims. Those skilled in the art will understand that all or part of the above-described embodiments may be implemented, and that equivalent modifications may be made, while still falling within the scope of the present application.

Claims (16)

1. A method for predicting video-encoded data, comprising:
acquiring a sample video data set, wherein the sample video data set comprises a plurality of sample video data, and each sample video data comprises compressed domain feature data of a sample video, sample video quality parameters and sample coding parameter labeling data corresponding to the sample video quality parameters for labeling the sample video;
training a data prediction model to be trained by using the sample video data set, so that after compressed domain feature data and sample video quality parameters in each sample video data in the sample video data set are input into the data prediction model obtained through training, the loss function value between the sample coding parameter prediction data of each sample video under the corresponding sample video quality parameters and the sample coding parameter labeling data of each sample video under the corresponding sample video quality parameters is smaller than a function threshold;
the data prediction model obtained through training is used for outputting and obtaining coding parameter prediction data of the video to be processed under the target video quality parameter according to the compressed domain characteristic data of the video to be processed and the target video quality parameter;
The coding parameter prediction data is used for coding the video to be processed according to the coding parameter prediction data, and a target video code stream of the video to be processed under the target video quality parameter is obtained.
2. The method of claim 1, wherein the training the predictive model for data to be trained using the sample video dataset comprises:
inputting compressed domain feature data and sample video quality parameters in each sample video data in the sample video data set into a data prediction model to be trained to obtain sample coding parameter prediction data of each sample video under the corresponding sample video quality parameters;
calculating a loss function value between sample coding parameter prediction data of each sample video under the corresponding sample video quality parameters and sample coding parameter labeling data of each sample video under the corresponding sample video quality parameters, and adjusting model parameters of the data prediction model to be trained according to the loss function value when the loss function value is greater than or equal to a function threshold;
and inputting the compressed domain feature data and the sample video quality parameters in each sample video data into the data prediction model to be trained after the model parameters are adjusted, performing retraining, and determining that the data prediction model is obtained when the loss function value obtained by retraining is smaller than the function threshold.
3. The method of claim 1, wherein the data prediction model to be trained comprises a convolutional layer, a pooling layer, and a fully-connected layer; inputting compressed domain feature data and sample video quality parameters in each sample video data in the sample video data set into a data prediction model to be trained to obtain sample coding parameter prediction data of each sample video under corresponding sample video quality parameters, wherein the method comprises the following steps:
inputting the compressed domain feature data of each sample video data and the sample video quality parameters of each sample video into a convolution layer of the data prediction model to be trained for feature extraction to obtain sample compressed domain feature data vectors and sample video quality parameter feature vectors of each sample video data;
performing feature fusion processing on the compressed domain feature data vector of each sample video data and the corresponding sample video quality parameter feature vector by using the convolution layer to obtain a fusion feature vector of each sample video;
and inputting the fusion feature vectors of the sample videos into a pooling layer in the data prediction model to be trained for pooling processing, and inputting the target feature vectors obtained after the pooling processing into a full-connection layer in the data prediction model to be trained, so that the full-connection layer is utilized to perform full-connection processing on the target feature vectors of the sample videos, and sample coding parameter prediction data corresponding to the sample videos under the corresponding sample video quality parameters are obtained.
4. The method according to claim 1, wherein the method further comprises:
acquiring video data to be processed, wherein the video data to be processed comprises compressed domain characteristic data of the video to be processed and target video quality parameters of the video to be processed, the compressed domain characteristic data of the video to be processed are characteristic data of a compressed domain in a target encoder, which is acquired in the process of encoding the video to be processed by using the target encoder, and the compressed domain characteristic data of the sample video are characteristic data of a compressed domain in the target encoder, which is acquired in the process of encoding the sample video by using the target encoder;
and inputting the compressed domain characteristic data of the video to be processed and the target video quality parameter into the data prediction model to obtain the coding parameter prediction data of the video to be processed under the target video quality parameter.
5. The method of claim 1, wherein:
the data prediction model obtained through training is used for outputting and obtaining coding parameter prediction data of the video to be processed under the target video quality parameter according to the compressed domain characteristic data of the video to be processed and the target video quality parameter obtained in the short video scene;
The coding parameter prediction data is used for coding the video to be processed according to the coding parameter prediction data, obtaining a target video code stream of the video to be processed under the target video quality parameter, and sending the target video code stream to a terminal device, so that the terminal device decodes the target video code stream, and the video obtained by decoding is output and displayed on a screen of the terminal device in the short video scene.
6. A video encoding method, comprising:
acquiring video data to be processed, wherein the video data to be processed comprises compressed domain characteristic data of video to be processed and target video quality parameters, and the video to be processed is obtained by decoding a video code stream to be processed;
inputting the compressed domain characteristic data of the video to be processed and the target video quality parameter into a data prediction model to obtain coding parameter prediction data of the video to be processed under the target video quality parameter;
and encoding the video to be processed according to the encoding parameter prediction data to obtain a target video code stream of the video to be processed under the target video quality parameter.
7. The method of claim 6, wherein the video data to be processed further comprises encoding parameter data for the video to be processed; the encoding the video to be processed according to the encoding parameter prediction data to obtain a target video code stream of the video to be processed under the target video quality parameter, including:
determining target parameter data according to the coding parameter prediction data and the coding parameter data;
and encoding the video to be processed by utilizing the target parameter data to obtain a target video code stream of the video to be processed under the target video quality parameter.
8. The method of claim 7, wherein prior to determining target parameter data from the encoding parameter prediction data and the encoding parameter data, further comprising:
detecting whether the coding parameter prediction data is larger than the coding parameter data;
and when the coding parameter prediction data is detected to be larger than the coding parameter data, determining that the video to be processed meets a recoding condition, and executing the step of determining target parameter data according to the coding parameter prediction data and the coding parameter data.
9. The method of claim 7, wherein said determining target parameter data from said encoding parameter prediction data and said encoding parameter data comprises:
detecting whether the coding parameter prediction data is greater than a parameter threshold;
when the coding parameter prediction data is detected to be smaller than or equal to the parameter threshold value, determining the coding parameter prediction data as the target parameter data;
and when the coding parameter prediction data is detected to be larger than the parameter threshold value, determining the parameter threshold value as the target parameter data.
10. The method of claim 6, wherein encoding the video to be processed using the target parameter data results in a target video bitstream of the video to be processed at the target video quality parameter, comprising:
acquiring the code rate of the video code stream obtained by encoding the video to be processed with the target parameter data, and acquiring the code rate of the to-be-processed video code stream;
calculating the code rate change rate between the code rate of the video code stream and the code rate of the to-be-processed video code stream;
and when the code rate change rate is detected to be smaller than a code rate threshold, determining the video code stream as the target video code stream of the video to be processed under the target video quality parameter.
11. The method of claim 6, wherein the encoding the video to be processed according to the encoding parameter prediction data to obtain a target video code stream of the video to be processed under the target video quality parameter comprises:
and encoding the video to be processed acquired in a short video scene according to the encoding parameter prediction data to obtain a target video code stream of the video to be processed under the target video quality parameter, and sending the target video code stream to terminal equipment, so that the terminal equipment decodes the target video code stream, and outputting and displaying the video obtained by decoding on a screen of the terminal equipment in the short video scene.
12. A video encoded data prediction apparatus, comprising:
the system comprises an acquisition unit, a storage unit and a processing unit, wherein the acquisition unit is used for acquiring a sample video data set, the sample video data set comprises a plurality of sample video data, and each sample video data comprises compressed domain characteristic data of a sample video, sample video quality parameters and sample coding parameter marking data corresponding to the sample video quality parameters for marking the sample video;
The training unit is used for training the data prediction model to be trained by using the sample video data set, so that the loss function value between the sample coding parameter prediction data of each sample video under the corresponding sample video quality parameter and the sample coding parameter labeling data of each sample video under the corresponding sample video quality parameter is smaller than a function threshold value after the compressed domain feature data and the sample video quality parameter in each sample video data in the sample video data set are input into the data prediction model obtained through training;
the data prediction model obtained through training is used for outputting and obtaining coding parameter prediction data of the video to be processed under the target video quality parameter according to the compressed domain characteristic data of the video to be processed and the target video quality parameter;
the coding parameter prediction data are used for determining target parameter data by combining coding parameter data of the video to be processed under the target video quality parameters, and recoding the video to be processed by utilizing the target parameter data.
13. A video encoding apparatus, comprising:
The device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring video data to be processed, the video data to be processed comprises compressed domain characteristic data of video to be processed, target video quality parameters and coding parameter data of the video to be processed under the target video quality parameters, and the video to be processed is obtained by decoding a video code stream to be processed;
the prediction unit is used for inputting the compressed domain characteristic data of the video to be processed and the target video quality parameter into a data prediction model to obtain coding parameter prediction data of the video to be processed under the target video quality parameter;
and the encoding unit is used for encoding the video to be processed according to the encoding parameter prediction data to obtain a target video code stream of the video to be processed under the target video quality parameter.
14. A computer device comprising a processor and a memory, the processor and the memory being interconnected, wherein the memory is adapted to store a computer program, the computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the method of any of claims 1-11.
15. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein program instructions which, when executed, implement the method according to any of claims 1-11.
16. A computer program product, characterized in that it comprises program instructions which, when executed by a processor, implement the method of any one of claims 1-11.
CN202311560447.XA 2023-11-21 2023-11-21 Video coding data prediction method, video coding method and related equipment Pending CN117676156A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311560447.XA CN117676156A (en) 2023-11-21 2023-11-21 Video coding data prediction method, video coding method and related equipment

Publications (1)

Publication Number Publication Date
CN117676156A true CN117676156A (en) 2024-03-08



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination