CN113329226A - Data generation method and device, electronic equipment and storage medium - Google Patents

Data generation method and device, electronic equipment and storage medium

Info

Publication number
CN113329226A
CN113329226A (application CN202110591962.9A)
Authority
CN
China
Prior art keywords
video
coding
target
sample
sampled
Prior art date
Legal status
Granted
Application number
CN202110591962.9A
Other languages
Chinese (zh)
Other versions
CN113329226B (en)
Inventor
刘禾
廖懿婷
李军林
Current Assignee
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN202110591962.9A
Publication of CN113329226A
Application granted
Publication of CN113329226B
Legal status: Active (granted)

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/154: Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • H04N19/189: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
    • H04N19/196: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type being specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters
    • H04N19/50: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The embodiments of the disclosure disclose a data generation method and device, an electronic device and a storage medium, wherein the method comprises the following steps: sampling an original video to a first resolution to obtain a sampled video, and determining coding features according to the sampled video; inputting the coding features into a video classifier so that the video classifier outputs a target class of the original video; determining a target prediction model corresponding to the target class from among the prediction models; and inputting the coding features into the target prediction model so that the target prediction model predicts the quality parameters of the videos obtained by encoding the original video with each candidate encoding parameter. By adding a video classifier and prediction models corresponding to different classes, the complexity of the models is increased, and quality parameters of higher accuracy can be predicted from coding features of lower precision, so that the computing resources consumed in computing the coding features can be greatly reduced and the data generation efficiency improved.

Description

Data generation method and device, electronic equipment and storage medium
Technical Field
The embodiment of the disclosure relates to the technical field of computers, and in particular relates to a data generation method and device, an electronic device and a storage medium.
Background
At present, a dazzling array of video platforms provides people with abundant video resources. In the process of transmitting a video to a user terminal, the video is compressed with suitable encoding parameters, so that transmission bandwidth is saved and transmission efficiency is improved while video quality is guaranteed.
In the prior art, the following method is generally used to determine suitable encoding parameters: setting precoding parameters; precoding according to the set precoding parameters to obtain the video features of the precoded video; predicting the video quality from the video features based on a prediction model; and taking the precoding parameters that yield good video quality as the suitable encoding parameters.
The disadvantages of the prior art at least include: because a single uniform prediction model is used to predict the quality of the video under different precoding settings, a large amount of computing resources and time must be spent computing high-accuracy video features in order to guarantee prediction accuracy.
Disclosure of Invention
The embodiment of the disclosure provides a data generation method and device, an electronic device and a storage medium, which can greatly reduce the consumption of computing resources and improve the data generation efficiency on the basis of ensuring the prediction accuracy of video quality.
In a first aspect, an embodiment of the present disclosure provides a data generation method, including:
sampling an original video to a first resolution to obtain a sampled video, and determining coding characteristics according to the sampled video;
inputting the coding features into a video classifier so that the video classifier outputs a target class of the original video;
determining a target prediction model corresponding to the target category from prediction models;
and inputting the coding characteristics into the target prediction model so that the target prediction model predicts the quality parameters of the video coded by the original video according to the candidate coding parameters.
In a second aspect, an embodiment of the present disclosure further provides an apparatus for generating data, including:
the system comprises a characteristic extraction module, a coding module and a decoding module, wherein the characteristic extraction module is used for sampling an original video to a first resolution ratio to obtain a sampled video and determining coding characteristics according to the sampled video;
the video classification module is used for inputting the coding features into a video classifier so that the video classifier outputs a target class of the original video;
the model determining module is used for determining a target prediction model corresponding to the target category from prediction models;
and the parameter prediction module is used for inputting the coding characteristics into the target prediction model so as to enable the target prediction model to predict the quality parameters of the video coded by the original video according to the candidate coding parameters.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, where the electronic device includes:
one or more processors;
a storage device for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors implement the data generation method according to any one of the embodiments of the present disclosure.
In a fourth aspect, the embodiments of the present disclosure also provide a storage medium containing computer-executable instructions, which when executed by a computer processor, are used for executing the method for generating data according to any one of the embodiments of the present disclosure.
According to the technical scheme of the embodiment of the disclosure, an original video is sampled to a first resolution to obtain a sampled video, and coding characteristics are determined according to the sampled video; inputting the coding characteristics into a video classifier so that the video classifier outputs a target class of an original video; determining a target prediction model corresponding to the target category from the prediction models; and inputting the coding characteristics into a target prediction model so that the target prediction model predicts the quality parameters of the video coded by the original video according to the candidate coding parameters.
By additionally arranging the video classifier and the prediction models corresponding to different classes, the complexity of the models is improved, and the quality parameters with higher accuracy can be predicted according to the coding features with lower accuracy, so that the computing resources consumed in computing the coding features can be greatly reduced, and the data generation efficiency is improved.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
Fig. 1 is a schematic flow chart of a data generation method according to a first embodiment of the present disclosure;
fig. 2 is a data flow diagram of a data generation method provided in the second embodiment of the present disclosure;
fig. 3 is a schematic diagram illustrating a pre-training flow of a video classifier in a data generation method according to a third embodiment of the present disclosure;
fig. 4 is a schematic diagram illustrating a pre-training process of a prediction model in a data generation method according to a third embodiment of the present disclosure;
fig. 5 is a schematic diagram of a clustering result of a rate-distortion optimization curve in a data generation method according to a third embodiment of the present disclosure;
fig. 6 is a schematic flow chart of a data generation method according to a fourth embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of a data generation apparatus according to a fifth embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of an electronic device according to a sixth embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.
Example one
Fig. 1 is a schematic flow chart of a data generation method according to a first embodiment of the present disclosure. The embodiment of the present disclosure is applicable to generating quality parameters of a video, and is particularly applicable to generating the quality parameters of videos obtained by encoding an original video according to different candidate encoding parameters. The method may be performed by a data generation apparatus, which may be implemented in software and/or hardware, and which may be configured in an electronic device, for example in a server of a video platform.
As shown in fig. 1, the method for generating data provided in this embodiment includes:
s110, sampling the original video to a first resolution to obtain a sampled video, and determining coding characteristics according to the sampled video.
In the embodiment of the present disclosure, the original video may be, for example, a video resource stored in a server of a video platform. The first resolution may be a lower resolution, set in advance according to the application scenario, that is below the mainstream online resolutions. For example, mainstream online video resolutions are typically 540p and 720p, and the first resolution may be set to 480p in advance.
Since the first resolution is a lower resolution, sampling the original video to the first resolution can generally be regarded as down-sampling the original video to the first resolution. The original video may be down-sampled to the first resolution based on the principle of image down-sampling, which can be understood as extracting row and column pixels of the original video until the number of rows and columns corresponding to the first resolution is satisfied. By determining the coding features of the low-resolution video, the computational resource consumption and computation time of feature extraction can be reduced.
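As an illustration, the down-sampling step can be sketched as follows. This is a minimal sketch assuming OpenCV and a first resolution of 480p (854x480); the actual down-sampling implementation and resolution are not fixed by this embodiment.

```python
# Minimal down-sampling sketch (assumes OpenCV; 854x480 stands in for the
# 480p first resolution mentioned above). INTER_AREA averages source pixels,
# which matches the idea of extracting row/column pixels when shrinking.
import cv2

def downsample_video(src_path: str, dst_path: str,
                     width: int = 854, height: int = 480) -> None:
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    writer = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"),
                             fps, (width, height))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        writer.write(cv2.resize(frame, (width, height),
                                interpolation=cv2.INTER_AREA))
    cap.release()
    writer.release()
```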
Determining the coding features according to the sampled video may include, but is not limited to: determining features according to the size, number, pixels and the like of each video frame in the sampled video. Generally, the coding features may be features correlated with video classification and/or the quality of the encoded video, and may be pre-selected based on empirical or experimental data. Determining the coding features of the sampled video facilitates the subsequent video classification and the determination of the quality parameters of the encoded video.
And S120, inputting the coding features into a video classifier so that the video classifier outputs the target class of the original video.
In the embodiments of the present disclosure, the video classifier may include, but is not limited to: classifiers based on traditional algorithms (e.g., classifiers based on the random forest algorithm) and classifiers based on machine learning (e.g., classifiers based on deep learning networks). The video classifier can be trained in advance on the coding features of sample videos and the class labels set for those sample videos, and after training it can output the target class of the original video according to the coding features of the original video.
The class labels of the sample videos can cover multiple candidate categories, and different candidate categories can respectively represent original videos with different coding attributes; a coding attribute may be regarded as the quality parameter trend obtained after encoding with different encoding parameters. The category labels may be set based on empirical or experimental data. The target category output by the video classifier is one of these candidate categories.
By classifying the original video, the quality parameters of the original video after being coded can be favorably predicted according to different prediction models.
S130, determining a target prediction model corresponding to the target category from the prediction models.
In the embodiment of the present disclosure, the prediction models may be obtained by regression based on a regression loss in machine learning (e.g., mean square error, mean absolute error, etc.). In training the prediction models, the prediction model of each candidate category can be regressed from the coding features of the sample videos of that category. After a prediction model is obtained through regression, it can output, according to the coding features of the original video, the quality parameters of the videos obtained by encoding the original video with the candidate encoding parameters.
There may be at least one prediction model, each obtained by regression from the sample videos of the corresponding candidate category. Accordingly, after the target category of the original video is determined, the prediction model corresponding to the target category can be taken as the target prediction model according to the correspondence between prediction models and candidate categories.
And S140, inputting the coding characteristics into the target prediction model so that the target prediction model predicts the quality parameters of the video coded by the original video according to the candidate coding parameters.
The candidate encoding parameters may include at least one encoding parameter set, and each encoding parameter set may include at least: the resolution used for encoding and the Constant Rate Factor (CRF). The resolution may include, for example, 540p, 720p, 1080p, etc.; the CRF value may range, for example, from 16 to 35 with a step size of 1. Accordingly, each encoding parameter set may consist of one resolution value and one value in the CRF range.
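For illustration, the candidate encoding parameter sets described above can be enumerated as a simple grid; the concrete resolutions and the 16-35 CRF range are the example values given in this embodiment, not fixed choices.

```python
# Sketch of the candidate encoding parameter grid: every (resolution, CRF)
# pair, with CRF ranging over 16..35 in steps of 1 as in the example above.
RESOLUTIONS = ["540p", "720p", "1080p"]
CRF_VALUES = range(16, 36)  # 16..35 inclusive, step size 1

candidate_encoding_parameters = [
    (resolution, crf) for resolution in RESOLUTIONS for crf in CRF_VALUES
]
# 3 resolutions x 20 CRF values = 60 candidate encoding parameter sets
assert len(candidate_encoding_parameters) == 60
```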
The quality parameters may include at least: the Video Multi-method Assessment Fusion (VMAF) value and the bitrate. VMAF may be used to evaluate video picture quality; it can be regarded as a final index obtained by fusing a plurality of basic quality indexes (such as distortion type and distortion degree), where each basic index is assigned a certain weight, so that the final index retains the advantages of each basic index and yields a more accurate picture quality evaluation. The bitrate can be used to estimate the transmission speed; it can be regarded as the number of data bits transmitted per unit time.
For example, the quality parameters of the videos obtained by encoding the original video with each candidate encoding parameter may be: through the target prediction model, the VMAF and bitrate of the video encoded at 540p and each CRF value, the VMAF and bitrate at 720p and each CRF value, the VMAF and bitrate at 1080p and each CRF value, and so on, can all be predicted.
In the conventional scheme, when predicting the quality of encoded video, a uniform prediction model is used for videos under different precoding settings, and more accurate video features must be mined to guarantee prediction accuracy. Therefore, the conventional scheme usually computes video features in a slow mode with a larger calculation amount and higher calculation accuracy, which greatly consumes computing power and time. In addition, when the set resolution is high, the calculation amount increases and the time consumption grows further; and to determine the most suitable encoding parameters, multiple rounds of precoding and video feature calculation are required, making the calculation even longer. Compared with the time spent computing the video features of the precoded video, the time spent predicting video quality is almost negligible, so reducing the feature calculation time is of great significance for improving online service performance.
In the generation method disclosed in this embodiment, by adding the video classifier and the prediction models corresponding to different classes, the complexity of the models is improved, and prediction can be performed according to the coding features with lower precision under the condition of ensuring the prediction precision, so that the calculation resources consumed in calculating the coding features can be greatly reduced. In addition, the original video is sampled to the first resolution and then the coding characteristics are determined, so that the calculation amount can be further reduced, and the time consumption is reduced; and the quality parameters under each candidate coding parameter are output at one time based on the prediction model, so that the time consumption of calculation is further saved. By reducing the calculation time consumption of the coding characteristics, the time consumption of quality parameter prediction is greatly reduced, the data generation efficiency is improved, and the online service performance is greatly improved.
According to the technical scheme of the embodiment of the disclosure, an original video is sampled to a first resolution to obtain a sampled video, and coding characteristics are determined according to the sampled video; inputting the coding characteristics into a pre-trained video classifier so that the video classifier outputs a target class of an original video; determining a target prediction model corresponding to a target category from pre-trained prediction models; and inputting the coding characteristics into a target prediction model so that the target prediction model predicts the quality parameters of the video coded by the original video according to the candidate coding parameters.
By additionally arranging the video classifier and the prediction models corresponding to different classes, the complexity of the models is improved, and the quality parameters with higher accuracy can be predicted according to the coding features with lower accuracy, so that the computing resources consumed in computing the coding features can be greatly reduced, and the data generation efficiency is improved.
Example two
The embodiments of the present disclosure and various alternatives in the generation method of data provided in the above embodiments may be combined. The method for generating data provided by this embodiment describes the determining step of the coding characteristics in detail, and not only can use the code stream characteristics of the pre-coded video as the coding characteristics, but also can use the quality characteristics of the pre-coded video and the temporal-spatial information characteristics of the sampled video as the coding characteristics.
By taking the quality characteristics as the coding characteristics, the quality parameter trend of the video coded after sampling can be characterized to a certain extent, and a prediction basis is provided for subsequently predicting the quality parameters of the video coded by the original video according to the candidate coding parameters. By taking the space-time information characteristics as the coding characteristics, the complexity of the sampled video can be represented, and a prediction basis can be further provided for subsequently predicting the quality parameters of the video of the original video coded according to the candidate coding parameters. The prediction accuracy of the quality parameters can be further improved by expanding the richness of the coding features.
In this embodiment, determining the coding characteristics according to the sampled video includes: and carrying out pre-coding processing on the sampled video, determining the code stream characteristics of the pre-coded video, and taking the code stream characteristics as coding characteristics.
The sampled video is pre-encoded, for example using the H.264 video coding standard or Scalable Video Coding (SVC) technology. The pre-encoded video can comprise intra-coded frames (I frames), forward predictive coded frames (P frames) and bidirectional predictive interpolated coded frames (B frames); accordingly, the code stream features of the pre-encoded video may include frame features of the I, P and B frames.
The frame features of the I, P and B frames may include, but are not limited to, the following 13-dimensional features: ['i_n', 'i_byte', 'i_qp', 'p_n', 'p_byte', 'p_qp', 'b_n', 'b_byte', 'b_qp', 'p_skip', 'b_skip', 'p_intra', 'b_intra']. Here, 'i_n' may represent the number of I frames; 'i_byte' may represent the size of each I frame; 'i_qp' may represent the Quantization Parameter (QP) value of each I frame; correspondingly, 'p_n', 'p_byte', 'p_qp', 'b_n', 'b_byte' and 'b_qp' respectively represent the number of P frames, the size of each P frame, the QP value of each P frame, the number of B frames, the size in bytes of each B frame, and the QP value of each B frame; 'p_skip' and 'b_skip' may represent the skip situations of P frames and B frames, respectively; 'p_intra' and 'b_intra' may represent the intra situations of P frames and B frames, respectively.
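A sketch of assembling this 13-dimensional vector from per-frame pre-encoder statistics is shown below. The per-frame records (frame type, byte size, QP, skip and intra ratios) are assumed to come from the pre-encoder's log, and reducing per-frame values by their mean is an assumption about the aggregation.

```python
# Sketch: aggregate per-frame encoder statistics into the 13-D code stream
# feature vector ['i_n','i_byte','i_qp','p_n','p_byte','p_qp','b_n','b_byte',
# 'b_qp','p_skip','b_skip','p_intra','b_intra'].
from statistics import mean

def code_stream_features(frames: list) -> list:
    def agg(frame_type: str, field: str) -> float:
        vals = [f[field] for f in frames if f["type"] == frame_type]
        return mean(vals) if vals else 0.0

    feats = []
    for t in ("I", "P", "B"):
        feats.append(sum(1 for f in frames if f["type"] == t))  # i_n/p_n/b_n
        feats.append(agg(t, "bytes"))                           # *_byte
        feats.append(agg(t, "qp"))                              # *_qp
    feats += [agg("P", "skip"), agg("B", "skip")]               # p_skip, b_skip
    feats += [agg("P", "intra"), agg("B", "intra")]             # p_intra, b_intra
    return feats  # 13 dimensions: 3x3 + 2 + 2
```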
The above features can be calculated with a fast, lower-precision encoding preset. In the conventional scheme, although the video features may also include the above features, they generally need to be calculated with slower, higher-precision encoding presets. Therefore, compared with the conventional scheme, the generation method of this embodiment may take less time to calculate the code stream features.
In some optional implementations, determining the coding characteristics from the sampled video further includes: determining the quality characteristics of the pre-coded video according to the sampled video and the pre-coded video; and/or determining the space-time information characteristics according to the sampled video; correspondingly, the code stream characteristics are taken as coding characteristics, and the coding characteristics comprise: and taking at least one of the code stream characteristics, the quality characteristics and the space-time information characteristics as the coding characteristics.
The quality features of the pre-encoded video may include, but are not limited to, the following 4-dimensional features: the VMAF value, the bitrate, the number of pixels in kilopixels (kpixels), and the difference between the target CRF and the reference CRF. These features may be determined from the sampled video and the pre-encoded video based on the prior art, and will not be described in detail herein.
Due to the quality characteristics of the pre-coded video, the quality parameter trend of the sampled video after coding can be characterized to a certain extent, and the quality characteristics of the pre-coded video are used as coding characteristics, so that a prediction basis can be provided for subsequently predicting the quality parameters of the video of the original video after coding according to the candidate coding parameters.
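A minimal sketch of assembling this 4-dimensional quality feature vector follows; compute_vmaf is a hypothetical placeholder for any VMAF implementation (for example ffmpeg's libvmaf filter), the lightweight video metadata attributes are assumptions, and the choice of reference CRF is likewise an assumption.

```python
# Sketch of the 4-D quality features: [VMAF, bitrate, kpixels, CRF difference].
def quality_features(sampled_video, pre_encoded_video,
                     target_crf: int, reference_crf: int = 23) -> list:
    # compute_vmaf is a hypothetical helper (e.g. wrapping ffmpeg libvmaf)
    vmaf = compute_vmaf(reference=sampled_video, distorted=pre_encoded_video)
    # bitrate in bits per second, from assumed metadata attributes
    bitrate = pre_encoded_video.total_bytes * 8 / pre_encoded_video.duration_s
    kpixels = pre_encoded_video.width * pre_encoded_video.height / 1000.0
    return [vmaf, bitrate, kpixels, target_crf - reference_crf]
```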
The step of determining the Spatio-Temporal Information (SITI) features of the sampled video may generally include: S1, for each video frame, acquiring a difference image I_dif in the spatial domain (SI) and in the temporal domain (TI) respectively; the spatial difference image can be determined by computing the Sobel operator of the video frame (i.e. the pixel differences in the horizontal and vertical directions), and the temporal difference image can be determined by computing, for each pixel, the difference between the current frame and the next frame. S2, for each video frame, acquiring the maximum value max(I_dif), the mean value mean(I_dif) and the variance std(I_dif) of the temporal and spatial difference images respectively. S3, over all frames, acquiring in the temporal and spatial domains respectively the maximum max(max(I_dif)), mean mean(max(I_dif)) and variance std(max(I_dif)) of the per-frame maxima; the maximum max(mean(I_dif)), mean mean(mean(I_dif)) and variance std(mean(I_dif)) of the per-frame means; and the maximum max(std(I_dif)), mean mean(std(I_dif)) and variance std(std(I_dif)) of the per-frame variances.
In summary, the spatio-temporal information features of the sampled video may include the above 9-dimensional features obtained in the temporal and spatial domains respectively, i.e. 18 dimensions in total. Coding compression can be considered to start from temporal correlation (i.e. in a sequence of video frames, two adjacent frames differ only slightly) and spatial correlation (i.e. within the same video frame, the closer two pixels are, the stronger their correlation); therefore, taking the spatio-temporal information features of the sampled video as coding features can characterize the complexity of the sampled video in the temporal and spatial domains respectively, further providing a prediction basis for subsequently predicting the quality parameters of the videos obtained by encoding the original video with each candidate encoding parameter.
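The SITI computation in steps S1-S3 can be sketched as follows, assuming grayscale frames given as NumPy arrays; the Sobel and frame-difference details follow the description above.

```python
# Sketch of SITI: per-frame spatial (Sobel) and temporal (frame difference)
# difference images (S1), per-frame max/mean/std (S2), then max/mean/std of
# those statistics over all frames (S3): 9 dims each for SI and TI, 18 total.
import numpy as np
from scipy import ndimage

def siti_features(frames: list) -> list:
    diffs = {"si": [], "ti": []}
    for i, frame in enumerate(frames):
        f = frame.astype(np.float64)
        # S1: spatial difference image via the Sobel operator
        diffs["si"].append(np.hypot(ndimage.sobel(f, axis=0),
                                    ndimage.sobel(f, axis=1)))
        # S1: temporal difference image between current and next frame
        if i + 1 < len(frames):
            diffs["ti"].append(frames[i + 1].astype(np.float64) - f)
    feats = []
    for key in ("si", "ti"):
        # S2: per-frame max / mean / std of each difference image
        maxs = [d.max() for d in diffs[key]]
        means = [d.mean() for d in diffs[key]]
        stds = [d.std() for d in diffs[key]]
        # S3: max / mean / std of the per-frame statistics over all frames
        for stat in (maxs, means, stds):
            feats += [float(np.max(stat)), float(np.mean(stat)),
                      float(np.std(stat))]
    return feats  # 18 dimensions
```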
Exemplarily, fig. 2 is a data flow diagram of a data generation method provided in the second embodiment of the present disclosure. Referring to fig. 2, original videos with different resolutions are down-sampled to the first resolution to obtain sampled videos; the sampled video is pre-encoded to obtain the 13-dimensional (13-D) code stream features; quality calculation is performed on the sampled video and the pre-encoded video to determine the 4-D quality features of the pre-encoded video; and the 18-D spatio-temporal information features are determined by the SITI calculation performed on the sampled video.
Thus, a 35-D coding feature vector can be obtained in total. Compared with the conventional scheme in which the video features contain only partial code stream features, the coding features in this embodiment cover more comprehensive information, and expanding the richness of the coding features can further improve the prediction accuracy of the quality parameters.
After the coding features are determined, they can be input into the video classifier to obtain the target class of the original video; then, the target prediction model corresponding to the target class is determined from the pre-trained prediction models K0, K1, ..., Kn (for example, K1 in the figure is the target prediction model); finally, the coding features can be input into the target prediction model, so that the target prediction model predicts the quality parameters of the videos obtained by encoding the original video with each candidate encoding parameter.
According to the technical scheme of the embodiment of the disclosure, the determination step of the coding characteristics is described in detail, so that not only can the code stream characteristics of the pre-coded video be used as the coding characteristics, but also the quality characteristics of the pre-coded video and the space-time information characteristics of the sampled video can be used as the coding characteristics. By taking the quality characteristics as the coding characteristics, the quality parameter trend of the video coded after sampling can be characterized to a certain extent, and a prediction basis is provided for subsequently predicting the quality parameters of the video coded by the original video according to the candidate coding parameters. By taking the space-time information characteristics as the coding characteristics, the complexity of the sampled video can be represented, and a prediction basis can be further provided for subsequently predicting the quality parameters of the video of the original video coded according to the candidate coding parameters. The prediction accuracy of the quality parameters can be further improved by expanding the richness of the coding features.
The method for generating data provided by the embodiment of the present disclosure and the method for generating data provided by the above embodiment belong to the same disclosure concept, and the technical details that are not described in detail in the embodiment can be referred to the above embodiment, and the same technical features have the same beneficial effects in the embodiment and the above embodiment.
Example three
The embodiments of the present disclosure and various alternatives in the generation method of data provided in the above embodiments may be combined. The data generation method provided in this embodiment describes in detail the pre-training step of the video classifier and the pre-training step of the prediction model. Through automatic clustering of each sample video, the clustering result can be used as a label, and training of a video classification model is realized. And respectively carrying out gradient regression training on the rate-distortion optimization curves of all the classes based on the classification result of the sample video, so that the prediction models corresponding to all the candidate classes can be obtained through regression.
Fig. 3 is a schematic diagram of a pre-training process of a video classifier in a data generation method according to a third embodiment of the present disclosure. Referring to fig. 3, in this embodiment, the pre-training step of the video classifier includes:
s310, obtaining sample videos, and sampling each sample video to a first resolution to obtain sampled sample videos.
The sample video can be massive video resources under various resolutions. The manner of sampling the sample video to the first resolution is the same as the manner of sampling the original video to the first resolution, and is not described herein again.
And S320, determining a rate-distortion optimization curve of each sampled sample video.
Wherein, a Rate-distortion optimization (RD) curve represents the quality parameters of the video encoded by each sampled sample video according to the candidate encoding parameters. For example, in the RD curve, the abscissa may be the code rate and the ordinate may be the VMAF value; accordingly, each point in the RD curve may represent a different candidate encoding parameter for the sampled sample video. Therefore, the quality parameter of the video coded by each candidate coding parameter of the sampled sample video can be determined according to the RD curve.
Wherein, the sampled video can be encoded according to each candidate encoding parameter; then, the quality parameters of the coded video can be respectively determined; and finally, according to the corresponding relation between each candidate encoding parameter and each quality parameter, drawing to obtain an RD curve corresponding to each sampled sample video.
S330, clustering each rate-distortion optimization curve to obtain candidate categories.
The specific process of clustering the rate-distortion optimization curves to obtain candidate categories may include: S1, initially dividing the sample videos into N classes and calculating the RD curve center corresponding to each class; the sample videos may be initially divided into N classes by comparing their bitrates and VMAF values, and the size of N may be set according to the application scenario. S2, for each RD curve, finding the nearest RD curve center and updating the category of the corresponding sample video. S3, after all video categories are updated, recalculating the category centers, and repeating step S2 until the video categories no longer change. Through this clustering process based on RD curves, N candidate categories are finally obtained.
In some optional implementations, clustering the rate-distortion optimization curves may include: and clustering each rate distortion optimization curve according to the mean square error distance of the rate distortion optimization curves between every two sampled sample videos.
In these optional implementations, clustering the rate-distortion optimization curves using the mean square error distance may include: when finding the nearest RD curve center for each RD curve, calculating the Mean Square Error (MSE) distance between the current RD curve and the RD curve center corresponding to each category, and taking the center with the minimum MSE distance as the RD curve center nearest to the current RD curve. Further, the category of the sample video may be updated according to the nearest RD curve center until the video categories no longer change.
In addition, besides clustering the rate-distortion optimization curves by the MSE distance, the curves may also be clustered according to other distances, for example the Minkowski distance between the current RD curve and the RD curve center corresponding to each category, and so on, which are not exhaustively listed here.
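A k-means-style sketch of the clustering in steps S1-S3, with MSE as the curve distance, is given below. Representing each RD curve as a vector of VMAF values sampled on the shared CRF grid is an assumption that makes the curves directly comparable; empty clusters are not handled in this sketch.

```python
# Sketch: cluster RD curves (rows of `curves`) into N candidate classes using
# MSE distance, iterating assignment and center updates until labels stabilize.
import numpy as np

def cluster_rd_curves(curves: np.ndarray, n_classes: int,
                      n_iter: int = 100) -> np.ndarray:
    rng = np.random.default_rng(0)
    # S1: initial partition into N classes
    labels = rng.integers(0, n_classes, size=len(curves))
    for _ in range(n_iter):
        # per-class RD curve centers (empty clusters not handled here)
        centers = np.stack([curves[labels == k].mean(axis=0)
                            for k in range(n_classes)])
        # S2: assign each curve to the center with minimum MSE distance
        mse = ((curves[:, None, :] - centers[None, :, :]) ** 2).mean(axis=2)
        new_labels = mse.argmin(axis=1)
        # S3: stop once no video changes class
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```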
For example, fig. 5 is a schematic diagram of a clustering result of a rate-distortion optimization curve in a data generation method provided in the third embodiment of the present disclosure.
Referring to fig. 5, there is shown an RD graph with code rate on the abscissa and VMAF value on the ordinate. The 5 curves cat0-cat4 in the figure may respectively represent sample videos of 5 clustered candidate categories, which respectively correspond to RD curves. Wherein, the point in each curve can correspond to the CRF value, and the range of the CRF value in each curve can be 16-35, and the step length can be 1. In fig. 5, each point in the RD curve may represent the code rate and VMAF value corresponding to the video after the CRF encoding of the sample video sampled to the first resolution to which the point belongs.
And S340, determining the target category of each sample video from each candidate category.
For each sample video, the RD curve corresponding to the current sample video and the MSE distance of each type of RD curve in the clustering result may be calculated. And the candidate category corresponding to the RD curve in the clustering result with the minimum MSE distance may be used as the target category of the current sample video. And finally, obtaining the target category of each sample video.
And S350, determining sample coding characteristics according to the sampled sample videos.
There is no strict time sequence relationship between the step S350 of determining the sample encoding characteristics and the steps S320-S340 of determining the target category of each sample video. The step of determining the sample coding features may be performed first, the step of determining the target category of each sample video may be performed first, or the step of determining the sample coding features and the step of determining the target category of each sample video may be performed simultaneously.
The step of determining the sample coding characteristics of the sampled sample video may refer to the above embodiments, and is not described herein again.
And S360, inputting the coding characteristics of each sample into the video classifier so that the video classifier outputs the actual output category of each sample video.
And S370, training the video classifier with the goal that the deviation between the actual output class and the target class of each sample video is smaller than a preset deviation.
The determined target class of each sample video may be used as a true value of each sample video class. Wherein, initial parameters can be set for the video classifier according to empirical values or experimental values. After the actual output category of each sample video is output according to the current parameter in the video classifier, the current parameter can be subjected to iterative optimization according to the deviation between the actual output category and the true value, so that the training of the video classifier is completed.
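As an illustration of S360-S370, the sketch below trains a random forest (one of the traditional-algorithm classifiers mentioned in this disclosure) on the sample coding features, with the clustered target categories as ground-truth labels; scikit-learn is an assumed implementation choice.

```python
# Sketch: fit the video classifier on (n_videos, 35) sample coding features
# against the target categories obtained from RD-curve clustering.
from sklearn.ensemble import RandomForestClassifier

def train_video_classifier(sample_features, target_categories):
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(sample_features, target_categories)
    return clf
```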
Fig. 4 is a schematic diagram of a pre-training process of a prediction model in a data generation method according to a third embodiment of the present disclosure. Referring to fig. 4, in the present embodiment, the pre-training step of the prediction model includes:
s410, obtaining sample videos, and sampling each sample video to a first resolution to obtain sampled sample videos.
And S420, determining a rate-distortion optimization curve of each sampled sample video.
And the rate-distortion optimization curve represents the quality parameters of the video coded by each sampled sample video according to the candidate coding parameters.
And S430, clustering the rate-distortion optimization curves to obtain candidate categories.
After determining each candidate category, regression of the prediction model may be performed for each candidate category. The regression process of the prediction model can be performed synchronously with the training process of the video classifier.
And S440, determining the target category of each sample video from the candidate categories.
And S450, grouping the sample videos according to the target category of the sample videos.
Wherein, sample videos with the same target category can be used as a group of sample videos.
And S460, aiming at each group of sample videos, training a gradient regression model according to the rate distortion optimization curve of the sampled sample videos corresponding to the sample videos in the group.
Through gradient regression training, the prediction model can learn the correspondence between the local features of a single RD curve and the full features of all RD curves. Therefore, the prediction model can gain the ability to predict the quality parameters under all candidate encoding parameters from a subset of the candidate encoding parameters and their corresponding quality parameters.
And S470, taking the trained gradient regression model corresponding to each group of sample videos as a prediction model of the candidate category corresponding to each group of sample videos.
By performing gradient regression on the prediction models corresponding to each group of sample videos, each prediction model corresponding to each candidate category can be finally obtained. Therefore, after the target category of the original category is determined, the target prediction model can be selected from the prediction models according to the corresponding relation, and the quality parameter of the video coded by the original video according to the candidate coding parameters can be predicted.
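The per-category gradient regression of S450-S470 can be sketched as follows; gradient boosting via scikit-learn is an assumed choice of gradient regression model, and the exact layout of the per-class training data (coding features plus a candidate encoding parameter as input, the RD-curve quality parameter as target) is an assumption.

```python
# Sketch: train one gradient regression model per candidate category, giving
# the set of prediction models from which the target model is later selected.
from sklearn.ensemble import GradientBoostingRegressor

def train_prediction_models(features_by_class: dict,
                            targets_by_class: dict) -> dict:
    models = {}
    for category, features in features_by_class.items():
        model = GradientBoostingRegressor(random_state=0)
        model.fit(features, targets_by_class[category])
        models[category] = model
    return models
```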
The technical scheme of the embodiment of the disclosure describes the pre-training step of the video classifier and the pre-training step of the prediction model in detail. By optimizing the curve and the mean square error distance based on rate distortion, automatic clustering of each sample video can be realized, so that a clustering result can be used as a label to train a video classification model. And respectively carrying out gradient regression training on the rate-distortion optimization curves of all the classes based on the classification result of the sample video, so that the prediction models corresponding to all the candidate classes can be obtained through regression.
The method for generating data provided by the embodiment of the present disclosure and the method for generating data provided by the above embodiment belong to the same disclosure concept, and the technical details that are not described in detail in the embodiment can be referred to the above embodiment, and the same technical features have the same beneficial effects in the embodiment and the above embodiment.
Example four
The embodiments of the present disclosure and various alternatives in the data generation method provided in the above embodiments may be combined. The data generation method provided in this embodiment supplements the steps after the predicted quality parameters are obtained. By determining the quality parameters of the videos obtained by encoding the original video with each candidate encoding parameter using the above generation method, the corresponding target encoding parameters can be fed back according to a request from a requester carrying target quality information, informing the requester of target encoding parameters that can satisfy the target quality information; the corresponding quality parameters can also be fed back according to a request from a requester carrying candidate encoding parameters, informing the requester of the quality obtained after encoding with those candidate encoding parameters.
Fig. 6 is a schematic flow chart of a data generation method according to a fourth embodiment of the present disclosure. Referring to fig. 6, in this embodiment, the data generating method includes:
s610, sampling the original video to a first resolution to obtain a sampled video, and determining coding characteristics according to the sampled video.
And S620, inputting the coding features into the video classifier so that the video classifier outputs the target class of the original video.
S630, a target prediction model corresponding to the target category is determined from the prediction models.
And S640, inputting the coding characteristics into the target prediction model so that the target prediction model predicts the quality parameters of the video coded by the original video according to the candidate coding parameters.
And S651, receiving a parameter acquisition request sent by a requester, wherein the parameter acquisition request carries target quality information.
The target quality information may include at least one of the quality parameters, and may be considered as a quality condition of the encoded video desired by the requesting party.
And S661, determining the target quality parameter from the predicted quality parameters according to the target quality information.
According to the target parameters contained in the target quality information, quality parameters containing those target parameters can be searched from the predicted quality parameters; then, the found quality parameters can be used as the target quality parameters.
S671, using the candidate encoding parameter corresponding to the target quality parameter as the target encoding parameter of the original video, and feeding back the target encoding parameter to the requesting party.
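A sketch of steps S651-S671 follows. The selection policy shown (the lowest predicted bitrate among candidates whose VMAF meets the target) is an assumption; this embodiment only requires that the chosen quality parameter contain the requested target quality information.

```python
# Sketch: from the predicted quality parameters of all candidate encoding
# parameters, pick the target encoding parameter satisfying a VMAF target.
def select_target_encoding_parameter(predicted: dict, min_vmaf: float):
    # predicted maps (resolution, crf) -> (vmaf, bitrate)
    feasible = {p: q for p, q in predicted.items() if q[0] >= min_vmaf}
    if not feasible:
        return None  # no candidate meets the target quality information
    return min(feasible, key=lambda p: feasible[p][1])
```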
S652, receiving a quality acquisition request sent by a requester, wherein the quality acquisition request carries candidate encoding parameters.
And S662, feeding back the quality parameters corresponding to the carried candidate encoding parameters to the requester.
Steps S651-S671 are the process of feeding back the corresponding target encoding parameters according to a request from the requester carrying target quality information; steps S652-S662 are the process of feeding back the corresponding quality parameters according to a request from the requester carrying candidate encoding parameters. The two processes are alternatives: the requester may request the target encoding parameters according to the target quality information it expects to obtain, or it may request the corresponding quality parameters according to the candidate encoding parameters whose encoding effect it wishes to know.
In the embodiment, the target coding parameters which can meet the target quality information can be notified to the requesting party by feeding back the target coding parameters to the requesting party; by feeding back the quality parameter for the requesting party, it is possible to inform the requesting party of the quality condition after being encoded by the candidate encoding parameter.
In the conventional method, if a requester needs to request quality parameters of a plurality of candidate encoding parameters, a plurality of precoding parameters need to be set according to the request, and corresponding code stream characteristics are calculated, which consumes a lot of calculation resources and time. In addition, when the adaptive system of each requester is deployed in the server, corresponding precoding parameters need to be set for each requester, which results in a complex adaptive system architecture.
In the generation method disclosed in this embodiment, the quality parameters of the original video encoded by each candidate encoding parameter can be output through the target prediction model. If a requester needs to request the quality parameters of a plurality of candidate encoding parameters, they are obtained directly from all of the predicted quality parameters without repeated calculation, which greatly reduces the consumption of computing resources and time; moreover, corresponding precoding parameters do not need to be set for each requester when deploying the adaptive system of each requester, thereby simplifying the adaptive system architecture.
In some optional implementation manners, the generation method provided by this embodiment may also be applied to a server of a video platform; after determining the target coding parameters of the original video, the method further comprises: and encoding the original video based on the target encoding parameter.
In these alternative implementations, when the quality parameter generation method is applied to a server of a video platform, the server may further encode the original video according to the determined target encoding parameter. Moreover, the transmission of the coded video can be carried out after coding, so that the video can be compressed by adopting proper coding parameters, the transmission bandwidth flow is saved and the transmission efficiency is improved under the condition of ensuring the video quality.
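For illustration, encoding the original video with a determined target encoding parameter set can be sketched with the ffmpeg CLI; the flags shown are standard ffmpeg/x264 options, and the specific scale and CRF values come from the selected parameter set.

```python
# Sketch: encode the original video at the target resolution and CRF.
import subprocess

def encode_with_target_params(src: str, dst: str,
                              height: int, crf: int) -> None:
    subprocess.run([
        "ffmpeg", "-y", "-i", src,
        "-vf", f"scale=-2:{height}",   # keep aspect ratio, set target height
        "-c:v", "libx264", "-crf", str(crf),
        dst,
    ], check=True)
```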
According to the technical scheme of the embodiment of the disclosure, steps after the quality parameters are obtained through prediction are supplemented. The quality parameters of the original video coded by the candidate coding parameters are determined by using the generation method, so that the purpose that the corresponding target coding parameters are fed back according to the request of a requester carrying target quality information can be realized, and the requester is informed of the target coding parameters capable of meeting the target quality information; the method can also realize that the corresponding quality parameters are fed back according to the request of the candidate coding parameters carried by the requester so as to inform the requester of the quality condition after the candidate coding parameters are coded.
The method for generating data provided by the embodiment of the present disclosure and the method for generating data provided by the above embodiment belong to the same disclosure concept, and the technical details that are not described in detail in the embodiment can be referred to the above embodiment, and the same technical features have the same beneficial effects in the embodiment and the above embodiment.
Example five
Fig. 7 is a schematic structural diagram of a data generation apparatus according to a fifth embodiment of the present disclosure. The data generation apparatus provided in this embodiment is suitable for generating quality parameters of a video, and is particularly suitable for generating the quality parameters of videos obtained by encoding an original video according to different candidate encoding parameters.
As shown in fig. 7, the data generation device includes:
the feature extraction module 710 is configured to sample an original video to a first resolution to obtain a sampled video, and determine a coding feature according to the sampled video;
a video classification module 720, configured to input the encoding characteristics into a video classifier, so that the video classifier outputs a target category of the original video;
a model determining module 730, configured to determine, from the prediction models, a target prediction model corresponding to the target category;
the parameter prediction module 740 is configured to input the coding characteristics into the target prediction model, so that the target prediction model predicts the quality parameters of the video encoded by the original video according to the candidate coding parameters.
In some optional implementations, the feature extraction module may be configured to:
and carrying out pre-coding processing on the sampled video, determining the code stream characteristics of the pre-coded video, and taking the code stream characteristics as coding characteristics.
In some optional implementations, the feature extraction module may be further configured to:
determining the quality characteristics of the pre-coded video according to the sampled video and the pre-coded video;
and/or determining the space-time information characteristics according to the sampled video;
correspondingly, the code stream characteristics are taken as coding characteristics, and the coding characteristics comprise: and taking at least one of the code stream characteristics, the quality characteristics and the space-time information characteristics as the coding characteristics.
In some optional implementations, the data generating apparatus further includes:
a first training module to pre-train a video classifier based on the following steps:
acquiring sample videos, and sampling each sample video to a first resolution to obtain sampled sample videos; determining a rate-distortion optimization curve of each sampled sample video; the rate distortion optimization curve represents the quality parameters of the video coded by each sampled sample video according to the candidate coding parameters; clustering each rate-distortion optimization curve to obtain candidate categories; determining a target category of each sample video from each candidate category; determining sample coding characteristics according to each sampled sample video; inputting the coding characteristics of each sample into a video classifier so that the video classifier outputs the actual output category of each sample video; and training the video classifier by taking the deviation between each actual output class and the target class of each sample video, which is smaller than the preset deviation, as a target.
In some optional implementations, the first training module may be specifically configured to:
clustering the rate-distortion optimization curves according to the mean square error distance between the rate-distortion optimization curves of every two sampled sample videos.
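One way to realize this distance-based grouping, under the assumption that average-linkage agglomerative clustering over a precomputed distance matrix is acceptable (the embodiment names only the distance, not the clustering algorithm):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def cluster_by_mse(rd_curves, n_categories=4):
    # rd_curves: (n_videos, n_candidates). The distance between two videos
    # is the mean square error between their RD optimization curves.
    diff = rd_curves[:, None, :] - rd_curves[None, :, :]
    mse_dist = np.mean(diff ** 2, axis=-1)  # pairwise MSE distance matrix
    # `metric="precomputed"` on scikit-learn >= 1.2 (named `affinity` in
    # older versions); average linkage is an assumption.
    model = AgglomerativeClustering(n_clusters=n_categories,
                                    metric="precomputed", linkage="average")
    return model.fit_predict(mse_dist)
```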
In some optional implementations, the data generating apparatus further includes:
a second training module, configured to pre-train the prediction models based on the following steps:
grouping the sample videos according to their target categories; for each group of sample videos, training a gradient regression model according to the rate-distortion optimization curves of the sampled sample videos corresponding to the sample videos in that group; and taking the trained gradient regression model corresponding to each group of sample videos as the prediction model of the candidate category corresponding to that group.
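A sketch of this per-category training, assuming scikit-learn's GradientBoostingRegressor as the "gradient regression model" and a multi-output wrapper so each model emits the full curve (one quality parameter per candidate coding parameter):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

def train_prediction_models(sample_features, rd_curves, target_categories):
    # One gradient regression model per candidate category; each maps coding
    # features to the RD optimization curve of the sampled sample video.
    X = np.asarray(sample_features)
    Y = np.asarray(rd_curves)
    labels = np.asarray(target_categories)
    models = {}
    for category in np.unique(labels):
        mask = labels == category
        reg = MultiOutputRegressor(GradientBoostingRegressor(random_state=0))
        models[category] = reg.fit(X[mask], Y[mask])
    return models
```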
In some optional implementations, the data generating apparatus further includes:
a first request receiving module, configured to receive a parameter acquisition request sent by a requester, where the parameter acquisition request carries target quality information;
a first feedback module, configured to determine a target quality parameter from the predicted quality parameters according to the target quality information, take the candidate coding parameter corresponding to the target quality parameter as the target coding parameter of the original video, and feed the target coding parameter back to the requester.
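How the first feedback module might map the target quality information onto a coding parameter is left open by this embodiment; the selection policy below (the cheapest candidate whose predicted quality still meets the target, assuming CRF-like candidates where larger values mean cheaper encodes) is therefore an assumption for illustration only.

```python
def select_target_coding_parameter(predicted, target_quality):
    # predicted: {candidate_coding_parameter: predicted_quality_parameter},
    # e.g. the dict returned by predict_quality_parameters above.
    meeting = {c: q for c, q in predicted.items() if q >= target_quality}
    if meeting:
        # For CRF-like parameters a larger value usually means a cheaper
        # encode, so pick the largest candidate that still meets the target.
        return max(meeting)
    # No candidate meets the target: fall back to the best available quality.
    return max(predicted, key=predicted.get)
```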
In some optional implementations, the data generating apparatus further includes:
a second request receiving module, configured to receive a quality acquisition request sent by a requester, where the quality acquisition request carries a candidate coding parameter;
a second feedback module, configured to feed the quality parameter corresponding to the carried candidate coding parameter back to the requester.
In some optional implementations, the data generation apparatus may also be applied to a server of a video platform; correspondingly, the data generation apparatus further includes:
an encoding module, configured to encode the original video based on the target coding parameter after the target coding parameter of the original video is determined.
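On a video-platform server, the encoding module could then be as simple as the following ffmpeg invocation; treating the target coding parameter as an x264 CRF value is an assumption for illustration.

```python
import subprocess

def encode_original_video(original_path, out_path, target_crf):
    # Encode the original video with the determined target coding parameter.
    subprocess.run(["ffmpeg", "-y", "-i", original_path, "-c:v", "libx264",
                    "-crf", str(target_crf), out_path],
                   check=True)
```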
The data generation apparatus provided by this embodiment of the present disclosure can execute the data generation method provided by any embodiment of the present disclosure, and has the functional modules and beneficial effects corresponding to the executed method.
It should be noted that the units and modules included in the above apparatus are divided only according to functional logic, and this division is not limiting as long as the corresponding functions can be implemented; in addition, the specific names of the functional units are used only to distinguish the units from one another and are not intended to limit the protection scope of the embodiments of the present disclosure.
EXAMPLE SIX
Referring now to fig. 8, a schematic structural diagram of an electronic device 800 (e.g., the terminal device or server in fig. 8) suitable for implementing embodiments of the present disclosure is shown. Terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (e.g., car navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in fig. 8 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in fig. 8, the electronic device 800 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 801, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage device 808 into a random access memory (RAM) 803. Various programs and data necessary for the operation of the electronic device 800 are also stored in the RAM 803. The processing device 801, the ROM 802, and the RAM 803 are connected to one another via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Generally, the following devices may be connected to the I/O interface 805: input devices 806 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 807 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; storage 808 including, for example, magnetic tape, hard disk, etc.; and a communication device 809. The communication means 809 may allow the electronic device 800 to communicate wirelessly or by wire with other devices to exchange data. While fig. 8 illustrates an electronic device 800 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the method illustrated by the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 809, installed from the storage device 808, or installed from the ROM 802. When executed by the processing device 801, the computer program performs the above-described functions defined in the data generation method of the embodiments of the present disclosure.
The electronic device provided by this embodiment of the present disclosure and the data generation method provided by the above embodiments belong to the same disclosed concept; technical details not described in detail in this embodiment may be found in the above embodiments, and this embodiment has the same beneficial effects as the above embodiments.
EXAMPLE SEVEN
The embodiments of the present disclosure provide a computer storage medium on which a computer program is stored; when executed by a processor, the program implements the data generation method provided by the above embodiments.
It should be noted that the computer-readable medium in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to: an electrical wire, an optical cable, RF (radio frequency), or any suitable combination of the foregoing.
In some embodiments, clients and servers may communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include local area networks ("LANs"), wide area networks ("WANs"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to:
sample an original video to a first resolution to obtain a sampled video, and determine coding features according to the sampled video;
input the coding features into a video classifier, so that the video classifier outputs a target category of the original video;
determine, from among prediction models, a target prediction model corresponding to the target category;
and input the coding features into the target prediction model, so that the target prediction model predicts the quality parameters of the videos obtained by encoding the original video according to candidate coding parameters.
Computer program code for carrying out the operations of the present disclosure may be written in one or more programming languages or combinations thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or by hardware. The names of the units and modules do not, in some cases, limit the units or modules themselves; for example, the parameter prediction module may also be described as a "parameter generation module".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and the like.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, [Example One] provides a data generation method, the method including:
sampling an original video to a first resolution to obtain a sampled video, and determining coding features according to the sampled video;
inputting the coding features into a video classifier, so that the video classifier outputs a target category of the original video;
determining, from among prediction models, a target prediction model corresponding to the target category;
and inputting the coding features into the target prediction model, so that the target prediction model predicts the quality parameters of the videos obtained by encoding the original video according to candidate coding parameters.
According to one or more embodiments of the present disclosure, [Example Two] provides a data generation method, further including:
in some optional implementations, the determining coding features according to the sampled video includes:
pre-encoding the sampled video, determining the bitstream features of the pre-encoded video, and taking the bitstream features as the coding features.
According to one or more embodiments of the present disclosure, [Example Three] provides a data generation method, further including:
in some optional implementations, the determining coding features according to the sampled video further includes:
determining the quality features of the pre-encoded video according to the sampled video and the pre-encoded video;
and/or determining the spatio-temporal information features according to the sampled video;
correspondingly, taking the bitstream features as the coding features includes: taking at least one of the bitstream features, the quality features, and the spatio-temporal information features as the coding features.
According to one or more embodiments of the present disclosure, [Example Four] provides a data generation method, further including:
in some optional implementations, the video classifier is pre-trained based on the following steps:
acquiring sample videos, and sampling each sample video to the first resolution to obtain sampled sample videos;
determining a rate-distortion optimization curve of each sampled sample video, where the rate-distortion optimization curve represents the quality parameters of the videos obtained by encoding each sampled sample video according to candidate coding parameters;
clustering the rate-distortion optimization curves to obtain candidate categories;
determining a target category of each sample video from among the candidate categories;
determining sample coding features according to each sampled sample video;
inputting the coding features of each sample into the video classifier, so that the video classifier outputs an actual output category of each sample video;
and training the video classifier with the goal that the deviation between the actual output category and the target category of each sample video is smaller than a preset deviation.
According to one or more embodiments of the present disclosure, [Example Five] provides a data generation method, further including:
in some optional implementations, the clustering the rate-distortion optimization curves includes:
clustering the rate-distortion optimization curves according to the mean square error distance between the rate-distortion optimization curves of every two sampled sample videos.
According to one or more embodiments of the present disclosure, [Example Six] provides a data generation method, further including:
in some optional implementations, the prediction models are pre-trained based on the following steps:
grouping the sample videos according to their target categories;
for each group of sample videos, training a gradient regression model according to the rate-distortion optimization curves of the sampled sample videos corresponding to the sample videos in that group;
and taking the trained gradient regression model corresponding to each group of sample videos as the prediction model of the candidate category corresponding to that group.
According to one or more embodiments of the present disclosure, [Example Seven] provides a data generation method, further including:
in some optional implementations, receiving a parameter acquisition request sent by a requester, where the parameter acquisition request carries target quality information;
determining a target quality parameter from the predicted quality parameters according to the target quality information;
and taking the candidate coding parameter corresponding to the target quality parameter as the target coding parameter of the original video, and feeding the target coding parameter back to the requester.
According to one or more embodiments of the present disclosure, [Example Eight] provides a data generation method, further including:
in some optional implementations, receiving a quality acquisition request sent by a requester, where the quality acquisition request carries a candidate coding parameter;
and feeding the quality parameter corresponding to the carried candidate coding parameter back to the requester.
According to one or more embodiments of the present disclosure, [Example Nine] provides a data generation method, further including:
in some optional implementations, the method is applied to a server of a video platform; after the target coding parameter of the original video is determined, the method further includes: encoding the original video based on the target coding parameter.
The foregoing description is merely a description of the preferred embodiments of the present disclosure and of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the particular combinations of the features described above, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure — for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in the present disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (12)

1. A method of generating data, comprising:
sampling an original video to a first resolution to obtain a sampled video, and determining coding features according to the sampled video;
inputting the coding features into a video classifier, so that the video classifier outputs a target category of the original video;
determining, from among prediction models, a target prediction model corresponding to the target category;
and inputting the coding features into the target prediction model, so that the target prediction model predicts the quality parameters of the videos obtained by encoding the original video according to candidate coding parameters.
2. The method of claim 1, wherein the determining coding features according to the sampled video comprises:
pre-encoding the sampled video, determining the bitstream features of the pre-encoded video, and taking the bitstream features as the coding features.
3. The method of claim 2, wherein the determining coding features according to the sampled video further comprises:
determining the quality features of the pre-encoded video according to the sampled video and the pre-encoded video;
and/or determining the spatio-temporal information features according to the sampled video;
correspondingly, the taking the bitstream features as the coding features comprises: taking at least one of the bitstream features, the quality features, and the spatio-temporal information features as the coding features.
4. The method of claim 1, wherein the video classifier is pre-trained based on the following steps:
acquiring sample videos, and sampling each sample video to the first resolution to obtain sampled sample videos;
determining a rate-distortion optimization curve of each sampled sample video, wherein the rate-distortion optimization curve represents the quality parameters of the videos obtained by encoding each sampled sample video according to candidate coding parameters;
clustering the rate-distortion optimization curves to obtain candidate categories;
determining a target category of each sample video from among the candidate categories;
determining sample coding features according to each sampled sample video;
inputting the coding features of each sample into the video classifier, so that the video classifier outputs an actual output category of each sample video;
and training the video classifier with the goal that the deviation between the actual output category and the target category of each sample video is smaller than a preset deviation.
5. The method of claim 4, wherein the clustering the rate-distortion optimization curves comprises:
clustering the rate-distortion optimization curves according to the mean square error distance between the rate-distortion optimization curves of every two sampled sample videos.
6. The method of claim 4, wherein the prediction models are pre-trained based on the following steps:
grouping the sample videos according to their target categories;
for each group of sample videos, training a gradient regression model according to the rate-distortion optimization curves of the sampled sample videos corresponding to the sample videos in that group;
and taking the trained gradient regression model corresponding to each group of sample videos as the prediction model of the candidate category corresponding to that group.
7. The method of claim 1, further comprising:
receiving a parameter acquisition request sent by a requester, wherein the parameter acquisition request carries target quality information;
determining a target quality parameter from the predicted quality parameters according to the target quality information;
and taking the candidate coding parameter corresponding to the target quality parameter as the target coding parameter of the original video, and feeding the target coding parameter back to the requester.
8. The method of claim 1, further comprising:
receiving a quality acquisition request sent by a requester, wherein the quality acquisition request carries a candidate coding parameter;
and feeding the quality parameter corresponding to the carried candidate coding parameter back to the requester.
9. The method of claim 6, wherein the method is applied to a server of a video platform, and wherein after the target coding parameter of the original video is determined, the method further comprises: encoding the original video based on the target coding parameter.
10. An apparatus for generating data, comprising:
a feature extraction module, configured to sample an original video to a first resolution to obtain a sampled video, and to determine coding features according to the sampled video;
a video classification module, configured to input the coding features into a video classifier, so that the video classifier outputs a target category of the original video;
a model determination module, configured to determine, from among prediction models, a target prediction model corresponding to the target category;
and a parameter prediction module, configured to input the coding features into the target prediction model, so that the target prediction model predicts the quality parameters of the videos obtained by encoding the original video according to candidate coding parameters.
11. An electronic device, characterized in that the electronic device comprises:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the data generation method of any one of claims 1-9.
12. A storage medium containing computer-executable instructions which, when executed by a computer processor, perform the data generation method of any one of claims 1-9.
CN202110591962.9A 2021-05-28 2021-05-28 Data generation method and device, electronic equipment and storage medium Active CN113329226B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110591962.9A CN113329226B (en) 2021-05-28 2021-05-28 Data generation method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113329226A true CN113329226A (en) 2021-08-31
CN113329226B CN113329226B (en) 2022-12-20

Family

ID=77422119

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110591962.9A Active CN113329226B (en) 2021-05-28 2021-05-28 Data generation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113329226B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107046639A (en) * 2016-10-31 2017-08-15 上海大学 HEVC code stream quality prediction models based on content
US20180227585A1 (en) * 2017-02-06 2018-08-09 Google Inc. Multi-level Machine Learning-based Early Termination in Partition Search for Video Encoding
CN109286825A (en) * 2018-12-14 2019-01-29 北京百度网讯科技有限公司 Method and apparatus for handling video
CN110719457A (en) * 2019-09-17 2020-01-21 北京达佳互联信息技术有限公司 Video coding method and device, electronic equipment and storage medium
CN111246209A (en) * 2020-01-20 2020-06-05 北京字节跳动网络技术有限公司 Adaptive encoding method, apparatus, electronic device, and computer storage medium
CN112188310A (en) * 2020-09-28 2021-01-05 北京金山云网络技术有限公司 Test sequence construction method and device, electronic equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117495853A (en) * 2023-12-28 2024-02-02 淘宝(中国)软件有限公司 Video data processing method, device and storage medium
CN117495853B (en) * 2023-12-28 2024-05-03 淘宝(中国)软件有限公司 Video data processing method, device and storage medium

Also Published As

Publication number Publication date
CN113329226B (en) 2022-12-20

Similar Documents

Publication Publication Date Title
US10990812B2 (en) Video tagging for video communications
WO2020177722A1 (en) Method for video classification, method and device for model training, and storage medium
CN113436620B (en) Training method of voice recognition model, voice recognition method, device, medium and equipment
CN110248189B (en) Video quality prediction method, device, medium and electronic equipment
CN112153415B (en) Video transcoding method, device, equipment and storage medium
US20210211768A1 (en) Video Tagging For Video Communications
WO2023273610A1 (en) Speech recognition method and apparatus, medium, and electronic device
WO2022000298A1 (en) Reinforcement learning based rate control
CN114245209A (en) Video resolution determination method, video resolution determination device, video model training method, video coding device and video coding device
CN103929648A (en) Motion estimation method and device in frame rate up conversion
CN113329226B (en) Data generation method and device, electronic equipment and storage medium
Zhao et al. Improving the accuracy-latency trade-off of edge-cloud computation offloading for deep learning services
Zhang et al. Mfvp: Mobile-friendly viewport prediction for live 360-degree video streaming
CN112399177B (en) Video coding method, device, computer equipment and storage medium
WO2023020492A1 (en) Video frame adjustment method and apparatus, and electronic device and storage medium
CN111277838A (en) Encoding mode selection method, device, electronic equipment and computer readable medium
WO2023050433A1 (en) Video encoding and decoding method, encoder, decoder and storage medium
US11792408B2 (en) Transcoder target bitrate prediction techniques
CN115103191A (en) Image processing method, device, equipment and storage medium
CN115269978A (en) Video tag generation method, device, equipment and medium
CN1939064A (en) Video processing method and corresponding encoding device
CN114648712A (en) Video classification method and device, electronic equipment and computer-readable storage medium
CN111737575A (en) Content distribution method and device, readable medium and electronic equipment
Gao et al. Quality-aware massive content delivery in digital twin-enabled edge networks
WO2021093548A1 (en) Video decoding and encoding methods, apparatus, computer-readable medium and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant