CN110248189B

CN110248189B - Video quality prediction method, device, medium and electronic equipment

Info

Publication number: CN110248189B
Application number: CN201910517758.5A
Authority: CN
Inventors: 廖懿婷; 李军林; 王悦
Original assignee: Beijing ByteDance Network Technology Co Ltd; ByteDance Inc
Current assignee: Beijing ByteDance Network Technology Co Ltd; ByteDance Inc
Priority date: 2019-06-14
Filing date: 2019-06-14
Publication date: 2021-07-27
Anticipated expiration: 2039-06-14
Also published as: CN110248189A; WO2020248889A1

Abstract

The present disclosure provides a video quality prediction method, apparatus, electronic device, and computer-readable storage medium, the method comprising: calculating a video quality metric for video encoded at a predetermined compression level of a predetermined encoding mode; extracting bitstream information from a video encoded at a predetermined compression level of a predetermined encoding mode; and predicting a video quality score for video encoded at one or more compression levels of the one or more encoding modes based on the calculated video quality metric and the extracted bitstream information.

Description

Video quality prediction method, device, medium and electronic equipment

Technical Field

The present disclosure relates to video quality assessment, and more particularly, to a video quality prediction method, apparatus, electronic device, and computer-readable storage medium.

Background

Objective video quality metrics aim to predict how a human viewer would rate a particular video. They are typically used to help video content providers customize the encoding options for each video and ensure a high-quality, efficient compression ladder.

Among objective video quality metrics, full-reference metrics such as Peak Signal-to-Noise Ratio (PSNR), Structural Similarity (SSIM), and Video Multi-method Assessment Fusion (VMAF) are among the most widely used for video quality assessment and for transcoding (decoding followed by re-encoding) optimization. When transcoding optimization is performed, the video is typically encoded multiple times in different encoding modes (e.g., different compression levels of the same encoding mode), and the above metrics are used to calculate the video quality scores of the video encoded in each of those encoding modes. The calculated video quality scores may then be used in video adaptation decisions to help video content providers customize the encoding options for each video.

Disclosure of Invention

The present disclosure provides a video quality prediction method, apparatus, electronic device, and computer-readable storage medium.

According to an aspect of the present disclosure, there is provided a video quality prediction method, the method including: calculating a video quality metric for video encoded at a predetermined compression level of a predetermined encoding mode; extracting bitstream information from a video encoded at a predetermined compression level of a predetermined encoding mode; and predicting a video quality score for video encoded at one or more compression levels of the one or more encoding modes based on the calculated video quality metric and the extracted bitstream information.

According to another aspect of the present disclosure, there is provided a video quality prediction apparatus, the apparatus including: a calculation module for calculating a video quality metric for video encoded at a predetermined compression level of a predetermined encoding mode; an extraction module for extracting bitstream information from a video encoded at a predetermined compression level of a predetermined encoding mode; and a prediction module to predict a video quality score for video encoded at one or more compression levels of the one or more encoding modes based on the calculated video quality metric and the extracted bitstream information.

According to yet another aspect of the present disclosure, there is provided an electronic device comprising a memory and a processor, the memory having stored thereon computer program instructions which, when loaded and executed by the processor, cause the processor to perform the following: calculating a video quality metric for video encoded at a predetermined compression level of a predetermined encoding mode; extracting bitstream information from a video encoded at a predetermined compression level of a predetermined encoding mode; and predicting a video quality score for video encoded at one or more compression levels of the one or more encoding modes based on the calculated video quality metric and the extracted bitstream information.

According to yet another aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when loaded and executed by a processor, cause the processor to perform the following: calculating a video quality metric for video encoded at a predetermined compression level of a predetermined encoding mode; extracting bitstream information from a video encoded at a predetermined compression level of a predetermined encoding mode; and predicting a video quality score for video encoded at one or more compression levels of the one or more encoding modes based on the calculated video quality metric and the extracted bitstream information.

As will be described in detail below, compared to a conventional video transcoding optimization method, i.e., a method of predicting video quality scores of videos encoded in different encoding modes by transcoding videos multiple times and calculating video quality scores of videos transcoded at each time, the video quality prediction method, apparatus, electronic device, and computer-readable storage medium according to the embodiments of the present disclosure may predict video quality scores of videos encoded in different encoding modes by a single transcoding, so that the calculation cost of transcoding optimization may be significantly reduced.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and are intended to provide further explanation of the claimed technology, and are not intended to limit the technical concepts of the present disclosure.

Drawings

The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.

Fig. 1 is a schematic flow diagram of a video quality prediction method according to some embodiments of the present disclosure;

Fig. 2 is another schematic flow diagram of a video quality prediction method according to some embodiments of the present disclosure;

Fig. 3 is a schematic flow diagram of training a machine learning based predictive model according to some embodiments of the present disclosure;

Fig. 4 illustrates a graph of an error between a video quality score G-PSNR of a video predicted by a video quality prediction method according to some embodiments of the present disclosure and a calculated video quality score G-PSNR;

Fig. 5 is a schematic diagram of an electronic device for video quality prediction, according to some embodiments of the present disclosure;

Fig. 6 is another schematic diagram of an electronic device for video quality prediction, in accordance with some embodiments of the present disclosure;

Fig. 7 is a schematic diagram of an electronic device for video quality prediction, in accordance with some embodiments of the present disclosure;

Fig. 8 is a schematic diagram of another electronic device for video quality prediction, in accordance with some embodiments of the present disclosure;

Fig. 9 is a schematic diagram of a computer-readable storage medium for video quality prediction, according to some embodiments of the present disclosure; and

Fig. 10 is a block diagram of one example application scenario to which video quality prediction methods according to some embodiments of the present disclosure are applied.

Detailed Description

As described above, the conventional video transcoding optimization method encodes a video multiple times in different encoding modes (e.g., different compression levels of the same encoding mode), and calculates the video quality score of the video encoded in each of those encoding modes using metrics such as Peak Signal-to-Noise Ratio (PSNR), Structural Similarity (SSIM), and Video Multi-method Assessment Fusion (VMAF). In this process, the video must be transcoded multiple times and the video quality score of the transcoded video must be calculated multiple times, so the process is computationally expensive.

The present disclosure has been made in view of the above problems, and it is an object of the present disclosure to provide a video quality prediction method, apparatus, electronic device, and computer-readable storage medium that predict video quality scores of videos encoded at various encoding levels of various encoding modes based on a video quality metric of a video encoded at one encoding level of one encoding mode and bitstream information extracted therefrom. The video quality prediction method, apparatus, electronic device, and computer-readable storage medium proposed by the present disclosure can thus predict video quality scores of a video at various encoding levels of various encoding modes by encoding the video only once and calculating video quality scores only once.

In order to make the objects, technical solutions and advantages of the present disclosure more apparent, embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.

It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.

The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.

It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.

It is noted that the modifiers "a", "an", and "the" in this disclosure are intended to be illustrative rather than limiting, and those skilled in the art will understand them to mean "one or more" unless the context clearly dictates otherwise.

The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.

Referring first to fig. 1, fig. 1 is a schematic flow diagram of a video quality prediction method according to some embodiments of the present disclosure. As shown in fig. 1, a video quality prediction method according to some embodiments of the present disclosure may start at step S100.

At step S100, the video quality prediction method according to some embodiments of the present disclosure calculates a video quality metric of a video encoded at a predetermined compression level of a predetermined encoding mode. Specifically, in some embodiments, the predetermined encoding mode may be CRF (Constant Rate Factor), and the predetermined compression level may be 26; the video quality metric may include a video quality matrix, examples of which include, but are not limited to, PSNR matrices, SSIM matrices, and VMAF matrices. It should be understood that video quality metrics according to embodiments of the present disclosure may include any video quality metric that may be used to evaluate video quality, e.g., various partial-reference video quality metrics for partial-reference video quality evaluation and various no-reference video quality metrics for no-reference video quality evaluation, in addition to the aforementioned full-reference metrics such as PSNR, SSIM, and VMAF.

At step S102, the video quality prediction method according to some embodiments of the present disclosure may extract bitstream information from the video encoded at the predetermined compression level of the predetermined encoding mode (e.g., CRF 26). The bitstream information includes, but is not limited to, any one or more of a bit rate, a frame rate, an average I-frame size, an average I-frame QP (Quantization Parameter), an average P-frame size, an average P-frame QP, an average B-frame size, an average B-frame QP, an inter-coding mode percentage of P-frames, an inter-skip mode percentage of P-frames, an inter-coding mode percentage of B-frames, and an inter-skip mode percentage of B-frames. Thereafter, the video quality prediction method according to some embodiments of the present disclosure may proceed to step S104.

At step S104, the video quality prediction method according to some embodiments of the present disclosure predicts video quality scores of videos encoded at one or more compression levels of one or more encoding modes based on the video quality metric calculated at step S100 and the bitstream information extracted at step S102. In particular, in some embodiments, the method may use a machine learning based prediction model to predict these video quality scores. More specifically, in some embodiments, the machine learning based prediction model used at step S104 may be a model trained using a gradient boosting algorithm. Alternatively, in other embodiments, the machine learning based prediction model used at step S104 may be a model trained using a genetic algorithm. Also, the one or more compression levels of the one or more encoding modes may be, for example, CRF 30, CRF 31, CRF 40, 2-pass-ABR (Average Bit-Rate), 3-pass-ABR, 1-pass-CBR (Constant Bit-Rate), etc.
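As an illustration of how the inputs to step S104 might be assembled, the following minimal Python sketch aggregates per-frame records into clip-level bitstream information and concatenates it with the quality metric from step S100. The `Frame` record, the function names, and the feature names are hypothetical and are not taken from the disclosure; the inter-coding and inter-skip mode percentages are omitted for brevity.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Frame:
    """Hypothetical per-frame record, e.g. parsed from an encoder log."""
    kind: str        # "I", "P" or "B"
    size_bytes: int  # compressed size of the frame
    qp: float        # quantization parameter used for the frame


def bitstream_features(frames, duration_s):
    """Aggregate per-frame records into clip-level bitstream information (step S102)."""
    feats = {
        "bitrate_kbps": sum(f.size_bytes for f in frames) * 8 / 1000 / duration_s,
        "frame_rate": len(frames) / duration_s,
    }
    for kind in ("I", "P", "B"):
        group = [f for f in frames if f.kind == kind]
        # Use 0.0 when a frame type is absent so the vector length stays fixed.
        feats[f"avg_{kind}_size"] = mean(f.size_bytes for f in group) if group else 0.0
        feats[f"avg_{kind}_qp"] = mean(f.qp for f in group) if group else 0.0
    return feats


def model_input(quality_metric, feats):
    """Concatenate the S100 metric with the S102 features in a fixed key order."""
    return [quality_metric] + [feats[k] for k in sorted(feats)]
```

Sorting the feature keys keeps the vector layout identical between training and prediction, which any fixed-input prediction model would require.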

In this disclosure, a video quality score for a video may be used to represent the quality of the video. The video quality score of the video may be obtained based on at least video quality metrics such as VMAF, PSNR, SSIM, and the like. Specifically, the video quality score obtained based on at least the VMAF may be represented by a VMAF-value, the video quality score obtained based on at least the PSNR may be represented by a G-PSNR, and the video quality score obtained based on at least the SSIM may be represented by a G-SSIM. In addition to obtaining a video quality score for a video based on at least a single video quality metric as previously described, a video quality score for a video may also be obtained based on multiple video quality metrics. For example, in some embodiments, the video quality score of a video may be obtained based on the video quality metrics VMAF, PSNR, and SSIM.

It should be understood that although CRF 26 is used in the foregoing description as an example of a predetermined compression level of a predetermined encoding mode, and CRF 30, CRF 31, CRF 40, 2-pass-ABR, 3-pass-ABR, and 1-pass-CBR are used as examples of one or more compression levels of one or more encoding modes, the present disclosure is not so limited. That is, the video quality prediction method according to the embodiments of the present disclosure may predict the video quality score of a video encoded at any compression level of any encoding mode based on the video quality metric of the video encoded at any compression level of any encoding mode and bitstream information extracted therefrom, wherein the encoding mode may be any rate control mode that may be used for video encoding, whether existing or developed in the future, such as CRF, ABR, or CBR; and the compression level may represent a variable parameter in the rate control mode. For example, for CRF, the compression level may be any one or more of compression levels 0-51. As another example, for ABR, the compression level may be any one or more of the numbers of encoding passes: 1-pass (i.e., 1-pass-ABR), 2-pass (i.e., 2-pass-ABR), 3-pass (i.e., 3-pass-ABR), and so on.

Illustratively, in some embodiments, a video encoded at a predetermined compression level of a predetermined encoding mode (e.g., CRF, ABR, or CBR) may be used to predict the quality score at one or more other compression levels of the same encoding mode. For example, the predetermined compression level of the predetermined encoding mode may be CRF 23, and the one or more compression levels of the one or more encoding modes may be CRF 26, CRF 30, etc.; alternatively, the source may be the original video itself, and the one or more compression levels of the one or more encoding modes may be CRF 26.

Alternatively, in other embodiments, a video encoded at a certain compression level of a predetermined encoding mode (e.g., CRF, ABR, or CBR) may be used to predict the quality score at one or more compression levels of other encoding modes. For example, the predetermined compression level of the predetermined encoding mode may be CRF 30, and the one or more compression levels of the one or more encoding modes may be the two-pass average bit-rate (2-pass-ABR) mode. It should be understood that although in some of the foregoing embodiments the example of one or more compression levels of one or more encoding modes is a single compression level of one encoding mode (e.g., CRF 26 or 2-pass-ABR), the disclosure is not so limited. In other words, given a video encoded at a certain compression level of one encoding mode, the video quality prediction method according to an embodiment of the present disclosure may simultaneously or sequentially predict the quality scores of the video at a plurality of compression levels of a plurality of encoding modes. For example, from the CRF 26-encoded video, the method may simultaneously or sequentially predict the quality scores of the video at CRF 0 to CRF 25, at CRF 27 to CRF 51, and at any compression level of ABR (e.g., 1-pass-ABR or 2-pass-ABR).
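The set of prediction targets in the last example can be enumerated mechanically. The sketch below is only illustrative: the `(mode, level)` tuple representation and the function name `target_configs` are assumptions, and the ABR entry is limited to the 1-pass and 2-pass variants named in the example.

```python
def target_configs(source=("CRF", 26)):
    """List every (mode, level) pair to predict for, excluding the source encoding."""
    crf = [("CRF", level) for level in range(0, 52)]   # CRF 0 through CRF 51
    abr = [("ABR", passes) for passes in (1, 2)]       # 1-pass-ABR, 2-pass-ABR
    return [config for config in crf + abr if config != source]


targets = target_configs()
```

Each entry in `targets` could then be handled by a separate prediction, either one after another or in parallel, since all of them share the same CRF 26 inputs.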

It should also be understood that while a machine learning based prediction model is used in the foregoing description to predict video quality scores of video encoded at one or more compression levels of one or more encoding modes based on the video quality metric calculated at step S100 and the bitstream information extracted at step S102, the present disclosure is not limited thereto. That is, the video quality prediction method according to embodiments of the present disclosure may use any prediction model or prediction algorithm to predict video quality scores of video encoded at one or more compression levels of one or more encoding modes based on video quality metrics and bitstream information. It should also be understood that although in the foregoing description the machine learning based prediction model is trained using a gradient boosting algorithm or a genetic algorithm, the present disclosure is not limited thereto.

In addition, the video quality prediction methods according to some embodiments of the present disclosure described above in connection with fig. 1 may be used to predict other features, such as the bit rate, of video encoded at one or more compression levels of one or more encoding modes, in addition to predicting video quality scores such as VMAF, PSNR, SSIM, and the like.

The video quality prediction method according to the embodiment of the present disclosure described above in connection with fig. 1 predicts video quality scores of a target video encoded at one or more compression levels of one or more encoding modes based on the video quality metric of the target video encoded at a predetermined compression level of a predetermined encoding mode and bitstream information extracted therefrom. Compared with the conventional video transcoding optimization method, in which the target video is encoded at each compression level of each encoding mode and the video quality score is calculated separately for each resulting encoding, the video quality prediction method according to the embodiment of the disclosure can significantly reduce the computational complexity and the amount of computation.

Fig. 2 is another schematic flow diagram of a video quality prediction method according to some embodiments of the present disclosure. Compared to the flow shown in fig. 1, the flow shown in fig. 2 includes a step S200 in addition to step S100, step S102, and step S104. That is, in the case where the target video is not already encoded in the predetermined encoding manner used for prediction (i.e., at the predetermined compression level of the predetermined encoding mode), the video quality prediction method according to an embodiment of the present disclosure may include step S200. At step S200, the method may encode the target video at the predetermined compression level of the predetermined encoding mode. For example, in some embodiments, the predetermined compression level of the predetermined encoding mode is CRF 26, and the target video is the original video or a video encoded with CRF 23. In such a case, the method first encodes the target video with CRF 26. Then, the video quality scores of the target video at one or more compression levels of one or more encoding modes are predicted by the video quality prediction method according to an embodiment of the present disclosure.
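The conditional encoding of step S200 might be sketched as follows. The disclosure does not name an encoder; this example assumes, purely for illustration, the FFmpeg command-line tool with the libx264 encoder, whose `-crf` option accepts the CRF value. The function names and the output file name are hypothetical, and the command is only built, not executed.

```python
def encode_command(src, dst, crf=26):
    """Build an FFmpeg/libx264 command line that re-encodes src at the given CRF."""
    return ["ffmpeg", "-y", "-i", src, "-c:v", "libx264", "-crf", str(crf), dst]


def prepare(src, current_config, predetermined=("CRF", 26)):
    """Step S200: return (path to analyse, command to run or None).

    current_config is the (mode, level) the source is already encoded with,
    or None for an original video.
    """
    if current_config == predetermined:
        return src, None                      # already in the predetermined form
    dst = "reference_crf%d.mp4" % predetermined[1]
    return dst, encode_command(src, dst, predetermined[1])


path, command = prepare("input_crf23.mp4", ("CRF", 23))
```

When `command` is not `None`, it could be passed to `subprocess.run` before steps S100 and S102 are applied to `path`.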

It should be understood that although step S100 (calculating a video quality metric of a video encoded at a predetermined compression level of a predetermined encoding mode) and step S102 (extracting bitstream information from a video encoded at a predetermined compression level of a predetermined encoding mode) are sequentially illustrated in the schematic flow of the video quality prediction method according to the embodiment of the present disclosure described in conjunction with fig. 1 and 2, the present disclosure is not limited thereto. That is, the video quality prediction method according to the embodiment of the present disclosure may perform step S100 and step S102 in any order or in parallel. For example, in some embodiments, step S100 and step S102 may be performed sequentially. Alternatively, in other embodiments, step S102 and step S100 may be performed sequentially. Alternatively, in still other embodiments, step S100 and step S102 may be performed in parallel.
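Because steps S100 and S102 both read the same encoded video and do not depend on each other, running them in parallel is straightforward. The sketch below uses Python's standard `concurrent.futures` module; the two worker functions are placeholders that return dummy values, standing in for a real metric computation and a real bitstream parser.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_quality_metric(path):
    """Placeholder for step S100; returns a dummy value, not a real measurement."""
    return 38.5


def extract_bitstream_info(path):
    """Placeholder for step S102; returns dummy values, not real statistics."""
    return {"bitrate_kbps": 1200.0, "frame_rate": 30.0}


def run_s100_and_s102(path):
    """Run the two independent steps concurrently and wait for both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        metric_future = pool.submit(compute_quality_metric, path)
        info_future = pool.submit(extract_bitstream_info, path)
        return metric_future.result(), info_future.result()
```

If both steps are CPU-bound Python code rather than calls into external tools, a `ProcessPoolExecutor` may be the better choice.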

The video quality prediction method according to the embodiment of the present disclosure described in conjunction with fig. 2 only needs to encode the target video once to predict the video quality score of the target video at any compression level of any other encoding mode. Compared with the conventional video transcoding optimization method, in which the target video is encoded at each compression level of each encoding mode and the video quality score is calculated separately for each resulting encoding, the video quality prediction method according to the embodiment of the disclosure can significantly reduce the computational complexity and the amount of computation. For example, assuming that the video quality scores of a target video in N encoding modes (e.g., CRF 0 to CRF 51) are needed, the conventional method must encode the target video in each of the N encoding modes and calculate the video quality score of each encoded video separately. The video quality prediction method according to the embodiment of the present disclosure only needs to encode the video once (for example, encode the target video using CRF 26) in order to predict the video quality scores of the target video in the N encoding modes. Thus, the video quality prediction method according to embodiments of the present disclosure requires approximately 1/N of the computation required by conventional video transcoding optimization methods.
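The 1/N estimate can be made concrete for the CRF 0 to CRF 51 example, where N = 52. The short sketch below counts only encode-and-measure passes and, as a simplifying assumption, treats the cost of running the prediction model as negligible; the function name is hypothetical.

```python
def relative_cost(n_targets, n_encodes=1):
    """Ratio of encode-and-measure passes: proposed method vs. conventional method."""
    return n_encodes / n_targets


n = len(range(0, 52))          # CRF 0 through CRF 51 gives N = 52
ratio = relative_cost(n)       # 1/52, i.e. roughly 2% of the conventional cost
```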

Also, since the video quality prediction method according to the embodiment of the present disclosure described in conjunction with fig. 2 includes a step of encoding the target video in a predetermined encoding manner for prediction, i.e., a predetermined compression level of a predetermined encoding mode (step S200), the video quality prediction method according to the embodiment of the present disclosure described in conjunction with fig. 2 can predict the video quality scores of any video (e.g., the original video and the video not encoded at the predetermined compression level of the predetermined encoding mode) after being encoded at any encoding level of any encoding mode.

In the foregoing, the present disclosure describes, in conjunction with fig. 1 and 2, an example flow of a video quality prediction method according to an embodiment of the present disclosure. As described above, the video quality prediction method according to an embodiment of the present disclosure may use a machine learning based prediction model to predict video quality scores of videos encoded at one or more compression levels of one or more encoding modes based on video quality metrics of videos encoded at predetermined compression levels of predetermined encoding modes and bitstream information extracted therefrom. Before the machine learning based prediction model is used for such prediction, it may be suitably trained. In the following, the present disclosure will describe an example method of training a machine learning based prediction model according to an embodiment of the present disclosure in connection with fig. 3.

Fig. 3 is a schematic flow diagram of a method of training a machine learning based predictive model according to some embodiments of the present disclosure. As shown in fig. 3, a method of training a machine learning based predictive model according to some embodiments of the present disclosure may begin at step S300.

At step S300, a method of training a machine learning based prediction model according to some embodiments of the present disclosure may calculate a training video quality metric of a video encoded at a predetermined compression level of a predetermined encoding mode. Alternatively or additionally, the method may encode the training video at the predetermined compression level of the predetermined encoding mode before calculating the training video quality metric. As described above, the predetermined compression level of the predetermined encoding mode used during training may be any encoding configuration, such as CRF 26.

At step S302, the method may extract training bitstream information from the video encoded at the predetermined compression level of the predetermined encoding mode. As described above, the training bitstream information includes, but is not limited to, any one or more of a bit rate, a frame rate, an average I-frame size, an average I-frame QP, an average P-frame size, an average P-frame QP, an average B-frame size, an average B-frame QP, an inter-coding mode percentage of P-frames, an inter-skip mode percentage of P-frames, an inter-coding mode percentage of B-frames, and an inter-skip mode percentage of B-frames.

At step S304, a training video quality score is calculated for video encoded at one or more compression levels of one or more encoding modes. Similar to step S300, alternatively or additionally, the method may encode the training video at the one or more compression levels of the one or more encoding modes before calculating the training video quality score. As described above, the one or more compression levels of the one or more encoding modes used during training may be any encoding configuration, such as one or more of CRF 0 through CRF 51. Thereafter, the method may proceed to step S306.

At step S306, the method may train the machine learning based prediction model with the training video quality score as the training target, based on the training video quality metric and the training bitstream information. Specifically, in some embodiments, the method may use a gradient boosting algorithm for this training. Alternatively, in other embodiments, the method may use a genetic algorithm for this training.
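To make the gradient boosting option of step S306 concrete, the following is a minimal, self-contained sketch of gradient boosting for squared error, using one-split regression trees (stumps) as weak learners. It is an illustration under stated assumptions rather than the model of the disclosure: the function names, the hyperparameters, and the toy training rows are all invented for the example, and a production system would more likely rely on an established library.

```python
def fit_stump(X, residuals):
    """Find the single-feature threshold split that minimises squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    return best[1:] if best else None


def train_gradient_boosting(X, y, n_rounds=200, learning_rate=0.1):
    """Fit stumps to the residuals of the running prediction, round by round."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        j, t, lm, rm = stump
        stumps.append(stump)
        pred = [p + learning_rate * (lm if row[j] <= t else rm)
                for p, row in zip(pred, X)]
    return base, learning_rate, stumps


def predict(model, row):
    """Sum the base value and the shrunken contribution of every stump."""
    base, lr, stumps = model
    return base + lr * sum(lm if row[j] <= t else rm for j, t, lm, rm in stumps)


# Toy rows (made-up numbers): [quality metric at CRF 26, bit rate in kbps].
X_train = [[36.0, 800.0], [38.0, 1200.0], [40.0, 2000.0], [42.0, 3000.0]]
# Toy targets (made-up numbers): measured quality score at one target level.
y_train = [33.0, 35.0, 37.0, 39.0]
model = train_gradient_boosting(X_train, y_train)
```

In this arrangement one model is trained per target compression level and per quality score; treating the target level as an additional input feature would be an alternative design.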

It should be understood that, similar to fig. 1 and 2, although step S300 (calculating a training video quality metric of a video encoded at a predetermined compression level of a predetermined encoding mode), step S302 (extracting training bitstream information from a video encoded at a predetermined compression level of a predetermined encoding mode), and step S304 (calculating a training video quality score of a video encoded at one or more compression levels of one or more encoding modes) are sequentially illustrated in the schematic flow of training a machine learning based prediction model according to an embodiment of the present disclosure described in connection with fig. 3, the present disclosure is not limited thereto. That is, the training method according to the embodiment of the present disclosure may perform step S300, step S302, and step S304 in any order or in parallel.

The method of training a machine learning based predictive model according to an embodiment of the present disclosure described above in connection with fig. 3 may use training video to train the machine learning based predictive model. By training, the machine learning based prediction model may well predict video quality scores of videos encoded at one or more compression levels of one or more coding modes based on video quality metrics of videos encoded at predetermined compression levels of predetermined coding modes and bitstream information extracted therefrom.

In the above, a video quality prediction method according to an embodiment of the present disclosure is described in conjunction with fig. 1 to 3. Hereinafter, the present disclosure will describe a prediction result and performance of a video quality prediction method according to an embodiment of the present disclosure in conjunction with table 1 and fig. 4.

In an example of the present disclosure, a set of 11290 720p videos was used to train and test a video quality prediction method according to an embodiment of the present disclosure, where 80% of the videos were used for training and 20% of the videos were used for testing. In this example, the predetermined compression level of the predetermined encoding mode is CRF 26, and the one or more compression levels of the one or more encoding modes are CRF 22 through CRF 34. That is, the video encoded with the encoding scheme CRF 26 is used to predict the video quality scores at CRF 22 through CRF 34. In this example, three machine learning based prediction models are used to predict the video quality scores VMAF, PSNR, and SSIM of the video, respectively. The three machine learning based prediction models are trained using a gradient boosting algorithm. The bitstream information extracted from the encoded video includes bit rate, frame rate, average I frame size, average I frame qp, average P frame size, average P frame qp, average B frame size, average B frame qp, inter-coded mode percentage of P frames, inter-skipped mode percentage of P frames, inter-coded mode percentage of B frames, and inter-skipped mode percentage of B frames. Table 1 shows the average prediction performance.
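The input layout of this example can be sketched as follows. The disclosure lists the twelve bitstream fields and the 80%/20% split but does not say how the target CRF is presented to the model or how the split is drawn, so the field names, the choice to append the target CRF as an extra feature, the random shuffle, and the placeholder values are assumptions of this sketch.

```python
import random
from dataclasses import astuple, dataclass
from typing import Iterable, List, Tuple


@dataclass
class BitstreamInfo:
    """The twelve bitstream fields listed in the example (names are hypothetical)."""
    bit_rate: float
    frame_rate: float
    avg_i_frame_size: float
    avg_i_frame_qp: float
    avg_p_frame_size: float
    avg_p_frame_qp: float
    avg_b_frame_size: float
    avg_b_frame_qp: float
    p_inter_coded_pct: float
    p_inter_skipped_pct: float
    b_inter_coded_pct: float
    b_inter_skipped_pct: float


SOURCE_CRF = 26                    # predetermined compression level in the example
TARGET_CRFS = list(range(22, 35))  # CRF 22 through CRF 34


def build_rows(metric_at_source: float, info: BitstreamInfo) -> List[List[float]]:
    """One feature row per target CRF: [metric, 12 bitstream fields, target CRF]."""
    base = [metric_at_source, *astuple(info)]
    return [base + [float(crf)] for crf in TARGET_CRFS]


def split_videos(
    video_ids: Iterable[int], train_fraction: float = 0.8, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Shuffle the video identifiers and split them into training and test sets."""
    ids = list(video_ids)
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


# Placeholder values for illustration only; they are not measurements.
example_info = BitstreamInfo(1500.0, 30.0, 40000.0, 24.0, 9000.0, 27.0,
                             3000.0, 29.0, 60.0, 30.0, 35.0, 55.0)
example_rows = build_rows(90.0, example_info)
train_ids, test_ids = split_videos(range(11290))
```

Splitting by video rather than by row keeps all target CRFs of one video on the same side of the split, which avoids testing on a video the model has already seen.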

TABLE 1

In table 1, MAE represents the mean absolute error over all predicted points. The MAE entries at different percentages represent the error found at the corresponding percentile position after the errors of all predicted points are sorted in ascending order; for example, MAE_50% represents the median of the errors of all predicted points.
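The statistics of table 1 can be computed along the following lines, reading MAE as the mean absolute error. The nearest-rank percentile convention and the sample numbers are assumptions of this sketch; the disclosure does not state which percentile convention was used.

```python
import math
from typing import List, Sequence


def sorted_absolute_errors(predicted: Sequence[float], measured: Sequence[float]) -> List[float]:
    """Absolute prediction errors, sorted in ascending order."""
    return sorted(abs(p - m) for p, m in zip(predicted, measured))


def mean_absolute_error(errors: Sequence[float]) -> float:
    return sum(errors) / len(errors)


def error_at_percentile(sorted_errors: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the error at position ceil(pct% * n) in the sorted list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_errors)))
    return sorted_errors[rank - 1]


# Made-up scores for illustration only.
predicted_scores = [90.0, 85.0, 80.0, 70.0, 60.0]
measured_scores = [91.0, 85.5, 80.0, 72.0, 60.25]

errors = sorted_absolute_errors(predicted_scores, measured_scores)
mae = mean_absolute_error(errors)            # 0.75
mae_50 = error_at_percentile(errors, 50)     # 0.5, the median error
mae_90 = error_at_percentile(errors, 90)     # 2.0
```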

As can be seen from table 1, in the above example, the mean absolute errors of the predicted video quality scores VMAF-value, G_PSNR, and G_SSIM over all prediction points are 0.5199, 0.1343, and 0.0009, respectively. The average quality differences between two consecutive CRFs (e.g., CRF 24 and CRF 25) for VMAF-value, G_PSNR, and G_SSIM are 1.43, 0.52, and 0.002, respectively. Because each prediction error is less than half of the corresponding quality difference between two consecutive CRFs, the video quality prediction method according to the embodiment of the present disclosure predicts the video quality scores of videos encoded in the respective encoding manners with good accuracy.
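The comparison can be checked with a few lines of arithmetic. The numbers are the ones quoted above; the dictionary names are only for this sketch.

```python
# Mean prediction error per score, as quoted from table 1.
MEAN_ERROR = {"VMAF-value": 0.5199, "G_PSNR": 0.1343, "G_SSIM": 0.0009}
# Average quality difference between two consecutive CRFs, as quoted in the text.
CRF_STEP = {"VMAF-value": 1.43, "G_PSNR": 0.52, "G_SSIM": 0.002}

# Ratio of prediction error to one CRF step; below 1 means the error is
# smaller than the quality change caused by moving to the neighbouring CRF.
ratios = {name: MEAN_ERROR[name] / CRF_STEP[name] for name in MEAN_ERROR}

for name, ratio in ratios.items():
    print(f"{name}: error is {ratio:.0%} of one CRF step")
```

The ratios come out at roughly 36%, 26%, and 45%, so on average the prediction is off by less than half of one CRF step for each of the three scores.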

To further illustrate the performance of the video quality prediction method according to embodiments of the present disclosure, fig. 4 illustrates a graph of the error between the video quality score G_PSNR predicted by the video quality prediction method according to some embodiments of the present disclosure and the calculated video quality score G_PSNR. Fig. 4 shows the average G_PSNR prediction at different CRFs. Referring to fig. 4, it can be seen that the prediction error at each CRF is smaller than the quality difference between two consecutive CRFs, which indicates that the video quality prediction method according to an embodiment of the present disclosure performs well. In addition, as can also be seen with reference to fig. 4, the closer the target CRF (i.e., one of the one or more compression levels of the one or more encoding modes) is to the encoded CRF (i.e., the predetermined compression level of the predetermined encoding mode, in this example, CRF 26), the more accurate the prediction.

In the above, the video quality prediction method according to the embodiment of the present disclosure is described in conjunction with fig. 1 to 3, and the performance of the video quality prediction method according to the embodiment of the present disclosure is described in conjunction with table 1 and fig. 4. As can be seen from the description in conjunction with table 1 and fig. 4, the video quality prediction method according to the embodiment of the present disclosure has good performance.

Hereinafter, the present disclosure will describe an apparatus, an electronic device, and a computer-readable storage medium for video quality prediction according to embodiments of the present disclosure in conjunction with fig. 5 to 9.

Fig. 5 is a schematic diagram of an electronic device 500 for video quality prediction, according to some embodiments of the present disclosure. As shown in fig. 5, an electronic device 500 for video quality prediction according to some embodiments of the present disclosure may include a calculation module 510, an extraction module 520, and a prediction module 530. Wherein the calculation module 510 is configured to calculate a video quality metric of a video encoded at a predetermined compression level of a predetermined encoding mode; an extraction module 520 for extracting bitstream information from a video encoded at a predetermined compression level of a predetermined encoding mode; and a prediction module 530 for predicting a video quality score for video encoded at one or more compression levels of the one or more encoding modes based on the calculated video quality metric and the extracted bitstream information. Alternatively or additionally, the calculation module 510, the extraction module 520, and the prediction module 530 shown in fig. 5 may also perform the video quality prediction method according to the embodiment of the present disclosure described above in conjunction with fig. 1 and 2.
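The division of work among the three modules of fig. 5 can be sketched as below. The class name, the callable signatures, the file name, and the stub functions are hypothetical; in particular, a full reference metric would also need the reference video, which is omitted here, and the stub model is a stand-in for a trained prediction model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class VideoQualityPredictionDevice:
    """Wires together the three modules of fig. 5 (all signatures are assumptions)."""
    calculate_metric: Callable[[str], float]              # calculation module 510
    extract_bitstream_info: Callable[[str], List[float]]  # extraction module 520
    predict_score: Callable[[List[float]], float]         # prediction module 530

    def predict(self, encoded_path: str, target_levels: Sequence[int]) -> Dict[int, float]:
        """Predict a quality score for each target compression level."""
        metric = self.calculate_metric(encoded_path)
        info = self.extract_bitstream_info(encoded_path)
        return {
            level: self.predict_score([metric, *info, float(level)])
            for level in target_levels
        }


# Stub modules with made-up behaviour, for illustration only.
def stub_metric(path: str) -> float:
    return 90.0


def stub_bitstream_info(path: str) -> List[float]:
    return [1500.0, 30.0]  # e.g. bit rate and frame rate


def stub_model(features: List[float]) -> float:
    # Toy rule: lose 1.5 points per CRF step above 26, gain 1.5 per step below.
    return features[0] - 1.5 * (features[-1] - 26.0)


device = VideoQualityPredictionDevice(stub_metric, stub_bitstream_info, stub_model)
scores = device.predict("clip_crf26.mp4", [24, 26, 28])
```

Keeping the modules as injected callables mirrors the note in the text that the modules may be implemented in software or hardware: the wiring does not depend on how each module is realised.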

Fig. 6 is a schematic diagram of an electronic device 600 for video quality prediction, according to some embodiments of the present disclosure. As shown in fig. 6, an electronic device 600 for video quality prediction according to some embodiments of the present disclosure may include a training module 640, in addition to a calculation module 610, an extraction module 620, and a prediction module 630 similar to the calculation module 510, the extraction module 520, and the prediction module 530 included in the electronic device 500 shown in fig. 5. Among them, the operations performed by the calculation module 610, the extraction module 620, and the prediction module 630 in the electronic device 600 shown in fig. 6 are similar to those performed by the calculation module 510, the extraction module 520, and the prediction module 530 shown in fig. 5, and a detailed description thereof is omitted here for the sake of simplicity. The training module 640 in the electronic device 600 shown in fig. 6 is configured to perform the following operations: calculating a training video quality metric for video encoded at a predetermined compression level of a predetermined encoding mode; extracting training bitstream information from video encoded at a predetermined compression level of a predetermined encoding mode; calculating a training video quality score for video encoded at one or more compression levels of one or more encoding modes; and training the machine learning based predictive model with the training video quality score as a training target based on the training video quality metric and the training bitstream information. Alternatively or additionally, the training module 640 may be used to perform the training method according to an embodiment of the present disclosure described above in connection with fig. 3.

Fig. 7 is a schematic diagram of an electronic device 700 for video quality prediction, in accordance with some embodiments of the present disclosure. As shown in fig. 7, an electronic device 700 for video quality prediction according to an embodiment of the present disclosure may include a processor 710 and a memory 720, the memory 720 having stored thereon computer program instructions that, when loaded and executed by the processor 710, cause the processor 710 to perform the video quality prediction method according to an embodiment of the present disclosure described above in connection with fig. 1 to 3.

Fig. 8 is another schematic diagram of an electronic device 800 for video quality prediction, in accordance with some embodiments of the present disclosure. Referring now to FIG. 8, a block diagram of an electronic device 800 suitable for use in implementing embodiments of the present disclosure is shown. The electronic device 800 may be a cloud platform, a server, a terminal device, and the like. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.

As shown in fig. 8, electronic device 800 may include a processing means (e.g., central processing unit, graphics processor, etc.) 810 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)820 or a program loaded from a storage 880 into a Random Access Memory (RAM) 830. In the RAM 830, various programs and data necessary for the operation of the electronic apparatus 800 are also stored. The processing device 810, the ROM 820, and the RAM 830 are connected to each other by a bus 840. An input/output (I/O) interface 850 is also connected to bus 840.

Generally, the following devices may be connected to the I/O interface 850: input devices 860 including, for example, touch screens, touch pads, keyboards, mice, cameras, microphones, accelerometers, gyroscopes, etc.; output devices 870 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, or the like; storage 880 including, for example, magnetic tape, hard disk, etc.; and a communication device 890. The communication device 890 may allow the electronic apparatus 800 to communicate wirelessly or by wire with other apparatuses to exchange data. While fig. 8 illustrates an electronic device 800 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through communication device 890, or installed from storage device 880, or installed from ROM 820. The computer program, when executed by the processing device 810, performs the above-described functions defined in the methods of embodiments of the present disclosure.

Fig. 9 is a schematic diagram of a computer-readable storage medium 900 for video quality prediction, according to some embodiments of the present disclosure. As shown in fig. 9, a computer-readable storage medium 900 for video quality prediction according to an embodiment of the present disclosure has stored thereon computer program instructions 910 which, when loaded and executed by a processor, cause the processor to perform the video quality prediction method according to an embodiment of the present disclosure described above in connection with fig. 1 to 3.

In the above, the video quality prediction method, apparatus, electronic device, and computer-readable storage medium according to the embodiments of the present disclosure are described in conjunction with fig. 1 to 3 and fig. 5 to 9, and the performance of the video quality prediction method according to the embodiments of the present disclosure is described in conjunction with table 1 and fig. 4. As can be seen from the above description, the video quality prediction method, apparatus, electronic device, and computer-readable storage medium according to the embodiments of the present disclosure may significantly reduce computational complexity and computational effort and have good prediction performance, compared to the conventional video transcoding optimization method.

Hereinafter, the present disclosure will describe one example application scenario to which a video quality prediction method according to an embodiment of the present disclosure may be applied, in conjunction with fig. 10.

Fig. 10 is a block diagram of one example application scenario to which video quality prediction methods according to some embodiments of the present disclosure are applied. As illustrated in fig. 10, a scenario to which a video quality prediction method according to an embodiment of the present disclosure is applied may include predicting a video quality score (1000) and video adaptation (1010). Specifically, a video quality prediction method according to an embodiment of the present disclosure may calculate (1002) a video quality metric (1004) of a target video encoded at a predetermined compression level of a predetermined encoding mode and extract bitstream information (1006) therefrom. The calculated video quality metric (1004) and the extracted bitstream information (1006) are then input into a trained prediction model (1008). The prediction model (1008) predicts a video quality score of the target video at one or more compression levels of one or more encoding modes based on the calculated video quality metric (1004) and the extracted bitstream information (1006). The video quality scores of the target video at one or more compression levels of one or more encoding modes predicted by the prediction model (1008) may be used for video adaptation (1010). For example, in some embodiments, the video adaptation module may select an appropriate encoding mode and a corresponding appropriate compression level to encode the target video according to the video quality scores of the target video predicted by the prediction model at one or more compression levels of the one or more encoding modes. In some embodiments, the target video encoded in the selected encoding mode at the selected compression level may later be transmitted to the target client device.

It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
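Returning to the video adaptation step (1010) of the application scenario of fig. 10, one possible selection rule is sketched below. The disclosure only says that an appropriate encoding mode and compression level may be selected from the predicted scores; the specific rule used here (pick the highest CRF, i.e., the strongest compression, whose predicted score still meets a minimum quality), the function name, and the sample scores are assumptions.

```python
from typing import Dict, Optional


def select_crf(predicted_scores: Dict[int, float], min_score: float) -> Optional[int]:
    """Return the highest CRF whose predicted score is at least min_score.

    A higher CRF means stronger compression, so this favours the smallest
    encode that is still predicted to be good enough. Returns None when no
    candidate level reaches the requested quality.
    """
    eligible = [crf for crf, score in predicted_scores.items() if score >= min_score]
    return max(eligible) if eligible else None


# Made-up predicted scores keyed by target CRF, for illustration only.
predicted = {22: 96.0, 26: 90.0, 30: 84.0, 34: 76.0}

chosen = select_crf(predicted, min_score=85.0)       # 26
unreachable = select_crf(predicted, min_score=99.0)  # None
```

Because the scores for all candidate levels come from a single encode plus the prediction model, a rule like this can be evaluated without encoding the target video once per candidate level.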

In some embodiments, the client and the server may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.

The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.

The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: calculate a video quality metric of a video encoded at a predetermined compression level of a predetermined encoding mode; extract bitstream information from the video encoded at the predetermined compression level of the predetermined encoding mode; and predict a video quality score of video encoded at one or more compression levels of one or more encoding modes based on the calculated video quality metric and the extracted bitstream information.

Alternatively, the computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: calculate a training video quality metric of a video encoded at a predetermined compression level of a predetermined encoding mode; extract training bitstream information from the video encoded at the predetermined compression level of the predetermined encoding mode; calculate a training video quality score of video encoded at one or more compression levels of one or more encoding modes; and train a machine learning based prediction model with the training video quality score as a training target based on the training video quality metric and the training bitstream information.

Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a unit does not in some cases constitute a limitation of the unit itself; for example, the calculation module may also be described as a "module for calculating a video quality metric of a video encoded at a predetermined compression level of a predetermined encoding mode".

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

According to one or more embodiments of the present disclosure, there is provided a video quality prediction method and an apparatus/electronic device/computer-readable storage medium implementing the same, the method including calculating a video quality metric of a video encoded at a predetermined compression level of a predetermined encoding mode; extracting bitstream information from the video encoded at a predetermined compression level of the predetermined encoding mode; and predicting a video quality score for video encoded at one or more compression levels of the one or more encoding modes based on the calculated video quality metric and the extracted bitstream information.

According to one or more embodiments of the present disclosure, in the method, predicting video quality scores of video encoded at one or more compression levels of one or more encoding modes based on the calculated video quality metrics and the extracted bitstream information comprises: predicting, by a machine learning based prediction model, a video quality score for video encoded at one or more compression levels of one or more encoding modes based on the calculated video quality metrics and the extracted bitstream information.

In accordance with one or more embodiments of the present disclosure, the method further comprises: calculating a training video quality metric for video encoded at a predetermined compression level of the predetermined encoding mode; extracting training bitstream information from the video encoded at a predetermined compression level of the predetermined encoding mode; calculating a training video quality score for video encoded at one or more compression levels of the one or more encoding modes; and training the machine learning based predictive model with the training video quality score as a training target based on the training video quality metric and the training bitstream information.

According to one or more embodiments of the present disclosure, the one or more coding modes in the method include the predetermined coding mode.

According to one or more embodiments of the present disclosure, the predetermined coding mode in the method comprises at least one of a constant rate factor (CRF) mode, an average bitrate (ABR) mode, and a constant bitrate (CBR) mode.

According to one or more embodiments of the present disclosure, the bitstream information in the method includes: at least one of a bit rate, a frame rate, an average I frame size, an average I frame qp, an average P frame size, an average P frame qp, an average B frame size, an average B frame qp, an inter-coding mode percentage of P frames, an inter-skip mode percentage of P frames, an inter-coding mode percentage of B frames, and an inter-skip mode percentage of B frames.

In accordance with one or more embodiments of the present disclosure, the video quality metric in the method includes a video quality matrix.

According to one or more embodiments of the disclosure, the video quality matrix in the method includes at least one of a peak signal-to-noise ratio (PSNR) matrix, a Structural Similarity Index (SSIM) matrix and a video multi-method evaluation fusion (VMAF) matrix.

According to one or more embodiments of the present disclosure, the method further comprises encoding the video at a predetermined compression level of a predetermined encoding mode before calculating the video quality metric for the video encoded at the predetermined compression level of the predetermined encoding mode.

The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments formed by any combination of the features described above or their equivalents without departing from the spirit of the disclosure. For example, a technical solution may be formed by replacing the above features with features having similar functions that are disclosed in (but not limited to) this disclosure.

Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims

1. A video quality prediction method, the method comprising:

calculating a video quality metric for video encoded at a predetermined compression level of a predetermined encoding mode;

extracting bitstream information from the video encoded at a predetermined compression level of the predetermined encoding mode; and

predicting, by a machine learning based prediction model, a video quality score for video encoded at one or more compression levels of one or more encoding modes based on the calculated video quality metric and the extracted bitstream information.

2. The method of claim 1, further comprising:

calculating a training video quality metric for video encoded at a predetermined compression level of the predetermined encoding mode;

extracting training bitstream information from the video encoded at a predetermined compression level of the predetermined encoding mode;

calculating a training video quality score for video encoded at one or more compression levels of the one or more encoding modes; and

training the machine learning based predictive model with the training video quality score as a training target based on the training video quality metric and the training bitstream information.

3. The method of any of claims 1-2, wherein the one or more coding modes include the predetermined coding mode.

4. The method of claim 3, wherein the predetermined coding mode comprises at least one of a constant rate factor (CRF) mode, an average bitrate (ABR) mode, and a constant bitrate (CBR) mode.

5. The method of any of claims 1-2, wherein the bitstream information comprises: at least one of a bit rate, a frame rate, an average I frame size, an average I frame qp, an average P frame size, an average P frame qp, an average B frame size, an average B frame qp, an inter-coding mode percentage of P frames, an inter-skip mode percentage of P frames, an inter-coding mode percentage of B frames, and an inter-skip mode percentage of B frames.

6. The method of any of claims 1-2, wherein the video quality metric comprises a video quality matrix.

7. The method of claim 6, wherein the video quality matrix comprises at least one of a peak signal-to-noise ratio (PSNR) matrix, a Structural Similarity Index (SSIM) matrix, and a video multi-method evaluation fusion (VMAF) matrix.

8. The method of any of claims 1-2, further comprising: encoding the video at the predetermined compression level of the predetermined encoding mode before calculating the video quality metric for the video encoded at the predetermined compression level of the predetermined encoding mode.

9. A video quality prediction device, the device comprising:

a calculation module for calculating a video quality metric for video encoded at a predetermined compression level of a predetermined encoding mode;

an extraction module for extracting bitstream information from the video encoded at a predetermined compression level of a predetermined encoding mode; and

a prediction module to predict a video quality score of video encoded at one or more compression levels of one or more encoding modes using a machine learning based prediction model based on the calculated video quality metrics and the extracted bitstream information.

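10. The device of claim 9, further comprising:

a training module to perform the following operations:

calculating a training video quality metric for video encoded at a predetermined compression level of the predetermined encoding mode;

extracting training bitstream information from video encoded at a predetermined compression level of the predetermined encoding mode;

calculating a training video quality score for video encoded at one or more compression levels of the one or more encoding modes; and

training the machine learning based predictive model with the training video quality score as a training target based on the training video quality metric and the training bitstream information.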

11. An electronic device, comprising:

a processor; and

a memory for storing computer program instructions;

wherein, when the computer program instructions are loaded and executed by the processor, the processor performs the method of any of claims 1 to 8.

12. A computer readable storage medium having stored thereon computer program instructions which, when loaded and executed by a processor, cause the processor to perform the method of any of claims 1 to 8.