WO2018153161A1 - Video quality evaluation method, apparatus and device, and storage medium - Google Patents

Video quality evaluation method, apparatus and device, and storage medium

Info

Publication number
WO2018153161A1
Authority
WO
WIPO (PCT)
Prior art keywords
image block
pixel
video
gradient
distortion metric
Prior art date
Application number
PCT/CN2017/119261
Other languages
French (fr)
Chinese (zh)
Inventor
刘祥凯
Original Assignee
深圳市中兴微电子技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市中兴微电子技术有限公司
Publication of WO2018153161A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N17/00Diagnosis, testing or measuring for television systems or their details
    • H04N17/02Diagnosis, testing or measuring for television systems or their details for colour television signals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136Incoming video signal characteristics or properties
    • H04N19/137Motion inside a coding unit, e.g. average field, frame or block difference
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/154Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion

Definitions

  • The present invention relates to the field of multimedia information processing, and in particular to a video quality evaluation method, apparatus, device, and storage medium.
  • Video quality evaluation can be divided into two categories, subjective video quality evaluation and objective video quality evaluation.
  • Subjective video quality evaluation refers to organizing testers to view a set of distorted videos according to a prescribed experimental procedure and to subjectively score the quality of each video.
  • The scores for each test video may be averaged to obtain a Mean Opinion Score (MOS), or the difference between the score of each test video and the score of its corresponding original reference video may be used to compute a Difference Mean Opinion Score (DMOS).
  • Subjective video quality evaluation can get the result closest to the human eye's true visual perception quality, but the experiment is time consuming and laborious and cannot be applied to real-time video compression and processing systems.
  • The objective video quality evaluation algorithm can automatically predict the quality of a video and is therefore more practical. In order to evaluate whether an objective video quality evaluation algorithm can accurately predict the true visual perception quality of the human eye, a relatively comprehensive and complete data set is needed for test verification.
  • the contribution of subjective video quality evaluation is to establish a public test video data set and provide corresponding MOS or DMOS data for testing the performance of different objective video quality evaluation algorithms.
  • Objective video quality evaluation algorithms can be roughly divided into three categories according to whether the original reference video is needed during computation: Full Reference (FR), Reduced Reference (RR), and No Reference (NR) video quality evaluation algorithms.
  • The most straightforward way to calculate the distortion of a video image is to compare the original image with the distorted image pixel by pixel, for example with the most basic image distortion metrics, Mean Square Error (MSE) and Peak Signal to Noise Ratio (PSNR). In practice, however, direct pixel-by-pixel comparison does not reflect how the human eye perceives distortion in video images, so algorithms that extract and compare certain pixel statistical features of the original and distorted images have appeared, such as the Structural Similarity Index Measurement (SSIM).
  • In essence, the basic process of a full reference video quality evaluation algorithm is to extract multiple visual statistical features of the original video image and the distorted video image to form feature vectors, and to estimate the degree of distortion of the video image by comparing the distance between the feature vectors.
  • Current full reference video quality evaluation algorithms also include the Video Quality Model (VQM) algorithm and the MOtion-based Video Integrity Evaluation index (MOVIE) algorithm.
  • A disadvantage of existing full reference video quality algorithms is that it is difficult to satisfy both quality prediction accuracy and low computational complexity. Because its calculation is simple, PSNR is widely used in video image processing systems with high real-time requirements, such as video coding; however, experimental results show that the correlation between the video quality score calculated by PSNR and the real subjective score is poor.
  • Advanced video quality evaluation algorithms such as VQM and MOVIE can effectively predict video quality and approximate the subjective perceived quality score of the human eye.
  • the calculation of the VQM and MOVIE algorithms is extremely complicated and can only be applied to offline calculation of video quality. Since the video encoder needs to calculate the distortion of the reconstructed image in real time during the encoding process and select the encoding parameters based on this, the video quality evaluation algorithms such as VQM and MOVIE cannot be applied to the video encoder.
  • The embodiments of the present invention provide a video quality evaluation method, apparatus, device, and storage medium, which reduce the computational complexity while ensuring the accuracy of video quality estimation, and can therefore be applied to video image processing systems with high real-time requirements, such as video coding.
  • An embodiment of the present invention provides a video quality evaluation method, where the method includes:
  • dividing each frame of image in the video on which quality evaluation needs to be performed into image blocks of a preset size;
  • determining a distortion metric value of each image block according to the standard deviation of the space-time domain gradient of the image block and the mean square error of its pixel values;
  • determining a distortion metric value of each frame of image according to the distortion metric values of the image blocks included in that frame; and
  • determining a distortion metric value of the video according to the distortion metric values of the frames.
  • An embodiment of the present invention provides a video quality evaluation apparatus, where the apparatus includes:
  • a dividing module, configured to divide each frame of image in the video that needs quality evaluation into image blocks of a preset size;
  • a first determining module configured to determine a distortion metric value of each image block according to a standard deviation of a space time domain gradient of each image block and a mean square error of the pixel value
  • a second determining module configured to determine a distortion metric value of each frame image according to a distortion metric value of the image block included in each frame image
  • a third determining module configured to determine a distortion metric value of the video according to the distortion metric of each frame image.
  • An embodiment of the present invention provides a video quality evaluation device, including a memory, a processor, and a computer program stored on the memory and operable on the processor, where the processor, when executing the program, implements the video quality evaluation method described above.
  • Embodiments of the present invention provide a computer readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the video quality evaluation method described above.
  • An embodiment of the present invention provides a video quality evaluation method, apparatus, device, and storage medium. The method includes: first, dividing each frame of image in the video that needs quality evaluation into image blocks of a preset size; then determining a distortion metric value of each image block according to the standard deviation of its space-time domain gradient and the mean square error of its pixel values; then determining a distortion metric value of each frame of image according to the distortion metric values of the image blocks it contains; and finally determining a distortion metric value of the video according to the distortion metric values of the frames.
  • FIG. 1 is a schematic flowchart of an implementation process of a video quality evaluation method according to an embodiment of the present invention
  • FIG. 2 is a schematic flowchart of an implementation process of a video quality evaluation method according to Embodiment 2 of the present invention
  • FIG. 3 is a first template for calculating the horizontal gradient of a pixel according to Embodiment 2 of the present invention;
  • FIG. 4 is a second template for calculating the vertical gradient of a pixel according to Embodiment 2 of the present invention;
  • FIG. 5 is a third template for calculating a time domain gradient of a pixel according to Embodiment 2 of the present invention.
  • FIG. 6 is a scatter plot of the correlation between the video distortion scores calculated by the video quality evaluation method provided in Embodiment 3 of the present invention and the subjective experimental DMOS scores of the LIVE data set;
  • FIG. 7 is a scatter plot of the correlation between the video distortion scores calculated by the VQM method and the subjective experimental DMOS scores of the LIVE data set;
  • FIG. 8 is a scatter plot of the correlation between the video distortion scores calculated by the PSNR method and the subjective experimental DMOS scores of the LIVE data set;
  • FIG. 9 is a scatter plot of the correlation between the video distortion scores calculated by the SSIM method and the subjective experimental DMOS scores of the LIVE data set;
  • FIG. 10 is a schematic structural diagram of a video quality evaluation apparatus according to Embodiment 4 of the present invention.
  • FIG. 11 is a schematic diagram of a hardware entity of a terminal according to an embodiment of the present invention.
  • the embodiment of the present invention provides a video quality evaluation method, which is applied to a video quality evaluation device, and the video quality evaluation device includes, but is not limited to, a terminal such as a computer, a tablet computer, or a smart phone.
  • FIG. 1 is a schematic flowchart of an implementation process of a video quality evaluation method according to Embodiment 1 of the present invention. As shown in FIG. 1, the method includes the following steps:
  • Step S101 dividing each image frame in the video that needs to be subjected to quality evaluation into image blocks according to a preset size
  • the size of the image block can be set according to actual needs, and is generally set to an image block with the same number of rows and columns, such as an image block set to 8*8 or 16*16.
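  • As an illustration of this step, the sketch below splits a single luma frame (a 2-D array) into non-overlapping square blocks of a preset size. NumPy, the function name, and the handling of border pixels are illustrative choices, not part of the patent.

```python
import numpy as np

def split_into_blocks(frame, block_size=16):
    """Split a 2-D frame into non-overlapping block_size x block_size blocks.

    Rows and columns that do not fill a whole block are discarded here
    for simplicity; a real implementation could pad the frame instead.
    """
    h, w = frame.shape
    h_crop = (h // block_size) * block_size
    w_crop = (w // block_size) * block_size
    blocks = (frame[:h_crop, :w_crop]
              .reshape(h_crop // block_size, block_size,
                       w_crop // block_size, block_size)
              .swapaxes(1, 2))                 # shape: (rows, cols, bs, bs)
    return blocks

frame = np.random.randint(0, 256, size=(720, 1280), dtype=np.uint8)
print(split_into_blocks(frame, 16).shape)      # (45, 80, 16, 16)
```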
  • Step S102 determining a distortion metric value of each image block according to the standard deviation of the space-time domain gradient of the image block and the mean square error of its pixel values;
  • step S102 further includes:
  • Step S102a determining the space-time domain gradient of each pixel in the image block;
  • the image block includes at least one pixel.
  • In step S102a, the horizontal gradient, vertical gradient, and time-domain gradient of each pixel in the image block are calculated first, and the space-time domain gradient of each pixel is then calculated from these three components according to formula (1-1).
  • Formula (1-1) combines the horizontal gradient, the vertical gradient, and the time-domain gradient of a pixel into its space-time domain gradient.
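  • As an illustration, the sketch below combines the three per-pixel gradient components. The exact expression of formula (1-1) is not reproduced in this text, so the Euclidean magnitude used here is an assumption; NumPy is an illustrative choice.

```python
import numpy as np

def spatio_temporal_gradient(g_h, g_v, g_t):
    """Combine per-pixel horizontal, vertical and time-domain gradients.

    Assumes formula (1-1) is the Euclidean magnitude of the three
    components; the patent defines the quantities involved but the
    exact expression is not reproduced in the extracted text.
    """
    g_h = g_h.astype(np.float64)
    g_v = g_v.astype(np.float64)
    g_t = g_t.astype(np.float64)
    return np.sqrt(g_h ** 2 + g_v ** 2 + g_t ** 2)
```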
  • Step S102b determining the standard deviation of the space-time domain gradient of the image block;
  • Step S102c determining a mean square error of a pixel value of the image block
  • Step S102d Determine a distortion metric value of the image block according to a standard deviation of a space time domain gradient of the image block and a mean square error of the pixel value.
  • the distortion metric value of the image block is determined according to the formula (1-2).
  • In formula (1-2), D is the distortion metric value of the image block, MSE is the mean square error of the pixel values of the image block, and σ is the standard deviation of the space-time domain gradient of the image block.
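  • Formula (1-2) is not reproduced in the extracted text; the description of Embodiment 3 below states that the MSE is divided by the space-time domain gradient standard deviation and a logarithm is taken. The sketch below follows that description; the epsilon guard and the base-10 logarithm are assumptions added only to keep the expression well defined.

```python
import numpy as np

def block_distortion(mse, sigma, eps=1e-6):
    """Distortion metric value of one image block.

    Follows the textual description of formula (1-2): divide the block
    MSE by the standard deviation sigma of its space-time domain
    gradient and take a logarithm.  eps and the log base are
    illustrative assumptions, not taken from the patent.
    """
    return float(np.log10(mse / (sigma + eps) + eps))
```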
  • The human eye is not sensitive to distortion in edge or texture regions of an image, but is sensitive to distortion in flat regions. The human eye is also not sensitive to distortion in fast-moving video images. Therefore, to match these visual characteristics of the human eye, the distortion of an image block can be divided by the standard deviation of its space-time domain gradient values on the basis of the original distortion measure, thereby reflecting the fact that human visual perception of distortion is less sensitive for content with high spatio-temporal complexity. In this way, not only can the accuracy of the video quality estimation be guaranteed, but the computational complexity is also reduced.
  • Step S103 determining a distortion metric value of each frame image according to a distortion metric value of an image block included in each frame image in the video;
  • Here, the average value of the distortion metric values of all the image blocks included in a frame is determined as the distortion metric value of that frame image.
  • Step S104 Determine a distortion metric value of the video according to a distortion metric value of each frame image included in the video.
  • the average value of the distortion metric values of the images of all the frames included in the video is determined as the distortion metric value of the video.
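  • A minimal sketch of the aggregation in steps S103 and S104: the frame-level distortion is the mean of the block distortions of that frame, and the video-level distortion is the mean of the frame distortions. The nested-list input format is an illustrative choice.

```python
import numpy as np

def frame_distortion(block_distortions):
    """Mean distortion of all blocks in one frame (step S103)."""
    return float(np.mean(block_distortions))

def video_distortion(per_frame_block_distortions):
    """Mean of the per-frame distortions (step S104).

    per_frame_block_distortions: list of 1-D arrays, one per frame,
    each holding the distortion metric values of that frame's blocks.
    """
    frame_scores = [frame_distortion(d) for d in per_frame_block_distortions]
    return float(np.mean(frame_scores))
```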
  • In this embodiment, the method includes: first, dividing each frame of image in the video that needs quality evaluation into image blocks of a preset size; then determining the distortion metric value of each image block according to the standard deviation of its space-time domain gradient and the mean square error of its pixel values; further determining the distortion metric value of each frame of image according to the distortion metric values of the image blocks it contains; and finally determining the distortion metric value of the video according to the distortion metric values of the frames.
  • the embodiment of the present invention further provides a video quality evaluation method, which is applied to a video quality evaluation apparatus, and the video quality evaluation apparatus includes, but is not limited to, a terminal such as a computer, a tablet computer, or a smart phone.
  • FIG. 2 is a schematic flowchart of an implementation process of a video quality evaluation method according to Embodiment 2 of the present invention. As shown in FIG. 2, the method includes the following steps:
  • Step S201 acquiring a first video and a second video.
  • The first video is the video that requires quality evaluation, that is, the video in which distortion has occurred; the second video is the original video corresponding to it, that is, the video in which no distortion has occurred.
  • Acquiring the first video includes acquiring the pixel values of all pixels in each frame of image of the first video, and acquiring the second video includes acquiring the pixel values of all pixels in each frame of image of the second video.
  • Step S202 dividing each frame image of the first video and the second video into image blocks according to a preset size
  • When the first video and the second video are divided, the same block size is used for both; for example, both videos are divided into image blocks of size 4*4.
  • Step S203 determining the standard deviation of the space-time domain gradient of each image block in the first video;
  • step S203 further includes:
  • Step S203a determining, according to the preset first template, a horizontal gradient of each pixel in each image block in the first video;
  • the first template is a template for calculating a horizontal gradient of a pixel
  • the first template is as shown in FIG. 3
  • The first column of the first template holds the weights of the column to the left of the pixel to be calculated, the second column holds the weights of the column containing the pixel, and the third column holds the weights of the column to the right of the pixel.
  • f(k, i, j) is the pixel value at position (i, j) in the kth frame of the first video.
  • Step S203b determining, according to the preset second template, a vertical gradient of each pixel in each image block in the first video
  • the second template is a template for calculating a vertical gradient of a pixel
  • the second template is as shown in FIG. 4
  • The first row of the second template holds the weights of the row above the pixel to be calculated, the second row holds the weights of the row containing the pixel, and the third row holds the weights of the row below the pixel.
  • f(k, i, j) is the pixel value at position (i, j) in the kth frame of the first video.
  • Step S203c Calculate a time domain gradient of each pixel in each image block in the first video according to a preset third template.
  • The third template is a template for calculating the time-domain gradient of a pixel, and is shown in FIG. 5. It consists of three 3×3 matrices: the left matrix holds the weights applied to the frame preceding the frame of the pixel to be calculated, the middle matrix holds the weights applied to the frame containing the pixel, and the right matrix holds the weights applied to the frame following it.
  • It should be noted that the first template shown in FIG. 3, the second template shown in FIG. 4, and the third template shown in FIG. 5 are merely exemplary; in practical applications the first, second, and third templates can be set according to actual needs.
  • the size of the first template, the second template, and the third template may be N ⁇ N, where N is an odd number greater than 1, such as 3 ⁇ 3, 5 ⁇ 5, and 7 ⁇ 7.
  • For the first template, the weights of the columns to the left and right of the pixel to be calculated can be set according to actual needs, but the setting should follow two principles: the weights at symmetric positions on the left and right have the same absolute value, with one side positive and the other side negative; and the closer a pixel is to the pixel being calculated, the larger the absolute value of its weight.
  • In FIG. 3, the second column is the column of the pixel to be calculated, so its weights are 0; the weights at symmetric positions in the first and third columns have the same absolute value, with the first column negative and the third column positive; and the weight (6) of the pixel adjacent to the pixel being calculated is larger than the weight (3) of the pixels farther away.
  • The weights of the second template must likewise ensure that the weights of the row containing the pixel to be calculated are 0, that the weights at symmetric positions above and below the pixel have the same absolute value with one side positive and the other negative, and that pixels closer to the pixel being calculated have weights with larger absolute values.
  • In FIG. 4, the second row is the row of the pixel to be calculated, so its weights are 0; the weights at symmetric positions in the first and third rows have the same absolute value, with the first row negative and the third row positive; and the weight (6) of the pixel adjacent to the pixel being calculated is larger than the weight (3) of the pixels farther away.
  • The weights of the third template must ensure that the weights of the frame containing the pixel to be calculated are 0, that the weights at symmetric positions in the preceding and following frames have the same absolute value with one side positive and the other negative, and that pixels closer to the pixel being calculated have weights with larger absolute values.
  • In FIG. 5, the middle 3×3 matrix corresponds to the frame of the pixel to be calculated and is all 0; the left 3×3 matrix holds the weights for the preceding frame and the right 3×3 matrix holds the weights for the following frame; the weights at symmetric positions in the left and right matrices have the same absolute value, with one side negative and the other positive; and the weight (6) of the pixel nearest to the pixel being calculated is larger than the weight (3) of the pixels farther away.
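  • FIGs. 3-5 are not reproduced in this text, but the weight rules above are consistent with Sobel-like 3×3 templates. The sketch below is an illustrative reconstruction under that assumption, together with how the three per-pixel gradients could be computed; the concrete weight matrices and the SciPy-based filtering are assumptions, not the patent's figures.

```python
import numpy as np
from scipy.ndimage import correlate

# Illustrative 3x3 weight matrices reconstructed from the rules above
# (zero weights on the pixel's own column/row/frame, symmetric weights of
# opposite sign, 6 next to the pixel being calculated and 3 farther away).
# The actual FIGs. 3-5 may differ.
H_TEMPLATE = np.array([[-3, 0, 3],
                       [-6, 0, 6],
                       [-3, 0, 3]], dtype=np.float64)   # first template (horizontal)
V_TEMPLATE = H_TEMPLATE.T                               # second template (vertical)
T_PREV = np.full((3, 3), -3.0)                          # weights on the previous frame
T_PREV[1, 1] = -6.0
T_NEXT = -T_PREV                                        # weights on the following frame

def pixel_gradients(prev_frame, cur_frame, next_frame):
    """Per-pixel horizontal, vertical and time-domain gradients of cur_frame."""
    cur = cur_frame.astype(np.float64)
    g_h = correlate(cur, H_TEMPLATE, mode='nearest')
    g_v = correlate(cur, V_TEMPLATE, mode='nearest')
    g_t = (correlate(prev_frame.astype(np.float64), T_PREV, mode='nearest') +
           correlate(next_frame.astype(np.float64), T_NEXT, mode='nearest'))
    return g_h, g_v, g_t
```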
  • Step S203d determining the space-time domain gradient of each pixel of each image block in the first video according to formula (1-1);
  • Step S203e Determine a standard deviation of the spatial time domain gradient of each image block according to a spatial time domain gradient of each pixel of each image block.
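  • A sketch of step S203e: given the per-pixel space-time domain gradient map of a frame, compute the standard deviation within each block. The block size and the cropping of incomplete border blocks are illustrative choices.

```python
import numpy as np

def block_gradient_std(gradient_map, block_size=16):
    """Standard deviation of the space-time domain gradient in each block."""
    h, w = gradient_map.shape
    h_crop = (h // block_size) * block_size
    w_crop = (w // block_size) * block_size
    blocks = (gradient_map[:h_crop, :w_crop]
              .reshape(h_crop // block_size, block_size,
                       w_crop // block_size, block_size)
              .swapaxes(1, 2))
    return blocks.std(axis=(2, 3))   # one standard deviation per block
```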
  • Step S204 determining a mean square error of a pixel value of each image block in the first video
  • Here, the mean square error MSE of the pixel values of an image block in the kth frame of image in the first video is calculated according to formula (2-4):
  • In formula (2-4), f(k, i, j) is the pixel value at position (i, j) in the kth frame of the first video, and g(k, i, j) is the pixel value at position (i, j) in the kth frame of the second video.
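  • A minimal sketch of the per-block MSE of formula (2-4), comparing co-located blocks of the distorted (first) and reference (second) videos; the function and argument names are illustrative.

```python
import numpy as np

def block_mse(distorted_block, reference_block):
    """Mean square error between co-located blocks f(k, i, j) and g(k, i, j)."""
    diff = (distorted_block.astype(np.float64) -
            reference_block.astype(np.float64))
    return float(np.mean(diff ** 2))
```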
  • Step S205 determining a distortion metric value of each image block in the first video according to a standard deviation of a space time domain gradient and a mean square error of a pixel value of each image block in the first video;
  • the distortion metric value of each image block in the first video is determined according to formula (1-2).
  • Step S206 determining a distortion metric value of each frame image according to a distortion metric value of an image block included in each frame image in the first video;
  • Step S207 determining a distortion metric value of the video according to a distortion metric value of each frame image in the first video.
  • In this embodiment, the method includes: first acquiring a first video and a second video and dividing each frame of image of both videos into image blocks of a preset size; then determining the distortion metric value of each image block in the first video according to the standard deviation of its space-time domain gradient and the mean square error of its pixel values; then determining the distortion metric value of each frame of image according to the distortion metric values of the image blocks it contains; and finally determining the distortion metric value of the video according to the distortion metric values of the frames of the first video. In this way, the accuracy of the video quality estimation is guaranteed while the computational complexity is reduced, so the method can be applied to video image processing systems with high real-time requirements, such as video coding.
  • This embodiment of the present invention provides a video quality evaluation method to overcome the problem that existing video quality evaluation methods cannot simultaneously achieve accurate video quality prediction and low computational complexity.
  • the method includes the following steps:
  • In the first step, the horizontal gradient, vertical gradient, and time-domain gradient of each pixel of the video image are calculated, and the space-time domain gradient of the pixel is calculated on that basis.
  • In the second step, the standard deviation of the pixel space-time domain gradients within each image block is computed. The standard deviation σ may be computed in units of 8×8 or 16×16 blocks, and σ serves as a measure of the spatio-temporal complexity of the image block content.
  • In the third step, the mean square error (MSE) of each image block is computed, the MSE is divided by the space-time domain gradient standard deviation of the image block, and the logarithm is taken as the final distortion metric value of the image block.
  • That is, the traditional mean square error is used as the objective distortion criterion of the video, and, based on formula (1-2), the final distortion metric D of the video is adjusted according to the space-time domain content complexity σ of the video.
  • the video quality evaluation method provided in the embodiment of the present invention is compared with the PSNR algorithm, the SSIM algorithm, and the VQM algorithm in the prior art.
  • The PSNR algorithm compares each frame of the original video and the distorted video pixel by pixel. It is an algorithm based on independent pixel differences that ignores the influence of sequence content and viewing conditions on the visibility of distortion, so it tends to be poorly consistent with subjectively perceived video quality.
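  • For reference, PSNR as described above reduces to a per-frame MSE followed by a logarithm. A minimal sketch, assuming 8-bit pixel values with a peak of 255:

```python
import numpy as np

def psnr(reference, distorted, max_value=255.0):
    """Peak Signal to Noise Ratio between two frames (independent pixel differences)."""
    mse = np.mean((reference.astype(np.float64) -
                   distorted.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')   # identical frames
    return 10.0 * np.log10(max_value ** 2 / mse)
```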
  • The SSIM algorithm extracts and compares certain pixel statistical features of the original video image and the distorted video image.
  • The VQM algorithm decomposes the original video and the distorted video into different channels (such as edge, luminance, chrominance, and frame difference) through different filters (such as edge detection), and then extracts pixel-level features and spatio-temporal image-block-level statistical features.
  • The pixel-level features include, for each pixel, the gradient magnitude, the gradient direction, the color difference, the contrast, and the frame difference; the block-level statistical features are computed within 8*8 image blocks (for example, the mean and standard deviation of the pixel-level features), thereby aggregating pixel-level features into spatio-temporal block-level features.
  • The distortion of the video sequence is then obtained through spatio-temporal integration and weighted fusion of the distortion of each feature.
  • the evaluation criteria used include Pearson Linear Correlation Coefficient (PLCC) and Spearman Rank Order Correlation Coefficient (SROCC).
  • PLCC mainly evaluates the prediction accuracy of the algorithm, that is, evaluates the linear fit between the predicted distortion and the real MOS.
  • SROCC mainly evaluates the monotonicity of the algorithm's predictions, that is, whether the ordering of the predicted distortion values is consistent with the ordering of the real MOS values.
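  • Both criteria can be computed with standard statistics routines; the sketch below uses SciPy's Pearson and Spearman correlation functions on hypothetical arrays of predicted distortion values and DMOS scores (the numbers are made up for illustration).

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

predicted = np.array([12.3, 30.1, 25.4, 8.7, 40.2])   # hypothetical predicted distortions
dmos = np.array([35.0, 62.5, 55.1, 30.2, 70.4])       # hypothetical subjective DMOS scores

plcc, _ = pearsonr(predicted, dmos)    # prediction accuracy (linear fit)
srocc, _ = spearmanr(predicted, dmos)  # prediction monotonicity (rank order)
print(f"PLCC={plcc:.3f}, SROCC={srocc:.3f}")
```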
  • The video subjective quality evaluation data set used in the experiment is the LIVE data set of the Laboratory for Image and Video Engineering (LIVE) of the University of Texas, which includes four types of video distortion: wireless network transmission distortion, wired network transmission distortion, H.264 compression distortion, and MPEG-2 compression distortion.
  • Experimental comparison methods include the classic objective video quality assessment methods PSNR and SSIM, and the video quality model VQM established by the National Telecommunications and Information Administration (NTIA).
  • Table 1 lists the PLCC correlation coefficients between the subjective scoring data and the results obtained by the video quality evaluation method provided by the embodiment of the present invention and by the PSNR, SSIM, and VQM algorithms.
  • The correlation between the video distortion predicted by the video quality evaluation method of the present invention and the actual subjective DMOS scores of the LIVE data set is shown in FIG. 6 (the abscissa is the video distortion value predicted by the method of the present invention, and the ordinate is the actual DMOS value of the video).
  • the correlation between the video quality score predicted by the VQM, PSNR, and SSIM comparison methods and the actual subjective DMOS score of the LIVE data set is shown in Figures 7-9, respectively.
  • The values calculated by the method provided by the embodiment of the present invention and by the VQM method are both distortion estimates of the video (a larger value indicates greater video distortion), so the abscissa and ordinate data in FIG. 6 and FIG. 7 are positively correlated.
  • The PSNR and SSIM methods calculate quality estimates of the video (a smaller value indicates greater distortion), so the abscissa and ordinate data in FIG. 8 and FIG. 9 are negatively correlated.
  • The simulation results show that, compared with the prior art, the video quality evaluation method provided by the embodiment of the present invention achieves video quality predictions that are more consistent with the subjective perceived quality of the human eye, while requiring only low computational complexity.
  • FIG. 10 is a schematic structural diagram of a video quality evaluation apparatus according to Embodiment 4 of the present invention.
  • The apparatus 1000 includes: a dividing module 1001, a first determining module 1002, a second determining module 1003, and a third determining module 1004, wherein:
  • the dividing module 1001 is configured to divide each frame image in the video that needs to be quality-evaluated into image blocks according to a preset size.
  • the first determining module 1002 is configured to determine a distortion metric value of each image block according to a standard deviation of a spatial time domain gradient of each image block and a mean square error of the pixel value.
  • the first determining module 1002 further includes:
  • a first determining unit configured to determine a spatial time domain gradient of each pixel in the image block; wherein the image block includes at least one pixel;
  • The first determining unit further includes: a first determining subunit, configured to determine the horizontal gradient, vertical gradient, and time-domain gradient of each pixel in the image block; and a second determining subunit, configured to determine the space-time domain gradient of each pixel in the image block according to its horizontal gradient, vertical gradient, and time-domain gradient.
  • The second determining subunit further includes a determining sub-subunit, configured to determine the space-time domain gradient of each pixel in the image block according to formula (1-1), which combines the horizontal gradient, the vertical gradient, and the time-domain gradient of the pixel.
  • a second determining unit configured to determine the standard deviation of the space-time domain gradient of the image block;
  • a third determining unit configured to determine a mean square error of a pixel value of the image block
  • a fourth determining unit configured to determine a distortion metric value of the image block according to a standard deviation of a spatial time domain gradient of the image block and a mean square error of the pixel value.
  • the fourth determining unit further includes:
  • a third determining subunit configured to determine the distortion metric value of each image block according to formula (1-2), where D is the distortion metric value of the image block, MSE is the mean square error of the pixel values of the image block, and σ is the standard deviation of the space-time domain gradient of the image block.
  • the second determining module 1003 is configured to determine a distortion metric value of each frame image according to a distortion metric value of the image block included in each frame image.
  • The second determining module includes: a fifth determining unit configured to determine the average value of the distortion metric values of all image blocks included in each frame as the distortion metric value of that frame image;
  • the third determining module 1004 is configured to determine a distortion metric value of the video according to the distortion metric value of each frame image.
  • the third determining module includes: a sixth determining unit configured to determine an average value of distortion metric values of images of all frames included in the video as a distortion metric value of the video.
  • If the video quality evaluation method described above is implemented in the form of a software function module and sold or used as a standalone product, it may also be stored in a computer readable storage medium.
  • Based on this understanding, the technical solution of the embodiments of the present invention may, in essence, be embodied in the form of a software product stored in a storage medium and including a plurality of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the various embodiments of the present invention.
  • the foregoing storage medium includes various media that can store program codes, such as a USB flash drive, a mobile hard disk, a read only memory (ROM), a magnetic disk, or an optical disk.
  • An embodiment of the present invention provides a video quality evaluation device, such as a terminal, including a memory, a processor, and a computer program stored on the memory and operable on the processor, where the processor, when executing the program, implements the video quality evaluation method described above.
  • Embodiments of the present invention provide a computer readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the video quality evaluation method described above.
  • FIG. 11 is a schematic diagram of a hardware entity of a terminal according to an embodiment of the present invention.
  • the hardware entity of the terminal 1100 includes: a processor 1101, a communication interface 1102, and a memory 1103, where:
  • the processor 1101 typically controls the overall operation of the terminal 1100.
  • Communication interface 1102 can cause terminal 1100 to communicate with other terminals or servers over a network.
  • The memory 1103 is configured to store instructions and applications executable by the processor 1101, and may also cache data to be processed or already processed by the processor 1101 and by each module in the terminal 1100 (e.g., image data, audio data, voice communication data, and video communication data); it can be implemented by flash memory (FLASH) or random access memory (RAM).
  • A computer program (also referred to as a program, software, software application, script, or code) can be written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages) and can be deployed in any form (including as a stand-alone program, or as a module, component, subroutine, object, or other unit suitable for use in a computing environment).
  • a computer program can, but does not necessarily, correspond to a file in a file system.
  • The program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
  • the computer program can be deployed to be executed on one or more computers located at one site or distributed across multiple sites and interconnected by a communication network.
  • the processes and logic flows described in the specification can be performed by one or more programmable processors executing one or more computer programs to perform actions by operating input data and generating output.
  • the above described processes and logic flows can also be performed by dedicated logic circuitry, and the apparatus can also be implemented as dedicated logic circuitry, such as an FPGA or ASIC.
  • processors suitable for the execution of a computer program include, for example, a general purpose microprocessor and a special purpose microprocessor, and any one or more processors of any type of digital computer.
  • a processor will receive instructions and data from a read only memory or a random access memory or both.
  • The essential elements of a computer are a processor for performing actions in accordance with instructions and one or more memory devices for storing instructions and data.
  • A computer also includes one or more mass storage devices (e.g., magnetic disks, magneto-optical disks, or optical disks) for storing data, or is operatively coupled to receive data from or transfer data to such devices, or both; however, a computer need not have such devices.
  • the computer can be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio player or mobile video player, a game console, a global positioning system (GPS) receiver, or a mobile storage device.
  • Devices suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and storage devices, including, for example, semiconductor storage devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disks or removable hard disks), magneto-optical disks, and CD-ROM and DVD-ROM discs.
  • the processor and memory can be supplemented by or included in dedicated logic circuitry.
  • Embodiments of the subject matter described in the specification can be implemented in a computing system.
  • The computing system can include a backend component (e.g., a data server), or a middleware component (e.g., an application server), or a front end component (e.g., a client computer with a graphical user interface or a web browser through which the user can interact with an embodiment of the subject matter described herein), or any combination of one or more of the above backend components, middleware components, or front end components.
  • The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs) and wide area networks (WANs), inter-networks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks).
  • In some implementations, the features described in this application can be implemented on a smart television module (or connected television module, hybrid television module, etc.).
  • the smart TV module can include processing circuitry configured to integrate more traditional television program sources (eg, program sources received via cable, satellite, air, or other signals) with Internet connectivity.
  • the smart TV module can be physically integrated into a television set or can include stand-alone devices such as set top boxes, Blu-ray or other digital media players, game consoles, hotel television systems, and other ancillary equipment.
  • the smart TV module can be configured to enable viewers to search for and find videos, movies, pictures or other content on the network, on local cable channels, on satellite television channels, or on local hard drives.
  • a set top box (STB) or set top box unit (STU) may include an information-applicable device that includes a tuner and is coupled to the television set and an external source to tune the signal to be displayed on a television screen or other playback device.
  • the smart TV module can be configured to provide a home screen or a top screen including icons for a variety of different applications (eg, web browsers and multiple streaming services, connecting cable or satellite media sources, other network "channels", etc.).
  • the smart TV module can also be configured to provide electronic programming to the user.
  • the companion application of the smart TV module can be run on the mobile terminal to provide the user with additional information related to the available programs, thereby enabling the user to control the smart TV module and the like.
  • this feature can be implemented on a portable computer or other personal computer (PC), smart phone, other mobile phone, handheld computer, tablet PC, or other computing device.
  • In the technical solution provided by the embodiments of the present invention, the first video and the second video are first acquired, and each frame of image of the first video and the second video is divided into image blocks of a preset size; the distortion metric value of each image block in the first video is then determined according to the standard deviation of its space-time domain gradient and the mean square error of its pixel values; the distortion metric value of each frame of image is further determined according to the distortion metric values of the image blocks it contains; and finally the distortion metric value of the video is determined according to the distortion metric values of the frames of the first video. In this way, not only is the accuracy of the video quality estimation guaranteed, but the computational complexity is also reduced, so the method can be applied to video image processing systems with high real-time requirements, such as video coding.

Abstract

Disclosed are a video quality evaluation method, apparatus and device, and a storage medium. The method comprises: dividing each frame of image in a video, on which a quality evaluation needs to be performed, into image blocks of a pre-set size; determining a distortion metric value of each of the image blocks according to a standard deviation of space-time domain gradients of each of the image blocks and a mean squared error of pixel values; determining a distortion metric value of each frame of image according to the distortion metric value of the image block included in each frame of image; and determining a distortion metric value of the video according to the distortion metric value of each frame of image.

Description

Video quality evaluation method, apparatus and device, and storage medium
Cross-reference to related applications
This application is based on and claims priority to Chinese Patent Application No. 201710102413.4, filed on February 24, 2017, the entire contents of which are incorporated herein by reference.
Technical field
The present invention relates to the field of multimedia information processing, and in particular to a video quality evaluation method, apparatus, device, and storage medium.
Background art
Video quality evaluation can be divided into two categories: subjective video quality evaluation and objective video quality evaluation. Subjective video quality evaluation refers to organizing testers to view a set of distorted videos according to a prescribed experimental procedure and to subjectively score the quality of each video. The scores for each test video may be averaged to obtain a Mean Opinion Score (MOS), or the difference between the score of each test video and the score of its corresponding original reference video may be used to compute a Difference Mean Opinion Score (DMOS). Subjective video quality evaluation yields results closest to the true visual perception quality of the human eye, but the experiments are time-consuming and laborious and cannot be applied to real-time video compression and processing systems.
The objective video quality evaluation algorithm can automatically predict the quality of a video and is therefore more practical. In order to evaluate whether an objective video quality evaluation algorithm can accurately predict the true visual perception quality of the human eye, a relatively comprehensive and complete data set is needed for test verification. The contribution of subjective video quality evaluation is to establish public test video data sets and provide the corresponding MOS or DMOS data for testing the performance of different objective video quality evaluation algorithms.
Objective video quality evaluation algorithms can be roughly divided into three categories according to whether the original reference video is needed during computation: Full Reference (FR), Reduced Reference (RR), and No Reference (NR) video quality evaluation algorithms. The most straightforward way to calculate the distortion of a video image is to compare the original image with the distorted image pixel by pixel, for example with the most basic image distortion metrics, Mean Square Error (MSE) and Peak Signal to Noise Ratio (PSNR). In practice, however, direct pixel-by-pixel comparison does not reflect how the human eye perceives distortion in video images, so algorithms that extract and compare certain pixel statistical features of the original and distorted images have appeared, such as the Structural Similarity Index Measurement (SSIM). In essence, the basic process of a full reference video quality evaluation algorithm is to extract multiple visual statistical features of the original and distorted video images to form feature vectors, and to estimate the degree of distortion of the video image by comparing the distance between the feature vectors.
Current full reference video quality evaluation algorithms also include the Video Quality Model (VQM) algorithm and the MOtion-based Video Integrity Evaluation index (MOVIE) algorithm.
A disadvantage of existing full reference video quality algorithms is that it is difficult to satisfy both quality prediction accuracy and low computational complexity. Because its calculation is simple, PSNR is widely used in video image processing systems with high real-time requirements, such as video coding; however, experimental results show that the correlation between the video quality score calculated by PSNR and the real subjective score is poor. Advanced video quality evaluation algorithms such as VQM and MOVIE can effectively predict video quality and approach the subjective perceived quality score of the human eye, but their calculations are extremely complex and can only be applied to offline calculation of video quality. Since a video encoder needs to calculate the distortion of the reconstructed image in real time during encoding and select encoding parameters accordingly, video quality evaluation algorithms such as VQM and MOVIE cannot be applied to video encoders.
Summary of the invention
In order to solve the existing technical problems, embodiments of the present invention provide a video quality evaluation method, apparatus, device, and storage medium, which reduce the computational complexity while ensuring the accuracy of video quality estimation, and can therefore be applied to video image processing systems with high real-time requirements, such as video coding.
The technical solution of the embodiments of the present invention is implemented as follows:
An embodiment of the present invention provides a video quality evaluation method, where the method includes:
dividing each frame of image in the video on which quality evaluation needs to be performed into image blocks of a preset size;
determining a distortion metric value of each image block according to the standard deviation of the space-time domain gradient of the image block and the mean square error of its pixel values;
determining a distortion metric value of each frame of image according to the distortion metric values of the image blocks included in that frame; and
determining a distortion metric value of the video according to the distortion metric values of the frames.
An embodiment of the present invention provides a video quality evaluation apparatus, where the apparatus includes:
a dividing module, configured to divide each frame of image in the video that needs quality evaluation into image blocks of a preset size;
a first determining module, configured to determine a distortion metric value of each image block according to the standard deviation of the space-time domain gradient of the image block and the mean square error of its pixel values;
a second determining module, configured to determine a distortion metric value of each frame of image according to the distortion metric values of the image blocks included in that frame; and
a third determining module, configured to determine a distortion metric value of the video according to the distortion metric values of the frames.
An embodiment of the present invention provides a video quality evaluation device, including a memory, a processor, and a computer program stored on the memory and operable on the processor, where the processor, when executing the program, implements the video quality evaluation method described above.
An embodiment of the present invention provides a computer readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the video quality evaluation method described above.
An embodiment of the present invention provides a video quality evaluation method, apparatus, device, and storage medium. The method includes: first, dividing each frame of image in the video that needs quality evaluation into image blocks of a preset size; then determining a distortion metric value of each image block according to the standard deviation of its space-time domain gradient and the mean square error of its pixel values; then determining a distortion metric value of each frame of image according to the distortion metric values of the image blocks it contains; and finally determining a distortion metric value of the video according to the distortion metric values of the frames. In this way, not only is the accuracy of the video quality estimation guaranteed, but the computational complexity is also reduced, so the method can be applied to video image processing systems with high real-time requirements, such as video coding.
Description of the drawings
In the drawings, which are not necessarily drawn to scale, like reference numerals may describe similar components in different views. Like reference numerals with different letter suffixes may represent different examples of similar components. The drawings generally illustrate, by way of example and not limitation, the various embodiments discussed herein.
FIG. 1 is a schematic flowchart of an implementation process of a video quality evaluation method according to Embodiment 1 of the present invention;
FIG. 2 is a schematic flowchart of an implementation process of a video quality evaluation method according to Embodiment 2 of the present invention;
FIG. 3 is a first template for calculating the horizontal gradient of a pixel according to Embodiment 2 of the present invention;
FIG. 4 is a second template for calculating the vertical gradient of a pixel according to Embodiment 2 of the present invention;
FIG. 5 is a third template for calculating the time-domain gradient of a pixel according to Embodiment 2 of the present invention;
FIG. 6 is a scatter plot of the correlation between the video distortion scores calculated by the video quality evaluation method provided in Embodiment 3 of the present invention and the subjective experimental DMOS scores of the LIVE data set;
FIG. 7 is a scatter plot of the correlation between the video distortion scores calculated by the VQM method and the subjective experimental DMOS scores of the LIVE data set;
FIG. 8 is a scatter plot of the correlation between the video distortion scores calculated by the PSNR method and the subjective experimental DMOS scores of the LIVE data set;
FIG. 9 is a scatter plot of the correlation between the video distortion scores calculated by the SSIM method and the subjective experimental DMOS scores of the LIVE data set;
FIG. 10 is a schematic structural diagram of a video quality evaluation apparatus according to Embodiment 4 of the present invention;
FIG. 11 is a schematic diagram of a hardware entity of a terminal according to an embodiment of the present invention.
具体实施方式detailed description
为使本发明实施例的目的、技术方案和优点更加清楚,下面将结合本发明实施例中的附图,对发明的技术方案做进一步详细描述。以下实施例用于说明本发明,但不用来限制本发明的范围。In order to make the objects, technical solutions and advantages of the embodiments of the present invention more clear, the technical solutions of the invention will be further described in detail below with reference to the accompanying drawings in the embodiments. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
实施例一 Embodiment 1
本发明实施例提供一种视频质量评价方法,应用于视频质量评价装置,该视频质量评价装置在实际应用中包括但不限于是计算机、平板电脑、智能手机等终端。图1为本发明实施例一视频质量评价方法的实现流程示意图,如图1所示,所述方法包括以下步骤:The embodiment of the present invention provides a video quality evaluation method, which is applied to a video quality evaluation device, and the video quality evaluation device includes, but is not limited to, a terminal such as a computer, a tablet computer, or a smart phone. 1 is a schematic flowchart of an implementation process of a video quality evaluation method according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
步骤S101,将需要进行质量评价的视频中每一帧图像按照预设大小划分图像块;Step S101, dividing each image frame in the video that needs to be subjected to quality evaluation into image blocks according to a preset size;
这里,所述图像块的大小可以根据实际需求进行设定,一般设置为行数和列数相同的图像块,比如设置为8*8或者16*16的图像块。Here, the size of the image block can be set according to actual needs, and is generally set to an image block with the same number of rows and columns, such as an image block set to 8*8 or 16*16.
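As an informal illustration of this step, the following Python sketch (the function name is ours, not part of the embodiment) splits one greyscale frame, given as a 2-D numpy array, into non-overlapping blocks of a preset size; rows and columns that do not fill a complete block are assumed to be discarded.

    import numpy as np

    def split_into_blocks(frame: np.ndarray, bs: int = 8):
        """Return the non-overlapping bs x bs blocks of a greyscale frame."""
        h, w = frame.shape
        return [frame[r:r + bs, c:c + bs]
                for r in range(0, h - bs + 1, bs)
                for c in range(0, w - bs + 1, bs)]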
步骤S102,根据每一图像块的空时域梯度的标准差和像素的均方误差确定所述每一图像块的失真度量值;Step S102, determining a distortion metric value of each image block according to a standard deviation of a space time domain gradient of each image block and a mean square error of the pixel;
这里,所述步骤S102进一步包括:Here, the step S102 further includes:
步骤S102a,确定所述图像块中每一像素点的空时域梯度;Step S102a, determining an empty time domain gradient of each pixel in the image block;
这里,所述图像块中至少包括一个像素点。Here, the image block includes at least one pixel.
In step S102a, the horizontal gradient, the vertical gradient, and the temporal gradient of each pixel in the image block are first calculated, and the spatiotemporal gradient of each pixel is then calculated from its horizontal, vertical, and temporal gradients according to formula (1-1).
In formula (1-1), G_st(k,i,j) is the spatiotemporal gradient of the pixel, G_h(k,i,j) is the horizontal gradient of the pixel, G_v(k,i,j) is the vertical gradient of the pixel, and G_t(k,i,j) is the temporal gradient of the pixel (formula (1-1) itself appears only as an image in the original publication).
步骤S102b,确定所述图像块的空时域梯度的标准差;Step S102b, determining a standard deviation of an empty time domain gradient of the image block;
步骤S102c,确定所述图像块的像素值的均方误差;Step S102c, determining a mean square error of a pixel value of the image block;
步骤S102d,根据所述图像块的空时域梯度的标准差和像素值的均方误差,确定所述图像块的失真度量值。Step S102d: Determine a distortion metric value of the image block according to a standard deviation of a space time domain gradient of the image block and a mean square error of the pixel value.
Here, the distortion metric value of the image block is determined according to formula (1-2):

D = log(MSE / σ)    (1-2)

In formula (1-2), D is the distortion metric value of the image block, MSE is the mean square error of the pixel values of the image block, and σ is the standard deviation of the spatiotemporal gradients of the image block.
According to research on human visual perception, the human eye is insensitive to distortion in edge or texture regions of an image and more sensitive to distortion in flat regions; it is likewise insensitive to distortion in fast-moving video content. To reflect this visual characteristic, the distortion of an image block can be divided by the standard deviation of the spatiotemporal gradient values of that block on top of the original video distortion measure, so that the insensitivity of the human eye to distortion in video with complex spatiotemporal content is taken into account. In this way, the accuracy of the video quality estimate is preserved while the computational complexity is reduced.
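For illustration only, the block distortion of formula (1-2) can be sketched in Python as follows; the natural logarithm and the small eps guards against zero values are assumptions of this sketch, not part of the embodiment.

    import numpy as np

    def block_distortion(mse: float, sigma: float, eps: float = 1e-6) -> float:
        # Formula (1-2): divide the block MSE by the standard deviation of the
        # block's spatiotemporal gradients and take the logarithm. The natural
        # logarithm and the eps guards are assumptions of this sketch.
        return float(np.log(mse / (sigma + eps) + eps))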
步骤S103,根据所述视频中每一帧图像所包含的图像块的失真度量值确定所述每一帧图像的失真度量值;Step S103, determining a distortion metric value of each frame image according to a distortion metric value of an image block included in each frame image in the video;
Here, the average of the distortion metric values of all image blocks contained in each frame is determined as the distortion metric value of that frame image.
步骤S104,根据所述视频所包含的每一帧图像的失真度量值确定所述 视频的失真度量值。Step S104: Determine a distortion metric value of the video according to a distortion metric value of each frame image included in the video.
这里,将所述视频中所包含的所有帧的图像的失真度量值的平均值确定为所述视频的失真度量值。Here, the average value of the distortion metric values of the images of all the frames included in the video is determined as the distortion metric value of the video.
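The aggregation of steps S103 and S104 reduces to simple averaging; a minimal Python sketch follows (the function names are ours).

    import numpy as np

    def frame_distortion(block_scores) -> float:
        # Step S103: the frame-level distortion is the average of the
        # distortion metric values of the image blocks the frame contains.
        return float(np.mean(block_scores))

    def video_score(frame_scores) -> float:
        # Step S104: the video-level distortion is the average of the
        # per-frame distortion metric values.
        return float(np.mean(frame_scores))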
In the embodiment of the present invention, the method includes: first dividing each frame image of the video to be quality-evaluated into image blocks of a preset size; then determining a distortion metric value for each image block according to the standard deviation of the spatiotemporal gradients of the block and the mean square error of its pixel values; then determining a distortion metric value for each frame image according to the distortion metric values of the image blocks it contains; and finally determining the distortion metric value of the video according to the distortion metric values of its frame images. In this way, the accuracy of the video quality estimate is preserved while the computational complexity is reduced, so the method can be applied in video image processing systems with strict real-time requirements, such as video encoders.
实施例二 Embodiment 2
Based on the foregoing embodiment, an embodiment of the present invention further provides a video quality evaluation method, applied to a video quality evaluation apparatus; in practical applications the video quality evaluation apparatus includes, but is not limited to, a terminal such as a computer, a tablet computer, or a smart phone. FIG. 2 is a schematic flowchart of the implementation of the video quality evaluation method according to Embodiment 2 of the present invention. As shown in FIG. 2, the method includes the following steps:
步骤S201,获取第一视频和第二视频。Step S201, acquiring a first video and a second video.
Here, the first video is the video whose quality needs to be evaluated, that is, the video in which distortion has occurred. The second video is the original video corresponding to it, that is, the video in which no distortion has occurred.
Here, acquiring the first video includes acquiring the pixel values of all pixels in each frame image of the first video. Correspondingly, acquiring the second video includes acquiring the pixel values of all pixels in each frame image of the second video.
步骤S202,将所述第一视频和所述第二视频中每一帧图像按照预设大小划分图像块;Step S202, dividing each frame image of the first video and the second video into image blocks according to a preset size;
这里,对所述第一视频和所述第二视频进行划分时,是按照同样的大 小进行划分的。Here, when the first video and the second video are divided, they are divided according to the same size.
比如在本实施例中,将所述第一视频和第二视频按照4*4的大小划分图像块。For example, in this embodiment, the first video and the second video are divided into image blocks according to a size of 4*4.
步骤S203,确定所述第一视频中每一图像块的空时域梯度的标准差;Step S203, determining a standard deviation of an empty time domain gradient of each image block in the first video;
这里,所述步骤S203进一步包括:Here, the step S203 further includes:
步骤S203a,根据预设的第一模板,确定所述第一视频中每一图像块中每一像素点的水平梯度;Step S203a, determining, according to the preset first template, a horizontal gradient of each pixel in each image block in the first video;
Here, the first template is a template for calculating the horizontal gradient of a pixel, as shown in FIG. 3. The first column of the first template holds the weights for the column to the left of the pixel to be calculated, the second column holds the weights for the column containing that pixel, and the third column holds the weights for the column to its right.
The horizontal gradient G_h(k,i,j) of the pixel at position (i,j) in the k-th frame image of the first video, where (i,j) denotes the pixel in row i and column j of the image, is calculated according to formula (2-1) by applying the first template of FIG. 3:

G_h(k,i,j) = 3·f(k,i-1,j+1) + 6·f(k,i,j+1) + 3·f(k,i+1,j+1) - 3·f(k,i-1,j-1) - 6·f(k,i,j-1) - 3·f(k,i+1,j-1)    (2-1)

In formula (2-1), f(k,i,j) is the pixel value at position (i,j) in the k-th frame of the first video.
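Using the weights described for FIG. 3 (magnitude 6 for the row-adjacent neighbours, 3 for the diagonal ones), the horizontal gradient can be sketched in Python as a 3×3 correlation over the whole frame; the border handling chosen here is an assumption of this sketch.

    import numpy as np
    from scipy.ndimage import correlate

    # First template (FIG. 3) as described: zero centre column, negative
    # weights on the left, positive on the right, magnitude 6 for the
    # row-adjacent neighbours and 3 for the diagonal ones.
    KERNEL_H = np.array([[-3, 0, 3],
                         [-6, 0, 6],
                         [-3, 0, 3]], dtype=np.float64)

    def horizontal_gradient(frame: np.ndarray) -> np.ndarray:
        # correlate() applies the template as written (no kernel flip);
        # the 'nearest' border handling is an assumption of this sketch.
        return correlate(frame.astype(np.float64), KERNEL_H, mode='nearest')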
步骤S203b,根据预设的第二模板,确定所述第一视频中每一图像块中每一像素点的垂直梯度;Step S203b, determining, according to the preset second template, a vertical gradient of each pixel in each image block in the first video;
这里,所述第二模板为用于计算像素垂直梯度的模板,所述第二模板如图4所示,所述第二模板的第一行是需要计算的像素点上面一行的权值,第二行是需要计算的像素点所在行的权值,第三行也即需要计算的像素点下面一行的权值。Here, the second template is a template for calculating a vertical gradient of a pixel, the second template is as shown in FIG. 4, and the first row of the second template is a weight of a row above the pixel to be calculated, The second line is the weight of the row of pixels to be calculated, and the third row is the weight of the row below the pixel to be calculated.
The vertical gradient G_v(k,i,j) of the pixel at position (i,j) in the k-th frame of the first video is calculated according to formula (2-2) by applying the second template of FIG. 4:

G_v(k,i,j) = 3·f(k,i+1,j-1) + 6·f(k,i+1,j) + 3·f(k,i+1,j+1) - 3·f(k,i-1,j-1) - 6·f(k,i-1,j) - 3·f(k,i-1,j+1)    (2-2)

In formula (2-2), f(k,i,j) is the pixel value at position (i,j) in the k-th frame of the first video.
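The vertical gradient of formula (2-2) can be sketched analogously with the second template of FIG. 4; again the border handling is an assumption of this sketch.

    import numpy as np
    from scipy.ndimage import correlate

    # Second template (FIG. 4) as described: zero centre row, negative
    # weights above the pixel, positive weights below, magnitude 6 for the
    # column-adjacent neighbours and 3 for the diagonal ones.
    KERNEL_V = np.array([[-3, -6, -3],
                         [ 0,  0,  0],
                         [ 3,  6,  3]], dtype=np.float64)

    def vertical_gradient(frame: np.ndarray) -> np.ndarray:
        return correlate(frame.astype(np.float64), KERNEL_V, mode='nearest')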
步骤S203c,根据预设的第三模板,计算所述第一视频中每一图像块中每一像素点的时域梯度;Step S203c: Calculate a time domain gradient of each pixel in each image block in the first video according to a preset third template.
Here, the third template is a template for calculating the temporal gradient of a pixel, as shown in FIG. 5. The third template consists of three 3×3 matrices: the left matrix holds the weights applied to the frame preceding the frame of the pixel to be calculated, the middle matrix holds the weights applied to the frame containing that pixel, and the right matrix holds the weights applied to the following frame.
The temporal gradient G_t(k,i,j) of the pixel at position (i,j) in the k-th frame image of the first video is calculated according to formula (2-3) by applying the third template of FIG. 5, that is, as a weighted difference between the co-located 3×3 neighbourhoods of the pixel in frame k+1 and in frame k-1, the frame k itself carrying zero weight (formula (2-3) appears only as an image in the original publication).
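Because the exact distribution of the weights inside the 3×3 matrices of FIG. 5 is not spelled out in the text, the following Python sketch adopts one plausible reading: magnitude 6 at the co-located pixel, 3 at its eight neighbours, with the previous frame weighted negatively and the next frame positively. These weights are assumptions of this sketch, not the published template.

    import numpy as np
    from scipy.ndimage import correlate

    W_NEIGHBOURHOOD = np.array([[3, 3, 3],
                                [3, 6, 3],
                                [3, 3, 3]], dtype=np.float64)

    def temporal_gradient(prev_frame, cur_frame, next_frame):
        nxt = correlate(next_frame.astype(np.float64), W_NEIGHBOURHOOD, mode='nearest')
        prv = correlate(prev_frame.astype(np.float64), W_NEIGHBOURHOOD, mode='nearest')
        # cur_frame contributes nothing: the middle matrix of FIG. 5 is all zeros.
        return nxt - prv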
It should be noted that the first template shown in FIG. 3, the second template shown in FIG. 4, and the third template shown in FIG. 5 are merely illustrative; in practical applications the first, second, and third templates can be set according to actual needs. For example, the size of the first, second, and third templates may be N×N, where N is an odd number greater than 1, such as 3×3, 5×5, or 7×7.
Taking the first template as an example, apart from the weights of the column containing the pixel to be calculated, which are 0, the weights of the columns to the left and right of that pixel can be set according to actual needs. The setting must, however, follow the principle that the weights at symmetric positions on the left and right have the same absolute value, with one side positive and the other negative, and the principle that the closer a pixel is to the pixel being calculated, the larger the absolute value of its weight.
For example, taking FIG. 3 as an example, the second column corresponds to the column containing the pixel to be calculated, so the weights of the second column are 0; the weights at symmetric positions in the first and third columns have the same absolute value, with the first column negative and the third column positive; and the weight (6) of the pixels close to the pixel being calculated is larger than the weight (3) of the pixels farther from it.
Similarly, the weights of the second template must be set so that the weights of the row containing the pixel to be calculated are 0, the weights at symmetric positions above and below that pixel have the same absolute value with one side positive and the other negative, and the closer a pixel is to the pixel being calculated, the larger the absolute value of its weight.
For example, taking FIG. 4 as an example, the second row corresponds to the row containing the pixel to be calculated, so the weights of the second row are 0; the weights at symmetric positions in the first and third rows have the same absolute value, with the first row negative and the third row positive; and the weight (6) of the pixels close to the pixel being calculated is larger than the weight (3) of the pixels farther from it.
The weights of the third template must be set so that the weights of the frame containing the pixel to be calculated are 0, the weights at symmetric positions in the preceding and following frames have the same absolute value with one side positive and the other negative, and the closer a pixel is to the pixel being calculated, the larger the absolute value of its weight.
For example, taking FIG. 5 as an example, the middle 3×3 matrix holds the weights of the frame containing the pixel to be calculated and is all zeros; the left 3×3 matrix holds the weights of the preceding frame and the right 3×3 matrix holds the weights of the following frame; the weights at symmetric positions in the left and right matrices have the same absolute value, with one side negative and the other positive; and the weight (6) of the pixels close to the pixel being calculated is larger than the weight (3) of the pixels farther from it.
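As an illustration of these design rules, the following sketch builds an N×N horizontal template with a zero centre column, antisymmetric left/right weights, and magnitudes that decay with distance from the target pixel. The particular 1/(1 + distance) decay is ours and is only one of many admissible choices.

    import numpy as np

    def horizontal_template(n: int) -> np.ndarray:
        # n must be an odd number greater than 1. The centre column is zero,
        # left/right columns are antisymmetric, and weights shrink with the
        # distance from the target pixel. The decay law is illustrative only.
        assert n % 2 == 1 and n > 1
        c = n // 2
        t = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                if j != c:
                    dist = abs(i - c) + abs(j - c)
                    t[i, j] = np.sign(j - c) / (1.0 + dist)
        return t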
步骤S203d,按照公式(1-1)确定所述第一视频中每一图像块的每一像素点的空时域梯度;Step S203d, determining an empty time domain gradient of each pixel of each image block in the first video according to formula (1-1);
步骤S203e,根据所述每一图像块的每一像素点的空时域梯度确定所述每一图像块的空时域梯度的标准差。Step S203e: Determine a standard deviation of the spatial time domain gradient of each image block according to a spatial time domain gradient of each pixel of each image block.
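Steps S203d and S203e can be sketched as follows; the root-sum-of-squares combination used for formula (1-1) is an assumption, since the exact combination appears only as an image in the original publication, and trailing rows or columns that do not fill a block are assumed to be dropped.

    import numpy as np

    def st_gradient(gh: np.ndarray, gv: np.ndarray, gt: np.ndarray) -> np.ndarray:
        # Formula (1-1) combines the three directional gradients; the
        # Euclidean combination used here is an assumption of this sketch.
        return np.sqrt(gh ** 2 + gv ** 2 + gt ** 2)

    def block_gradient_std(grad: np.ndarray, bs: int = 4) -> np.ndarray:
        # Standard deviation of the spatiotemporal gradient inside each
        # bs x bs block of the frame.
        h, w = grad.shape
        g = grad[:h - h % bs, :w - w % bs].reshape(h // bs, bs, w // bs, bs)
        return g.std(axis=(1, 3))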
步骤S204,确定所述第一视频中每一图像块的像素值的均方误差;Step S204, determining a mean square error of a pixel value of each image block in the first video;
Here, for example, the mean square error MSE_1 of the pixel values of the first image block in the k-th frame image of the first video is calculated according to formula (2-4):

MSE_1 = (1/|B_1|) · Σ_{(i,j)∈B_1} [f(k,i,j) - g(k,i,j)]^2    (2-4)

where B_1 is the set of pixel positions of the first image block, |B_1| is the number of pixels it contains, f(k,i,j) is the pixel value at position (i,j) in the k-th frame of the first video, and g(k,i,j) is the pixel value at position (i,j) in the k-th frame of the second video.
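A direct Python sketch of the block MSE of formula (2-4); the helper name is ours.

    import numpy as np

    def block_mse(f_block: np.ndarray, g_block: np.ndarray) -> float:
        # Formula (2-4): mean of the squared differences between the pixels
        # of the distorted block f and the co-located reference block g.
        diff = f_block.astype(np.float64) - g_block.astype(np.float64)
        return float(np.mean(diff ** 2))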
步骤S205,根据所述第一视频中每一图像块的空时域梯度的标准差和像素值的均方误差确定所述第一视频中每一图像块的失真度量值;Step S205, determining a distortion metric value of each image block in the first video according to a standard deviation of a space time domain gradient and a mean square error of a pixel value of each image block in the first video;
这里,按照公式(1-2)确定所述第一视频中每一图像块的失真度量值。Here, the distortion metric value of each image block in the first video is determined according to formula (1-2).
步骤S206,根据所述第一视频中每一帧图像所包含的图像块的失真度量值确定所述每一帧图像的失真度量值;Step S206, determining a distortion metric value of each frame image according to a distortion metric value of an image block included in each frame image in the first video;
步骤S207,根据所述第一视频中每一帧图像的失真度量值确定所述视频的失真度量值。Step S207, determining a distortion metric value of the video according to a distortion metric value of each frame image in the first video.
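Putting the steps of this embodiment together, a self-contained and deliberately unoptimised Python sketch of steps S201 to S207 might look as follows. The 4×4 block size follows this embodiment, while the template weights, the gradient combination, the logarithm, and the eps guards carry over the assumptions noted in the earlier sketches.

    import numpy as np
    from scipy.ndimage import correlate

    KH = np.array([[-3, 0, 3], [-6, 0, 6], [-3, 0, 3]], dtype=np.float64)
    KV = KH.T
    WT = np.array([[3, 3, 3], [3, 6, 3], [3, 3, 3]], dtype=np.float64)

    def video_distortion_score(dist: np.ndarray, ref: np.ndarray,
                               bs: int = 4, eps: float = 1e-6) -> float:
        """dist, ref: aligned greyscale videos of shape (frames, height, width)."""
        n, h, w = dist.shape
        frame_scores = []
        for k in range(1, n - 1):        # a previous and a next frame are needed
            f = dist[k].astype(np.float64)
            gh = correlate(f, KH, mode='nearest')
            gv = correlate(f, KV, mode='nearest')
            gt = (correlate(dist[k + 1].astype(np.float64), WT, mode='nearest')
                  - correlate(dist[k - 1].astype(np.float64), WT, mode='nearest'))
            grad = np.sqrt(gh ** 2 + gv ** 2 + gt ** 2)   # spatiotemporal gradient
            err2 = (f - ref[k].astype(np.float64)) ** 2
            block_scores = []
            for r in range(0, h - bs + 1, bs):
                for c in range(0, w - bs + 1, bs):
                    mse = err2[r:r + bs, c:c + bs].mean()
                    sigma = grad[r:r + bs, c:c + bs].std()
                    block_scores.append(np.log(mse / (sigma + eps) + eps))
            frame_scores.append(np.mean(block_scores))
        return float(np.mean(frame_scores))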
需要说明的是,本实施例中与其它实施例中相同步骤或概念的解释可以参考其它实施例中的描述,此处不再赘述。It should be noted that the description of the same steps or concepts in the other embodiments may refer to the description in other embodiments, and details are not described herein again.
In the embodiment of the present invention, the method includes: first acquiring a first video and a second video and dividing each frame image of the first video and of the second video into image blocks of a preset size; then determining a distortion metric value for each image block in the first video according to the standard deviation of the spatiotemporal gradients of the block and the mean square error of its pixel values; then determining a distortion metric value for each frame image in the first video according to the distortion metric values of the image blocks it contains; and finally determining the distortion metric value of the video according to the distortion metric values of the frame images of the first video. In this way, the accuracy of the video quality estimate is preserved while the computational complexity is reduced, so the method can be applied in video image processing systems with strict real-time requirements, such as video encoders.
实施例三 Embodiment 3
本发明实施例先提供一种视频质量评估方法,以克服现有视频质量评价方法中存在的无法同时实现视频质量预测准确性和保持较低计算复杂度的问题。所述方法包括以下步骤:The embodiment of the present invention first provides a video quality evaluation method to overcome the problem that the existing video quality evaluation method cannot simultaneously achieve video quality prediction accuracy and maintain low computational complexity. The method includes the following steps:
In the first step, for each pixel of the video image, the horizontal gradient, the vertical gradient, and the temporal gradient are calculated, and the spatiotemporal gradient of the pixel is calculated on this basis.
Here, the horizontal gradient G_h of a pixel is calculated using the template shown in FIG. 3, the vertical gradient G_v using the template shown in FIG. 4, and the temporal gradient G_t using the template shown in FIG. 5.
需要注意的是,图3-图5中的梯度计算模板只是一种可选方案,实际中可以使用多种梯度计算模板作为替代方案。此后,按照公式(1-1)计算该像素的空时域梯度。It should be noted that the gradient calculation template in Figures 3 - 5 is only an alternative, and various gradient calculation templates can be used as an alternative in practice. Thereafter, the spatial time domain gradient of the pixel is calculated according to the formula (1-1).
In the second step, for each image block, the standard deviation of the spatiotemporal gradients of the pixels within the block is computed.
Here, in an implementation, the standard deviation σ of the pixel spatiotemporal gradient values within each image block can be computed in units of 8×8 or 16×16 blocks, and σ is taken as the basis for characterizing the spatiotemporal complexity of the content of the image block.
第三步,统计每个图像块的均方误差(Mean Square Error,MSE),将MSE除以该图像块的空时域梯度标准差并取对数,作为该图像块的最终失真度量值。In the third step, the mean square error (MSE) of each image block is counted, and the MSE is divided by the space-time gradient standard deviation of the image block and the logarithm is taken as the final distortion metric of the image block.
这里,采用传统的均方误差作为视频的客观失真计算准则,在此基础 之上按照公式(1-2)根据视频空时域内容复杂度σ对视频的最终失真判定D进行调整。Here, the traditional mean square error is used as the objective distortion calculation criterion of the video, and based on the formula (1-2), the final distortion determination D of the video is adjusted according to the video space-time domain content complexity σ.
下面对本发明实施例中提供的视频质量评价方法与现有技术中的PSNR算法、SSIM算法以及VQM算法进行对比。The video quality evaluation method provided in the embodiment of the present invention is compared with the PSNR algorithm, the SSIM algorithm, and the VQM algorithm in the prior art.
The PSNR algorithm compares the original video and the distorted video frame by frame, pixel by pixel. It is based on independent pixel differences and ignores the influence of sequence content and viewing conditions on the visibility of distortion, so its consistency with subjectively perceived video quality is often poor.
SSIM算法是一种提取原始视频和失真视频的图像的某一种像素统计特征并进行计算的算法。The SSIM algorithm is an algorithm for extracting and calculating a certain pixel statistical feature of an image of an original video and a distorted video.
The VQM algorithm decomposes the original video and the distorted video into different channels (such as edge, luminance, chrominance, and frame difference) through different filters (such as edge detection), and then extracts pixel-level features and statistical features of spatiotemporal image blocks. The pixel-level features include, for each pixel, the gradient magnitude, the gradient direction, the colour difference, the contrast, and the frame difference. The statistical features of the spatiotemporal image blocks are computed within 8*8 image blocks (the mean and standard deviation of the pixel-level features), so that the pixel-level features are aggregated into block-level features. Finally, the distortion of the video sequence is obtained by spatiotemporal integration and weighted fusion of the distortions of the individual features.
The prediction accuracy of an objective video quality evaluation algorithm is tested by measuring the correlation and the error between the video distortion predicted by the algorithm and the actual MOS values of the videos. The evaluation criteria used include the Pearson Linear Correlation Coefficient (PLCC) and the Spearman Rank Order Correlation Coefficient (SROCC). PLCC mainly evaluates the prediction accuracy of the algorithm, i.e. the linear fit between the predicted distortion and the real MOS; SROCC mainly evaluates the monotonicity of the prediction, i.e. whether the ordering of the predicted distortions is consistent with the ordering of the real MOS values.
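For reference, both coefficients are available in scipy; the score values below are purely hypothetical placeholders, not results from the LIVE experiments.

    import numpy as np
    from scipy.stats import pearsonr, spearmanr

    # Hypothetical example values: predicted distortion scores for a handful
    # of test videos and the corresponding subjective DMOS scores.
    predicted = np.array([0.8, 1.4, 2.1, 2.9, 3.6])
    dmos = np.array([38.0, 45.0, 52.0, 61.0, 68.0])

    plcc, _ = pearsonr(predicted, dmos)    # prediction accuracy (linear fit)
    srocc, _ = spearmanr(predicted, dmos)  # prediction monotonicity (rank order)
    print(f"PLCC = {plcc:.3f}, SROCC = {srocc:.3f}")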
采用本发明所提出的视频质量评价算法的仿真实验结果如下:The simulation results of the video quality evaluation algorithm proposed by the present invention are as follows:
The video subjective quality evaluation dataset used in the experiments is the LIVE dataset of the Laboratory for Image and Video Engineering (LIVE) at the University of Texas, which contains four types of video distortion: wireless network transmission distortion, wired network transmission distortion, H.264 compression distortion, and MPEG-2 compression distortion. The comparison methods in the experiments include the classic objective video quality evaluation methods PSNR and SSIM, as well as the video quality model VQM established by the National Telecommunications and Information Administration (NTIA).
本发明实施例提供的视频质量评价方法以及现有的各种主流视频质量评价算法与主观打分数据的PLCC相关系数如表1所示:The video quality evaluation method provided by the embodiment of the present invention and the PLCC correlation coefficient of the existing mainstream video quality evaluation algorithms and subjective scoring data are as shown in Table 1:
Table 1: PLCC correlation coefficients between the subjective scoring data and the results obtained by the video quality evaluation method provided by the embodiment of the present invention and by the PSNR, SSIM, and VQM algorithms.
(Table 1 is reproduced only as an image in the original publication.)
The SROCC correlation coefficients between the subjective scoring data and the video quality evaluation method provided by the embodiment of the present invention as well as the existing mainstream video quality evaluation algorithms are shown in Table 2:
Table 2: SROCC correlation coefficients between the subjective scoring data and the results obtained by the video quality evaluation method provided by the embodiment of the present invention and by the PSNR, SSIM, and VQM algorithms.
(Table 2 is reproduced only as an image in the original publication.)
The correlation between the video distortion predicted by the video quality evaluation algorithm of the present invention and the actual subjective DMOS scores of the LIVE dataset is shown in FIG. 6 (the abscissa is the video distortion predicted by the method of the present invention, and the ordinate is the actual subjective DMOS score of the video). The correlations between the video quality scores predicted by the three comparison methods VQM, PSNR, and SSIM and the actual subjective DMOS scores of the LIVE dataset are shown in FIG. 7 to FIG. 9, respectively.
It should be noted that, since the values calculated by the method provided by the embodiment of the present invention and by the VQM method are both distortion estimates of the video (a larger value indicates greater distortion), the abscissa and ordinate data in FIG. 6 and FIG. 7 are positively correlated. The values calculated by the PSNR and SSIM methods are quality estimates of the video (a smaller value indicates greater distortion), so the abscissa and ordinate data in FIG. 8 and FIG. 9 are negatively correlated.
The simulation results show that, compared with the prior art, the video quality evaluation method provided by the embodiment of the present invention obtains video quality predictions that are more consistent with the subjective perceptual quality of the human eye, while requiring only low computational complexity.
实施例四 Embodiment 4
本发明实施例提供一种视频质量评价装置,图10为本发明实施例四视频质量评价装置的组成结构示意图,如图10所示,所述装置1000包括:划分模块1001、第一确定模块1002、第二确定模块1003、第三确定模块1004,其中:The embodiment of the present invention provides a video quality evaluation apparatus. FIG. 10 is a schematic structural diagram of a video quality evaluation apparatus according to Embodiment 4 of the present invention. As shown in FIG. 10, the apparatus 1000 includes: a partitioning module 1001, and a first determining module 1002. a second determining module 1003, a third determining module 1004, wherein:
所述划分模块1001,配置为将需要进行质量评价的视频中每一帧图像按照预设大小划分图像块。The dividing module 1001 is configured to divide each frame image in the video that needs to be quality-evaluated into image blocks according to a preset size.
所述第一确定模块1002,配置为根据每一图像块的空时域梯度的标准 差和像素值的均方误差确定所述每一图像块的失真度量值。The first determining module 1002 is configured to determine a distortion metric value of each image block according to a standard deviation of a spatial time domain gradient of each image block and a mean square error of the pixel value.
这里,所述第一确定模块1002进一步包括:Here, the first determining module 1002 further includes:
第一确定单元,配置为确定所述图像块中每一像素点的空时域梯度;其中,所述图像块中至少包括一个像素点;a first determining unit, configured to determine a spatial time domain gradient of each pixel in the image block; wherein the image block includes at least one pixel;
Here, the first determining unit further includes: a first determining subunit, configured to determine the horizontal gradient, the vertical gradient, and the temporal gradient of each pixel in the image block; and a second determining subunit, configured to determine the spatiotemporal gradient of each pixel in the image block according to the horizontal gradient, the vertical gradient, and the temporal gradient of each pixel in the image block.
The second determining subunit further includes a determining sub-subunit, configured to determine the spatiotemporal gradient of each pixel in the image block according to formula (1-1), where G_st(k,i,j) is the spatiotemporal gradient of the pixel, G_h(k,i,j) is the horizontal gradient of the pixel, G_v(k,i,j) is the vertical gradient of the pixel, and G_t(k,i,j) is the temporal gradient of the pixel.
第二确定单元,配置为确定所述图像块的空时域梯度的标准差;a second determining unit, configured to determine a standard deviation of an empty time domain gradient of the image block;
第三确定单元,配置为确定所述图像块的像素值的均方误差;a third determining unit configured to determine a mean square error of a pixel value of the image block;
第四确定单元,配置为根据所述图像块的空时域梯度的标准差和像素值的均方误差,确定所述图像块的失真度量值。And a fourth determining unit configured to determine a distortion metric value of the image block according to a standard deviation of a spatial time domain gradient of the image block and a mean square error of the pixel value.
这里,所述第四确定单元进一步包括:Here, the fourth determining unit further includes:
第三确定子单元,配置为按照公式(1-2)确定所述每一图像块的失真度量值,其中,D为所述图像块的失真度量值,MSE为所述图像块的像素值的均方误差,σ为所述图像块的空时域梯度的标准差。a third determining subunit, configured to determine a distortion metric value of each image block according to formula (1-2), where D is a distortion metric value of the image block, and MSE is a pixel value of the image block The mean square error, σ is the standard deviation of the spatial time domain gradient of the image block.
所述第二确定模块1003,配置为根据所述每一帧图像所包含的图像块的失真度量值确定所述每一帧图像的失真度量值。The second determining module 1003 is configured to determine a distortion metric value of each frame image according to a distortion metric value of the image block included in each frame image.
Here, the second determining module includes: a fifth determining unit, configured to determine the average of the distortion metric values of all image blocks contained in each frame as the distortion metric value of that frame image;
所述第三确定模块1004,配置为根据所述每一帧图像的失真度量值确定所述视频的失真度量值。The third determining module 1004 is configured to determine a distortion metric value of the video according to the distortion metric value of each frame image.
这里,所述第三确定模块包括:第六确定单元,配置为将所述视频中所包含的所有帧的图像的失真度量值的平均值确定为所述视频的失真度量值。Here, the third determining module includes: a sixth determining unit configured to determine an average value of distortion metric values of images of all frames included in the video as a distortion metric value of the video.
这里需要指出的是:以上视频质量评价装置实施例的描述,与上述方法实施例的描述是类似的,具有同方法实施例相似的有益效果,因此不做赘述。对于本发明视频质量评价装置实施例中未披露的技术细节,请参照本发明方法实施例的描述而理解,为节约篇幅,因此不再赘述。It should be noted here that the description of the above embodiment of the video quality evaluation device is similar to the description of the above method embodiment, and has similar advantageous effects as the method embodiment, and therefore will not be described again. For the technical details that are not disclosed in the embodiment of the video quality evaluation device of the present invention, please refer to the description of the method embodiment of the present invention, and the details are not described herein.
需要说明的是,本发明实施例中,如果以软件功能模块的形式实现上述的视频质量评价方法,并作为独立的产品销售或使用时,也可以存储在一个计算机可读取存储介质中。基于这样的理解,本发明实施例的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机、服务器、或者网络设备等)执行本发明各个实施例所述方法的全部或部分。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read Only Memory)、磁碟或者光盘等各种可以存储程序代码的介质。这样,本发明实施例不限制于任何特定的硬件和软件结合。It should be noted that, in the embodiment of the present invention, if the video quality evaluation method described above is implemented in the form of a software function module and sold or used as a standalone product, it may also be stored in a computer readable storage medium. Based on such understanding, the technical solution of the embodiments of the present invention may be embodied in the form of a software product in essence or in the form of a software product stored in a storage medium, including a plurality of instructions. A computer device (which may be a personal computer, server, or network device, etc.) is caused to perform all or part of the methods described in various embodiments of the present invention. The foregoing storage medium includes various media that can store program codes, such as a USB flash drive, a mobile hard disk, a read only memory (ROM), a magnetic disk, or an optical disk. Thus, embodiments of the invention are not limited to any specific combination of hardware and software.
对应地,本发明实施例提供一种视频质量评价设备,例如终端,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,所述处理器执行所述程序时实现所述的视频质量评价方法。Correspondingly, an embodiment of the present invention provides a video quality evaluation apparatus, such as a terminal, including a memory, a processor, and a computer program stored on the memory and operable on the processor, where the processor implements the program The video quality evaluation method described.
本发明实施例提供一种计算机可读存储介质,其上存储有计算机程序,其特征在于,该计算机程序被处理器执行时实现上述的视频质量评价方法。Embodiments of the present invention provide a computer readable storage medium having stored thereon a computer program, wherein the computer program is implemented by a processor to implement the video quality evaluation method described above.
需要说明的是,图11为本发明实施例中终端的一种硬件实体示意图, 如图11所示,该终端1100的硬件实体包括:处理器1101、通信接口1102和存储器1103,其中:It is to be noted that FIG. 11 is a schematic diagram of a hardware entity of a terminal according to an embodiment of the present invention. As shown in FIG. 11, the hardware entity of the terminal 1100 includes: a processor 1101, a communication interface 1102, and a memory 1103, where:
处理器1101通常控制终端1100的总体操作。The processor 1101 typically controls the overall operation of the terminal 1100.
通信接口1102可以使终端1100通过网络与其他终端或服务器通信。 Communication interface 1102 can cause terminal 1100 to communicate with other terminals or servers over a network.
The memory 1103 is configured to store instructions and applications executable by the processor 1101, and can also buffer data to be processed or already processed by the processor 1101 and by the modules of the terminal 1100 (for example, image data, audio data, voice communication data, and video communication data); it can be implemented by flash memory (FLASH) or random access memory (RAM).
需要说明的是,计算机程序(也被称为程序、软件、软件应用、脚本或代码)能够以任何编程语言形式(包括汇编语言或解释语言、说明性语言或程序语言)书写,并且能够以任何形式(包括作为独立程序,或者作为模块、组件、子程序、对象或其它适用于计算环境中的单元)部署。计算机程序可以但非必要地对应于文件系统中的文件。程序能够被存储在文件的保存其它程序或数据(例如,存储在标记语言文档中的一个或多个脚本)的部分中,在专用于所关注程序的单个文件中,或者在多个协同文件(例如,存储一个或多个模块、子模块或代码部分的文件)中。计算机程序能够被部署为在一个或多个计算机上执行,该一个或多个计算机位于一个站点处,或者分布在多个站点中且通过通信网络互连。It should be noted that a computer program (also referred to as a program, software, software application, script or code) can be written in any programming language (including assembly or interpreted language, descriptive language or programming language) and can be any Form (including as a stand-alone program, or as a module, component, subroutine, object, or other unit suitable for use in a computing environment). A computer program can, but does not necessarily, correspond to a file in a file system. The program can be stored in a portion of the file that holds other programs or data (eg, one or more scripts stored in the markup language document), in a single file dedicated to the program of interest, or in multiple collaborative files ( For example, storing one or more modules, submodules, or files in a code section). The computer program can be deployed to be executed on one or more computers located at one site or distributed across multiple sites and interconnected by a communication network.
说明书中描述的过程和逻辑流能够由一个或多个可编程处理器执行,该一个或多个可编程处理器执行一个或多个计算机程序以通过操作输入数据和生成输出来执行动作。上述过程和逻辑流还能够由专用逻辑电路执行,并且装置还能够被实现为专用逻辑电路,例如,FPGA或ASIC。The processes and logic flows described in the specification can be performed by one or more programmable processors executing one or more computer programs to perform actions by operating input data and generating output. The above described processes and logic flows can also be performed by dedicated logic circuitry, and the apparatus can also be implemented as dedicated logic circuitry, such as an FPGA or ASIC.
适用于执行计算机程序的处理器例如包括通用微处理器和专用微处理器,以及任何数字计算机类型的任何一个或多个处理器。通常来说,处理器会从只读存储器或随机访问存储器或以上两者接收指令和数据。计算的 的元件是用于按照指令执行动作的处理器以及一个或多个用于存储指令和数据的存储器。通常来说,计算机还会包括一个或多个用于存储数据的大容量存储设备(例如,磁盘、磁光盘、或光盘),或者操作地耦接以从其接收数据或向其发送数据,或者两者均是。然而,计算机不需要具有这样的设备。而且,计算机能够被嵌入在另一设备中,例如,移动电话、个人数字助手(PDA)、移动音频播放器或移动视频播放器、游戏控制台、全球定位系统(GPS)接收机或移动存储设备(例如,通用串行总线(USB)闪盘),以上仅为举例。适用于存储计算机程序指令和数据的设备包括所有形式的非易失性存储器、媒体和存储设备,例如包括半导体存储设备(例如,EPROM、EEPROM和闪存设备)、磁盘(例如,内部硬盘或移动硬盘)、磁光盘、以及CD-ROM和DVD-ROM盘。处理器和存储器能够由专用逻辑电路补充或者包含到专用逻辑电路中。Processors suitable for the execution of a computer program include, for example, a general purpose microprocessor and a special purpose microprocessor, and any one or more processors of any type of digital computer. Generally, a processor will receive instructions and data from a read only memory or a random access memory or both. The calculated elements are a processor for performing actions in accordance with instructions and one or more memories for storing instructions and data. Generally, a computer also includes one or more mass storage devices (eg, magnetic disks, magneto-optical disks, or optical disks) for storing data, or is operatively coupled to receive data from or send data thereto, or Both are. However, the computer does not need to have such a device. Moreover, the computer can be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio player or mobile video player, a game console, a global positioning system (GPS) receiver, or a mobile storage device. (For example, Universal Serial Bus (USB) flash drive), the above is just an example. Suitable devices for storing computer program instructions and data include all forms of non-volatile memory, media and storage devices, including, for example, semiconductor storage devices (eg, EPROM, EEPROM, and flash memory devices), magnetic disks (eg, internal hard drives or removable hard drives). ), magneto-optical disks, and CD-ROM and DVD-ROM discs. The processor and memory can be supplemented by or included in dedicated logic circuitry.
说明书中描述的主题的实施方式能够以计算系统来实现。该计算系统包括后端组件(例如,数据服务器),或者包括中间件组件(例如,应用服务器),或者包括前端组件(例如,具有图形用户接口或网页浏览器的客户端计算机,用户通过该客户端计算机能够与本申请描述的主题的实施方式交互),或者包括上述后端组件、中间件组件或前端组件中的一个或多个的任何结合。系统的组件能够通过任何数字数据通信形式或介质(例如,通信网络)来互连。通信网络的示例包括局域网(LAN)和广域网(WAN)、互连网络(例如,互联网)以及端对端网络(例如,自组织端对端网络)。Embodiments of the subject matter described in the specification can be implemented in a computing system. The computing system includes a backend component (eg, a data server), or includes a middleware component (eg, an application server), or includes a front end component (eg, a client computer with a graphical user interface or web browser through which the user passes) The end computer can interact with an embodiment of the subject matter described herein, or any combination of one or more of the above described backend components, middleware components, or front end components. The components of the system can be interconnected by any form of digital data communication or medium (e.g., a communication network). Examples of communication networks include local area networks (LANs) and wide area networks (WANs), interconnected networks (e.g., the Internet), and end-to-end networks (e.g., ad hoc end-to-end networks).
本申请中描述的特征在智能电视模块上(或连接电视模块、混合电视模块等)实现。智能电视模块可以包括被配置成为互联网连接性集成更多传统电视节目源(例如,经由线缆、卫星、空中或其它信号接收的节目源)的处理电路。智能电视模块可以被物理地集成到电视机中或者可以包括独立的设备,例如,机顶盒、蓝光或其它数字媒体播放器、游戏控制台、酒 店电视系统以及其它配套设备。智能电视模块可以被配置为使得观看者能够搜索并找到在网络上、当地有线电视频道上、卫星电视频道上或存储在本地硬盘上的视频、电影、图片或其它内容。机顶盒(STB)或机顶盒单元(STU)可以包括信息适用设备,信息适用设备包括调谐器并且连接到电视机和外部信号源上,从而将信号调谐成之后将被显示在电视屏幕或其它播放设备上的内容。智能电视模块可以被配置成为多种不同的应用(例如,网页浏览器和多个流媒体服务、连接线缆或卫星媒体源、其它网络“频道”等)提供家用屏幕或包括图标的顶级屏幕。智能电视模块还可以被配置成为用户提供电子节目。智能电视模块的配套应用可以在移动终端上运行以向用户提供与可用节目有关的附加信息,从而使得用户能够控制智能电视模块等。在替选实施例中,该特征可以被实现在便携式计算机或其它个人计算机(PC)、智能手机、其它移动电话、手持计算机、平板PC或其它计算设备上。The features described in this application are implemented on a smart television module (or connected to a television module, hybrid television module, etc.). The smart TV module can include processing circuitry configured to integrate more traditional television program sources (eg, program sources received via cable, satellite, air, or other signals) with Internet connectivity. The smart TV module can be physically integrated into a television set or can include stand-alone devices such as set top boxes, Blu-ray or other digital media players, game consoles, hotel television systems, and other ancillary equipment. The smart TV module can be configured to enable viewers to search for and find videos, movies, pictures or other content on the network, on local cable channels, on satellite television channels, or on local hard drives. A set top box (STB) or set top box unit (STU) may include an information-applicable device that includes a tuner and is coupled to the television set and an external source to tune the signal to be displayed on a television screen or other playback device. Content. The smart TV module can be configured to provide a home screen or a top screen including icons for a variety of different applications (eg, web browsers and multiple streaming services, connecting cable or satellite media sources, other network "channels", etc.). The smart TV module can also be configured to provide electronic programming to the user. The companion application of the smart TV module can be run on the mobile terminal to provide the user with additional information related to the available programs, thereby enabling the user to control the smart TV module and the like. In alternative embodiments, this feature can be implemented on a portable computer or other personal computer (PC), smart phone, other mobile phone, handheld computer, tablet PC, or other computing device.
虽然说明书包含许多实施细节,但是这些实施细节不应当被解释为对任何权利要求的范围的限定,而是对专用于特定实施方式的特征的描述。说明书中在独立实施方式前后文中描述的特定的特征同样能够以单个实施方式的结合中实现。相反地,单个实施方式的上下文中描述的各个特征同样能够在多个实施方式中单独实现或者以任何合适的子结合中实现。而且,尽管特征可以在上文中描述为在特定结合中甚至如最初所要求的作用,但是在一些情况下所要求的结合中的一个或多个特征能够从该结合中去除,并且所要求的结合可以为子结合或者子结合的变型。The description contains a number of implementation details, which are not to be construed as limiting the scope of any claims, but rather to the description of the features of the particular embodiments. Particular features described in the specification before and after the independent embodiments can also be implemented in a combination of a single embodiment. Conversely, various features that are described in the context of a single embodiment can be implemented in the various embodiments individually or in any suitable sub-combination. Moreover, although features may be described above as even in the particular combination, even as originally claimed, in some cases one or more of the required combinations can be removed from the combination and the required combination It can be a sub-binding or a sub-combination variant.
类似地,虽然在附图中以特定次序描绘操作,但是这不应当被理解为要求该操作以所示的特定次序或者以相继次序来执行,或者所示的全部操作都被执行以达到期望的结果。在特定环境下,多任务处理和并行处理可以是有利的。此外,上述实施方式中各个系统组件的分离不应当被理解为 要求在全部实施方式中实现该分离,并且应当理解的是所描述的程序组件和系统通常能够被共同集成在单个软件产品中或被封装为多个软件产品。Similarly, although the operations are depicted in a particular order in the figures, this should not be construed as requiring that the operations are performed in the particular order shown, or in a sequential order, or all of the operations illustrated are performed to achieve the desired. result. Multitasking and parallel processing can be advantageous in certain circumstances. Furthermore, the separation of various system components in the above-described embodiments should not be understood as requiring that the separation be implemented in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or Packaged into multiple software products.
因此,已经对主题的特定实施方式进行了描述。其它实施方式在以下权利要求的范围内。在一些情况下,权利要求中所限定的动作能够以不同的次序执行并且仍能够达到期望的结果。此外,附图中描绘的过程并不必须采用所示出的特定次序、或相继次序来达到期望的结果。在特定实施方式中,可以使用多任务处理或并行处理。Thus, specific embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions defined in the claims can be performed in a different order and still achieve the desired results. Moreover, the processes depicted in the figures are not necessarily in the particular order shown, or in a sequential order to achieve the desired results. In particular embodiments, multitasking or parallel processing can be used.
工业实用性Industrial applicability
In the embodiment of the present invention, a first video and a second video are first acquired, and each frame image of the first video and of the second video is divided into image blocks of a preset size; a distortion metric value is then determined for each image block in the first video according to the standard deviation of the spatiotemporal gradients of the block and the mean square error of its pixel values; a distortion metric value is then determined for each frame image in the first video according to the distortion metric values of the image blocks it contains; and finally the distortion metric value of the video is determined according to the distortion metric values of the frame images of the first video. In this way, the accuracy of the video quality estimate is preserved while the computational complexity is reduced, so the method can be applied in video image processing systems with strict real-time requirements, such as video encoders.

Claims (12)

  1. 一种视频质量评价方法,所述方法包括:A video quality evaluation method, the method comprising:
    将需要进行质量评价的视频中每一帧图像按照预设大小划分图像块;Each frame of the video in which the quality evaluation is required is divided into image blocks according to a preset size;
    根据每一图像块的空时域梯度的标准差和像素值的均方误差确定所述每一图像块的失真度量值;Determining a distortion metric value of each image block according to a standard deviation of a spatial time domain gradient of each image block and a mean square error of the pixel value;
    根据所述每一帧图像所包含的图像块的失真度量值确定所述每一帧图像的失真度量值;Determining a distortion metric value of each frame image according to a distortion metric value of the image block included in each frame image;
    根据所述每一帧图像的失真度量值确定所述视频的失真度量值。A distortion metric value of the video is determined based on a distortion metric of the image of each frame.
  2. 根据权利要求1中所述的方法,所述根据每一图像块的空时域梯度的标准差和像素值的均方误差确定所述每一图像块的失真度量值,包括:The method according to claim 1, wherein determining the distortion metric value of each image block according to a standard deviation of a spatial time domain gradient of each image block and a mean square error of the pixel value comprises:
    确定所述图像块中每一像素点的空时域梯度;其中,所述图像块中至少包括一个像素点;Determining a spatial time domain gradient of each pixel in the image block; wherein the image block includes at least one pixel;
    确定所述图像块的空时域梯度的标准差;Determining a standard deviation of a spatial time domain gradient of the image block;
    确定所述图像块的像素值的均方误差;Determining a mean square error of pixel values of the image block;
    根据所述图像块的空时域梯度的标准差和像素值的均方误差,确定所述图像块的失真度量值。A distortion metric value of the image block is determined according to a standard deviation of a spatial time domain gradient of the image block and a mean square error of the pixel value.
  3. 根据权利要求2中所述的方法,所述确定图像块中每一像素点的空时域梯度,包括:The method of claim 2, wherein determining the spatial time domain gradient for each pixel in the image block comprises:
    确定所述图像块中每一像素点的水平梯度、垂直梯度和时域梯度;Determining a horizontal gradient, a vertical gradient, and a time domain gradient for each pixel in the image block;
    根据所述图像块中每一像素点的水平梯度、垂直梯度和时域梯度确定所述图像块中每一像素点的空时域梯度。A spatial time domain gradient of each pixel in the image block is determined according to a horizontal gradient, a vertical gradient, and a time domain gradient of each pixel in the image block.
  4. 根据权利要求3中所述的方法,所述根据所述图像块中每一像素点的水平梯度、垂直梯度和时域梯度确定所述图像块中每一像素点的空 时域梯度,包括:The method according to claim 3, wherein determining a spatial time gradient of each pixel in the image block according to a horizontal gradient, a vertical gradient, and a time domain gradient of each pixel in the image block comprises:
    According to formula (1-1), determining the spatiotemporal gradient of each pixel in the image block, wherein G_st(k,i,j) is the spatiotemporal gradient of the pixel, G_h(k,i,j) is the horizontal gradient of the pixel, G_v(k,i,j) is the vertical gradient of the pixel, and G_t(k,i,j) is the temporal gradient of the pixel.
  5. 根据权利要求1中所述的方法,所述根据所述图像块的空时域梯度标准差和像素值的均方误差,确定所述图像块的失真度量值,包括:The method according to claim 1, wherein determining the distortion metric value of the image block according to a space-time gradient standard deviation of the image block and a mean square error of the pixel value comprises:
    Determining the distortion metric value of each image block according to formula (1-2), D = log(MSE / σ), wherein D is the distortion metric value of the image block, MSE is the mean square error of the pixel values of the image block, and σ is the standard deviation of the spatiotemporal gradients of the image block.
  6. The method according to claim 1, wherein determining the distortion metric value of the image of each frame according to the distortion metric values of the image blocks contained in the image of each frame in the video comprises: determining the average of the distortion metric values of all image blocks contained in each frame as the distortion metric value of that frame image;
    correspondingly, determining the distortion metric value of the video according to the distortion metric value of the image of each frame comprises: determining the average of the distortion metric values of the images of all frames contained in the video as the distortion metric value of the video.
  7. A video quality evaluation apparatus, the apparatus comprising:
    a dividing module, configured to divide each frame of image of a video whose quality is to be evaluated into image blocks of a preset size;
    a first determining module, configured to determine a distortion metric value of each image block according to the standard deviation of the spatio-temporal gradient of the image block and the mean square error of its pixel values;
    a second determining module, configured to determine a distortion metric value of each frame of image according to the distortion metric values of the image blocks contained in that frame;
    a third determining module, configured to determine a distortion metric value of the video according to the distortion metric value of each frame of image.
  8. The apparatus according to claim 7, wherein the first determining module comprises:
    a first determining unit, configured to determine the spatio-temporal gradient of each pixel in the image block, wherein the image block contains at least one pixel;
    a second determining unit, configured to determine the standard deviation of the spatio-temporal gradient of the image block;
    a third determining unit, configured to determine the mean square error of the pixel values of the image block;
    a fourth determining unit, configured to determine the distortion metric value of the image block according to the standard deviation of the spatio-temporal gradient of the image block and the mean square error of the pixel values.
  9. The apparatus according to claim 8, wherein the first determining unit comprises:
    a first determining subunit, configured to determine a horizontal gradient, a vertical gradient, and a temporal gradient for each pixel in the image block;
    a second determining subunit, configured to determine the spatio-temporal gradient of each pixel in the image block according to the horizontal gradient, the vertical gradient, and the temporal gradient of that pixel.
  10. The apparatus according to claim 7, wherein the second determining module comprises: a fifth determining unit, configured to determine the average of the distortion metric values of all the image blocks contained in each frame as the distortion metric value of that frame;
    Correspondingly, the third determining module comprises: a sixth determining unit, configured to determine the average of the distortion metric values of the images of all the frames contained in the video as the distortion metric value of the video.
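To show how the modules of claims 7 to 10 fit together, the sketch below composes the earlier editor's functions (directional_gradients, spatio_temporal_gradient, block_distortion, frame_distortion, video_distortion) in that module structure. The 16x16 block size, the use of the distorted frames for the gradients, and the skipping of the first frame (which has no previous frame for the temporal gradient) are all assumptions, not requirements of the claims.

    class VideoQualityEvaluator:
        """Composes the sketches above in the module structure of claims 7-10."""

        def __init__(self, block_size=16):
            # preset block size; 16x16 is a hypothetical choice
            self.block_size = block_size

        def evaluate(self, ref_frames, dist_frames):
            # ref_frames / dist_frames: lists of 2-D numpy luma arrays of equal shape
            b = self.block_size
            frame_scores = []
            for i in range(1, len(dist_frames)):
                gx, gy, gt = directional_gradients(dist_frames[i], dist_frames[i - 1])
                st = spatio_temporal_gradient(gx, gy, gt)
                h, w = dist_frames[i].shape
                block_scores = []
                # dividing module: split the frame into blocks of the preset size
                for y in range(0, h - h % b, b):
                    for x in range(0, w - w % b, b):
                        block_scores.append(block_distortion(          # first determining module
                            ref_frames[i][y:y + b, x:x + b],
                            dist_frames[i][y:y + b, x:x + b],
                            st[y:y + b, x:x + b]))
                frame_scores.append(frame_distortion(block_scores))    # second determining module
            return video_distortion(frame_scores)                      # third determining module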
  11. A video quality evaluation device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the video quality evaluation method according to any one of claims 1 to 6.
  12. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the video quality evaluation method according to any one of claims 1 to 6.
PCT/CN2017/119261 2017-02-24 2017-12-28 Video quality evaluation method, apparatus and device, and storage medium WO2018153161A1 (en)

Applications Claiming Priority (2)
Application Number Priority Date Filing Date Title
CN201710102413.4A CN108513132B (en) 2017-02-24 2017-02-24 Video quality evaluation method and device
CN201710102413.4 2017-02-24

Publications (1)
Publication Number Publication Date
WO2018153161A1 (en)

Family
ID=63253122

Family Applications (1)
Application Number Title Priority Date Filing Date
PCT/CN2017/119261 WO2018153161A1 (en) 2017-02-24 2017-12-28 Video quality evaluation method, apparatus and device, and storage medium

Country Status (2)
Country Link
CN (1) CN108513132B (en)
WO (1) WO2018153161A1 (en)

Families Citing this family (3)
* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110401832B * 2019-07-19 2020-11-03 南京航空航天大学 Panoramic video objective quality assessment method based on space-time pipeline modeling
CN111083468B * 2019-12-23 2021-08-20 杭州小影创新科技股份有限公司 Short video quality evaluation method and system based on image gradient
CN114332088B * 2022-03-11 2022-06-03 电子科技大学 Motion estimation-based full-reference video quality evaluation method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101742355A (en) * 2009-12-24 2010-06-16 厦门大学 Method for partial reference evaluation of wireless videos based on space-time domain feature extraction
CN102984540A (en) * 2012-12-07 2013-03-20 浙江大学 Video quality assessment method estimated on basis of macroblock domain distortion degree
CN103458265A (en) * 2013-02-01 2013-12-18 深圳信息职业技术学院 Method and device for evaluating video quality
US20140015923A1 (en) * 2012-07-16 2014-01-16 Cisco Technology, Inc. Stereo Matching for 3D Encoding and Quality Assessment
CN106028026A (en) * 2016-05-27 2016-10-12 宁波大学 Effective objective video quality evaluation method based on temporal-spatial structure

Also Published As

Publication number Publication date
CN108513132B (en) 2020-11-10
CN108513132A (en) 2018-09-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 17897421
Country of ref document: EP
Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 17897421
Country of ref document: EP
Kind code of ref document: A1