CN108600745B - Video quality evaluation method based on time-space domain slice multi-map configuration - Google Patents

Video quality evaluation method based on time-space domain slice multi-map configuration

Info

Publication number
CN108600745B
Authority
CN
China
Prior art keywords
time
video
map
slice
space domain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810882119.4A
Other languages
Chinese (zh)
Other versions
CN108600745A (en)
Inventor
刘利雄
王天舒
黄华
巩佳超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN201810882119.4A priority Critical patent/CN108600745B/en
Publication of CN108600745A publication Critical patent/CN108600745A/en
Application granted granted Critical
Publication of CN108600745B publication Critical patent/CN108600745B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N17/00Diagnosis, testing or measuring for television systems or their details

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

The invention discloses a video quality evaluation method based on time-space domain slice multi-map configuration, belonging to the technical field of image and video analysis. The method uses the time-space domain slicing idea to convert the original and distorted video sequences into a time-space domain slice representation, extracts distortion-friendly edge maps and frame difference maps on the spatial-domain slices, then extracts change maps composed of gradient magnitude and gradient orientation together with Laplace-corrected stationary maps on all slice sequences, and combines these with the original images to complete the map configuration. Next, the time-space domain stability of the video under evaluation is introduced into the slice domain to perform map generation, a 2D image quality evaluation method is introduced, and difference values of the generated reference-distortion map pairs are calculated. Finally, a neural network is applied to automatically learn the weight of each map class's contribution to the video distortion. Compared with the prior art, the method offers high subjective consistency, strong compatibility, and high algorithm stability.

Description

Video quality evaluation method based on time-space domain slice multi-map configuration
Technical Field
The invention relates to a video quality evaluation method, in particular to a video quality evaluation method based on time-space domain slice multi-map configuration, and belongs to the technical field of image video analysis.
Background
With the development of science and technology, the cost of generating and transmitting images and video has fallen steadily, making them an increasingly common and indispensable medium for conveying information in daily life. However, images and video may be distorted at various stages of production and transmission. Such distortion degrades the viewing experience and, in severe cases, can even harm viewers' physical and mental well-being.
In recent years, great progress has been made in image quality evaluation research, but progress in the video field has been comparatively slow. How to suppress the spread of low-quality video and guarantee viewers' visual experience remains an urgent problem. Giving the media that generate and transmit video the ability to evaluate video quality automatically, and thereby improving the quality of video at the output end, is of great significance to solving this problem.
Disclosure of Invention
The invention aims to address the low prediction accuracy and weak information representation capability of existing video quality evaluation methods, as well as the fact that 2D image quality evaluation methods have not been successfully applied to video quality evaluation, by providing a video quality evaluation method based on time-space domain slice multi-map configuration.
The method of the invention draws on the time-space domain slicing idea proposed by Ngo et al. This idea re-represents the original video information in a joint time-space manner and effectively resolves the contradiction between extracting video spatio-temporal information and high computational complexity. The video is regarded as a cuboid in a three-dimensional coordinate system whose three axes represent its height (H), width (W) and time (T); slicing the video along different axes yields different information representations. This idea can be formulated as:
I_STS(i,d) = { V^d | d ∈ [T, W, H], i ∈ [1, N] } (1)
where V is the input video sequence, the superscript d denotes the slicing dimension, ranging over the height (H), width (W) and time (T) mentioned above, i denotes the index of a slice within the sequence, and I_STS(i,d) is the generated time-space domain slice sequence.
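The slicing of Eq. (1) can be sketched in a few lines. The following is a minimal NumPy illustration; the array shapes and axis order used here are assumptions for the sketch, not part of the patent:

```python
import numpy as np

def sts_slices(video, d):
    """Time-space domain slices of a video cube, per Eq. (1).

    video : ndarray of shape (T, H, W) -- frames stacked along time.
    d     : dimension to slice along, one of 'T', 'H', 'W'.
    Returns a list of 2-D slices I_STS(i, d), i = 1..N.
    """
    axis = {'T': 0, 'H': 1, 'W': 2}[d]
    return [np.take(video, i, axis=axis) for i in range(video.shape[axis])]

# A 'T' slice is an ordinary frame; 'H' and 'W' slices mix one spatial
# axis with the time axis, which is what exposes temporal structure.
video = np.random.rand(10, 48, 64)   # 10 frames of 48x64 (synthetic data)
frames = sts_slices(video, 'T')      # 10 slices of shape (48, 64)
h_cuts = sts_slices(video, 'H')      # 48 slices of shape (10, 64)
```

Slicing along 'H' or 'W' trades spatial context for temporal context without any extra computation, which is the appeal of the representation.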
The method is realized by the following technical scheme. A video quality evaluation method based on time-space domain slice multi-map configuration comprises the following steps:
Step one, convert the original and distorted video sequences into the time-space domain slice representation, which serves as the basic unit of subsequent processing.
Step two, extract distortion-friendly edge maps and frame difference maps on the spatial-domain slices, extract change maps and stationary maps on all slice sequences, and combine them with the original images to complete the map configuration.
Step three, introduce the time-space domain stability of the video under evaluation into the slice domain and perform the map generation calculation.
Step four, introduce a 2D image quality evaluation method and calculate the difference values of the generated reference-distortion map pairs.
Step five, apply a neural network to automatically determine, by learning, the weight of each map class's contribution to the video distortion.
Advantageous effects
Compared with the prior art, the method offers high subjective consistency, strong compatibility, and high algorithm stability. It converts an ordinary 2D image quality evaluation method into a high-performance video quality evaluation method and can work alongside video-processing application systems: it can be embedded in practical systems (such as video playback or network transmission systems) to monitor video quality in real time; it can be used to assess the strengths and weaknesses of video processing algorithms and tools (such as compression coding of stereo images or video acquisition tools); and it can be used for quality auditing of video works, preventing inferior video products from harming the physical and mental health of audiences.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The method of the present invention is further described in detail below with reference to the drawings and examples.
A video quality evaluation method based on time-space domain slice multi-map configuration is shown in figure 1 and comprises the following steps:
Step one, convert the original and distorted video sequences into the time-space domain slice representation, which serves as the basic unit of subsequent processing.
Step two, extract distortion-friendly edge maps and frame difference maps on the spatial-domain slices, extract change maps and stationary maps on all slice sequences, and combine them with the original images to complete the map configuration.
The spatial domain slices best reflect the structure of the image, so edge maps and frame difference maps are extracted from them to refine the time-space domain information. The preferred method is as follows:
I_EDGE(i,T) = sqrt( (f_h ⊗ I_STS(i,T))^2 + (f_v ⊗ I_STS(i,T))^2 ), i ∈ [1, N] (2)
I_DIFF(i,T) = { I_STS(i,T) − I_STS(i−1,T) | i ∈ [2, N] } (3)
where ⊗ denotes filtering, I_EDGE(i,T) is the generated edge map and I_DIFF(i,T) is the generated frame difference map; I_STS(i,T) denotes a time-space domain slice, i the index of the slice in the sequence, T the time dimension of the slice, and N the maximum slice index; f_h and f_v are the corresponding distortion-friendly edge filtering kernels.
In this embodiment, the specific values of the edge filtering kernel are:
SI13 = [−0.0052625, −0.0173466, −0.0427401, −0.0768961,
−0.0957739, −0.0696751, 0, 0.0696751, 0.0957739,
0.0768961, 0.0427401, 0.0173466, 0.0052625]. (4)
This vector is replicated horizontally and vertically, respectively, to obtain the two edge filtering kernels.
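The edge and frame difference extraction of step two can be sketched as follows. Since the original formula image for Eq. (2) is not reproduced in the source, two points are assumptions here: that the SI13 vector is row-replicated into 13×13 kernels, and that the two filter responses are combined as a magnitude:

```python
import numpy as np

# SI13 coefficients of Eq. (4); the vector is antisymmetric and sums to zero.
SI13 = np.array([-0.0052625, -0.0173466, -0.0427401, -0.0768961,
                 -0.0957739, -0.0696751, 0.0,
                  0.0696751,  0.0957739,  0.0768961,
                  0.0427401,  0.0173466,  0.0052625])

# "Replicated horizontally and vertically": each row of f_h is SI13,
# and f_v is its transpose (an assumed reading of the kernel construction).
f_h = np.tile(SI13, (13, 1))
f_v = f_h.T

def _filter2(img, k):
    """'Same'-size filtering (cross-correlation) with edge padding."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    p = np.pad(img, ((ph, ph), (pw, pw)), mode='edge')
    out = np.zeros(img.shape)
    for r in range(kh):
        for c in range(kw):
            out += k[r, c] * p[r:r + img.shape[0], c:c + img.shape[1]]
    return out

def edge_map(frame):
    """Edge map of one spatial (T) slice: magnitude of the horizontal and
    vertical filter responses (the combination rule is assumed)."""
    return np.sqrt(_filter2(frame, f_h) ** 2 + _filter2(frame, f_v) ** 2)

def frame_diff(frames):
    """Frame difference maps I_DIFF(i,T) = I_STS(i,T) - I_STS(i-1,T), Eq. (3)."""
    return [frames[i] - frames[i - 1] for i in range(1, len(frames))]
```

Because the kernel rows sum to zero, flat regions produce no edge response, which is what makes the filter "distortion-friendly": only structural changes survive.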
In order to improve the basic information representation capability of the time-space domain slices, a change map composed of gradient magnitude and gradient orientation and a Laplace-corrected Gaussian stationary map are extracted respectively. The preferred extraction method is as follows:
I_GM(i,d) = sqrt( (G_x ⊗ I_STS(i,d))^2 + (G_y ⊗ I_STS(i,d))^2 ) (5)
I_GO(i,d) = arctan( (G_y ⊗ I_STS(i,d)) / (G_x ⊗ I_STS(i,d)) ) (6)
I_GAU(i,d) = (f_g ⊗ I_STS(i,d)) ↓2 (7)
I_LAP(i,d) = I_STS(i,d) − I_UP(i,d) (8)
where ⊗ denotes filtering and ↓2 denotes downsampling by a factor of two; I_GM(i,d) is the generated gradient magnitude map, I_GO(i,d) the generated gradient orientation map, I_GAU(i,d) the generated Gaussian filter map, and I_LAP(i,d) the generated Laplace map; i denotes the slice index, d the video dimension, and I_STS(i,d) the time-space domain slice sequence; G_x and G_y are Gaussian gradient filter kernels in the horizontal and vertical directions, f_g is a Gaussian blur filter kernel, and I_UP(i,d) is the map obtained by upsampling I_GAU(i,d) back to the slice size and Gaussian filtering it.
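The four maps of Eqs. (5)–(8) can be sketched in pure NumPy. The Gaussian σ, kernel radius, and the exact down/upsampling scheme are assumptions for illustration (the source does not specify them):

```python
import numpy as np

def _filter2(img, k):
    """'Same'-size filtering (cross-correlation) with edge padding."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    p = np.pad(img, ((ph, ph), (pw, pw)), mode='edge')
    out = np.zeros(img.shape)
    for r in range(kh):
        for c in range(kw):
            out += k[r, c] * p[r:r + img.shape[0], c:c + img.shape[1]]
    return out

def gaussian_kernel(sigma=1.0, radius=2):
    """Normalized 2-D Gaussian blur kernel f_g (sigma/radius assumed)."""
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    g /= g.sum()
    return np.outer(g, g)

def gaussian_deriv_kernels(sigma=1.0, radius=2):
    """Gaussian gradient kernels G_x, G_y (horizontal / vertical)."""
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x ** 2 / (2 * sigma ** 2)); g /= g.sum()
    dg = -x / sigma ** 2 * np.exp(-x ** 2 / (2 * sigma ** 2))
    return np.outer(g, dg), np.outer(dg, g)   # G_x, G_y

def change_maps(s):
    """Gradient magnitude and orientation maps, Eqs. (5)-(6)."""
    Gx, Gy = gaussian_deriv_kernels()
    gx, gy = _filter2(s, Gx), _filter2(s, Gy)
    return np.sqrt(gx ** 2 + gy ** 2), np.arctan2(gy, gx)   # I_GM, I_GO

def stationary_maps(s):
    """Gaussian filter map and Laplace map, Eqs. (7)-(8): blur and
    downsample by 2, upsample back, subtract (a Laplacian pyramid level)."""
    fg = gaussian_kernel()
    gau = _filter2(s, fg)[::2, ::2]                        # I_GAU
    up = np.repeat(np.repeat(gau, 2, 0), 2, 1)[:s.shape[0], :s.shape[1]]
    up = _filter2(up, fg)                                  # I_UP
    return gau, s - up                                     # I_GAU, I_LAP
```

The Laplace map keeps only what the Gaussian pass removed, so smooth regions go to zero and residual detail (where distortion shows up first) is isolated.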
And step three, introducing the time-space domain stability of the video to be evaluated into the slicing field, and performing map generation calculation.
Preferably, the map generation calculation is performed on only half of the slices, which further reduces the amount of computation.
Step four, adopt a 2D image quality evaluation method and compute the difference values of the generated reference-distortion map pairs by average aggregation.
The difference value P_m(i',d) is computed as:
P_m(i',d) = IQA( I_m^ref(i',d), I_m^dis(i',d) ) (9)
where I denotes an image in the map sequence, the superscripts ref and dis denote the reference and distorted video sequence maps respectively, m denotes the map class, i' is the index within the map sequence, and d denotes the video dimension. IQA denotes the adopted 2D full-reference image quality evaluation method; each reference-distortion image pair in a map sequence yields a difference value containing distortion information. For each map class, the class's difference score is obtained by average aggregation, giving the vector S composed of the difference scores of all map classes.
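Step four plugs any 2D full-reference IQA into Eq. (9) and average-aggregates per map class. A minimal sketch, with PSNR standing in for the pluggable IQA:

```python
import numpy as np

def psnr(ref, dis, peak=1.0):
    """PSNR, used here as the plug-in 2D full-reference IQA of Eq. (9).
    Any full-reference metric (SSIM, VIF, ...) can take its place."""
    mse = np.mean((ref - dis) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def map_score(ref_maps, dis_maps, iqa=psnr):
    """Average-aggregated difference score of one map class: the mean of
    IQA(I^ref, I^dis) over all pairs in the map sequence."""
    return float(np.mean([iqa(r, d) for r, d in zip(ref_maps, dis_maps)]))
```

Running `map_score` once per map class (edge, frame difference, gradient magnitude, gradient orientation, Gaussian, Laplace, original) yields the score vector S consumed in step five.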
And step five, automatically determining the weight of the contribution degree of each map to the video distortion in a learning mode by applying a neural network method.
The learned scoring formula is:
Q = θ^T S (10)
where θ is the weight parameter vector to be learned, S is the vector of map difference scores, Q is the final video quality score, and the superscript T denotes vector transposition.
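The weighted scoring of Eq. (10) can be illustrated as follows; the patent learns θ with a neural network, and ordinary least squares is shown here only as the simplest stand-in for that learning step, on synthetic data:

```python
import numpy as np

def fit_theta(S, q):
    """Fit the weight vector theta so that S @ theta approximates the
    subjective scores q. S: (n_videos, n_map_classes), q: (n_videos,)."""
    theta, *_ = np.linalg.lstsq(S, q, rcond=None)
    return theta

def predict_quality(theta, s):
    """Eq. (10): Q = theta^T s for one video's map-score vector s."""
    return float(theta @ s)

# Synthetic training set: 50 videos, 6 map classes (illustrative numbers).
rng = np.random.default_rng(0)
S = rng.random((50, 6))
true_theta = np.array([0.5, 1.0, -0.3, 0.2, 0.8, -0.1])
q = S @ true_theta          # noiseless scores generated by a known theta
theta = fit_theta(S, q)
```

Because each map class contributes a single scalar score, the final model is a small weighted sum; the learning step only has to decide how much each map class's distortion evidence matters.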
Examples
The method of the invention is implemented on three video quality evaluation databases: LIVE, IVP and CSIQ. The basic information of these databases is shown in Table 1. Two high-performing full-reference video quality evaluation methods, STRRED and ViS3, are selected for comparison with the proposed method.
TABLE 1 database basic information
In addition, because the method is a framework for converting a 2D image quality evaluation method into a video evaluation method, three full-reference image quality evaluation methods (PSNR, SSIM and VIF) are combined with the method to complete the experiment; the results of the three full-reference methods on their own are also included, to measure how much the framework improves their performance. In each experiment, 20% of the data is held out for testing; SRCC, KRCC, PLCC and RMSE are used as indices, and the median over 1000 repetitions is reported. The experimental results are shown in Table 2.
TABLE 2 comparison of algorithmic Performance across three databases
Table 3 shows the performance of each algorithm on each distortion type. As can be seen from Table 2, the performance of the 2D methods improves significantly on all three databases when used with the method of the present invention; moreover, the improved PSNR outperforms STRRED and ViS3 on all 3 databases, indicating that 2D methods can achieve highly competitive performance through the proposed framework.
TABLE 3 comparison of Algorithm Performance on each distortion category

Claims (5)

1. A video quality evaluation method based on time-space domain slice multi-map configuration is characterized by comprising the following steps:
converting an original video sequence and a distorted video sequence into a time-space domain slice representation form as a basic unit of subsequent processing;
step two, extracting distortion-friendly edge maps and frame difference maps on the spatial domain slices, and extracting change maps and Gaussian stationary maps on all slice sequences; forming a map set from the extracted edge maps, frame difference maps, change maps and Gaussian stationary maps together with the original images, thereby completing the map configuration, wherein the change maps comprise gradient magnitude maps and gradient orientation maps, and the Gaussian stationary maps comprise Gaussian filter maps and Laplace maps;
step three, introducing the time-space domain stability of the video to be evaluated into the slicing field, and performing map generation calculation on half of slices according to the mode of the step two;
step four, introducing a 2D image quality evaluation method, and calculating difference values of generated map reference-distortion pairs;
and step five, automatically determining the weight of the contribution degree of each map to the video distortion in a learning mode by applying a neural network method.
2. The video quality evaluation method based on the time-space domain slice multi-map configuration as claimed in claim 1, wherein in the second step, the method for extracting the edge map is as follows:
I_EDGE(i,T) = sqrt( (f_h ⊗ I_STS(i,T))^2 + (f_v ⊗ I_STS(i,T))^2 ), i ∈ [1, N] (1)
wherein ⊗ denotes filtering, I_EDGE(i,T) denotes the generated edge map, I_STS(i,T) denotes a time-space domain slice, i denotes the index of the slice in the sequence, T denotes the time dimension of the slice, and f_h, f_v are the corresponding distortion-friendly edge filtering kernels.
3. The video quality evaluation method based on the time-space domain slice multi-map configuration as claimed in claim 1, wherein in the second step, the method for extracting the frame difference map comprises the following steps:
I_DIFF(i,T) = { I_STS(i,T) − I_STS(i−1,T) | i ∈ [2, N] } (2)
wherein I_DIFF(i,T) denotes the generated frame difference map, I_STS(i,T) denotes a time-space domain slice, i denotes the index of the slice in the sequence, T is the time dimension of the slice, and N is the maximum slice index.
4. The video quality evaluation method based on the time-space domain slice multi-map configuration as claimed in claim 1, wherein in the second step, the method for extracting the variation map is as follows:
I_GM(i,d) = sqrt( (G_x ⊗ I_STS(i,d))^2 + (G_y ⊗ I_STS(i,d))^2 ) (3)
I_GO(i,d) = arctan( (G_y ⊗ I_STS(i,d)) / (G_x ⊗ I_STS(i,d)) ) (4)
wherein ⊗ denotes filtering, I_GM(i,d) denotes the generated gradient magnitude map, I_GO(i,d) the generated gradient orientation map, and i the index of the slice in the sequence; d denotes the dimension of the video, which is regarded as a cuboid in a three-dimensional coordinate system whose three coordinate axes represent the height, width and time of the video respectively, so the value range of d is the height, width and time; I_STS(i,d) is the time-space domain slice sequence, and G_x, G_y are Gaussian gradient filter kernels in the horizontal and vertical directions.
5. The video quality evaluation method based on the time-space domain slice multi-map configuration as claimed in claim 1, wherein in the second step, the method for extracting the Gaussian stationary map comprises the following steps:
I_GAU(i,d) = (f_g ⊗ I_STS(i,d)) ↓2 (5)
I_LAP(i,d) = I_STS(i,d) − I_UP(i,d) (6)
wherein ⊗ denotes filtering and ↓2 denotes downsampling by a factor of two; I_GAU(i,d) is the generated Gaussian filter map, I_LAP(i,d) is the generated Laplace map, and i denotes the index of the slice in the sequence; d denotes the dimension of the video, which is regarded as a cuboid in a three-dimensional coordinate system whose three coordinate axes represent the height, width and time of the video respectively, so the value range of d is the height, width and time; I_STS(i,d) is the time-space domain slice sequence, f_g is a Gaussian blur filter kernel, and I_UP(i,d) is the Gaussian filtered upsampled map.
CN201810882119.4A 2018-08-06 2018-08-06 Video quality evaluation method based on time-space domain slice multi-map configuration Active CN108600745B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810882119.4A CN108600745B (en) 2018-08-06 2018-08-06 Video quality evaluation method based on time-space domain slice multi-map configuration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810882119.4A CN108600745B (en) 2018-08-06 2018-08-06 Video quality evaluation method based on time-space domain slice multi-map configuration

Publications (2)

Publication Number Publication Date
CN108600745A CN108600745A (en) 2018-09-28
CN108600745B true CN108600745B (en) 2020-02-18

Family

ID=63623050

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810882119.4A Active CN108600745B (en) 2018-08-06 2018-08-06 Video quality evaluation method based on time-space domain slice multi-map configuration

Country Status (1)

Country Link
CN (1) CN108600745B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6363116B1 (en) * 1997-04-04 2002-03-26 Tektronix, Inc. Picture quality assessment using spatial location with or without subsampling
CN101742355A (en) * 2009-12-24 2010-06-16 厦门大学 Method for partial reference evaluation of wireless videos based on space-time domain feature extraction
CN103731664A (en) * 2013-12-25 2014-04-16 华为技术有限公司 Method for full reference type video quality assessment, apparatus for full reference type video quality assessment and video quality testing device for full reference type video quality assessment
CN104023230A (en) * 2014-06-23 2014-09-03 北京理工大学 Non-reference image quality evaluation method based on gradient relevance
CN106028026A (en) * 2016-05-27 2016-10-12 宁波大学 Effective objective video quality evaluation method based on temporal-spatial structure
CN107220974A (en) * 2017-07-21 2017-09-29 北京印刷学院 A kind of full reference image quality appraisement method and device
CN106341677B (en) * 2015-07-07 2018-04-20 中国科学院深圳先进技术研究院 Virtual view method for evaluating video quality


Also Published As

Publication number Publication date
CN108600745A (en) 2018-09-28

Similar Documents

Publication Publication Date Title
Li et al. No-reference and robust image sharpness evaluation based on multiscale spatial and spectral features
CN102722876B (en) Residual-based ultra-resolution image reconstruction method
CN102902961B (en) Face super-resolution processing method based on K neighbor sparse coding average value constraint
CN102547368B (en) Objective evaluation method for quality of stereo images
CN110889895B (en) Face video super-resolution reconstruction method fusing single-frame reconstruction network
CN109523513B (en) Stereoscopic image quality evaluation method based on sparse reconstruction color fusion image
CN107635136B (en) View-based access control model perception and binocular competition are without reference stereo image quality evaluation method
CN108134937B (en) Compressed domain significance detection method based on HEVC
CN109255358B (en) 3D image quality evaluation method based on visual saliency and depth map
CN104869421B (en) Saliency detection method based on overall motion estimation
CN103354617B (en) Boundary strength compressing image quality objective evaluation method based on DCT domain
CN109242834A (en) It is a kind of based on convolutional neural networks without reference stereo image quality evaluation method
CN104994375A (en) Three-dimensional image quality objective evaluation method based on three-dimensional visual saliency
Ma et al. Reduced-reference stereoscopic image quality assessment using natural scene statistics and structural degradation
CN106815839A (en) A kind of image quality blind evaluation method
CN108259893B (en) Virtual reality video quality evaluation method based on double-current convolutional neural network
CN109257592B (en) Stereoscopic video quality objective evaluation method based on deep learning
CN105160667A (en) Blind image quality evaluation method based on combining gradient signal and Laplacian of Gaussian (LOG) signal
CN108470336B (en) Stereo image quality evaluation method based on stack type automatic encoder
CN110717892A (en) Tone mapping image quality evaluation method
CN108830829B (en) Non-reference quality evaluation algorithm combining multiple edge detection operators
CN109447903A (en) A kind of method for building up of half reference type super-resolution reconstruction image quality evaluation model
CN106157251B (en) A kind of face super-resolution method based on Cauchy's regularization
CN113096015B (en) Image super-resolution reconstruction method based on progressive perception and ultra-lightweight network
CN103903239B (en) A kind of video super-resolution method for reconstructing and its system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant