CN118411583A - Immersive video quality evaluation method and device based on multi-feature fusion
- Publication number: CN118411583A (application number CN202410836696.5A)
- Authority: CN (China)
- Prior art keywords: texture, depth, feature, video, distorted
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention discloses an immersive video quality evaluation method and device based on multi-feature fusion, relating to the field of video processing, and comprising the following steps: performing feature extraction on a reference texture video sequence and a distorted texture video sequence with a 3D-LOG filter to obtain a reference texture feature and a distorted texture feature, calculating the texture feature similarity, and obtaining a texture video quality score through a 3D-LOG pooling strategy based on that similarity; calculating a reference depth feature and a distorted depth feature from the reference depth video sequence and the distorted depth video sequence; calculating the depth feature similarity from the reference depth feature and the distorted depth feature, determining a gradient weight, and calculating a depth video quality score from the depth feature similarity and the gradient weight; and calculating the quality score of the immersive video to be evaluated from the texture video quality score and the depth video quality score. This addresses the problem that existing video evaluation algorithms fit neither the visual characteristics of the human eye nor the characteristics of immersive video.
Description
Technical Field
The invention relates to the field of video processing, in particular to an immersive video quality evaluation method and device based on multi-feature fusion.
Background
With the rapid development of high-speed network transmission, video acquisition, video processing and display technologies, and the growing demand for immersive experience, immersive video is entering a period of explosive growth, has become a research hotspot in video technology, and is widely applied in fields such as remote office, intelligent transportation and commercial broadcasting. Compared with traditional video, immersive video is characterized by an ultra-wide viewing angle, high degrees of freedom and high resolution, offering strong immersion and interactivity. The high degrees of freedom are embodied in the ability to provide translational movement along three axes and rotational movement about three axes while a person views the video, giving the user a feeling of being physically present in the scene.
Immersive video mainly adopts a multi-view texture-plus-depth video format and can be generated by computer or captured by camera. During processing it suffers various distortions, which weaken the visual expressiveness of the immersive scene, degrade the user experience, and lower the subjective perceptual quality. It is therefore important to propose an algorithm that conforms to the visual characteristics of the human eye and can evaluate the quality of immersive video accurately and quickly.
Most current video quality evaluation algorithms concentrate on natural video, but because immersive video differs from natural video in content, video format, and spatio-temporal characteristics, directly migrating quality evaluation algorithms designed for natural video to immersive video performs relatively poorly. Designing a video quality evaluation algorithm that conforms to human visual characteristics and the characteristics of immersive video therefore has important theoretical research significance and practical application value.
Disclosure of Invention
In view of the above technical problems, the application aims to provide an immersive video quality evaluation method and device based on multi-feature fusion.
In a first aspect, the invention provides an immersive video quality evaluation method based on multi-feature fusion, comprising the following steps:
Acquiring a reference immersive video and an immersive video to be evaluated, wherein the reference immersive video comprises a reference texture video sequence and a reference depth video sequence, and the immersive video to be evaluated comprises a distorted texture video sequence and a distorted depth video sequence; performing feature extraction on the reference texture video sequence and the distorted texture video sequence with a 3D-LOG filter to obtain a reference texture feature and a distorted texture feature; calculating the texture feature similarity according to the reference texture feature and the distorted texture feature, and obtaining a texture video quality score through a 3D-LOG pooling strategy based on the texture feature similarity;
Calculating a reference depth feature and a distorted depth feature according to the reference depth video sequence and the distorted depth video sequence; calculating the depth feature similarity according to the reference depth feature and the distorted depth feature, determining a gradient weight, and calculating a depth video quality score according to the depth feature similarity and the gradient weight;
And calculating according to the texture video quality score and the depth video quality score to obtain the quality score of the immersive video to be evaluated.
Preferably, the feature extraction is performed on the reference texture video sequence and the distorted texture video sequence by using a 3D-LOG filter to obtain the reference texture feature and the distorted texture feature, which specifically comprises:
The 3D-LOG filter is calculated as follows:
$\mathrm{LOG}(x,y,t)=\nabla^{2}G_{\sigma}(x,y,t)=\dfrac{x^{2}+y^{2}+t^{2}-3\sigma^{2}}{\sigma^{4}}\,G_{\sigma}(x,y,t)$;
wherein $(x,y,t)$ are the horizontal, vertical and temporal coordinates in the space-time domain, $\sigma$ is the standard deviation of the 3D Gaussian kernel function $G_{\sigma}(x,y,t)=\frac{1}{(2\pi)^{3/2}\sigma^{3}}e^{-(x^{2}+y^{2}+t^{2})/(2\sigma^{2})}$, and $\mathrm{LOG}(x,y,t)$ is the function corresponding to the 3D-LOG filter;
The reference texture video sequence and the distorted texture video sequence are respectively input into the 3D-LOG filter for convolution to obtain the reference texture feature and the distorted texture feature, as shown in the following formulas:
$F_{r}(x,y,t)=I_{r}(x,y,t)*\mathrm{LOG}(x,y,t)$;
$F_{d}(x,y,t)=I_{d}(x,y,t)*\mathrm{LOG}(x,y,t)$;
wherein $I_{r}(x,y,t)$ and $I_{d}(x,y,t)$ are the luminance values at each pixel of the input reference texture video sequence and distorted texture video sequence, $F_{r}(x,y,t)$ and $F_{d}(x,y,t)$ are the reference texture feature and the distorted texture feature extracted by the 3D-LOG filter, and the symbol $*$ denotes the convolution operation.
Preferably, the texture feature similarity is calculated according to the reference texture feature and the distorted texture feature, and the texture video quality score is obtained through a 3D-LOG pooling strategy based on the texture feature similarity, which specifically comprises:
The texture feature similarity $S_{T}$ is calculated as follows:
$S_{T}(x,y,t)=\dfrac{2F_{r}(x,y,t)\,F_{d}(x,y,t)+C_{1}}{F_{r}(x,y,t)^{2}+F_{d}(x,y,t)^{2}+C_{1}}$;
wherein $F_{r}$ and $F_{d}$ respectively denote the reference texture feature and the distorted texture feature extracted through the 3D-LOG filter, and $C_{1}$ is a constant that keeps the value numerically stable;
The maximum of the reference texture feature and the distorted texture feature is taken as the texture weight $W_{T}$, as shown in the following formula:
$W_{T}(x,y,t)=\max\big(F_{r}(x,y,t),\,F_{d}(x,y,t)\big)$;
wherein max denotes taking the larger of the two;
The texture weight and the texture feature similarity are combined by weighted calculation to obtain the texture video quality score $Q_{T}$, as shown in the following formula:
$Q_{T}=\dfrac{\sum_{x,y,t}W_{T}(x,y,t)\,S_{T}(x,y,t)}{\sum_{x,y,t}W_{T}(x,y,t)}$,
where the sums run over all space-time positions of the sequence.
Preferably, the reference depth feature and the distorted depth feature are calculated according to the reference depth video sequence and the distorted depth video sequence, which specifically comprises:
Gradient magnitudes are calculated for the reference depth video sequence and the distorted depth video sequence respectively to obtain the corresponding depth features:
$g_{h}(x,y)=D(x,y)*p_{h}$;
$g_{v}(x,y)=D(x,y)*p_{v}$;
$G(x,y)=\sqrt{g_{h}(x,y)^{2}+g_{v}(x,y)^{2}}$;
wherein $D(x,y)$ denotes a depth video frame, $p_{h}$ and $p_{v}$ denote the partial-derivative filters in the horizontal and vertical directions respectively, $g_{h}$ and $g_{v}$ denote the gradient magnitude components in the horizontal and vertical directions respectively, and $G$ denotes the depth feature; the depth features calculated from the depth video frames of the reference depth video sequence and the distorted depth video sequence are respectively the reference depth feature $G_{r}$ and the distorted depth feature $G_{d}$, and the symbol $*$ denotes the convolution operation.
Preferably, the depth feature similarity is calculated according to the reference depth feature and the distorted depth feature, the gradient weight is determined, and the depth video quality score is calculated according to the depth feature similarity and the gradient weight, which specifically comprises:
The depth feature similarity is calculated as follows:
$S_{D}(x,y)=\dfrac{2G_{r}(x,y)\,G_{d}(x,y)+C_{2}}{G_{r}(x,y)^{2}+G_{d}(x,y)^{2}+C_{2}}$;
wherein $G_{r}$ and $G_{d}$ respectively denote the reference depth feature and the distorted depth feature, $S_{D}$ denotes the depth feature similarity, and $C_{2}$ is another constant that keeps the value numerically stable;
The maximum of the reference depth feature and the distorted depth feature is taken as the gradient weight $W_{D}$, as shown in the following formula:
$W_{D}(x,y)=\max\big(G_{r}(x,y),\,G_{d}(x,y)\big)$;
wherein max denotes taking the larger of the two;
The gradient weight and the depth feature similarity are combined by weighted calculation to obtain the depth video quality score $Q_{D}$, as shown in the following formula:
$Q_{D}=\dfrac{\sum_{x,y}W_{D}(x,y)\,S_{D}(x,y)}{\sum_{x,y}W_{D}(x,y)}$,
where the sums run over all pixels of the sequence.
Preferably, the quality score of the immersive video to be evaluated is calculated according to the texture video quality score and the depth video quality score, which specifically comprises:
An importance calculation is carried out on the texture video quality score and the depth video quality score to obtain an importance score $S_{i}$, as shown in the following formula:
$S_{i}=\alpha\,Q_{T,i}+(1-\alpha)\,Q_{D,i}$;
wherein $Q_{T,i}$ and $Q_{D,i}$ are the texture and depth video quality scores of the $i$-th sequence, and $\alpha$ is a parameter for adjusting the relative importance between the texture feature and the depth feature;
The maximum of the absolute value of the texture video quality score and the absolute value of the depth video quality score is taken as the evaluation weight $W_{i}$, as shown in the following formula:
$W_{i}=\max\big(|Q_{T,i}|,\,|Q_{D,i}|\big)$;
wherein max denotes taking the larger of the two and $|\cdot|$ denotes the absolute value;
The evaluation weight and the importance score are combined by weighted calculation to obtain the quality score MMF of the immersive video to be evaluated, as shown in the following formula:
$\mathrm{MMF}=\dfrac{\sum_{i=1}^{N}W_{i}\,S_{i}}{\sum_{i=1}^{N}W_{i}}$;
where $N$ denotes the number of immersive video sequences to be evaluated and $i=1,2,\ldots,N$.
In a second aspect, the present invention provides an immersive video quality evaluation device based on multi-feature fusion, comprising:
a texture video quality score calculation module configured to acquire a reference immersive video and an immersive video to be evaluated, wherein the reference immersive video comprises a reference texture video sequence and a reference depth video sequence, and the immersive video to be evaluated comprises a distorted texture video sequence and a distorted depth video sequence; to perform feature extraction on the reference texture video sequence and the distorted texture video sequence with a 3D-LOG filter to obtain the reference texture feature and the distorted texture feature; to calculate the texture feature similarity according to the reference texture feature and the distorted texture feature; and to obtain the texture video quality score through a 3D-LOG pooling strategy based on the texture feature similarity;
a depth video quality score calculation module configured to calculate the reference depth feature and the distorted depth feature according to the reference depth video sequence and the distorted depth video sequence; to calculate the depth feature similarity according to the reference depth feature and the distorted depth feature and determine the gradient weight; and to calculate the depth video quality score according to the depth feature similarity and the gradient weight;
and a quality score calculation module configured to calculate the quality score of the immersive video to be evaluated according to the texture video quality score and the depth video quality score.
In a third aspect, the present invention provides an electronic device comprising one or more processors; and storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method as described in any of the implementations of the first aspect.
In a fourth aspect, the present invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method as described in any of the implementations of the first aspect.
In a fifth aspect, the invention provides a computer program product comprising a computer program which, when executed by a processor, implements a method as described in any of the implementations of the first aspect.
Compared with the prior art, the invention has the following beneficial effects:
(1) The immersive video quality evaluation method based on multi-feature fusion provided by the invention takes into account that immersive video contains not only complex edge information and the spatio-temporal texture features of motion variation, but also the depth information that provides the sense of immersion and the high degrees of freedom, and it explicitly considers the characteristics of the human visual system and of immersive video. The 3D-LOG filter is therefore used to extract the edge and contour information of the texture video in the spatial domain and its motion information in the temporal domain, while the gradient magnitude in the depth video is calculated to perceive the quality degradation caused by flicker distortion. The texture video quality score and the depth video quality score are obtained by weighting the extracted texture and depth features, and finally a weighting strategy designed around human visual characteristics measures the contribution of texture and depth to the immersive video, yielding a quality score consistent with the perception of the human visual system.
(2) The immersive video quality evaluation method based on multi-feature fusion provided by the invention considers the visual characteristics of human eyes and the characteristics of immersive video from multiple aspects, and achieves better video quality evaluation performance.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of an immersive video quality evaluation method based on multi-feature fusion according to an embodiment of the present application;
FIG. 2 is a flow chart of an immersive video quality evaluation method based on multi-feature fusion according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an immersive video quality evaluation device based on multi-feature fusion according to an embodiment of the present application;
Fig. 4 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Fig. 1 shows an immersive video quality evaluation method based on multi-feature fusion, comprising the following steps:
S1, acquiring a reference immersive video and an immersive video to be evaluated, wherein the reference immersive video comprises a reference texture video sequence and a reference depth video sequence, and the immersive video to be evaluated comprises a distorted texture video sequence and a distorted depth video sequence; performing feature extraction on the reference texture video sequence and the distorted texture video sequence with a 3D-LOG filter to obtain the reference texture feature and the distorted texture feature; and calculating the texture feature similarity according to the reference texture feature and the distorted texture feature, and obtaining the texture video quality score through a 3D-LOG pooling strategy based on the texture feature similarity.
In a specific embodiment, the feature extraction is performed on the reference texture video sequence and the distorted texture video sequence by a 3D-LOG filter to obtain the reference texture feature and the distorted texture feature, which specifically comprises:
The 3D-LOG filter is calculated as follows:
$\mathrm{LOG}(x,y,t)=\nabla^{2}G_{\sigma}(x,y,t)=\dfrac{x^{2}+y^{2}+t^{2}-3\sigma^{2}}{\sigma^{4}}\,G_{\sigma}(x,y,t)$;
wherein $(x,y,t)$ are the horizontal, vertical and temporal coordinates in the space-time domain, $\sigma$ is the standard deviation of the 3D Gaussian kernel function $G_{\sigma}(x,y,t)=\frac{1}{(2\pi)^{3/2}\sigma^{3}}e^{-(x^{2}+y^{2}+t^{2})/(2\sigma^{2})}$, and $\mathrm{LOG}(x,y,t)$ is the function corresponding to the 3D-LOG filter;
The reference texture video sequence and the distorted texture video sequence are respectively input into the 3D-LOG filter for convolution to obtain the reference texture feature and the distorted texture feature, as shown in the following formulas:
$F_{r}(x,y,t)=I_{r}(x,y,t)*\mathrm{LOG}(x,y,t)$;
$F_{d}(x,y,t)=I_{d}(x,y,t)*\mathrm{LOG}(x,y,t)$;
wherein $I_{r}(x,y,t)$ and $I_{d}(x,y,t)$ are the luminance values at each pixel of the input reference texture video sequence and distorted texture video sequence, $F_{r}(x,y,t)$ and $F_{d}(x,y,t)$ are the reference texture feature and the distorted texture feature extracted by the 3D-LOG filter, and the symbol $*$ denotes the convolution operation.
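As an illustration of this step, the following minimal sketch builds a 3D-LOG kernel and convolves the texture sequences with it. It assumes NumPy/SciPy and (T, H, W) luminance arrays; the kernel support `half_size` and standard deviation `sigma` are illustrative choices, since the embodiment does not publish the values it uses.

```python
import numpy as np
from scipy.ndimage import convolve

def log3d_kernel(half_size=3, sigma=1.5):
    """3D Laplacian-of-Gaussian kernel over (x, y, t).

    half_size and sigma are illustrative; the embodiment does not
    disclose the support or standard deviation it uses.
    """
    ax = np.arange(-half_size, half_size + 1, dtype=np.float64)
    x, y, t = np.meshgrid(ax, ax, ax, indexing="ij")
    r2 = x**2 + y**2 + t**2
    gauss = np.exp(-r2 / (2.0 * sigma**2))
    gauss /= gauss.sum()                      # normalized 3D Gaussian G_sigma
    log = (r2 - 3.0 * sigma**2) / sigma**4 * gauss
    return log - log.mean()                   # zero mean: flat regions respond with zero

def extract_texture_features(ref_seq, dist_seq, kernel):
    """Convolve (T, H, W) luminance sequences with the 3D-LOG kernel."""
    f_r = convolve(ref_seq.astype(np.float64), kernel, mode="nearest")
    f_d = convolve(dist_seq.astype(np.float64), kernel, mode="nearest")
    return f_r, f_d
```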
In a specific embodiment, the texture feature similarity is calculated according to the reference texture feature and the distorted texture feature, and the texture video quality score is obtained through a 3D-LOG pooling strategy based on the texture feature similarity, which specifically comprises:
The texture feature similarity $S_{T}$ is calculated as follows:
$S_{T}(x,y,t)=\dfrac{2F_{r}(x,y,t)\,F_{d}(x,y,t)+C_{1}}{F_{r}(x,y,t)^{2}+F_{d}(x,y,t)^{2}+C_{1}}$;
wherein $F_{r}$ and $F_{d}$ respectively denote the reference texture feature and the distorted texture feature extracted through the 3D-LOG filter, and $C_{1}$ is a constant that keeps the value numerically stable;
The maximum of the reference texture feature and the distorted texture feature is taken as the texture weight $W_{T}$, as shown in the following formula:
$W_{T}(x,y,t)=\max\big(F_{r}(x,y,t),\,F_{d}(x,y,t)\big)$;
wherein max denotes taking the larger of the two;
The texture weight and the texture feature similarity are combined by weighted calculation to obtain the texture video quality score $Q_{T}$, as shown in the following formula:
$Q_{T}=\dfrac{\sum_{x,y,t}W_{T}(x,y,t)\,S_{T}(x,y,t)}{\sum_{x,y,t}W_{T}(x,y,t)}$,
where the sums run over all space-time positions of the sequence.
Specifically, referring to fig. 2, the reference texture video sequence and the reference depth video sequence are extracted directly from the reference immersive video without further processing, so both are in an undistorted state, while the distorted texture video sequence and the distorted depth video sequence are extracted from the immersive video to be evaluated after processing. First, feature extraction is performed on the reference texture video sequence and the distorted texture video sequence with the 3D-LOG filter to obtain the reference texture feature $F_{r}$ and the distorted texture feature $F_{d}$. The 3D-LOG filter closely simulates the feedback behaviour of human visual neurons in image processing and comprehensively measures the spatio-temporal quality degradation of texture regions in the immersive video. The texture feature similarity $S_{T}$ is then calculated from the reference and distorted texture features, with the constant $C_{1}$ in its formula set empirically. Finally, the texture video quality score $Q_{T}$ is obtained through the 3D-LOG pooling strategy based on the texture feature similarity.
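Continuing the sketch above, the similarity map and the max-weighted pooling of this step can be written as follows. Here `c1` is a placeholder for the undisclosed stabilizing constant $C_{1}$, and absolute values are compared in the weight because LoG responses are signed, a detail the published text leaves open.

```python
def texture_quality_score(f_r, f_d, c1=1e-4):
    """Texture branch score Q_T from 3D-LOG feature maps.

    c1 stands in for the stabilizing constant C1, whose fitted value
    is not disclosed in the published text.
    """
    sim = (2.0 * f_r * f_d + c1) / (f_r**2 + f_d**2 + c1)
    # Pooling weight: the stronger response at each space-time position.
    # LoG responses are signed, so magnitudes are compared here.
    w = np.maximum(np.abs(f_r), np.abs(f_d))
    return float((w * sim).sum() / (w.sum() + 1e-12))
```

Under these assumptions, a call such as `texture_quality_score(*extract_texture_features(ref, dist, log3d_kernel()))` would yield $Q_{T}$ for one pair of sequences.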
S2, calculating the reference depth feature and the distorted depth feature according to the reference depth video sequence and the distorted depth video sequence; calculating the depth feature similarity according to the reference depth feature and the distorted depth feature, determining the gradient weight, and calculating the depth video quality score according to the depth feature similarity and the gradient weight.
In a specific embodiment, gradient magnitudes are calculated for the reference depth video sequence and the distorted depth video sequence respectively, resulting in the corresponding depth features:
$g_{h}(x,y)=D(x,y)*p_{h}$;
$g_{v}(x,y)=D(x,y)*p_{v}$;
$G(x,y)=\sqrt{g_{h}(x,y)^{2}+g_{v}(x,y)^{2}}$;
wherein $D(x,y)$ denotes a depth video frame, $p_{h}$ and $p_{v}$ denote the partial-derivative filters in the horizontal and vertical directions respectively, $g_{h}$ and $g_{v}$ denote the gradient magnitude components in the horizontal and vertical directions respectively, and $G$ denotes the depth feature; the depth features calculated from the depth video frames of the reference depth video sequence and the distorted depth video sequence are respectively the reference depth feature $G_{r}$ and the distorted depth feature $G_{d}$, and the symbol $*$ denotes the convolution operation.
In a specific embodiment, the depth feature similarity is calculated according to the reference depth feature and the distorted depth feature, the gradient weight is determined, and the depth video quality score is calculated according to the depth feature similarity and the gradient weight, which specifically comprises:
The depth feature similarity is calculated as follows:
$S_{D}(x,y)=\dfrac{2G_{r}(x,y)\,G_{d}(x,y)+C_{2}}{G_{r}(x,y)^{2}+G_{d}(x,y)^{2}+C_{2}}$;
wherein $G_{r}$ and $G_{d}$ respectively denote the reference depth feature and the distorted depth feature, $S_{D}$ denotes the depth feature similarity, and $C_{2}$ is another constant that keeps the value numerically stable;
The maximum of the reference depth feature and the distorted depth feature is taken as the gradient weight $W_{D}$, as shown in the following formula:
$W_{D}(x,y)=\max\big(G_{r}(x,y),\,G_{d}(x,y)\big)$;
wherein max denotes taking the larger of the two;
The gradient weight and the depth feature similarity are combined by weighted calculation to obtain the depth video quality score $Q_{D}$, as shown in the following formula:
$Q_{D}=\dfrac{\sum_{x,y}W_{D}(x,y)\,S_{D}(x,y)}{\sum_{x,y}W_{D}(x,y)}$,
where the sums run over all pixels of the sequence.
Specifically, gradient magnitudes are calculated for the reference depth video sequence and the distorted depth video sequence and serve as the key features for locating flicker distortion, yielding the reference depth feature $G_{r}$ and the distorted depth feature $G_{d}$. The depth feature similarity $S_{D}$ is then calculated from the reference and distorted depth features, with the constant $C_{2}$ in its formula determined through extensive experiments. The gradient weight is computed as the larger of the reference and distorted gradient features, assigning different degrees of attention to different pixels to simulate how human eyes watch video, and is then used to weight the depth feature similarity to obtain the depth video quality score $Q_{D}$.
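A corresponding sketch of the depth branch follows, under the same assumptions. The Prewitt operator stands in for the unspecified horizontal and vertical derivative filters, and `c2` is again a placeholder for the undisclosed constant $C_{2}$.

```python
from scipy.ndimage import prewitt

def depth_features(depth_seq):
    """Per-frame gradient magnitude of a (T, H, W) depth sequence."""
    g = np.empty(depth_seq.shape, dtype=np.float64)
    for k, frame in enumerate(depth_seq.astype(np.float64)):
        gh = prewitt(frame, axis=1)    # horizontal partial derivative g_h
        gv = prewitt(frame, axis=0)    # vertical partial derivative g_v
        g[k] = np.hypot(gh, gv)        # gradient magnitude G
    return g

def depth_quality_score(g_r, g_d, c2=1e-4):
    """Depth branch score Q_D: gradient-weighted similarity pooling."""
    sim = (2.0 * g_r * g_d + c2) / (g_r**2 + g_d**2 + c2)
    w = np.maximum(g_r, g_d)           # gradient weight: larger response dominates
    return float((w * sim).sum() / (w.sum() + 1e-12))
```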
And S3, calculating according to the texture video quality score and the depth video quality score to obtain the quality score of the immersive video to be evaluated.
In a specific embodiment, step S3 specifically comprises:
An importance calculation is carried out on the texture video quality score and the depth video quality score to obtain an importance score $S_{i}$, as shown in the following formula:
$S_{i}=\alpha\,Q_{T,i}+(1-\alpha)\,Q_{D,i}$;
wherein $Q_{T,i}$ and $Q_{D,i}$ are the texture and depth video quality scores of the $i$-th sequence, and $\alpha$ is a parameter for adjusting the relative importance between the texture feature and the depth feature;
The maximum of the absolute value of the texture video quality score and the absolute value of the depth video quality score is taken as the evaluation weight $W_{i}$, as shown in the following formula:
$W_{i}=\max\big(|Q_{T,i}|,\,|Q_{D,i}|\big)$;
wherein max denotes taking the larger of the two and $|\cdot|$ denotes the absolute value;
The evaluation weight and the importance score are combined by weighted calculation to obtain the quality score MMF of the immersive video to be evaluated, as shown in the following formula:
$\mathrm{MMF}=\dfrac{\sum_{i=1}^{N}W_{i}\,S_{i}}{\sum_{i=1}^{N}W_{i}}$;
where $N$ denotes the number of immersive video sequences to be evaluated and $i=1,2,\ldots,N$.
Specifically, the texture features extracted by the 3D-LOG filter and the depth features obtained from the gradient magnitude are combined as the evaluation index of immersive video quality, and, in line with human visual characteristics, the larger of the texture video quality score and the depth video quality score drives the weighting strategy that characterizes the quality score of the immersive video. In the calculation, $\alpha$ is the parameter that adjusts the relative importance between the texture feature and the depth feature, and its value is set through experimental analysis.
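The fusion of the two branch scores could then look as follows. This is a sketch, not the authors' exact formula: the linear combination for the importance score is one natural reading of the description, and `alpha=0.5` is a placeholder for the experimentally fitted value, which the published text elides.

```python
def mmf_score(texture_scores, depth_scores, alpha=0.5):
    """Fuse per-sequence scores (Q_T, Q_D) into one MMF score.

    alpha balances texture against depth; 0.5 is a placeholder for
    the experimentally chosen value, which is not disclosed. The
    linear importance score is an assumed form.
    """
    num = den = 0.0
    for q_t, q_d in zip(texture_scores, depth_scores):
        s = alpha * q_t + (1.0 - alpha) * q_d   # importance score S_i
        w = max(abs(q_t), abs(q_d))             # evaluation weight W_i
        num += w * s
        den += w
    return num / (den + 1e-12)
```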
The labels of steps S1-S3 above do not impose a fixed execution order; the order between the steps is adjustable.
The advantages of the process according to the invention are demonstrated by the specific examples and data below.
Table 1: Overall performance comparison of the proposed MMF method and other algorithms on the SIMVD database:
Table 1 compares the experimental results of the proposed immersive video quality evaluation method (denoted MMF) with other advanced algorithms on the immersive video database SIMVD, where SSIM, MS-SSIM, GMSD, VSI, FSIM, SPSIM, GSS, ESIM, GFM, SpEED, ViS3, STMAD, VMAF, SGFTM and IV-PSNR are the names of the other algorithms. PLCC (Pearson linear correlation coefficient), SROCC (Spearman rank-order correlation coefficient) and RMSE (root mean square error) are three general criteria, the classical correlation measures used in the video quality evaluation field to assess an evaluation method: the closer PLCC and SROCC are to 1 and the smaller RMSE is, the higher the correlation between the objective algorithm's results and the subjective evaluation results, and the better the algorithm. In Table 1 the three best-performing algorithms are shown in bold. The PLCC and SROCC values obtained by the proposed multi-feature-fusion method are closer to 1 than those of the other algorithms and its RMSE is smaller, indicating a higher correlation with the subjective evaluation results and a superior evaluation of immersive video quality.
With further reference to fig. 3, as an implementation of the method shown in the foregoing figures, the present application provides an embodiment of an immersive video quality evaluation device based on multi-feature fusion; the device embodiment corresponds to the method embodiment shown in fig. 1, and the device is applicable to various electronic devices.
The embodiment of the application provides an immersive video quality evaluation device based on multi-feature fusion, comprising:
The texture video quality score calculation module 1 is configured to acquire a reference immersive video and an immersive video to be evaluated, wherein the reference immersive video comprises a reference texture video sequence and a reference depth video sequence, and the immersive video to be evaluated comprises a distorted texture video sequence and a distorted depth video sequence; to perform feature extraction on the reference texture video sequence and the distorted texture video sequence with a 3D-LOG filter to obtain the reference texture feature and the distorted texture feature; to calculate the texture feature similarity according to the reference texture feature and the distorted texture feature; and to obtain the texture video quality score through a 3D-LOG pooling strategy based on the texture feature similarity.
The depth video quality score calculation module 2 is configured to calculate the reference depth feature and the distorted depth feature according to the reference depth video sequence and the distorted depth video sequence; to calculate the depth feature similarity according to the reference depth feature and the distorted depth feature and determine the gradient weight; and to calculate the depth video quality score according to the depth feature similarity and the gradient weight.
The quality score calculation module 3 is configured to calculate the quality score of the immersive video to be evaluated according to the texture video quality score and the depth video quality score.
Fig. 4 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present invention. As shown in fig. 4, the electronic device of this embodiment includes: a processor 401 and a memory 402, wherein the memory 402 is used to store computer-executable instructions and the processor 401 is configured to execute the computer-executable instructions stored in the memory to implement the steps executed by the electronic device in the above-described embodiments. Reference may be made to the relevant description of the method embodiments above.
Alternatively, the memory 402 may be separate or integrated with the processor 401.
When the memory 402 is provided separately, the electronic device further comprises a bus 403 for connecting the memory 402 and the processor 401.
The embodiment of the invention also provides a computer storage medium, wherein computer execution instructions are stored in the computer storage medium, and when a processor executes the computer execution instructions, the method is realized.
The embodiment of the invention also provides a computer program product, comprising a computer program, which realizes the method when being executed by a processor.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the above-described device embodiments are merely illustrative, e.g., the division of modules is merely a logical function division, and there may be additional divisions of actual implementation, e.g., multiple modules may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms.
The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to implement the solution of this embodiment.
In addition, each functional module in the embodiments of the present invention may be integrated in one processing unit, or each module may exist alone physically, or two or more modules may be integrated in one unit. The units formed by the modules can be realized in a form of hardware or a form of hardware and software functional units.
The integrated modules, which are implemented in the form of software functional modules, may be stored in a computer readable storage medium. The software functional modules described above are stored in a storage medium and include instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or processor to perform some of the steps of the methods of the various embodiments of the application.
It should be appreciated that the processor may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor for execution, or executed by a combination of hardware and software modules in a processor.
The memory may comprise a high-speed RAM memory, and may further comprise a non-volatile memory NVM, such as at least one magnetic disk memory, and may also be a U-disk, a removable hard disk, a read-only memory, a magnetic disk or optical disk, etc.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, among others. Buses may be divided into address buses, data buses, control buses, etc. For ease of illustration, the buses in the drawings of the present application are not limited to only one bus or one type of bus.
The storage medium may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an Application Specific Integrated Circuit (ASIC). The processor and the storage medium may also reside as discrete components in an electronic device or a master device.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the method embodiments described above may be performed by hardware associated with program instructions. The foregoing program may be stored in a computer readable storage medium. The program, when executed, performs steps including the method embodiments described above; and the aforementioned storage medium includes: various media that can store program code, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.
Claims (10)
1. An immersive video quality evaluation method based on multi-feature fusion, characterized by comprising the following steps:
acquiring a reference immersive video and an immersive video to be evaluated, wherein the reference immersive video comprises a reference texture video sequence and a reference depth video sequence, and the immersive video to be evaluated comprises a distorted texture video sequence and a distorted depth video sequence; extracting features of the reference texture video sequence and the distorted texture video sequence by a 3D-LOG filter to obtain a reference texture feature and a distorted texture feature; calculating the texture feature similarity according to the reference texture feature and the distorted texture feature, and obtaining a texture video quality score through a 3D-LOG pooling strategy based on the texture feature similarity;
calculating a reference depth feature and a distorted depth feature according to the reference depth video sequence and the distorted depth video sequence; calculating the depth feature similarity according to the reference depth feature and the distorted depth feature, determining a gradient weight, and calculating a depth video quality score according to the depth feature similarity and the gradient weight;
and calculating the quality score of the immersive video to be evaluated according to the texture video quality score and the depth video quality score.
2. The immersive video quality evaluation method based on multi-feature fusion according to claim 1, wherein the feature extraction performed on the reference texture video sequence and the distorted texture video sequence by the 3D-LOG filter to obtain the reference texture feature and the distorted texture feature specifically comprises:
the 3D-LOG filter is calculated as follows:
$\mathrm{LOG}(x,y,t)=\nabla^{2}G_{\sigma}(x,y,t)=\dfrac{x^{2}+y^{2}+t^{2}-3\sigma^{2}}{\sigma^{4}}\,G_{\sigma}(x,y,t)$;
wherein $(x,y,t)$ are the horizontal, vertical and temporal coordinates in the space-time domain, $\sigma$ is the standard deviation of the 3D Gaussian kernel function $G_{\sigma}(x,y,t)$, and $\mathrm{LOG}(x,y,t)$ is the function corresponding to the 3D-LOG filter;
the reference texture video sequence and the distorted texture video sequence are respectively input into the 3D-LOG filter for convolution to obtain the reference texture feature and the distorted texture feature, as shown in the following formulas:
$F_{r}(x,y,t)=I_{r}(x,y,t)*\mathrm{LOG}(x,y,t)$;
$F_{d}(x,y,t)=I_{d}(x,y,t)*\mathrm{LOG}(x,y,t)$;
wherein $I_{r}(x,y,t)$ and $I_{d}(x,y,t)$ are the luminance values at each pixel of the input reference texture video sequence and distorted texture video sequence, $F_{r}(x,y,t)$ and $F_{d}(x,y,t)$ are the reference texture feature and the distorted texture feature extracted by the 3D-LOG filter, and the symbol $*$ denotes the convolution operation.
3. The immersive video quality evaluation method based on multi-feature fusion according to claim 1, wherein calculating the texture feature similarity according to the reference texture feature and the distorted texture feature and obtaining the texture video quality score through the 3D-LOG pooling strategy based on the texture feature similarity specifically comprises:
the texture feature similarity $S_{T}$ is calculated as follows:
$S_{T}(x,y,t)=\dfrac{2F_{r}(x,y,t)\,F_{d}(x,y,t)+C_{1}}{F_{r}(x,y,t)^{2}+F_{d}(x,y,t)^{2}+C_{1}}$;
wherein $F_{r}$ and $F_{d}$ respectively denote the reference texture feature and the distorted texture feature extracted through the 3D-LOG filter, and $C_{1}$ is a constant that keeps the value numerically stable;
the maximum of the reference texture feature and the distorted texture feature is taken as the texture weight $W_{T}$, as shown in the following formula:
$W_{T}(x,y,t)=\max\big(F_{r}(x,y,t),\,F_{d}(x,y,t)\big)$;
wherein max denotes taking the larger of the two;
the texture weight and the texture feature similarity are combined by weighted calculation to obtain the texture video quality score $Q_{T}$, as shown in the following formula:
$Q_{T}=\dfrac{\sum_{x,y,t}W_{T}(x,y,t)\,S_{T}(x,y,t)}{\sum_{x,y,t}W_{T}(x,y,t)}$.
4. The immersive video quality evaluation method based on multi-feature fusion according to claim 1, wherein calculating the reference depth feature and the distorted depth feature according to the reference depth video sequence and the distorted depth video sequence specifically comprises:
gradient magnitudes are calculated for the reference depth video sequence and the distorted depth video sequence respectively to obtain the corresponding depth features:
$g_{h}(x,y)=D(x,y)*p_{h}$;
$g_{v}(x,y)=D(x,y)*p_{v}$;
$G(x,y)=\sqrt{g_{h}(x,y)^{2}+g_{v}(x,y)^{2}}$;
wherein $D(x,y)$ denotes a depth video frame, $p_{h}$ and $p_{v}$ denote the partial-derivative filters in the horizontal and vertical directions respectively, $g_{h}$ and $g_{v}$ denote the gradient magnitude components in the horizontal and vertical directions respectively, and $G$ denotes the depth feature; the depth features calculated from the depth video frames of the reference depth video sequence and the distorted depth video sequence are respectively the reference depth feature $G_{r}$ and the distorted depth feature $G_{d}$, and the symbol $*$ denotes the convolution operation.
5. The immersive video quality evaluation method based on multi-feature fusion according to claim 1, wherein calculating the depth feature similarity according to the reference depth feature and the distorted depth feature, determining the gradient weight, and calculating the depth video quality score according to the depth feature similarity and the gradient weight specifically comprises:
the depth feature similarity is calculated as follows:
$S_{D}(x,y)=\dfrac{2G_{r}(x,y)\,G_{d}(x,y)+C_{2}}{G_{r}(x,y)^{2}+G_{d}(x,y)^{2}+C_{2}}$;
wherein $G_{r}$ and $G_{d}$ respectively denote the reference depth feature and the distorted depth feature, $S_{D}$ denotes the depth feature similarity, and $C_{2}$ is another constant that keeps the value numerically stable;
the maximum of the reference depth feature and the distorted depth feature is taken as the gradient weight $W_{D}$, as shown in the following formula:
$W_{D}(x,y)=\max\big(G_{r}(x,y),\,G_{d}(x,y)\big)$;
wherein max denotes taking the larger of the two;
the gradient weight and the depth feature similarity are combined by weighted calculation to obtain the depth video quality score $Q_{D}$, as shown in the following formula:
$Q_{D}=\dfrac{\sum_{x,y}W_{D}(x,y)\,S_{D}(x,y)}{\sum_{x,y}W_{D}(x,y)}$.
6. The immersive video quality evaluation method based on multi-feature fusion according to claim 1, wherein calculating the quality score of the immersive video to be evaluated according to the texture video quality score and the depth video quality score specifically comprises:
an importance calculation is carried out on the texture video quality score and the depth video quality score to obtain an importance score $S_{i}$, as shown in the following formula:
$S_{i}=\alpha\,Q_{T,i}+(1-\alpha)\,Q_{D,i}$;
wherein $Q_{T,i}$ and $Q_{D,i}$ are the texture and depth video quality scores of the $i$-th sequence, and $\alpha$ is a parameter for adjusting the relative importance between the texture feature and the depth feature;
the maximum of the absolute value of the texture video quality score and the absolute value of the depth video quality score is taken as the evaluation weight $W_{i}$, as shown in the following formula:
$W_{i}=\max\big(|Q_{T,i}|,\,|Q_{D,i}|\big)$;
wherein max denotes taking the larger of the two and $|\cdot|$ denotes the absolute value;
the evaluation weight and the importance score are combined by weighted calculation to obtain the quality score MMF of the immersive video to be evaluated, as shown in the following formula:
$\mathrm{MMF}=\dfrac{\sum_{i=1}^{N}W_{i}\,S_{i}}{\sum_{i=1}^{N}W_{i}}$;
wherein $N$ denotes the number of immersive video sequences to be evaluated and $i=1,2,\ldots,N$.
7. An immersive video quality evaluation device based on multi-feature fusion, comprising:
a texture video quality score calculation module configured to acquire a reference immersive video and an immersive video to be evaluated, wherein the reference immersive video comprises a reference texture video sequence and a reference depth video sequence, and the immersive video to be evaluated comprises a distorted texture video sequence and a distorted depth video sequence; to perform feature extraction on the reference texture video sequence and the distorted texture video sequence with a 3D-LOG filter to obtain the reference texture feature and the distorted texture feature; to calculate the texture feature similarity according to the reference texture feature and the distorted texture feature; and to obtain the texture video quality score through a 3D-LOG pooling strategy based on the texture feature similarity;
a depth video quality score calculation module configured to calculate the reference depth feature and the distorted depth feature according to the reference depth video sequence and the distorted depth video sequence; to calculate the depth feature similarity according to the reference depth feature and the distorted depth feature and determine the gradient weight; and to calculate the depth video quality score according to the depth feature similarity and the gradient weight;
and a quality score calculation module configured to calculate the quality score of the immersive video to be evaluated according to the texture video quality score and the depth video quality score.
8. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs,
which, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-6.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-6.
10. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Title
---|---|---
CN202410836696.5A (granted as CN118411583B) | 2024-06-26 | Immersive video quality evaluation method and device based on multi-feature fusion
Publications (2)
Publication Number | Publication Date
---|---
CN118411583A | 2024-07-30
CN118411583B | 2024-10-22
Patent Citations (7)
Publication number | Priority date | Publication date | Title
---|---|---|---
CN106412571A | 2016-10-12 | 2017-02-15 | Video quality evaluation method based on gradient similarity standard deviation
US2021/0044791A1 | 2019-08-05 | 2021-02-11 | Video quality determination system and method
CN110930398A | 2019-12-09 | 2020-03-27 | Log-Gabor similarity-based full-reference video quality evaluation method
CN113888502A | 2021-09-29 | 2022-01-04 | No-reference video quality evaluation method, device, equipment and storage medium
CN115423769A | 2022-08-30 | 2022-12-02 | No-reference synthetic video quality evaluation method based on multi-modal learning
CN117475264A | 2023-10-27 | 2024-01-30 | Multi-fraction stereoscopic video quality evaluation method based on double-layer attention
CN117237259A | 2023-11-14 | 2023-12-15 | Compressed video quality enhancement method and device based on multi-mode fusion
Non-Patent Citations (3)
- Shan Cheng et al., "Screen Content Video Quality Assessment: Subjective and Objective Study", IEEE Transactions on Image Processing, 2020, pp. 8636-8648.
- Zhangkai Ni et al., "Gradient Direction for Screen Content Image Quality Assessment", IEEE Signal Processing Letters, vol. 23, no. 10, October 2016, pp. 1394-1397.
- Zhang Shufang, Han Zexin, Zhang Cong, "Video quality evaluation model based on temporal gradient similarity", Control Engineering of China, no. 08, 20 August 2018.
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant