CN113038123A - No-reference panoramic video quality evaluation method, system, terminal and medium - Google Patents


Info

Publication number
CN113038123A
Authority
CN
China
Prior art keywords
image
weight
panoramic
panoramic video
gradient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110302516.1A
Other languages
Chinese (zh)
Inventor
王永芳 (Wang Yongfang)
夏雨蒙 (Xia Yumeng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology filed Critical University of Shanghai for Science and Technology

Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N17/00 — Diagnosis, testing or measuring for television systems or their details

Abstract

The invention provides a no-reference panoramic video quality evaluation method and system, comprising the following steps: obtaining the ERP plane structural feature of a gradient-domain local binary pattern; performing superpixel segmentation to obtain, for each superpixel, a weight based on human eye perception and a weight based on the projection relation; extracting the structural feature of a single-frame image of the panoramic video from the local binary pattern structural feature, the human eye perception weight and the projection relation weight; and estimating the quality score of the panoramic video from the structural features of its single-frame images. A corresponding terminal and medium are also provided. The method needs neither an original reference sequence nor knowledge of the sequence's distortion type; it is a no-reference video quality evaluation technique, better suited to evaluating the distortion produced during transmission in practical applications, and more practical in real-time communication systems. In experiments on distorted sequences it achieves the best performance and good robustness.

Description

No-reference panoramic video quality evaluation method, system, terminal and medium
Technical Field
The invention relates to a panoramic video quality evaluation method in the technical field of multimedia, and in particular to a superpixel-based no-reference panoramic video quality evaluation method, system, terminal and medium.
Background
With the rapid development of Virtual Reality (VR) technology, more and more VR applications require high-quality panoramic video. Panoramic images and videos overcome the drawbacks of ordinary planar images, whose single viewing angle cannot deliver an omnidirectional experience; by recording every scene within a 360-degree spherical range, they offer much richer content. High-resolution panoramic images and videos can give users a more immersive and realistic experience. In a real communication system, however, panoramic images and videos must undergo projection and compression so that they can be stored, transmitted and processed conveniently; the resulting quality degradation lowers the user's quality of experience and, in severe cases, causes dizziness and discomfort.
Although the accuracy of full-reference quality evaluation models is generally high, in an actual communication system the receiving side usually has no undistorted source image/video, which greatly reduces the practicality of full-reference models. No-reference panoramic video quality assessment (NR-PVQA) refers to directly measuring the visual quality of a distorted sequence in the absence of a reference sequence. NR-PVQA needs neither the original reference sequence nor knowledge of the sequence's distortion type, and is better suited to estimating the distortion produced during transmission in practical applications. Designing a no-reference panoramic video quality assessment method is therefore of great significance.
A search of the prior art found the following:
Chinese patent application CN110691236A, "Panoramic video quality evaluation method", published January 14, 2020, divides the original panoramic video and the panoramic video to be evaluated into several frame groups, projects them onto the six faces of a cube using cube map projection, calculates the spatial-domain similarity and frequency-domain similarity of each frame group on each projection face, and fuses the similarities of all frame groups and projection faces into an objective evaluation value of the panoramic video to be evaluated. This method still has the following technical problems: it is a full-reference panoramic evaluation method and needs the original reference panoramic image, which cannot be obtained in real applications; and it evaluates panoramic quality only from the spatial-domain and frequency-domain similarity between the original and distorted panoramic images, without considering the projection relation or human eye perception characteristics, so it cannot accurately reflect the distortion of the panoramic content between the observation space and the processing plane.
At present, no description or report of a technology similar to the present invention has been found, nor has any similar information been collected at home or abroad.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a method, a system, a terminal and a medium for evaluating the quality of a no-reference panoramic video based on superpixels.
According to an aspect of the present invention, there is provided a no-reference panoramic video quality evaluation method, including:
obtaining the ERP (Equirectangular Projection) plane structural feature of a gradient-domain local binary pattern of a distorted image;
performing superpixel segmentation processing on the distorted image to respectively obtain a weight based on human eye perception of the superpixel and a weight based on projection relation of the superpixel;
extracting the structural characteristics of the single-frame image of the panoramic video according to the structural characteristics of the local binary pattern, the weight perceived by human eyes and the weight of the projection relation;
and estimating the quality score of the panoramic video according to the structural characteristics of the single-frame image of the panoramic video.
Preferably, the obtaining of the ERP plane structural feature of the gradient-domain local binary pattern of the distorted image includes:
calculating a gradient image of a single frame image of a distorted image in the panoramic video;
and on the basis of the gradient domain, encoding the pixels of the gradient image to obtain the ERP plane structural feature of the local binary pattern based on the gradient domain.
Preferably, the calculating a gradient image of a single frame image in the panoramic video includes:
calculating the image gradient with the Prewitt operator: the gradient level of the distorted image is expressed by convolving the single-frame image with the templates in the two directions of the Prewitt operator, and the gradient image g(x, y) of the distorted image I(x, y) is expressed as:
g(x, y) = sqrt( (I(x, y) * p_x)^2 + (I(x, y) * p_y)^2 )
where * denotes the convolution operation, and p_x and p_y are the horizontal and vertical templates used to compute the horizontal and vertical edges, respectively;
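The gradient computation above can be sketched as follows; the 3×3 Prewitt templates p_x and p_y are the standard ones, assumed here because the patent figure does not print them:

```python
import numpy as np

# Standard 3x3 Prewitt templates (an assumption; not printed in the patent)
PX = np.array([[-1, 0, 1],
               [-1, 0, 1],
               [-1, 0, 1]], dtype=float)  # horizontal edges
PY = PX.T                                  # vertical edges

def conv2_same(img, kernel):
    """2-D 'same' convolution with zero padding (no SciPy dependency)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    kf = kernel[::-1, ::-1]  # flip the kernel for true convolution
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kf)
    return out

def prewitt_gradient(img):
    """g(x, y) = sqrt((I * p_x)^2 + (I * p_y)^2)."""
    gx = conv2_same(img, PX)
    gy = conv2_same(img, PY)
    return np.sqrt(gx ** 2 + gy ** 2)
```

For a step-edge image, the gradient magnitude is zero in flat regions and positive along the edge.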
the encoding of the pixels of the gradient image on the basis of the gradient domain comprises:
encoding the pixel points of the gradient image with the rotation-invariant uniform LBP operator to obtain the gradient-domain local binary pattern structural feature LBP_{P,R}:
LBP_{P,R} = Σ_{i=0}^{P−1} s(g_i − g_c),  if u(LBP_{P,R}) ≤ 2;  P + 1, otherwise
where P is the number of elements around the center pixel, R is the radius at which the surrounding pixels are selected, g_c is the gradient amplitude of the center pixel, and g_i are the gradient amplitudes of the surrounding pixels; wherein:
s(x) = 1 if x ≥ 0, and 0 if x < 0
u(LBP_{P,R}) = |s(g_{P−1} − g_c) − s(g_0 − g_c)| + Σ_{i=1}^{P−1} |s(g_i − g_c) − s(g_{i−1} − g_c)|
where u is the uniformity measure, computed bit by bit: the number of transitions of the binary sequence from 0 to 1 and from 1 to 0.
Preferably, the total number of transitions does not exceed 2.
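A minimal sketch of the rotation-invariant uniform LBP code for a single pixel, following the s(·) and u(·) definitions above (neighbour sampling and interpolation on the circle of radius R are omitted):

```python
def lbp_riu2(neighbours, gc):
    """Rotation-invariant uniform LBP code for one pixel.

    neighbours : sequence of P surrounding gradient magnitudes g_i
    gc         : gradient magnitude g_c of the centre pixel
    Returns sum(s(g_i - g_c)) when the circular binary pattern is uniform
    (u <= 2), otherwise the 'non-uniform' label P + 1.
    """
    s = [1 if gi - gc >= 0 else 0 for gi in neighbours]
    P = len(s)
    # u: number of 0->1 / 1->0 transitions in the circular binary sequence
    u = sum(s[i] != s[(i + 1) % P] for i in range(P))
    return sum(s) if u <= 2 else P + 1
```

With P = 8 this yields the usual 10 rotation-invariant uniform codes 0…9.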
Preferably, the performing the super-pixel segmentation processing on the distorted image includes:
and aggregating similar pixels in the distorted image with a (simple) linear iterative clustering method, thereby segmenting discrete pixels into superpixels each consisting of several pixels.
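A toy illustration of the linear iterative clustering idea: seeds on a regular grid, each pixel assigned to the nearest seed under a combined colour-plus-position distance. This is a simplified sketch (fixed grid, few iterations, no connectivity enforcement), not the full SLIC implementation the patent presumably relies on; `m` (compactness) and the grid layout are assumptions:

```python
import numpy as np

def slic_like_labels(lab_img, n_seg_per_axis=2, m=10.0, n_iters=2):
    """Minimal SLIC-style superpixel assignment (illustrative only).

    lab_img : H x W x 3 array, assumed already in a Lab-like colour space.
    Each pixel gets a 5-D feature [L, a, b, x, y]; cluster centres start on
    a regular grid and pixels go to the nearest centre under the combined
    colour + spatially scaled distance.
    """
    H, W, _ = lab_img.shape
    S = max(H, W) // n_seg_per_axis  # approximate superpixel spacing
    ys = np.linspace(S // 2, H - 1 - S // 2, n_seg_per_axis).astype(int)
    xs = np.linspace(S // 2, W - 1 - S // 2, n_seg_per_axis).astype(int)
    centers = np.array([[*lab_img[y, x], x, y] for y in ys for x in xs], float)
    yy, xx = np.mgrid[0:H, 0:W]
    feats = np.concatenate(
        [lab_img, xx[..., None], yy[..., None]], axis=2).astype(float)
    labels = np.zeros((H, W), int)
    for _ in range(n_iters):
        # distance: colour term + (m/S)-scaled spatial term
        d = np.stack([
            np.sqrt(((feats[..., :3] - c[:3]) ** 2).sum(-1))
            + (m / S) * np.sqrt(((feats[..., 3:] - c[3:]) ** 2).sum(-1))
            for c in centers])
        labels = d.argmin(0)
        for k in range(len(centers)):  # recompute centres as cluster means
            mask = labels == k
            if mask.any():
                centers[k] = feats[mask].mean(0)
    return labels
```

On an image made of four uniform colour quadrants, the four seeds recover the four quadrants as "superpixels".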
Preferably, the obtaining of the weight of the human eye perception based on the super-pixels comprises:
let the size of the panoramic image be M × N, and the ordinate of the superpixel, i.e., the distance of the superpixel from the upper boundary of the panoramic image, be y1,N-y1The distance of the superpixel from the lower boundary of the panoramic image, the weight ω of the single superpixel1iComprises the following steps:
ω1i=min{d1,d2}
wherein:
Figure BDA0002986877260000034
weight ω of each super pixel1iDetermined by the closest of the superpixels to the boundary, and thus the human eye perception weight ω of each superpixel1Is defined as:
ω1=min{y1,N-y1,…yn,N-yn}
wherein n represents the number of pixels in the super-pixel; omega1The larger the distance, the closer the superpixel is to the equator of the panoramic image, and vice versa.
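The human eye perception weight ω_1 reduces to a minimum over per-pixel boundary distances, e.g.:

```python
def perception_weight(sp_pixel_ys, N):
    """w1 for one superpixel: the minimum, over all of its pixels, of the
    distance to the top (y) or bottom (N - y) boundary of an N-row ERP
    image. A larger w1 means the superpixel lies nearer the equator."""
    return min(min(y, N - y) for y in sp_pixel_ys)
```

A superpixel touching row 10 of a 100-row image gets weight 10; one at the equator (row 50) gets the maximum weight 50.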
Preferably, the obtaining the weight based on the projection relation of the super-pixel comprises:
let the coordinates of ERP plane and spherical surface be (x, y) and (y) respectively in continuous space
Figure BDA0002986877260000035
The transformation relation between the ERP plane and the spherical surface is as follows:
Figure BDA0002986877260000041
wherein theta is larger than theta, and belongs to (-pi, pi),
Figure BDA0002986877260000042
Thus, the area stretch ratio SR is defined as:
SR(x, y) = cos(φ)
the weight SR (i, j) of the digital image is defined as:
SR(i,j)=SR(x(i,j),y(i,j))
if M × N is the size of the ERP planar image, with {(i, j) | 0 < i ≤ M, 0 < j ≤ N}, then the transformation relation between the continuous domain and the discrete domain is:
x(i, j) = (i − 0.5)/M,  y(i, j) = (j − 0.5)/N
thus, the projection relation is finally defined as:
SR(i, j) = cos( (j − 0.5 − N/2) · π/N )
combining the projection relation with the superpixel segmentation map obtained after the superpixel segmentation processing: the distances between all pixels in the superpixel and the upper and lower boundaries of the panoramic image are calculated, and the projection relation corresponding to the pixel point with the minimum distance is used as the weight of the whole superpixel, i.e. the weight ω_2 of a single superpixel in the panoramic projection is:
ω_2 = SR(i, d_min)
where ω_2 is the weight based on the projection relation of the superpixel, and d_min is the minimum distance between all pixel points in each superpixel and the upper and lower boundaries of the panoramic image:
d_min = min{y_1, N − y_1, … y_n, N − y_n}.
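A sketch of the projection-relation weight ω_2. The patent's discrete SR formula sits in an unprinted figure, so the cos-latitude form below (the standard WS-PSNR-style weight, written here for 0-indexed rows) is an assumption that matches the stated continuous-domain definition SR = cos(φ):

```python
import numpy as np

def sr_weight(j, N):
    """Area stretch ratio for 0-indexed row j of an N-row ERP image:
    ~1 at the equator, shrinking toward 0 at the poles (cos-latitude)."""
    return np.cos((j + 0.5 - N / 2) * np.pi / N)

def projection_weight(sp_pixel_ys, N):
    """w2 for one superpixel: SR evaluated at d_min, the smallest distance
    of any of its pixels to the top/bottom boundary, as in the patent."""
    d_min = min(min(y, N - y) for y in sp_pixel_ys)
    return sr_weight(d_min, N)
```

A superpixel near a pole (small d_min) thus receives a much smaller projection weight than one at the equator.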
preferably, the extracting structural features of a single frame image of a panoramic video includes:
the human eye perception weight ω_1 and the projection relation weight ω_2 are fused, and the obtained fusion weight is:
ω = ω_1 · ω_2
combining the obtained ERP plane structural features with the fusion weight ω, performing superposition statistics on the fusion weights having the same LBP code, and then normalizing to obtain the structural feature PW(k) of the single-frame image of the panoramic video:
PW(k) = ( Σ_{i=1}^{N} ω_i · f(LBP_{P,R}(i), k) ) / ( Σ_{i=1}^{N} ω_i )
wherein:
f(LBP_{P,R}(i), k) = 1 if LBP_{P,R}(i) = k, and 0 otherwise
In the formula, N is the number of pixels, k is a value taken by the gradient-domain LBP code, and ω_i is the panoramic fusion weight of pixel i;
and carrying out down sampling on the distorted image for multiple times to obtain the structural features of the panorama on different scales.
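The weighted-histogram feature PW(k) then amounts to accumulating the fused weights per LBP code and normalising; a sketch:

```python
import numpy as np

def panoramic_feature(lbp_codes, weights, n_bins):
    """PW(k): for each LBP code k, sum the fused weights w = w1 * w2 of the
    pixels whose code equals k, then normalise so the feature sums to 1."""
    pw = np.zeros(n_bins)
    for code, w in zip(lbp_codes.ravel(), weights.ravel()):
        pw[int(code)] += w
    total = pw.sum()
    return pw / total if total > 0 else pw
```

Repeating this on several down-sampled copies of the frame and concatenating the results gives the multi-scale panoramic features the text describes.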
Preferably, the estimating the panoramic video quality score comprises:
obtaining the structural features of the panoramic video single-frame images of the first t frames of the panoramic video and taking their average; for a distorted video sequence, its panoramic structural feature PW_video is:
PW_video = (1/t) Σ_{i=1}^{t} PW_i
where PW_i is the structural feature of the panoramic video single-frame image of the i-th frame among the first t frames;
inputting the obtained panoramic structural feature PW_video into an SVR model to obtain a quality score prediction model, which maps the features into the final panoramic video quality score.
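Temporal pooling is a plain average over the first t frames; the SVR mapping is sketched only in a comment, since the patent does not specify a kernel or parameters:

```python
import numpy as np

def video_feature(frame_features):
    """PW_video: mean of the per-frame panoramic features PW_i over the
    first t frames of the sequence."""
    return np.mean(np.asarray(frame_features, dtype=float), axis=0)

# The averaged feature would then be fed to a trained support-vector
# regressor to obtain the quality score, e.g. (an assumption -- the patent
# only says "SVR model", without kernel or hyperparameters):
#
#   from sklearn.svm import SVR
#   model = SVR(kernel="rbf").fit(train_features, subjective_scores)
#   score = model.predict(video_feature(frames)[None, :])
```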
According to another aspect of the present invention, there is provided a reference-free panoramic video evaluation system, including:
the local structural feature acquisition module is used for acquiring the ERP plane structural feature of the local binary pattern based on the gradient domain of the distorted image;
the weight acquisition module is used for carrying out superpixel segmentation processing on the distorted image and respectively acquiring the weight based on human eye perception of the superpixel and the weight based on the projection relation of the superpixel;
the panoramic structure characteristic acquisition module is used for extracting the structure characteristics of the single-frame image of the panoramic video according to the structure characteristics of the local binary pattern, the weight perceived by human eyes and the weight of the projection relation;
and the quality evaluation module is used for estimating the quality score of the panoramic video according to the structural characteristics of the single-frame image of the panoramic video.
According to a third aspect of the present invention, there is provided a terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor when executing the program being operable to perform any of the methods described above.
According to a fourth aspect of the invention, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, is operable to perform the method of any of the above.
Due to the adoption of the technical scheme, compared with the prior art, the invention has the following beneficial effects:
the method, the system, the terminal and the medium for evaluating the quality of the non-reference panoramic video, provided by the invention, can accurately reflect the distortion condition of panoramic contents between an observation space and a processing plane by constructing the panoramic weighting structure characteristic based on the projection relation (projection format) and human eye perception.
The method, the system, the terminal and the medium for evaluating the quality of the non-reference panoramic video do not need an original reference sequence and do not need to know the distortion type of the sequence, belong to a non-reference video quality evaluation technology, and are more suitable for evaluating the distortion condition generated in transmission in practical application.
The method, the system, the terminal and the medium for evaluating the quality of the no-reference panoramic video take into account that the observation space is inconsistent with the mapping space, so that the panoramic structural features used are more consistent with human eye perception.
The method, the system, the terminal and the medium for evaluating the quality of the non-reference panoramic video have stronger practicability in a real-time communication system.
With the method, the system, the terminal and the medium for evaluating the quality of the no-reference panoramic video, the experimental results on distorted sequences show the best performance and better robustness.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
fig. 1 is a flowchart of a method for evaluating quality of a non-reference panoramic video according to an embodiment of the present invention.
Fig. 2 is a flowchart of a method for evaluating quality of a non-reference panoramic video according to a preferred embodiment of the present invention.
Fig. 3 is a diagram illustrating an LBP calculation process according to a preferred embodiment of the present invention.
FIG. 4 is a graph of gradient domain LBP in accordance with a preferred embodiment of the present invention; wherein, (a) is an original image, (b) is an LBP corresponding to the original image, and (c) to (f) are LBPs corresponding to JPEG distortion, JPEG2000 distortion, Gaussian noise and Gaussian blur in sequence.
FIG. 5 is a diagram illustrating the human eye perception weight based on superpixels in a preferred embodiment of the present invention.
Fig. 6 is a schematic diagram of a mapping relationship between an ERP plane and a spherical image in a preferred embodiment of the present invention.
FIG. 7 is a diagram illustrating a distorted image and its fusion weights in accordance with a preferred embodiment of the present invention; wherein, (a) is a distorted image, and (b) is a fusion weight.
FIG. 8 is a fitting scatter plot of the prediction scores and subjective scores according to a preferred embodiment of the invention; wherein, the (a) to (e) are fitting scatter diagrams of the prediction scores and the subjective scores obtained by the WS-SSIM, S-PSNR, WS-PSNR, CPP-PSNR and SP-PVQA methods respectively.
Fig. 9 is a schematic diagram of a component module of a no-reference panoramic video quality evaluation system according to an embodiment of the present invention.
Detailed Description
The following examples illustrate the invention in detail: the embodiment is implemented on the premise of the technical scheme of the invention, and a detailed implementation mode and a specific operation process are given. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention.
Fig. 1 is a flowchart of a method for evaluating quality of a non-reference panoramic video according to an embodiment of the present invention.
As shown in fig. 1, the method for evaluating quality of a non-reference panoramic video according to this embodiment may include the following steps:
s100, obtaining the ERP (Equirectangular Projection) plane structural feature of a gradient-domain local binary pattern of the distorted image;
s200, performing superpixel segmentation processing on the distorted image to respectively obtain a weight based on human eye perception of the superpixel and a weight based on projection relation of the superpixel;
s300, extracting the structural characteristics of the single-frame image of the panoramic video according to the structural characteristics of the local binary pattern, the weight perceived by human eyes and the weight of the projection relation;
and S400, estimating the quality score of the panoramic video according to the structural characteristics of the single-frame image of the panoramic video.
In S100 of this embodiment, obtaining the ERP plane structure feature of the local gradient domain-based binary pattern of the distorted image may include the following steps:
s101, calculating a gradient image of a single frame image of a distorted image in the panoramic video;
and S102, on the basis of the gradient domain, encoding pixels of the gradient image to obtain an ERP plane structural feature of a local binary pattern based on the gradient domain.
Further, in S101 of this embodiment, calculating a gradient image of a single frame image in the panoramic video may include the following steps:
calculating the image gradient with the Prewitt operator: the gradient level of the distorted image is expressed by convolving the single-frame image with the templates in the two directions of the Prewitt operator, and the gradient image g(x, y) of the distorted image I(x, y) is expressed as:
g(x, y) = sqrt( (I(x, y) * p_x)^2 + (I(x, y) * p_y)^2 )
where * denotes the convolution operation, and p_x and p_y are the horizontal and vertical templates used to compute the horizontal and vertical edges, respectively.
Further, in S102 of this embodiment, encoding the pixels of the gradient image on the basis of the gradient domain may include the steps of:
encoding the pixel points of the gradient image with the rotation-invariant uniform LBP operator to obtain the gradient-domain local binary pattern structural feature LBP_{P,R}:
LBP_{P,R} = Σ_{i=0}^{P−1} s(g_i − g_c),  if u(LBP_{P,R}) ≤ 2;  P + 1, otherwise
where P is the number of elements around the center pixel, R is the radius at which the surrounding pixels are selected, g_c is the gradient amplitude of the center pixel, and g_i are the gradient amplitudes of the surrounding pixels; wherein:
s(x) = 1 if x ≥ 0, and 0 if x < 0
u(LBP_{P,R}) = |s(g_{P−1} − g_c) − s(g_0 − g_c)| + Σ_{i=1}^{P−1} |s(g_i − g_c) − s(g_{i−1} − g_c)|
where u is the uniformity measure, computed bit by bit: the number of transitions of the binary sequence from 0 to 1 and from 1 to 0.
Further, the number of transitions does not exceed 2 in total.
In S200 of this embodiment, the super-pixel segmentation processing on the distorted image may include the steps of:
and gathering pixels in the distorted image by adopting a linear iterative clustering method, and further segmenting discrete pixels into super pixels consisting of a plurality of pixels.
Further, the pixels to be aggregated are determined by:
and converting the colour distorted image into 5-dimensional feature vectors, each comprising the three-dimensional colour information in the CIELAB colour space and the two-dimensional spatial position; a distance metric is constructed over the 5-dimensional feature vectors, and the pixels to be aggregated are judged by this distance metric.
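The distance metric over the 5-D features [L, a, b, x, y] can be sketched as the standard SLIC distance; the compactness m and grid interval S below are assumed parameters, not values from the patent:

```python
import numpy as np

def slic_distance(f1, f2, m=10.0, S=20.0):
    """SLIC-style distance between two 5-D features [L, a, b, x, y]:
    D = sqrt(dc^2 + (ds/S)^2 * m^2), where dc is the Lab colour distance
    and ds the spatial distance; m trades colour against compactness."""
    f1, f2 = np.asarray(f1, float), np.asarray(f2, float)
    dc = np.linalg.norm(f1[:3] - f2[:3])
    ds = np.linalg.norm(f1[3:] - f2[3:])
    return np.sqrt(dc ** 2 + (ds / S) ** 2 * m ** 2)
```

Identical features are at distance 0; a purely spatial offset of one grid interval S contributes exactly m.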
In S200 of this embodiment, obtaining the weight based on human eye perception of the superpixel may include the steps of:
S2A1, let the size of the panoramic image be M × N; the ordinate of a pixel in the superpixel, i.e. its distance from the upper boundary of the panoramic image, is y_i, and N − y_i is its distance from the lower boundary; the weight ω_{1i} of that single pixel is:
ω_{1i} = min{d_1, d_2}
wherein:
d_1 = y_i,  d_2 = N − y_i
S2A2, the weight of each superpixel is determined by the pixel in it that is closest to a boundary, so the human eye perception weight ω_1 of each superpixel is defined as:
ω_1 = min{y_1, N − y_1, … y_n, N − y_n}
where n is the number of pixels in the superpixel; the larger ω_1 is, the closer the superpixel is to the equator of the panoramic image, and vice versa.
In S200 of this embodiment, obtaining the weight based on the projection relationship of the superpixel may include the steps of:
S2B1, let the coordinates of the ERP plane and of the sphere in the continuous spatial domain be (x, y) and (θ, φ), respectively; the transformation relation between the ERP plane and the sphere is:
x = θ/(2π) + 1/2,  y = 1/2 − φ/π
where θ ∈ (−π, π) and φ ∈ (−π/2, π/2);
thus, the area stretch ratio SR is defined as:
SR(x, y) = cos(φ)
the weight SR (i, j) of the digital image is defined as:
SR(i,j)=SR(x(i,j),y(i,j))
S2B2, if M × N is the size of the ERP plane image, with {(i, j) | 0 < i ≤ M, 0 < j ≤ N}, then the transformation relation between the continuous domain and the discrete domain is:
x(i, j) = (i − 0.5)/M,  y(i, j) = (j − 0.5)/N
thus, the projection relation is finally defined as:
SR(i, j) = cos( (j − 0.5 − N/2) · π/N )
S2B3, combining the projection relation with the superpixel segmentation map obtained after the superpixel segmentation processing: the distances between all pixels in the superpixel and the upper and lower boundaries of the panoramic image are calculated, and the projection relation corresponding to the pixel point with the minimum distance is used as the weight of the whole superpixel, i.e. the weight ω_2 of a single superpixel in the panoramic projection is:
ω_2 = SR(i, d_min)
where ω_2 is the weight based on the projection relation of the superpixel, and d_min is the minimum distance between all pixel points in each superpixel and the upper and lower boundaries of the panoramic image:
d_min = min{y_1, N − y_1, … y_n, N − y_n}.
in S300 of this embodiment, extracting the structural feature of a single frame image of a panoramic video may include the following steps:
s301, the human eye perception weight ω_1 and the projection relation weight ω_2 are fused, and the obtained fusion weight is:
ω = ω_1 · ω_2
s302, combining the obtained ERP plane structural features with the fusion weight ω, performing superposition statistics on the fusion weights having the same LBP code, and then normalizing to obtain the structural feature PW(k) of the panoramic video single-frame image:
PW(k) = ( Σ_{i=1}^{N} ω_i · f(LBP_{P,R}(i), k) ) / ( Σ_{i=1}^{N} ω_i )
wherein:
f(LBP_{P,R}(i), k) = 1 if LBP_{P,R}(i) = k, and 0 otherwise
In the formula, N is the number of pixels, k is a value taken by the gradient-domain LBP code, and ω_i is the panoramic fusion weight of pixel i;
and S303, carrying out down-sampling on the distorted image for multiple times to obtain the structural features of the panorama on different scales.
In S400 of this embodiment, estimating the panoramic video quality score may include the following steps:
s401, obtaining the structural features of the panoramic video single-frame images of the first t frames of the panoramic video and taking their average; for a distorted video sequence, its panoramic structural feature PW_video is:
PW_video = (1/t) Σ_{i=1}^{t} PW_i
where PW_i is the structural feature of the panoramic video single-frame image of the i-th frame among the first t frames;
s402, inputting the obtained panoramic structural feature PW_video into an SVR model to obtain a quality score prediction model, which maps the features into the final panoramic video quality score.
The no-reference panoramic video quality evaluation method provided by the above embodiment of the present invention performs quality monitoring on distorted panoramic video. Subjective data collected from users watching panoramic video show that objects displayed near the equator attract more of the human eye's attention. Using this centre-bias property, the above embodiment of the present invention provides a no-reference panoramic video quality evaluation algorithm based on superpixel segmentation (SP-PVQA). The embodiment constructs a panoramic weighted structural feature based on the projection format and human eye perception, which can accurately reflect the distortion of the panoramic content between the observation space and the processing plane. The embodiment needs neither an original reference sequence nor the distortion type of the sequence; it is a no-reference video quality evaluation model, better suited to evaluating the distortion produced during transmission in practical applications.
Fig. 2 is a flowchart of a method for evaluating quality of a non-reference panoramic video according to a preferred embodiment of the present invention.
The method for evaluating the quality of the no-reference panoramic video provided by this preferred embodiment first expresses the structural features of each frame of the panoramic video on the ERP plane using second-derivative information of the image; second, the fusion (panoramic) weight formed from the superpixel-based projection format (projection relation) and human eye perception is fused with the structural features to obtain the panoramic structural features of a single frame; the average over the first t frames of the video is then taken as the panoramic structural feature of the panoramic video sequence; finally, the obtained panoramic structural features are fed into an SVR model to build a quality prediction model and complete the quality evaluation.
As shown in fig. 2, the method for evaluating quality of a non-reference panoramic video according to the preferred embodiment may include the following steps:
step 1, extracting the ERP plane structural features of the gradient-domain local binary pattern: the second derivative of an image can effectively capture the local edge changes that influence the perceived visual quality of the panoramic image. Therefore, the gradient strength of a single-frame image in the panoramic video is first calculated as first-derivative information; on the basis of the gradient domain, the pixels are then encoded with an LBP operator to obtain the ERP plane structural feature of the gradient-domain local binary pattern, capturing more detailed edge information;
step 2, calculating the weight of human eye perception based on the super pixels: the super-pixel segmentation method can gather similar pixels, namely discrete pixels are segmented into super-pixels consisting of a plurality of pixels, and compared with the discrete pixels, the segmentation into the super-pixels is closer to the understanding of human eyes on the image content;
and 3, calculating the weight based on the projection relation (projection format) of the superpixel: when a pixel point is mapped from the ERP plane to the sphere, the pixel's area is stretched to a different degree. The relationship between the observation space and the processing space can therefore be expressed as the area stretch ratio of the two;
step 4, extracting the structural features of a single panoramic video frame: the ERP-plane structural features of step 1 are computed on the ERP-plane image and cannot linearly reflect distortion on the sphere. Combining the weight maps obtained in steps 2 and 3 with the ERP-plane structural features of step 1 yields mapping-weighted panoramic structural features that accurately reflect distortion on the sphere and are closer to subjective human perception;
step 5, panoramic video quality score estimation: the quality score prediction model trained on the single-frame panoramic structural features obtained in step 4 maps the features of the distorted video to the final panoramic video quality score.
As a preferred embodiment, in step 1, the method for extracting the ERP planar structural feature based on the local binary pattern of the gradient domain includes the following steps:
The image gradient is computed with the computationally simple Prewitt operator: the gradient level of the distorted image is expressed by convolving the image with the two directional templates of the Prewitt operator. With I(x, y) denoting the distorted image, the gradient image is computed as:
g(x, y) = √[(I(x, y) * p_x)² + (I(x, y) * p_y)²]
where * denotes the convolution operation, p_x and p_y are the horizontal and vertical templates used to compute the edges in those directions, and I(x, y) and g(x, y) denote the distorted image and the corresponding gradient image, respectively.
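As an illustrative sketch (not part of the claimed embodiment), the Prewitt gradient image above can be computed as follows; the edge-replication padding and the function names are assumptions of this sketch. Note that flipping a Prewitt template for true convolution only negates it, so cross-correlation gives the same gradient magnitude.

```python
import numpy as np

# Prewitt templates for vertical and horizontal edges (p_x and p_y in the text).
PX = np.array([[-1, 0, 1],
               [-1, 0, 1],
               [-1, 0, 1]], dtype=float)
PY = PX.T

def conv3x3(img, kernel):
    """3x3 cross-correlation with edge-replicated borders (same output size)."""
    img = np.pad(img.astype(float), 1, mode="edge")
    h, w = img.shape[0] - 2, img.shape[1] - 2
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * img[dy:dy + h, dx:dx + w]
    return out

def prewitt_gradient(img):
    """Gradient magnitude g = sqrt((I*p_x)^2 + (I*p_y)^2)."""
    gx = conv3x3(img, PX)
    gy = conv3x3(img, PY)
    return np.sqrt(gx ** 2 + gy ** 2)
```

A flat image yields a zero gradient everywhere, while a vertical step edge produces a strong response on the columns straddling the step.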
The pixel points of the gradient image are encoded with a rotation-invariant uniform LBP operator, computed as follows:
LBP_{P,R} = Σ_{i=0}^{P−1} s(g_i − g_c), if U(LBP_{P,R}) ≤ 2; otherwise LBP_{P,R} = P + 1
where P is the number of neighbouring elements around the centre pixel, R is the radius of the neighbourhood, g_c is the gradient magnitude of the centre pixel, and g_i is the gradient magnitude of a neighbouring pixel, with:
s(x) = 1 if x ≥ 0, and s(x) = 0 if x < 0
U(LBP_{P,R}) = |s(g_{P−1} − g_c) − s(g_0 − g_c)| + Σ_{i=1}^{P−1} |s(g_i − g_c) − s(g_{i−1} − g_c)|
U is the uniformity measure, computed bit by bit: a binary sequence is uniform if the number of 0-to-1 and 1-to-0 transitions does not exceed 2. The LBP describes the relationship between a centre pixel and its surrounding pixels; these local structural patterns effectively describe the image structure distortions caused by different distortion sources.
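The rotation-invariant uniform LBP encoding above can be sketched as follows for P = 8, R = 1; the clockwise neighbour ordering is an assumption of this sketch.

```python
import numpy as np

def lbp_riu2(g, P=8):
    """Rotation-invariant uniform LBP (riu2) of a gradient image.

    For each interior pixel, the P = 8 neighbours' gradient magnitudes
    are compared with the centre: the code is sum(s(g_i - g_c)) when
    the circular bit pattern is 'uniform' (<= 2 transitions), else P + 1.
    """
    H, W = g.shape
    # clockwise 8-neighbourhood offsets for R = 1
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros((H - 2, W - 2), dtype=int)
    for y in range(1, H - 1):
        for x in range(1, W - 1):
            bits = [1 if g[y + dy, x + dx] >= g[y, x] else 0
                    for dy, dx in offs]
            # U: 0->1 / 1->0 transitions in the circular sequence
            u = sum(bits[i] != bits[i - 1] for i in range(P))
            codes[y - 1, x - 1] = sum(bits) if u <= 2 else P + 1
    return codes
```

A flat patch gives code P = 8 (all neighbours ≥ centre), a local peak gives code 0, and a checkerboard gives the non-uniform code P + 1 = 9, so all codes fall in 0..9.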
As a preferred embodiment, in step 2, the method of obtaining the weight based on the human eye perception of the superpixel is as follows:
First, the distorted image is segmented into superpixels with Simple Linear Iterative Clustering (SLIC). Then, following the centre-bias theory that content near the equator is more likely to attract attention, the human-eye perception weight of each superpixel is computed. Let the panoramic image be of size M × N and the vertical coordinate of a pixel be y_1, i.e., its distance to the upper boundary of the panoramic image, so that N − y_1 is its distance to the lower boundary; the weight of a single pixel is then:
ω1i=min{d1,d2}
wherein:
d_1 = y_1,  d_2 = N − y_1
the weight of each super-pixel should be determined by the closest pixel to the boundary among the super-pixels, so the human eye perception weight of each super-pixel is defined as:
ω1=min{y1,N-y1,…yn,N-yn}
where n is the number of pixels in a superpixel. The larger ω_1 is, the closer the superpixel is to the equator of the panoramic image, and vice versa.
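The per-superpixel perception weight ω_1 can be sketched as follows; the label-map input format and the 0-based row indexing are assumptions of this sketch.

```python
import numpy as np

def perception_weights(labels):
    """omega_1 for each superpixel: the minimum distance of any of its
    pixels to the top or bottom boundary of the ERP image (a larger
    weight means the superpixel lies nearer the equator).

    `labels` is an H x W integer label map such as the output of SLIC;
    0-based row indices are assumed."""
    H = labels.shape[0]
    rows = np.broadcast_to(np.arange(H)[:, None], labels.shape)
    d = np.minimum(rows, H - 1 - rows)  # min(y, N - y) per pixel
    return {int(lab): int(d[labels == lab].min()) for lab in np.unique(labels)}
```

A superpixel touching the top or bottom row gets weight 0; one confined to the middle rows gets a larger weight.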
As a preferred embodiment, in step 3, the method for obtaining the weight based on the projection relation of the super-pixel comprises:
Let the coordinates on the ERP plane and on the sphere in continuous space be (x, y) and (θ, φ), respectively.
The transformation relationship between the two is as follows:
θ = (x − 1/2)·2π,  φ = (1/2 − y)·π
where θ ∈ (−π, π) and φ ∈ (−π/2, π/2).
Thus, the area stretch ratio SR can be defined as:
SR(x, y) = cos φ
the weight SR (i, j) of the digital image can be defined as:
SR(i,j)=SR(x(i,j),y(i,j))
If M × N is the size of the ERP-plane image, with {(i, j) | 0 < i ≤ M, 0 < j ≤ N}, then the transformation between the continuous domain and the discrete domain is:
x = (i − 0.5)/M,  y = (j − 0.5)/N
thus, the projective relationship is ultimately defined as:
SR(i, j) = cos((1/2 − (j − 0.5)/N)·π)
The projection relation is then combined with the superpixel segmentation map: the distances between all pixels in a superpixel and the upper and lower boundaries of the panoramic image are computed, and the projection relation of the pixel with the minimum distance is used as the weight of the whole superpixel; the weight of a single superpixel in the panoramic projection is therefore:
ω2=SR(i,dmin)
ω_2 is the superpixel-based projection-relation weight;
where d_min is the minimum distance between the pixels of each superpixel and the upper and lower boundaries of the panoramic image:
dmin=min{y1,N-y1,…yn,N-yn}。
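The projection-relation weight ω_2 can be sketched as follows, using the cosine of the row latitude as the area stretch ratio of an ERP row; the half-pixel offset and 0-based row indexing are assumptions of this sketch.

```python
import math

def stretch_ratio(j, N):
    """Area stretch ratio SR of ERP row j (0-based) in an N-row image:
    the cosine of the row's latitude.  Rows at the equator keep their
    area (SR close to 1); rows near the poles are heavily stretched on
    the plane (SR close to 0)."""
    phi = math.pi * (0.5 - (j + 0.5) / N)   # latitude in (-pi/2, pi/2)
    return math.cos(phi)

def projection_weight(rows_in_superpixel, N):
    """omega_2 = SR evaluated at the superpixel pixel nearest to the
    top/bottom boundary (the d_min pixel in the text)."""
    nearest = min(rows_in_superpixel, key=lambda y: min(y, N - 1 - y))
    return stretch_ratio(nearest, N)
```

The weight is symmetric about the equator and decreases monotonically towards the poles, matching the stretching behaviour of the ERP projection.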
as a preferred embodiment, in step 4, the weight based on human eye perception and the weight based on projection relationship are fused, and the fused weight (fusion weight) can be obtained as follows:
ω=ω1·ω2
Brighter regions in the figure indicate superpixels with larger weights.
The obtained ERP-plane structural features are combined with the fusion weights: the fusion weights of pixels sharing the same LBP code are accumulated and then normalised, yielding the structural features of a single panoramic video frame:
PW(k) = Σ_{i=1}^{n} ω_i·f(LBP_i, k) / Σ_{i=1}^{n} ω_i
wherein:
f(LBP_i, k) = 1 if LBP_i = k, and 0 otherwise
where n is the number of pixels, k is the gradient-domain LBP code, ω is the overall panoramic weight, and PW(k) is the panoramic structural feature. Considering that the human visual system captures different information at different image scales, the distorted image is downsampled several times (e.g., 4 times) to obtain panoramic structural features at different scales. Fig. 7 shows statistical histograms of the panoramic structure under different distortion conditions; different distortions clearly change the single-frame panoramic features relative to those of the original video.
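The weight-pooled statistics above (accumulating the fusion weights per LBP code, then normalising) can be sketched as:

```python
import numpy as np

def panoramic_feature(codes, weights, n_codes=10):
    """PW(k): for each LBP code k, accumulate the fused per-pixel
    weights of all pixels carrying that code, then normalise the
    histogram to sum to 1.  With the riu2 LBP and P = 8 there are
    P + 2 = 10 possible codes (0..9)."""
    pw = np.array([weights[codes == k].sum() for k in range(n_codes)])
    total = pw.sum()
    return pw / total if total > 0 else pw
```

Because the histogram is normalised, the feature is invariant to the absolute scale of the fusion weights.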
As a preferred embodiment, in step 5: to avoid causing vertigo and discomfort to the user, the scene content of a panoramic video usually changes little. Exploiting this property, the single-frame structural features are computed for the first t frames of the panoramic video and averaged; for a distorted video sequence, the panoramic structural feature is defined as:
PW_video = (1/t)·Σ_{i=1}^{t} PW_i
The resulting panoramic structural feature PW_video is fed into an SVR model to obtain a quality score prediction model, which maps the features to the final quality score.
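The per-sequence feature and the hand-off to the regressor can be sketched as follows; the use of scikit-learn's SVR is an assumption of this sketch, since the text only specifies "an SVR model".

```python
import numpy as np

def video_feature(frame_features, t):
    """PW_video: the mean of the single-frame panoramic features over
    the first t frames; panoramic content changes little from frame to
    frame, so a short prefix of the sequence is representative."""
    return np.asarray(frame_features[:t], dtype=float).mean(axis=0)

# The per-sequence PW_video vectors are then regressed against the
# subjective scores.  With scikit-learn (an assumption) this would be:
#   from sklearn.svm import SVR
#   model = SVR(kernel="rbf").fit(train_features, train_mos)
#   predicted_scores = model.predict(test_features)
```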
The technical solutions provided by the preferred embodiments of the present invention are further described in detail below with reference to the accompanying drawings.
The specific steps of the no-reference panoramic video quality evaluation method provided by this preferred embodiment are shown in fig. 2. The method was implemented in a programmed simulation under the Win10 environment, with the following specific steps:
step 1, extracting the ERP-plane structural features of the gradient-domain local binary pattern: the second derivative of an image effectively captures the local edge changes that affect the perceived visual quality of a panoramic image (see Y. Fang, J. Yan, L. Li, J. Wu and W. Lin, "No Reference Quality Assessment for Screen Content Images with Both Local and Global Feature Representation," IEEE Transactions on Image Processing, vol. 27, no. 4, pp. 1600-1610, April 2018). Therefore, the gradient magnitude of a single frame of the panoramic video is first computed as first-derivative information; the pixels are then encoded with an LBP operator on this gradient domain, yielding the ERP-plane structural features of the gradient-domain local binary pattern and capturing finer edge information;
step 2, calculating the superpixel-based human-eye perception weight: superpixel segmentation groups similar pixels, i.e., it partitions discrete pixels into superpixels composed of several pixels; compared with discrete pixels, superpixels are closer to how the human eye understands image content (see J. Lei et al., "A Universal Framework for Salient Object Detection," IEEE Transactions on Multimedia, vol. 18, no. 9, pp. 1783-1795, Sept. 2016, and Y. Fang, X. Zhang, N. Imamoglu, "A novel superpixel-based saliency detection model for 360-degree images," Signal Processing: Image Communication, vol. 69, pp. 1-7, 2018);
step 3, calculating the superpixel-based projection-relation weight: as shown in fig. 6, when pixels are mapped from the ERP plane to the sphere, pixel areas are stretched to different degrees. The relationship between the observation space and the processing space can therefore be expressed as the area stretch ratio between the two;
step 4, extracting the structural features of a single panoramic video frame: the ERP-plane structural features of step 1 are computed on the ERP-plane image and cannot linearly reflect distortion on the sphere. Combining the weight maps obtained in steps 2 and 3 with the ERP-plane structural features of step 1 yields mapping-weighted panoramic structural features that accurately reflect distortion on the sphere and are closer to subjective human perception;
step 5, panoramic video quality score estimation: the quality score prediction model trained on the single-frame panoramic structural features obtained in step 4 maps the features of the distorted video to the final panoramic video quality score.
In step 1, the image gradient is computed with the computationally simple Prewitt operator: the gradient level of the distorted image is expressed by convolving the image with the two directional templates of the Prewitt operator. With I(x, y) denoting the distorted image, the gradient image is computed as:
g(x, y) = √[(I(x, y) * p_x)² + (I(x, y) * p_y)²]
"+" denotes the convolution operation, pxAnd pyThe templates represent the transverse and longitudinal directions, the edges of which are calculated respectively, I (x, y) and g (x, y) represent the distorted image and the corresponding gradient image respectively.
The original LBP operator is defined over a 3 × 3 window, as shown in fig. 3. The centre pixel of the window serves as a threshold: the grey value of each of the 8 neighbouring pixels is compared with that of the centre pixel, and a neighbour's position is coded 1 if its grey value is greater than or equal to the centre's, otherwise 0. This generates an 8-bit binary number in clockwise order, which is usually converted to a decimal number, the LBP code, whose value reflects the structural information of the window region. In this conventional LBP computation (fig. 3), comparing the 8 pixels of a 3 × 3 neighbourhood yields 8-bit binary numbers and thus 256 patterns in decimal, which is too many. To reduce the number of binary patterns, a rotation-invariant uniform LBP operator is therefore used to encode the pixels of the gradient image, computed as follows:
LBP_{P,R} = Σ_{i=0}^{P−1} s(g_i − g_c), if U(LBP_{P,R}) ≤ 2; otherwise LBP_{P,R} = P + 1 (4)
p represents the number of elements around the pixel of the central point, R represents the selected radius of the surrounding pixels, gcRepresenting the gradient amplitude, g, of the center pixeliRepresenting the gradient magnitude of surrounding pixels. Wherein:
s(x) = 1 if x ≥ 0, and s(x) = 0 if x < 0
U(LBP_{P,R}) = |s(g_{P−1} − g_c) − s(g_0 − g_c)| + Σ_{i=1}^{P−1} |s(g_i − g_c) − s(g_{i−1} − g_c)|
U is the uniformity measure, computed bit by bit: a binary sequence is uniform if the number of 0-to-1 and 1-to-0 transitions does not exceed 2. The LBP describes the relationship between a centre pixel and its surrounding pixels, and these local structural patterns effectively describe the image structure distortions caused by different distortion sources. Fig. 4 shows the gradient-domain LBP maps of a reference image and the corresponding distorted images, where (b) is the gradient-domain LBP coding map of the panoramic reference image and (c) to (f) represent four different distortion types. Different distortion types cause different changes in the LBP coding, so the gradient-domain LBP effectively describes image distortion.
In step 2, the distorted image is first segmented into superpixels with Simple Linear Iterative Clustering (SLIC) (R. Achanta, A. Shaji, K. Smith, et al., "SLIC superpixels compared to state-of-the-art superpixel methods," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2274-2282, 2012). Its advantages are a small computational load, a customisable number of superpixels, and regularly shaped superpixels.
Superpixels are obtained with the SLIC segmentation algorithm. Then, following the centre-bias theory that content near the equator more readily attracts attention, the human-eye perception weight of each superpixel is computed; the principle is shown in fig. 5. The weight of each superpixel is determined by the minimum distance between its pixels and the upper and lower boundaries of the panoramic image. Assume the panoramic image is of size M × N and a pixel's vertical coordinate is y_1, i.e., its distance to the upper boundary of the panoramic image, so that N − y_1 is its distance to the lower boundary; the weight of a single pixel is then:
ω1i=min{d1,d2} (5)
wherein:
d_1 = y_1,  d_2 = N − y_1 (6)
the weight of each super-pixel should be determined by the closest pixel to the boundary among the super-pixels, so the human eye perception weight of each super-pixel is defined as:
ω1=min{y1,N-y1,…yn,N-yn} (7)
where n is the number of pixels in the superpixel. The larger ω_1 is, the closer the superpixel is to the equator of the panoramic image, and vice versa.
Through the above calculation process, the structural features on the ERP plane can be obtained, but because a nonlinear relationship exists between the processing plane and the observation space, the features on the processing plane cannot accurately reflect the quality change of the observation space, and therefore, the panoramic weighted structural features need to be further extracted by using the characteristics of the panoramic video.
The superpixel-based projection-format weights are obtained in step 3. Assume the coordinates on the ERP plane and on the sphere in continuous space are (x, y) and (θ, φ), respectively.
the transformation relationship between the two is as follows:
θ = (x − 1/2)·2π,  φ = (1/2 − y)·π (8)
where θ ∈ (−π, π) and φ ∈ (−π/2, π/2).
Thus, the area stretch ratio SR can be defined as:

SR(x, y) = cos φ (9)
the weight SR (i, j) of the digital image can be defined as:
SR(i,j)=SR(x(i,j),y(i,j)) (10)
Assuming M × N is the size of the ERP image, with {(i, j) | 0 < i ≤ M, 0 < j ≤ N}, the transformation between the continuous domain and the discrete domain is:
x = (i − 0.5)/M,  y = (j − 0.5)/N (11)
thus, the projective relationship is ultimately defined as:
SR(i, j) = cos((1/2 − (j − 0.5)/N)·π) (12)
The projection relation is then combined with the superpixel segmentation map: the distances between all pixels in a superpixel and the upper and lower boundaries of the panoramic image are computed, and the projection relation of the pixel with the minimum distance is used as the weight of the whole superpixel; the weight of a single superpixel in the panoramic projection is therefore:
ω2=SR(i,dmin) (13)
where d_min is the minimum distance between the pixels of each superpixel and the upper and lower boundaries of the panoramic image:
dmin=min{y1,N-y1,…yn,N-yn} (14)
In step 4, the human-eye perception weight and the projection-relation weight are fused; the fused weight is:
ω=ω1·ω2 (15)
As shown in fig. 7, brighter portions indicate superpixels with larger weights.
The structural features obtained on the ERP plane are combined with the fusion weights: the fusion weights of pixels sharing the same LBP code are accumulated and then normalised, yielding the panoramic structural features of a single panoramic video frame:
PW(k) = Σ_{i=1}^{n} ω_i·f(LBP_i, k) / Σ_{i=1}^{n} ω_i (16)
wherein:
f(LBP_i, k) = 1 if LBP_i = k, and 0 otherwise (17)
n is the number of pixels, k is the value of gradient domain LBP coding and is the overall panoramic weight, and PW (k) is the structural feature of the panorama. Considering that a human eye vision system can capture different information on different image scales, carrying out down-sampling on a distorted image for 4 times to obtain panoramic structure characteristics on different scales, and drawing a panoramic structure statistical histogram under different distortion conditions, wherein different distortions cause the panoramic characteristics of a single frame and the panoramic characteristics of the single frame in an original video to be obviously changed.
In step 5: to avoid causing vertigo and discomfort to the user, the scene content of a panoramic video usually changes little. Exploiting this property, the single-frame structural features are computed for the first t frames of the panoramic video and averaged; for a distorted video sequence, the panoramic structural feature is defined as:
PW_video = (1/t)·Σ_{i=1}^{t} PW_i (18)
The resulting panoramic structural feature PW_video is fed into the SVR to obtain a quality score prediction model, which maps the features to the final quality score.
An experiment was conducted on the VR-VQA video data set (M. Xu, C. Li, Y. Liu, X. Deng and J. Lu, "A subjective visual quality assessment method of panoramic videos," in IEEE International Conference on Multimedia and Expo (ICME), Hong Kong, 2017, pp. 517-522) to evaluate the superpixel-based no-reference panoramic video quality assessment method (SP-PVQA) proposed by the above embodiment of the present invention. The distorted images were divided into a training set (80%) and a test set (the remaining 20%); the test was repeated 1000 times, and the medians of the Spearman rank-order correlation coefficient (SRCC), Pearson linear correlation coefficient (PLCC), Kendall rank correlation coefficient (KRCC) and root mean square error (RMSE) over the runs were taken as performance indexes. Table 1 compares the performance of five full-reference panoramic methods, namely peak signal-to-noise ratio (PSNR), Craster parabolic projection peak signal-to-noise ratio (CPP-PSNR), spherically weighted peak signal-to-noise ratio (WS-PSNR), spherical peak signal-to-noise ratio (S-PSNR) and spherically weighted structural similarity (WS-SSIM), with the SP-PVQA method provided by the embodiment of the present invention; the highest indexes are shown in bold.
TABLE 1 comparison of Overall Properties
From table 1, it can be seen that the SP-PVQA method proposed by the above embodiment of the present invention is the best on three of the indexes compared with the other full-reference quality evaluation methods; although its RMSE is slightly inferior, SP-PVQA requires no reference video information and is therefore more practical in real-time communication systems. In particular, the WS-PSNR and WS-SSIM models also account for the inconsistency between the observation space and the mapping space, but the panoramic structural features used by the SP-PVQA method proposed by the above embodiment of the present invention are more consistent with human perception.
The SP-PVQA method proposed by the above embodiment of the present invention can also be applied to panoramic images: the structural features of the distorted panoramic image are computed on the ERP plane, the fusion weight is obtained with the same steps, the planar features are weighted into panoramic structural features that represent spherical distortion, and these are fed into the SVR to obtain the visual quality score of the distorted image.
To verify the performance of the no-reference panoramic video quality evaluation method (SP-PVQA) proposed in the above embodiment of the present invention on panoramic images, experiments were performed on the OIQA image data set (H. Duan, G. Zhai, X. Min, Y. Zhu, Y. Fang and X. Yang, "Perceptual Quality Assessment of Omnidirectional Images," in IEEE International Symposium on Circuits and Systems (ISCAS), Florence, 2018, pp. 1-5). The distorted images were divided into a training set (80%) and a test set (20%); the test was repeated 1000 times, and the medians of PLCC, SRCC, KRCC and RMSE were taken as performance indexes. Table 2 compares the performance of the traditional PSNR and SSIM methods and of the CPP-PSNR, WS-PSNR, S-PSNR and WS-SSIM panoramic methods with the SP-PVQA method proposed in the above embodiment of the present invention; it also reports variants that consider only the projection format (PF) or only the human-eye perception characteristic (HP) as the weight. The highest indexes in table 2 are shown in bold.
TABLE 2 comparison of Overall Properties
As can be seen from table 2, the SP-PVQA method proposed in the above embodiment of the present invention achieves the best performance on all four indexes. The PF and HP variants, each considering only one weighting factor, also perform well, and combining the two achieves the best performance. Fig. 8 shows scatter plots of the predicted scores fitted against the subjective scores for the WS-SSIM, S-PSNR, WS-PSNR, CPP-PSNR and SP-PVQA methods on the same test set; the SP-PVQA method provided by the above embodiment of the present invention yields the best fit.
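The correlation indexes used in these experiments can be sketched as follows (a tie-free Spearman implementation is an assumption of this sketch); in the protocol above, each index is computed per random 80/20 split and the median over 1000 repetitions is reported.

```python
import numpy as np

def plcc(x, y):
    """Pearson linear correlation coefficient."""
    x = np.asarray(x, float) - np.mean(x)
    y = np.asarray(y, float) - np.mean(y)
    return float((x * y).sum() / np.sqrt((x * x).sum() * (y * y).sum()))

def srcc(x, y):
    """Spearman rank-order correlation: the PLCC of the ranks.
    Ties are not handled in this sketch."""
    rank = lambda v: np.argsort(np.argsort(np.asarray(v))).astype(float)
    return plcc(rank(x), rank(y))
```

SRCC rewards any monotonic relationship (a perfectly monotonic but nonlinear prediction scores 1.0), whereas PLCC measures only linear agreement.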
Another embodiment of the present invention provides a no-reference panoramic video evaluation system, as shown in fig. 9, which may include: the system comprises a local structural feature acquisition module, a weight acquisition module, a panoramic structural feature acquisition module and a quality evaluation module; wherein:
the local structural feature acquisition module is used for acquiring the ERP plane structural feature of the local binary pattern based on the gradient domain of the distorted image;
the weight acquisition module is used for carrying out superpixel segmentation processing on the distorted image and respectively acquiring the weight based on human eye perception of the superpixel and the weight based on the projection relation of the superpixel;
the panoramic structure characteristic acquisition module is used for extracting the structure characteristics of the single-frame image of the panoramic video according to the structure characteristics of the local binary pattern, the weight perceived by human eyes and the weight of the projection relation;
and the quality evaluation module is used for estimating the quality score of the panoramic video according to the structural characteristics of the single-frame image of the panoramic video.
A third embodiment of the invention provides a terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor being operable to perform any of the methods described above when executing the program.
Optionally, a memory is provided for storing programs. The memory may include volatile memory, such as random-access memory (RAM), e.g., static random-access memory (SRAM) or double data rate synchronous dynamic random-access memory (DDR SDRAM); it may also include non-volatile memory, such as flash memory. The memory stores computer programs (e.g., applications and functional modules implementing the above methods), computer instructions, and the like, which may be stored in one or more memories in a partitioned manner and may be invoked by the processor.
A processor for executing the computer program stored in the memory to implement the steps of the method according to the above embodiments. Reference may be made in particular to the description relating to the preceding method embodiment.
The processor and the memory may be separate structures or may be an integrated structure integrated together. When the processor and the memory are separate structures, the memory, the processor may be coupled by a bus.
A fourth embodiment of the invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, is operable to perform the method of any one of the preceding claims.
The no-reference panoramic video quality evaluation method, system, terminal and medium provided by the embodiments of the present invention are superpixel-based: first, the structural features of each frame of the panoramic video on the ERP plane are expressed using the frame's second-derivative information; second, the fused panoramic weight, formed from the superpixel-based projection format and human-eye perception, is combined with the structural features to obtain the panoramic structural features of a single frame; then the average over the first t frames of the video is taken as the panoramic structural feature of the panoramic video sequence; finally, the resulting panoramic structural features are fed into an SVR to build a quality prediction model. Experiments on a public panoramic video subjective quality evaluation database demonstrate that the no-reference panoramic video quality evaluation method, system, terminal and medium provided by the embodiments of the present invention can evaluate distortions produced in transmission in practical applications.
It should be noted that, the steps in the method provided by the present invention may be implemented by using corresponding modules, devices, units, and the like in the system, and those skilled in the art may implement the composition of the system by referring to the technical solution of the method, that is, the embodiment in the method may be understood as a preferred example for constructing the system, and will not be described herein again.
Those skilled in the art will appreciate that, in addition to implementing the system and its various devices provided by the present invention in purely computer readable program code means, the method steps can be fully programmed to implement the same functions by implementing the system and its various devices in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and various devices thereof provided by the present invention can be regarded as a hardware component, and the devices included in the system and various devices thereof for realizing various functions can also be regarded as structures in the hardware component; means for performing the functions may also be regarded as structures within both software modules and hardware components for performing the methods.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention.

Claims (10)

1. A no-reference panoramic video evaluation method is characterized by comprising the following steps:
obtaining an ERP plane structural feature of a local binary pattern based on a gradient domain of a distorted image;
performing superpixel segmentation processing on the distorted image to respectively obtain a weight based on human eye perception of the superpixel and a weight based on projection relation of the superpixel;
extracting the structural characteristics of the single-frame image of the panoramic video according to the structural characteristics of the local binary pattern, the weight perceived by human eyes and the weight of the projection relation;
and estimating the quality score of the panoramic video according to the structural characteristics of the single-frame image of the panoramic video.
2. The method for evaluating the non-reference panoramic video according to claim 1, wherein the obtaining of the ERP plane structural feature of the local binary pattern based on the gradient domain of the distorted image comprises:
calculating a gradient image of a single frame image of a distorted image in the panoramic video;
and on the basis of the gradient domain, encoding the pixels of the gradient image to obtain the ERP plane structural feature of the local binary pattern based on the gradient domain.
3. The method of claim 2, wherein calculating the gradient image of the single-frame image in the panoramic video comprises:
calculating the image gradient with a Prewitt operator, the gradient level of the distorted image being expressed by the convolution of the single-frame image with the templates of the Prewitt operator in two directions, the gradient image g(x, y) of the distorted image I(x, y) being:

g(x, y) = √[(I(x, y) * p_x)² + (I(x, y) * p_y)²]

in the formula, * denotes the convolution operation; p_x and p_y denote the horizontal and vertical templates, used to compute the horizontal and vertical edges respectively;
the encoding of the pixels of the gradient image on the basis of the gradient domain comprises:
encoding the pixel points of the gradient image with a rotation-invariant uniform LBP operator to obtain the structural feature LBP_{P,R} of the local binary pattern based on the gradient domain:

LBP_{P,R} = Σ_{i=0}^{P−1} s(g_i − g_c), if U(LBP_{P,R}) ≤ 2; otherwise P + 1

wherein P denotes the number of sample points around the center pixel, R denotes the radius at which the surrounding pixels are selected, g_c denotes the gradient amplitude of the center pixel, and g_i denotes the gradient amplitudes of the surrounding pixels; wherein:

s(x) = 1, if x ≥ 0; 0, if x < 0

U(LBP_{P,R}) = |s(g_{P−1} − g_c) − s(g_0 − g_c)| + Σ_{i=1}^{P−1} |s(g_i − g_c) − s(g_{i−1} − g_c)|

where U is the uniformity measure, computed bit by bit, counting the number of transitions of the binary sequence from 0 to 1 and from 1 to 0.
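A minimal sketch of the gradient-domain LBP coding described in claim 3, assuming nearest-neighbour sampling of the P circular neighbours; the function name gradient_lbp and the toy sampling scheme are illustrative choices of this sketch, not part of the claim:

```python
import numpy as np
from scipy.ndimage import convolve

def gradient_lbp(image, P=8, R=1):
    """Rotation-invariant uniform LBP computed on the Prewitt gradient magnitude."""
    # Prewitt templates for horizontal and vertical edges
    px = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=float)
    py = px.T
    g = np.sqrt(convolve(image, px) ** 2 + convolve(image, py) ** 2)

    h, w = g.shape
    codes = np.full((h, w), P + 1, dtype=int)   # non-uniform patterns map to P + 1
    angles = 2 * np.pi * np.arange(P) / P
    # nearest-neighbour offsets on a circle of radius R (a simplification)
    offs = [(int(round(R * np.sin(a))), int(round(R * np.cos(a)))) for a in angles]
    for y in range(R, h - R):
        for x in range(R, w - R):
            gc = g[y, x]
            s = [int(g[y + dy, x + dx] >= gc) for dy, dx in offs]
            # uniformity: number of 0->1 / 1->0 transitions in the circular sequence
            u = abs(s[P - 1] - s[0]) + sum(abs(s[i] - s[i - 1]) for i in range(1, P))
            codes[y, x] = sum(s) if u <= 2 else P + 1
    return codes
```

Uniform patterns receive codes 0…P and every non-uniform pattern shares the single code P + 1, so the encoding yields P + 2 feature bins per scale.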
4. The method for evaluating the non-reference panoramic video according to claim 1, wherein the performing the super-pixel segmentation process on the distorted image comprises:
and gathering pixels in the distorted image by a simple linear iterative clustering (SLIC) method, thereby grouping discrete pixels into superpixels each consisting of several pixels.
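As a rough illustration of the segmentation step, the following toy SLIC-style local k-means over (intensity, x, y) features is a simplification of the real algorithm; production code would use an established SLIC implementation, and the function simple_slic and its parameters are assumptions of this sketch:

```python
import numpy as np

def simple_slic(img, n_segments=16, n_iter=5, compactness=10.0):
    """Toy SLIC: iterative local k-means over (intensity, row, col) features."""
    h, w = img.shape
    S = int(np.sqrt(h * w / n_segments))            # grid interval between seeds
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # seed cluster centers on a regular grid: (row, col, intensity)
    cy = np.arange(S // 2, h, S)
    cx = np.arange(S // 2, w, S)
    centers = np.array([[y, x, img[y, x]] for y in cy for x in cx], dtype=float)
    labels = np.zeros((h, w), dtype=int)
    for _ in range(n_iter):
        dist = np.full((h, w), np.inf)
        for k, (yc, xc, vc) in enumerate(centers):
            # restrict the assignment to a (2S+1)x(2S+1) window around the center
            y0, y1 = max(0, int(yc) - S), min(h, int(yc) + S + 1)
            x0, x1 = max(0, int(xc) - S), min(w, int(xc) + S + 1)
            dc = (img[y0:y1, x0:x1] - vc) ** 2
            ds = (ys[y0:y1, x0:x1] - yc) ** 2 + (xs[y0:y1, x0:x1] - xc) ** 2
            d = dc + (compactness / S) ** 2 * ds
            better = d < dist[y0:y1, x0:x1]
            dist[y0:y1, x0:x1][better] = d[better]
            labels[y0:y1, x0:x1][better] = k
        # move each center to the mean of the pixels assigned to it
        for k in range(len(centers)):
            mask = labels == k
            if mask.any():
                centers[k] = [ys[mask].mean(), xs[mask].mean(), img[mask].mean()]
    return labels
```

The compactness term trades colour similarity against spatial coherence, which is the design choice that makes SLIC produce roughly uniform, connected superpixels.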
5. The method for evaluating the no-reference panoramic video according to claim 1, wherein the obtaining of the weight based on human eye perception of the superpixel comprises the following steps:
letting the size of the panoramic image be M × N, the ordinate of a pixel in the superpixel, i.e. its distance from the upper boundary of the panoramic image, be y1, and N − y1 be its distance from the lower boundary, the weight ω1i of a single pixel is:

ω1i = min{d1, d2}

wherein:

d1 = y1,  d2 = N − y1

the weight of each superpixel being determined by the pixel in it that lies closest to a boundary, the human eye perception weight ω1 of each superpixel is defined as:

ω1 = min{y1, N − y1, … , yn, N − yn}

wherein n denotes the number of pixels in the superpixel; the larger ω1 is, the closer the superpixel is to the equator of the panoramic image, and conversely, the farther it is from the equator;
the obtaining of the weight based on the projection relation of the superpixel comprises:
letting the coordinates of the ERP plane and of the sphere in continuous space be (x, y) and (θ, φ) respectively, the transformation relation between the ERP plane and the sphere is:

θ = (x/M − 1/2)·2π,  φ = (1/2 − y/N)·π

wherein θ ∈ (−π, π) and φ ∈ (−π/2, π/2);
the area stretch ratio SR is thus defined as:

SR(x, y) = cos φ(y)

and the weight SR(i, j) of the digital image is defined as:

SR(i, j) = SR(x(i, j), y(i, j))

if M × N is the size of the ERP planar image, with {(i, j) | 0 < i ≤ M, 0 < j ≤ N}, the transformation relation between the continuous domain and the discrete domain is:

x(i, j) = i − 1/2,  y(i, j) = j − 1/2

so that the projection relation is finally defined as:

SR(i, j) = cos((j − 1/2 − N/2)·π/N);
combining the projection relation with the superpixel segmentation map obtained after the superpixel segmentation processing, namely calculating the distances from all pixels in a superpixel to the upper and lower boundaries of the panoramic image, and taking the projection relation corresponding to the pixel with the minimum distance as the weight of the whole superpixel; namely, the weight ω2 of a single superpixel in the panoramic projection is:

ω2 = SR(i, d_min)

this ω2 being the weight based on the projection relation of the superpixel;
wherein d_min denotes the minimum of the distances from all pixels in each superpixel to the upper and lower boundaries of the panoramic image:

d_min = min{y1, N − y1, … , yn, N − yn}.
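The two per-superpixel weights of claim 5 can be sketched together, assuming 1-indexed row ordinates and a cosine-of-latitude stretch ratio evaluated at the boundary-nearest row; the helper superpixel_weights is hypothetical:

```python
import numpy as np

def superpixel_weights(labels):
    """Per-superpixel human-eye weight w1 and projection weight w2.

    labels: integer superpixel label per pixel; rows are treated as 1-indexed
    ordinates, and the ERP stretch ratio is taken as cos of the latitude.
    """
    N = labels.shape[0]                          # panoramic image height
    rows = np.arange(1, N + 1)
    w1, w2 = {}, {}
    for k in np.unique(labels):
        ys = rows[np.any(labels == k, axis=1)]   # rows occupied by superpixel k
        d = np.minimum(ys, N - ys)               # distance to the nearer boundary
        j = ys[np.argmin(d)]                     # boundary-nearest row (d_min row)
        w1[k] = int(d.min())                     # human-eye perception weight
        w2[k] = np.cos((j - 0.5 - N / 2) * np.pi / N)   # projection-relation weight
    return w1, w2
```

A superpixel hugging the top or bottom boundary gets w1 near 0 and a small w2, while an equatorial superpixel gets the largest values of both, matching the intent of the claim.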
6. the method for evaluating the non-reference panoramic video according to claim 1, wherein the extracting the structural features of the single-frame image of the panoramic video comprises:
fusing the human eye perception weight ω1 and the projection relation weight ω2, the fused weight being:

ω = ω1 · ω2

combining the obtained ERP plane structural feature with the fused weight ω, accumulating the fused weights that share the same LBP code, and then normalizing, to obtain the structural feature PW(k) of the single-frame image of the panoramic video:

PW(k) = Σ_{i=1}^{N} ω(i)·δ(LBP_{P,R}(i), k) / Σ_{i=1}^{N} ω(i)

wherein:

δ(x, y) = 1, if x = y; 0, if x ≠ y

in the formula, N is the number of pixels, k is the value taken by the gradient-domain LBP code, and ω is the panoramic fusion weight;
and downsampling the distorted image multiple times to obtain the panoramic structural features at different scales.
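A compact sketch of the weighted histogram PW(k) and the multi-scale step; for brevity this sketch merely subsamples the codes and weights at each scale, whereas the claim recomputes the features on the downsampled distorted image, and all names are illustrative:

```python
import numpy as np

def panoramic_feature(codes, weights, n_bins=10):
    """PW(k): accumulate the fused weight of every pixel whose LBP code is k,
    then normalize by the total weight (the delta-weighted histogram)."""
    pw = np.array([weights[codes == k].sum() for k in range(n_bins)])
    return pw / weights.sum()

def multiscale_feature(codes, weights, scales=3, n_bins=10):
    """Concatenate PW histograms over dyadically subsampled scales."""
    feats = []
    for _ in range(scales):
        feats.append(panoramic_feature(codes, weights, n_bins))
        codes, weights = codes[::2, ::2], weights[::2, ::2]
    return np.concatenate(feats)
```

With P = 8 the uniform LBP codes span 0…9, so each scale contributes a 10-bin histogram and three scales yield a 30-dimensional feature vector.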
7. The method of claim 1, wherein estimating the panoramic video quality score comprises:
obtaining the structural features of the panoramic video single-frame images of the first t frames of the panoramic video and taking their average value; for a distorted video sequence, the structural feature PW_video of its panorama is:

PW_video = (1/t)·Σ_{i=1}^{t} PW_i

in the formula, PW_i is the structural feature of the panoramic video single-frame image of the i-th frame among the first t frames of the panoramic video;
and inputting the obtained structural feature PW_video into an SVR model to obtain a quality score prediction model, which maps the feature to the final panoramic video quality score.
8. A no-reference panoramic video evaluation system, comprising:
the local structural feature acquisition module, used for obtaining the ERP plane structural feature of the local binary pattern based on the gradient domain of a distorted image;
the weight acquisition module, used for performing superpixel segmentation processing on the distorted image and obtaining, respectively, the weight based on human eye perception of each superpixel and the weight based on the projection relation of each superpixel;
the panoramic structural feature acquisition module, used for extracting the structural features of the single-frame image of the panoramic video from the structural features of the local binary pattern, the weight of human eye perception, and the weight of the projection relation;
and the quality evaluation module is used for estimating the quality score of the panoramic video according to the structural characteristics of the single-frame image of the panoramic video.
9. A terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, is operative to perform the method of any of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, is adapted to carry out the method of any one of claims 1 to 7.
CN202110302516.1A 2021-03-22 2021-03-22 No-reference panoramic video quality evaluation method, system, terminal and medium Pending CN113038123A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110302516.1A CN113038123A (en) 2021-03-22 2021-03-22 No-reference panoramic video quality evaluation method, system, terminal and medium


Publications (1)

Publication Number Publication Date
CN113038123A 2021-06-25

Family

ID=76472298



Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115424051A (en) * 2022-09-16 2022-12-02 中国矿业大学 Panoramic stitching image quality evaluation method
CN115423812A (en) * 2022-11-05 2022-12-02 松立控股集团股份有限公司 Panoramic monitoring planarization display method
CN117036154A (en) * 2023-08-17 2023-11-10 中国石油大学(华东) Panoramic video fixation point prediction method without head display and distortion

Citations (6)

Publication number Priority date Publication date Assignee Title
CN104079925A (en) * 2014-07-03 2014-10-01 中国传媒大学 Ultrahigh definition video image quality objective evaluation method based on visual perception characteristic
CN104915945A (en) * 2015-02-04 2015-09-16 中国人民解放军海军装备研究院信息工程技术研究所 Quality evaluation method without reference image based on regional mutual information
CN109740592A (en) * 2018-12-04 2019-05-10 上海大学 Based on the picture quality of memory without ginseng appraisal procedure
CN110046673A (en) * 2019-04-25 2019-07-23 上海大学 No reference tone mapping graph image quality evaluation method based on multi-feature fusion
CN111292336A (en) * 2020-01-21 2020-06-16 宁波大学 Omnidirectional image non-reference quality evaluation method based on segmented spherical projection format
CN111311595A (en) * 2020-03-16 2020-06-19 清华大学深圳国际研究生院 No-reference quality evaluation method for image quality and computer readable storage medium


Non-Patent Citations (1)

Title
XIA YUMENG: "Research on Panoramic Image/Video Quality Assessment Methods", China Excellent Master's Theses Full-text Database, Information Science and Technology Series *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210625