CN113038123A - No-reference panoramic video quality evaluation method, system, terminal and medium - Google Patents


Info

Publication number
CN113038123A
Authority
CN
China
Prior art keywords
image
weight
panoramic
panoramic video
gradient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110302516.1A
Other languages
Chinese (zh)
Inventor
王永芳 (Wang Yongfang)
夏雨蒙 (Xia Yumeng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology filed Critical University of Shanghai for Science and Technology

Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N17/00 — Diagnosis, testing or measuring for television systems or their details

Abstract

The invention provides a no-reference panoramic video quality evaluation method and system, comprising the following steps: obtaining the ERP plane structural feature of a gradient-domain local binary pattern; performing superpixel segmentation to obtain, for each superpixel, a weight based on human eye perception and a weight based on the projection relation; extracting the structural feature of a single-frame image of the panoramic video from the local binary pattern structural feature, the human eye perception weight and the projection relation weight; and estimating the quality score of the panoramic video from the structural features of its single-frame images. A corresponding terminal and medium are also provided. The method needs neither an original reference sequence nor knowledge of the sequence's distortion type; it is a no-reference video quality evaluation technique, better suited to evaluating the distortion produced during transmission in practical applications, and more practical in real-time communication systems. In experiments on distorted sequences it achieves the best performance and good robustness.

Description

No-reference panoramic video quality evaluation method, system, terminal and medium
Technical Field
The invention relates to a panoramic video quality evaluation method in the technical field of multimedia, and in particular to a superpixel-based no-reference panoramic video quality evaluation method, system, terminal and medium.
Background
With the rapid development of Virtual Reality (VR) technology, more and more VR applications require high-quality panoramic video. Panoramic images and videos overcome the drawbacks of ordinary planar images, whose single viewing angle cannot deliver an omnidirectional experience; by recording every scene within a 360-degree spherical range, they offer much richer content. High-resolution panoramic images and videos can give users a more immersive and realistic experience. In a real communication system, however, panoramic images and videos must undergo projection and compression so that they can be stored, transmitted and processed conveniently; the resulting quality degradation lowers the user's quality of experience and, in severe cases, causes dizziness and discomfort.
Although the accuracy of full-reference quality evaluation models is generally high, in an actual communication system the receiving side usually has no undistorted source image/video, which greatly reduces the practicality of full-reference models. No-reference panoramic video quality assessment (NR-PVQA) refers to directly measuring the visual quality of a distorted sequence in the absence of a reference sequence. NR-PVQA needs neither the original reference sequence nor knowledge of the sequence's distortion type, and is better suited to estimating the distortion produced during transmission in practical applications. Designing a no-reference panoramic video quality assessment method is therefore of great significance.
A search of the prior art found the following:
Chinese patent application CN110691236A, "Panoramic video quality evaluation method", published January 14, 2020, divides the original panoramic video and the panoramic video to be evaluated into several frame groups, projects them onto the six faces of a cube using cube map projection, calculates the spatial-domain similarity and frequency-domain similarity of each frame group on each projection face, and fuses the similarities of all frame groups and projection faces into an objective evaluation value of the panoramic video to be evaluated. This method still has the following technical problems: it is a full-reference panoramic evaluation method and needs the original reference panoramic image, which cannot be obtained in real applications; and it evaluates panoramic quality only from the spatial-domain and frequency-domain similarity between the original and distorted panoramic images, without considering the projection relation or human eye perception characteristics, so it cannot accurately reflect the distortion of the panoramic content between the observation space and the processing plane.
At present, no description or report of a technology similar to the present invention has been found, nor has any similar information been collected at home or abroad.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a method, a system, a terminal and a medium for evaluating the quality of a no-reference panoramic video based on superpixels.
According to an aspect of the present invention, there is provided a no-reference panoramic video quality evaluation method, including:
obtaining the ERP (Equirectangular Projection) plane structural feature of a gradient-domain local binary pattern of a distorted image;
performing superpixel segmentation processing on the distorted image to respectively obtain a weight based on human eye perception of the superpixel and a weight based on projection relation of the superpixel;
extracting the structural characteristics of the single-frame image of the panoramic video according to the structural characteristics of the local binary pattern, the weight perceived by human eyes and the weight of the projection relation;
and estimating the quality score of the panoramic video according to the structural characteristics of the single-frame image of the panoramic video.
Preferably, the obtaining of the ERP plane structural feature of the gradient-domain local binary pattern of the distorted image includes:
calculating a gradient image of a single frame image of a distorted image in the panoramic video;
and on the basis of the gradient domain, encoding the pixels of the gradient image to obtain the ERP plane structural feature of the local binary pattern based on the gradient domain.
Preferably, the calculating a gradient image of a single frame image in the panoramic video includes:
calculating the image gradient with the Prewitt operator: the gradient level of the distorted image is expressed by convolving the single-frame image with the templates in the two directions of the Prewitt operator, and the gradient image g(x, y) of the distorted image I(x, y) is expressed as:
g(x, y) = sqrt( (I(x, y) * p_x)^2 + (I(x, y) * p_y)^2 )
where * denotes the convolution operation, and p_x and p_y are the horizontal and vertical templates used to compute the horizontal and vertical edges, respectively;
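The gradient computation above can be sketched as follows; the 3×3 Prewitt templates p_x and p_y are the standard ones, assumed here because the patent figure does not print them:

```python
import numpy as np

# Standard 3x3 Prewitt templates (an assumption; not printed in the patent)
PX = np.array([[-1, 0, 1],
               [-1, 0, 1],
               [-1, 0, 1]], dtype=float)  # horizontal edges
PY = PX.T                                  # vertical edges

def conv2_same(img, kernel):
    """2-D 'same' convolution with zero padding (no SciPy dependency)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    kf = kernel[::-1, ::-1]  # flip the kernel for true convolution
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kf)
    return out

def prewitt_gradient(img):
    """g(x, y) = sqrt((I * p_x)^2 + (I * p_y)^2)."""
    gx = conv2_same(img, PX)
    gy = conv2_same(img, PY)
    return np.sqrt(gx ** 2 + gy ** 2)
```

For a step-edge image, the gradient magnitude is zero in flat regions and positive along the edge.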
the encoding of the pixels of the gradient image on the basis of the gradient domain comprises:
encoding the pixel points of the gradient image with the rotation-invariant uniform LBP operator to obtain the gradient-domain local binary pattern structural feature LBP_{P,R}:
LBP_{P,R} = Σ_{i=0}^{P−1} s(g_i − g_c),  if u(LBP_{P,R}) ≤ 2;  P + 1, otherwise
where P is the number of elements around the center pixel, R is the radius at which the surrounding pixels are selected, g_c is the gradient amplitude of the center pixel, and g_i are the gradient amplitudes of the surrounding pixels; wherein:
s(x) = 1 if x ≥ 0, and 0 if x < 0
u(LBP_{P,R}) = |s(g_{P−1} − g_c) − s(g_0 − g_c)| + Σ_{i=1}^{P−1} |s(g_i − g_c) − s(g_{i−1} − g_c)|
where u is the uniformity measure, computed bit by bit: the number of transitions of the binary sequence from 0 to 1 and from 1 to 0.
Preferably, the total number of transitions does not exceed 2.
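A minimal sketch of the rotation-invariant uniform LBP code for a single pixel, following the s(·) and u(·) definitions above (neighbour sampling and interpolation on the circle of radius R are omitted):

```python
def lbp_riu2(neighbours, gc):
    """Rotation-invariant uniform LBP code for one pixel.

    neighbours : sequence of P surrounding gradient magnitudes g_i
    gc         : gradient magnitude g_c of the centre pixel
    Returns sum(s(g_i - g_c)) when the circular binary pattern is uniform
    (u <= 2), otherwise the 'non-uniform' label P + 1.
    """
    s = [1 if gi - gc >= 0 else 0 for gi in neighbours]
    P = len(s)
    # u: number of 0->1 / 1->0 transitions in the circular binary sequence
    u = sum(s[i] != s[(i + 1) % P] for i in range(P))
    return sum(s) if u <= 2 else P + 1
```

With P = 8 this yields the usual 10 rotation-invariant uniform codes 0…9.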
Preferably, the performing the super-pixel segmentation processing on the distorted image includes:
and aggregating similar pixels in the distorted image with a (simple) linear iterative clustering method, thereby segmenting discrete pixels into superpixels each consisting of several pixels.
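A toy illustration of the linear iterative clustering idea: seeds on a regular grid, each pixel assigned to the nearest seed under a combined colour-plus-position distance. This is a simplified sketch (fixed grid, few iterations, no connectivity enforcement), not the full SLIC implementation the patent presumably relies on; `m` (compactness) and the grid layout are assumptions:

```python
import numpy as np

def slic_like_labels(lab_img, n_seg_per_axis=2, m=10.0, n_iters=2):
    """Minimal SLIC-style superpixel assignment (illustrative only).

    lab_img : H x W x 3 array, assumed already in a Lab-like colour space.
    Each pixel gets a 5-D feature [L, a, b, x, y]; cluster centres start on
    a regular grid and pixels go to the nearest centre under the combined
    colour + spatially scaled distance.
    """
    H, W, _ = lab_img.shape
    S = max(H, W) // n_seg_per_axis  # approximate superpixel spacing
    ys = np.linspace(S // 2, H - 1 - S // 2, n_seg_per_axis).astype(int)
    xs = np.linspace(S // 2, W - 1 - S // 2, n_seg_per_axis).astype(int)
    centers = np.array([[*lab_img[y, x], x, y] for y in ys for x in xs], float)
    yy, xx = np.mgrid[0:H, 0:W]
    feats = np.concatenate(
        [lab_img, xx[..., None], yy[..., None]], axis=2).astype(float)
    labels = np.zeros((H, W), int)
    for _ in range(n_iters):
        # distance: colour term + (m/S)-scaled spatial term
        d = np.stack([
            np.sqrt(((feats[..., :3] - c[:3]) ** 2).sum(-1))
            + (m / S) * np.sqrt(((feats[..., 3:] - c[3:]) ** 2).sum(-1))
            for c in centers])
        labels = d.argmin(0)
        for k in range(len(centers)):  # recompute centres as cluster means
            mask = labels == k
            if mask.any():
                centers[k] = feats[mask].mean(0)
    return labels
```

On an image made of four uniform colour quadrants, the four seeds recover the four quadrants as "superpixels".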
Preferably, the obtaining of the weight of the human eye perception based on the super-pixels comprises:
let the size of the panoramic image be M × N, and the ordinate of the superpixel, i.e., the distance of the superpixel from the upper boundary of the panoramic image, be y1,N-y1The distance of the superpixel from the lower boundary of the panoramic image, the weight ω of the single superpixel1iComprises the following steps:
ω1i=min{d1,d2}
wherein:
Figure BDA0002986877260000034
weight ω of each super pixel1iDetermined by the closest of the superpixels to the boundary, and thus the human eye perception weight ω of each superpixel1Is defined as:
ω1=min{y1,N-y1,…yn,N-yn}
wherein n represents the number of pixels in the super-pixel; omega1The larger the distance, the closer the superpixel is to the equator of the panoramic image, and vice versa.
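The human eye perception weight ω_1 reduces to a minimum over per-pixel boundary distances, e.g.:

```python
def perception_weight(sp_pixel_ys, N):
    """w1 for one superpixel: the minimum, over all of its pixels, of the
    distance to the top (y) or bottom (N - y) boundary of an N-row ERP
    image. A larger w1 means the superpixel lies nearer the equator."""
    return min(min(y, N - y) for y in sp_pixel_ys)
```

A superpixel touching row 10 of a 100-row image gets weight 10; one at the equator (row 50) gets the maximum weight 50.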
Preferably, the obtaining the weight based on the projection relation of the super-pixel comprises:
let the coordinates of ERP plane and spherical surface be (x, y) and (y) respectively in continuous space
Figure BDA0002986877260000035
The transformation relation between the ERP plane and the spherical surface is as follows:
Figure BDA0002986877260000041
wherein theta is larger than theta, and belongs to (-pi, pi),
Figure BDA0002986877260000042
Thus, the area stretch ratio SR is defined as:
SR(x, y) = cos(φ)
the weight SR (i, j) of the digital image is defined as:
SR(i,j)=SR(x(i,j),y(i,j))
if M × N is the size of the ERP planar image, with {(i, j) | 0 < i ≤ M, 0 < j ≤ N}, then the transformation relation between the continuous domain and the discrete domain is:
x(i, j) = (i − 0.5)/M,  y(i, j) = (j − 0.5)/N
thus, the projection relation is finally defined as:
SR(i, j) = cos( (j − 0.5 − N/2) · π/N )
combining the projection relation with the superpixel segmentation map obtained after the superpixel segmentation processing: the distances between all pixels in the superpixel and the upper and lower boundaries of the panoramic image are calculated, and the projection relation corresponding to the pixel point with the minimum distance is used as the weight of the whole superpixel, i.e. the weight ω_2 of a single superpixel in the panoramic projection is:
ω_2 = SR(i, d_min)
where ω_2 is the weight based on the projection relation of the superpixel, and d_min is the minimum distance between all pixel points in each superpixel and the upper and lower boundaries of the panoramic image:
d_min = min{y_1, N − y_1, … y_n, N − y_n}.
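A sketch of the projection-relation weight ω_2. The patent's discrete SR formula sits in an unprinted figure, so the cos-latitude form below (the standard WS-PSNR-style weight, written here for 0-indexed rows) is an assumption that matches the stated continuous-domain definition SR = cos(φ):

```python
import numpy as np

def sr_weight(j, N):
    """Area stretch ratio for 0-indexed row j of an N-row ERP image:
    ~1 at the equator, shrinking toward 0 at the poles (cos-latitude)."""
    return np.cos((j + 0.5 - N / 2) * np.pi / N)

def projection_weight(sp_pixel_ys, N):
    """w2 for one superpixel: SR evaluated at d_min, the smallest distance
    of any of its pixels to the top/bottom boundary, as in the patent."""
    d_min = min(min(y, N - y) for y in sp_pixel_ys)
    return sr_weight(d_min, N)
```

A superpixel near a pole (small d_min) thus receives a much smaller projection weight than one at the equator.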
preferably, the extracting structural features of a single frame image of a panoramic video includes:
the human eye perception weight ω_1 and the projection relation weight ω_2 are fused, and the obtained fusion weight is:
ω = ω_1 · ω_2
combining the obtained ERP plane structural features with the fusion weight ω, performing superposition statistics on the fusion weights having the same LBP code, and then normalizing to obtain the structural feature PW(k) of the single-frame image of the panoramic video:
PW(k) = ( Σ_{i=1}^{N} ω_i · f(LBP_{P,R}(i), k) ) / ( Σ_{i=1}^{N} ω_i )
wherein:
f(LBP_{P,R}(i), k) = 1 if LBP_{P,R}(i) = k, and 0 otherwise
In the formula, N is the number of pixels, k is a value taken by the gradient-domain LBP code, and ω_i is the panoramic fusion weight of pixel i;
and carrying out down sampling on the distorted image for multiple times to obtain the structural features of the panorama on different scales.
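The weighted-histogram feature PW(k) then amounts to accumulating the fused weights per LBP code and normalising; a sketch:

```python
import numpy as np

def panoramic_feature(lbp_codes, weights, n_bins):
    """PW(k): for each LBP code k, sum the fused weights w = w1 * w2 of the
    pixels whose code equals k, then normalise so the feature sums to 1."""
    pw = np.zeros(n_bins)
    for code, w in zip(lbp_codes.ravel(), weights.ravel()):
        pw[int(code)] += w
    total = pw.sum()
    return pw / total if total > 0 else pw
```

Repeating this on several down-sampled copies of the frame and concatenating the results gives the multi-scale panoramic features the text describes.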
Preferably, the estimating the panoramic video quality score comprises:
obtaining the structural features of the panoramic video single-frame images of the first t frames of the panoramic video and taking their average; for a distorted video sequence, its panoramic structural feature PW_video is:
PW_video = (1/t) Σ_{i=1}^{t} PW_i
where PW_i is the structural feature of the panoramic video single-frame image of the i-th frame among the first t frames;
inputting the obtained panoramic structural feature PW_video into an SVR model to obtain a quality score prediction model, which maps the features into the final panoramic video quality score.
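Temporal pooling is a plain average over the first t frames; the SVR mapping is sketched only in a comment, since the patent does not specify a kernel or parameters:

```python
import numpy as np

def video_feature(frame_features):
    """PW_video: mean of the per-frame panoramic features PW_i over the
    first t frames of the sequence."""
    return np.mean(np.asarray(frame_features, dtype=float), axis=0)

# The averaged feature would then be fed to a trained support-vector
# regressor to obtain the quality score, e.g. (an assumption -- the patent
# only says "SVR model", without kernel or hyperparameters):
#
#   from sklearn.svm import SVR
#   model = SVR(kernel="rbf").fit(train_features, subjective_scores)
#   score = model.predict(video_feature(frames)[None, :])
```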
According to another aspect of the present invention, there is provided a reference-free panoramic video evaluation system, including:
the local structural feature acquisition module is used for acquiring the ERP plane structural feature of the local binary pattern based on the gradient domain of the distorted image;
the weight acquisition module is used for carrying out superpixel segmentation processing on the distorted image and respectively acquiring the weight based on human eye perception of the superpixel and the weight based on the projection relation of the superpixel;
the panoramic structure characteristic acquisition module is used for extracting the structure characteristics of the single-frame image of the panoramic video according to the structure characteristics of the local binary pattern, the weight perceived by human eyes and the weight of the projection relation;
and the quality evaluation module is used for estimating the quality score of the panoramic video according to the structural characteristics of the single-frame image of the panoramic video.
According to a third aspect of the present invention, there is provided a terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor when executing the program being operable to perform any of the methods described above.
According to a fourth aspect of the invention, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, is operable to perform the method of any of the above.
Due to the adoption of the technical scheme, compared with the prior art, the invention has the following beneficial effects:
the method, the system, the terminal and the medium for evaluating the quality of the non-reference panoramic video, provided by the invention, can accurately reflect the distortion condition of panoramic contents between an observation space and a processing plane by constructing the panoramic weighting structure characteristic based on the projection relation (projection format) and human eye perception.
The method, the system, the terminal and the medium for evaluating the quality of the non-reference panoramic video do not need an original reference sequence and do not need to know the distortion type of the sequence, belong to a non-reference video quality evaluation technology, and are more suitable for evaluating the distortion condition generated in transmission in practical application.
The method, the system, the terminal and the medium for evaluating the quality of the no-reference panoramic video take into account that the observation space is inconsistent with the mapping space, so that the panoramic structural features used are more consistent with human eye perception.
The method, the system, the terminal and the medium for evaluating the quality of the non-reference panoramic video have stronger practicability in a real-time communication system.
With the method, the system, the terminal and the medium for evaluating the quality of the no-reference panoramic video, the experimental results on distorted sequences show the best performance and better robustness.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
fig. 1 is a flowchart of a method for evaluating quality of a non-reference panoramic video according to an embodiment of the present invention.
Fig. 2 is a flowchart of a method for evaluating quality of a non-reference panoramic video according to a preferred embodiment of the present invention.
Fig. 3 is a diagram illustrating an LBP calculation process according to a preferred embodiment of the present invention.
FIG. 4 is a graph of gradient domain LBP in accordance with a preferred embodiment of the present invention; wherein, (a) is an original image, (b) is an LBP corresponding to the original image, and (c) to (f) are LBPs corresponding to JPEG distortion, JPEG2000 distortion, Gaussian noise and Gaussian blur in sequence.
FIG. 5 is a diagram illustrating the human eye perception weight based on superpixels in a preferred embodiment of the present invention.
Fig. 6 is a schematic diagram of a mapping relationship between an ERP plane and a spherical image in a preferred embodiment of the present invention.
FIG. 7 is a diagram illustrating a distorted image and its fusion weights in accordance with a preferred embodiment of the present invention; wherein, (a) is a distorted image, and (b) is a fusion weight.
FIG. 8 is a fitting scatter plot of the prediction scores and subjective scores according to a preferred embodiment of the invention; wherein, the (a) to (e) are fitting scatter diagrams of the prediction scores and the subjective scores obtained by the WS-SSIM, S-PSNR, WS-PSNR, CPP-PSNR and SP-PVQA methods respectively.
Fig. 9 is a schematic diagram of a component module of a no-reference panoramic video quality evaluation system according to an embodiment of the present invention.
Detailed Description
The following examples illustrate the invention in detail: the embodiment is implemented on the premise of the technical scheme of the invention, and a detailed implementation mode and a specific operation process are given. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention.
Fig. 1 is a flowchart of a method for evaluating quality of a non-reference panoramic video according to an embodiment of the present invention.
As shown in fig. 1, the method for evaluating quality of a non-reference panoramic video according to this embodiment may include the following steps:
s100, obtaining the ERP (Equirectangular Projection) plane structural feature of a gradient-domain local binary pattern of the distorted image;
s200, performing superpixel segmentation processing on the distorted image to respectively obtain a weight based on human eye perception of the superpixel and a weight based on projection relation of the superpixel;
s300, extracting the structural characteristics of the single-frame image of the panoramic video according to the structural characteristics of the local binary pattern, the weight perceived by human eyes and the weight of the projection relation;
and S400, estimating the quality score of the panoramic video according to the structural characteristics of the single-frame image of the panoramic video.
In S100 of this embodiment, obtaining the ERP plane structure feature of the local gradient domain-based binary pattern of the distorted image may include the following steps:
s101, calculating a gradient image of a single frame image of a distorted image in the panoramic video;
and S102, on the basis of the gradient domain, encoding pixels of the gradient image to obtain an ERP plane structural feature of a local binary pattern based on the gradient domain.
Further, in S101 of this embodiment, calculating a gradient image of a single frame image in the panoramic video may include the following steps:
calculating the image gradient with the Prewitt operator: the gradient level of the distorted image is expressed by convolving the single-frame image with the templates in the two directions of the Prewitt operator, and the gradient image g(x, y) of the distorted image I(x, y) is expressed as:
g(x, y) = sqrt( (I(x, y) * p_x)^2 + (I(x, y) * p_y)^2 )
where * denotes the convolution operation, and p_x and p_y are the horizontal and vertical templates used to compute the horizontal and vertical edges, respectively.
Further, in S102 of this embodiment, encoding the pixels of the gradient image on the basis of the gradient domain may include the steps of:
encoding the pixel points of the gradient image with the rotation-invariant uniform LBP operator to obtain the gradient-domain local binary pattern structural feature LBP_{P,R}:
LBP_{P,R} = Σ_{i=0}^{P−1} s(g_i − g_c),  if u(LBP_{P,R}) ≤ 2;  P + 1, otherwise
where P is the number of elements around the center pixel, R is the radius at which the surrounding pixels are selected, g_c is the gradient amplitude of the center pixel, and g_i are the gradient amplitudes of the surrounding pixels; wherein:
s(x) = 1 if x ≥ 0, and 0 if x < 0
u(LBP_{P,R}) = |s(g_{P−1} − g_c) − s(g_0 − g_c)| + Σ_{i=1}^{P−1} |s(g_i − g_c) − s(g_{i−1} − g_c)|
where u is the uniformity measure, computed bit by bit: the number of transitions of the binary sequence from 0 to 1 and from 1 to 0.
Further, the number of transitions does not exceed 2 in total.
In S200 of this embodiment, the super-pixel segmentation processing on the distorted image may include the steps of:
and gathering pixels in the distorted image by adopting a linear iterative clustering method, and further segmenting discrete pixels into super pixels consisting of a plurality of pixels.
Further, the pixels to be aggregated are determined by:
and converting the colour distorted image into 5-dimensional feature vectors, each comprising the three-dimensional colour information in the CIELAB colour space and the two-dimensional spatial position; a distance metric is constructed over the 5-dimensional feature vectors, and the pixels to be aggregated are judged by this distance metric.
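The distance metric over the 5-D features [L, a, b, x, y] can be sketched as the standard SLIC distance; the compactness m and grid interval S below are assumed parameters, not values from the patent:

```python
import numpy as np

def slic_distance(f1, f2, m=10.0, S=20.0):
    """SLIC-style distance between two 5-D features [L, a, b, x, y]:
    D = sqrt(dc^2 + (ds/S)^2 * m^2), where dc is the Lab colour distance
    and ds the spatial distance; m trades colour against compactness."""
    f1, f2 = np.asarray(f1, float), np.asarray(f2, float)
    dc = np.linalg.norm(f1[:3] - f2[:3])
    ds = np.linalg.norm(f1[3:] - f2[3:])
    return np.sqrt(dc ** 2 + (ds / S) ** 2 * m ** 2)
```

Identical features are at distance 0; a purely spatial offset of one grid interval S contributes exactly m.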
In S200 of this embodiment, obtaining the weight based on human eye perception of the superpixel may include the steps of:
S2A1, let the size of the panoramic image be M × N; the ordinate of a pixel in the superpixel, i.e. its distance from the upper boundary of the panoramic image, is y_i, and N − y_i is its distance from the lower boundary; the weight ω_{1i} of that single pixel is:
ω_{1i} = min{d_1, d_2}
wherein:
d_1 = y_i,  d_2 = N − y_i
S2A2, the weight of each superpixel is determined by the pixel in it that is closest to a boundary, so the human eye perception weight ω_1 of each superpixel is defined as:
ω_1 = min{y_1, N − y_1, … y_n, N − y_n}
where n is the number of pixels in the superpixel; the larger ω_1 is, the closer the superpixel is to the equator of the panoramic image, and vice versa.
In S200 of this embodiment, obtaining the weight based on the projection relationship of the superpixel may include the steps of:
S2B1, let the coordinates of the ERP plane and of the sphere in the continuous spatial domain be (x, y) and (θ, φ), respectively; the transformation relation between the ERP plane and the sphere is:
x = θ/(2π) + 1/2,  y = 1/2 − φ/π
where θ ∈ (−π, π) and φ ∈ (−π/2, π/2);
thus, the area stretch ratio SR is defined as:
SR(x, y) = cos(φ)
the weight SR (i, j) of the digital image is defined as:
SR(i,j)=SR(x(i,j),y(i,j))
S2B2, if M × N is the size of the ERP plane image, with {(i, j) | 0 < i ≤ M, 0 < j ≤ N}, then the transformation relation between the continuous domain and the discrete domain is:
x(i, j) = (i − 0.5)/M,  y(i, j) = (j − 0.5)/N
thus, the projection relation is finally defined as:
SR(i, j) = cos( (j − 0.5 − N/2) · π/N )
S2B3, combining the projection relation with the superpixel segmentation map obtained after the superpixel segmentation processing: the distances between all pixels in the superpixel and the upper and lower boundaries of the panoramic image are calculated, and the projection relation corresponding to the pixel point with the minimum distance is used as the weight of the whole superpixel, i.e. the weight ω_2 of a single superpixel in the panoramic projection is:
ω_2 = SR(i, d_min)
where ω_2 is the weight based on the projection relation of the superpixel, and d_min is the minimum distance between all pixel points in each superpixel and the upper and lower boundaries of the panoramic image:
d_min = min{y_1, N − y_1, … y_n, N − y_n}.
in S300 of this embodiment, extracting the structural feature of a single frame image of a panoramic video may include the following steps:
s301, the human eye perception weight ω_1 and the projection relation weight ω_2 are fused, and the obtained fusion weight is:
ω = ω_1 · ω_2
s302, combining the obtained ERP plane structural features with the fusion weight ω, performing superposition statistics on the fusion weights having the same LBP code, and then normalizing to obtain the structural feature PW(k) of the panoramic video single-frame image:
PW(k) = ( Σ_{i=1}^{N} ω_i · f(LBP_{P,R}(i), k) ) / ( Σ_{i=1}^{N} ω_i )
wherein:
f(LBP_{P,R}(i), k) = 1 if LBP_{P,R}(i) = k, and 0 otherwise
In the formula, N is the number of pixels, k is a value taken by the gradient-domain LBP code, and ω_i is the panoramic fusion weight of pixel i;
and S303, carrying out down-sampling on the distorted image for multiple times to obtain the structural features of the panorama on different scales.
In S400 of this embodiment, estimating the panoramic video quality score may include the following steps:
s401, obtaining the structural features of the panoramic video single-frame images of the first t frames of the panoramic video and taking their average; for a distorted video sequence, its panoramic structural feature PW_video is:
PW_video = (1/t) Σ_{i=1}^{t} PW_i
where PW_i is the structural feature of the panoramic video single-frame image of the i-th frame among the first t frames;
s402, inputting the obtained panoramic structural feature PW_video into an SVR model to obtain a quality score prediction model, which maps the features into the final panoramic video quality score.
The no-reference panoramic video quality evaluation method provided by the above embodiment of the present invention performs quality monitoring on distorted panoramic video. Subjective data collected from users watching panoramic video show that objects displayed near the equator attract more of the human eye's attention. Using this centre-bias property, the above embodiment of the present invention provides a no-reference panoramic video quality evaluation algorithm based on superpixel segmentation (SP-PVQA). The embodiment constructs a panoramic weighted structural feature based on the projection format and human eye perception, which can accurately reflect the distortion of the panoramic content between the observation space and the processing plane. The embodiment needs neither an original reference sequence nor the distortion type of the sequence; it is a no-reference video quality evaluation model, better suited to evaluating the distortion produced during transmission in practical applications.
Fig. 2 is a flowchart of a method for evaluating quality of a non-reference panoramic video according to a preferred embodiment of the present invention.
The method for evaluating the quality of the no-reference panoramic video provided by this preferred embodiment first expresses the structural features of each frame of the panoramic video on the ERP plane using second-derivative information of the image; second, the fusion (panoramic) weight formed from the superpixel-based projection format (projection relation) and human eye perception is fused with the structural features to obtain the panoramic structural features of a single frame; the average over the first t frames of the video is then taken as the panoramic structural feature of the panoramic video sequence; finally, the obtained panoramic structural features are fed into an SVR model to build a quality prediction model and complete the quality evaluation.
As shown in fig. 2, the method for evaluating quality of a non-reference panoramic video according to the preferred embodiment may include the following steps:
step 1, extracting the ERP plane structural features of the gradient-domain local binary pattern: the second derivative of an image can effectively capture the local edge changes that influence the perceived visual quality of the panoramic image. Therefore, the gradient strength of a single-frame image in the panoramic video is first calculated as first-derivative information; on the basis of the gradient domain, the pixels are then encoded with an LBP operator to obtain the ERP plane structural feature of the gradient-domain local binary pattern, capturing more detailed edge information;
step 2, calculating the weight of human eye perception based on the super pixels: the super-pixel segmentation method can gather similar pixels, namely discrete pixels are segmented into super-pixels consisting of a plurality of pixels, and compared with the discrete pixels, the segmentation into the super-pixels is closer to the understanding of human eyes on the image content;
and 3, calculating the weight based on the projection relation (projection format) of the superpixel: when a pixel point is mapped from the ERP plane to the sphere, the pixel's area is stretched to a different degree. The relationship between the observation space and the processing space can therefore be expressed as the area stretch ratio of the two;
step 4, extracting the structural features of a single panoramic video frame: the ERP-plane structural features of step 1 are computed on the ERP-plane image and cannot linearly reflect distortion on the sphere. Combining the weight maps obtained in steps 2 and 3 with the ERP-plane structural features of step 1 yields mapping-weighted panoramic structural features that accurately reflect distortion on the sphere and are closer to subjective human perception;
step 5, panoramic video quality score estimation: the quality score prediction model trained on the single-frame panoramic structural features obtained in step 4 maps the features of the distorted video to the final panoramic video quality score.
As a preferred embodiment, in step 1, the method for extracting the ERP planar structural feature based on the local binary pattern of the gradient domain includes the following steps:
The image gradient is computed with the computationally simple Prewitt operator: the gradient level of the distorted image is expressed by convolving the image with the two directional templates of the Prewitt operator. With I(x, y) denoting the distorted image, the gradient image is computed as:
g(x, y) = √[(I(x, y) * p_x)² + (I(x, y) * p_y)²]
where * denotes the convolution operation, p_x and p_y are the horizontal and vertical templates used to compute the edges in those directions, and I(x, y) and g(x, y) denote the distorted image and the corresponding gradient image, respectively.
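As an illustrative sketch (not part of the claimed embodiment), the Prewitt gradient image above can be computed as follows; the edge-replication padding and the function names are assumptions of this sketch. Note that flipping a Prewitt template for true convolution only negates it, so cross-correlation gives the same gradient magnitude.

```python
import numpy as np

# Prewitt templates for vertical and horizontal edges (p_x and p_y in the text).
PX = np.array([[-1, 0, 1],
               [-1, 0, 1],
               [-1, 0, 1]], dtype=float)
PY = PX.T

def conv3x3(img, kernel):
    """3x3 cross-correlation with edge-replicated borders (same output size)."""
    img = np.pad(img.astype(float), 1, mode="edge")
    h, w = img.shape[0] - 2, img.shape[1] - 2
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * img[dy:dy + h, dx:dx + w]
    return out

def prewitt_gradient(img):
    """Gradient magnitude g = sqrt((I*p_x)^2 + (I*p_y)^2)."""
    gx = conv3x3(img, PX)
    gy = conv3x3(img, PY)
    return np.sqrt(gx ** 2 + gy ** 2)
```

A flat image yields a zero gradient everywhere, while a vertical step edge produces a strong response on the columns straddling the step.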
The pixel points of the gradient image are encoded with a rotation-invariant uniform LBP operator, computed as follows:
LBP_{P,R} = Σ_{i=0}^{P−1} s(g_i − g_c), if U(LBP_{P,R}) ≤ 2; otherwise LBP_{P,R} = P + 1
where P is the number of neighbouring elements around the centre pixel, R is the radius of the neighbourhood, g_c is the gradient magnitude of the centre pixel, and g_i is the gradient magnitude of a neighbouring pixel, with:
s(x) = 1 if x ≥ 0, and s(x) = 0 if x < 0
U(LBP_{P,R}) = |s(g_{P−1} − g_c) − s(g_0 − g_c)| + Σ_{i=1}^{P−1} |s(g_i − g_c) − s(g_{i−1} − g_c)|
U is the uniformity measure, computed bit by bit: a binary sequence is uniform if the number of 0-to-1 and 1-to-0 transitions does not exceed 2. The LBP describes the relationship between a centre pixel and its surrounding pixels; these local structural patterns effectively describe the image structure distortions caused by different distortion sources.
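The rotation-invariant uniform LBP encoding above can be sketched as follows for P = 8, R = 1; the clockwise neighbour ordering is an assumption of this sketch.

```python
import numpy as np

def lbp_riu2(g, P=8):
    """Rotation-invariant uniform LBP (riu2) of a gradient image.

    For each interior pixel, the P = 8 neighbours' gradient magnitudes
    are compared with the centre: the code is sum(s(g_i - g_c)) when
    the circular bit pattern is 'uniform' (<= 2 transitions), else P + 1.
    """
    H, W = g.shape
    # clockwise 8-neighbourhood offsets for R = 1
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros((H - 2, W - 2), dtype=int)
    for y in range(1, H - 1):
        for x in range(1, W - 1):
            bits = [1 if g[y + dy, x + dx] >= g[y, x] else 0
                    for dy, dx in offs]
            # U: 0->1 / 1->0 transitions in the circular sequence
            u = sum(bits[i] != bits[i - 1] for i in range(P))
            codes[y - 1, x - 1] = sum(bits) if u <= 2 else P + 1
    return codes
```

A flat patch gives code P = 8 (all neighbours ≥ centre), a local peak gives code 0, and a checkerboard gives the non-uniform code P + 1 = 9, so all codes fall in 0..9.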
As a preferred embodiment, in step 2, the method of obtaining the weight based on the human eye perception of the superpixel is as follows:
First, the distorted image is segmented into superpixels with Simple Linear Iterative Clustering (SLIC). Then, following the centre-bias theory that content near the equator is more likely to attract attention, the human-eye perception weight of each superpixel is computed. Let the panoramic image be of size M × N and the vertical coordinate of a pixel be y_1, i.e., its distance to the upper boundary of the panoramic image, so that N − y_1 is its distance to the lower boundary; the weight of a single pixel is then:
ω1i=min{d1,d2}
wherein:
d_1 = y_1,  d_2 = N − y_1
the weight of each super-pixel should be determined by the closest pixel to the boundary among the super-pixels, so the human eye perception weight of each super-pixel is defined as:
ω1=min{y1,N-y1,…yn,N-yn}
where n is the number of pixels in a superpixel. The larger ω_1 is, the closer the superpixel is to the equator of the panoramic image, and vice versa.
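The per-superpixel perception weight ω_1 can be sketched as follows; the label-map input format and the 0-based row indexing are assumptions of this sketch.

```python
import numpy as np

def perception_weights(labels):
    """omega_1 for each superpixel: the minimum distance of any of its
    pixels to the top or bottom boundary of the ERP image (a larger
    weight means the superpixel lies nearer the equator).

    `labels` is an H x W integer label map such as the output of SLIC;
    0-based row indices are assumed."""
    H = labels.shape[0]
    rows = np.broadcast_to(np.arange(H)[:, None], labels.shape)
    d = np.minimum(rows, H - 1 - rows)  # min(y, N - y) per pixel
    return {int(lab): int(d[labels == lab].min()) for lab in np.unique(labels)}
```

A superpixel touching the top or bottom row gets weight 0; one confined to the middle rows gets a larger weight.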
As a preferred embodiment, in step 3, the method for obtaining the weight based on the projection relation of the super-pixel comprises:
Let the coordinates on the ERP plane and on the sphere in continuous space be (x, y) and (θ, φ), respectively.
The transformation relationship between the two is as follows:
θ = (x − 1/2)·2π,  φ = (1/2 − y)·π
where θ ∈ (−π, π) and φ ∈ (−π/2, π/2).
Thus, the area stretch ratio SR can be defined as:
SR(x, y) = cos φ
the weight SR (i, j) of the digital image can be defined as:
SR(i,j)=SR(x(i,j),y(i,j))
If M × N is the size of the ERP-plane image, with {(i, j) | 0 < i ≤ M, 0 < j ≤ N}, then the transformation between the continuous domain and the discrete domain is:
x = (i − 0.5)/M,  y = (j − 0.5)/N
thus, the projective relationship is ultimately defined as:
SR(i, j) = cos((1/2 − (j − 0.5)/N)·π)
The projection relation is then combined with the superpixel segmentation map: the distances between all pixels in a superpixel and the upper and lower boundaries of the panoramic image are computed, and the projection relation of the pixel with the minimum distance is used as the weight of the whole superpixel; the weight of a single superpixel in the panoramic projection is therefore:
ω2=SR(i,dmin)
ω_2 is the superpixel-based projection-relation weight;
where d_min is the minimum distance between the pixels of each superpixel and the upper and lower boundaries of the panoramic image:
dmin=min{y1,N-y1,…yn,N-yn}。
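The projection-relation weight ω_2 can be sketched as follows, using the cosine of the row latitude as the area stretch ratio of an ERP row; the half-pixel offset and 0-based row indexing are assumptions of this sketch.

```python
import math

def stretch_ratio(j, N):
    """Area stretch ratio SR of ERP row j (0-based) in an N-row image:
    the cosine of the row's latitude.  Rows at the equator keep their
    area (SR close to 1); rows near the poles are heavily stretched on
    the plane (SR close to 0)."""
    phi = math.pi * (0.5 - (j + 0.5) / N)   # latitude in (-pi/2, pi/2)
    return math.cos(phi)

def projection_weight(rows_in_superpixel, N):
    """omega_2 = SR evaluated at the superpixel pixel nearest to the
    top/bottom boundary (the d_min pixel in the text)."""
    nearest = min(rows_in_superpixel, key=lambda y: min(y, N - 1 - y))
    return stretch_ratio(nearest, N)
```

The weight is symmetric about the equator and decreases monotonically towards the poles, matching the stretching behaviour of the ERP projection.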
as a preferred embodiment, in step 4, the weight based on human eye perception and the weight based on projection relationship are fused, and the fused weight (fusion weight) can be obtained as follows:
ω=ω1·ω2
Brighter regions in the figure indicate superpixels with larger weights.
The obtained ERP-plane structural features are combined with the fusion weights: the fusion weights of pixels sharing the same LBP code are accumulated and then normalised, yielding the structural features of a single panoramic video frame:
PW(k) = Σ_{i=1}^{n} ω_i·f(LBP_i, k) / Σ_{i=1}^{n} ω_i
wherein:
f(LBP_i, k) = 1 if LBP_i = k, and 0 otherwise
where n is the number of pixels, k is the gradient-domain LBP code, ω is the overall panoramic weight, and PW(k) is the panoramic structural feature. Considering that the human visual system captures different information at different image scales, the distorted image is downsampled several times (e.g., 4 times) to obtain panoramic structural features at different scales. Fig. 7 shows statistical histograms of the panoramic structure under different distortion conditions; different distortions clearly change the single-frame panoramic features relative to those of the original video.
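The weight-pooled statistics above (accumulating the fusion weights per LBP code, then normalising) can be sketched as:

```python
import numpy as np

def panoramic_feature(codes, weights, n_codes=10):
    """PW(k): for each LBP code k, accumulate the fused per-pixel
    weights of all pixels carrying that code, then normalise the
    histogram to sum to 1.  With the riu2 LBP and P = 8 there are
    P + 2 = 10 possible codes (0..9)."""
    pw = np.array([weights[codes == k].sum() for k in range(n_codes)])
    total = pw.sum()
    return pw / total if total > 0 else pw
```

Because the histogram is normalised, the feature is invariant to the absolute scale of the fusion weights.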
As a preferred embodiment, in step 5: to avoid causing vertigo and discomfort to the user, the scene content of a panoramic video usually changes little. Exploiting this property, the single-frame structural features are computed for the first t frames of the panoramic video and averaged; for a distorted video sequence, the panoramic structural feature is defined as:
PW_video = (1/t)·Σ_{i=1}^{t} PW_i
The resulting panoramic structural feature PW_video is fed into an SVR model to obtain a quality score prediction model, which maps the features to the final quality score.
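The per-sequence feature and the hand-off to the regressor can be sketched as follows; the use of scikit-learn's SVR is an assumption of this sketch, since the text only specifies "an SVR model".

```python
import numpy as np

def video_feature(frame_features, t):
    """PW_video: the mean of the single-frame panoramic features over
    the first t frames; panoramic content changes little from frame to
    frame, so a short prefix of the sequence is representative."""
    return np.asarray(frame_features[:t], dtype=float).mean(axis=0)

# The per-sequence PW_video vectors are then regressed against the
# subjective scores.  With scikit-learn (an assumption) this would be:
#   from sklearn.svm import SVR
#   model = SVR(kernel="rbf").fit(train_features, train_mos)
#   predicted_scores = model.predict(test_features)
```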
The technical solutions provided by the preferred embodiments of the present invention are further described in detail below with reference to the accompanying drawings.
The specific steps of the no-reference panoramic video quality evaluation method provided by this preferred embodiment are shown in fig. 2. The method was implemented in a programmed simulation under the Win10 environment, with the following specific steps:
step 1, extracting the ERP-plane structural features of the gradient-domain local binary pattern: the second derivative of an image effectively captures the local edge changes that affect the perceived visual quality of a panoramic image (see Y. Fang, J. Yan, L. Li, J. Wu and W. Lin, "No Reference Quality Assessment for Screen Content Images with Both Local and Global Feature Representation," IEEE Transactions on Image Processing, vol. 27, no. 4, pp. 1600-1610, April 2018). Therefore, the gradient magnitude of a single frame of the panoramic video is first computed as first-derivative information; the pixels are then encoded with an LBP operator on this gradient domain, yielding the ERP-plane structural features of the gradient-domain local binary pattern and capturing finer edge information;
step 2, calculating the superpixel-based human-eye perception weight: superpixel segmentation groups similar pixels, i.e., it partitions discrete pixels into superpixels composed of several pixels; compared with discrete pixels, superpixels are closer to how the human eye understands image content (see J. Lei et al., "A Universal Framework for Salient Object Detection," IEEE Transactions on Multimedia, vol. 18, no. 9, pp. 1783-1795, Sept. 2016, and Y. Fang, X. Zhang, N. Imamoglu, "A novel superpixel-based saliency detection model for 360-degree images," Signal Processing: Image Communication, vol. 69, pp. 1-7, 2018);
step 3, calculating the superpixel-based projection-relation weight: as shown in fig. 6, when pixels are mapped from the ERP plane to the sphere, pixel areas are stretched to different degrees. The relationship between the observation space and the processing space can therefore be expressed as the area stretch ratio between the two;
step 4, extracting the structural features of a single panoramic video frame: the ERP-plane structural features of step 1 are computed on the ERP-plane image and cannot linearly reflect distortion on the sphere. Combining the weight maps obtained in steps 2 and 3 with the ERP-plane structural features of step 1 yields mapping-weighted panoramic structural features that accurately reflect distortion on the sphere and are closer to subjective human perception;
step 5, panoramic video quality score estimation: the quality score prediction model trained on the single-frame panoramic structural features obtained in step 4 maps the features of the distorted video to the final panoramic video quality score.
In step 1, the image gradient is computed with the computationally simple Prewitt operator: the gradient level of the distorted image is expressed by convolving the image with the two directional templates of the Prewitt operator. With I(x, y) denoting the distorted image, the gradient image is computed as:
g(x, y) = √[(I(x, y) * p_x)² + (I(x, y) * p_y)²]
"+" denotes the convolution operation, pxAnd pyThe templates represent the transverse and longitudinal directions, the edges of which are calculated respectively, I (x, y) and g (x, y) represent the distorted image and the corresponding gradient image respectively.
The original LBP operator is defined over a 3 × 3 window, as shown in fig. 3. The centre pixel of the window serves as a threshold: the grey value of each of the 8 neighbouring pixels is compared with that of the centre pixel, and a neighbour's position is coded 1 if its grey value is greater than or equal to the centre's, otherwise 0. This generates an 8-bit binary number in clockwise order, which is usually converted to a decimal number, the LBP code, whose value reflects the structural information of the window region. In this conventional LBP computation (fig. 3), comparing the 8 pixels of a 3 × 3 neighbourhood yields 8-bit binary numbers and thus 256 patterns in decimal, which is too many. To reduce the number of binary patterns, a rotation-invariant uniform LBP operator is therefore used to encode the pixels of the gradient image, computed as follows:
LBP_{P,R} = Σ_{i=0}^{P−1} s(g_i − g_c), if U(LBP_{P,R}) ≤ 2; otherwise LBP_{P,R} = P + 1 (4)
p represents the number of elements around the pixel of the central point, R represents the selected radius of the surrounding pixels, gcRepresenting the gradient amplitude, g, of the center pixeliRepresenting the gradient magnitude of surrounding pixels. Wherein:
s(x) = 1 if x ≥ 0, and s(x) = 0 if x < 0
U(LBP_{P,R}) = |s(g_{P−1} − g_c) − s(g_0 − g_c)| + Σ_{i=1}^{P−1} |s(g_i − g_c) − s(g_{i−1} − g_c)|
U is the uniformity measure, computed bit by bit: a binary sequence is uniform if the number of 0-to-1 and 1-to-0 transitions does not exceed 2. The LBP describes the relationship between a centre pixel and its surrounding pixels, and these local structural patterns effectively describe the image structure distortions caused by different distortion sources. Fig. 4 shows the gradient-domain LBP maps of a reference image and the corresponding distorted images, where (b) is the gradient-domain LBP coding map of the panoramic reference image and (c) to (f) represent four different distortion types. Different distortion types cause different changes in the LBP coding, so the gradient-domain LBP effectively describes image distortion.
In step 2, the distorted image is first segmented into superpixels with Simple Linear Iterative Clustering (SLIC) (R. Achanta, A. Shaji, K. Smith, et al., "SLIC superpixels compared to state-of-the-art superpixel methods," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2274-2282, 2012). Its advantages are a small computational load, a customisable number of superpixels, and regularly shaped superpixels.
Superpixels are obtained with the SLIC segmentation algorithm. Then, following the centre-bias theory that content near the equator more readily attracts attention, the human-eye perception weight of each superpixel is computed; the principle is shown in fig. 5. The weight of each superpixel is determined by the minimum distance between its pixels and the upper and lower boundaries of the panoramic image. Assume the panoramic image is of size M × N and a pixel's vertical coordinate is y_1, i.e., its distance to the upper boundary of the panoramic image, so that N − y_1 is its distance to the lower boundary; the weight of a single pixel is then:
ω1i=min{d1,d2} (5)
wherein:
d_1 = y_1,  d_2 = N − y_1 (6)
the weight of each super-pixel should be determined by the closest pixel to the boundary among the super-pixels, so the human eye perception weight of each super-pixel is defined as:
ω1=min{y1,N-y1,…yn,N-yn} (7)
where n is the number of pixels in the superpixel. The larger ω_1 is, the closer the superpixel is to the equator of the panoramic image, and vice versa.
Through the above calculation process, the structural features on the ERP plane can be obtained, but because a nonlinear relationship exists between the processing plane and the observation space, the features on the processing plane cannot accurately reflect the quality change of the observation space, and therefore, the panoramic weighted structural features need to be further extracted by using the characteristics of the panoramic video.
The superpixel-based projection-format weights are obtained in step 3. Assume the coordinates on the ERP plane and on the sphere in continuous space are (x, y) and (θ, φ), respectively.
the transformation relationship between the two is as follows:
θ = (x − 1/2)·2π,  φ = (1/2 − y)·π (8)
where θ ∈ (−π, π) and φ ∈ (−π/2, π/2).
Thus, the area stretch ratio SR can be defined as:

SR(x, y) = cos φ (9)
the weight SR (i, j) of the digital image can be defined as:
SR(i,j)=SR(x(i,j),y(i,j)) (10)
Assuming M × N is the size of the ERP image, with {(i, j) | 0 < i ≤ M, 0 < j ≤ N}, the transformation between the continuous domain and the discrete domain is:
x = (i − 0.5)/M,  y = (j − 0.5)/N (11)
thus, the projective relationship is ultimately defined as:
SR(i, j) = cos((1/2 − (j − 0.5)/N)·π) (12)
The projection relation is then combined with the superpixel segmentation map: the distances between all pixels in a superpixel and the upper and lower boundaries of the panoramic image are computed, and the projection relation of the pixel with the minimum distance is used as the weight of the whole superpixel; the weight of a single superpixel in the panoramic projection is therefore:
ω2=SR(i,dmin) (13)
where d_min is the minimum distance between the pixels of each superpixel and the upper and lower boundaries of the panoramic image:
dmin=min{y1,N-y1,…yn,N-yn} (14)
In step 4, the human-eye perception weight and the projection-relation weight are fused; the fused weight is:
ω=ω1·ω2 (15)
As shown in fig. 7, brighter portions indicate superpixels with larger weights.
The structural features obtained on the ERP plane are combined with the fusion weights: the fusion weights of pixels sharing the same LBP code are accumulated and then normalised, yielding the panoramic structural features of a single panoramic video frame:
PW(k) = Σ_{i=1}^{n} ω_i·f(LBP_i, k) / Σ_{i=1}^{n} ω_i (16)
wherein:
f(LBP_i, k) = 1 if LBP_i = k, and 0 otherwise (17)
n is the number of pixels, k is the value of gradient domain LBP coding and is the overall panoramic weight, and PW (k) is the structural feature of the panorama. Considering that a human eye vision system can capture different information on different image scales, carrying out down-sampling on a distorted image for 4 times to obtain panoramic structure characteristics on different scales, and drawing a panoramic structure statistical histogram under different distortion conditions, wherein different distortions cause the panoramic characteristics of a single frame and the panoramic characteristics of the single frame in an original video to be obviously changed.
In step 5: to avoid causing vertigo and discomfort to the user, the scene content of a panoramic video usually changes little. Exploiting this property, the single-frame structural features are computed for the first t frames of the panoramic video and averaged; for a distorted video sequence, the panoramic structural feature is defined as:
PW_video = (1/t)·Σ_{i=1}^{t} PW_i (18)
The resulting panoramic structural feature PW_video is fed into the SVR to obtain a quality score prediction model, which maps the features to the final quality score.
An experiment was conducted on the VR-VQA video data set (M. Xu, C. Li, Y. Liu, X. Deng and J. Lu, "A subjective visual quality assessment method of panoramic videos," in IEEE International Conference on Multimedia and Expo (ICME), Hong Kong, 2017, pp. 517-522) to evaluate the superpixel-based no-reference panoramic video quality assessment method (SP-PVQA) proposed by the above embodiment of the present invention. The distorted images were divided into a training set (80%) and a test set (the remaining 20%); the test was repeated 1000 times, and the medians of the Spearman rank-order correlation coefficient (SRCC), Pearson linear correlation coefficient (PLCC), Kendall rank correlation coefficient (KRCC) and root mean square error (RMSE) over the runs were taken as performance indexes. Table 1 compares the performance of five full-reference panoramic methods, namely peak signal-to-noise ratio (PSNR), Craster parabolic projection peak signal-to-noise ratio (CPP-PSNR), spherically weighted peak signal-to-noise ratio (WS-PSNR), spherical peak signal-to-noise ratio (S-PSNR) and spherically weighted structural similarity (WS-SSIM), with the SP-PVQA method provided by the embodiment of the present invention; the highest indexes are shown in bold.
TABLE 1 comparison of Overall Properties
From table 1, it can be seen that the SP-PVQA method proposed by the above embodiment of the present invention is the best on three of the indexes compared with the other full-reference quality evaluation methods; although its RMSE is slightly inferior, SP-PVQA requires no reference video information and is therefore more practical in real-time communication systems. In particular, the WS-PSNR and WS-SSIM models also account for the inconsistency between the observation space and the mapping space, but the panoramic structural features used by the SP-PVQA method proposed by the above embodiment of the present invention are more consistent with human perception.
The SP-PVQA method proposed by the above embodiment of the present invention can also be applied to panoramic images: the structural features of the distorted panoramic image are computed on the ERP plane, the fusion weight is obtained with the same steps, the planar features are weighted into panoramic structural features that represent spherical distortion, and these are fed into the SVR to obtain the visual quality score of the distorted image.
To verify the performance of the no-reference panoramic video quality evaluation method (SP-PVQA) proposed in the above embodiment of the present invention on panoramic images, experiments were performed on the OIQA image data set (H. Duan, G. Zhai, X. Min, Y. Zhu, Y. Fang and X. Yang, "Perceptual Quality Assessment of Omnidirectional Images," in IEEE International Symposium on Circuits and Systems (ISCAS), Florence, 2018, pp. 1-5). The distorted images were divided into a training set (80%) and a test set (20%); the test was repeated 1000 times, and the medians of PLCC, SRCC, KRCC and RMSE were taken as performance indexes. Table 2 compares the performance of the traditional PSNR and SSIM methods and of the CPP-PSNR, WS-PSNR, S-PSNR and WS-SSIM panoramic methods with the SP-PVQA method proposed in the above embodiment of the present invention; it also reports variants that consider only the projection format (PF) or only the human-eye perception characteristic (HP) as the weight. The highest indexes in table 2 are shown in bold.
TABLE 2 comparison of Overall Properties
As can be seen from table 2, the SP-PVQA method proposed in the above embodiment of the present invention achieves the best performance on all four indexes. The PF and HP variants, each considering only one weighting factor, also perform well, and combining the two achieves the best performance. Fig. 8 shows scatter plots of the predicted scores fitted against the subjective scores for the WS-SSIM, S-PSNR, WS-PSNR, CPP-PSNR and SP-PVQA methods on the same test set; the SP-PVQA method provided by the above embodiment of the present invention yields the best fit.
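The correlation indexes used in these experiments can be sketched as follows (a tie-free Spearman implementation is an assumption of this sketch); in the protocol above, each index is computed per random 80/20 split and the median over 1000 repetitions is reported.

```python
import numpy as np

def plcc(x, y):
    """Pearson linear correlation coefficient."""
    x = np.asarray(x, float) - np.mean(x)
    y = np.asarray(y, float) - np.mean(y)
    return float((x * y).sum() / np.sqrt((x * x).sum() * (y * y).sum()))

def srcc(x, y):
    """Spearman rank-order correlation: the PLCC of the ranks.
    Ties are not handled in this sketch."""
    rank = lambda v: np.argsort(np.argsort(np.asarray(v))).astype(float)
    return plcc(rank(x), rank(y))
```

SRCC rewards any monotonic relationship (a perfectly monotonic but nonlinear prediction scores 1.0), whereas PLCC measures only linear agreement.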
Another embodiment of the present invention provides a no-reference panoramic video evaluation system, as shown in fig. 9, which may include: the system comprises a local structural feature acquisition module, a weight acquisition module, a panoramic structural feature acquisition module and a quality evaluation module; wherein:
the local structural feature acquisition module is used for acquiring the ERP plane structural feature of the local binary pattern based on the gradient domain of the distorted image;
the weight acquisition module is used for carrying out superpixel segmentation processing on the distorted image and respectively acquiring the weight based on human eye perception of the superpixel and the weight based on the projection relation of the superpixel;
the panoramic structure characteristic acquisition module is used for extracting the structure characteristics of the single-frame image of the panoramic video according to the structure characteristics of the local binary pattern, the weight perceived by human eyes and the weight of the projection relation;
and the quality evaluation module is used for estimating the quality score of the panoramic video according to the structural characteristics of the single-frame image of the panoramic video.
A third embodiment of the invention provides a terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor being operable to perform any of the methods described above when executing the program.
Optionally, a memory is provided for storing programs. The memory may include volatile memory, such as random-access memory (RAM), e.g., static random-access memory (SRAM) or double data rate synchronous dynamic random-access memory (DDR SDRAM); it may also include non-volatile memory, such as flash memory. The memory stores computer programs (e.g., applications and functional modules implementing the above methods), computer instructions, and the like, which may be stored in one or more memories in a partitioned manner and may be invoked by the processor.
A processor for executing the computer program stored in the memory to implement the steps of the method according to the above embodiments. Reference may be made in particular to the description relating to the preceding method embodiment.
The processor and the memory may be separate structures or may be an integrated structure integrated together. When the processor and the memory are separate structures, the memory, the processor may be coupled by a bus.
A fourth embodiment of the invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, is operable to perform the method of any one of the preceding claims.
The no-reference panoramic video quality evaluation method, system, terminal and medium provided by the embodiments of the present invention are superpixel-based: first, the structural features of each frame of the panoramic video on the ERP plane are expressed using the frame's second-derivative information; second, the fused panoramic weight, formed from the superpixel-based projection format and human-eye perception, is combined with the structural features to obtain the panoramic structural features of a single frame; then the average over the first t frames of the video is taken as the panoramic structural feature of the panoramic video sequence; finally, the resulting panoramic structural features are fed into an SVR to build a quality prediction model. Experiments on a public panoramic video subjective quality evaluation database demonstrate that the no-reference panoramic video quality evaluation method, system, terminal and medium provided by the embodiments of the present invention can evaluate distortions produced in transmission in practical applications.
It should be noted that, the steps in the method provided by the present invention may be implemented by using corresponding modules, devices, units, and the like in the system, and those skilled in the art may implement the composition of the system by referring to the technical solution of the method, that is, the embodiment in the method may be understood as a preferred example for constructing the system, and will not be described herein again.
Those skilled in the art will appreciate that, in addition to implementing the system and its various devices provided by the present invention in purely computer readable program code means, the method steps can be fully programmed to implement the same functions by implementing the system and its various devices in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and various devices thereof provided by the present invention can be regarded as a hardware component, and the devices included in the system and various devices thereof for realizing various functions can also be regarded as structures in the hardware component; means for performing the functions may also be regarded as structures within both software modules and hardware components for performing the methods.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention.

Claims (10)

1. A no-reference panoramic video evaluation method is characterized by comprising the following steps:
obtaining an ERP plane structural feature of a local binary pattern based on a gradient domain of a distorted image;
performing superpixel segmentation processing on the distorted image to respectively obtain a weight based on human eye perception of the superpixel and a weight based on projection relation of the superpixel;
extracting the structural characteristics of the single-frame image of the panoramic video according to the structural characteristics of the local binary pattern, the weight perceived by human eyes and the weight of the projection relation;
and estimating the quality score of the panoramic video according to the structural characteristics of the single-frame image of the panoramic video.
2. The method for evaluating the non-reference panoramic video according to claim 1, wherein the obtaining of the ERP plane structural feature of the local binary pattern based on the gradient domain of the distorted image comprises:
calculating a gradient image of a single frame image of a distorted image in the panoramic video;
and on the basis of the gradient domain, encoding the pixels of the gradient image to obtain the ERP plane structural feature of the local binary pattern based on the gradient domain.
3. The method of claim 2, wherein calculating the gradient image of the single-frame image in the panoramic video comprises:
calculating the image gradient with a Prewitt operator, the gradient level of the distorted image being expressed by the convolution of the single-frame image with the templates of the Prewitt operator in two directions, the gradient image g(x, y) of the distorted image I(x, y) being:

g(x, y) = √[(I(x, y) * p_x)² + (I(x, y) * p_y)²]

in the formula, * denotes the convolution operation; p_x and p_y denote the horizontal and vertical templates, used to compute the horizontal and vertical edges respectively;
the encoding of the pixels of the gradient image on the basis of the gradient domain comprises:
encoding the pixel points of the gradient image with a rotation-invariant uniform LBP operator to obtain the structural feature LBP_{P,R} of the local binary pattern based on the gradient domain:

LBP_{P,R} = Σ_{i=0}^{P−1} s(g_i − g_c), if U(LBP_{P,R}) ≤ 2; otherwise P + 1

wherein P denotes the number of sample points around the center pixel, R denotes the radius at which the surrounding pixels are selected, g_c denotes the gradient amplitude of the center pixel, and g_i denotes the gradient amplitudes of the surrounding pixels; wherein:

s(x) = 1, if x ≥ 0; 0, if x < 0

U(LBP_{P,R}) = |s(g_{P−1} − g_c) − s(g_0 − g_c)| + Σ_{i=1}^{P−1} |s(g_i − g_c) − s(g_{i−1} − g_c)|

where U is the uniformity measure, computed bit by bit, counting the number of transitions of the binary sequence from 0 to 1 and from 1 to 0.
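A minimal sketch of the gradient-domain LBP coding described in claim 3, assuming nearest-neighbour sampling of the P circular neighbours; the function name gradient_lbp and the toy sampling scheme are illustrative choices of this sketch, not part of the claim:

```python
import numpy as np
from scipy.ndimage import convolve

def gradient_lbp(image, P=8, R=1):
    """Rotation-invariant uniform LBP computed on the Prewitt gradient magnitude."""
    # Prewitt templates for horizontal and vertical edges
    px = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=float)
    py = px.T
    g = np.sqrt(convolve(image, px) ** 2 + convolve(image, py) ** 2)

    h, w = g.shape
    codes = np.full((h, w), P + 1, dtype=int)   # non-uniform patterns map to P + 1
    angles = 2 * np.pi * np.arange(P) / P
    # nearest-neighbour offsets on a circle of radius R (a simplification)
    offs = [(int(round(R * np.sin(a))), int(round(R * np.cos(a)))) for a in angles]
    for y in range(R, h - R):
        for x in range(R, w - R):
            gc = g[y, x]
            s = [int(g[y + dy, x + dx] >= gc) for dy, dx in offs]
            # uniformity: number of 0->1 / 1->0 transitions in the circular sequence
            u = abs(s[P - 1] - s[0]) + sum(abs(s[i] - s[i - 1]) for i in range(1, P))
            codes[y, x] = sum(s) if u <= 2 else P + 1
    return codes
```

Uniform patterns receive codes 0…P and every non-uniform pattern shares the single code P + 1, so the encoding yields P + 2 feature bins per scale.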
4. The method for evaluating the non-reference panoramic video according to claim 1, wherein the performing the super-pixel segmentation process on the distorted image comprises:
and gathering pixels in the distorted image by a simple linear iterative clustering (SLIC) method, thereby grouping discrete pixels into superpixels each consisting of several pixels.
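As a rough illustration of the segmentation step, the following toy SLIC-style local k-means over (intensity, x, y) features is a simplification of the real algorithm; production code would use an established SLIC implementation, and the function simple_slic and its parameters are assumptions of this sketch:

```python
import numpy as np

def simple_slic(img, n_segments=16, n_iter=5, compactness=10.0):
    """Toy SLIC: iterative local k-means over (intensity, row, col) features."""
    h, w = img.shape
    S = int(np.sqrt(h * w / n_segments))            # grid interval between seeds
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # seed cluster centers on a regular grid: (row, col, intensity)
    cy = np.arange(S // 2, h, S)
    cx = np.arange(S // 2, w, S)
    centers = np.array([[y, x, img[y, x]] for y in cy for x in cx], dtype=float)
    labels = np.zeros((h, w), dtype=int)
    for _ in range(n_iter):
        dist = np.full((h, w), np.inf)
        for k, (yc, xc, vc) in enumerate(centers):
            # restrict the assignment to a (2S+1)x(2S+1) window around the center
            y0, y1 = max(0, int(yc) - S), min(h, int(yc) + S + 1)
            x0, x1 = max(0, int(xc) - S), min(w, int(xc) + S + 1)
            dc = (img[y0:y1, x0:x1] - vc) ** 2
            ds = (ys[y0:y1, x0:x1] - yc) ** 2 + (xs[y0:y1, x0:x1] - xc) ** 2
            d = dc + (compactness / S) ** 2 * ds
            better = d < dist[y0:y1, x0:x1]
            dist[y0:y1, x0:x1][better] = d[better]
            labels[y0:y1, x0:x1][better] = k
        # move each center to the mean of the pixels assigned to it
        for k in range(len(centers)):
            mask = labels == k
            if mask.any():
                centers[k] = [ys[mask].mean(), xs[mask].mean(), img[mask].mean()]
    return labels
```

The compactness term trades colour similarity against spatial coherence, which is the design choice that makes SLIC produce roughly uniform, connected superpixels.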
5. The method for evaluating the no-reference panoramic video according to claim 1, wherein the obtaining of the weight based on human eye perception of the superpixel comprises the following steps:
letting the size of the panoramic image be M × N, the ordinate of a pixel in the superpixel, i.e. its distance from the upper boundary of the panoramic image, be y1, and N − y1 be its distance from the lower boundary, the weight ω1i of a single pixel is:

ω1i = min{d1, d2}

wherein:

d1 = y1,  d2 = N − y1

the weight of each superpixel being determined by the pixel in it that lies closest to a boundary, the human eye perception weight ω1 of each superpixel is defined as:

ω1 = min{y1, N − y1, … , yn, N − yn}

wherein n denotes the number of pixels in the superpixel; the larger ω1 is, the closer the superpixel is to the equator of the panoramic image, and conversely, the farther it is from the equator;
the obtaining of the weight based on the projection relation of the superpixel comprises:
letting the coordinates of the ERP plane and of the sphere in continuous space be (x, y) and (θ, φ) respectively, the transformation relation between the ERP plane and the sphere is:

θ = (x/M − 1/2)·2π,  φ = (1/2 − y/N)·π

wherein θ ∈ (−π, π) and φ ∈ (−π/2, π/2);
the area stretch ratio SR is thus defined as:

SR(x, y) = cos φ(y)

and the weight SR(i, j) of the digital image is defined as:

SR(i, j) = SR(x(i, j), y(i, j))

if M × N is the size of the ERP planar image, with {(i, j) | 0 < i ≤ M, 0 < j ≤ N}, the transformation relation between the continuous domain and the discrete domain is:

x(i, j) = i − 1/2,  y(i, j) = j − 1/2

so that the projection relation is finally defined as:

SR(i, j) = cos((j − 1/2 − N/2)·π/N);
combining the projection relation with the superpixel segmentation map obtained after the superpixel segmentation processing, namely calculating the distances from all pixels in a superpixel to the upper and lower boundaries of the panoramic image, and taking the projection relation corresponding to the pixel with the minimum distance as the weight of the whole superpixel; namely, the weight ω2 of a single superpixel in the panoramic projection is:

ω2 = SR(i, d_min)

this ω2 being the weight based on the projection relation of the superpixel;
wherein d_min denotes the minimum of the distances from all pixels in each superpixel to the upper and lower boundaries of the panoramic image:

d_min = min{y1, N − y1, … , yn, N − yn}.
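The two per-superpixel weights of claim 5 can be sketched together, assuming 1-indexed row ordinates and a cosine-of-latitude stretch ratio evaluated at the boundary-nearest row; the helper superpixel_weights is hypothetical:

```python
import numpy as np

def superpixel_weights(labels):
    """Per-superpixel human-eye weight w1 and projection weight w2.

    labels: integer superpixel label per pixel; rows are treated as 1-indexed
    ordinates, and the ERP stretch ratio is taken as cos of the latitude.
    """
    N = labels.shape[0]                          # panoramic image height
    rows = np.arange(1, N + 1)
    w1, w2 = {}, {}
    for k in np.unique(labels):
        ys = rows[np.any(labels == k, axis=1)]   # rows occupied by superpixel k
        d = np.minimum(ys, N - ys)               # distance to the nearer boundary
        j = ys[np.argmin(d)]                     # boundary-nearest row (d_min row)
        w1[k] = int(d.min())                     # human-eye perception weight
        w2[k] = np.cos((j - 0.5 - N / 2) * np.pi / N)   # projection-relation weight
    return w1, w2
```

A superpixel hugging the top or bottom boundary gets w1 near 0 and a small w2, while an equatorial superpixel gets the largest values of both, matching the intent of the claim.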
6. the method for evaluating the non-reference panoramic video according to claim 1, wherein the extracting the structural features of the single-frame image of the panoramic video comprises:
fusing the human eye perception weight ω1 and the projection relation weight ω2, the fused weight being:

ω = ω1 · ω2

combining the obtained ERP plane structural feature with the fused weight ω, accumulating the fused weights that share the same LBP code, and then normalizing, to obtain the structural feature PW(k) of the single-frame image of the panoramic video:

PW(k) = Σ_{i=1}^{N} ω(i)·δ(LBP_{P,R}(i), k) / Σ_{i=1}^{N} ω(i)

wherein:

δ(x, y) = 1, if x = y; 0, if x ≠ y

in the formula, N is the number of pixels, k is the value taken by the gradient-domain LBP code, and ω is the panoramic fusion weight;
and downsampling the distorted image multiple times to obtain the panoramic structural features at different scales.
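A compact sketch of the weighted histogram PW(k) and the multi-scale step; for brevity this sketch merely subsamples the codes and weights at each scale, whereas the claim recomputes the features on the downsampled distorted image, and all names are illustrative:

```python
import numpy as np

def panoramic_feature(codes, weights, n_bins=10):
    """PW(k): accumulate the fused weight of every pixel whose LBP code is k,
    then normalize by the total weight (the delta-weighted histogram)."""
    pw = np.array([weights[codes == k].sum() for k in range(n_bins)])
    return pw / weights.sum()

def multiscale_feature(codes, weights, scales=3, n_bins=10):
    """Concatenate PW histograms over dyadically subsampled scales."""
    feats = []
    for _ in range(scales):
        feats.append(panoramic_feature(codes, weights, n_bins))
        codes, weights = codes[::2, ::2], weights[::2, ::2]
    return np.concatenate(feats)
```

With P = 8 the uniform LBP codes span 0…9, so each scale contributes a 10-bin histogram and three scales yield a 30-dimensional feature vector.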
7. The method of claim 1, wherein estimating the panoramic video quality score comprises:
obtaining the structural features of the panoramic video single-frame images of the first t frames of the panoramic video and taking their average value; for a distorted video sequence, the structural feature PW_video of its panorama is:

PW_video = (1/t)·Σ_{i=1}^{t} PW_i

in the formula, PW_i is the structural feature of the panoramic video single-frame image of the i-th frame among the first t frames of the panoramic video;
and inputting the obtained structural feature PW_video into an SVR model to obtain a quality score prediction model, which maps the feature to the final panoramic video quality score.
8. A no-reference panoramic video evaluation system, comprising:
the local structural feature acquisition module, used for obtaining the ERP plane structural feature of the local binary pattern based on the gradient domain of a distorted image;
the weight acquisition module, used for performing superpixel segmentation processing on the distorted image and obtaining, respectively, the weight based on human eye perception of each superpixel and the weight based on the projection relation of each superpixel;
the panoramic structural feature acquisition module, used for extracting the structural features of the single-frame image of the panoramic video from the structural features of the local binary pattern, the weight of human eye perception, and the weight of the projection relation;
and the quality evaluation module is used for estimating the quality score of the panoramic video according to the structural characteristics of the single-frame image of the panoramic video.
9. A terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, is operative to perform the method of any of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, is adapted to carry out the method of any one of claims 1 to 7.
CN202110302516.1A 2021-03-22 2021-03-22 No-reference panoramic video quality evaluation method, system, terminal and medium Pending CN113038123A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110302516.1A CN113038123A (en) 2021-03-22 2021-03-22 No-reference panoramic video quality evaluation method, system, terminal and medium


Publications (1)

Publication Number Publication Date
CN113038123A 2021-06-25

Family

ID=76472298



Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115424051A (en) * 2022-09-16 2022-12-02 中国矿业大学 Panoramic stitching image quality evaluation method
CN115423812A (en) * 2022-11-05 2022-12-02 松立控股集团股份有限公司 Panoramic monitoring planarization display method
CN117036154A (en) * 2023-08-17 2023-11-10 中国石油大学(华东) Panoramic video fixation point prediction method without head display and distortion

Citations (6)

Publication number Priority date Publication date Assignee Title
CN104079925A (en) * 2014-07-03 2014-10-01 中国传媒大学 Ultrahigh definition video image quality objective evaluation method based on visual perception characteristic
CN104915945A (en) * 2015-02-04 2015-09-16 中国人民解放军海军装备研究院信息工程技术研究所 Quality evaluation method without reference image based on regional mutual information
CN109740592A (en) * 2018-12-04 2019-05-10 上海大学 Based on the picture quality of memory without ginseng appraisal procedure
CN110046673A (en) * 2019-04-25 2019-07-23 上海大学 No reference tone mapping graph image quality evaluation method based on multi-feature fusion
CN111292336A (en) * 2020-01-21 2020-06-16 宁波大学 Omnidirectional image non-reference quality evaluation method based on segmented spherical projection format
CN111311595A (en) * 2020-03-16 2020-06-19 清华大学深圳国际研究生院 No-reference quality evaluation method for image quality and computer readable storage medium


Non-Patent Citations (1)

Title
XIA YUMENG: "Research on Panoramic Image/Video Quality Assessment Methods", China Excellent Master's Theses Full-text Database, Information Science and Technology Series *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210625