CN108389189B - Three-dimensional image quality evaluation method based on dictionary learning


Info

Publication number
CN108389189B
CN108389189B
Authority
CN
China
Prior art keywords
dictionary
image
salient
sift
distorted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201810126932.9A
Other languages
Chinese (zh)
Other versions
CN108389189A (en)
Inventor
李素梅
常永莉
韩旭
侯春萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201810126932.9A
Publication of CN108389189A
Application granted
Publication of CN108389189B
Legal status: Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10004 Still image; Photographic image
    • G06T 2207/10012 Stereo images
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30168 Image quality inspection

Abstract

The invention belongs to the field of image processing and aims to provide a stereoscopic image quality evaluation method consistent with human subjective perception. A reference stereoscopic image pair is used to train a scale-invariant feature transform (SIFT) dictionary and a saliency dictionary. For the SIFT dictionary, SIFT transformation is applied to the reference stereoscopic image pair, the images are represented by SIFT features, and dictionary training is performed with the feature-sign search method and the Lagrange dual method. The sparse coefficient matrices obtained in the two test channels are processed to yield the quality Q1 and the quality Q2 of the corresponding distorted stereoscopic image; finally, Q1 and Q2 are combined into the final quality score Q of the distorted stereoscopic image pair. The invention mainly applies to image processing.

Description

Three-dimensional image quality evaluation method based on dictionary learning
Technical Field
The invention belongs to the field of image processing and relates to research on stereoscopic image quality evaluation, applying characteristics of the human visual system, sparse coding and sparse dictionaries to the objective evaluation of stereoscopic images. It specifically concerns a stereoscopic image quality evaluation method based on dictionary learning.
Background
Stereoscopic images/videos give viewers an immersive, on-the-scene visual experience; their generation, processing, transmission, display and quality evaluation have therefore become hot research topics in stereoscopic imaging technology. However, some noise is inevitably introduced at each link of the stereoscopic imaging chain, causing viewers visual discomfort. Designing a systematic and effective quality evaluation method for stereoscopic images/videos is important for verifying the correctness of each link of the stereoscopic imaging chain. This document mainly studies the visual discomfort caused by the stereoscopic image itself. Stereoscopic image quality evaluation comprises subjective and objective methods: subjective evaluation is closer to real human perception but is time-consuming and labor-intensive, while objective evaluation is not influenced by subjective human factors and offers good stability and easy operability.
At present, scholars at home and abroad have proposed many algorithms for objectively evaluating stereo image quality. In document [1], Cardoso et al. propose a disparity-weighting method that treats the absolute difference map of the left and right views as a disparity map and uses its pixel values as weight-adjustment factors for the reference and distorted stereo pairs. In document [2], Khan et al. propose a statistical model based on natural stereo image pairs; it describes how the statistical model leads to an estimated stereo image quality while studying the statistical correlation between the luminance and disparity sub-bands that directly describe the structural content. Owing to the limitations of stereo matching algorithms, evaluating stereo image quality directly from disparity information gives low accuracy; here only the disparity map of the reference stereo pair is considered, avoiding the inaccuracy of estimating a disparity map for the distorted pair.
Documents [3-5] show that the perceptual information of neural coding is carried by a small fraction of activated neurons, an idea that suggests sparse coding of images. Document [6] proposes an objective stereo image quality evaluation method based on sparse representation: stereo images of different frequency bands are trained separately to obtain corresponding dictionaries, the test images are sparsely represented with those dictionaries, and the sparse-feature similarities are finally fused into the quality scores of the stereo images. Document [7] proposes a full-reference stereo image quality evaluation method that learns a multi-scale dictionary simulating human-eye characteristics and, taking sparse energy and sparse complexity as the basis of binocular combination, computes the sparse-feature similarity and global luminance similarity to obtain a quality score. Document [8] proposes a sparsity-based full-reference stereo image quality evaluation method, finally integrating the quality score of the distorted stereo image from structural and non-structural features using the sparse coefficient matrices obtained with the corresponding dictionaries. Both dictionary-based methods above use only the left view of the reference stereo pair as dictionary training data; yet a stereo image is the fusion of the left and right views into one image in the human brain, and using only the left view may lose right-view information absent from the left view, so the trained dictionary may not contain complete stereo information. Studies of the human visual system in document [9] show that humans tend to attend to certain regions, called salient regions, when viewing an image, a characteristic known as visual saliency. Document [10] shows that the human eye tends to attend to the central region of an image. At present, no published work combines visual saliency features with a sparse dictionary to evaluate stereo image quality. Therefore, the absolute difference map and several saliency features are used here to extract the saliency map of the stereo image, combining visual saliency with the depth information of the stereo image (covering both left-view and right-view information). To better match the visual characteristics of the human eye, the model is further optimized with the center-bias and fovea characteristics, and the resulting salient stereo image serves as input data for dictionary training.
Disclosure of Invention
To overcome the shortcomings of the prior art and obtain a stereo image quality evaluation method consistent with human subjective perception, the invention adopts the following technical scheme. In the stereo image quality evaluation method based on dictionary learning, two dictionaries, a scale-invariant feature transform (SIFT) dictionary and a saliency dictionary, are trained from a reference stereo image pair. For the SIFT dictionary, SIFT transformation is applied to the reference stereo pair, the images are represented by SIFT features, and dictionary training with the feature-sign search method and the Lagrange dual method yields the SIFT dictionary. For the saliency dictionary, an initial stereoscopic saliency map is obtained by incorporating the absolute difference map and is optimized with the center-bias and fovea characteristics of the human visual mechanism to give the final saliency map; the saliency maps of the reference stereo pair are extracted, the images are covered with n × n overlapping blocks, and the first m salient blocks of each saliency map, ranked by variance, are selected as source data for training the saliency dictionary. In the test stage, on the one hand, the reference and distorted stereo pairs undergo SIFT transformation and are sparse coded with the trained SIFT dictionary to obtain their sparse coefficient matrices, which are processed into the quality Q1 of the corresponding distorted stereo image; on the other hand, saliency processing of the reference and distorted stereo pairs gives the reference and distorted stereo saliency maps, all non-overlapping n × n salient blocks are taken as input data and combined with the saliency dictionary to obtain the sparse coefficient matrices of the corresponding reference and distorted pairs, which are processed into the quality Q2 of the corresponding distorted stereo image. Finally, Q1 and Q2 are combined into the final quality score Q of the distorted stereo image pair.
The quality of the stereo saliency map is used to reflect the quality of the stereo image, and the absolute difference map of the left and right views represents the depth information. First, the luminance, chrominance and texture-contrast features of the image are extracted to represent the saliency information and combined with the absolute difference map into an initial feature saliency map; the initial map is then optimized with the center-bias and fovea characteristics to obtain the final stereo saliency map.
Further:
(1) Center bias
An anisotropic Gaussian kernel is used to simulate the center-bias (CB) factor of attention spreading from the center to the periphery:
CB(x,y) = \exp\left(-\left(\frac{(x-x_0)^2}{2\sigma_h^2} + \frac{(y-y_0)^2}{2\sigma_v^2}\right)\right)   (1)

CB(x, y) represents the offset information of pixel (x, y) with respect to the center point (x_0, y_0); (x_0, y_0) is the coordinate of the center of the distorted right view and (x, y) the coordinate of a pixel; σ_h and σ_v are the standard deviations of the image in the horizontal and vertical directions, taken as σ_h = W/3 and σ_v = H/3, where W and H are the numbers of horizontal and vertical pixels of the image;
(2) Fovea
This characteristic is simulated with the function shown in formula (2):

CT(f, e) = C_0 \exp\left(\delta f \, \frac{e(x,y) + e_1}{e_1}\right)   (2)

where e(x, y) is the retinal eccentricity of pixel (x, y), in degrees; f is the spatial frequency, in cycles/degree; C_0 is the contrast threshold; δ is the spatial-frequency decay parameter; e_1 is the half-resolution eccentricity constant;
the method comprises the following steps of searching a gray value which enables the difference between the foreground and the background of the three-dimensional saliency map to be maximum by using a maximum inter-class variance method, wherein the gray value is an optimal threshold value, dividing the three-dimensional saliency map into a saliency region and a non-saliency region by using the threshold value, further calculating the retina centrifugation degree e (x, y) by using the spatial distance relation between a pixel and a saliency pixel closest to the pixel, and if any pixel coordinate is (x, y) and a saliency pixel coordinate closest to the pixel coordinate is (x1, y1), calculating the centrifugation degree e (x, y) by using a formula (3):
Figure GDA0001704589320000031
wherein, W is the number of horizontal pixels of the image, v is the viewing distance, and the euclidean distance between the pixel point (x, y) and the pixel point (x1, y 1):
Figure GDA0001704589320000032
the dictionary training comprises the following specific steps: fixing one of the dictionary and the sparse coefficient matrix to solve the other, and then carrying out iteration to obtain a proper dictionary and sparse coefficient matrix:
Figure GDA0001704589320000033
Figure GDA0001704589320000034
in equations (4) and (5), X is the input signal, B is the complete dictionary, and S is the sparse coefficient matrix. I | · | purple windFIs the F-norm, λ is the regularization parameter, | · | | | luminance1Is a1Norm, BiRepresenting the ith column of atoms in the dictionary, formula (4) is an L1 regularized optimization problem, and typical optimization methods comprise a least angle regression method (LARS) and a characteristic symbolThe Feature-sign method changes the undifferentiable concave problem into an unconstrained quadratic optimization problem qp (quadratic optimization) by guessing the sign of a sparse coefficient in each iteration step; formula (5) is a typical quadratic constraint least square optimization problem, and the optimization methods include a QCQP convex optimization method, an iterative gradient descent method and a Lagrange multiplier method.
In one example, the saliency dictionary training step specifically comprises: processing each image with 8 × 8 overlapping windows to obtain many 8 × 8 small image blocks; extracting the first 3000 8 × 8 salient blocks of each saliency map using the variance information; vectorizing all salient blocks, each becoming a column vector, and assembling them into a matrix as the input of dictionary training; and iterating the feature-sign algorithm and the Lagrange dual method to obtain a suitable sparse saliency dictionary. Since the small image blocks are 8 × 8, the size of the saliency dictionary is chosen as 64 × 128.
Specifically, in the SIFT dictionary training stage, a 16 × 16 neighborhood centered on a feature point is taken as the sampling window; the relative orientations of the sampling points and the feature point, after Gaussian weighting, are accumulated into an orientation histogram with 8 bins; a gradient histogram of 8 directions is computed on each 4 × 4 cell to form a seed point, and each key point is described by the 4 × 4 = 16 seed points, so one key point yields a 4 × 4 × 8 = 128-dimensional SIFT feature vector; according to this feature dimension, the size of the SIFT dictionary is chosen as 128 × 1024.
After the saliency dictionary and the scale-invariant feature transform (SIFT) dictionary are obtained in the dictionary training stage, the distorted images are tested with them: after preprocessing the reference and test images, sparse coding is performed directly with the corresponding dictionary obtained in training, the orthogonal matching pursuit (OMP) algorithm yields the sparse coefficient matrices of the reference and distorted stereo pairs, the change in image sparsity is chosen as the basis of stereo image quality evaluation, and the sparse matrices of the reference and distorted pairs of the two channels are combined into the stereo image quality scores Q1 and Q2 of the two channels; finally, the two channel scores are combined into the quality score Q of the final distorted stereo image.
The quality evaluation value Q is specifically obtained as follows. The quality of the distorted image is evaluated from its sparse matrices. X^{o,l,1}, X^{o,r,1}, X^{d,l,1} and X^{d,r,1} denote the sparse coefficient matrices obtained by SIFT preprocessing the reference and distorted stereo pairs and sparse coding them with the SIFT dictionary; X^{o,l,2}, X^{o,r,2}, X^{d,l,2} and X^{d,r,2} denote those obtained by saliency preprocessing and sparse coding with the saliency dictionary; o denotes the reference image, d the distorted image, l the left image, r the right image, 1 the SIFT-dictionary test channel and 2 the saliency-dictionary test channel. The left-view quality Q_l^1 of the SIFT-dictionary test channel is obtained from formulas (6), (7), (8) and (9). [Formulas (6)-(9) appear only as images in the source and are not reproduced here.] S(i, j) denotes the sparse-coefficient similarity index; the closer S(i, j) is to 1, the smaller the distortion of the distorted image; x^{o,l,1}(i, j) and x^{d,l,1}(i, j) denote the values at position (i, j) of the reference and distorted left-image sparse matrices; t and c are constants; M is the number of rows of the sparse matrix and N the number of columns of sparse coefficients, with α + β = 1. The right-view quality Q_r^1 of the SIFT-dictionary test channel is obtained in the same way, and the same formulas (6)-(9) give the left and right quality scores Q_l^2 and Q_r^2 of the saliency-dictionary test channel. The left- and right-view scores are combined by a weighted geometric mean into the stereo quality score Q_1 of the SIFT-dictionary test channel and Q_2 of the saliency-dictionary test channel. The weights are the mean-square values of the sparse matrices of the left and right distorted views in each channel, given by formulas (10) and (11):

w_l^k = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(x^{d,l,k}(i,j)\right)^2   (10)

w_r^k = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(x^{d,r,k}(i,j)\right)^2   (11)

With these weights, the quality scores Q_1 and Q_2 of the two channels follow from formula (12):

Q_k = \left(Q_l^k\right)^{w_l^k/(w_l^k + w_r^k)} \left(Q_r^k\right)^{w_r^k/(w_l^k + w_r^k)}, \quad k = 1, 2   (12)

Finally, the quality score Q of the distorted stereo image combines Q_1 and Q_2 as in formula (13). [Formula (13) appears only as an image in the source and is not reproduced here.]
The invention has the characteristics and beneficial effects that:
the method is very effective for most of evaluation effects of different distortion types in two public LIVE libraries, the correlation between the evaluation result and the subjective evaluation result is strong, the goodness of fit between the obtained data and the subjective data is good, the quality of the stereo image can be well reflected, and the subjective feeling of human eyes is met.
Description of the drawings:
FIG. 1 is the flow chart of the algorithm: (a) dictionary learning stage; (b) test stage.
FIG. 2 shows the process of acquiring the stereoscopic saliency map.
Detailed Description
To obtain a stereo image quality evaluation method consistent with human subjective perception, the invention provides a stereo image quality evaluation method based on SIFT features and saliency sparse dictionary learning. Two dictionaries, the SIFT dictionary and the saliency dictionary, are trained from reference stereo image pairs. The SIFT dictionary applies SIFT transformation to the reference stereo pair, represents the images by SIFT features, and performs dictionary training with the feature-sign search and Lagrange dual methods; the saliency dictionary obtains an initial stereoscopic saliency map by incorporating the absolute difference map and optimizes it with the center-bias and fovea characteristics of the human visual mechanism into the final saliency map. The saliency maps of the reference stereo pairs are extracted, and the first 3000 salient blocks of each saliency map, taken as 8 × 8 overlapping blocks ranked by variance, are selected as source data for training the saliency dictionary. The algorithm was tested on the two public LIVE libraries: on LIVE I the PLCC reaches 0.9429 and the SROCC 0.9383; on LIVE II the PLCC reaches 0.9116 and the SROCC 0.9036. The experimental results show that the evaluation results of the algorithm correlate well with the subjective results and better match the perception of the human visual system.

The invention provides a stereo image quality evaluation method based on dictionary learning comprising two stages: a dictionary training stage and a test stage. In the dictionary training stage two dictionaries are trained, a scale-invariant feature transform (SIFT) dictionary and a saliency dictionary. The SIFT dictionary is obtained by applying SIFT to the reference stereo pair, using the SIFT descriptor features representing the images as dictionary training input, and training with the feature-sign search and Lagrange dual methods [11]. For the saliency dictionary, the saliency maps of the reference stereo pair are first extracted and optimized with the fovea and center-bias characteristics into the final saliency maps; then the first 3000 salient blocks of each saliency map, taken as 8 × 8 overlapping blocks ranked by variance, are selected as source data, and dictionary training with the feature-sign search method and the Lagrange dual method [11] yields the saliency dictionary. In the test stage, on the one hand, the reference and distorted stereo pairs undergo SIFT transformation and are sparse coded with the trained SIFT dictionary to obtain their sparse coefficient matrices, which are processed into the quality Q1 of the corresponding distorted stereo image; on the other hand, saliency processing of the reference and distorted stereo pairs gives the reference and distorted stereo saliency maps, all non-overlapping 8 × 8 salient blocks are taken as input data and combined with the saliency dictionary to obtain the sparse coefficient matrices of the corresponding reference and distorted pairs, which are processed into the quality Q2 of the corresponding distorted stereo image. Finally, the scores of the two channels are combined into the final quality score Q of the distorted stereo image pair.
As shown in fig. 1, the flow chart of the sparse representation stereo image quality evaluation algorithm is divided into two stages: a dictionary learning phase and a testing phase.
The individual steps will be analyzed in detail below:
1. Stereoscopic saliency detection model
Stereo images carry more information than single-view images; the human eye cannot match all feature edges in a short time, so most people attend only to the "important regions", extract the object boundaries in those regions, and finally match the boundaries to form stereoscopic vision. According to the stereoscopic visual attention characteristic of the human eye, observers pay more attention to the content of the salient regions of an image [12]; the quality of the stereo saliency map is therefore used here to reflect the quality of the stereo image. Compared with planar images, depth information must be considered for stereo images, so the algorithm uses the absolute difference map of the left and right views to represent depth information. First, the luminance, chrominance and texture-contrast features of the image are extracted to represent the saliency information and combined with the absolute difference map into an initial feature saliency map, as sketched below. Finally, the stereoscopic saliency map is optimized with the center-bias and fovea characteristics into the final stereo saliency map.
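The construction just described can be sketched in a few lines of Python. The fusion rule (an equal-weight sum of normalized contrast maps, modulated by the absolute difference map) and the specific contrast operators are assumptions for illustration only; the patent names the features but not the exact operators or weights.

```python
import cv2
import numpy as np

def normalize(m):
    m = m.astype(np.float64)
    return (m - m.min()) / (m.max() - m.min() + 1e-12)

def initial_saliency(left_bgr, right_bgr):
    left_gray = cv2.cvtColor(left_bgr, cv2.COLOR_BGR2GRAY)
    right_gray = cv2.cvtColor(right_bgr, cv2.COLOR_BGR2GRAY)

    # Depth proxy: absolute difference map of the left and right views.
    abs_diff = normalize(cv2.absdiff(left_gray, right_gray))

    # Contrast features of the left view: luminance and chroma from CIELab,
    # texture approximated by local gradient magnitude (assumed operators).
    lab = cv2.cvtColor(left_bgr, cv2.COLOR_BGR2Lab).astype(np.float64)
    lum = normalize(cv2.Laplacian(lab[:, :, 0], cv2.CV_64F) ** 2)
    chroma = normalize((lab[:, :, 1] - 128) ** 2 + (lab[:, :, 2] - 128) ** 2)
    gx = cv2.Sobel(left_gray, cv2.CV_64F, 1, 0)
    gy = cv2.Sobel(left_gray, cv2.CV_64F, 0, 1)
    texture = normalize(np.hypot(gx, gy))

    # Equal-weight fusion with the depth map (assumed combination rule).
    return normalize((lum + chroma + texture) / 3.0 * (1.0 + abs_diff))
```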
(1) Center bias
Due to the center-bias (CB) characteristic, the human eye always tends to look for the visual fixation point from the center of an image, with attention decreasing from the center to the periphery [10]. An anisotropic Gaussian kernel function [13] is used here to simulate the center-bias factor of attention spreading from the center to the periphery:

CB(x,y) = \exp\left(-\left(\frac{(x-x_0)^2}{2\sigma_h^2} + \frac{(y-y_0)^2}{2\sigma_v^2}\right)\right)   (1)

CB(x, y) represents the offset information of pixel (x, y) with respect to the center point (x_0, y_0); (x_0, y_0) is the coordinate of the center of the distorted right view and (x, y) the coordinate of a pixel; σ_h and σ_v are the standard deviations of the image in the horizontal and vertical directions, taken here as σ_h = W/3 and σ_v = H/3, where W and H are the numbers of horizontal and vertical pixels of the image.
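A minimal sketch of the center-bias map of formula (1), using the stated σ_h = W/3 and σ_v = H/3; taking the geometric image center as (x_0, y_0) is an assumption.

```python
import numpy as np

def center_bias(H, W):
    y, x = np.mgrid[0:H, 0:W].astype(np.float64)
    x0, y0 = (W - 1) / 2.0, (H - 1) / 2.0      # assumed geometric center
    sigma_h, sigma_v = W / 3.0, H / 3.0
    # Anisotropic Gaussian of Eq. (1), peaking at the image center.
    return np.exp(-((x - x0) ** 2 / (2 * sigma_h ** 2)
                    + (y - y0) ** 2 / (2 * sigma_v ** 2)))
```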
(2) Fovea
It is well known that the density of retinal photoreceptors decreases rapidly from the fovea to the periphery [10][14]. Hence, when an image is mapped onto the retina, the human eye has the highest spatial-frequency resolution in the foveal region; that region receives the most attention and resolves detail best, i.e., it is the salient region. As the eccentricity e increases, the spatial resolution of the human eye decreases. Studies have shown that the contrast sensitivity function (CSF) can be expressed as a function of the eccentricity e [14]. This property is modeled here by the function shown in formula (2):

CT(f, e) = C_0 \exp\left(\delta f \, \frac{e(x,y) + e_1}{e_1}\right)   (2)

where e(x, y) is the retinal eccentricity of pixel (x, y), in degrees; f is the spatial frequency, in cycles/degree; C_0 is the contrast threshold; δ is the spatial-frequency decay parameter; and e_1 is the half-resolution eccentricity constant. Following the experimental fit of [15], δ = 0.106, e_1 = 2.3 and C_0 = 1/64.
The maximum between-class variance method finds the gray level that maximizes the difference between foreground and background of the stereoscopic saliency map; this gray level is the optimal threshold, which divides the saliency map into salient and non-salient regions. The retinal eccentricity e(x, y) is then computed from the spatial distance between a pixel and the salient pixel closest to it. Let an arbitrary pixel have coordinates (x, y) and its nearest salient pixel be (x_1, y_1). The eccentricity e(x, y) is then determined by formula (3):

e(x,y) = \tan^{-1}\left(\frac{d}{v\,W}\right)   (3)

where W is the number of horizontal pixels of the image, v is the viewing distance (here taken as 5, in units of the image width), and d is the Euclidean distance between pixel (x, y) and pixel (x_1, y_1):

d = \sqrt{(x - x_1)^2 + (y - y_1)^2}
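The eccentricity computation of formula (3) can be realized with Otsu thresholding and a distance transform. Using cv2.distanceTransform to obtain the distance to the nearest salient pixel is an implementation choice, not something the patent prescribes.

```python
import cv2
import numpy as np

def eccentricity_map(saliency, v=5.0):
    # Otsu's method (maximum between-class variance) splits the saliency map
    # into salient (255) and non-salient (0) pixels.
    sal8 = cv2.normalize(saliency, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    _, mask = cv2.threshold(sal8, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # distanceTransform measures the distance to the nearest zero pixel,
    # so invert the mask to make salient pixels the zeros.
    d = cv2.distanceTransform(cv2.bitwise_not(mask), cv2.DIST_L2, 3)

    W = saliency.shape[1]
    return np.degrees(np.arctan(d / (v * W)))   # e(x, y) in degrees, Eq. (3)
```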
2. Dictionary learning
The purpose of sparse representation is to approximate an input signal vector by a weighted linear combination of a small number of "basis atoms". For dictionary training, solving for the dictionary B and the sparse coefficients S simultaneously is a non-convex optimization problem, but solving for either one with the other fixed is convex; therefore one of B and S is usually fixed while solving for the other, and the two steps are iterated to obtain a suitable dictionary and sparse coefficient matrix:
\min_S \|X - BS\|_F^2 + \lambda \|S\|_1   (4)

\min_B \|X - BS\|_F^2 \quad \text{s.t.} \quad \|B_i\|_2^2 \le 1, \ \forall i   (5)

In formulas (4) and (5), X is the input signal, B is the complete dictionary and S is the sparse coefficient matrix; \|\cdot\|_F is the Frobenius norm, λ is the regularization parameter, \|\cdot\|_1 is the l_1 norm, and B_i denotes the i-th column of atoms in the dictionary. Formula (4) is an l_1-regularized optimization problem, and typical optimization methods are least-angle regression (LARS) [16] and the feature-sign search algorithm [11]. In each iteration the feature-sign method guesses the signs of the sparse coefficients, turning the non-differentiable problem into an unconstrained quadratic program (QP), which both speeds up the computation and improves the accuracy of solving the sparse coefficients over a redundant dictionary. Formula (5) is a typical quadratically constrained least-squares problem; typical optimization methods include QCQP convex optimization, iterative gradient descent [17] and the Lagrange dual (multiplier) method [11]. Iterative gradient descent converges slowly and is time-consuming; the Lagrange dual method is an effective method built on gradient descent.

Sparse coding therefore follows the idea of the feature-sign algorithm, and dictionary learning adopts the Lagrange dual method. Training samples should be diverse and universal, so all reference stereo pairs in the LIVE library are used as training samples here. The dictionary can thus automatically learn the depth information of stereo images, making the results of the algorithm better match human subjective perception and the trained dictionary more accurate. A library-based stand-in for this alternation is sketched below.
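As a hedged stand-in for the alternation of formulas (4) and (5), scikit-learn's DictionaryLearning alternates an l_1-regularized coding step with a unit-norm dictionary update. It relies on LARS/coordinate descent rather than the feature-sign and Lagrange-dual pair used here, so it illustrates the optimization structure rather than reproducing the exact solvers.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

X = np.random.randn(500, 64)          # stand-in for 500 vectorized 8x8 blocks
learner = DictionaryLearning(n_components=128,          # 64x128 dictionary
                             alpha=1.0,                 # lambda in Eq. (4)
                             fit_algorithm='lars',
                             transform_algorithm='lasso_lars',
                             max_iter=100)
S = learner.fit_transform(X)          # sparse coefficients, one row per block
B = learner.components_               # atoms; note: scikit-learn stores the
                                      # transpose of the 64x128 convention
```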
2.1 Saliency dictionary
The dictionary learning stage is shown in Fig. 1(a).
After the stereo images are processed into stereo saliency maps according to the flow of Fig. 2, the saliency dictionary is trained from those maps. In the saliency maps obtained above, non-salient regions have little influence on stereo image quality, so part of the training data can be ignored during dictionary training; this speeds up training while preserving the quality of the resulting saliency dictionary. It must therefore be considered how to exclude the non-salient regions. Document [6] indicates that the larger the variance of an image, the richer its structural information. The images are processed with 8 × 8 overlapping windows to obtain many 8 × 8 small image blocks; the first 3000 8 × 8 salient blocks of each saliency map are extracted using the variance information; all salient blocks are vectorized, each becoming a column vector, and assembled into a matrix as the input of dictionary training; and the feature-sign algorithm and the Lagrange dual method are iterated to obtain a suitable sparse saliency dictionary (see the sketch below). Since the small image blocks are 8 × 8, the size of the saliency dictionary here is chosen as 64 × 128.
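A short sketch of the training-data selection above; the block size (8 × 8), the count (3000) and the variance ranking follow the text, while the one-pixel stride for the "overlapping" windows is an assumption.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def top_salient_blocks(sal_map, patch=8, keep=3000):
    # All overlapping patch x patch blocks (stride 1 assumed).
    blocks = sliding_window_view(sal_map, (patch, patch))
    blocks = blocks.reshape(-1, patch * patch)      # one block per row
    var = blocks.var(axis=1)
    idx = np.argsort(var)[::-1][:keep]              # highest variance first
    return blocks[idx].T                            # 64 x 3000 training matrix
```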
2.2 SIFT dictionary
The dictionary learning stage is shown in Fig. 1(a).
SIFT is a descriptor used in the field of image processing. It has scale invariance, can detect key points in an image, and is a local feature descriptor. With a feature point as center, a 16 × 16 neighborhood is taken as the sampling window, and the relative orientations of the sampling points and the feature point, after Gaussian weighting, are accumulated into an orientation histogram with 8 bins. A gradient histogram of 8 directions is computed on each 4 × 4 cell to form a seed point, and each key point is described by the 4 × 4 = 16 seed points, so one key point yields a 4 × 4 × 8 = 128-dimensional SIFT feature vector. The size of the SIFT dictionary here is chosen as 128 × 1024 according to this feature dimension.
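OpenCV's SIFT yields exactly the 4 × 4 × 8 = 128-dimensional descriptors described above; a minimal extraction sketch (requires opencv-python 4.4 or later, where SIFT is patent-free):

```python
import cv2

def sift_descriptors(gray_u8):
    sift = cv2.SIFT_create()
    _, desc = sift.detectAndCompute(gray_u8, None)  # desc: num_keypoints x 128
    return desc.T if desc is not None else None     # 128 x num_keypoints
```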
2.3 Sparse coefficients
After the saliency dictionary and the SIFT dictionary are obtained in the dictionary training stage of Fig. 1(a), the distorted images are tested with them. As shown in Fig. 1(b), after preprocessing the reference and test images, sparse coding is performed directly with the corresponding dictionary obtained in the training stage, and the OMP algorithm [18] yields the sparse coefficient matrices of the reference and distorted stereo pairs. For a given image, the neural sparse coding differs across distortion types, so the change in sparsity of the image is chosen as the basis for stereo image quality evaluation (see the sketch below). The sparse matrices of the reference and distorted stereo pairs of the two channels are combined to obtain the stereo image quality scores Q_1 and Q_2 of the two channels; finally, the two channel scores are combined into the quality score Q of the final distorted stereo image.
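A minimal sketch of the test-stage OMP coding against a fixed trained dictionary; the sparsity level n_nonzero is an assumed parameter, as the patent does not state it.

```python
from sklearn.decomposition import sparse_encode

def omp_codes(X_cols, dictionary_rows, n_nonzero=10):
    # scikit-learn works with samples as rows, so transpose the
    # column-oriented data convention used in the text.
    return sparse_encode(X_cols.T, dictionary_rows,
                         algorithm='omp',
                         n_nonzero_coefs=n_nonzero).T   # atoms x samples
```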
2.4. Quality evaluation Q
The quality of the distorted image is evaluated from its sparse matrices. X^{o,l,1}, X^{o,r,1}, X^{d,l,1} and X^{d,r,1} denote the sparse coefficient matrices obtained by SIFT preprocessing the reference and distorted stereo pairs and sparse coding them with the SIFT dictionary; X^{o,l,2}, X^{o,r,2}, X^{d,l,2} and X^{d,r,2} denote those obtained by saliency preprocessing and sparse coding with the saliency dictionary; o denotes the reference image, d the distorted image, l the left image, r the right image, 1 the SIFT-dictionary test channel and 2 the saliency-dictionary test channel.

The left-view quality Q_l^1 of the SIFT-dictionary test channel is obtained from formulas (6), (7), (8) and (9). [Formulas (6)-(9) appear only as images in the source and are not reproduced here.] S(i, j) denotes the sparse-coefficient similarity index; the closer S(i, j) is to 1, the smaller the distortion of the distorted image. x^{o,l,1}(i, j) and x^{d,l,1}(i, j) denote the values at position (i, j) of the reference and distorted left-image sparse matrices, t and c are constants, M is the number of rows of the sparse matrix and N the number of columns of sparse coefficients, with α + β = 1. The right-view quality Q_r^1 of the SIFT-dictionary test channel is obtained in the same way, and the same formulas (6)-(9) give the left and right quality scores Q_l^2 and Q_r^2 of the saliency-dictionary test channel.

The left- and right-view scores are combined by a weighted geometric mean into the stereo image quality score Q_1 of the SIFT-dictionary test channel and Q_2 of the saliency-dictionary test channel. The weights are the mean-square values of the sparse matrices of the left and right distorted views in each channel, given by formulas (10) and (11):

w_l^k = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(x^{d,l,k}(i,j)\right)^2   (10)

w_r^k = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(x^{d,r,k}(i,j)\right)^2   (11)

With these weights, the quality scores Q_1 and Q_2 of the two channels follow from formula (12):

Q_k = \left(Q_l^k\right)^{w_l^k/(w_l^k + w_r^k)} \left(Q_r^k\right)^{w_r^k/(w_l^k + w_r^k)}, \quad k = 1, 2   (12)

Finally, the quality score Q of the distorted stereo image combines Q_1 and Q_2 as in formula (13). [Formula (13) appears only as an image in the source and is not reproduced here.]
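Since formulas (6)-(9) and (13) survive only as images, the following sketch is heavily hedged: it assumes an SSIM-style per-coefficient similarity (with constant t), mean pooling, and the mean-square weights and weighted geometric mean of formulas (10)-(12). It shows the shape of the computation for one channel, not the patent's exact formulas.

```python
import numpy as np

def view_quality(s_ref, s_dst, t=1e-4):
    # Assumed SSIM-style similarity between sparse coefficients (cf. Eq. (6)).
    sim = (2.0 * s_ref * s_dst + t) / (s_ref ** 2 + s_dst ** 2 + t)
    return sim.mean()   # assumed mean pooling over the M x N sparse matrix

def channel_quality(s_ref_l, s_dst_l, s_ref_r, s_dst_r):
    q_l = view_quality(s_ref_l, s_dst_l)
    q_r = view_quality(s_ref_r, s_dst_r)
    # Mean-square energies of the distorted sparse matrices, Eqs. (10)-(11).
    w_l = float((s_dst_l ** 2).mean())
    w_r = float((s_dst_r ** 2).mean())
    g = w_l / (w_l + w_r)
    # Weighted geometric mean of the two views, Eq. (12).
    return q_l ** g * q_r ** (1.0 - g)
```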
3. Experimental results
3.1 Test databases and performance indicators
The tests use two public LIVE databases. The LIVE I database contains 365 symmetrically distorted stereo image pairs and 20 original stereo image pairs, with five distortion types: JPEG, JP2K, Gblur, WN and FF.
The LIVE II database contains 360 symmetrically and asymmetrically distorted stereo image pairs and 8 original stereo image pairs; the distortion types likewise include JPEG, JP2K, Gblur, WN and FF.
Two evaluation indices, PLCC (Pearson linear correlation coefficient) and SROCC (Spearman rank-order correlation coefficient), are used to evaluate the performance of the proposed model. The closer PLCC and SROCC are to 1, the better the correlation between the objective and subjective evaluation. Because the subjective DMOS values and the objective quality scores lie in different ranges, a nonlinear logistic mapping is applied before computing the correlation coefficients; a five-parameter fit yields the model's subjective prediction of stereo image quality.
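A sketch of this evaluation protocol: the five-parameter logistic below is the standard VQEG form, which the text implies but does not write out, and the parameter initialization is an assumption.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import pearsonr, spearmanr

def logistic5(q, b1, b2, b3, b4, b5):
    # Standard five-parameter logistic mapping to predicted DMOS.
    return b1 * (0.5 - 1.0 / (1.0 + np.exp(b2 * (q - b3)))) + b4 * q + b5

def plcc_srocc(q_obj, dmos):
    q_obj, dmos = np.asarray(q_obj, float), np.asarray(dmos, float)
    p0 = [np.max(dmos), 1.0, np.mean(q_obj), 1.0, np.mean(dmos)]  # assumed init
    params, _ = curve_fit(logistic5, q_obj, dmos, p0=p0, maxfev=10000)
    pred = logistic5(q_obj, *params)
    # PLCC uses the mapped scores; SROCC is rank-based and needs no mapping.
    return pearsonr(pred, dmos)[0], spearmanr(q_obj, dmos)[0]
```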
3.2 Performance comparison
Tables 1 and 2 give the results of the different indices obtained by the proposed algorithm and by document [7] for the different distortion types; Table 3 gives the overall performance comparison of different algorithms. The bolded data in the tables are the best among all data. As the data in Tables 1 and 2 show, the algorithm is effective for the different distortion types on the two public LIVE libraries, and most of the PLCC and SROCC values in the tables exceed 0.9, indicating a strong correlation between the evaluation results of this method and the subjective results. Although the algorithm does not lead on every individual distortion type, it is very effective overall. As Table 3 shows, the experimental results of the algorithm compare favorably with those of the other methods, so the agreement between the data obtained by the algorithm and the subjective data is better. The method can therefore reflect stereo image quality well, consistent with human subjective perception.

TABLE 1 Performance comparison of the two methods on the LIVE I database
[table data not recoverable from the source image]

TABLE 2 Performance comparison of the two methods on the LIVE II database
[table data not recoverable from the source image]

TABLE 3 Overall performance comparison of different evaluation methods
[table data not recoverable from the source image]
4. Conclusion
A stereo image quality evaluation method based on sparse dictionaries is proposed: two dictionaries are trained, and the qualities of the two channels are finally combined into the quality score of the distorted stereo image. To ensure the completeness of the dictionaries, the original reference stereo pairs serve as the data source for dictionary training, and the saliency dictionary and the SIFT dictionary are obtained by training on preprocessed images. Visual saliency is incorporated into stereo image quality evaluation, the fovea and center-bias characteristics are used to optimize the final saliency map, and the saliency dictionary is trained from it. The experimental results show that the algorithm performs well and that the model predictions are accurate when compared with the subjective evaluation values.
References
[1] J. V. de Miranda Cardoso, C. D. M. Regis, and M. S. de Alencar, "Disparity weighting applied to full-reference and no-reference stereoscopic image quality assessment," in 2015 IEEE International Conference on Consumer Electronics (ICCE), pp. 477-480, Jan 2015.
[2] S. K. Md, B. Appina, and S. S. Channappayya, "Full-reference stereo image quality assessment using natural stereo scene statistics," IEEE Signal Processing Letters, vol. 22, pp. 1985-1989, Nov 2015.
[3] B. A. Olshausen et al., "Emergence of simple-cell receptive field properties by learning a sparse code for natural images," Nature, vol. 381, no. 6583, pp. 607-609, 1996.
[4] B. A. Olshausen and D. J. Field, "Sparse coding of sensory inputs," Current Opinion in Neurobiology, vol. 14, no. 4, pp. 481-487, 2004.
[5] P. Reinagel, "How do visual neurons respond in the real world?," Current Opinion in Neurobiology, vol. 11, no. 4, pp. 437-442, 2001.
[6] Li Kemeng, Shao Feng, et al., "The method of evaluating the objective quality of stereoscopic images based on sparse representation," Journal of Optoelectronics Laser, 2014, 25(11): 2227-2233.
[7] F. Shao, K. Li, W. Lin, G. Jiang, M. Yu, and Q. Dai, "Full-reference quality assessment of stereoscopic images by learning binocular receptive field properties," IEEE Transactions on Image Processing, vol. 24, pp. 2971-2983, Oct 2015.
[8] S. K. Md and S. S. Channappayya, "Sparsity based stereoscopic image quality assessment," 2016 50th Asilomar Conference on Signals, Systems and Computers, pp. 1858-1862, 2016.
[9] J. K. Tsotsos, S. M. Culhane, W. Y. K. Wai, et al., "Modelling visual attention via selective tuning," Artificial Intelligence, Oct. 1995, 78(1): 507-545.
[10] P. Tseng, R. Carmi, I. G. M. Cameron, et al., "Quantifying center bias of observers in free viewing of dynamic natural scenes," Journal of Vision, 2009, 9(7): 1-16.
[11] H. Lee, A. Battle, R. Raina, et al., "Efficient sparse coding algorithms," Advances in Neural Information Processing Systems, 2007, 19: 801.
[12] F. Wang, "Visual saliency detection based on context and background," Dalian University of Technology, 2013.
[13] O. Le Meur, P. Le Callet, D. Barba, et al., "A coherent computational approach to model bottom-up visual attention," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006, 28(5): 802-817.
[14] Zhou Wang, Ligang Lu, and Alan Conrad Bovik, "Foveation scalable video coding with automatic fixation selection," IEEE Transactions on Image Processing, February 2003, 12(2): 243-254.
[15] W. S. Geisler and J. S. Perry, "A real-time foveated multiresolution system for low-bandwidth video communication," Proc. SPIE, vol. 3299, pp. 294-305, July 1998.
[16] B. Efron, T. Hastie, I. Johnstone, et al., "Least angle regression," The Annals of Statistics, 2004, 32(2): 407-499.
[17] Y. Censor and S. A. Zenios, Parallel Optimization: Theory, Algorithms, and Applications, Oxford University Press on Demand, 1997.
[18] R. Rubinstein, M. Zibulevsky, and M. Elad, "Efficient implementation of the K-SVD algorithm using batch orthogonal matching pursuit," CS Technion, vol. 40, no. 8, pp. 1-15, 2008.

Claims (8)

1. A stereo image quality evaluation method based on dictionary learning, characterized in that two dictionaries, a scale-invariant feature transform (SIFT) dictionary and a saliency dictionary, are trained from a reference stereo image pair; the SIFT dictionary specifically applies SIFT transformation to the reference stereo pair, represents the images by SIFT features, and performs dictionary training with the feature-sign search method and the Lagrange dual method to obtain the SIFT dictionary; the saliency dictionary specifically obtains an initial stereoscopic saliency map by incorporating the absolute difference map, optimizes it with the center-bias and fovea characteristics of the human visual mechanism into the final saliency map, extracts the saliency maps of the reference stereo pair, and then selects, as n × n overlapping blocks ranked by variance, the first m salient blocks of each saliency map as source data for training the saliency dictionary; in the test stage, on the one hand, the reference and distorted stereo pairs undergo SIFT transformation and are sparse coded with the trained SIFT dictionary to obtain their sparse coefficient matrices, which are processed into the quality Q1 of the corresponding distorted stereo image; on the other hand, saliency processing of the reference and distorted stereo pairs gives the reference and distorted stereo saliency maps, all non-overlapping n × n salient blocks are taken as input data and combined with the saliency dictionary to obtain the sparse coefficient matrices of the corresponding reference and distorted pairs, which are processed into the quality Q2 of the corresponding distorted stereo image; finally, Q1 and Q2 are combined into the final quality score Q of the distorted stereo image pair.
2. The method according to claim 1, characterized in that the quality of the stereo saliency map is used to reflect the quality of the stereo image, the absolute difference map of the left and right views represents the depth information, the luminance, chrominance and texture-contrast features of the image are first extracted to represent the saliency information and combined with the absolute difference map into an initial feature saliency map, and the initial feature saliency map is then optimized with the center-bias and fovea characteristics to obtain the final stereo saliency map.
3. The stereo image quality evaluation method based on dictionary learning according to claim 1, characterized by further comprising:
(1) Center bias
An anisotropic Gaussian kernel is used to simulate the center-bias (CB) factor of attention spreading from the center to the periphery:

CB(x,y) = \exp\left(-\left(\frac{(x-x_0)^2}{2\sigma_h^2} + \frac{(y-y_0)^2}{2\sigma_v^2}\right)\right)   (1)

CB(x, y) represents the offset information of pixel (x, y) with respect to the center point (x_0, y_0); (x_0, y_0) is the coordinate of the center of the distorted right view and (x, y) the coordinate of a pixel; σ_h and σ_v are the standard deviations of the image in the horizontal and vertical directions, taken as σ_h = W/3 and σ_v = H/3, where W and H are the numbers of horizontal and vertical pixels of the image;

(2) Fovea

This characteristic is simulated with the function shown in formula (2):

CT(f, e) = C_0 \exp\left(\delta f \, \frac{e(x,y) + e_1}{e_1}\right)   (2)

where e(x, y) is the retinal eccentricity of pixel (x, y), in degrees; f is the spatial frequency, in cycles/degree; C_0 is the contrast threshold; δ is the spatial-frequency decay parameter; e_1 is the half-resolution eccentricity constant;

the maximum between-class variance method finds the gray level that maximizes the difference between foreground and background of the stereoscopic saliency map; this gray level is the optimal threshold, which divides the saliency map into salient and non-salient regions; the retinal eccentricity e(x, y) is then computed from the spatial distance between a pixel and the salient pixel closest to it: assuming an arbitrary pixel has coordinates (x, y) and its nearest salient pixel is (x_1, y_1), the eccentricity e(x, y) is obtained by formula (3):

e(x,y) = \tan^{-1}\left(\frac{d}{v\,W}\right)   (3)

where W is the number of horizontal pixels of the image, v is the viewing distance, and d is the Euclidean distance between pixel (x, y) and pixel (x_1, y_1):

d = \sqrt{(x - x_1)^2 + (y - y_1)^2}
4. the stereo image quality evaluation method based on dictionary learning as claimed in claim 1, wherein the dictionary training comprises the following steps: fixing one of the dictionary and the sparse coefficient matrix to solve the other, and then carrying out iteration to obtain a proper dictionary and sparse coefficient matrix:
min over S:  ‖X − BS‖F² + λ‖S‖1   (4)

min over B:  ‖X − BS‖F²  subject to ‖Bi‖² ≤ 1 for all i   (5)

In formulas (4) and (5), X is the input signal, B is the complete dictionary, and S is the sparse coefficient matrix; ‖·‖F is the Frobenius norm, λ is the regularization parameter, ‖·‖1 is the l1 norm, and Bi denotes the i-th column (atom) of the dictionary. Formula (4) is an L1-regularized optimization problem; typical optimization methods include the least angle regression method (LARS) and the feature-sign search algorithm, in which each iteration step turns the non-differentiable problem into an unconstrained quadratic program (QP) by guessing the signs of the sparse coefficients. Formula (5) is a typical quadratically constrained least squares optimization problem; optimization methods include QCQP convex optimization, iterative gradient descent, and the Lagrange multiplier (dual) method.
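A minimal alternating-minimization sketch of (4) and (5): scikit-learn's Lasso stands in for the feature-sign search in step (4), and a least-squares update with column renormalization approximates the Lagrange-dual solve of the constrained problem (5):

```python
import numpy as np
from sklearn.linear_model import Lasso

def train_dictionary(X, n_atoms=128, lam=0.1, n_iter=20, seed=0):
    # X: input signal matrix, one sample per column (dim x n_samples)
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((X.shape[0], n_atoms))
    B /= np.linalg.norm(B, axis=0)
    for _ in range(n_iter):
        # (4): fix B, solve the L1-regularized sparse codes S
        lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=2000)
        lasso.fit(B, X)
        S = lasso.coef_.T                 # n_atoms x n_samples
        # (5): fix S, update B by least squares, then renormalize columns
        B = np.linalg.lstsq(S.T, X.T, rcond=None)[0].T
        B /= np.maximum(np.linalg.norm(B, axis=0), 1e-12)
    return B, S
```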
5. The method for evaluating the quality of the stereo image based on dictionary learning as claimed in claim 1, wherein in one example the step of training the salient dictionary specifically comprises: processing the image with 8×8 overlapped windows to obtain a number of 8×8 small image blocks; extracting the first 3000 8×8 salient blocks of each saliency map according to variance information; vectorizing all the salient blocks, each salient block becoming a column vector, and assembling them into a matrix as the input of dictionary training; and performing iterative training with the feature-sign algorithm and the Lagrange dual method to obtain a suitable sparse salient dictionary; since the size of the small image blocks is 8×8, the size of the salient dictionary herein is chosen as 64×128.
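A sketch of the training-data construction in this claim; the stride of the overlapped window is an assumption, since the claim fixes only the 8×8 block size and m = 3000:

```python
import numpy as np

def top_variance_blocks(sal_map, n=8, m=3000, step=4):
    # Slide an n x n overlapped window (assumed stride: step) over the
    # saliency map and flatten each block to a 64-d vector
    H, W = sal_map.shape
    blocks = [sal_map[i:i + n, j:j + n].ravel()
              for i in range(0, H - n + 1, step)
              for j in range(0, W - n + 1, step)]
    blocks = np.asarray(blocks)                   # num_blocks x 64
    order = np.argsort(blocks.var(axis=1))[::-1]  # highest variance first
    return blocks[order[:m]].T                    # 64 x m training matrix
```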
6. The method as claimed in claim 5, wherein in the training stage of the SIFT dictionary, a 16×16 neighborhood centered on a feature point is taken as the sampling window; after Gaussian weighting, the relative orientations of the sampling points and the feature point are assigned to an orientation histogram with 8 bins; a gradient histogram over 8 directions is computed on each 4×4 block to form one seed point, and 4×4 = 16 seed points are used to describe each key point, so that one key point yields a 4×4×8 = 128-dimensional SIFT feature vector; according to this feature dimension, the size of the SIFT dictionary is chosen as 128×1024.
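OpenCV's SIFT produces exactly these 4×4×8 = 128-dimensional descriptors, so a sketch of assembling the 128-row input matrix for dictionary training (a stand-in, not the patent's own extraction code) might look like:

```python
import cv2
import numpy as np

def sift_descriptor_matrix(gray_u8):
    # gray_u8: 8-bit grayscale image; descriptors are stacked as columns
    # to match the 128 x 1024 dictionary shape of claim 6
    sift = cv2.SIFT_create()
    _, desc = sift.detectAndCompute(gray_u8, None)   # num_kp x 128
    return desc.T if desc is not None else np.empty((128, 0))
```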
7. The dictionary-learning-based stereo image quality evaluation method as claimed in claim 5, wherein after the salient dictionary and the SIFT dictionary are obtained in the dictionary training stage, the trained dictionaries are used to test distorted images: after the reference image and the test image are preprocessed, sparse coding is performed directly with the corresponding dictionary obtained in the training stage, and the sparse coefficient matrices of the reference and distorted stereoscopic image pairs are obtained with the OMP algorithm; the change in sparsity of the images is selected as the basis of stereoscopic image quality evaluation, and the sparse matrices of the reference and distorted stereoscopic image pairs of the two channels are combined respectively to obtain the stereoscopic image quality scores Q1 and Q2 of the two channels; finally, the quality scores of the two channels are combined to obtain the final quality score Q of the distorted stereoscopic image.
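A sketch of the test-stage sparse coding with OMP, using scikit-learn's orthogonal_mp; the sparsity level is an assumed parameter:

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def omp_codes(B, X, n_nonzero=10):
    # B: trained dictionary (dim x n_atoms); X: preprocessed feature
    # columns (dim x n_samples). Returns the sparse coefficient matrix.
    S = orthogonal_mp(B, X, n_nonzero_coefs=n_nonzero)
    return S if S.ndim == 2 else S[:, None]   # n_atoms x n_samples
```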
8. The dictionary-learning-based stereoscopic image quality evaluation method as claimed in claim 5, wherein the quality evaluation value Q is obtained as follows:
The quality of the distorted image is evaluated from the sparse matrices of the distorted image. Let S1_ol, S1_or, S1_dl and S1_dr denote the sparse coefficient matrices obtained by SIFT-preprocessing the reference and distorted stereo pairs and sparse coding them with the SIFT dictionary, and let S2_ol, S2_or, S2_dl and S2_dr denote the sparse coefficient matrices obtained by saliency-preprocessing the reference and distorted stereo pairs and sparse coding them with the salient dictionary, where o denotes the reference image, d the distorted image, l the left image, r the right image, and the superscripts 1 and 2 denote the SIFT dictionary test channel and the salient dictionary test channel respectively. The left-view quality of the SIFT dictionary test channel is obtained from formulas (6), (7), (8) and (9):
S(i, j) = ( 2·S1_ol(i, j)·S1_dl(i, j) + c ) / ( S1_ol(i, j)² + S1_dl(i, j)² + c )   (6)

[formula images (7)–(9), which pool S(i, j) over the M×N entries of the sparse matrices using the constants t, c and the exponents α, β to give the left-view quality Q1_l, are not reproduced in this text]

where S(i, j) is the sparse coefficient similarity index, and the closer S(i, j) is to 1, the smaller the degree of distortion of the distorted image; S1_ol(i, j) and S1_dl(i, j) are the values at position (i, j) of the reference and distorted left-image sparse matrices respectively; t and c are constants; M is the number of rows and N the number of columns of the sparse matrix; and α + β = 1. Similarly, the right-view quality Q1_r of the SIFT dictionary test channel is obtained.
For the salient dictionary test channel, the left and right test stereoscopic image quality scores Q2_l and Q2_r are obtained with the same formulas (6), (7), (8) and (9).
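Since formulas (7)–(9) are not recoverable from the source, the following per-view pooling is only a plausible sketch: the entry-wise similarity of the reconstructed formula (6) combined with a global energy term, with t, c and α assumed:

```python
import numpy as np

def view_quality(S_ref, S_dis, t=0.01, c=0.02, alpha=0.5):
    # Entry-wise similarity as in the reconstructed formula (6)
    sim = (2 * S_ref * S_dis + c) / (S_ref ** 2 + S_dis ** 2 + c)
    # Assumed global energy-similarity term using the constant t
    eo, ed = np.sum(S_ref ** 2), np.sum(S_dis ** 2)
    energy = (2 * np.sqrt(eo * ed) + t) / (eo + ed + t)
    # Assumed pooling with exponents alpha + beta = 1
    return sim.mean() ** alpha * energy ** (1 - alpha)
```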
combining the quality scores of the left view and the right view by using the weight geometry to obtain the quality score Q of the stereo image of the SIFT dictionary testing channel1And stereo image quality score Q of salient dictionary test channel2
Figure FDA00015737664800000312
The mean square value of the distorted stereo image pair representing the SIFT dictionary test channel and the significant dictionary test channel respectively to the sparse matrix of the left and right distorted stereo images is obtained by the following formulas (10) and (11) for solving the weight:
Figure FDA00015737664800000313
Figure FDA00015737664800000314
according to the obtained weight and the formula (12), the mass fraction Q of the two channels is obtained1And Q2
Figure FDA0001573766480000041
Finally, the quality score Q of the distorted stereoscopic image is obtained by combining Q1 and Q2, as in formula (13):

[formula image (13) not reproduced]
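A sketch of the score combination: formulas (10)–(12) as reconstructed above, plus an assumed weighted geometric mean with parameter eta standing in for the unrecoverable formula (13):

```python
import numpy as np

def channel_score(q_l, q_r, S_dis_l, S_dis_r):
    # Weights from mean square values of the distorted sparse matrices,
    # normalized to sum to 1 (reconstructed formulas (10)/(11))
    e_l, e_r = np.mean(S_dis_l ** 2), np.mean(S_dis_r ** 2)
    w_l = e_l / (e_l + e_r)
    # Weighted geometric mean of the two view scores (formula (12))
    return q_l ** w_l * q_r ** (1.0 - w_l)

def final_score(q1, q2, eta=0.5):
    # Assumed stand-in for formula (13): weighted geometric mean of Q1, Q2
    return q1 ** eta * q2 ** (1.0 - eta)
```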
CN201810126932.9A 2018-02-08 2018-02-08 Three-dimensional image quality evaluation method based on dictionary learning Expired - Fee Related CN108389189B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810126932.9A CN108389189B (en) 2018-02-08 2018-02-08 Three-dimensional image quality evaluation method based on dictionary learning

Publications (2)

Publication Number Publication Date
CN108389189A CN108389189A (en) 2018-08-10
CN108389189B (en) 2021-05-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210514