CN108648207B - Stereo image quality evaluation method based on segmented stack type self-encoder - Google Patents

Stereo image quality evaluation method based on segmented stack type self-encoder

Info

Publication number
CN108648207B
CN108648207B (application CN201810444082.7A)
Authority
CN
China
Prior art keywords
image
sae
edge
sum
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810444082.7A
Other languages
Chinese (zh)
Other versions
CN108648207A (en)
Inventor
Jiachen Yang (杨嘉琛)
Yang Zhao (赵洋)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201810444082.7A priority Critical patent/CN108648207B/en
Publication of CN108648207A publication Critical patent/CN108648207A/en
Application granted granted Critical
Publication of CN108648207B publication Critical patent/CN108648207B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/90Determination of colour characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30168Image quality inspection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention relates to a stereo image quality evaluation method based on a segmented stacked self-encoder. Under an unsupervised condition, primary edge features f_S, f_D and f_C are extracted from the sum, difference and monocular maps and input into three trained segmented stacked self-encoders (S-SAE) to obtain abstract deep edge features h_S, h_D and h_C. The primary color feature f_color of the color maps is then encoded with a stacked auto-encoder (SAE) to obtain the abstract deep color feature h_color. Finally, the deep feature vectors of the stereo images are fitted to the corresponding MOS values, and the quality score of a stereo image under test is predicted from its deep feature vectors.

Description

Stereo image quality evaluation method based on segmented stack type self-encoder
Technical Field
The invention belongs to the field of image processing and relates to an objective, no-reference method for evaluating stereo image quality.
Background
With its rapid development, stereoscopic display technology has been widely applied in many fields. Compared with planar images, stereo images bring viewers a brand-new experience and sense of presence, so stereo image processing research has received much attention. However, owing to equipment, processing methods and other factors, distortion may be introduced into stereo images during acquisition, compression, transmission and storage, degrading their quality. An evaluation method that can effectively assess stereo image quality is therefore needed. Although subjective quality evaluation is very reliable, it consumes a lot of manpower and time and has poor real-time performance; it is also easily disturbed by human and environmental factors, so its results are not stable enough. Compared with subjective evaluation, objective evaluation uses software to evaluate image quality, requires no large organized subjective tests, is simple to operate, and correlates highly with subjective results, so it has drawn growing attention from researchers.
Currently, objective stereo image quality evaluation is divided into three categories according to whether the original image is used during evaluation: full-reference, reduced-reference and no-reference quality evaluation. The first two evaluate the stereo image using the original image, or partial information about it, which greatly limits their use. No-reference quality evaluation requires no reference information and is therefore the approach best suited to practical situations. Because binocular characteristics are not yet fully accounted for and the binocular processing of stereo images is not thoroughly understood, stereo image quality evaluation remains a hotspot and a difficulty of current research.
Disclosure of Invention
The invention aims to provide a no-reference stereo image quality evaluation method that simulates how human eyes perceive and process stereo images. The invention extracts primary features of the stereo image, such as edges and colors, and converts them, through segmented stacked self-encoders or a stacked self-encoder, into deep features that better match the abstraction performed by the human visual system, so as to evaluate stereo image quality more comprehensively, accurately and objectively. The technical scheme is as follows:
A stereo image quality evaluation method based on segmented stacked self-encoders: under an unsupervised condition, primary edge features f_S, f_D and f_C are extracted from the sum, difference and monocular maps and input into three trained segmented stacked self-encoders S-SAE to obtain abstract deep edge features h_S, h_D and h_C; the primary color feature f_color of the color maps is then encoded with a stacked auto-encoder SAE to obtain the abstract deep color feature h_color; finally, the deep feature vectors of the stereo images are fitted to the corresponding MOS values, and the quality score of a stereo image under test is predicted from its deep feature vectors. The method comprises the following steps:
The first step: synthesizing the sum map (S), difference map (D) and monocular map (C) of the left and right LoG maps
The image pair is filtered with a Laplacian of Gaussian (LoG) filter to obtain the left and right LoG maps; the LoG parameters are set to (n, σ) ∈ {(3, 0.5), (7, 1), (13, 2)}, where σ is the standard deviation of the Laplacian of Gaussian operator, so left and right LoG maps of three edge thicknesses are obtained; then the sum, difference and monocular maps of each pair of left and right LoG maps are computed;
The second step: extracting the primary edge features f_S, f_D and f_C of the sum, difference and monocular maps
The MSCN coefficient histogram of the sum map is fitted with a generalized Gaussian distribution GGD model, and the variance and shape parameters of the GGD are taken as 2 features of the sum map; the MSCN neighborhood coefficient histograms in the 4 directions of the sum map (horizontal, vertical, main diagonal and secondary diagonal) are fitted with 4 asymmetric generalized Gaussian distribution AGGD models, and the 4 parameters of each AGGD model (mean, variance, shape and size) are taken as features of the sum map, so 16 features are extracted; in addition, the amplitude, variance and entropy of the sum map are taken as 3 further features; since there are left and right LoG maps of three edge thicknesses, there are sum maps of three edge thicknesses; following these steps, a 21-dimensional feature vector is extracted from the sum map at each edge thickness, so a 63-dimensional primary edge feature vector f_S is finally extracted from the sum maps; the feature extraction for the difference and monocular maps is the same as for the sum map, and 63-dimensional primary edge feature vectors f_D and f_C are extracted from them;
The third step: training 3 segmented stacked self-encoders S-SAE
50% of the image pairs in the image library are randomly selected for training the three segmented stacked self-encoders S-SAE: the primary edge features f_S, f_D and f_C extracted from the sum, difference and monocular maps are used as samples to train the three S-SAEs respectively;
Each segmented stacked self-encoder S-SAE consists of 3 local stacked self-encoders SAE; the segmented stacked self-encoder divides its input f into three segments according to image edge thickness, the 21 features of each segment are input into one of the 3 local SAEs, and the 3 local SAEs are trained separately; training yields 3 local SAEs with three hidden layers each, the layer sizes of the 3 local SAEs being 21-18-14-12; finally, the output layers of the local SAEs are connected in series to obtain the segmented stacked self-encoder S-SAE-S of the sum map;
Following the same steps, the segmented stacked self-encoder S-SAE-D of the difference map and the segmented stacked self-encoder S-SAE-C of the monocular map are trained; the layer sizes of the local SAEs of these two S-SAEs are also 21-18-14-12;
The fourth step: extracting the deep features h_S, h_D and h_C of the sum, difference and monocular maps
f_S, f_D and f_C are input into the trained S-SAE-S, S-SAE-D and S-SAE-C, and the three segmented stacked self-encoders encode f_S, f_D and f_C into the abstract deep edge features h_S, h_D and h_C respectively;
The fifth step: extracting the primary color feature vector f_color of the color maps
The three color maps of the left image (the RG, BY and Lum maps) and the three color maps of the right image (the RG, BY and Lum maps) are fitted with 6 AGGD models; the shape, left variance and right variance of each AGGD model are extracted, and the kurtosis and skewness of the 6 AGGD models are computed at the same time; a 30-dimensional primary color feature vector f_color is extracted;
The sixth step: extracting the deep color feature vector h_color of the color maps
50% of the image pairs in the image library are randomly selected to train a stacked self-encoder SAE whose layer sizes are 30-25-20-15; f_color is input into the trained SAE, which encodes f_color into the abstract deep color feature vector h_color;
The seventh step: computing stereo image local quality score
Randomly selecting 80% of image pairs in the image library as a training set, and using the image pairs in the training set
Figure BDA0001656751380000031
With corresponding MOS training
Figure BDA0001656751380000032
A corresponding support vector regression (SVR-S); using SVR-S prediction and the quality score of the image, the remaining 20% of the image pairs in the image library are used as a test set
Figure BDA0001656751380000033
By the method, the mass fractions of the difference image, the monocular image and the color image are obtained respectively
Figure BDA0001656751380000034
And
Figure BDA0001656751380000035
The eighth step: calculating the objective quality score of the stereo image
(1) Computing the local quality score related to edge information with dynamic weights
The sum-map and difference-map quality scores are weighted to obtain the sum-difference quality score Q_SD:
Q_SD = W_D·Q_D + (1 − W_D)·Q_S (1)
where W_D, the weight of the difference map, is computed from μ_L and μ_R, the expectations of L and R, σ_L and σ_R, their variances, and the constants C_1 = 0.6 and C_2 = 5;
The sum-difference quality score and the monocular quality score are combined to obtain the edge feature quality score Q_edge:
Q_edge = W_C·Q_C + (1 − W_C)·Q_SD (2)
where W_C is computed with the constants C_3 = 0.55 and C_4 = 0.8;
(2) Computing the stereo image quality score
The edge quality score is assigned the higher weight:
Q = W_edge·Q_edge + W_color·Q_color (3)
where the edge weight W_edge = 0.8 and the color weight W_color = 0.2.
The objective stereo image quality evaluation method provided by the invention starts from the edge and color information of the image, combines the operating mechanism of the whole visual perception pathway, uses segmented stacked self-encoders to simulate how the human eye processes image information, and establishes a no-reference objective evaluation model for stereo image quality. The objective results obtained are highly consistent with subjective evaluations of stereo image quality, so the model reflects stereo image quality more accurately.
Drawings
Fig. 1 is a flowchart of the stereo image quality evaluation method based on the segmented stacked self-encoder; Fig. 2 shows the RG, BY and Lum maps of the left image of an image pair; Fig. 3 shows the structure of the segmented stacked self-encoder.
Detailed Description
The invention relates to a stereo image quality evaluation method based on segmented stacked self-encoders. Three segmented stacked self-encoders convert the primary edge feature vectors of the sum, difference and monocular maps into abstract deep edge feature vectors; a stacked auto-encoder then encodes the primary color feature of the color maps into an abstract deep color feature vector. These deep feature vectors reflect the degree of distortion of the stereo image and are therefore used to evaluate the quality of distorted stereo images. The method comprises the following steps:
the first step is as follows: synthesis of left and right LoG maps (L)GoL、RGoL) Sum, difference and monocular images
Simulating the process of extracting the image edges by retinal nerve cells, filtering the image pair by using a Laplacian of Gaussian (LoG) filter, inputting the image pair into an n × n Gaussian low-pass filter, applying a 3 × 3 weighted mask window to a 3 × 3 region centered at (i, j), and calculating the correlation value (convolution and sum) of the window; here, the setting parameter is set to (n, σ) ∈ { (3,0.5), (7,1), (13,2) }, σ is the standard deviation of the laplacian of gaussian operator, thereby obtaining LoG maps of 3 kinds of edge thicknesses; then, a sum map (S), a difference map (D), and a monocular map (C) of the left and right LoG maps are calculated. The calculation method is as follows:
S(i,j) = L_LoG(i,j) + R_LoG(i,j) (1)
D(i,j) = L_LoG(i,j) − R_LoG(i,j) (2)
C(i,j) = W_L(i,j)·L_LoG(i,j) + W_R((i+d(i,j)),j)·R_LoG((i+d(i,j)),j) (3)
where L_LoG is the left LoG map, R_LoG is the right LoG map, d is the disparity, and W_L and W_R, the weights of the left and right LoG maps, are obtained by assigning normalized Gabor filter energy responses:
W_L(i,j) = GE_L(i,j) / (GE_L(i,j) + GE_R((i+d(i,j)),j)) (4)
W_R((i+d(i,j)),j) = GE_R((i+d(i,j)),j) / (GE_L(i,j) + GE_R((i+d(i,j)),j)) (5)
where GE_L and GE_R denote the energy responses of the left and right LoG maps over all scales and orientations.
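As an illustration of this step, the sketch below computes the three-scale LoG maps and the S, D and C maps in Python. The disparity map d and the Gabor energy responses ge_l, ge_r are placeholders (the patent specifies neither the Gabor bank nor the disparity algorithm), and treating i as the column index in formula (3) is also an assumption.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

LOG_PARAMS = [(3, 0.5), (7, 1.0), (13, 2.0)]  # (n, sigma) from the patent

def log_maps(img):
    """Left/right LoG maps at the three edge thicknesses."""
    img = img.astype(np.float64)
    # scipy sizes the kernel from sigma and truncate: n = 2*truncate*sigma + 1
    return [gaussian_laplace(img, sigma, truncate=(n - 1) / (2.0 * sigma))
            for n, sigma in LOG_PARAMS]

def sdc_maps(l_log, r_log, d, ge_l, ge_r):
    """Sum (1), difference (2) and monocular (3) maps for one edge thickness."""
    h, w = l_log.shape
    s = l_log + r_log
    diff = l_log - r_log
    rows = np.arange(h)[:, None]
    cols = np.clip(np.arange(w)[None, :] + d.astype(int), 0, w - 1)  # i + d(i,j)
    r_shift = r_log[rows, cols]
    ge_r_shift = ge_r[rows, cols]
    w_l = ge_l / (ge_l + ge_r_shift + 1e-12)  # normalized energy, formulas (4)-(5)
    c = w_l * l_log + (1.0 - w_l) * r_shift   # formula (3)
    return s, diff, c
```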
The second step: extracting the primary edge features f_S, f_D and f_C of the sum, difference and monocular maps
The MSCN coefficient histogram of the sum map is fitted with a generalized Gaussian distribution (GGD) model, and the variance and shape parameters of the GGD are taken as 2 features of the sum map; the MSCN neighborhood coefficient histograms in the 4 directions of the sum map (horizontal, vertical, main diagonal and secondary diagonal) are fitted with 4 asymmetric generalized Gaussian distribution (AGGD) models, and the 4 parameters of each AGGD model (mean, variance, shape and size) are taken as features of the sum map, so 16 features are extracted; in addition, the amplitude, variance and entropy of the sum map are taken as 3 further features. Since there are left and right LoG maps of three edge thicknesses, there are sum maps of three edge thicknesses. Following these steps, a 21-dimensional feature vector can be extracted from the sum map at each edge thickness, so a 63-dimensional primary edge feature vector f_S is finally extracted from the sum maps. The feature extraction for the difference and monocular maps is exactly the same as for the sum map, and 63-dimensional primary edge feature vectors f_D and f_C are extracted from them.
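The MSCN coefficients and the GGD fit used in this step can be sketched as follows. The Gaussian weighting window and the moment-matching estimator are the common BRISQUE-style choices and are assumptions, since the patent does not spell them out.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.special import gamma

def mscn(img, sigma=7.0 / 6.0):
    """Mean-subtracted contrast-normalized coefficients of a map."""
    img = img.astype(np.float64)
    mu = gaussian_filter(img, sigma)
    var = gaussian_filter(img * img, sigma) - mu * mu
    return (img - mu) / (np.sqrt(np.abs(var)) + 1.0)

def fit_ggd(x):
    """GGD (shape, variance) by matching the ratio E[|x|]^2 / E[x^2]."""
    x = x.ravel()
    sigma_sq = np.mean(x ** 2)
    rho = np.mean(np.abs(x)) ** 2 / (sigma_sq + 1e-12)
    shapes = np.arange(0.2, 10.0, 0.001)
    ratios = gamma(2.0 / shapes) ** 2 / (gamma(1.0 / shapes) * gamma(3.0 / shapes))
    shape = shapes[np.argmin((ratios - rho) ** 2)]
    return shape, sigma_sq
```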
the third step: training 3 segmented stacked autoencoder S-SAE
Randomly select 50% of the image pairs in the image library to train 3S-SAEs. Extracting primary edge features from the sum, difference and monocular images
Figure BDA0001656751380000051
And
Figure BDA0001656751380000052
three segmented stacked autoencoders are trained as samples, respectively.
The segmented stacked self-encoder consists of 3 local stacked self-encoders (local SAE). Segmented stacked self-encoder will input according to different edge thickness
Figure BDA0001656751380000053
The method is divided into three sections, each section has 21 characteristics, the characteristics are input into 3 local SAEs, the 3 local SAEs are trained respectively, and the training method is the same as that of a stacked self-encoder; training to obtain 3 local SAEs with three hidden layers, wherein the unit number of each layer of the 3 local SAEs is 21-18-14-12, and finally, connecting the output layers of the local SAEs in series to obtain a segmented stacked self-encoder (S-SAE-S) of the sum graph.
According to the steps, a segment-stacked self-encoder (S-SAE-D) of a difference diagram and a segment-stacked self-encoder (S-SAE-C) of a monocular diagram are obtained by training, and the number of units of each layer of the local SAE of the two S-SAEs is 21-18-14-12.
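A minimal PyTorch sketch of the S-SAE structure follows. The 21-18-14-12 layer sizes come from the patent; the sigmoid activations, Adam optimizer, and joint reconstruction training (rather than greedy layer-wise pretraining) are simplifying assumptions.

```python
import torch
import torch.nn as nn

class LocalSAE(nn.Module):
    """One local SAE: encoder 21-18-14-12; mirrored decoder used only for training."""
    def __init__(self, sizes=(21, 18, 14, 12)):
        super().__init__()
        enc = []
        for d_in, d_out in zip(sizes, sizes[1:]):
            enc += [nn.Linear(d_in, d_out), nn.Sigmoid()]
        self.encoder = nn.Sequential(*enc)
        rev = sizes[::-1]
        dec = []
        for d_in, d_out in zip(rev, rev[1:]):
            dec += [nn.Linear(d_in, d_out), nn.Sigmoid()]
        self.decoder = nn.Sequential(*dec)

    def forward(self, x):
        return self.decoder(self.encoder(x))

class SegmentedSAE(nn.Module):
    """Three local SAEs, one per edge thickness; deep outputs concatenated (3 x 12)."""
    def __init__(self):
        super().__init__()
        self.locals = nn.ModuleList(LocalSAE() for _ in range(3))

    def encode(self, f):                      # f: (batch, 63) primary edge features
        segs = torch.split(f, 21, dim=1)      # one 21-dim segment per edge thickness
        return torch.cat([sae.encoder(s) for sae, s in zip(self.locals, segs)], dim=1)

def train_unsupervised(model, feats, epochs=200, lr=1e-3):
    """Reconstruction training of each local SAE on its own 21-dim segment."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    segs = torch.split(feats, 21, dim=1)
    for _ in range(epochs):
        opt.zero_grad()
        loss = sum(loss_fn(sae(s), s) for sae, s in zip(model.locals, segs))
        loss.backward()
        opt.step()
```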
The fourth step: extracting the deep features h_S, h_D and h_C of the sum, difference and monocular maps
f_S, f_D and f_C are input into the trained S-SAE-S, S-SAE-D and S-SAE-C, and the three segmented stacked self-encoders encode f_S, f_D and f_C into the abstract deep edge features h_S, h_D and h_C respectively.
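Hypothetical usage of the trained encoders for this step, reusing the SegmentedSAE class from the third-step sketch (the variable names f_s, f_d, f_c are illustrative):

```python
# Encode each 63-dim primary edge feature vector into a 36-dim deep edge feature.
h_s = s_sae_s.encode(torch.as_tensor(f_s, dtype=torch.float32))
h_d = s_sae_d.encode(torch.as_tensor(f_d, dtype=torch.float32))
h_c = s_sae_c.encode(torch.as_tensor(f_c, dtype=torch.float32))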
the fifth step: extracting primary color feature vectors of color maps
Figure BDA00016567513800000512
The vision system processes color signals in the lateral geniculate nucleus. The color excitation is coded by comparing the activity of different cones by using the outer knee kernel, and the processing process of color information in the outer knee kernel is simulated by using reverse coding. In the retina, cones are divided into three types: l-cones, M-cones and S-cones, which are sensitive to long (red related), medium (green), and short (blue) wavelengths, respectively. There are three types of reverse channel coding, red and green channel (Lum), blue and yellow channel (RG), and light brightness channel (BY).
Figure BDA00016567513800000513
Figure BDA00016567513800000514
Figure BDA00016567513800000515
Wherein the content of the first and second substances,
Figure BDA00016567513800000516
the MSCN coefficient is obtained by calculating the MSCN coefficient after the RGB image of the image pair is subjected to logarithmic transformation.
The RG, BY and Lum maps of the left image and the RG, BY and Lum maps of the right image are fitted with 6 AGGD models; the shape, left variance and right variance of each AGGD are extracted, and the kurtosis and skewness of the 6 AGGDs are computed at the same time; a 30-dimensional primary color feature vector f_color is thereby extracted from the color maps.
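The AGGD fit for this step can be sketched with the standard moment-matching estimator; computing kurtosis and skewness directly on the channel samples is an assumption about what the patent measures.

```python
import numpy as np
from scipy.special import gamma
from scipy.stats import kurtosis, skew

def fit_aggd(x):
    """Return (shape, left variance, right variance) of an asymmetric GGD."""
    x = x.ravel()
    left, right = x[x < 0], x[x >= 0]
    sigma_l = np.sqrt(np.mean(left ** 2)) if left.size else 1e-6
    sigma_r = np.sqrt(np.mean(right ** 2)) if right.size else 1e-6
    gamma_hat = sigma_l / sigma_r
    r_hat = np.mean(np.abs(x)) ** 2 / np.mean(x ** 2)
    R_hat = r_hat * (gamma_hat ** 3 + 1) * (gamma_hat + 1) / (gamma_hat ** 2 + 1) ** 2
    shapes = np.arange(0.2, 10.0, 0.001)
    rho = gamma(2.0 / shapes) ** 2 / (gamma(1.0 / shapes) * gamma(3.0 / shapes))
    shape = shapes[np.argmin((rho - R_hat) ** 2)]
    return shape, sigma_l ** 2, sigma_r ** 2

def color_features(channels):
    """30-dim primary color vector from the six RG/BY/Lum maps (left + right)."""
    feats = []
    for ch in channels:                       # 6 channels x 5 values = 30
        shape, var_l, var_r = fit_aggd(ch)
        feats += [shape, var_l, var_r, kurtosis(ch.ravel()), skew(ch.ravel())]
    return np.asarray(feats)
```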
The sixth step: extracting the deep color feature vector h_color of the color maps
50% of the image pairs in the image library are randomly selected to train a stacked self-encoder SAE whose layer sizes are 30-25-20-15. f_color is input into the trained SAE, which encodes f_color into the abstract deep color feature vector h_color.
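For this step, the LocalSAE class from the third-step sketch can be reused with the 30-25-20-15 layer sizes given in the patent; the training loop is the same reconstruction loop as before.

```python
# Plain stacked self-encoder for the color features (layer sizes from the patent).
color_sae = LocalSAE(sizes=(30, 25, 20, 15))
# ...train on the f_color of 50% of the image pairs with MSE reconstruction loss, then:
# h_color = color_sae.encoder(torch.as_tensor(f_color, dtype=torch.float32))
```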
The seventh step: computing the local quality scores of the stereo image
80% of the image pairs in the image library are randomly selected as the training set; the h_S of the training-set image pairs and the corresponding MOS values are used to train the support vector regression machine SVR-S corresponding to h_S; the remaining 20% of the image pairs in the image library are used as the test set, and SVR-S predicts the quality score Q_S of the sum map. In the same way, the quality scores Q_D, Q_C and Q_color of the difference map, monocular map and color maps are obtained.
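A scikit-learn sketch of this step; the RBF kernel and its hyper-parameters are assumptions, since the patent only names support vector regression.

```python
import numpy as np
from sklearn.svm import SVR

def local_quality_scores(h_train, mos_train, h_test):
    """Train one SVR per feature type (sum/diff/monocular/color) and predict."""
    svr = SVR(kernel="rbf", C=100.0, gamma="scale")
    svr.fit(h_train, mos_train)               # 80% of image pairs + their MOS
    return svr.predict(h_test)                # quality scores of the 20% test set

# q_s = local_quality_scores(h_s_train, mos, h_s_test)  # likewise q_d, q_c, q_color
```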
The eighth step: calculating the objective quality score of the stereo image
(1) Computing the local quality score related to edge information with dynamic weights
The sum-map and difference-map quality scores are weighted to obtain the sum-difference quality score Q_SD:
Q_SD = W_D·Q_D + (1 − W_D)·Q_S (9)
where W_D, the weight of the difference map, is computed from μ_L and μ_R, the expectations of L and R, σ_L and σ_R, their variances, and the constants C_1 = 0.6 and C_2 = 5.
The sum-difference quality score and the monocular quality score are combined to obtain the edge feature quality score Q_edge:
Q_edge = W_C·Q_C + (1 − W_C)·Q_SD (10)
where W_C is computed with the constants C_3 = 0.55 and C_4 = 0.8.
(2) Computing the stereo image quality score
Since edge information is more important than color information in stereo image quality evaluation, the edge quality score is assigned the higher weight:
Q = W_edge·Q_edge + W_color·Q_color (11)
Experiments show that the algorithm performs best with the edge weight W_edge = 0.8 and the color weight W_color = 0.2.

Claims (1)

1. A stereo image quality evaluation method based on segmented stacked self-encoders, wherein, under an unsupervised condition, primary edge features f_S, f_D and f_C extracted from the sum, difference and monocular maps are input into three trained segmented stacked self-encoders S-SAE to obtain abstract deep edge features h_S, h_D and h_C; the primary color feature f_color of the color maps is then encoded with a stacked auto-encoder SAE to obtain the abstract deep color feature h_color; finally, the deep feature vectors of the stereo images are fitted to the corresponding MOS values, and the quality score of a stereo image under test is predicted from its deep feature vectors; the method comprises the following steps:
the first step: synthesizing the sum map S, difference map D and monocular map C of the left and right LoG maps
the image pair is filtered with a Laplacian of Gaussian LoG filter, i.e. input into an n × n Gaussian low-pass filter, to obtain the left and right LoG maps; the LoG parameters are set to (n, σ) ∈ {(3, 0.5), (7, 1), (13, 2)}, where σ is the standard deviation of the Laplacian of Gaussian operator, so left and right LoG maps of three edge thicknesses are obtained; then the sum, difference and monocular maps of each pair of left and right LoG maps are computed;
the second step: extracting the primary edge features f_S, f_D and f_C of the sum, difference and monocular maps
the MSCN coefficient histogram of the sum map is fitted with a generalized Gaussian distribution GGD model, and the variance and shape parameters of the GGD are taken as 2 features of the sum map; the MSCN neighborhood coefficient histograms in the 4 directions of the sum map (horizontal, vertical, main diagonal and secondary diagonal) are fitted with 4 asymmetric generalized Gaussian distribution AGGD models, and the 4 parameters of each AGGD model (mean, variance, shape and size) are taken as features of the sum map, so 16 features are extracted; in addition, the amplitude, variance and entropy of the sum map are taken as 3 further features; since there are left and right LoG maps of three edge thicknesses, there are sum maps of three edge thicknesses; following these steps, a 21-dimensional feature vector is extracted from the sum map at each edge thickness, so a 63-dimensional primary edge feature vector f_S is finally extracted from the sum maps; the feature extraction for the difference and monocular maps is the same as for the sum map, and 63-dimensional primary edge feature vectors f_D and f_C are extracted from them;
the third step: training 3 segmented stacked self-encoders S-SAE
50% of the image pairs in the image library are randomly selected for training the three segmented stacked self-encoders S-SAE: the primary edge features f_S, f_D and f_C extracted from the sum, difference and monocular maps are used as samples to train the three S-SAEs respectively;
each segmented stacked self-encoder S-SAE consists of 3 local stacked self-encoders SAE; the segmented stacked self-encoder divides its input f into three segments according to image edge thickness, the 21 features of each segment are input into one of the 3 local SAEs, and the 3 local SAEs are trained separately; training yields 3 local SAEs with three hidden layers each, the layer sizes of the 3 local SAEs being 21-18-14-12; finally, the output layers of the local SAEs are connected in series to obtain the segmented stacked self-encoder S-SAE-S of the sum map;
following the same steps, the segmented stacked self-encoder S-SAE-D of the difference map and the segmented stacked self-encoder S-SAE-C of the monocular map are trained; the layer sizes of the local SAEs of these two S-SAEs are also 21-18-14-12;
the fourth step: extracting the deep features h_S, h_D and h_C of the sum, difference and monocular maps
f_S, f_D and f_C are input into the trained S-SAE-S, S-SAE-D and S-SAE-C, and the three segmented stacked self-encoders encode f_S, f_D and f_C into the abstract deep edge features h_S, h_D and h_C respectively;
the fifth step: extracting the primary color feature vector f_color of the color maps
the three color maps of the left image (the RG, BY and Lum maps) and the three color maps of the right image (the RG, BY and Lum maps) are fitted with 6 AGGD models; the shape, left variance and right variance of each AGGD model are extracted, and the kurtosis and skewness of the 6 AGGD models are computed at the same time; a 30-dimensional primary color feature vector f_color is thereby extracted;
the sixth step: extracting the deep color feature vector h_color of the color maps
50% of the image pairs in the image library are randomly selected to train a stacked self-encoder SAE whose layer sizes are 30-25-20-15; f_color is input into the trained SAE, which encodes f_color into the abstract deep color feature vector h_color;
the seventh step: computing the local quality scores of the stereo image
80% of the image pairs in the image library are randomly selected as the training set; the h_S of the training-set image pairs and the corresponding MOS values are used to train the support vector regression machine SVR-S corresponding to h_S; the remaining 20% of the image pairs in the image library are used as the test set, and SVR-S predicts the quality score Q_S of the sum map; in the same way, the quality scores Q_D, Q_C and Q_color of the difference map, monocular map and color maps are obtained;
the eighth step: calculating the objective quality score of the stereo image
(1) computing the local quality score related to edge information with dynamic weights
the sum-map and difference-map quality scores are weighted to obtain the sum-difference quality score Q_SD:
Q_SD = W_D·Q_D + (1 − W_D)·Q_S
where W_D, the weight of the difference map, is computed from μ_L and μ_R, the expectations of L and R, σ_L and σ_R, their variances, and the constants C_1 = 0.6 and C_2 = 5;
the sum-difference quality score and the monocular quality score are combined to obtain the edge feature quality score Q_edge:
Q_edge = W_C·Q_C + (1 − W_C)·Q_SD
where W_C is computed with the constants C_3 = 0.55 and C_4 = 0.8;
(2) computing the stereo image quality score
the edge quality score is assigned the higher weight:
Q = W_edge·Q_edge + W_color·Q_color
where the edge weight W_edge = 0.8, the color weight W_color = 0.2, and Q_color is the quality score of the color maps.
CN201810444082.7A 2018-05-10 2018-05-10 Stereo image quality evaluation method based on segmented stack type self-encoder Active CN108648207B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810444082.7A CN108648207B (en) 2018-05-10 2018-05-10 Stereo image quality evaluation method based on segmented stack type self-encoder

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810444082.7A CN108648207B (en) 2018-05-10 2018-05-10 Stereo image quality evaluation method based on segmented stack type self-encoder

Publications (2)

Publication Number Publication Date
CN108648207A CN108648207A (en) 2018-10-12
CN108648207B true CN108648207B (en) 2021-07-09

Family

ID=63754463

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810444082.7A Active CN108648207B (en) 2018-05-10 2018-05-10 Stereo image quality evaluation method based on segmented stack type self-encoder

Country Status (1)

Country Link
CN (1) CN108648207B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102497565A (en) * 2011-12-05 2012-06-13 天津大学 Method for measuring brightness range influencing comfort degree of stereoscopic image
CN104408716A (en) * 2014-11-24 2015-03-11 宁波大学 Three-dimensional image quality objective evaluation method based on visual fidelity
CN104658001A (en) * 2015-03-10 2015-05-27 浙江科技学院 Non-reference asymmetric distorted stereo image objective quality assessment method
EP2790406A4 (en) * 2011-12-05 2015-07-22 Nippon Telegraph & Telephone Video quality evaluation device, method and program
CN107679543A (en) * 2017-02-22 2018-02-09 天津大学 Sparse autocoder and extreme learning machine stereo image quality evaluation method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102497565A (en) * 2011-12-05 2012-06-13 天津大学 Method for measuring brightness range influencing comfort degree of stereoscopic image
EP2790406A4 (en) * 2011-12-05 2015-07-22 Nippon Telegraph & Telephone Video quality evaluation device, method and program
CN104408716A (en) * 2014-11-24 2015-03-11 宁波大学 Three-dimensional image quality objective evaluation method based on visual fidelity
CN104658001A (en) * 2015-03-10 2015-05-27 浙江科技学院 Non-reference asymmetric distorted stereo image objective quality assessment method
CN104658001B (en) * 2015-03-10 2017-04-19 浙江科技学院 Non-reference asymmetric distorted stereo image objective quality assessment method
CN107679543A (en) * 2017-02-22 2018-02-09 天津大学 Sparse autocoder and extreme learning machine stereo image quality evaluation method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Quality assessment metric of stereo images considering cyclopean integration and visual saliency; Jiachen Yang et al.; Information Sciences; 2016-09-02; entire document *
Quality index for stereoscopic images by jointly evaluating cyclopean amplitude and cyclopean phase; Yancong Lin et al.; IEEE Journal of Selected Topics in Signal Processing; 2017-02-28; Vol. 11, No. 1; entire document *
Stereo image quality evaluation method using a deep extreme learning machine; Boyang Zhang et al.; Journal of Chinese Computer Systems (小型微型计算机系统); 2017-12-31; Vol. 38, No. 11; entire document *

Also Published As

Publication number Publication date
CN108648207A (en) 2018-10-12

Similar Documents

Publication Publication Date Title
CN108428227B (en) No-reference image quality evaluation method based on full convolution neural network
CN102170581B (en) Human-visual-system (HVS)-based structural similarity (SSIM) and characteristic matching three-dimensional image quality evaluation method
CN103996192B (en) Non-reference image quality evaluation method based on high-quality natural image statistical magnitude model
CN106447646A (en) Quality blind evaluation method for unmanned aerial vehicle image
CN108090902A (en) A kind of non-reference picture assessment method for encoding quality based on multiple dimensioned generation confrontation network
CN110060236B (en) Stereoscopic image quality evaluation method based on depth convolution neural network
CN107635136B (en) View-based access control model perception and binocular competition are without reference stereo image quality evaluation method
CN109831664B (en) Rapid compressed stereo video quality evaluation method based on deep learning
CN110033446A (en) Enhancing image quality evaluating method based on twin network
CN110378232B (en) Improved test room examinee position rapid detection method of SSD dual-network
CN107396095B (en) A kind of no reference three-dimensional image quality evaluation method
CN110400293B (en) No-reference image quality evaluation method based on deep forest classification
CN108389192A (en) Stereo-picture Comfort Evaluation method based on convolutional neural networks
CN105741328A (en) Shot image quality evaluation method based on visual perception
CN103841410B (en) Based on half reference video QoE objective evaluation method of image feature information
Geng et al. A stereoscopic image quality assessment model based on independent component analysis and binocular fusion property
CN109167996A (en) It is a kind of based on convolutional neural networks without reference stereo image quality evaluation method
CN104954778A (en) Objective stereo image quality assessment method based on perception feature set
CN109859166A (en) It is a kind of based on multiple row convolutional neural networks without ginseng 3D rendering method for evaluating quality
CN105894507B (en) Image quality evaluating method based on amount of image information natural scene statistical nature
CN102722888A (en) Stereoscopic image objective quality evaluation method based on physiological and psychological stereoscopic vision
CN109829905A (en) It is a kind of face beautification perceived quality without reference evaluation method
CN111882516B (en) Image quality evaluation method based on visual saliency and deep neural network
CN109754390A (en) A kind of non-reference picture quality appraisement method based on mixing visual signature
CN107909565A (en) Stereo-picture Comfort Evaluation method based on convolutional neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant