CN108470336B

CN108470336B - Stereo image quality evaluation method based on stack type automatic encoder

Info

Publication number: CN108470336B
Application number: CN201810272419.0A
Authority: CN
Inventors: 杨嘉琛; 赵洋
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2018-03-29
Filing date: 2018-03-29
Publication date: 2021-06-29
Anticipated expiration: 2038-03-29
Also published as: CN108470336A

Abstract

The invention relates to a three-dimensional image quality evaluation method based on a stack type automatic encoder, which comprises the following steps: synthesizing a monocular diagram by using the left view and the right view; will be leftThe right view is converted into a binocular fusion graph and a binocular difference graph which are mutually irrelevant; calculating a monocular diagram I_CMSCN coefficient of

And the monocular diagram is at the horizontal H, vertical V and main diagonal D₁And the secondary diagonal D₂MSCN neighborhood coefficients in four directions; extracting a primary feature vector from the monocular image; extracting a primary feature vector of the binocular fusion map; extracting features of the binocular disparity map according to the same method; training a stack type automatic encoder to obtain deep features; training a corresponding support vector regression machine; and obtaining the quality score of the stereo image. The invention improves the accuracy of the objective evaluation method.

Description

Stereo image quality evaluation method based on stack type automatic encoder

Technical Field

The invention belongs to the field of image processing, and relates to an objective evaluation method for quality of a non-reference stereo image.

Background

With the rapid development of multimedia technology in recent years, stereoscopic display technology is widely used. Meanwhile, compared with a plane image, the stereo image has stronger visual perception to people and brings more real audio-visual perception and telepresence experience to people, so that stereo image/video processing research is widely concerned by people. However, due to the influence of factors such as equipment and processing means, the image/video distortion is inevitably caused during the acquisition, compression, transmission and storage processes of the stereo image/video, and the quality of the stereo image/video is further influenced. Therefore, it is important to research an evaluation method capable of effectively evaluating the quality of a stereoscopic image. Although subjective quality evaluation is a reliable evaluation method, the subjective evaluation method is easily interfered by human and external environmental factors, the evaluation result is not stable enough, and a large amount of manpower and material resources are consumed. Compared with subjective evaluation, objective evaluation evaluates the quality of images in a software mode, does not need participants and a large number of subjective tests, is simple to operate, and is highly related to the subjective evaluation, so that the objective evaluation is more and more concerned by related researchers.

The current stereo image quality evaluation is mainly divided into three types of methods according to whether to refer to an original image. The first type is full-reference stereo image quality evaluation, the second type is partial-reference quality evaluation, and the two methods need to refer to an original image or make objective evaluation on the stereo image by partial information of the original image, so that the method has great limitation. The third type is no reference quality evaluation, and the method does not need to refer to an original image and is the method which is most suitable for practical situations. Due to inaccuracy of acquisition of depth information of the stereo image and insufficient consideration of binocular characteristics, evaluation of the quality of the stereo image is still a hotspot and a difficulty of current research.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and establish a non-reference stereo image quality evaluation method which is based on a stack type automatic encoder and fully considers the visual characteristics of human eyes. The invention extracts the basic characteristics which accord with the human eye three-dimensional perception from the monocular diagram, the binocular fusion diagram and the binocular difference diagram, and converts the characteristics into the deep level characteristics which accord with the human eye characteristics through the stacked self-encoder, thereby being capable of making more comprehensive and accurate objective evaluation on the quality of the three-dimensional image. The technical scheme is as follows:

a three-dimensional image quality evaluation method based on a stack type automatic encoder comprises the following steps:

the first step is as follows: monocular picture I synthesized by using left and right views_C；

The second step is that: converting the left view and the right view into a binocular fusion graph and a binocular difference graph which are mutually irrelevant;

the third step: extracting primary feature vectors of a monocular diagram

(1) Calculating a monocular diagram I_CMSCN coefficient of

And the monocular diagram is at the horizontal H, vertical V and main diagonal D₁And the secondary diagonal D₂MSCN neighborhood coefficients in four directions;

(2) extracting MSCN neighborhood coefficients of the monocular image in 4 directions of horizontal, vertical, main diagonal and secondary diagonal, and defining as follows:

(3) fitting the MSCN coefficient histogram of the monocular diagram and MSCN neighborhood coefficient histograms of the monocular diagram in four directions of horizontal, vertical, main diagonal and secondary diagonal by using an asymmetric generalized Gaussian distribution model (AGGD); taking the mean value, the variance, the shape parameter and the size parameter of the 5 AGGD models as the characteristics of the monocular diagram, and extracting 20 characteristics; fitting the gradient amplitude of the monocular diagram by using the Weber distribution, and extracting the shape parameters and the size parameters of the Weber distribution as 2 characteristics of the monocular diagram; furthermore, a monocular image is subjected to DCT transformation, 5 features of block, sharpness, smoothness, kurtosis and Jansen Shannon divergence JSD in a DCT domain are extracted, and a 27-dimensional primary feature vector P is extracted from the monocular image_C ⁽ⁱ⁾；

The fourth step: extracting primary feature vectors of binocular fusion map

(1) Performing joint adaptive normalization JAN on a gradient magnitude graph GM and a Laplace transform graph LOG of a binocular fusion graph, quantizing the normalized gradient magnitude graph and the Laplace transform graph into an M level and an N level respectively, and calculating the marginal probability density P of the two quantized graphs_GM、P_LOGAnd independently distributed Q_GM、Q_LOGHere, M ═ N ═ 5, so 20 features are extracted;

(2) performing wavelet decomposition on the binocular fusion map in 6 directions of 2-level size, theta epsilon {0 degrees, 30 degrees, 60 degrees, 90 degrees, 120 degrees and 150 degrees }; respectively fitting 6 directions by using 6 generalized Gaussian distribution GGDs (generalized Gaussian distribution) with wavelet coefficients of 2-level sizes in each direction, and taking shape parameters of the GGDs as the characteristics of a binocular fusion graph to obtain 6 characteristics; fitting wavelet coefficients of all sizes and all directions by using a GGD, and extracting shape parameters of the GGD to serve as 1 characteristic of a binocular fusion graph; extracting 27-dimensional feature vector from common binocular fusion image

The fifth step: extracting features from the binocular difference image according to the same method, and extracting 27-dimensional primary feature vector from the binocular difference image

And a sixth step: training stack type automatic encoder

Selecting training data, left-right view

And

respectively training a stacked automatic encoder SAE-C of a single-eye diagram, a stacked automatic encoder SAE-S of a binocular fusion diagram and a stacked automatic encoder SAE-D of a binocular difference diagram as samples;

the seventh step: three stacked self-encoders will

And

encoding deep features as abstractions, respectively

And

wherein the number of units of each layer of SAE-C, SAE-S and SAE-D is 27-25-20-15;

eighth step: building training set and test set, using left and right views in training set

With corresponding MOS training

A corresponding support vector regression (SVR-C); in the same way, utilize

With corresponding MOS training

A corresponding support vector regression (SVR-SD);

the ninth step: for the test set, the quality scores of the monocular map, the binocular fusion map/binocular difference map are predicted using SVR-C, SVR-SD, and then the stereoscopic image quality scores are formed by weighting.

The method for evaluating the objective quality of the stereo image provided by the invention is based on a binocular inhibition and depth perception mechanism, and establishes a non-reference stereo video quality objective evaluation model by using a stack type automatic encoder. The obtained objective evaluation result and the subjective evaluation result of the quality of the three-dimensional image have high consistency, and the quality of the three-dimensional image can be reflected more accurately.

Drawings

Fig. 1 is a flowchart of a method for evaluating quality of a stereoscopic image based on a stacked automatic encoder.

Detailed Description

The invention provides a method for evaluating objective quality of a non-reference stereo image, which is provided on the basis of a stereo image perception theory and a sum-difference channel theory. In order to extract effective stereo perception features, the invention respectively extracts a monocular diagram (I)_C) Binocular fusion picture (I)_S) And binocular disparity map (I)_D) Primary eigenvectors of the three stereo images

Then, under an unsupervised condition, three Stacked Auto-encoders (SAE) are trained by using the primary features of the three stereo images; the three trained SAEs are then used to encode the primary feature vector of the stereo image, which is converted into a deep feature vector more suitable for stereo image quality assessment

The quality score of the deep characteristic vector of the stereo image to be measured is predicted by fitting the deep characteristic vector of the stereo image and a subjective quality score (MOS) corresponding to the stereo image. The deep feature vectors can reflect the distortion degree of the stereo image, so that the quality evaluation is carried out on the distorted stereo image, and the method comprises the following steps:

the first step is as follows: monocular rendering using left and right views

Combining binocular features, fusing left and right views according to binocular competition characteristics to obtain a monocular diagram, wherein the monocular diagram is defined as:

I_C(i,j)＝W_L(i,j)I_L(i,j)+W_R((i+d),j)I_R((i+d),j) (1)

wherein I_LAnd I_RRespectively representing a left view and a right view, d is parallax, W_LAnd W_RThe weights of the left view and the right view are obtained by normalizing Gabor filter energy response assignment and are defined as:

wherein is GE_LAnd GE_RRepresenting the energy response values of the left and right views in all dimensions and directions, respectively.

The second step is that: converting left and right views into unrelated binocular fusion image I_SAnd binocular disparity map I_DDefined as:

I_S＝|I_L-I_R| (5)

the third step: extracting primary feature vectors of a monocular diagram

Calculating Mean Subtracted Contrast Normalized (MSCN) coefficients for the monocular image, and the monocular image is in horizontal (H), vertical (V), main diagonal (D)₁) And secondary diagonal (D)₂) The MSCN neighborhood coefficients in 4 directions are calculated by the following method:

(1) let I_CHas a size of BxD, MSCN coefficient

Comprises the following steps:

in the formula (I), the compound is shown in the specification,

i is 1,2, …, B, j is 1,2, …, D, γ is constant, the denominator part is added with a constant in order to avoid instability when the denominator approaches zero in image flat areas, ω is { ω ═ ω { (ω) } ω_k,l-K, …, K; l ═ L, …, L } is a two-dimensional circularly symmetric gaussian weighting function;

(2) MSCN neighborhood coefficients of the monocular diagram in 4 directions of horizontal, vertical, main diagonal and secondary diagonal are defined as:

then, fitting the MSCN coefficient histogram of the monocular diagram and the MSCN neighborhood coefficient histograms of 4 directions of the horizontal, vertical, main diagonal and secondary diagonal of the monocular diagram by using an Asymmetric Generalized Gaussian Distribution (AGGD) model; taking the mean value, the variance, the shape parameter and the size parameter of the 5 AGGD models as the characteristics of the monocular diagram, and extracting 20 characteristics; fitting the gradient amplitude of the monocular diagram by using the Weber distribution, and extracting the shape parameters and the size parameters of the Weber distribution as 2 characteristics of the monocular diagram; furthermore, the monocular image was DCT-transformed, and 5 features of block, sharpness, smoothness, kurtosis, and Jensen Shannon Divergence (JSD) in the DCT domain were extracted.

According to the steps, 27-dimensional primary feature vectors are extracted from the monocular image

The fourth step: extracting primary feature vectors of binocular fusion map

Performing combined adaptive normalization (JAN) on a gradient magnitude Graph (GM) and a Laplace transform graph (LOG) of the binocular fusion graph, respectively quantizing the normalized gradient magnitude graph and the Laplace transform graph into an M level and an N level, and calculating edge probability densities (P) of the two types of graphs_GM、P_LOG) And independent distribution (Q)_GM、Q_LOG). Here, M is 5, so 20 features are extracted. The specific calculation method is as follows:

(1) GM graph of binocular fusion graph (G)_I) And LOG transformation graph (L)_I) The joint normalization calculation method comprises the following steps:

(2) calculating the local adaptive normalization factor of each (i, j) pixel by the following method:

wherein omega_i,jIs the local center window of the (i, j) pixel, and ω (l, k) is the weight, satisfying

For G_I、L_INormalization is carried out

C ═ 0.2 is a constant for ensuring stability.

(3) Will be provided with

Quantization is M level and N level. Calculating the edge probability density P of GM and LOG_GMAnd P_LOG：

Wherein, K_m,n＝P(G＝gm,L＝l_n) Where M is 1,2, …, M, N is 1,2, …, N, where M is 5, 10 features are extracted.

(4) Calculating independent distribution function Q of GM and LOG_GMAnd Q_LOG：

Where M is 1,2, …, M, N is 1,2, …, N, M is 5, so 10 features are extracted.

In addition, wavelet decomposition is carried out on the binocular fusion map in 2-level sizes (alpha belongs to {1, 2}), 6 directions (theta belongs to {0 degrees, 30 degrees, 60 degrees, 90 degrees, 120 degrees, 150 degrees) } by utilizing the controllable pyramid; respectively fitting wavelet coefficients in 6 directions (each direction has 2-level size) by using 6 Generalized Gaussian Distributions (GGDs), and taking shape parameters of the GGDs as the characteristics of a binocular fusion diagram to obtain 6 characteristics; and fitting the wavelet coefficients of all sizes and all directions by using a GGD, and extracting shape parameters of the GGD to serve as 1 characteristic of the binocular fusion image.

According to the steps, 27-dimensional feature vectors are extracted from the binocular fusion map

The fifth step: the method for extracting the features of the binocular difference image is completely the same as that of the binocular fusion image, and 27-dimensional primary feature vectors are extracted from the binocular difference image by referring to the fourth step

And a sixth step: training stack type automatic encoder

Randomly select left and right views of 50% in the image library to train 3 SAEs. Will be viewed from left to right

And

as samples, a stacked automatic encoder (SAE-C) of a single-eye diagram, a stacked automatic encoder (SAE-S) of a double-eye fusion diagram and a stacked automatic encoder (SAE-D) of a double-eye difference diagram are trained respectively, the three SAE training methods are the same, and the procedures are as follows:

(1) the feature vector x is input into a stacked self-encoder (SAE) as input layer data, and is encoded by a first layer self-encoder in SAE (the encoding function is

) Mapping x to the first hidden layer h₁Then using a decoder (decoding function is

) And h₁Reconstructing the input layer to obtain a reconstructed vector

Training the first hidden layer h of an auto-encoder by minimizing the reconstruction error₁When the reconstruction error tends to be stable, the first layer coding function is obtained

And h₁Removing the reconstruction layer;

(2) h is to be₁Repeating the step (1) as the input of the second layer automatic encoder to obtain the second layer encoding function

And a second hidden layer h₂(ii) a After the training layer by layer, the trained stack type self-encoder with 3 hidden layers is finally obtained.

The seventh step: the trained SAE-C, SAE-S and SAE-D will

And

encoding deep features as abstractions, respectively

And

wherein the number of units of each layer of SAE-C, SAE-S and SAE-D is 27-25-20-15.

Eighth step: randomly selecting 80% of left and right views in the image library as a training set, and utilizing the left and right views in the training set

Training with corresponding subjective evaluation values (MOS)

A corresponding support vector regression machine (SVR-C); use in the same way

With corresponding MOS training

Corresponding support vector regression machine (SVR-SD).

The ninth step: and (3) taking left and right views of the left and right 20 percent of the images in the image library as a test set, and predicting local mass fractions of the monocular image, the binocular fusion image and the binocular difference image by utilizing SVR-C, SVR-SD (singular value decomposition-singular

The stereo image quality score Q is then formed by weighting⁽ⁱ⁾The weight is W, and the calculation formula is as follows:

Claims

1. a three-dimensional image quality evaluation method based on a stack type automatic encoder comprises the following steps:

the third step: extracting primary feature vectors of a monocular diagram

(1) Calculating a monocular diagram I_CMSCN coefficient of

(3) fitting the MSCN coefficient histogram of the monocular diagram and MSCN neighborhood coefficient histograms of the monocular diagram in four directions of horizontal, vertical, main diagonal and secondary diagonal by using an asymmetric generalized Gaussian distribution model (AGGD); taking the mean value, the variance, the shape parameter and the size parameter of the 5 AGGD models as the characteristics of the monocular diagram, and extracting 20 characteristics; fitting the gradient amplitude of the monocular diagram by using the Weber distribution, and extracting the shape parameters and the size parameters of the Weber distribution as 2 characteristics of the monocular diagram; furthermore, a monocular image is subjected to DCT transformation, 5 features of block, sharpness, smoothness, kurtosis and Jansen Shannon divergence JSD in a DCT domain are extracted, and a 27-dimensional primary feature vector is extracted from the monocular image

The fourth step: extracting primary feature vectors of binocular fusion map

(1) Performing joint adaptive normalization JAN on a gradient magnitude graph GM and a Laplace transform graph LOG of a binocular fusion graph, quantizing the normalized gradient magnitude graph and the Laplace transform graph into an M level and an N level respectively, and calculating the marginal probability density P of the two quantized graphs_GM、P_LOGAnd independently distributed Q_GM、Q_LOGWhere M is equal to N is equal to 5, soExtracting 20 features;

(2) performing wavelet decomposition on the binocular fusion map in 6 directions of 2-level size, theta, epsilon, 0 degrees, 30 degrees, 60 degrees, 90 degrees, 120 degrees and 150 degrees; respectively fitting 6 directions by using 6 generalized Gaussian distribution GGDs (generalized Gaussian distribution) with wavelet coefficients of 2-level sizes in each direction, and taking shape parameters of the GGDs as the characteristics of a binocular fusion graph to obtain 6 characteristics; fitting wavelet coefficients of all sizes and all directions by using a GGD, and extracting shape parameters of the GGD to serve as 1 characteristic of a binocular fusion graph; extracting 27-dimensional feature vector from common binocular fusion image