CN110351548B - Stereo image quality evaluation method guided by deep learning and disparity map weighting

Info

Publication number: CN110351548B
Application number: CN201910568557.8A
Authority: CN
Inventors: 李素梅; 韩永甜; 丁义修
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2019-06-27
Filing date: 2019-06-27
Publication date: 2020-12-11
Anticipated expiration: 2039-06-27
Also published as: CN110351548A

Abstract

The invention discloses a stereo image quality evaluation method guided by deep learning and disparity map weighting, which comprises the following steps: S1, constructing a double-branch neural network from the independent left and right viewpoint images of a stereo image, wherein the double-branch neural network comprises a fused image branch and a disparity map branch; S2, extracting image feature information from the fused image branch and the disparity map branch respectively; S3, introducing an SE module for the first time to perform a weighted calculation on the image features of the disparity map branch and the fused image branch, thereby correcting the image features in the fused image branch; and further steps. The method predicts quality more accurately and improves the efficiency of stereo image quality evaluation.

Description

Stereo image quality evaluation method guided by deep learning and disparity map weighting

Technical Field

The invention belongs to the field of image processing and relates to the application of deep learning to stereo image quality evaluation; in particular, it relates to a stereo image quality evaluation method guided by deep learning and disparity map weighting.

Background

In recent years, with the development of 3D technology, the study of stereoscopic images has attracted increasing attention. A stereo image may be distorted during transmission, which degrades its quality and directly affects how viewers perceive it. How to evaluate the quality of a stereoscopic image effectively has therefore become one of the key issues in stereoscopic image processing and computer vision. In view of this, the invention provides a stereo image quality evaluation model guided by deep learning and disparity map weighting.

Existing stereo image quality evaluation algorithms can be classified into three types according to their degree of dependence on a reference image: full-reference, reduced-reference, and no-reference. A full-reference algorithm predicts the quality of a distorted image from structural similarity or other indices computed between the reference image and the distorted image. A reduced-reference algorithm does not need the complete pixel-level information of the reference image and therefore depends on it less. A no-reference algorithm predicts the quality score without any information from a reference image. In practical applications, a distortion-free reference image is generally difficult to obtain, so research on no-reference stereo image quality evaluation algorithms has attracted more attention.

Generally, no-reference stereo image quality evaluation methods can be divided into three categories: feature extraction methods [1-2], sparse representation methods [3-4], and deep learning methods [5-8]. Feature extraction methods usually extract hand-crafted statistical features from the stereo image and then predict the quality score with a machine learning algorithm. Sparse representation methods generally build a dictionary to represent the statistical features sparsely, which gives them certain advantages in computational complexity. Both kinds of method rely on hand-designed algorithms to extract the features of the stereo image, and because the human visual system and natural statistical characteristics are not yet sufficiently understood, their application is limited to some extent. With the rapid development of artificial intelligence, deep learning methods have appeared in the field of stereo image quality evaluation in recent years. Because these methods extract the features of stereo images with a neural network rather than with a traditional hand-designed procedure, they avoid the limitations of manual feature extraction and generally perform better.

The design of the invention is inspired by the human binocular vision mechanism, namely the binocular fusion and binocular rivalry mechanisms in the brain. A fused image correlates more closely with the binocular vision mechanism than the independent left and right viewpoint images do, so the fused image is selected as the input of one branch of the network. Some information is lost when the left and right viewpoint images are fused, so the disparity map is selected to compensate for the fused image; that is, the disparity map is used as the input of the other network branch. In addition, the features extracted from the fused image by the convolutional neural network differ in importance, so the extracted features need to be weighted with different weights. We therefore apply an improved squeeze-and-excitation module (SE module) to improve the representation capability of the network: the disparity map serves as an input of the SE module to guide and weight the feature map obtained from the fused image branch, thereby re-correcting the feature map of the fused image. Since the fused image branch and the disparity map branch both contribute to image quality prediction, the two branches are finally connected to obtain the final prediction score.

The invention provides a stereo image quality evaluation model guided by deep learning and disparity map weighting. First, in accordance with how humans view a stereo image, the independent binocular viewpoint images are fused to obtain a fused image, and a disparity matching algorithm is applied to obtain a disparity map; the fused image and the disparity map serve as the inputs of the two branches of a neural network, and features are learned by a convolutional neural network. Second, because the features of the fused image differ in importance, the features extracted from the disparity map are used as the input of the improved SE module to re-correct the feature map of the fused image.

Disclosure of Invention

In order to solve the problems in the prior art, the invention aims to establish an effective and reasonable stereo image quality evaluation model guided by deep learning and disparity map weighting, taking the human binocular vision mechanism as its design basis and exploiting the fact that the features extracted by a neural network differ in importance. The stereo image quality evaluation model predicts quality more accurately, does not depend on an original reference image, can replace subjective evaluation to a certain extent, improves the efficiency of stereo image quality evaluation, and can lay a foundation for subsequent work.

To address the problems in the prior art, the invention adopts the following technical scheme:

A stereo image quality evaluation method guided by deep learning and disparity map weighting comprises the following steps:

S1, constructing a double-branch neural network from the independent left and right viewpoint images of the stereo image, wherein the double-branch neural network comprises a fused image branch and a disparity map branch;

S2, extracting the image features of the fused image branch and the disparity map branch respectively in a first stage;

S3, introducing an SE module for the first time to perform a weighted calculation on the image features of the disparity map branch and the fused image branch, thereby correcting the image features in the fused image branch;

S4, further extracting features from the first-stage features of the disparity map branch and from the corrected features of the fused image branch, thereby completing the second-stage feature extraction;

S5, introducing an SE module for the second time to perform a weighted calculation on the second-stage features of the disparity map branch and of the corrected fused image branch, thereby completing the second-stage correction;

S6, connecting the features finally extracted by the two branches to complete the quality evaluation of the stereo image.

The weighted correction of the fused image feature map in steps S3 and S5 is realized by a modified SE module. The modification keeps the structure of the original SE module and introduces a new input: the feature map of the disparity map branch is used as an additional input of the modified SE module to guide the weight learning for the feature map of the fused image branch.
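The data flow of steps S2 to S6 can be sketched as follows. This is only an illustrative skeleton: the function name `evaluate_quality`, its callable parameters, and the toy stand-in layers are hypothetical placeholders for the network stages and are not taken from the disclosed implementation.

```python
from typing import Callable, List

Features = List[float]  # toy stand-in for a feature map


def evaluate_quality(
    fused_image: Features,
    disparity_map: Features,
    stage1_fused: Callable[[Features], Features],
    stage1_disp: Callable[[Features], Features],
    stage2_fused: Callable[[Features], Features],
    stage2_disp: Callable[[Features], Features],
    se_first: Callable[[Features, Features], Features],
    se_second: Callable[[Features, Features], Features],
    head: Callable[[Features], float],
) -> float:
    # S2: first-stage feature extraction in each branch
    f1 = stage1_fused(fused_image)
    d1 = stage1_disp(disparity_map)
    # S3: first SE module, disparity features weight the fused features
    f1 = se_first(f1, d1)
    # S4: second-stage feature extraction in each branch
    f2 = stage2_fused(f1)
    d2 = stage2_disp(d1)
    # S5: second SE module
    f2 = se_second(f2, d2)
    # S6: concatenate both branches and predict the quality score
    return head(f2 + d2)


# Toy usage with hypothetical stand-ins for the real layers.
add_one = lambda feats: [v + 1.0 for v in feats]
weight_by = lambda fused, disp: [a * b for a, b in zip(fused, disp)]
score = evaluate_quality(
    [1.0, 2.0], [0.5, 0.5],
    add_one, add_one, add_one, add_one,
    weight_by, weight_by, sum,
)
print(score)
```

The point of the skeleton is the ordering: the disparity map branch is never modified by the SE modules, while the fused image branch is corrected after each extraction stage.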

Advantageous effects

The double-branch dense convolutional neural network with the improved SE module is designed on the basis of the binocular vision mechanism. It takes into account that the features extracted by a convolutional neural network differ in importance and weights the different features effectively. Experimental results show that the proposed method performs excellently in stereo image quality evaluation.

The stereo image quality evaluation model guided by deep learning and disparity map weighting was tested on a public stereo image database. The predicted quality scores are very close to the standard subjective evaluation values, and the correlation and stability are superior to those of most current stereo image quality evaluation algorithms.

Drawings

FIG. 1 is the overall framework of the network used in the present invention;

FIG. 2 is a block diagram of the SE module of the present invention;

FIG. 3 is a block diagram of a 3-level dense module of the present invention.

Detailed Description

The invention was tested on a public stereo image database (LIVE). The LIVE stereo image database comprises two separate databases, Phase I and Phase II. Each stereo image is presented as a pair of planar left and right viewpoint images of size 360 × 640. Phase I comprises 20 reference image pairs and 365 distorted image pairs, and its images are mainly symmetrically distorted, that is, the distortion degrees of the left and right viewpoint images are approximately equal. Phase II comprises 8 reference image pairs and 360 distorted image pairs and contains both symmetrically and asymmetrically distorted images; in an asymmetrically distorted image, the distortion degrees of the left and right viewpoint images differ considerably. The LIVE database contains five distortion types: Gaussian blur, JPEG 2000 (JP2K) compression distortion, JPEG compression distortion, Rayleigh fast fading, and additive white Gaussian noise.

The method is described in detail below with reference to the technical scheme.

The invention provides a stereo image quality evaluation model guided by deep learning and disparity map weighting. Its design is based on the human binocular vision mechanism, namely the binocular fusion and binocular rivalry mechanisms involved in the brain's perception of stereo images, and on the fact that the features extracted by a neural network differ in importance. First, a fused image and a disparity map are obtained from the independent left and right viewpoint images through specific algorithms, and a basic double-branch neural network framework is constructed. Then an improved SE module is added, so that the features extracted by the disparity map branch network guide the weighting of the features extracted by the fused image branch network, which makes the training of the fused image branch network more efficient. Finally, the two branch networks are connected to complete the final prediction of stereo image quality. The specific flow is shown in FIG. 1.

The method comprises the following specific steps:

1. Double-branch neural network architecture:

The double-branch neural network architecture adopted by the invention takes the fused image and the disparity map as the inputs of the two branch networks respectively; both are obtained from the left and right viewpoint images of the same stereo image through specific algorithms. The fused image is obtained with a binocular fusion model, which accounts for binocular rivalry, binocular fusion, and the multi-channel characteristics of vision. The disparity map is obtained with a stereo matching algorithm. In addition, the network is built from three-layer dense connection modules, which strengthen the propagation of features to later layers and promote feature reuse. As shown in FIG. 1, each of the two branch networks includes two convolution modules and two three-layer dense connection modules, where one convolution module includes a batch normalization (BN) layer, a convolution layer, a ReLU activation function, and a pooling layer, and one three-layer dense connection module includes two convolution layers. In both branches, the first convolution module and the first three-layer dense connection module perform the first-stage feature extraction for the fused image and the disparity map, and the second convolution module and the second three-layer dense connection module perform the second-stage feature extraction.
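How a dense connection module promotes feature reuse can be illustrated with a small sketch. The text does not spell out the exact connectivity, so the sketch assumes DenseNet-style concatenation, in which every layer receives all earlier feature channels; `dense_block` and `toy_layer` are hypothetical names, and `toy_layer` is only a stand-in for a real convolution layer.

```python
from typing import Callable, List

Channel = List[float]  # one flattened single-channel feature map
Layer = Callable[[List[Channel]], List[Channel]]


def dense_block(inputs: List[Channel], layers: List[Layer]) -> List[Channel]:
    """Apply layers with dense connectivity (assumed DenseNet-style)."""
    features = list(inputs)
    for layer in layers:
        # Each layer sees every channel produced so far; its output is
        # appended along the channel axis rather than replacing the input.
        features = features + layer(features)
    return features


def toy_layer(growth: int) -> Layer:
    """Hypothetical stand-in for a convolution emitting `growth` channels."""
    def layer(features: List[Channel]) -> List[Channel]:
        n = len(features)
        mean = [sum(vals) / n for vals in zip(*features)]
        return [[(k + 1) * v for v in mean] for k in range(growth)]
    return layer


x = [[1.0, 2.0], [3.0, 4.0]]  # 2 input channels, 2 pixels each
out = dense_block(x, [toy_layer(2), toy_layer(2)])
print(len(out))  # number of channels after the block
```

Because the input channels survive unchanged at the front of the output, later layers can reuse them directly, which is the reuse property the architecture description refers to.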

2. Re-correction of the fused image feature map by the disparity map features:

Because the features extracted by the neural network differ in importance, an SE module is chosen to weight the different features of the image. In the invention, the SE module is introduced twice: the first time after the fused image and the disparity map complete the first-stage feature extraction, and the second time after the two branch networks complete the second-stage feature extraction. The original SE module structure is shown in FIG. 2(a). Instead of using the fused image feature map to recalibrate itself, as the original SE module does, we improve the original SE module; the specific structure is shown in FIG. 2(b). The features extracted by the disparity map branch network are used as one input of the SE module. The disparity feature map is compressed to a spatial size of 1 × 1 through global pooling and then passed through two fully connected layers: the first reduces the channel dimension, the second restores it, and a ReLU activation function between the two layers provides a nonlinear mapping. This structure captures the complex correlations among the disparity map feature channels. A Sigmoid function then yields weights in the range (0, 1), which guide and weight the fused image features of the other branch, thereby re-correcting the fused image feature map. The operation in the blue dotted box is called the SE channel. It comprises a global pooling operation, expressed by formula (1), a fully connected layer with a reduction factor r, a ReLU unit, and a fully connected layer with an amplification factor r. Finally, a Sigmoid function generates the weights between 0 and 1 that are applied to the feature map of the fused image.

$$z = \frac{1}{H \times W} \sum_{x=1}^{H} \sum_{y=1}^{W} f(x, y) \tag{1}$$

where H × W is the size of the feature map, and f(x, y) is the value at the coordinates (x, y) in the feature map.
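The disparity-guided SE module described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the disclosed implementation: the global pooling is assumed to be average pooling, both branches are assumed to have the same number of channels C, and the function names, the random weights, and the sizes C, H, W and r are hypothetical.

```python
import math
import random
from typing import List, Tuple

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][column]


def global_avg_pool(fmap: FeatureMap) -> List[float]:
    """Squeeze each H x W channel to one scalar (assumed average pooling)."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def dense(vec: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer; `weights` has shape [out][in]."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def disparity_guided_se(fused: FeatureMap, disparity: FeatureMap,
                        w1, b1, w2, b2) -> Tuple[FeatureMap, List[float]]:
    z = global_avg_pool(disparity)                      # C values from the disparity branch
    s = [max(0.0, v) for v in dense(z, w1, b1)]         # reduce to C // r, then ReLU
    e = dense(s, w2, b2)                                # restore to C
    weights = [1.0 / (1.0 + math.exp(-v)) for v in e]   # Sigmoid, each in (0, 1)
    # Channel-wise re-weighting of the fused image features
    scaled = [[[weights[c] * v for v in row] for row in fused[c]]
              for c in range(len(fused))]
    return scaled, weights


random.seed(0)
C, H, W, r = 4, 2, 3, 2  # hypothetical sizes
fused = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
disp = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // r)]
b1 = [0.0] * (C // r)
w2 = [[random.uniform(-1, 1) for _ in range(C // r)] for _ in range(C)]
b2 = [0.0] * C
out, weights = disparity_guided_se(fused, disp, w1, b1, w2, b2)
print(weights)
```

The only difference from a conventional SE module in this sketch is where the weights come from: they are computed from `disp` but multiplied into `fused`.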

3. Final prediction of stereo image scores:

The fused image branch network and the disparity map branch network each learn features of the stereo image, and both contribute to quality prediction. The disparity map branch network compensates for the fused image branch network, and combining the two makes the predicted quality score more reliable. Therefore, at the end of the neural network, the fused image branch network and the disparity map branch network are joined by channel-wise concatenation ('Concat'), which completes the compensation of the fused image by the disparity map. The final quality score is then predicted by a fully connected module, which is structurally similar to the convolution module except that a fully connected layer replaces the convolution layer. We use the Euclidean loss as the loss function of the network.

When the network is trained, the loss function is minimized through the back-propagation algorithm to obtain the optimal network parameters.
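The concatenation and the Euclidean loss can be illustrated with a short sketch. The exact normalization of the loss is not recoverable from this text, so the 1/(2N) factor below is an assumption, as are the function names and the toy scores.

```python
from typing import List


def concat_channels(fused_feats: List[float], disp_feats: List[float]) -> List[float]:
    """'Concat': join the final features of the two branches."""
    return list(fused_feats) + list(disp_feats)


def euclidean_loss(predicted: List[float], subjective: List[float]) -> float:
    """Squared-error loss; the 1/(2N) normalization is an assumption."""
    n = len(predicted)
    return sum((p - s) ** 2 for p, s in zip(predicted, subjective)) / (2 * n)


def euclidean_loss_grad(predicted: List[float], subjective: List[float]) -> List[float]:
    """Gradient of the loss above with respect to each prediction."""
    n = len(predicted)
    return [(p - s) / n for p, s in zip(predicted, subjective)]


joined = concat_channels([0.2, 0.7], [0.1])
pred = [30.0, 50.0]   # hypothetical predicted quality scores
subj = [32.0, 46.0]   # hypothetical subjective scores
loss = euclidean_loss(pred, subj)
grad = euclidean_loss_grad(pred, subj)
# One gradient-descent step on the predictions themselves, for illustration only
step = [p - 0.5 * g for p, g in zip(pred, grad)]
print(loss, euclidean_loss(step, subj))
```

In the actual network, back-propagation carries this gradient through the fully connected module and both branches to update their parameters; the step on the raw predictions here only shows that moving against the gradient lowers the loss.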

4. Stereo image quality evaluation results and analysis

The experiments of the present invention were performed on a public stereo image database (LIVE). The LIVE stereo image database comprises two separate databases, Phase I and Phase II. Each stereo image is presented as a pair of planar left and right viewpoint images of size 360 × 640. Phase I comprises 20 reference image pairs and 365 distorted image pairs, and its images are mainly symmetrically distorted, that is, the distortion degrees of the left and right viewpoint images are approximately equal. Phase II comprises 8 reference image pairs and 360 distorted image pairs and contains both symmetrically and asymmetrically distorted images; in an asymmetrically distorted image, the distortion degrees of the left and right viewpoint images differ considerably. The LIVE stereo image database contains five distortion types: Gaussian blur, JPEG 2000 (JP2K) compression distortion, JPEG compression distortion, Rayleigh fast fading, and additive white Gaussian noise.

The method of the invention was verified experimentally on the LIVE stereo image database. Table 1 shows the experimental results of the invention together with those of 12 other existing stereo image quality evaluation algorithms with good performance.

TABLE 1 Performance on LIVE database

Table 2 lists the results of three evaluation indices under different distortion types. The proposed method performs excellently on Phase I and, although it does not achieve the best performance on Phase II, it still outperforms some of the algorithms. The method can therefore adapt to stereo images with different distortion types and predict quality scores accurately and efficiently.
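The three evaluation indices are not named in this text. As a sketch only, the code below assumes they are the indices commonly used in image quality assessment: the Pearson linear correlation coefficient (PLCC), the Spearman rank-order correlation coefficient (SROCC), and the root mean square error (RMSE). The function names and the sample scores are hypothetical.

```python
import math
from typing import List


def plcc(x: List[float], y: List[float]) -> float:
    """Pearson linear correlation between predicted and subjective scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def srocc(x: List[float], y: List[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return plcc(ranks(x), ranks(y))


def rmse(x: List[float], y: List[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


predicted = [1.0, 2.0, 3.0, 4.0]    # hypothetical model outputs
subjective = [1.5, 2.5, 2.9, 4.2]   # hypothetical subjective scores
print(plcc(predicted, subjective), srocc(predicted, subjective), rmse(predicted, subjective))
```

Under this assumption, PLCC and SROCC closer to 1 and RMSE closer to 0 indicate predictions that agree better with subjective evaluation.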

TABLE 2 Performance for different distortion types on the LIVE database

To further demonstrate the superiority of the proposed method, corresponding comparison experiments were carried out; the results are shown in Table 3. The first variant uses only the fused image branch network, with the fused image feature map recalibrated by the fused image feature map itself. The second variant adds the disparity map branch network to the first, but the disparity map features do not guide the fused image feature map and are only connected at the end of the network. In the third variant, the disparity map features only take part in re-correcting the feature map of the fused image branch network and are not concatenated with the fused image branch network. The experimental results in Table 3 show that the proposed stereo image quality evaluation model guided by deep learning and disparity map weighting achieves superior performance.

TABLE 3 comparative experimental results

It should be noted that, for those skilled in the art, without departing from the spirit of the present invention, several variations and modifications can be made, which are within the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the appended claims.

Claims

1. A stereoscopic image quality evaluation method based on deep learning and disparity map weighting guidance is characterized by comprising the following steps:

S6, connecting the features finally extracted by the two branches to complete the quality evaluation of the stereo image; wherein:

the weighted correction of the fused image feature map in steps S3 and S5 is realized by a modified SE module; on the basis of the structure of the original SE module, the features extracted by the disparity map branch are used as an additional input of the module and are converted into learnable weights, and the image features in the fused image branch undergo first-stage weighted correction by introducing the improved SE module for the first time.