CN110351548B - Stereo image quality evaluation method guided by deep learning and disparity map weighting - Google Patents

Info

Publication number
CN110351548B
Authority
CN
China
Prior art keywords
image
branch
features
module
fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910568557.8A
Other languages
Chinese (zh)
Other versions
CN110351548A (en)
Inventor
李素梅
韩永甜
丁义修
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN201910568557.8A
Publication of CN110351548A
Application granted
Publication of CN110351548B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106 Processing image signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N17/00 Diagnosis, testing or measuring for television systems or their details
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N2013/0074 Stereoscopic image analysis

Abstract

The invention discloses a stereo image quality evaluation method based on deep learning and disparity map weighting guidance, comprising the following steps: S1, constructing a dual-branch neural network from the independent left and right viewpoint images of a stereo image, the network comprising a fused image branch and a disparity map branch; S2, extracting image feature information in the fused image branch and in the disparity map branch respectively; S3, introducing an SE module for the first time to perform a weighted calculation on the image features of the disparity map branch and the fused image branch, thereby correcting the image features in the fused image branch; and further steps. The method predicts quality more accurately and improves the efficiency of stereo image quality evaluation.

Description

Stereo image quality evaluation method guided by deep learning and disparity map weighting
Technical Field
The invention belongs to the field of image processing and relates to the application of deep learning to stereo image quality evaluation; in particular, it relates to a stereo image quality evaluation method guided by deep learning and disparity map weighting.
Background
In recent years, with the development of 3D technology, research on stereoscopic images has attracted growing attention. A stereo image may be distorted during transmission, which degrades its quality and directly affects how people perceive it. How to evaluate the quality of a stereoscopic image effectively has therefore become one of the key issues in stereoscopic image processing and computer vision. In view of this situation, the invention provides a stereo image quality evaluation model based on deep learning and disparity map weighting guidance.
Existing stereo image quality evaluation algorithms can be classified into three types according to how much they depend on a reference image: full reference, reduced reference and no reference. A full-reference algorithm predicts the quality of a distorted image from the structural similarity, or other indexes, between the reference image and the distorted image. A reduced-reference algorithm does not need the complete pixel-level information of the reference image and depends on it less. A no-reference algorithm predicts the image quality score without any information from a reference image. In practical applications a distortion-free reference image is generally hard to obtain, so research on no-reference stereo image quality evaluation algorithms receives more attention.
No-reference stereo image quality evaluation methods can generally be divided into three categories: feature extraction methods [1-2], sparse representation methods [3-4] and deep learning methods [5-8]. Feature extraction methods usually extract certain statistical features from the stereo image in a traditional, hand-crafted way and then predict the quality score with a machine learning algorithm. Sparse representation methods generally build a dictionary to represent the statistical features sparsely, which gives them a certain advantage in computational complexity. Both kinds of method rely on human-designed algorithms to extract the features of the stereo image; because the human visual system and natural scene statistics are not yet sufficiently understood, the applicability of these algorithms is limited to some extent. With the rapid development of artificial intelligence, deep learning methods have appeared in stereo image quality evaluation in recent years. Because they extract the features of stereo images with a neural network rather than a traditional method, they are free of the limitations of hand-crafted feature extraction and generally perform better.
The design of the invention is inspired by the human binocular vision mechanism, namely the binocular fusion and binocular rivalry mechanisms in the brain. A fused image correlates more closely with the binocular vision mechanism than the independent left and right viewpoint images do, so the fused image is selected as the input of one branch of the network. Fusing the left and right viewpoint images loses some information, so the disparity map is selected to compensate for the fused image; that is, the disparity map is the input of the other network branch. In addition, the features that the convolutional neural network extracts from the fused image differ in importance and should be weighted accordingly. An improved squeeze-and-excitation module (SE module) is therefore applied to improve the representation capability of the network: the disparity map serves as an input of the SE module to guide and weight the feature map obtained from the fused image branch, thereby re-correcting the feature map of the fused image. Since the fused image branch and the disparity map branch both contribute to image quality prediction, the two branches are finally concatenated to obtain the final prediction score.
The invention provides a stereo image quality evaluation model based on deep learning and disparity map weighting guidance. First, in line with how a human views a stereo image, the independent binocular viewpoint images are fused to obtain a fused image, and a disparity matching algorithm is applied to obtain a disparity map. The fused image and the disparity map are the inputs of the two branches of a neural network, and features are learned through a convolutional neural network. Second, because the features of the fused image differ in importance, the features extracted from the disparity map are used as the input of the improved SE module to re-correct the feature map of the fused image.
Disclosure of Invention
In order to solve the problems in the prior art, the invention aims to establish an effective and reasonable stereo image quality evaluation model based on deep learning and disparity map weighting guidance, taking the human binocular vision mechanism as the design basis and taking into account that the features extracted by a neural network differ in importance. The stereo image quality evaluation model predicts quality more accurately, does not depend on an original reference image, can replace subjective evaluation to a certain extent, improves the efficiency of stereo image quality evaluation, and lays a foundation for subsequent work.
To address the problems in the prior art, the invention adopts the following technical scheme:
A stereo image quality evaluation method based on deep learning and disparity map weighting guidance comprises the following steps:
S1, constructing a dual-branch neural network from the independent left and right viewpoint images of the stereo image, wherein the dual-branch neural network comprises a fused image branch and a disparity map branch;
S2, performing first-stage extraction of image features in the fused image branch and in the disparity map branch respectively;
S3, introducing an SE module for the first time to perform a weighted calculation on the image features of the disparity map branch and the fused image branch, thereby correcting the image features in the fused image branch;
S4, further extracting features from the first-stage features of the disparity map branch and from the corrected features of the fused image branch, thereby completing second-stage feature extraction;
S5, introducing an SE module for the second time to perform a weighted calculation on the second-stage image features of the disparity map branch and the features extracted in the corrected fused image branch, thereby completing second-stage correction;
S6, concatenating the features finally extracted by the two branches to complete the quality evaluation of the stereo image.
The weighted correction of the fused image feature map in steps S3 and S5 is realized by a modified SE module. The modification keeps the structure of the original SE module and introduces a new input: the feature map of the disparity map branch is used as an additional input of the modified SE module to correct the weight learning for the feature map of the fused image branch.
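The data flow of steps S1 to S6 can be sketched as follows. This is a minimal illustration, not the patented network: `stage`, `guide` and `predict_quality` are hypothetical names, the feature extractor is a random 1×1 channel mixing with ReLU and pooling standing in for the convolution and dense modules, and the final regression uses random weights instead of trained fully connected layers.

```python
import numpy as np

rng = np.random.default_rng(0)

def stage(x, out_channels):
    """Placeholder feature extractor: random 1x1 channel mixing, ReLU, 2x2 max pooling."""
    c, h, w = x.shape
    mix = rng.standard_normal((out_channels, c)) / np.sqrt(c)
    y = np.maximum(np.einsum("oc,chw->ohw", mix, x), 0.0)
    y = y[:, : h // 2 * 2, : w // 2 * 2]  # crop to even size before pooling
    return y.reshape(out_channels, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def guide(fused_feat, disp_feat):
    """Placeholder for the modified SE module: disparity features give channel weights in (0, 1)."""
    weights = 1.0 / (1.0 + np.exp(-disp_feat.mean(axis=(1, 2))))
    return fused_feat * weights[:, None, None]

def predict_quality(fused_img, disp_map):
    """S1: the two inputs feed the fused image branch and the disparity map branch."""
    f1, d1 = stage(fused_img, 8), stage(disp_map, 8)   # S2: first-stage features
    f1 = guide(f1, d1)                                 # S3: first correction
    f2, d2 = stage(f1, 16), stage(d1, 16)              # S4: second-stage features
    f2 = guide(f2, d2)                                 # S5: second correction
    joint = np.concatenate([f2.mean(axis=(1, 2)), d2.mean(axis=(1, 2))])  # S6: concatenate
    w = rng.standard_normal(joint.shape[0]) / joint.shape[0]  # untrained regression weights
    return float(joint @ w)

score = predict_quality(rng.random((3, 64, 64)), rng.random((1, 64, 64)))
print(score)
```

The point of the sketch is the ordering: the disparity branch is never modified by the fused branch, while the fused branch is re-weighted after each of the two extraction stages.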
Advantageous effects
The dual-branch densely connected convolutional neural network with the improved SE module is designed on the basis of the binocular vision mechanism. It takes into account that the features extracted by a convolutional neural network differ in importance and weights the different features effectively. Experimental results show that the proposed method performs very well in stereo image quality evaluation.
The stereo image quality evaluation model based on deep learning and disparity map weighting guidance was tested on a public stereo image database. The predicted quality scores are very close to the standard subjective scores, and the correlation and stability are superior to those of most current stereo image quality evaluation algorithms.
Drawings
FIG. 1 is the overall framework of the network used by the present invention;
FIG. 2 is a block diagram of the SE module of the present invention;
FIG. 3 is a block diagram of the three-layer dense module of the present invention.
Detailed Description
The invention was tested on the public LIVE stereo image database. The LIVE database consists of two separate databases, Phase I and Phase II. Each stereo image is presented as a pair of planar images for the left and right viewpoints, with a size of 360 × 640. Phase I contains 20 reference image pairs and 365 distorted image pairs; its images are mainly symmetrically distorted, that is, the left and right viewpoint images are distorted to approximately the same degree. Phase II contains 8 reference image pairs and 360 distorted image pairs and includes both symmetrically and asymmetrically distorted images; in an asymmetrically distorted image, the distortion levels of the left and right viewpoint images differ considerably. The LIVE database contains five distortion types: Gaussian blur, JPEG2000 (JP2K) compression distortion, JPEG compression distortion, Rayleigh fast fading and additive white Gaussian noise.
The method is described in detail below with reference to the technical scheme.
The invention provides a stereo image quality evaluation model based on deep learning and disparity map weighting guidance. It takes the human binocular vision mechanism as its design basis, namely that binocular fusion and binocular rivalry mechanisms exist in the brain's perception of stereo images, and it takes into account that the features extracted by a neural network differ in importance. First, a fused image and a disparity map are obtained from the independent left and right viewpoint images through specific algorithms, and a basic dual-branch neural network framework is constructed. Then an improved SE module is added: the features extracted by the disparity map branch network guide and weight the features extracted by the fused image branch network, so that the fused image branch network trains more efficiently. Finally, the two branch networks are concatenated to produce the final prediction of stereo image quality. The specific flow is shown in FIG. 1.
The specific steps are as follows:
1. Dual-branch neural network architecture:
The dual-branch neural network architecture adopted by the invention takes the fused image and the disparity map as the inputs of the two branch networks respectively. Both are obtained, through specific algorithms, from the left and right viewpoint images of the same stereo image. The fused image is obtained with a binocular fusion model and reflects the characteristics of binocular rivalry, binocular fusion and multi-channel vision. The disparity map is obtained with a stereo matching algorithm. The network is built on three-layer densely connected modules, which strengthen the propagation of features through the network and promote feature reuse. As shown in FIG. 1, each of the two branch networks includes two convolution modules and two three-layer dense connection modules. One convolution module consists of a batch normalization (BN) layer, a convolution layer, a ReLU activation function and a pooling layer, and one three-layer dense connection module includes two convolution layers. In both branches, the first convolution module and the first three-layer dense connection module perform first-stage feature extraction for the fused image and the disparity map, and the second convolution module and the second three-layer dense connection module perform second-stage feature extraction.
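The dense connectivity described above can be illustrated with a short sketch. It assumes that "three-layer" means the module input plus two convolution layers, with each layer receiving the concatenation of everything before it; that reading, the 1×1 stand-in convolution and the names `conv1x1_relu`, `dense_module` and `growth` are assumptions for illustration and are not taken from the patent.

```python
import numpy as np

def conv1x1_relu(x, weight):
    """Stand-in convolution layer: 1x1 convolution followed by ReLU.

    x has shape (C_in, H, W); weight has shape (C_out, C_in).
    """
    return np.maximum(np.einsum("oc,chw->ohw", weight, x), 0.0)

def dense_module(x, w1, w2):
    """Dense connection: every layer sees the concatenation of all earlier feature maps."""
    y1 = conv1x1_relu(x, w1)                                 # first conv layer sees x
    y2 = conv1x1_relu(np.concatenate([x, y1], axis=0), w2)   # second sees x and y1
    return np.concatenate([x, y1, y2], axis=0)               # all features are reused

rng = np.random.default_rng(0)
growth = 4                                   # hypothetical channels added per layer
x = rng.standard_normal((8, 16, 16))
w1 = rng.standard_normal((growth, 8))
w2 = rng.standard_normal((growth, 8 + growth))
out = dense_module(x, w1, w2)
print(out.shape)  # (16, 16, 16)
```

Because the input is carried through unchanged in the concatenation, earlier features stay directly available to later layers, which is what feature reuse refers to here.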
2. Re-correction of the fused image feature map by the disparity map features:
Because the features extracted by the neural network differ in importance, an SE module is used to weight the different features of the image. In the invention the SE module is introduced twice: the first time after the fused image branch and the disparity map branch complete first-stage feature extraction, and the second time after the two branch networks complete second-stage feature extraction. The structure of the original SE module is shown in FIG. 2(a). Instead of letting the fused image feature map recalibrate itself, the invention improves the original SE module as shown in FIG. 2(b): the features extracted by the disparity map branch network serve as one input of the SE module. The disparity feature map is compressed to a spatial size of 1 × 1 by global pooling and then passed through two fully connected layers. The first fully connected layer reduces the channel dimension, the second fully connected layer restores it, and a ReLU activation function between the two layers provides a nonlinear mapping. This structure captures the complex correlations among the disparity map feature channels. The operations inside the blue dotted box are called the SE channel, which comprises a global pooling operation, expressed by formula (1), a fully connected layer with reduction factor r, a ReLU unit and a fully connected layer with amplification factor r.
Finally, a Sigmoid function yields weights in the range (0, 1), which guide and weight the fused image feature map of the other branch, thereby re-correcting it.
[Formula (1): equation image not reproduced in this text]
where H × W is the size of the feature map and f(x, y) is the value at coordinates (x, y) in the feature map.
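A minimal NumPy sketch of the disparity-guided SE weighting follows. It assumes that the global pooling of formula (1) is a per-channel average over the H × W positions, as in the standard SE block, and that both branches have the same number of channels C at the point of weighting. The function name `disparity_guided_se`, the bias-free fully connected layers and the random weights are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def disparity_guided_se(fused_feat, disp_feat, w_reduce, w_expand):
    """Weight fused-image channels with weights learned from disparity features.

    fused_feat, disp_feat: arrays of shape (C, H, W)
    w_reduce: (C // r, C) first fully connected layer (reduction factor r)
    w_expand: (C, C // r) second fully connected layer (amplification factor r)
    """
    squeezed = disp_feat.mean(axis=(1, 2))         # global pooling, assumed to be averaging
    hidden = np.maximum(w_reduce @ squeezed, 0.0)  # channel reduction followed by ReLU
    weights = sigmoid(w_expand @ hidden)           # channel restoration, weights in (0, 1)
    return fused_feat * weights[:, None, None], weights

rng = np.random.default_rng(0)
C, H, W, r = 16, 8, 8, 4
fused = rng.standard_normal((C, H, W))
disp = rng.standard_normal((C, H, W))
w_reduce = rng.standard_normal((C // r, C)) * 0.1
w_expand = rng.standard_normal((C, C // r)) * 0.1
recalibrated, weights = disparity_guided_se(fused, disp, w_reduce, w_expand)
print(recalibrated.shape, weights.min(), weights.max())
```

The only difference from a standard SE block in this sketch is where the squeeze input comes from: the weights are computed from `disp_feat` but multiplied into `fused_feat`.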
3. Final prediction of stereo image scores:
The fused image branch network and the disparity map branch network each learn features of the stereo image, and both contribute to quality prediction. The disparity map branch network compensates for the fused image branch network, and combining the two makes the quality score prediction more reliable. Therefore, at the end of the neural network, the fused image branch network and the disparity map branch network are joined by a "Concat" channel concatenation, which completes the compensation of the fused image by the disparity map. The final quality score is then predicted by a fully connected module, which is structurally similar to the convolution module except that a fully connected layer replaces the convolution layer. The Euclidean loss is used as the loss function of the network, as shown below:
[Loss function formula: equation image not reproduced in this text]
When the network is trained, the loss function is minimized by the back-propagation algorithm to obtain the optimal network parameters.
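The concatenation and the loss can be sketched as below. The exact form of the loss formula is not recoverable from this text, so the sketch assumes one common convention for a Euclidean loss, the sum of squared differences divided by 2N; the scaling factor and the names `concat_features` and `euclidean_loss` are assumptions.

```python
import numpy as np

def concat_features(fused_vec, disp_vec):
    """'Concat' join of the two branch outputs along the channel (last) axis."""
    return np.concatenate([fused_vec, disp_vec], axis=-1)

def euclidean_loss(predicted, target):
    """Sum of squared differences scaled by 1 / (2N); the scaling is an assumed convention."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(target, dtype=float)
    return float(np.sum(diff ** 2) / (2 * diff.size))

joint = concat_features(np.array([0.2, 0.5]), np.array([0.1]))
loss = euclidean_loss([30.0, 42.0], [32.0, 40.0])
print(joint.shape, loss)  # (3,) 2.0
```

In the example, both predictions are off by 2 from the subjective scores, so the summed squared error is 8 and the scaled loss is 8 / (2 × 2) = 2.0.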
4. Stereo image quality evaluation results and analysis
The experiments of the present invention were performed on the public LIVE stereo image database described in the Detailed Description: Phase I with 20 reference image pairs and 365 mainly symmetrically distorted image pairs, and Phase II with 8 reference image pairs and 360 distorted image pairs covering both symmetric and asymmetric distortion. All images are 360 × 640, and the five distortion types are Gaussian blur, JPEG2000 (JP2K) compression distortion, JPEG compression distortion, Rayleigh fast fading and additive white Gaussian noise.
The method of the invention was verified experimentally on the LIVE stereo image database. Table 1 shows the experimental results of the invention together with those of 12 other existing, well-performing stereo image quality evaluation algorithms.
TABLE 1 Performance on LIVE database
[Table 1: table image not reproduced in this text]
Table 2 lists the results of the three evaluation indexes for the different distortion types. The proposed method performs very well on Phase I; although it does not achieve the best performance on Phase II, it still outperforms some of the algorithms. The method can therefore adapt to stereo images with different distortion types and predict quality scores accurately and efficiently.
TABLE 2 Performance for different distortion types on the LIVE database
[Table 2: table image not reproduced in this text]
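The text does not name the three evaluation indexes. As an assumption, the sketch below uses three that are common in image quality assessment: the Pearson linear correlation coefficient (PLCC), the Spearman rank-order correlation coefficient (SROCC) and the root-mean-square error (RMSE). The scores in the example are made up for illustration and are not results from the patent.

```python
import numpy as np

def plcc(pred, mos):
    """Pearson linear correlation between predicted and subjective scores."""
    return float(np.corrcoef(pred, mos)[0, 1])

def _ranks(values):
    """Rank positions 0..n-1; ties are not averaged in this simple sketch."""
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks

def srocc(pred, mos):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return plcc(_ranks(pred), _ranks(mos))

def rmse(pred, mos):
    """Root-mean-square error between predicted and subjective scores."""
    diff = np.asarray(pred, dtype=float) - np.asarray(mos, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

pred = np.array([10.0, 20.0, 35.0, 50.0])   # illustrative predicted scores
mos = np.array([12.0, 18.0, 40.0, 45.0])    # illustrative subjective scores
print(plcc(pred, mos), srocc(pred, mos), rmse(pred, mos))
```

Higher PLCC and SROCC indicate closer agreement with subjective scores, while lower RMSE indicates smaller prediction error.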
To further demonstrate the superiority of the proposed method, corresponding comparison experiments were carried out; the results are shown in Table 3. The first configuration uses only the fused image branch network, with the fused image feature map recalibrated by itself. The second adds the disparity map branch network to the first, but the disparity map features do not guide the fused image feature map and are only concatenated at the end of the network. In the third, the disparity map features only take part in re-correcting the feature map of the fused image branch network and are not concatenated with the fused image branch network. The experimental results in Table 3 show that the proposed stereo image quality evaluation model based on deep learning and disparity map weighting guidance achieves superior performance.
TABLE 3 comparative experimental results
[Table 3: table image not reproduced in this text]
It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the present invention, and these fall within the scope of the present invention. Therefore, the protection scope of the present invention is subject to the appended claims.

Claims (1)

1. A stereoscopic image quality evaluation method based on deep learning and disparity map weighting guidance, characterized by comprising the following steps:
S1, constructing a dual-branch neural network from the independent left and right viewpoint images of the stereo image, wherein the dual-branch neural network comprises a fused image branch and a disparity map branch;
S2, performing first-stage extraction of image features in the fused image branch and in the disparity map branch respectively;
S3, introducing an SE module for the first time to perform a weighted calculation on the image features of the disparity map branch and the fused image branch, thereby correcting the image features in the fused image branch;
S4, further extracting features from the first-stage features of the disparity map branch and from the corrected features of the fused image branch, thereby completing second-stage feature extraction;
S5, introducing an SE module for the second time to perform a weighted calculation on the second-stage image features of the disparity map branch and the features extracted in the corrected fused image branch, thereby completing second-stage correction;
S6, concatenating the features finally extracted by the two branches to complete the quality evaluation of the stereo image; wherein:
the weighted correction of the fused image feature map in steps S3 and S5 is realized by a modified SE module; on the basis of the structure of the original SE module, the features extracted by the disparity map branch are used as an additional input of the module and are converted into learnable weights, and the image features in the fused image branch are subjected to first-stage weighting correction by introducing the improved SE module for the first time.
CN201910568557.8A 2019-06-27 2019-06-27 Stereo image quality evaluation method guided by deep learning and disparity map weighting Active CN110351548B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910568557.8A CN110351548B (en) 2019-06-27 2019-06-27 Stereo image quality evaluation method guided by deep learning and disparity map weighting

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910568557.8A CN110351548B (en) 2019-06-27 2019-06-27 Stereo image quality evaluation method guided by deep learning and disparity map weighting

Publications (2)

Publication Number Publication Date
CN110351548A (en) 2019-10-18
CN110351548B (en) 2020-12-11

Family

ID=68176883

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910568557.8A Active CN110351548B (en) 2019-06-27 2019-06-27 Stereo image quality evaluation method guided by deep learning and disparity map weighting

Country Status (1)

Country Link
CN (1) CN110351548B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110944165B (en) * 2019-11-13 2021-02-19 宁波大学 Stereoscopic image visual comfort level improving method combining perceived depth quality
JP2021196951A (en) * 2020-06-16 2021-12-27 キヤノン株式会社 Image processing apparatus, image processing method, program, method for manufacturing learned model, and image processing system
CN111667058A (en) * 2020-06-23 2020-09-15 新疆爱华盈通信息技术有限公司 Dynamic selection method of multi-scale characteristic channel of convolutional neural network
CN111950655B (en) * 2020-08-25 2022-06-14 福州大学 Image aesthetic quality evaluation method based on multi-domain knowledge driving

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20150037668A (en) * 2013-09-30 2015-04-08 시스벨 테크놀로지 에스.알.엘. Method and device for edge shape enforcement for visual enhancement of depth image based rendering of a three-dimensional video stream
CN109345502A (en) * 2018-08-06 2019-02-15 浙江大学 A kind of stereo image quality evaluation method based on disparity map stereochemical structure information extraction

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5627498B2 (en) * 2010-07-08 2014-11-19 株式会社東芝 Stereo image generating apparatus and method
CN109714592A (en) * 2019-01-31 2019-05-03 天津大学 Stereo image quality evaluation method based on binocular fusion network

Also Published As

Publication number Publication date
CN110351548A (en) 2019-10-18

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant