CN110636278A - Stereo image quality evaluation method based on sparse binocular fusion convolutional neural network - Google Patents
- Publication number: CN110636278A
- Application number: CN201910568580.7A
- Authority: CN (China)
- Prior art keywords: fusion, layer, branch, convolution layer, network
- Legal status: Pending
Classifications
- G06N3/045 — Computing arrangements based on biological models; neural networks; architecture; combinations of networks
- G06N3/084 — Neural networks; learning methods; backpropagation, e.g. using gradient descent
- H04N13/106 — Stereoscopic video systems; processing image signals
- H04N17/00 — Diagnosis, testing or measuring for television systems or their details
- H04N2013/0074 — Stereoscopic video systems; stereoscopic image analysis
Abstract
The invention discloses a stereo image quality evaluation method based on a sparse binocular fusion convolutional neural network, which comprises the following steps: S1, constructing a stereo image quality evaluation network based on a binocular fusion convolutional neural network, the network comprising a left branch, a right branch and a fusion branch; S2, applying a structured sparsity constraint to each layer of the binocular fusion convolutional neural network, where the objective function of network optimization is given by formula (1). The resulting stereo image quality evaluation method is more accurate and efficient, better fits the perceptual quality of the human eye, runs faster, and promotes the development of stereoscopic imaging technology to a certain extent.
Description
Technical Field
The invention belongs to the field of image processing, relates to the improvement and optimization of stereo image quality evaluation methods and of the computational speed of convolutional neural networks for stereo image quality evaluation, and particularly relates to a stereo image quality evaluation method based on a sparse binocular fusion convolutional neural network.
Background
Since viewing degraded stereoscopic images causes visual fatigue and dizziness, stereoscopic image quality evaluation has become an urgent issue [1]. Stereo image quality evaluation must consider factors such as depth information, disparity information and binocular rivalry, making it more challenging than planar image quality evaluation. Generally, stereoscopic image quality evaluation can be divided into subjective evaluation and objective evaluation. Subjective evaluation is laborious and time consuming, so objective quality evaluation of stereo images has become a research hotspot [2].
In general, objective evaluation of stereo image quality can be classified into conventional feature-extraction-based methods [3-4], sparse-representation-based methods [5-9] and deep-learning-based methods [10-13]. Sparse representation simulates the perception mechanism of the human visual system: it represents most coefficients of an image as zero, removing redundant information. Therefore, some researchers use sparse-representation-based approaches to evaluate the quality of stereoscopic images. For example, document [5] sparsely represents the structure and texture features of the left and right views of a stereoscopic image, computes a sparse feature similarity index for each view, and combines them into a final quality score. Document [6] jointly sparse-represents DOG, HOG and LBP features and uses support vector regression to obtain the quality score of a stereo image. In [7], Lin et al. sparsely represent the amplitude and phase maps of the fused image and apply support vector machine regression. Karimi et al. sparsely represent the contrast map and phase map of the fused image and use support vector machine regression to obtain the stereo image quality score [8]. In document [9], Yang et al. propose a no-reference stereo image quality evaluation method that learns gradient-dictionary-based color visual features, which are fed into a trained support vector machine model to predict the quality score. Since deep learning networks simulate the brain's hierarchical processing of images, in recent years many researchers have used deep learning models to evaluate the quality of stereoscopic images. For example, document [10] extracts natural scene statistics features of a stereo image and trains a DBN with the obtained features to produce a stereo image quality score. In document [11], Ding et al. propose a no-reference stereo image quality evaluation method based on a convolutional neural network (CNN); the features extracted by the CNN and the disparity features are regressed by a support vector machine to obtain the objective quality score of the stereo image. In document [12], Lv et al. propose a stereoscopic image quality evaluation method based on binocular self-similarity and a deep neural network (DNN). In document [13], Sang et al. fuse the left and right views of a stereoscopic image by principal component analysis (PCA) and then train a CNN on the fused images to obtain the stereo image quality score.
In the above literature, the sparse-representation-based methods can find the key information of an image, but the features must be extracted manually: document [5] extracts structure and texture features; document [6] manually extracts DOG, HOG and LBP features; documents [7-8] extract the amplitude and phase features of the fused image; document [9] extracts gradient features. Deep-learning-based methods instead learn features through the network, so the extracted features are more comprehensive and appropriate. However, deep learning networks generally have high computational complexity and large storage requirements. Since neural networks are highly non-convex, over-parameterization and random initialization are necessary means of overcoming the negative effects of local minima during training [14]; that is, deep learning networks are highly redundant. Thus, some researchers have compressed DNNs using sparse regularization. For example, in document [14], Liu et al. propose a sparse convolutional neural network based on sparse decomposition; it can zero out more than 90% of the parameters while losing less than 1% accuracy on the ILSVRC2012 dataset. In document [15], Wen et al. propose structured sparsity learning (SSL) to regularize DNNs; SSL yields a hardware-friendly, structured sparse DNN and thereby effectively accelerates DNN computation. However, few have applied SSL to deep learning networks for stereoscopic image quality evaluation. Inspired by document [15], this application proposes a sparse binocular fusion convolutional neural network to evaluate the quality of stereo images: the CNN avoids the manual feature extraction of sparse-representation methods, while SSL applied to the convolutional layers reduces the computation of the network and accelerates its operation.
How to handle the relationship between the left and right viewpoints of a stereoscopic image is the key to stereo image quality evaluation, and the above literature can be roughly divided into two categories by how the two viewpoints are processed. Documents [5-6][10-12] first process the left and right viewpoints separately and then fuse the features of the two viewpoints in view of the binocular fusion and binocular rivalry mechanisms. Documents [7-9][13] first fuse the left and right viewpoints into a single image and then process the fused image. In fact, in the human visual cortex the fusion of the left and right views is a long-term process: fusion and processing occur simultaneously, and the two views are fused hierarchically, layer by layer [16]. Therefore, this application adopts a binocular fusion convolutional neural network in which the two views are fused four times through four concat operations, simulating the long-term fusion and information processing of the visual cortex.
Disclosure of Invention
In order to solve the problems of the prior art, the invention provides a sparse binocular fusion convolutional neural network for stereo image quality evaluation. A convolutional neural network is adopted to evaluate the quality of the stereo image, avoiding manual feature extraction. A structured sparsity regularization constraint is applied to the convolutional neural network, which reduces the computational complexity of the network, increases its running speed and improves its performance. Considering the binocular fusion and binocular rivalry mechanisms of the human eye, the long-term fusion process of the visual cortex is simulated: the left and right viewpoints of the stereo image are fused four times through four concat operations while the information is processed. The resulting stereo image quality evaluation method is more accurate and efficient, better fits the perceptual quality of the human eye, runs faster, and promotes the development of stereoscopic imaging technology to a certain extent.
The invention solves the problems in the prior art and is implemented by adopting the following technical scheme:
1. A stereo image quality evaluation method based on a sparse binocular fusion convolutional neural network, characterized by comprising the following steps:
s1, constructing a stereo image quality evaluation network based on a binocular fusion convolutional neural network, wherein the network comprises a left branch, a right branch and a fusion branch;
s2, applying a structured sparse constraint on each layer of the binocular fusion convolutional neural network, wherein the objective function of network optimization is shown as the formula (1):
wherein W represents all weights in the network; eD(W) is a loss function of the network; r (W) is an unstructured regularization constraint applied over all weights; rg(W(l)) The constraint is sparsely regularized for application to each layer.
2. The method for evaluating the quality of the stereoscopic image based on the sparse binocular fusion convolutional neural network as claimed in claim 1, wherein in step S1 the left branch and the right branch are constructed from the left and right views in the neural network:
2.1, the left branch and the right branch each consist of a first convolutional layer and a first pooling layer, a second convolutional layer and a second pooling layer, a third convolutional layer and a fourth convolutional layer;
2.2, the first convolutional layer in each of the left and right branches is subjected to a structured sparsity constraint and its output is fed into the first pooling layer;
2.3, the output of the first pooling layer is connected to the second convolutional layer, which is subjected to a structured sparsity constraint and whose output is fed into the second pooling layer;
2.4, the output of the second pooling layer is connected to the third convolutional layer, which is subjected to a structured sparsity constraint and whose output is fed into the fourth convolutional layer; the output of the fourth convolutional layer is connected to the fusion branch for fusion processing (a code sketch of these steps is given below).
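The following is a minimal PyTorch sketch of one view branch as described in steps 2.1-2.4. The layer order follows the text, but the channel width, kernel sizes, padding and the single-channel input are illustrative assumptions (the actual filter counts and sizes appear only in Fig. 1), and PyTorch is used here for readability even though the experiments below mention caffe:

```python
import torch
import torch.nn as nn

class ViewBranch(nn.Module):
    """One view branch (left or right): conv1 -> pool1 -> conv2 -> pool2 -> conv3 -> conv4.
    Intermediate conv outputs are returned because the fusion branch taps them."""
    def __init__(self, ch=32):  # ch is an assumed channel width, not from the text
        super().__init__()
        self.conv1 = nn.Conv2d(1, ch, 3, padding=1)
        self.pool1 = nn.MaxPool2d(2)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.pool2 = nn.MaxPool2d(2)
        self.conv3 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv4 = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        f1 = torch.relu(self.conv1(x))               # tapped for the first fusion
        f2 = torch.relu(self.conv2(self.pool1(f1)))  # tapped for the second fusion
        f3 = torch.relu(self.conv3(self.pool2(f2)))  # tapped for the third fusion
        f4 = torch.relu(self.conv4(f3))              # tapped for the fourth fusion
        return f1, f2, f3, f4
```

The structured sparsity constraint itself enters through the training objective of formula (2) below, not through the layer definitions.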
3. The method for evaluating the quality of the stereoscopic image based on the sparse binocular fusion convolutional neural network as claimed in claim 1, wherein in step S1 the fusion branch is constructed from the left and right views in the neural network:
3.1, the fusion branch consists of a first pooling layer and a first convolutional layer, a second pooling layer and a second convolutional layer, a third convolutional layer, a fourth convolutional layer with a third pooling layer, and three fully connected layers, and performs four fusion operations;
3.2, the feature maps from the structurally sparsity-constrained first convolutional layers of the left and right branches are fused for the first time via a 'concat' operation; the fused feature map is fed into the first pooling layer of the fusion branch and then into the first convolutional layer of the fusion branch for information processing, while a structured sparsity constraint is applied to this convolutional layer;
3.3, the feature maps from the structurally sparsity-constrained second convolutional layers of the left and right branches and the feature map of the first convolutional layer of the fusion branch after the first fusion are fused for the second time via a 'concat' operation; the fused feature map is fed into the second pooling layer of the fusion branch and then into the second convolutional layer of the fusion branch for information processing, while a structured sparsity constraint is applied to this convolutional layer;
3.4, the feature maps from the structurally sparsity-constrained third convolutional layers of the left and right branches and the feature map of the second convolutional layer of the fusion branch after the second fusion are fused for the third time via a 'concat' operation; the fused feature map is fed into the third convolutional layer of the fusion branch for information processing, while a structured sparsity constraint is applied to this convolutional layer;
3.5, the feature maps from the structurally sparsity-constrained fourth convolutional layers of the left and right branches and the feature map of the third convolutional layer of the fusion branch after the third fusion are fused for the fourth time via a 'concat' operation; the fused feature map is fed into the fourth convolutional layer of the fusion branch for information processing, while a structured sparsity constraint is applied to this convolutional layer; the output of the fused fourth convolutional layer is sent to the third pooling layer, and the resulting feature map is sent to the three fully connected layers to judge the quality of the stereo image (a code sketch of the fusion branch follows).
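Continuing the sketch above, the fusion branch below implements the four concat fusions of steps 3.2-3.5. The connectivity follows the text; the channel width, pooling sizes, fully-connected dimensions and the assumed 32×32 input patch are illustrative assumptions:

```python
import torch
import torch.nn as nn

class FusionBranch(nn.Module):
    """Fusion branch: concat1 -> pool1 -> conv1, concat2 -> pool2 -> conv2,
    concat3 -> conv3, concat4 -> conv4 -> pool3 -> three fully connected layers."""
    def __init__(self, ch=32, fc_dim=512):  # assumed sizes
        super().__init__()
        self.pool1 = nn.MaxPool2d(2)
        self.conv1 = nn.Conv2d(2 * ch, ch, 3, padding=1)  # concat of left/right conv1
        self.pool2 = nn.MaxPool2d(2)
        self.conv2 = nn.Conv2d(3 * ch, ch, 3, padding=1)  # left/right conv2 + fused conv1
        self.conv3 = nn.Conv2d(3 * ch, ch, 3, padding=1)  # left/right conv3 + fused conv2
        self.conv4 = nn.Conv2d(3 * ch, ch, 3, padding=1)  # left/right conv4 + fused conv3
        self.pool3 = nn.MaxPool2d(2)
        self.fc = nn.Sequential(                  # 4x4 spatial size assumes 32x32 patches
            nn.Linear(ch * 4 * 4, fc_dim), nn.ReLU(),
            nn.Linear(fc_dim, fc_dim), nn.ReLU(),
            nn.Linear(fc_dim, 1))                 # predicted quality score

    def forward(self, left_feats, right_feats):
        l1, l2, l3, l4 = left_feats
        r1, r2, r3, r4 = right_feats
        g1 = torch.relu(self.conv1(self.pool1(torch.cat([l1, r1], 1))))      # fusion 1
        g2 = torch.relu(self.conv2(self.pool2(torch.cat([l2, r2, g1], 1))))  # fusion 2
        g3 = torch.relu(self.conv3(torch.cat([l3, r3, g2], 1)))              # fusion 3
        g4 = torch.relu(self.conv4(torch.cat([l4, r4, g3], 1)))              # fusion 4
        return self.fc(self.pool3(g4).flatten(1))

class BinocularFusionNet(nn.Module):
    """Left branch + right branch + fusion branch, as in step S1."""
    def __init__(self, ch=32):
        super().__init__()
        self.left, self.right, self.fusion = ViewBranch(ch), ViewBranch(ch), FusionBranch(ch)

    def forward(self, left_img, right_img):
        return self.fusion(self.left(left_img), self.right(right_img))
```

Because the branch taps are taken before each pooling stage while the fusion branch pools after each concat, the spatial sizes of the concatenated feature maps match at every fusion point.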
Advantageous effects
The method adopts structured sparsity learning (SSL) to optimize the adopted convolutional neural network, so that the weights of the network are structurally sparse; this reduces the computational complexity of the network, increases its running speed and improves its evaluation performance, making real-time stereo image quality evaluation possible. Experimental results show that the network can achieve a computational speed-up of more than 2× with improved performance. The method simulates the long-term binocular fusion process in the human brain by adopting four fusions in the convolutional neural network, and it is shown theoretically and experimentally that the proposed model is suitable for both symmetrically and asymmetrically distorted stereo images.
Drawings
Fig. 1 is a structural diagram of a sparse binocular fusion based convolutional neural network of the present invention.
FIG. 2(a) is the relationship between the column sparsity of each convolutional layer and the overall speed-up of the network on LIVE I; FIG. 2(b) is the relationship between the row sparsity of each convolutional layer and the overall speed-up of the network on LIVE I.
Detailed Description
The invention conducts experiments on the public stereo image databases LIVE 3D Phase I and LIVE 3D Phase II. The LIVE 3D Phase I database contains 20 original stereo image pairs and 365 symmetrically distorted stereo image pairs; the distortion types are JPEG compression, JPEG 2000 compression, Gaussian blur (Gblur), Gaussian white noise (WN) and fast fading (FF), and the DMOS values range from -10 to 60. The LIVE 3D Phase II database contains 8 original stereo image pairs and 360 symmetrically and asymmetrically distorted stereo image pairs, of which 120 pairs are symmetrically distorted and 240 pairs are asymmetrically distorted; the distortion types are the same five, and the DMOS values range from 0 to 100.
The method is explained in detail below with reference to the technical scheme:
the quality evaluation method simulates the process of processing the stereo image by human brain, and simulates the long-term fusion and processing of left and right viewpoints by using quartic concat of a convolutional neural network, so that the network is suitable for symmetric and asymmetric distorted stereo images. SSL is applied to each convolution layer of the network, the number and the shape of the network filters are structurally sparsely constrained, the network calculation complexity is reduced, the network operation speed is increased, and the network evaluation performance is improved.
The method comprises the following specific steps:
1 Implementation of structured sparsity learning (SSL)
Let W^(l) ∈ R^{N_l×C_l×M_l×K_l} represent all the weights of the l-th (1 ≤ l ≤ L) convolutional layer, where N_l, C_l, M_l and K_l denote the number of filters of the layer, the number of channels, and the height and width of the filters, and L is the number of convolutional layers in the network. The objective function of a convolutional neural network with structured sparsity constraints can be expressed as equation (1):

E(W) = E_D(W) + λ·R(W) + λ_g·Σ_{l=1}^{L} R_g(W^(l))    (1)

where W represents all the weights in the network, E_D(W) is the loss function of the network, and R(W) is an unstructured regularization constraint applied to all weights; the l_2 norm is used in this application. R_g(W^(l)) is the structured sparsity regularization constraint applied to each layer. Structured sparsity is achieved in SSL using Group Lasso, a regularization that can drive entire groups of weights to zero. The Group Lasso applied to weights w can be expressed as

R_g(w) = Σ_{g=1}^{G} ||w^(g)||_g,  where  ||w^(g)||_g = sqrt(Σ_{i=1}^{|w^(g)|} (w_i^(g))^2),

G denotes the number of groups, w^(g) is the g-th group of weights in w, and |w^(g)| is the number of weights in w^(g).
In the SSL method, the grouping of w^(g) can be done by filter, by channel, by filter shape, or by network depth, i.e., filter-wise, channel-wise, shape-wise and depth-wise, written as W^(l)_{n_l,:,:,:}, W^(l)_{:,c_l,:,:}, W^(l)_{:,c_l,m_l,k_l} and W^(l) (1 ≤ n_l ≤ N_l, 1 ≤ c_l ≤ C_l, 1 ≤ m_l ≤ M_l, 1 ≤ k_l ≤ K_l), where n_l, c_l, m_l and k_l index the n_l-th filter of the l-th layer, the c_l-th channel, the m_l-th row of the filter and the k_l-th column of the filter, respectively. In this application we use filter-wise and shape-wise grouping to penalize unimportant filters in each convolutional layer and to learn filters of arbitrary shape. In caffe, all filters of each layer are reshaped into a matrix with one filter per row, so the number of rows of the matrix equals the number of filters. Combining filter-wise and shape-wise sparse regularization therefore zeroes whole rows or columns of the weight matrix, directly reducing its dimensions; filter-wise and shape-wise in SSL may thus be called row-wise and column-wise. After adding row-wise and column-wise sparse regularization, the objective function of the network can be expressed as formula (2):

E(W) = E_D(W) + λ·R(W) + λ_n·Σ_{l=1}^{L} Σ_{n_l=1}^{N_l} ||W^(l)_{n_l,:,:,:}||_g + λ_s·Σ_{l=1}^{L} Σ_{c_l=1}^{C_l} Σ_{m_l=1}^{M_l} Σ_{k_l=1}^{K_l} ||W^(l)_{:,c_l,m_l,k_l}||_g    (2)

where λ, λ_n and λ_s are the penalty factors of the l_2-norm, row-wise and column-wise terms, respectively.
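A small sketch of the row-wise (filter-wise) and column-wise (shape-wise) Group Lasso terms of formula (2), written in PyTorch for readability; the squared-norm form of the unstructured term R(W) is an assumption (standard weight decay):

```python
import torch

def group_lasso_rows_cols(weight_4d):
    """Row/column Group Lasso for one conv layer whose N_l x C_l x M_l x K_l
    weights are reshaped to an N_l x (C_l*M_l*K_l) matrix, one filter per row."""
    w = weight_4d.reshape(weight_4d.shape[0], -1)
    row_pen = w.norm(dim=1).sum()   # filter-wise: l2 norm of each row, summed
    col_pen = w.norm(dim=0).sum()   # shape-wise: l2 norm of each column, summed
    return row_pen, col_pen

def ssl_objective(data_loss, conv_weights, lam, lam_n, lam_s):
    """Formula (2): E(W) = E_D(W) + lam*R(W) + lam_n*(row terms) + lam_s*(column terms)."""
    l2_term = sum((w ** 2).sum() for w in conv_weights)  # assumed squared-l2 form of R(W)
    rows = sum(group_lasso_rows_cols(w)[0] for w in conv_weights)
    cols = sum(group_lasso_rows_cols(w)[1] for w in conv_weights)
    return data_loss + lam * l2_term + lam_n * rows + lam_s * cols
```

During optimization, groups whose norms shrink to zero correspond to removable rows (whole filters) or columns (filter-shape positions), which is what directly reduces the dimensions of each layer's weight matrix.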
2 Construction of the binocular fusion convolutional neural network
The binocular fusion convolutional neural network adopted in this application is shown in Fig. 1. The fusion network simulates the stereoscopic vision processing mechanism of the human brain and performs long-term fusion of the left and right viewpoints. The network is divided into three parts: a left branch, a right branch and a fusion branch. The left and right branches each have four convolutional layers and two pooling layers; the fusion branch comprises four convolutional layers, three pooling layers and three fully connected layers. The filter sizes and numbers of filters are shown in Fig. 1. In order to simulate the long-term fusion and processing of the left and right views in the visual cortex, the two viewpoints are fused four times through four concat operations in the network (marked in Fig. 1) while the information is processed by convolution operations; the images are thus fused and processed at the same time, simulating the visual mechanism of the human eye. Considering the binocular combination and binocular rivalry mechanisms, different weights need to be assigned to the left and right views to obtain the final fused image [17]; in this application, the weights of the left and right views are learned autonomously by the fusion network. Meanwhile, SSL is used on each convolutional layer to impose structured sparsity constraints on the filters and filter shapes.
The convolution operation in the binocular fusion network is defined as equation (3):

F_l = ReLU(W_l ∗ F_l^{input} + B_l)    (3)

where W_l and B_l represent the weights and the bias of the l-th convolutional layer, F_l is the feature map output by the l-th convolutional layer, F_l^{input} is the input of the l-th convolutional layer, ReLU is the activation function, and ∗ denotes the convolution operation.
All pooling layers in the binocular fusion network use max pooling. When the network is trained with the back-propagation algorithm, the parameters of the convolutional layers, pooling layers and fully connected layers are learned by minimizing the loss function. The Euclidean loss is used, as shown in equation (4):

E = (1/2n)·Σ_{i=1}^{n} ||Y_i − y_i||^2    (4)

where Y_i and y_i represent the expected output and the actual output of sample i, respectively, and n is the batch size.
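A hedged training-step sketch tying equation (4) to the SSL objective of formula (2), reusing the BinocularFusionNet and ssl_objective sketches above; the optimizer choice and the penalty factors λ, λ_n, λ_s below are illustrative assumptions, not values from the text:

```python
import torch

def train_step(model, optimizer, left_patch, right_patch, dmos,
               lam=5e-4, lam_n=1e-4, lam_s=1e-4):  # assumed hyper-parameters
    optimizer.zero_grad()
    pred = model(left_patch, right_patch).squeeze(1)
    # Equation (4): (1/2n) * sum_i ||Y_i - y_i||^2 over a batch of size n
    data_loss = 0.5 * ((pred - dmos) ** 2).mean()
    conv_ws = [m.weight for m in model.modules()
               if isinstance(m, torch.nn.Conv2d)]
    loss = ssl_objective(data_loss, conv_ws, lam, lam_n, lam_s)
    loss.backward()    # back-propagation learns conv/pool/fc parameters
    optimizer.step()
    return loss.item()

# Example usage (shapes assume single-channel 32x32 patches):
# model = BinocularFusionNet()
# opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
# loss = train_step(model, opt, torch.rand(8, 1, 32, 32),
#                   torch.rand(8, 1, 32, 32), torch.rand(8))
```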
3 Stereo image quality evaluation results and analysis
The experiments of this patent were performed on the public LIVE 3D Phase I and LIVE 3D Phase II databases. Both contain five distortion types: JPEG compression, JPEG 2000 compression, Gaussian blur (Gblur), Gaussian white noise (WN) and fast fading (FF). The LIVE 3D Phase I database contains 20 original stereo image pairs and 365 symmetrically distorted stereo image pairs. The LIVE 3D Phase II database contains 8 original stereo image pairs and 360 symmetrically and asymmetrically distorted stereo image pairs, of which 120 are symmetrically distorted and 240 are asymmetrically distorted. The Pearson linear correlation coefficient (PLCC) and the Spearman rank-order correlation coefficient (SROCC) are adopted to measure the consistency between subjective and objective evaluation results; the closer PLCC and SROCC are to 1, the better the evaluation.
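For reference, the two consistency measures can be computed directly with scipy; higher correlation between the predicted scores and the DMOS values indicates better subjective-objective agreement:

```python
from scipy.stats import pearsonr, spearmanr

def plcc_srocc(predicted_scores, dmos):
    """PLCC measures linear agreement; SROCC measures monotonic (rank) agreement."""
    plcc, _ = pearsonr(predicted_scores, dmos)
    srocc, _ = spearmanr(predicted_scores, dmos)
    return plcc, srocc
```

In the quality-assessment literature a nonlinear (e.g. logistic) mapping is often fitted before computing PLCC; the text does not state whether one is used here, so this sketch computes the plain correlations.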
In Table 1 we compare the proposed method with eight stereoscopic image quality evaluation methods; the best results are highlighted in bold. Among them, documents [6-9] are sparse-representation-based methods and documents [10-13] are deep-learning-based methods, whereas our method combines sparsity and a CNN. As for the relation between the left and right viewpoints of a stereo image, documents [5-6][11-12] first process the two viewpoints and then fuse the features of the left and right views; documents [7-9][13] first fuse the two viewpoints and then process the fused image as a planar image; our method adopts long-term fusion of the two viewpoints, processing them while fusing. As can be seen from Table 1, the evaluation performance of our network is clearly superior to the other methods. Only the PLCC is slightly below that of Karimi et al. [8] on LIVE II; the SROCC and RMSE on LIVE II are both better than those of [8]. Our PLCC and SROCC both exceed 0.96 on LIVE I and 0.95 on LIVE II. Our method therefore performs better than both the sparse-representation and the deep-learning methods, and our sparse fusion network surpasses both the process-then-fuse and the fuse-then-process approaches. The network of this patent handles both symmetrically and asymmetrically distorted stereo images well.
To demonstrate the effect of SSL on the proposed network, we compared networks with different structured sparsity levels. net0 (baseline) is a network without structured sparse regularization. FIG. 2 shows the relationship between the row sparsity and column sparsity and the network speed-up on LIVE I; the trend on LIVE II is the same as on LIVE I. We set the speed-up of the reference network net0 to 1, with sparsity increasing progressively over net1, net2 (the proposed method) and net3. The sparser the network, the greater the speed-up. In Table 2 we compare network performance at different structured sparsity strengths. When the sparsity is low, as in net1, the network is only slightly accelerated and the performance drops slightly. In net3, where the sparsity is higher, the performance degradation is larger than in net1, but the acceleration of net3 is much greater. When the sparsity is appropriate, as in net2 (the proposed method), the performance instead improves. This may be because unimportant redundant weights in the network are constrained to 0; that is, structured sparse regularization helps improve network performance. In addition, the proposed method achieves a large speed improvement: 2.0× on LIVE I and 2.3× on LIVE II. When the row and column sparsity are high (as in net3), the evaluation metrics drop by only about 0.01 while the network is accelerated about 3×, and the evaluation performance of net3 still exceeds that of most methods.
It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.
References
[1] L. Xing, J. You, T. Ebrahimi and A. Perkis, "Assessment of Stereoscopic Crosstalk Perception," IEEE Transactions on Multimedia, vol. 14, no. 2, pp. 326-337, April 2012.
[2] M. Chen, L. K. Cormack and A. C. Bovik, "No-Reference Quality Assessment of Natural Stereopairs," IEEE Transactions on Image Processing, vol. 22, no. 9, pp. 3379-3391, Sept. 2013.
[3] X. Xu, Y. Zhao and Y. Ding, "No-reference stereoscopic image quality assessment based on saliency-guided binocular feature consolidation," Electronics Letters, vol. 53, no. 22, pp. 1468-1470, 2017.
[4] J. Ma, P. An, L. Shen and K. Li, "Reduced-Reference Stereoscopic Image Quality Assessment Using Natural Scene Statistics and Structural Degradation," IEEE Access, vol. 6, pp. 2768-2780, 2018.
[5] K. Li, F. Shao, G. Jiang and M. Yu, "Joint structure–texture sparse coding for quality prediction of stereoscopic images," Electronics Letters, vol. 51, no. 24, pp. 1994-1995, Nov. 2015.
[6] F. Shao, K. Li, W. Lin, G. Jiang and Q. Dai, "Learning Blind Quality Evaluator for Stereoscopic Images Using Joint Sparse Representation," IEEE Transactions on Multimedia, vol. 18, no. 10, pp. 2104-2114, Oct. 2016.
[7] Y. Lin, J. Yang, W. Lu, Q. Meng, Z. Lv and H. Song, "Quality Index for Stereoscopic Images by Jointly Evaluating Cyclopean Amplitude and Cyclopean Phase," IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 1, pp. 89-101, Feb. 2017.
[8] M. Karimi, M. Nejati, S. M. R. Soroushmehr, S. Samavi, N. Karimi and K. Najarian, "Blind Stereo Quality Assessment Based on Learned Features From Binocular Combined Images," IEEE Transactions on Multimedia, vol. 19, no. 11, pp. 2475-2489, Nov. 2017.
[9] J. Yang, P. An, J. Ma, K. Li and L. Shen, "No-reference stereo image quality assessment by learning gradient dictionary-based color visual characteristics," 2018 IEEE International Symposium on Circuits and Systems (ISCAS), Florence, 2018, pp. 1-5.
[10] J. Yang, B. Jiang, H. Song, X. Yang, W. Lu and H. Liu, "No-Reference Stereoimage Quality Assessment for Multimedia Analysis Towards Internet-of-Things," IEEE Access, vol. 6, pp. 7631-7640, 2018.
[11] Y. Ding et al., "No-Reference Stereoscopic Image Quality Assessment Using Convolutional Neural Network for Adaptive Feature Extraction," IEEE Access, vol. 6, pp. 37595-37603, 2018.
[12] Y. Lv, M. Yu, G. Jiang et al., "No-reference Stereoscopic Image Quality Assessment Using Binocular Self-similarity and Deep Neural Network," Signal Processing: Image Communication, vol. 47, pp. 346-357, 2016.
[13] Q. Sang, T. Gu, C. Li and X. Wu, "Stereoscopic image quality assessment via convolutional neural networks," 2017 International Smart Cities Conference (ISC2), Wuxi, 2017, pp. 1-2.
[14] B. Liu, M. Wang, H. Foroosh, M. Tappen and M. Pensky, "Sparse Convolutional Neural Networks," 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, 2015, pp. 806-814.
[15] W. Wen, C. Wu, Y. Wang, Y. Chen and H. Li, "Learning Structured Sparsity in Deep Neural Networks," Advances in Neural Information Processing Systems 29 (NIPS 2016), 2016.
[16] D. H. Hubel and T. N. Wiesel, "Receptive fields of single neurones in the cat's striate cortex," The Journal of Physiology, vol. 148, no. 3, pp. 574-591, 1959.
Claims (3)
1. A stereo image quality evaluation method based on a sparse binocular fusion convolutional neural network, characterized by comprising the following steps:
s1, constructing a stereo image quality evaluation network based on a binocular fusion convolutional neural network, wherein the network comprises a left branch, a right branch and a fusion branch;
s2, applying a structured sparse constraint on each layer of the binocular fusion convolutional neural network, wherein an objective function of network optimization is shown as a formula (1):
wherein W represents all weights in the network; eD(W) is a loss function of the network; r (W) is an unstructured regularization constraint applied over all weights; rg(W(l)) The constraint is sparsely regularized for application to each layer.
2. The stereo image quality evaluation method based on the sparse binocular fusion convolutional neural network according to claim 1, wherein in S1 the left branch and the right branch are constructed from the left and right views in the neural network:
2.1, the left branch and the right branch each consist of a first convolutional layer and a first pooling layer, a second convolutional layer and a second pooling layer, a third convolutional layer and a fourth convolutional layer;
2.2, the first convolutional layer in each of the left and right branches is subjected to a structured sparsity constraint and its output is fed into the first pooling layer;
2.3, the output of the first pooling layer is connected to the second convolutional layer, which is subjected to a structured sparsity constraint and whose output is fed into the second pooling layer;
2.4, the output of the second pooling layer is connected to the third convolutional layer, which is subjected to a structured sparsity constraint and whose output is fed into the fourth convolutional layer; the output of the fourth convolutional layer is connected to the fusion branch for fusion processing.
3. The stereo image quality evaluation method based on the sparse binocular fusion convolutional neural network according to claim 1, wherein in S1 the fusion branch is constructed from the left and right views in the neural network:
3.1, the fusion branch consists of a first pooling layer and a first convolutional layer, a second pooling layer and a second convolutional layer, a third convolutional layer, a fourth convolutional layer with a third pooling layer, and three fully connected layers, and performs four fusion operations;
3.2, the feature maps from the structurally sparsity-constrained first convolutional layers of the left and right branches are fused for the first time via a 'concat' operation; the fused feature map is fed into the first pooling layer of the fusion branch and then into the first convolutional layer of the fusion branch for information processing, while a structured sparsity constraint is applied to this convolutional layer;
3.3, the feature maps from the structurally sparsity-constrained second convolutional layers of the left and right branches and the feature map of the first convolutional layer of the fusion branch after the first fusion are fused for the second time via a 'concat' operation; the fused feature map is fed into the second pooling layer of the fusion branch and then into the second convolutional layer of the fusion branch for information processing, while a structured sparsity constraint is applied to this convolutional layer;
3.4, the feature maps from the structurally sparsity-constrained third convolutional layers of the left and right branches and the feature map of the second convolutional layer of the fusion branch after the second fusion are fused for the third time via a 'concat' operation; the fused feature map is fed into the third convolutional layer of the fusion branch for information processing, while a structured sparsity constraint is applied to this convolutional layer;
3.5, the feature maps from the structurally sparsity-constrained fourth convolutional layers of the left and right branches and the feature map of the third convolutional layer of the fusion branch after the third fusion are fused for the fourth time via a 'concat' operation; the fused feature map is fed into the fourth convolutional layer of the fusion branch for information processing, while a structured sparsity constraint is applied to this convolutional layer; the output of the fused fourth convolutional layer is sent to the third pooling layer, and the resulting feature map is sent to the three fully connected layers to judge the quality of the stereo image.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910568580.7A | 2019-06-27 | 2019-06-27 | Stereo image quality evaluation method based on sparse binocular fusion convolutional neural network |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN110636278A | 2019-12-31 |
Family

ID: 68968903

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201910568580.7A (CN110636278A) | Stereo image quality evaluation method based on sparse binocular fusion convolutional neural network | 2019-06-27 | 2019-06-27 |
Cited By (1)

| Publication number | Priority date | Publication date | Title |
|---|---|---|---|
| CN111696090A | 2020-06-08 | 2020-09-22 | Method for evaluating quality of face image in unconstrained environment |
Patent Citations (3)

| Publication number | Priority date | Publication date | Title |
|---|---|---|---|
| CN105959684A | 2016-05-26 | 2016-09-21 | Stereo image quality evaluation method based on binocular fusion |
| CN109167996A | 2018-09-21 | 2019-01-08 | No-reference stereo image quality evaluation method based on a convolutional neural network |
| CN109714592A | 2019-01-31 | 2019-05-03 | Stereo image quality evaluation method based on binocular fusion network |
Non-Patent Citations (1)

W. Wen, C. Wu, Y. Wang, Y. Chen and H. Li, "Learning Structured Sparsity in Deep Neural Networks," Advances in Neural Information Processing Systems 29 (NIPS 2016).
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108391121B (en) | No-reference stereo image quality evaluation method based on deep neural network | |
CN110060236B (en) | Stereoscopic image quality evaluation method based on depth convolution neural network | |
Zhao et al. | Invertible image decolorization | |
CN109800710B (en) | Pedestrian re-identification system and method | |
CN108769671B (en) | Stereo image quality evaluation method based on self-adaptive fusion image | |
CN114820341A (en) | Image blind denoising method and system based on enhanced transform | |
CN110136057B (en) | Image super-resolution reconstruction method and device and electronic equipment | |
CN109523513B (en) | Stereoscopic image quality evaluation method based on sparse reconstruction color fusion image | |
CN109714592A (en) | Stereo image quality evaluation method based on binocular fusion network | |
CN111915589A (en) | Stereo image quality evaluation method based on hole convolution | |
CN110351548B (en) | Stereo image quality evaluation method guided by deep learning and disparity map weighting | |
CN112634238A (en) | Image quality evaluation method based on attention module | |
CN116091313A (en) | Image super-resolution network model and reconstruction method | |
CN115546060A (en) | Reversible underwater image enhancement method | |
CN114627035A (en) | Multi-focus image fusion method, system, device and storage medium | |
Chen et al. | Image denoising via deep network based on edge enhancement | |
CN115660955A (en) | Super-resolution reconstruction model, method, equipment and storage medium for efficient multi-attention feature fusion | |
CN115170944A (en) | NR underwater enhanced image quality evaluation method based on CNN | |
Jo et al. | Multi-scale selective residual learning for non-homogeneous dehazing | |
Feng et al. | Multi-scale feature-guided stereoscopic video quality assessment based on 3D convolutional neural network | |
Pham et al. | End-to-end image patch quality assessment for image/video with compression artifacts | |
CN110636278A (en) | Stereo image quality evaluation method based on sparse binocular fusion convolutional neural network | |
CN109272450A (en) | A kind of image oversubscription method based on convolutional neural networks | |
CN106960432B (en) | A kind of no reference stereo image quality evaluation method | |
CN115965844B (en) | Multi-focus image fusion method based on visual saliency priori knowledge |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20191231 |