CN110458802A - Stereo image quality evaluation method based on projection weight normalization - Google Patents
Stereo image quality evaluation method based on projection weight normalization
- Publication number
- CN110458802A CN110458802A CN201910580586.6A CN201910580586A CN110458802A CN 110458802 A CN110458802 A CN 110458802A CN 201910580586 A CN201910580586 A CN 201910580586A CN 110458802 A CN110458802 A CN 110458802A
- Authority
- CN
- China
- Prior art keywords
- image
- normalization
- convolutional neural
- network
- weight
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30168—Image quality inspection
Abstract
The invention belongs to the field of image processing and proposes a new image quality evaluation method that remains consistent with subjective human evaluation and solves the ill-conditioning problem in the network training process. It provides a research direction for deep-learning-based stereo image quality evaluation and, to a certain extent, promotes the development of stereoscopic imaging technology. To this end, the stereo image quality evaluation method based on projection weight normalization fuses the left and right viewpoint images of a stereo image to obtain a single fused image, which is then preprocessed by block cutting and normalization. A deep convolutional neural network model is built, the preprocessed image blocks are taken as its input, the network structure is optimized with projection weight normalization and batch data normalization, and the quality evaluation result of the stereo image is obtained from the output of the network. The invention is mainly applied to image processing.
Description
Technical Field
The invention belongs to the field of image processing, and relates to application and optimization of image fusion and deep learning in stereo image quality evaluation.
Background
Stereo imaging technology can bring people a better visual experience. However, quality degradation arises throughout the pipeline from acquisition to display of a stereo image [1-2], and the degraded image may affect the perception of the stereo content, so how to evaluate stereo image quality reasonably and efficiently has become one of the research hotspots in the field of stereo information. Stereo image quality evaluation methods mainly comprise subjective evaluation and objective evaluation. Subjective evaluation experiments are time-consuming, labor-intensive, and costly, whereas objective evaluation is far more practical. Establishing a reasonable and efficient objective evaluation mechanism for stereo image quality therefore has great practical significance.
To date, researchers have proposed various methods for evaluating the quality of stereo images, which can be roughly classified into conventional methods and methods based on artificial neural networks. Most conventional methods extract features from the left and right views separately and then weight the quality scores of the two views to obtain the final objective evaluation value [3-7]. However, the features extracted by conventional methods do not necessarily reflect the essential characteristics of the image. To better simulate the feature-extraction mechanism of the human eye, researchers have applied artificial neural networks to stereo image quality assessment. For example, [8-10] apply shallow neural networks to objective stereo image quality evaluation, but with few layers and a simple structure these networks cannot accurately simulate the hierarchical information processing of the human visual system. Compared with shallow networks, deep learning can mimic the way the human brain processes information, extracting features layer by layer through a deep network. The convolutional neural network (CNN) is a classic deep-learning architecture, applicable to computer vision, natural language processing, and other fields. Zhang Wei et al. applied a convolutional neural network to stereo image quality evaluation, performing feature extraction with 2 convolutional layers and 2 pooling layers and introducing a multi-layer perceptron (MLP) at the end of the network to map the learned features to quality scores [11]; Chen Hui et al. adopted a convolutional neural network model with 12 convolutional layers [12]; Ding et al., using a model with 5 convolutional layers, achieved high consistency between objective evaluation scores and subjective human scores [13].
The deep neural network structures currently adopted in the field of stereo image quality evaluation have certain limitations. On one hand, the arrangement of convolution kernels in these networks is simple: the kernels are connected in sequence, so the extracted features are monotonous. On the other hand, the layers composing the networks are only the most basic convolutional, pooling, and fully connected layers, with few functions and no normalization, so the networks cannot handle the gradient-dispersion problem.
In addition, practical research has found that when the human brain perceives a stereoscopic image, the left and right views are fused first, and the fused image is then processed hierarchically [14]. Lin et al. performed quality evaluation on fused stereo images using conventional methods, but fused only a phase map and an amplitude map [15]. To better simulate this characteristic, deep-learning approaches that operate on fused images have also appeared (see [16]), but the fusion method of that work does not take into account the thresholds at which gain enhancement and gain control occur [17].
Aiming at the above problems, the invention proposes a stereo image quality evaluation model based on a deep convolutional neural network, taking the preprocessed fused image as the network input so that the learning process of the network better matches the visual characteristics of the human eye. A batch normalization (BN) layer is introduced into the model to keep the distributions of network output data and input data consistent and to avoid gradient vanishing; a projection-based weight normalization (PBWN) layer is introduced to normalize parameters of different magnitudes and alleviate the ill-conditioning of the Hessian matrix, thereby improving the learning ability of the network. The first stage of the model is a module of convolution kernels connected in parallel; the second stage connects convolution kernels in sequence, with residual units introduced to avoid network degradation; finally, a fully connected layer completes the quality evaluation of the stereo image.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a stereo image quality evaluation method based on projection weight normalization within a deep convolutional neural network. The method performs well and remains consistent with subjective human evaluation; it introduces data batch normalization and projection weight normalization, solving the ill-conditioning problem in the network training process. The method provides a research direction for deep-learning-based stereo image quality evaluation and, to a certain extent, promotes the development of stereoscopic imaging technology. To this end, the technical scheme adopted by the invention is a stereo image quality evaluation method based on projection weight normalization: the left and right viewpoint images of a stereo image are fused to obtain a single fused image, which is then preprocessed by cutting and normalization; a deep convolutional neural network model is constructed, the preprocessed image blocks are taken as its input, the network structure is optimized with projection weight normalization and data batch normalization, and the quality evaluation result of the stereo image is obtained from the output of the network.
Specific steps for obtaining the fused image
A Gabor filter bank with 6 scales f_s ∈ {1.5, 2.5, 3.5, 5, 7, 10} (cycles/degree) and 8 orientations θ ∈ {kπ/8 | k = 0, 1, …, 7} is used, and the Gabor-filtered left and right views are fused into one image according to formula (1).
Here I_l(x, y) and I_r(x, y) denote the pixel values at position (x, y) in the left and right views respectively, C(x, y) denotes the pixel value of the fused image, TCE denotes the enhancement component for the current viewpoint, and TCE* denotes the suppression component for the other viewpoint; their calculation is shown in equations (2) and (3):
where t denotes the left or right viewpoint, gc denotes the gain-enhancement threshold, and ge denotes the gain-control threshold. Gabor filtering yields 48 images; the quantities entering equations (2) and (3) are the frequency information of the n-th image of viewpoint t after filtering by the contrast sensitivity function and the weight of the n-th image of viewpoint t, while i and j respectively index the 6 Gabor scales f_s ∈ {1.5, 2.5, 3.5, 5, 7, 10} (cycles/degree) and the 8 orientations θ ∈ {kπ/8 | k = 0, 1, …, 7};
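As a concrete illustration of the filter bank described above, the following NumPy sketch builds the 6 × 8 = 48 Gabor responses per view and fuses the two views. It is a sketch only: the exact gain-enhancement/gain-control weights of equations (1)-(3) are not reproduced in the text, so `fuse_views` uses a placeholder energy-based weighting, and the mapping from cycles/degree to cycles/pixel is an arbitrary assumption.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gabor_kernel(size, freq, theta, sigma=2.0):
    # Real-valued Gabor kernel: Gaussian envelope times a cosine carrier.
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    gauss = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return gauss * np.cos(2 * np.pi * freq * xr)

def conv2(img, k):
    # 'Same'-size 2-D correlation with edge padding, pure NumPy.
    p = k.shape[0] // 2
    padded = np.pad(img, p, mode='edge')
    win = sliding_window_view(padded, k.shape)
    return np.einsum('ijkl,kl->ij', win, k)

scales = [1.5, 2.5, 3.5, 5, 7, 10]           # cycles/degree, as in the text
thetas = [k * np.pi / 8 for k in range(8)]   # 8 orientations

def fuse_views(I_l, I_r, gc=1.0):
    # Total Gabor energy of each view over the 48 responses.
    def energy(img):
        e = np.zeros_like(img, dtype=float)
        for f in scales:
            for th in thetas:
                # f / 60.0 is an assumed cycles/degree -> cycles/pixel conversion.
                e += np.abs(conv2(img.astype(float), gabor_kernel(7, f / 60.0, th)))
        return e
    E_l, E_r = energy(I_l), energy(I_r)
    # Placeholder gain-control weighting, NOT the patent's equations (2)-(3):
    # each pixel of the fused image is a convex combination of the two views.
    w_l = (gc + E_l) / (2 * gc + E_l + E_r)
    return w_l * I_l + (1.0 - w_l) * I_r
```

Because the weights lie in (0, 1), the fused pixel always stays between the corresponding left- and right-view pixels, which is the qualitative behaviour the gain-control model describes.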
image pre-processing
The normalization calculation process is shown in equation (5):

Î(x, y) = (I(x, y) − μ(x, y)) / (σ(x, y) + ε) (5)

where I(x, y) denotes the pixel value at the (x, y) coordinate point, μ(x, y) is the local mean of the pixel values, σ(x, y) is the local standard deviation of the pixel values, and ε is a small positive constant that prevents division by zero.
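A minimal sketch of the normalization of equation (5), assuming for brevity that the statistics are computed globally over a patch (the patent's μ(x, y) and σ(x, y) are local, per-pixel-neighbourhood statistics):

```python
import numpy as np

def normalize_patch(I, eps=1e-8):
    # Subtract the mean and divide by the standard deviation, as in eq. (5).
    # Global per-patch statistics are an assumption; the patent uses local ones.
    return (I - I.mean()) / (I.std() + eps)
```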
Convolutional neural network model
Based on the Inception structure for multi-scale feature extraction and the residual-network Block structure, a deep convolutional neural network model containing both convolution-kernel arrangements is built. The input of the model is the small blocks obtained by cutting. The model comprises 1 Inception structure, 1 convolutional layer, 3 Block structures, 1 pooling layer, and 1 fully connected layer. Within one layer of the Inception structure, convolution kernels of different sizes operate in parallel to extract features of the image at different scales, and 1 × 1 convolution kernels are introduced to reduce the network parameters and thus the computational complexity.
(1) Projection weight normalization
In the optimization problem in which the network seeks the optimal solution, a constraint is added on the weight matrix W of each layer:

min l(y, f(x; W))  s.t.  ddiag(W_i W_i^T) = E, i = 1, 2, …, L (6)

where W = {W_i | i = 1, 2, …, L} is the set of weight matrices of layers 1 through L, l(y, f(x; W)) denotes the loss function, y is the desired output, and f(x; W) is the actual output. The operator ddiag(·) keeps the main diagonal elements of its matrix argument and sets all off-diagonal elements to 0, and E denotes the identity matrix.
The constraint confines the weight matrix of each layer to a subspace of the manifold space; that is, the weight matrix w of each layer satisfies

ddiag(ww^T) = E (7)
Solving the constrained problem with the Riemannian optimization theory yields the Riemannian gradient in the manifold space (equation (8)).
Here the unconstrained gradient is the ordinary gradient of the loss. When the weight vector ω of each neuron satisfies unit normalization, i.e. ω^T ω = 1, substituting into equation (8) gives the Riemannian gradient of equation (9):
Compared with the original gradient, the Riemannian gradient subtracts one extra term; the norm of this subtracted term is analyzed in equation (10):
Since this term is not dominant, the original gradient is adopted for calculation to reduce the computational cost, and formula (11) is adopted for the weight update:
(2) batch normalization of data
The data batch normalization method is shown in formula (12): during training, the mean μ_B and the variance σ_B² are calculated for the data of each batch, and each feature x_i is processed to obtain the batch-normalized activation y_i.
During testing, E[x] is represented by the mean of the means of all training batches and Var[x] by the unbiased estimate of their variances, as shown in formulas (13) and (14), where m is the size of each batch:

E[x] = E_B[μ_B] (13)

Var[x] = (m/(m−1)) · E_B[σ_B²] (14)

Therefore, in the testing stage the data batch normalization formula is shown in formula (15):

y = γ · (x − E[x]) / √(Var[x] + ε) + β (15)

where the parameters γ and β perform scaling and translation, restoring the expressive capability of the model and improving the generalization performance of the network.
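The training and testing phases of batch normalization can be sketched in NumPy as follows. This is an illustrative sketch with scalar γ and β; in the network they are learned per feature map.

```python
import numpy as np

def bn_train(x, gamma, beta, eps=1e-5):
    # Training-time batch normalization (eq. (12)): per-feature batch
    # statistics, then scale by gamma and shift by beta.
    mu, var = x.mean(axis=0), x.var(axis=0)
    y = gamma * (x - mu) / np.sqrt(var + eps) + beta
    return y, mu, var

def bn_test(x, gamma, beta, batch_means, batch_vars, m, eps=1e-5):
    # Test-time normalization (eqs. (13)-(15)): E[x] is the mean of the
    # per-batch means; Var[x] is the unbiased estimate m/(m-1) times the
    # mean of the per-batch variances.
    Ex = np.mean(batch_means, axis=0)
    Varx = m / (m - 1) * np.mean(batch_vars, axis=0)
    return gamma * (x - Ex) / np.sqrt(Varx + eps) + beta
```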
the invention has the characteristics and beneficial effects that:
the invention provides a stereo image quality evaluation method introducing projection weight normalization based on a deep convolutional neural network, and the identification rate of stereo image quality evaluation is high. The CNN model extracts the characteristics of the preprocessed fused stereo image through two modules of direct motion and parallel motion of a convolution kernel, so that the network can learn the image more fully. Compared with the existing deep learning evaluation algorithm, the invention introduces BN and PBWN to carry out network optimization, solves the ill-conditioned problem in the network training process and effectively improves the network evaluation accuracy.
The stereo image quality evaluation method takes human vision mechanism into consideration, takes the preprocessed fusion image as the input of the network, introduces the optimization of the deep convolutional neural network structure, and effectively improves the performance of the network. Experiments show that the evaluation result of the invention has better consistency with subjective quality.
Description of the drawings:
FIG. 1 is a detailed flow diagram of the process.
Detailed Description
The technical scheme adopted by the method fuses the left and right viewpoint images of the stereo image to obtain a single fused image, which is then cut into blocks and normalized. A deep convolutional neural network model is constructed, the preprocessed image blocks are taken as the input of the network, the network structure is optimized with projection weight normalization and data batch normalization, and the quality of the stereo image is obtained from the output of the network.
1. Fusing images
Inspired by the binocular rivalry phenomenon in the human visual system (HVS), the invention fuses the two views into one image, using a Gabor filter bank to filter the images. The filter bank has six scales f_s ∈ {1.5, 2.5, 3.5, 5, 7, 10} (cycles/degree) and eight orientations θ ∈ {kπ/8 | k = 0, 1, …, 7}. After filtering, 48 feature maps are obtained for each channel of each viewpoint. According to the binocular rivalry mechanism, the enhancement component of the current viewpoint and the suppression component of the other viewpoint are combined to calculate the final fused image.
2. Deep learning
The convolutional neural network, an algorithm that emerged early in deep learning and has matured considerably, is selected. Based on the Inception structure and the Block structure [18-19], the invention builds a deep convolutional neural network model containing both of these convolution-kernel arrangements.
3. Optimization of network architecture
A batch normalization (BN) layer is introduced into the network to keep the distributions of the network output data and input data consistent and to avoid gradient vanishing; a projection-based weight normalization (PBWN) layer is introduced to normalize parameters of different magnitudes and alleviate the ill-conditioning of the Hessian matrix, thereby improving the learning ability of the network.
The purpose of projection weight normalization is to solve the ill-conditioning of network training caused by the scaling symmetry of the weight space in deep nonlinear networks [20]. This scaling symmetry drives the Hessian matrix into an ill-conditioned state, so the network easily falls into local optima during training, which is unfavorable for finding the global optimum [21]. To alleviate this problem, unit normalization is performed on the weights, ensuring that the weights of each layer have the same magnitude.
Batch normalization of the data prevents the data distribution from gradually drifting and effectively resolves the distribution mismatch between the source space and the target space [22]. The output of each neuron activation is normalized before being passed to the next layer of neurons, which avoids gradient dispersion and gradient explosion. Learnable reconstruction parameters are also introduced, improving the learning and generalization abilities of the network.
Experiments are performed on the public stereo image databases LIVE 3D Phase I and Phase II. LIVE 3D Phase I has 20 pairs of original images and 365 symmetrically distorted images covering 5 distortion types (Gblur, WN, JPEG, JP2K, and FF); LIVE 3D Phase II has 8 pairs of original images and 360 symmetrically and asymmetrically distorted images covering the same 5 distortion types.
The method is described in detail below with reference to specific examples.
The invention provides a stereo image quality evaluation method that introduces projection weight normalization into a deep convolutional neural network and achieves a high recognition rate in stereo image quality evaluation. The CNN model extracts the characteristics of the preprocessed fused stereo image through serially connected and parallel-connected convolution-kernel modules, so that the network can learn the image more fully. Compared with existing deep-learning evaluation algorithms, the invention introduces BN and PBWN for network optimization, solving the ill-conditioning problem in the network training process and effectively improving the evaluation accuracy of the network. The specific flow chart of the method provided by the invention is shown in figure 1.
The method comprises the following specific steps:
1. Acquisition of the fused image
A Gabor filter bank with 6 scales f_s ∈ {1.5, 2.5, 3.5, 5, 7, 10} (cycles/degree) and 8 orientations θ ∈ {kπ/8 | k = 0, 1, …, 7} is used. The Gabor-filtered left and right views are fused into one image according to formula (16).
Here I_l(x, y) and I_r(x, y) denote the pixel values at position (x, y) in the left and right views respectively, and C(x, y) denotes the pixel value of the fused image. TCE denotes the enhancement component for the current viewpoint and TCE* the suppression component for the other viewpoint; their calculation is shown in equations (17) and (18):
where t denotes the left or right viewpoint, gc denotes the gain-enhancement threshold, and ge denotes the gain-control threshold. Gabor filtering yields 48 images; the quantities entering equations (17) and (18) are the frequency information of the n-th image of viewpoint t after filtering by the contrast sensitivity function and the weight of the n-th image of viewpoint t, while i and j respectively index the 6 Gabor scales f_s ∈ {1.5, 2.5, 3.5, 5, 7, 10} (cycles/degree) and the 8 orientations θ ∈ {kπ/8 | k = 0, 1, …, 7}.
2. Image pre-processing
A single fused image is large, so the original image is cut into 32 × 32 image blocks to reduce the amount of network computation, and normalization is then performed. The normalization calculation process is shown in equation (20):

Î(x, y) = (I(x, y) − μ(x, y)) / (σ(x, y) + ε) (20)

where I(x, y) denotes the pixel value at the (x, y) coordinate point, μ(x, y) is the local mean of the pixel values, σ(x, y) is the local standard deviation of the pixel values, and ε is a small positive constant that prevents division by zero.
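The block-cutting step can be sketched as follows. The stride is not specified in the text, so non-overlapping tiling is assumed here, and leftover border pixels are discarded.

```python
import numpy as np

def cut_blocks(img, size=32):
    # Tile the fused image into non-overlapping size x size blocks.
    # Non-overlapping tiling is an assumption; the patent does not
    # state the stride.
    h, w = img.shape[:2]
    return np.stack([img[r:r + size, c:c + size]
                     for r in range(0, h - size + 1, size)
                     for c in range(0, w - size + 1, size)])
```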
3. Convolutional neural network model
Based on the Inception structure and the Block structure, the invention builds a deep convolutional neural network model containing both convolution-kernel arrangements; the input of the model is the small blocks obtained by cutting. The model comprises 1 Inception structure, 1 convolutional layer, 3 Block structures, 1 pooling layer, and 1 fully connected layer, as shown in fig. 1.
Within one layer of the Inception structure, features of different scales of the image can be extracted through the parallel operation of convolution kernels of different sizes, making the extraction process more comprehensive and sufficient, while 1 × 1 convolution kernels are introduced to reduce the network parameters and the computational complexity.
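The parameter saving from the 1 × 1 kernels can be checked with a small count. The channel sizes below are illustrative, not the actual layer sizes of Table 1.

```python
# Weights of a 5x5 convolution mapping c_in=192 channels to c_out=32 maps,
# with and without a 1x1 bottleneck that first reduces to c_mid=16 channels.
c_in, c_mid, c_out = 192, 16, 32
direct = 5 * 5 * c_in * c_out                               # 153,600 weights
bottleneck = 1 * 1 * c_in * c_mid + 5 * 5 * c_mid * c_out   # 15,872 weights
```

With these (illustrative) channel counts the bottleneck needs roughly a tenth of the weights, which is why Inception inserts 1 × 1 convolutions before the larger kernels.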
TABLE 1 network model parameter settings
The Block structure introduces the concept of the residual: an additional channel directly connects the input of the previous layer to the output, which solves the network degradation problem.
4. Optimization of network architecture
In the CNN adopted by the invention, a projection weight normalization (PBWN) layer and a data batch normalization (BN) layer are introduced after each convolutional layer to normalize the weight parameters and the input data of each layer, respectively.
(1) Projection weight normalization
In the optimization problem in which the network seeks the optimal solution, a constraint is added on the weight matrix W of each layer:

min l(y, f(x; W))  s.t.  ddiag(W_i W_i^T) = E, i = 1, 2, …, L (21)

where W = {W_i | i = 1, 2, …, L} is the set of weight matrices of layers 1 through L, l(y, f(x; W)) denotes the loss function, y is the desired output, and f(x; W) is the actual output. The operator ddiag(·) keeps the main diagonal elements of its matrix argument and sets all off-diagonal elements to 0, and E denotes the identity matrix.
The constraint confines the weight matrix of each layer to a subspace of the manifold space; that is, the weight matrix w of each layer satisfies

ddiag(ww^T) = E (22)
Solving the constrained problem with the Riemannian optimization theory yields the Riemannian gradient in the manifold space (equation (23)).
Here the unconstrained gradient is the ordinary gradient of the loss. When the weight vector ω of each neuron satisfies unit normalization, i.e. ω^T ω = 1, substituting into equation (23) gives the Riemannian gradient of equation (24):
Compared with the original gradient, the Riemannian gradient subtracts one extra term; the norm of this subtracted term is analyzed in equation (25):
The analysis shows that this term is not the dominant term in equation (24), and experiments have shown the Riemannian gradient to be nearly as effective as the original gradient. Therefore, the method adopts the original gradient for calculation to reduce the computational cost.
Accordingly, the invention adopts formula (26) to update the weights.
(2) Batch normalization of data
The data batch normalization method is shown in formula (27): during training, the mean μ_B and the variance σ_B² are calculated for each batch, and each feature x_i is processed to obtain the batch-normalized activation y_i.
During testing, E [ x ] is represented by the mean of all training batchs, var [ x ] is represented by the unbiased estimate of the variance of all training batchs, and m is the size of each batch as shown in equations (28) and (29).
E[x]=EB[μB] (28)
In the test stage, the formula of data batch normalization is shown in formula (30), and the function of the parameters gamma and beta is zooming and translation, so that the expression capability of the model is restored, and the network generalization performance is improved.
5. Stereo image quality evaluation results and analysis
The experiments of the invention were performed on two public stereo image databases, LIVE 3D Phase I and Phase II. The evaluation indexes selected are the Pearson linear correlation coefficient (PLCC), the Spearman rank-order correlation coefficient (SROCC), and the root mean square error (RMSE). The larger the PLCC and SROCC and the smaller the RMSE, the stronger the consistency between the model's evaluation results and the subjective results, and the better the effect.
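The three indexes can be sketched in NumPy as follows. This is a sketch; the SROCC rank computation below assumes no tied scores.

```python
import numpy as np

def plcc(x, y):
    # Pearson linear correlation coefficient.
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc**2).sum() * (yc**2).sum()))

def srocc(x, y):
    # Spearman rank-order correlation: PLCC of the ranks (no-tie case).
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return plcc(rank(x), rank(y))

def rmse(x, y):
    # Root mean square error.
    return float(np.sqrt(np.mean((np.asarray(x) - np.asarray(y)) ** 2)))
```

SROCC is invariant under any monotone mapping of the scores, which is why it complements PLCC: a model whose predictions are correctly ordered but nonlinearly related to the subjective scores still attains SROCC = 1.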
Table 2 shows the performance of the algorithm of the present invention compared to other methods on the LIVE-I, LIVE-II database.
TABLE 2 Overall Performance comparison of the evaluation methods
Chen [12] gives no overall index values for the LIVE-II database, only values for the individual distortion types, so comparison with [12] appears in Tables 3 and 4. Table 2 shows that the performance of the algorithm is significantly better than that of Heeseok [16], because the invention fully considers both the linear and nonlinear cases when fusing images: when the stimuli received by the two eyes are very small, the stimuli of the left and right eyes are weighted linearly, and when the stimuli reach the thresholds that trigger gain enhancement and gain control, nonlinear weighting is adopted. Compared with Lin [15], the invention fuses the original images, whereas [15] fuses only low-level features of the image, so the indexes obtained here are better than those of [15]. Compared with other deep-learning methods [11,13] and conventional methods [5-7] that do not fuse images, the PLCC and SROCC obtained by the method are significantly improved. On LIVE-II, the PLCC obtained by the invention is second best, 0.0122% lower than that of Ding [13]. The RMSE of the model on the LIVE-I and LIVE-II databases is smaller than that of the other algorithms; taken together, the three indexes show that the algorithm performs well on quality evaluation of both symmetrically and asymmetrically distorted stereo images.
The evaluation effects of the algorithm of the invention on different distortion types are analyzed, as shown in tables 3 and 4.
TABLE 3 Performance comparison of the evaluation methods for quality evaluation of stereo images of different distortion types in the LIVE-3D I database
TABLE 4 Performance comparison of the evaluation methods for quality evaluation of stereo images of different distortion types in LIVE-3D II database
When the network is tested, the PLCC and SROCC indexes in Tables 3 and 4 are generally lower than those of existing algorithms, because the experiment of the invention is a two-class classification: even if only 1 image is misjudged in the test, the PLCC is strongly affected. The experiments show that the algorithm has a good overall evaluation effect on the 5 distortion types; for the FF distortion type in the LIVE-I database and the FF and BLUR distortion types in the LIVE-II database, the recognition rate reaches 100%, so the PLCC and SROCC values also reach 1 and the RMSE value is 0.
Table 5 shows the effect on model performance of adding, or not adding, a PBWN layer after each convolutional layer. The results show that the experimental results improve markedly after PBWN is added: the recognition rate of the LIVE-I image quality evaluation is improved by 2.833%, reaching 98.113%, and the recognition rate of the LIVE-II image quality evaluation is improved by 5.88%, reaching 96.47%.
TABLE 5 recognition rate of this algorithm for stereo image quality evaluation
TABLE 6 Time required for the algorithm test (unit: seconds)
Table 6 compares the test time with and without PBWN. PBWN keeps the magnitudes of the weight parameters of each layer the same and unit-normalizes the weight vector of each neuron, which effectively avoids ill-conditioning of the Hessian matrix during training, improves the learning and generalization ability of the network, accelerates network convergence, and shortens the time required for network testing.
References
[1]Zilly F,Kluger J,Kauff P.Production rules for stereo acquisition[J].Proceedings of the IEEE,2011,99(4):590-606.
[2]Urey H,Chellappan K V,Erden E,et al.State of the art in stereoscopic and autostereoscopic displays[J].Proceedings of the IEEE,2011,99(4):540-555.
[3] Objective evaluation model of stereoscopic image quality based on structural distortion analysis[J].Journal of Computer-Aided Design & Computer Graphics,2012,24(8):1047-1056.
[4] Xu Shuning,et al.Stereoscopic image quality evaluation method based on visual saliency[J].Information Technology,2016(10):91-93.
[5]Bensalma Rafik,Larabi Mohamed-Chaker.A perceptual metric for stereoscopic image quality assessment based on the binocular energy[J].Multidimensional Systems and Signal Processing,2013,24(2):281-316.
[6]Shao Feng,Jiang Gangyi,Yu Mei,et al.Binocular energy response based quality assessment of stereoscopic images[J].Digital Signal Processing,2014,29:45-53.
[7]Shao Feng,Lin Weisi,Wang Shanshan,et al.Learning Receptive Fields and Quality Lookups for Blind Quality Assessment of Stereoscopic Images[J].IEEE Transactions on Cybernetics,2016,46(3):730-743.
[8] Application of extreme learning machine in objective evaluation of stereoscopic image quality[J].Journal of Optoelectronics·Laser,2014(9):1837-1842.
[9] Gu Shanbo,Shao Feng,Jiang Gangyi,et al.A support vector regression based objective quality evaluation model for stereoscopic images[J].Journal of Electronics & Information Technology,2012,34(2):368-374.
[10] Wu Guang,Li Sumei,Cheng Jincui.Objective evaluation of stereoscopic images based on genetic neural networks[J].Information Technology,2013(5):148-153.
[11]Zhang Wei,Qu Chenfei,Lin Ma,et al.Learning structure of stereoscopic image for no-reference quality assessment with convolutional neural network[J].Pattern Recognition,2016,59(C):176-187.
[12] Chen Hui,Li Chaofeng.Stereoscopic color image quality evaluation of deep convolutional neural network[J].Journal of Frontiers of Computer Science and Technology,2018,12(08):1315-1322.
[13]Ding Yong,Deng Ruizhe,Xie Xin,et al.No-reference stereoscopic image quality assessment using convolutional neural network for adaptive feature extraction[J].IEEE Access,2018,6:37595-37603.
[14]Hubel D.H.,Wiesel T.N.Receptive fields of single neurones in the cat's striate cortex[J].The Journal of Physiology,1959,148(3):574-591.
[15]Lin Yancong,Yang Jiachen,Lu Wen,et al.Quality index for stereoscopic images by jointly evaluating cyclopean amplitude and cyclopean phase[J].IEEE Journal of Selected Topics in Signal Processing,2017,11(11):89-101.
[16]Oh Heeseok,Ahn Sewoong,Kim Jongyoo,et al.Blind deep S3D image quality evaluation via local to global feature aggregation[J].IEEE Transactions on Image Processing,2017,26(10):4923-4936.
[17]Ding Jian,Klein S.A.,Levi D.M.Binocular combination of phase and contrast explained by a gain-control and gain-enhancement model[J].Journal of Vision,2013,13(2):13.
[18]Szegedy C,Liu W,Jia Y,et al.Going deeper with convolutions[C].Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR),2015:1-9.
[19]He K,Zhang X,Ren S,et al.Deep residual learning for image recognition[C].Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR),2016:770-778.
[20]Huang L.,Liu X.,Lang B.,Li B.Projection based weight normalization for deep neural networks[J].CoRR,abs/1710.02338,2017.
[21]Ian Goodfellow,Yoshua Bengio,and Aaron Courville.Deep Learning.MIT Press,2016.
[22]Ioffe S.,Szegedy C.Batch normalization:Accelerating deep network training by reducing internal covariate shift[C].Proceedings of the 32nd International Conference on Machine Learning,ICML 2015.
Claims (3)
1. A stereo image quality evaluation method based on projection weight normalization, characterized in that the left and right viewpoint images of a stereo image are fused to obtain a single fused image, and the single image is then preprocessed by cutting and normalization; a deep convolutional neural network model is constructed, the preprocessed image blocks are taken as the input of the deep convolutional neural network, the structure of the deep convolutional neural network is optimized by adopting projection weight normalization and data batch normalization, and the quality evaluation result of the stereo image is obtained from the output of the deep convolutional neural network.
2. The stereo image quality evaluation method based on projection weight normalization according to claim 1, wherein:
the specific steps of obtaining the fused image
Using a Gabor filter bank with 6 scales f_s ∈ {1.5, 2.5, 3.5, 5, 7, 10} (cycles/degree) and 8 orientations θ ∈ {kπ/8 | k = 0, 1, …, 7}, the Gabor-filtered left and right views are fused into one image according to formula (1).
Wherein I_l(x, y) and I_r(x, y) denote the pixel values at position (x, y) in the left and right views respectively, C(x, y) denotes the pixel value of the fused image, TCE denotes the enhancement component of the current viewpoint, and TCE* denotes the suppression component from the other viewpoint; the calculation is as shown in equations (2) and (3):
wherein t represents the left or right viewpoint, gc represents the gain-enhancement threshold, ge represents the gain-control threshold, 48 images are obtained after Gabor filtering, the frequency information of the n-th image of viewpoint t filtered by the contrast sensitivity function is weighted by the weight of the n-th image of viewpoint t, and i, j respectively index the 6 Gabor scales f_s ∈ {1.5, 2.5, 3.5, 5, 7, 10} (cycles/degree) and the 8 orientations θ ∈ {kπ/8 | k = 0, 1, …, 7};
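The filter bank described above contains 6 × 8 = 48 kernels. A minimal sketch of building such a bank; the kernel size, the Gaussian envelope width, and the mapping of f_s (cycles/degree) onto kernel coordinates are illustrative assumptions, not the patent's exact parameters:

```python
import numpy as np

def gabor_kernel(freq, theta, size=31, sigma=4.0):
    """One real-valued Gabor kernel: Gaussian envelope times a cosine
    carrier at orientation `theta`. `freq` is treated as cycles per
    kernel width here, a simplification of cycles/degree."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)        # rotated coordinate
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * freq * xr / size)
    return envelope * carrier

scales = [1.5, 2.5, 3.5, 5, 7, 10]                 # f_s
orientations = [k * np.pi / 8 for k in range(8)]   # theta
bank = [gabor_kernel(f, t) for f in scales for t in orientations]  # 48 kernels
```

Each view would be convolved with every kernel in `bank` to produce the 48 filtered images that enter the fusion of formula (1).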
image pre-processing
The normalization calculation process is shown in equation (5):

Î(x, y) = (I(x, y) − μ(x, y)) / (σ(x, y) + ε)   (5)

wherein I(x, y) represents the pixel value at the (x, y) coordinate point, μ(x, y) is the local mean of the pixel values, σ(x, y) is the local standard deviation of the pixel values, and ε is a small positive constant that prevents division by zero;
convolutional neural network model
Based on the multi-scale feature extraction Inception structure and the residual network Block structure, a deep convolutional neural network model with two convolution-kernel arrangements is built. The input of the model is the small blocks obtained after cutting; the model comprises 1 Inception structure, 1 convolutional layer, 3 Block structures, 1 pooling layer and 1 fully-connected layer. Within the same layer of the Inception structure, convolution kernels of different sizes operate in parallel to extract features of the image at different scales, and convolution kernels of size 1 × 1 are introduced to reduce the network parameters and thus the computational complexity.
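The parameter saving from the 1 × 1 bottleneck can be checked by counting weights; the channel numbers below (192 in, 32 out, a 16-channel bottleneck) are illustrative GoogLeNet-style values, not the patent's:

```python
def conv_params(in_ch, out_ch, k):
    """Weight count of a conv layer with k x k kernels (biases ignored)."""
    return in_ch * out_ch * k * k

# Direct 5x5 convolution: 192 -> 32 channels
direct = conv_params(192, 32, 5)
# Inception-style: 1x1 bottleneck to 16 channels, then 5x5 to 32
bottleneck = conv_params(192, 16, 1) + conv_params(16, 32, 5)
```

The bottleneck path needs roughly a tenth of the weights of the direct 5 × 5 convolution, which is the parameter reduction the Inception structure exploits.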
3. The stereo image quality evaluation method based on projection weight normalization according to claim 1, wherein:
(1) projection weight normalization
In the optimization problem in which the network seeks the optimal solution, a constraint is added to the weight matrix W of each layer:
min ℓ(y, f(x; W))  s.t.  ddiag(W_i W_i^T) = E,  i = 1, 2, …, L
wherein W = {W_i | i = 1, 2, …, L} denotes the set of network weight matrices, whose elements are the weight matrices of layers 1 to L; ℓ(y, f(x; W)) denotes the loss function, y being the desired output and f(x; W) the actual output; ddiag(·) keeps the main diagonal elements of its matrix argument and sets all off-diagonal elements to 0.
The constraint confines the weight matrix of each layer to a subspace of the manifold space; that is, the weight matrix w of each layer satisfies
ddiag(ww^T) = E  (7)
Solving the constrained problem by means of Riemannian optimization theory yields the Riemannian gradient in the manifold space:
wherein the first term is the gradient obtained without the constraint. When the weight vector ω of each neuron satisfies unit normalization, i.e. ω^T ω = 1, the Riemannian gradient is obtained on the basis of equation (8):
Compared with the original gradient, the Riemannian gradient differs by one subtracted term; the norm of this subtracted term is analyzed:
the original gradient is adopted for calculation to reduce the calculation amount, and the formula (11) is adopted for weight updating:
(2) batch normalization of data
The data batch normalization method is shown in formula (12): during training, the mean μ_B and the variance σ_B² are computed for the data of each batch, and each feature x_i is processed to obtain the batch-normalized activation y_i.
During testing, as shown in formulas (13) and (14), E[x] is represented by the mean of the means of all training batches, Var[x] by the unbiased estimate of the variance over all training batches, and m is the size of each batch:
E[x] = E_B[μ_B]  (13)
Var[x] = (m / (m − 1)) · E_B[σ_B²]  (14)
Therefore, in the testing stage, the data batch normalization formula is given by formula (15): y = γ · (x − E[x]) / √(Var[x] + ε) + β. The parameters γ and β perform scaling and translation, restoring the expressive capacity of the model and improving the generalization performance of the network.
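A minimal sketch of the training-time normalization of formula (12) and the test-time form of formulas (13)-(15); γ and β are scalars here for simplicity:

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Formula (12): normalize each feature with the batch mean and
    variance, then scale by gamma and shift by beta."""
    mu = x.mean(axis=0)        # mu_B
    var = x.var(axis=0)        # sigma_B^2 (biased, per the BN paper)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta, mu, var

def batch_norm_test(x, running_mean, running_var, gamma, beta, m, eps=1e-5):
    """Formulas (13)-(15): at test time use E[x] (the mean of the batch
    means) and the unbiased variance estimate m/(m-1) * E_B[sigma_B^2]."""
    var = running_var * m / (m - 1)
    return gamma * (x - running_mean) / np.sqrt(var + eps) + beta
```

During training the per-batch statistics would be accumulated into `running_mean` and `running_var`; the sketch passes a single batch's statistics for illustration.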
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910580586.6A CN110458802A (en) | 2019-06-28 | 2019-06-28 | Based on the projection normalized stereo image quality evaluation method of weight |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110458802A true CN110458802A (en) | 2019-11-15 |
Family
ID=68481840
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910580586.6A Pending CN110458802A (en) | 2019-06-28 | 2019-06-28 | Based on the projection normalized stereo image quality evaluation method of weight |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110458802A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111583377A (en) * | 2020-06-10 | 2020-08-25 | 江苏科技大学 | Volume rendering viewpoint evaluation and selection method for improving wind-driven optimization |
CN111915589A (en) * | 2020-07-31 | 2020-11-10 | 天津大学 | Stereo image quality evaluation method based on hole convolution |
CN112164056A (en) * | 2020-09-30 | 2021-01-01 | 南京信息工程大学 | No-reference stereo image quality evaluation method based on interactive convolution neural network |
CN112257709A (en) * | 2020-10-23 | 2021-01-22 | 北京云杉世界信息技术有限公司 | Signboard photo auditing method and device, electronic equipment and readable storage medium |
CN113205503A (en) * | 2021-05-11 | 2021-08-03 | 宁波海上鲜信息技术股份有限公司 | Satellite coastal zone image quality evaluation method |
CN117269992A (en) * | 2023-08-29 | 2023-12-22 | 中国民航科学技术研究院 | Satellite navigation multipath signal detection method and system based on convolutional neural network |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108389192A (en) * | 2018-02-11 | 2018-08-10 | 天津大学 | Stereo-picture Comfort Evaluation method based on convolutional neural networks |
CN108537777A (en) * | 2018-03-20 | 2018-09-14 | 西京学院 | A kind of crop disease recognition methods based on neural network |
CN108769671A (en) * | 2018-06-13 | 2018-11-06 | 天津大学 | Stereo image quality evaluation method based on adaptive blending image |
CN109360178A (en) * | 2018-10-17 | 2019-02-19 | 天津大学 | Based on blending image without reference stereo image quality evaluation method |
CN109671023A (en) * | 2019-01-24 | 2019-04-23 | 江苏大学 | A kind of secondary method for reconstructing of face image super-resolution |
CN109714592A (en) * | 2019-01-31 | 2019-05-03 | 天津大学 | Stereo image quality evaluation method based on binocular fusion network |
CN109902202A (en) * | 2019-01-08 | 2019-06-18 | 国家计算机网络与信息安全管理中心 | A kind of video classification methods and device |
2019-06-28: CN CN201910580586.6A patent/CN110458802A/en active Pending
Non-Patent Citations (2)
Title |
---|
Lei Huang et al.: "Projection Based Weight Normalization for Deep Neural Networks", 《ARXIV》 *
Sergey Ioffe et al.: "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift", 《Proceedings of the 32nd International Conference on Machine Learning》 *
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111583377A (en) * | 2020-06-10 | 2020-08-25 | 江苏科技大学 | Volume rendering viewpoint evaluation and selection method for improving wind-driven optimization |
CN111583377B (en) * | 2020-06-10 | 2024-01-09 | 江苏科技大学 | Improved wind-driven optimized volume rendering viewpoint evaluation and selection method |
CN111915589A (en) * | 2020-07-31 | 2020-11-10 | 天津大学 | Stereo image quality evaluation method based on hole convolution |
CN112164056A (en) * | 2020-09-30 | 2021-01-01 | 南京信息工程大学 | No-reference stereo image quality evaluation method based on interactive convolution neural network |
CN112164056B (en) * | 2020-09-30 | 2023-08-29 | 南京信息工程大学 | No-reference stereoscopic image quality evaluation method based on interactive convolutional neural network |
CN112257709A (en) * | 2020-10-23 | 2021-01-22 | 北京云杉世界信息技术有限公司 | Signboard photo auditing method and device, electronic equipment and readable storage medium |
CN112257709B (en) * | 2020-10-23 | 2024-05-07 | 北京云杉世界信息技术有限公司 | Signboard photo auditing method and device, electronic equipment and readable storage medium |
CN113205503A (en) * | 2021-05-11 | 2021-08-03 | 宁波海上鲜信息技术股份有限公司 | Satellite coastal zone image quality evaluation method |
CN117269992A (en) * | 2023-08-29 | 2023-12-22 | 中国民航科学技术研究院 | Satellite navigation multipath signal detection method and system based on convolutional neural network |
CN117269992B (en) * | 2023-08-29 | 2024-04-19 | 中国民航科学技术研究院 | Satellite navigation multipath signal detection method and system based on convolutional neural network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110458802A (en) | Based on the projection normalized stereo image quality evaluation method of weight | |
CN107633513B (en) | 3D image quality measuring method based on deep learning | |
CN107563422B (en) | A kind of polarization SAR classification method based on semi-supervised convolutional neural networks | |
CN109360178B (en) | Fusion image-based non-reference stereo image quality evaluation method | |
Zhou et al. | Blind quality estimator for 3D images based on binocular combination and extreme learning machine | |
CN109376787B (en) | Manifold learning network and computer vision image set classification method based on manifold learning network | |
CN108389192A (en) | Stereo-picture Comfort Evaluation method based on convolutional neural networks | |
CN111429402B (en) | Image quality evaluation method for fusion of advanced visual perception features and depth features | |
Liu et al. | No-reference quality assessment for contrast-distorted images | |
Wang et al. | GKFC-CNN: Modified Gaussian kernel fuzzy C-means and convolutional neural network for apple segmentation and recognition | |
CN108389189B (en) | Three-dimensional image quality evaluation method based on dictionary learning | |
Jiang et al. | Learning a referenceless stereopair quality engine with deep nonnegativity constrained sparse autoencoder | |
CN108875655A (en) | A kind of real-time target video tracing method and system based on multiple features | |
Sun et al. | Learning local quality-aware structures of salient regions for stereoscopic images via deep neural networks | |
CN109788275A (en) | Naturality, structure and binocular asymmetry are without reference stereo image quality evaluation method | |
Niu et al. | Siamese-network-based learning to rank for no-reference 2D and 3D image quality assessment | |
CN111882516B (en) | Image quality evaluation method based on visual saliency and deep neural network | |
CN111915589A (en) | Stereo image quality evaluation method based on hole convolution | |
Chang et al. | Blind image quality assessment by visual neuron matrix | |
Liu et al. | A multiscale approach to deep blind image quality assessment | |
Li et al. | MCANet: Multi-channel attention network with multi-color space encoder for underwater image classification | |
CN108428226B (en) | Distortion image quality evaluation method based on ICA sparse representation and SOM | |
CN113810683A (en) | No-reference evaluation method for objectively evaluating underwater video quality | |
CN116664462B (en) | Infrared and visible light image fusion method based on MS-DSC and I_CBAM | |
Guan et al. | No-reference stereoscopic image quality assessment on both complex contourlet and spatial domain via Kernel ELM |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20191115 |