CN114187261A - Non-reference stereo image quality evaluation method based on multi-dimensional attention mechanism - Google Patents
Non-reference stereo image quality evaluation method based on multi-dimensional attention mechanism
- Publication number
- CN114187261A (application number CN202111507792.8A)
- Authority
- CN
- China
- Prior art keywords
- image
- feature map
- network
- attention
- channel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06T7/0002—Image analysis; Inspection of images, e.g. flaw detection
- G06N3/045—Neural networks; Architecture, e.g. interconnection topology; Combinations of networks
- G06N3/047—Probabilistic or stochastic networks
- G06N3/048—Activation functions
- G06N3/08—Learning methods
- G06T2207/20081—Training; Learning
- G06T2207/20084—Artificial neural networks [ANN]
- G06T2207/30168—Image quality inspection
Abstract
The invention relates to a non-reference stereo image quality evaluation method based on a multi-dimensional attention mechanism, which comprises the following steps: preprocessing the original stereo images used for training, converting them into grayscale images and dividing them into non-overlapping small image blocks, assigning the true quality score of the image to each image block, and randomly selecting a number of image blocks as the input of the network model; training a convolutional neural network based on multi-dimensional attention, which comprises: (1) extracting primary features from the left and right views using a group of CCP modules that perform convolution and pooling operations, processing the left and right views to obtain primary feature maps; (2) feeding the primary feature maps of the left and right views into a view fusion sub-network and calculating a fused feature map; (3) inputting the fused feature map into a multi-scale feature enhancement sub-network based on multi-dimensional attention and predicting the quality score of the image; (4) calculating the loss function of the network and performing iterative training; and (5) evaluating image quality using the trained network.
Description
Technical Field
The invention relates to the field of stereo image quality evaluation, and in particular to a no-reference evaluation algorithm that simulates binocular competition and the visual attention mechanism.
Background
Stereo image quality evaluation algorithms can be divided into subjective and objective evaluation according to the evaluating subject. A subjective evaluation algorithm requires test subjects to score image quality against a set of given indices in a controlled experimental environment, after which the average score of the image is calculated. Subjective evaluation usually yields a Mean Opinion Score (MOS) or a Differential Mean Opinion Score (DMOS). An objective evaluation algorithm simulates the human visual system by means of a mathematical model and then evaluates image quality. Since humans are the ultimate recipients of images, subjective evaluation is typically more accurate. However, subjective evaluation is time-consuming, cannot be performed in real time, is expensive, and is easily influenced by the subjects. Compared with subjective evaluation, objective quality evaluation based on an algorithm does not require extensive manual participation: once a suitable prediction model is designed, the quality score of an image can be obtained through stereo image feature extraction, model training and similar processes, so objective evaluation has become a research focus.
Objective Stereoscopic Image Quality Assessment (SIQA) methods can be classified into three types, Full Reference (FR), Reduced Reference (RR) and No Reference (NR), according to the degree to which they depend on a reference image. In practical environments a reference image may be unavailable or difficult to obtain, so NR-SIQA, which does not depend on a reference image, has a wider application range and is gradually becoming the mainstream research direction.
Early NR-SIQA applied mature 2D image quality evaluation algorithms directly to the individual views of a stereoscopic image and then expressed the quality score of the stereoscopic image as the average of the left- and right-view scores. However, these algorithms do not consider binocular vision characteristics and therefore cannot accurately evaluate the quality of a stereoscopic image. As the understanding of the human visual mechanism has deepened, methods based on disparity response and binocular vision characteristics have been proposed, and some methods further combine visual saliency models to better simulate human visual information processing. However, because of the hierarchical structure and complexity of the Human Visual System (HVS), the performance of current SIQA methods based on hand-crafted feature extraction remains unsatisfactory.
With the rise of deep learning, attempts have been made in recent years to solve the image quality evaluation problem with deep learning. Unlike hand-crafted feature extraction, deep learning methods typically use a Convolutional Neural Network (CNN) model to extract features automatically. Thanks to the large number of parameters and the self-learning capability of the network, CNN-based SIQA methods achieve accurate evaluation performance.
Disclosure of Invention
The invention provides a non-reference stereo image quality evaluation algorithm based on a multidimensional attention mechanism, which can better simulate binocular competition and the visual attention mechanism of a human visual system, and the technical scheme is as follows:
a non-reference stereo image quality evaluation method based on a multi-dimensional attention mechanism is characterized by comprising the following steps:
firstly, preprocessing the original stereo image used for training, converting it into a grayscale image and dividing it into non-overlapping small image blocks, assigning the true quality score of the image to each image block, and randomly selecting a number of image blocks as the input of the network model;
secondly, training a convolutional neural network based on multidimensional attention, wherein the method comprises the following steps:
(1) extracting primary features from the left and right views using a group of CCP modules that perform convolution and pooling operations, and processing the left and right views to obtain primary feature maps;
(2) feeding the primary feature maps of the left and right views into a view fusion sub-network and calculating a fused feature map: the view fusion sub-network comprises a multi-dimensional attention module consisting of a channel attention module and a spatial attention module; in the channel attention module, the input primary feature map passes through two identical branches; in each branch, channel dimensionality reduction is performed, followed by global average pooling, the weight of each channel is obtained through a fully connected layer and a Sigmoid activation function, and the feature map of each channel is weighted to obtain a channel-attention-weighted feature map; in the spatial attention module, the two parallel channel-attention-weighted feature maps are dimension-transformed and multiplied as matrices, the weight of each view combining channel and spatial attention is obtained through a Softmax activation function, and the primary feature maps of the left and right views are weighted with these weights to obtain a fused feature map;
(3) inputting the fused feature map into a multi-scale feature enhancement sub-network based on multi-dimensional attention and predicting the quality score of the image: the multi-dimensional attention-based multi-scale feature enhancement sub-network uses three groups of CCP modules performing convolution and pooling operations to extract feature maps of the dimension-transformed fused feature map at three different scales, the feature map at the smallest scale being called the original deep fused feature map; the feature map at each scale is input into a multi-dimensional attention module and undergoes channel dimensionality reduction to obtain three dimension-reduced feature maps; two of the dimension-reduced feature maps are channel-weighted by a channel attention module and then subjected to dimension transformation, matrix multiplication and a Softmax activation function to obtain a multi-dimensional attention weight, which is used to weight the third of the three dimension-reduced feature maps to obtain a feature map based on the multi-dimensional attention mechanism; the feature maps based on the multi-dimensional attention mechanism obtained at the three scales are fused by up-sampling, and three CCP modules perform feature extraction on the fused result to obtain a deep multi-scale feature enhancement map based on multi-dimensional attention; this map is added to the original deep fused feature map to obtain an enhanced feature map, which is fed into a fully connected layer to predict the image quality score;
(4) calculating the loss function of the network and performing iterative training: after the predicted image quality score is obtained, the loss function of the network is calculated; the loss function uses the Root Mean Square Error (RMSE) with an added L2 regularization term to prevent over-fitting, and measures the difference between the image quality score predicted by the network and the true quality score; through repeated iterations during training, the network parameters are continuously updated to minimize the loss function so that the predicted quality score approaches the true score, yielding a trained network model;
and thirdly, evaluating image quality using the trained network.
Wherein the CCP module comprises two 3 x 3 convolutional layers and one pooling layer.
Further, in the step (2) of the second step, channel dimensionality reduction is performed in each branch through a convolution kernel with the size of 1 × 1, then global average pooling is performed, the weight of each channel is obtained through a full connection layer and a Sigmoid activation function, and a feature map of each channel is weighted by using Scale operation to obtain a feature map weighted by channel attention.
Further, in the step (3) of the second step, the feature map on each scale is input into a multidimensional attention module, and the channel dimensionality reduction is performed through three parallel 1 × 1 convolution operations to obtain three dimensionality-reduced feature maps.
Further, the method of the third step is as follows: preprocessing the stereo image to be evaluated, inputting the preprocessed image into the network, and averaging the quality scores of the image blocks output by the network to obtain the quality score of the whole image.
The technical scheme provided by the invention has the following beneficial effects: the invention makes full use of the HVS, calculating weights for the left and right views through a multi-dimensional attention mechanism and using them to weight the two views into a fused view, thereby simulating the binocular fusion and binocular competition mechanisms of the HVS. By performing multi-scale feature extraction on the fused view and using multi-dimensional attention to enhance the features at different scales, so that weights are assigned to information at different scales, the visual attention mechanism of the HVS is simulated. These characteristics make the method usable in technical practice, for example in evaluating the transmission performance of new media such as 3D television and 3D movies; the evaluation results of the algorithm are highly consistent with subjective human evaluation, which is of significant value.
Drawings
FIG. 1 is the overall block diagram of the algorithm;
FIG. 2 is the block diagram of the multi-dimensional attention module.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.
Example 1
The embodiment of the invention provides a non-reference stereo image quality evaluation algorithm based on a multi-dimensional attention mechanism, which is further explained below with reference to the accompanying drawings. The invention is realized through the following steps:
first, the original stereo image used for training is preprocessed.
The original stereo image used for training is converted into a grayscale image, and the left and right views are each divided into 220 non-overlapping 32 × 32 image blocks. Each image block is assigned the true quality score of the image, and 30% of the image blocks, namely 66 blocks, are randomly selected as the input of the network. This random selection reduces the time complexity of image preprocessing and improves the generalization ability of the network.
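As a concrete illustration, this preprocessing can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function names, the use of NumPy, the random generator, and the choice to sample the same block positions in both views are illustrative and are not taken from the patent.

```python
import numpy as np

def extract_patches(gray_view: np.ndarray, patch_size: int = 32) -> np.ndarray:
    """Split a grayscale view (H x W) into non-overlapping patch_size x patch_size blocks."""
    h, w = gray_view.shape
    h_crop, w_crop = h - h % patch_size, w - w % patch_size
    patches = (gray_view[:h_crop, :w_crop]
               .reshape(h_crop // patch_size, patch_size, w_crop // patch_size, patch_size)
               .swapaxes(1, 2)
               .reshape(-1, patch_size, patch_size))
    return patches

def sample_training_blocks(left_gray, right_gray, dmos, ratio=0.3, rng=None):
    """Randomly keep 30% of the block positions (assumed identical in both views);
    every kept block inherits the image-level quality score."""
    rng = rng or np.random.default_rng()
    left_p, right_p = extract_patches(left_gray), extract_patches(right_gray)
    n_keep = int(len(left_p) * ratio)                 # e.g. 66 of 220 blocks
    idx = rng.choice(len(left_p), size=n_keep, replace=False)
    labels = np.full(n_keep, dmos, dtype=np.float32)  # each block gets the image's true score
    return left_p[idx], right_p[idx], labels
```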
In a second step, a multi-dimensional attention-based convolutional neural network is trained; the network comprises a multi-dimensional attention-based view fusion sub-network and a multi-scale feature enhancement sub-network. The view fusion sub-network contains a multi-dimensional attention module that calculates weights for the left and right views and uses them to weight the two views into a fused view. The fused feature map then passes through the multi-dimensional attention-based multi-scale feature enhancement sub-network, where multi-scale extraction and attention weighting yield an enhanced feature map, and finally a fully connected layer produces the quality score of the image.
(1) Primary features are extracted from the left and right views using a set of CCP modules that perform convolution and pooling operations.
The left and right views are initially processed using the CCP module to obtain 16 × 16 × 32 primary feature maps, where the CCP module includes two 3 × 3 convolutional layers and one pooling layer.
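A minimal PyTorch sketch of one CCP module follows. The patent only fixes the structure (two 3 × 3 convolutional layers and one pooling layer) and the resulting 16 × 16 × 32 size; the padding, the ReLU activations, and the use of max pooling are assumptions made here for illustration.

```python
import torch
import torch.nn as nn

class CCP(nn.Module):
    """Two 3x3 convolutions followed by one pooling layer (convolution-convolution-pooling)."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),   # halves the spatial resolution
        )

    def forward(self, x):
        return self.block(x)

# a 32x32 grayscale patch becomes a 16x16x32 primary feature map, matching the sizes in the text
ccp = CCP(in_ch=1, out_ch=32)
primary = ccp(torch.randn(1, 1, 32, 32))   # shape: (1, 32, 16, 16)
```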
(2) The primary feature maps of the left and right views are fed into the view fusion sub-network, and the fused feature map is calculated.
The network structure of the multi-dimensional attention-based view fusion sub-network is shown in FIG. 1. The sub-network consists of two parts: a channel attention module and a spatial attention module. In the channel attention module, the input primary feature map passes through two identical branches. In each branch, channel dimensionality reduction is performed with a 1 × 1 convolution kernel, a 1 × 1 × 16 feature map is obtained by global average pooling, the weight of each channel is obtained through two fully connected layers and a Sigmoid activation function, and the feature map of each channel is weighted by a Scale operation to obtain a 16 × 16 × 16 channel-attention-weighted feature map. In the spatial attention module, the two parallel channel-attention-weighted feature maps are dimension-transformed and multiplied as matrices, and weights for the left and right views combining channel and spatial attention are obtained through a Softmax activation function. The primary feature maps of the left and right views are weighted and fused with these weights, thereby simulating the binocular fusion and binocular competition mechanisms of the HVS. The calculation formula is as follows:
FM_C = W_L ⊗ FM_L ⊕ W_R ⊗ FM_R    (1)

where FM_C, FM_L and FM_R denote the fused feature map, the left-view primary feature map and the right-view primary feature map respectively, W_L and W_R are the left- and right-view weights calculated by the multi-dimensional attention mechanism, and ⊕ and ⊗ denote matrix addition and matrix multiplication respectively.
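The following PyTorch sketch shows one possible reading of this view fusion sub-network. The patent specifies the ingredients (1 × 1 channel reduction to 16 channels, global average pooling, two fully connected layers with a Sigmoid, a Scale operation, then dimension transformation, matrix multiplication and Softmax), but not the exact tensor layout of the spatial attention step, so the reshape/bmm arrangement and the class names below are assumptions rather than the patent's definitive design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttentionBranch(nn.Module):
    """1x1 reduction (32->16 channels), global average pooling, two FC layers + Sigmoid, Scale."""
    def __init__(self, in_ch: int = 32, red_ch: int = 16):
        super().__init__()
        self.reduce = nn.Conv2d(in_ch, red_ch, kernel_size=1)
        self.fc = nn.Sequential(nn.Linear(red_ch, red_ch), nn.ReLU(inplace=True),
                                nn.Linear(red_ch, red_ch), nn.Sigmoid())

    def forward(self, x):
        r = self.reduce(x)                                   # (B, 16, 16, 16)
        w = self.fc(r.mean(dim=(2, 3)))                      # GAP -> channel weights (B, 16)
        return r * w[:, :, None, None]                       # Scale: channel-attention-weighted map

class ViewFusion(nn.Module):
    """Fuses the left and right primary feature maps: FM_C = W_L (x) FM_L (+) W_R (x) FM_R."""
    def __init__(self, in_ch: int = 32, red_ch: int = 16):
        super().__init__()
        self.branch_l = ChannelAttentionBranch(in_ch, red_ch)
        self.branch_r = ChannelAttentionBranch(in_ch, red_ch)

    def forward(self, fm_l, fm_r):                           # primary feature maps (B, 32, 16, 16)
        a_l = self.branch_l(fm_l).flatten(2)                 # (B, 16, N), N = 256
        a_r = self.branch_r(fm_r).flatten(2)
        sim = torch.bmm(a_l.transpose(1, 2), a_r)            # cross-view similarity (B, N, N), assumed layout
        w_l = F.softmax(sim, dim=-1)                         # spatial weight for the left view
        w_r = F.softmax(sim.transpose(1, 2), dim=-1)         # spatial weight for the right view
        fused = torch.bmm(fm_l.flatten(2), w_l) + torch.bmm(fm_r.flatten(2), w_r)
        return fused.view_as(fm_l)                           # fused feature map (B, 32, 16, 16)
```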
(3) The fused feature map is input into the multi-dimensional attention-based multi-scale feature enhancement sub-network, and the quality score of the image is predicted.
The network structure of the multi-dimensional attention-based multi-scale feature enhancement sub-network is shown in FIG. 1. Through three groups of convolution and pooling (CCP) operations, the sub-network extracts feature maps at three scales from the dimension-transformed 16 × 16 × 32 fused feature map, namely 8 × 8 × 64, 4 × 4 × 128 and 2 × 2 × 256; the 2 × 2 × 256 feature map is referred to as the original deep feature map. The feature map at each scale is passed through a multi-dimensional attention module, whose structure is shown in FIG. 2.
In this module, the input feature map undergoes channel dimensionality reduction through three parallel 1 × 1 convolutions, yielding three dimension-reduced feature maps. Two of them are channel-weighted by a channel attention module, dimension-transformed and multiplied as matrices, and passed through a Softmax activation function to obtain the multi-dimensional attention weight; this weight is used to weight the third of the three dimension-reduced feature maps, and a 1 × 1 convolution then produces the feature map based on the multi-dimensional attention mechanism.
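A hedged PyTorch sketch of this multi-dimensional attention module is given below. The text lists the operations (three parallel 1 × 1 reductions, channel attention on two of the reduced maps, dimension transformation, matrix multiplication and Softmax, weighting of the third map, and a final 1 × 1 convolution), but not the tensor shapes; the non-local-style layout, the squeeze-and-excitation realization of the channel attention, and the reduction ratio are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelGate(nn.Module):
    """Squeeze-and-excitation-style channel weighting (an assumed realization of the channel attention)."""
    def __init__(self, ch: int, ratio: int = 4):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(ch, ch // ratio), nn.ReLU(inplace=True),
                                nn.Linear(ch // ratio, ch), nn.Sigmoid())

    def forward(self, x):
        return x * self.fc(x.mean(dim=(2, 3)))[:, :, None, None]

class MultiDimAttention(nn.Module):
    def __init__(self, in_ch: int, red_ch: int = None):
        super().__init__()
        red_ch = red_ch or in_ch // 2
        self.q = nn.Conv2d(in_ch, red_ch, 1)     # three parallel 1x1 reductions
        self.k = nn.Conv2d(in_ch, red_ch, 1)
        self.v = nn.Conv2d(in_ch, red_ch, 1)
        self.gate_q, self.gate_k = ChannelGate(red_ch), ChannelGate(red_ch)
        self.out = nn.Conv2d(red_ch, in_ch, 1)   # final 1x1 convolution back to the input width

    def forward(self, x):
        b, _, h, w = x.shape
        q = self.gate_q(self.q(x)).flatten(2)                       # channel-weighted, (B, C', N)
        k = self.gate_k(self.k(x)).flatten(2)
        v = self.v(x).flatten(2)                                     # third, un-gated reduced map
        attn = F.softmax(torch.bmm(q.transpose(1, 2), k), dim=-1)    # multi-dimensional attention weight (B, N, N)
        y = torch.bmm(v, attn).view(b, -1, h, w)                     # weight the third map with the attention
        return self.out(y)
```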
Feature maps of size 8 × 8 × 64, 4 × 4 × 128 and 2 × 2 × 256 based on the multi-dimensional attention mechanism are thus obtained at the three scales, and the three maps are fused by up-sampling into a 16 × 16 × 32 attention-weighted feature map, as sketched below. This operation assigns corresponding weights to feature maps of different scales, simulating the degree of attention the HVS pays to objects of different sizes in the image. Three CCP modules then perform deep feature extraction on the attention-weighted feature map; the result is added to the original deep feature map to obtain an enhanced 2 × 2 × 256 feature map, and a fully connected layer outputs the predicted image quality score.
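The multi-scale fusion and enhancement path might be realized as in the sketch below. How the three attention-weighted maps are brought to a common 16 × 16 × 32 size is not spelled out in the text, so the 1 × 1 projections, bilinear up-sampling, element-wise summation, and the sizes of the fully connected head are assumptions; CCP is the module sketched after step (1).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleEnhance(nn.Module):
    """Fuses the attention-weighted maps from the three scales, extracts deep features with
    three CCP modules, adds the original deep feature map, and predicts the patch score."""
    def __init__(self):
        super().__init__()
        self.proj = nn.ModuleList([nn.Conv2d(c, 32, 1) for c in (64, 128, 256)])   # assumed projections
        self.ccp = nn.Sequential(CCP(32, 64), CCP(64, 128), CCP(128, 256))          # CCP from the earlier sketch
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(2 * 2 * 256, 512),
                                  nn.ReLU(inplace=True), nn.Linear(512, 1))         # assumed FC sizes

    def forward(self, attn_maps, deep_fm):   # attn_maps: [(B,64,8,8), (B,128,4,4), (B,256,2,2)]
        fused = sum(F.interpolate(p(m), size=(16, 16), mode='bilinear', align_corners=False)
                    for p, m in zip(self.proj, attn_maps))    # attention-weighted 16x16x32 map
        enhanced = self.ccp(fused) + deep_fm                   # add deep features to the original deep map
        return self.head(enhanced)                             # predicted quality score of the patch
```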
(4) A loss function of the network is calculated.
During network training, the loss function of the network model uses the Root Mean Square Error (RMSE) with an added L2 regularization term to prevent over-fitting, and is calculated as follows:
L = √( (1/N) · Σ_{i=1}^{N} (q_i − q̂_i)² ) + α‖ω‖₂²    (2)

where N denotes the number of image blocks, q_i denotes the true Differential Mean Opinion Score (DMOS) of the image, and q̂_i denotes the value predicted by the network model; the second part of the formula is the L2 regularization term, α is the regularization coefficient, and ω is the weight vector of the network being trained. The loss function reflects the difference between the prediction and the true value; through repeated iterations during training, the network parameters are continuously updated to minimize the loss function, so that the quality score predicted by the network approaches the true score and the network performs better.
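A minimal sketch of this loss in PyTorch, assuming the L2 term is taken over all trainable weights and that α is a small constant chosen by the user (the patent does not give its value):

```python
import torch

def rmse_l2_loss(pred, target, model, alpha=1e-4):
    """RMSE between predicted and true block scores plus an L2 penalty on the weights, as in (2)."""
    rmse = torch.sqrt(torch.mean((pred - target) ** 2))     # data term
    l2 = sum((p ** 2).sum() for p in model.parameters())    # ||w||_2^2 over the network weights
    return rmse + alpha * l2
```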
In the third step, image quality is evaluated using the trained network.
(1) The stereo image to be evaluated is preprocessed.
The stereo image to be evaluated is converted into a grayscale image, the left and right views are each divided into 220 non-overlapping 32 × 32 image blocks, and 30% of the blocks, namely 66 blocks, are selected as the input of the network.
(2) The quality score of the stereo image to be evaluated is calculated.
The image block quality scores of the same stereo image output by the network are averaged to obtain the quality score of the whole stereo image, and the calculation formula is as follows:
Q = (1/N) · Σ_{i=1}^{N} q̂_i    (3)

where Q represents the quality score of the entire stereoscopic image, N the number of sampled image blocks and q̂_i the predicted score of the i-th block. The SROCC and PLCC are then obtained from the final predictions and the true DMOS values of the stereo images in order to evaluate the network performance.
The parameters of the whole network are detailed in table 1.
TABLE 1 network architecture parameters
Example 3
The feasibility of the scheme of Example 1 is verified in conjunction with specific experiments, as described in detail below:
this experiment used LIVE 3D, two public 3D image databases of hydroloo IVC 3D to test the performance. Each database contains a number of images with different distortion types. The quality of the image is described by Mean Opinion Scores (MOSs) or Differential Mean Opinion Scores (DMOS), where a larger MOS value indicates a better image quality and a lower DMOS value indicates a better image quality.
To measure the accuracy, monotonicity and consistency of an objective evaluation algorithm, two common indices are generally adopted: the Spearman rank-order correlation coefficient (SROCC) and the Pearson linear correlation coefficient (PLCC). SROCC describes the monotonicity of an image quality assessment algorithm, and its expression is as follows:
in equation 4, the parameter diRepresenting the difference between the objective score of the ith image and its subjective quality score ranking. I then represents the total number of images contained in the database. PLCC is a linear correlation coefficient between objective scores obtained by an algorithm and subjective quality scores of images after nonlinear regression processing, and the calculation formula is as follows:
in the formula 5, qiAnd SFiRespectively representing the subjective score and the predicted value, mu, of the ith imageqAndrespectively, represent the mean of the two. Both correlation coefficients have values in the range of-1 to 1, with larger values indicating better network performance.
The Spearman rank-order correlation coefficient and the Pearson linear correlation coefficient are used to measure the consistency between the scores of the objective quality evaluation algorithm and the subjective DMOS scores in the database. The higher the correlation between the subjective and objective scores, the better the performance of the algorithm.
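For reference, the two indices can be computed with SciPy as in the sketch below. The patent applies PLCC after a nonlinear regression step; that fitting is omitted here, so this is a simplified illustration rather than the exact evaluation protocol.

```python
import numpy as np
from scipy import stats

def evaluate(pred_scores: np.ndarray, dmos: np.ndarray):
    """SROCC and PLCC between predicted image scores and subjective DMOS values."""
    srocc, _ = stats.spearmanr(pred_scores, dmos)
    plcc, _ = stats.pearsonr(pred_scores, dmos)
    return srocc, plcc
```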
To verify the performance of the invention, 7 mainstream no-reference stereo image quality evaluation algorithms are selected for comparison on the LIVE 3D database. These include 4 traditional evaluation algorithms (3D-AdaBoost, BVCDP, BSFML and SA) and 3 CNN-based algorithms (DCNN, RM-CNN and VSM-CNN). On the Waterloo IVC 3D database, since few deep-learning-based evaluation algorithms have been tested on it, 4 traditional evaluation algorithms are chosen for comparison: SINQ, DBN, BSIQE and BVCDP. The results are shown in Tables 2 and 3.
TABLE 2 LIVE 3D image database-based algorithmic Performance comparison
TABLE 3 comparison of Algorithm Performance based on Waterloo IVC 3D image database
TABLE 4 specific distortion type Performance comparison based on LIVE 3D Phase I image database
TABLE 5 specific distortion type Performance comparison based on LIVE 3D Phase II image database
Tables 4 and 5 show the results of the invention for specific distortion types on the LIVE 3D Phase I and LIVE 3D Phase II databases. In each column, the best result is shown in bold. As can be seen from Tables 4 and 5, the invention outperforms all compared methods in both SROCC and PLCC on images of several distortion types, and its mean SROCC and PLCC over these specific distortion types exceed 0.9. Overall, compared with other networks, the method adapts to a variety of distortion conditions and shows high consistency with subjective human evaluation.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (5)
1. A non-reference stereo image quality evaluation method based on a multi-dimensional attention mechanism is characterized by comprising the following steps:
firstly, preprocessing the original stereo image used for training, converting it into a grayscale image and dividing it into non-overlapping small image blocks, assigning the true quality score of the image to each image block, and randomly selecting a number of image blocks as the input of the network model;
secondly, training a convolutional neural network based on multi-dimensional attention, which comprises the following steps:
(1) extracting primary features from the left and right views using a group of CCP modules that perform convolution and pooling operations, and processing the left and right views to obtain primary feature maps;
(2) feeding the primary feature maps of the left and right views into a view fusion sub-network and calculating a fused feature map: the view fusion sub-network comprises a multi-dimensional attention module consisting of a channel attention module and a spatial attention module; in the channel attention module, the input primary feature map passes through two identical branches; in each branch, channel dimensionality reduction is performed, followed by global average pooling, the weight of each channel is obtained through a fully connected layer and a Sigmoid activation function, and the feature map of each channel is weighted to obtain a channel-attention-weighted feature map; in the spatial attention module, the two parallel channel-attention-weighted feature maps are dimension-transformed and multiplied as matrices, the weight of each view combining channel and spatial attention is obtained through a Softmax activation function, and the primary feature maps of the left and right views are weighted with these weights to obtain a fused feature map;
(3) inputting the fused feature map into a multi-scale feature enhancement sub-network based on multi-dimensional attention and predicting the quality score of the image: the multi-dimensional attention-based multi-scale feature enhancement sub-network uses three groups of CCP modules performing convolution and pooling operations to extract feature maps of the dimension-transformed fused feature map at three different scales, the feature map at the smallest scale being called the original deep fused feature map; the feature map at each scale is input into a multi-dimensional attention module and undergoes channel dimensionality reduction to obtain three dimension-reduced feature maps; two of the dimension-reduced feature maps are channel-weighted by a channel attention module and then subjected to dimension transformation, matrix multiplication and a Softmax activation function to obtain a multi-dimensional attention weight, which is used to weight the third of the three dimension-reduced feature maps to obtain a feature map based on the multi-dimensional attention mechanism; the feature maps based on the multi-dimensional attention mechanism obtained at the three scales are fused by up-sampling, and three CCP modules perform feature extraction on the fused result to obtain a deep multi-scale feature enhancement map based on multi-dimensional attention; this map is added to the original deep fused feature map to obtain an enhanced feature map, which is fed into a fully connected layer to predict the image quality score;
(4) calculating the loss function of the network and performing iterative training: after the predicted image quality score is obtained, the loss function of the network is calculated; the loss function uses the Root Mean Square Error (RMSE) with an added L2 regularization term to prevent over-fitting, and measures the difference between the image quality score predicted by the network and the true quality score; through repeated iterations during training, the network parameters are continuously updated to minimize the loss function so that the predicted quality score approaches the true score, yielding a trained network model;
and thirdly, evaluating image quality using the trained network.
2. The method of claim 1, wherein the CCP module includes two 3 × 3 convolutional layers and one pooling layer.
3. The method for evaluating the quality of the non-reference stereo image according to claim 1, wherein in the step (2) of the second step, channel dimensionality reduction is performed in each branch through a convolution kernel with the size of 1 x 1, then global average pooling is performed, the weight of each channel is obtained through a full connection layer and a Sigmoid activation function, and a Scale operation is used for weighting the feature map of each channel to obtain a channel attention weighted feature map.
4. The method for evaluating the quality of the non-reference stereo image according to claim 1, wherein in the step (3) of the second step, the feature map on each scale is input into a multi-dimensional attention module, and the three feature maps after dimension reduction are obtained by performing channel dimension reduction through three parallel 1 x 1 convolution operations.
5. The method for evaluating the quality of a reference-free stereoscopic image according to claim 1, wherein the third step comprises: preprocessing the stereo image to be evaluated, inputting the preprocessed image into the network, and averaging the quality scores of the image blocks output by the network to obtain the quality score of the whole image.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111487714 | 2021-12-07 | ||
CN2021114877146 | 2021-12-07 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114187261A true CN114187261A (en) | 2022-03-15 |
CN114187261B CN114187261B (en) | 2024-08-27 |
Family
ID=80543149
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111507792.8A Active CN114187261B (en) | 2021-12-07 | 2021-12-10 | Multi-dimensional attention mechanism-based non-reference stereoscopic image quality evaluation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114187261B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110060236A (en) * | 2019-03-27 | 2019-07-26 | 天津大学 | Stereo image quality evaluation method based on depth convolutional neural networks |
CN112183645A (en) * | 2020-09-30 | 2021-01-05 | 深圳龙岗智能视听研究院 | Image aesthetic quality evaluation method based on context-aware attention mechanism |
CN112634238A (en) * | 2020-12-25 | 2021-04-09 | 武汉大学 | Image quality evaluation method based on attention module |
CN112884682A (en) * | 2021-01-08 | 2021-06-01 | 福州大学 | Stereo image color correction method and system based on matching and fusion |
CN113706386A (en) * | 2021-09-04 | 2021-11-26 | 大连钜智信息科技有限公司 | Super-resolution reconstruction method based on attention mechanism |
Non-Patent Citations (1)
Title |
---|
富振奇;费延佳;杨艳;邵枫;: "基于深层特征学习的无参考立体图像质量评价", 光电子・激光, no. 05, 15 May 2018 (2018-05-15) * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114897854A (en) * | 2022-05-20 | 2022-08-12 | 辽宁大学 | No-reference stereo image quality evaluation method based on double-current interactive network |
CN115272776A (en) * | 2022-09-26 | 2022-11-01 | 山东锋士信息技术有限公司 | Hyperspectral image classification method based on double-path convolution and double attention and storage medium |
CN115272776B (en) * | 2022-09-26 | 2023-01-20 | 山东锋士信息技术有限公司 | Hyperspectral image classification method based on double-path convolution and double attention and storage medium |
CN115661911A (en) * | 2022-12-23 | 2023-01-31 | 四川轻化工大学 | Face feature extraction method, device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN114187261B (en) | 2024-08-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114187261B (en) | Multi-dimensional attention mechanism-based non-reference stereoscopic image quality evaluation method | |
CN111182292B (en) | No-reference video quality evaluation method and system, video receiver and intelligent terminal | |
CN109360178B (en) | Fusion image-based non-reference stereo image quality evaluation method | |
CN110060236B (en) | Stereoscopic image quality evaluation method based on depth convolution neural network | |
CN110728656A (en) | Meta-learning-based no-reference image quality data processing method and intelligent terminal | |
CN110516716B (en) | No-reference image quality evaluation method based on multi-branch similarity network | |
CN109872305B (en) | No-reference stereo image quality evaluation method based on quality map generation network | |
CN110570363A (en) | Image defogging method based on Cycle-GAN with pyramid pooling and multi-scale discriminator | |
Yue et al. | Blind stereoscopic 3D image quality assessment via analysis of naturalness, structure, and binocular asymmetry | |
CN108389192A (en) | Stereo-picture Comfort Evaluation method based on convolutional neural networks | |
CN111429402B (en) | Image quality evaluation method for fusion of advanced visual perception features and depth features | |
CN108235003B (en) | Three-dimensional video quality evaluation method based on 3D convolutional neural network | |
CN112489164B (en) | Image coloring method based on improved depth separable convolutional neural network | |
CN112767385B (en) | No-reference image quality evaluation method based on significance strategy and feature fusion | |
CN112419242A (en) | No-reference image quality evaluation method based on self-attention mechanism GAN network | |
CN109816646B (en) | Non-reference image quality evaluation method based on degradation decision logic | |
Si et al. | A no-reference stereoscopic image quality assessment network based on binocular interaction and fusion mechanisms | |
CN109859166A (en) | It is a kind of based on multiple row convolutional neural networks without ginseng 3D rendering method for evaluating quality | |
CN113554599A (en) | Video quality evaluation method based on human visual effect | |
CN115205196A (en) | No-reference image quality evaluation method based on twin network and feature fusion | |
Jiang et al. | Stereoscopic image quality assessment by learning non-negative matrix factorization-based color visual characteristics and considering binocular interactions | |
CN111667407A (en) | Image super-resolution method guided by depth information | |
CN114066812B (en) | No-reference image quality evaluation method based on spatial attention mechanism | |
CN114972232A (en) | No-reference image quality evaluation method based on incremental meta-learning | |
CN106022362A (en) | Reference-free image quality objective evaluation method for JPEG2000 compression distortion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |