CN114187261A - Non-reference stereo image quality evaluation method based on multi-dimensional attention mechanism - Google Patents


Publication number
CN114187261A
Authority
CN
China
Prior art keywords
image
feature map
network
attention
channel
Legal status
Granted
Application number
CN202111507792.8A
Other languages
Chinese (zh)
Other versions
CN114187261B (en)
Inventor
沈丽丽
李昕彤
潘兆庆
陈雄飞
Current Assignee
Tianjin University
Original Assignee
Tianjin University
Application filed by Tianjin University
Publication of CN114187261A
Application granted
Publication of CN114187261B
Legal status: Active


Classifications

    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30168 Image quality inspection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a non-reference stereo image quality evaluation method based on a multi-dimensional attention mechanism, comprising the following steps: preprocessing the original stereo images used for training, which includes converting each image to grayscale, dividing it into non-overlapping small image blocks, assigning the image's ground-truth quality score to each block, and randomly selecting a number of blocks as the input of the network model; and training a convolutional neural network based on multi-dimensional attention, which comprises: (1) extracting primary features from the left and right views with a group of CCP modules that perform convolution and pooling, yielding a primary feature map for each view; (2) feeding the primary feature maps of the left and right views into a view fusion sub-network to compute a fused feature map; (3) feeding the fused feature map into a multi-scale feature enhancement sub-network based on multi-dimensional attention to predict the quality score of the image; (4) computing the loss function of the network and training iteratively; and (5) evaluating image quality with the trained network.

Description

Non-reference stereo image quality evaluation method based on multi-dimensional attention mechanism
Technical Field
The invention relates to the field of stereo image quality evaluation, and in particular to a no-reference evaluation algorithm that simulates binocular rivalry and the visual attention mechanism.
Background
Stereo image quality evaluation can be divided into subjective and objective evaluation according to who performs the evaluation. Subjective evaluation requires test subjects to score image quality against given criteria under controlled experimental conditions, after which the average score of each image is computed; it usually yields Mean Opinion Scores (MOS) and Differential Mean Opinion Scores (DMOS). Objective evaluation simulates the human visual system with a mathematical model and uses that model to evaluate image quality. Because humans are the ultimate recipients of images, subjective evaluation is typically more accurate. However, it is time-consuming and costly, cannot be performed in real time, and is easily influenced by the individual subjects. Objective, algorithm-based evaluation, by contrast, needs little manual participation: once a prediction model has been designed, the quality score of an image is obtained through stereo image feature extraction, model training, and related processes. Objective quality evaluation has therefore become a research focus.
Objective stereoscopic image quality evaluation (SIQA) methods can be classified into three types according to their dependence on a reference image: Full Reference (FR), Reduced Reference (RR), and No Reference (NR). In practice, a reference image may be unavailable or difficult to obtain, so NR-SIQA, which does not depend on a reference image, has a wider range of application and is gradually becoming the mainstream research direction.
Early NR-SIQA methods applied mature planar (2D) image quality evaluation algorithms directly to the individual views of a stereoscopic image and then expressed the quality score of the stereoscopic image as the average of the left- and right-view scores. Because these algorithms ignore binocular vision characteristics, they cannot accurately evaluate stereoscopic image quality. As understanding of the visual mechanisms of the human brain deepened, methods based on disparity response and binocular vision characteristics were proposed, and some of them incorporate visual saliency models to further simulate how humans process visual information. However, because the Human Visual System (HVS) is hierarchical and complex, the performance of current SIQA methods based on hand-crafted features is still not ideal.
With the rise of deep learning, recent work has applied it to image quality evaluation. Unlike hand-crafted feature extraction, deep learning methods typically use a Convolutional Neural Network (CNN) to extract features automatically. Owing to their large number of parameters and their ability to learn from data, CNN-based SIQA methods have achieved accurate evaluation performance.
Disclosure of Invention
The invention provides a non-reference stereo image quality evaluation algorithm based on a multi-dimensional attention mechanism that better simulates binocular rivalry and the visual attention mechanism of the human visual system. The technical solution is as follows:
A non-reference stereo image quality evaluation method based on a multi-dimensional attention mechanism, characterized by comprising the following steps:
firstly, preprocessing the original stereo images used for training: converting each image into a grayscale image, dividing the grayscale image into non-overlapping small image blocks, assigning the image's ground-truth quality score to each image block, and randomly selecting a number of image blocks as the input of the network model;
secondly, training a convolutional neural network based on multidimensional attention, wherein the method comprises the following steps:
(1) extracting primary features from the left and right views with a group of CCP modules that perform convolution and pooling, to obtain a primary feature map for each view;
(2) feeding the primary feature maps of the left and right views into a view fusion sub-network and computing a fused feature map, wherein the view fusion sub-network comprises a multi-dimensional attention module consisting of a channel attention module and a spatial attention module; in the channel attention module, the input primary feature map passes through two identical branches, each of which performs channel dimensionality reduction followed by global average pooling, obtains the weight of each channel through a fully connected layer and a Sigmoid activation function, and weights the feature map of each channel to obtain a channel-attention-weighted feature map; in the spatial attention module, the two parallel channel-attention-weighted feature maps are reshaped and multiplied as matrices, a Softmax activation function yields a weight for each view that combines channel and spatial attention, and the primary feature maps of the left and right views are weighted with these weights to obtain the fused feature map;
(3) feeding the fused feature map into a multi-scale feature enhancement sub-network based on multi-dimensional attention and predicting the quality score of the image, wherein the sub-network uses three groups of CCP modules (convolution and pooling) to extract feature maps at three different scales from the reshaped fused feature map, the feature map at the smallest scale being called the original deep fusion feature map; the feature map at each scale is input into a multi-dimensional attention module, where channel dimensionality reduction yields three reduced feature maps; two of the reduced feature maps are channel-weighted by a channel attention module and then reshaped, matrix-multiplied, and passed through a Softmax activation function to obtain a multi-dimensional attention weight, which is used to weight the third reduced feature map, giving a feature map based on the multi-dimensional attention mechanism; the attention-based feature maps obtained at the three scales are fused by upsampling, and three CCP modules extract features from the fused map to obtain a deep multi-scale feature-enhanced feature map based on multi-dimensional attention; this map is added to the original deep fusion feature map to obtain an enhanced feature map, which is fed into a fully connected layer to predict the image quality score;
(4) computing the loss function of the network and training iteratively: after the predicted image quality score is obtained, the loss function of the network is computed; it uses the Root Mean Square Error (RMSE) plus an L2 regularization term to prevent overfitting, and thus measures the difference between the quality score predicted by the network and the ground-truth quality score; over many training iterations the network parameters are updated to minimize the loss, bringing the predicted scores closer to the ground-truth scores and yielding a trained network model.
Thirdly, evaluating image quality with the trained network.
The CCP module comprises two 3 × 3 convolutional layers and one pooling layer.
Further, in step (2) of the second step, channel dimensionality reduction is performed in each branch with a 1 × 1 convolution kernel, followed by global average pooling; the weight of each channel is obtained through a fully connected layer and a Sigmoid activation function, and a Scale operation weights the feature map of each channel to obtain the channel-attention-weighted feature map.
Further, in step (3) of the second step, the feature map at each scale is input into a multi-dimensional attention module, and channel dimensionality reduction is performed by three parallel 1 × 1 convolutions to obtain three reduced feature maps.
Further, the third step is as follows: the stereo image to be evaluated is preprocessed and input into the network, and the quality scores of the image blocks output by the network are averaged to obtain the quality score of the whole image.
Beneficial effects of the technical solution provided by the invention: the invention makes full use of the characteristics of the HVS. A multi-dimensional attention mechanism computes weights for the left and right views, which are used to weight the two views into a fused view, thereby simulating the binocular fusion and binocular rivalry mechanisms of the HVS. Multi-scale features are extracted from the fused view and enhanced with multi-dimensional attention, which assigns weights to information at different scales and thereby simulates the visual attention mechanism of the HVS. These properties make the method usable in practice, for example for evaluating the transmission performance of new media such as 3D television and 3D movies; its evaluation results are highly consistent with human subjective evaluation, which gives it significant practical value.
Drawings
FIG. 1 Overall block diagram of the algorithm
FIG. 2 Block diagram of the multi-dimensional attention module
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.
Example 1
The embodiment of the invention provides a non-reference stereo image quality evaluation algorithm based on a multi-dimensional attention mechanism; the invention is further explained below with reference to the accompanying drawings. It is implemented by the following steps:
First, the original stereo images used for training are preprocessed.
Each original stereo image used for training is converted to grayscale, and its left and right views are each divided into 220 image blocks of 32 × 32 pixels. Each image block is assigned the ground-truth quality score of its image, and 30% of the blocks, i.e. 66 blocks, are randomly selected as the input of the network. Random selection reduces the time complexity of image preprocessing and improves the generalization ability of the network.
Second, a convolutional neural network based on multi-dimensional attention is trained; it comprises a multi-dimensional attention-based view fusion sub-network and a multi-dimensional attention-based multi-scale feature enhancement sub-network. The view fusion sub-network contains a multi-dimensional attention module that computes weights for the left and right views and uses them to weight the two views into a fused view. The multi-scale feature enhancement sub-network applies multi-scale processing and attention weighting to the fused feature map to obtain an enhanced feature map, and a fully connected layer finally produces the quality score of the image.
(1) Primary features are extracted from the left and right views using a set of CCP modules that perform convolution and pooling.
The left and right views are first processed with the CCP module to obtain 16 × 16 × 32 primary feature maps. The CCP module consists of two 3 × 3 convolutional layers and one pooling layer.
(2) The primary feature maps of the left and right views are fed into the view fusion sub-network to compute the fused feature map.
The network structure of the multi-dimensional attention-based view fusion sub-network is shown in Fig. 1. The sub-network consists of two parts: a channel attention module and a spatial attention module. In the channel attention module, the input primary feature map passes through two identical branches. In each branch, a 1 × 1 convolution reduces the channel dimension of the feature map, global average pooling produces a 1 × 1 × 16 feature map, two fully connected layers followed by a Sigmoid activation produce the weight of each channel, and a Scale operation multiplies each channel of the feature map by its weight, yielding a 16 × 16 × 16 channel-attention-weighted feature map. In the spatial attention module, the two parallel channel-attention-weighted feature maps are reshaped and multiplied as matrices, and a Softmax activation then yields left-view and right-view weights that combine channel and spatial attention. The primary feature maps of the left and right views are fused with these weights, which simulates the binocular fusion and binocular rivalry mechanisms of the HVS. The calculation formula is as follows:
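The preprocessing step can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the inventors' code: the BT.601 grayscale weights, dropping border pixels that do not fill a whole block, and keeping left/right blocks at the same positions are all assumptions, and the function names are hypothetical. For a 640 × 360 view (assumed here to be the LIVE 3D resolution), non-overlapping 32 × 32 tiling gives 11 × 20 = 220 blocks, and 30% of 220 is 66, which agrees with the counts above.

```python
import numpy as np

PATCH = 32          # block size given in the text
SELECT_RATIO = 0.3  # fraction of blocks kept (30%)


def to_gray(rgb: np.ndarray) -> np.ndarray:
    """Luma conversion with ITU-R BT.601 weights (an assumed choice)."""
    return rgb[..., :3] @ np.array([0.299, 0.587, 0.114])


def extract_patches(view: np.ndarray, patch: int = PATCH) -> np.ndarray:
    """Tile a grayscale view (H, W) into non-overlapping patch x patch blocks.

    Pixels along the bottom/right border that do not fill a whole block are
    dropped (assumption: the text does not say how borders are handled).
    """
    rows, cols = view.shape[0] // patch, view.shape[1] // patch
    cropped = view[: rows * patch, : cols * patch]
    # (rows*p, cols*p) -> (rows, p, cols, p) -> (rows, cols, p, p) -> (rows*cols, p, p)
    blocks = cropped.reshape(rows, patch, cols, patch).swapaxes(1, 2)
    return blocks.reshape(rows * cols, patch, patch)


def sample_block_pairs(left, right, score, ratio=SELECT_RATIO, rng=None):
    """Randomly keep `ratio` of the co-located left/right block pairs.

    Every kept pair is labelled with the image-level score (e.g. its DMOS).
    Using the same indices for both views is an assumption.
    """
    rng = np.random.default_rng() if rng is None else rng
    lp, rp = extract_patches(left), extract_patches(right)
    n_keep = int(round(ratio * len(lp)))
    idx = rng.choice(len(lp), size=n_keep, replace=False)
    labels = np.full(n_keep, score, dtype=np.float32)
    return lp[idx], rp[idx], labels
```

Labelling every block with the whole image's score is what the text describes; it treats distortion as roughly uniform across the image, which is the usual simplification in patch-based quality models.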
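A CCP block as described (two 3 × 3 convolutions and one pooling layer) can be sketched in plain NumPy to check the stated shape, 32 × 32 × 1 in and 16 × 16 × 32 out. The ReLU activations, zero ("same") padding, 2 × 2 max pooling, and 32 output channels in both convolutions are assumptions chosen to reproduce that shape; the weights are random and untrained, and the function names are hypothetical.

```python
import numpy as np


def conv3x3(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Zero-padded ('same') 3x3 convolution, computed as in CNN frameworks.

    x: (H, W, C_in), w: (3, 3, C_in, C_out), b: (C_out,) -> (H, W, C_out)
    """
    h, wd, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, wd, w.shape[3]), dtype=np.result_type(x, w))
    for dy in range(3):
        for dx in range(3):
            # shifted input window (H, W, C_in) @ kernel tap (C_in, C_out)
            out += xp[dy:dy + h, dx:dx + wd, :] @ w[dy, dx]
    return out + b


def max_pool2x2(x: np.ndarray) -> np.ndarray:
    """2x2 max pooling with stride 2 (the pooling type is an assumption)."""
    h, w, c = x.shape
    x = x[: h // 2 * 2, : w // 2 * 2]
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))


def ccp(x: np.ndarray, params) -> np.ndarray:
    """CCP block: conv3x3 -> ReLU -> conv3x3 -> ReLU -> pool (ReLU is assumed)."""
    (w1, b1), (w2, b2) = params
    x = np.maximum(conv3x3(x, w1, b1), 0.0)
    x = np.maximum(conv3x3(x, w2, b2), 0.0)
    return max_pool2x2(x)


def init_ccp(c_in: int, c_out: int, rng: np.random.Generator):
    """Random He-style weights, only for checking shapes (not trained)."""
    def layer(ci):
        w = rng.normal(0.0, np.sqrt(2.0 / (9 * ci)), size=(3, 3, ci, c_out))
        return w, np.zeros(c_out)
    return [layer(c_in), layer(c_out)]


rng = np.random.default_rng(0)
block = rng.random((32, 32, 1))                 # one grayscale 32 x 32 block
primary = ccp(block, init_ccp(1, 32, rng))
print(primary.shape)                            # (16, 16, 32)
```

Because the padding keeps the spatial size, only the pooling layer halves the resolution, so stacking further CCP blocks with doubled channels would give the 8 × 8 × 64, 4 × 4 × 128, and 2 × 2 × 256 maps mentioned later.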
$$FM_C = \left(W_L \otimes FM_L\right) \oplus \left(W_R \otimes FM_R\right) \tag{1}$$
where $FM_C$, $FM_L$, and $FM_R$ denote the fused feature map, the left-view primary feature map, and the right-view primary feature map, respectively; $W_L$ and $W_R$ are the left- and right-view weights computed by the multi-dimensional attention mechanism; and $\oplus$ and $\otimes$ denote matrix addition and matrix multiplication, respectively.
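The text does not spell out how the Softmax output becomes $W_L$ and $W_R$, so the sketch below is one hypothetical reading, not the patented computation. Each view goes through an SE-style channel attention branch (1 × 1 convolution 32 → 16, global average pooling, FC-ReLU-FC-Sigmoid, Scale); the two weighted maps are reshaped to (HW, 16) and multiplied into an HW × HW affinity; a row-wise Softmax of the affinity and of its transpose is assumed to give $W_L$ and $W_R$; Eq. (1) is then applied as matrix products on the (HW, 32) primary maps and reshaped back to 16 × 16 × 32. The FC hidden size of 4, the inner ReLU, and all names are assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def channel_attention_branch(fm, p):
    """1x1 conv (C -> C_r), global average pooling, FC-ReLU-FC-Sigmoid, Scale."""
    reduced = fm @ p["w_1x1"]                    # (H, W, C_r); a 1x1 conv is a per-pixel matmul
    pooled = reduced.mean(axis=(0, 1))           # (C_r,) global average pooling
    hidden = np.maximum(pooled @ p["fc1"], 0.0)  # ReLU between the two FC layers (assumed)
    weights = sigmoid(hidden @ p["fc2"])         # (C_r,) one weight per channel
    return reduced * weights                     # Scale: broadcast the weights over H and W


def view_fusion(fm_l, fm_r, p_l, p_r):
    """Hypothetical reading of Eq. (1) with (HW x HW) weights W_L and W_R."""
    h, w, c = fm_l.shape
    a_l = channel_attention_branch(fm_l, p_l).reshape(h * w, -1)  # (HW, C_r)
    a_r = channel_attention_branch(fm_r, p_r).reshape(h * w, -1)  # (HW, C_r)
    affinity = a_l @ a_r.T                                          # (HW, HW)
    w_l = softmax(affinity, axis=1)    # assumed: row-wise Softmax gives W_L
    w_r = softmax(affinity.T, axis=1)  # assumed: the transposed affinity gives W_R
    fused = w_l @ fm_l.reshape(h * w, c) + w_r @ fm_r.reshape(h * w, c)
    return fused.reshape(h, w, c)      # reshape back to (H, W, C)


def init_branch(c=32, c_r=16, hidden=4, rng=None):
    """Random, untrained parameters for one branch (sizes partly assumed)."""
    rng = np.random.default_rng(0) if rng is None else rng
    return {
        "w_1x1": rng.normal(0.0, 0.1, (c, c_r)),
        "fc1": rng.normal(0.0, 0.1, (c_r, hidden)),
        "fc2": rng.normal(0.0, 0.1, (hidden, c_r)),
    }


rng = np.random.default_rng(1)
fm_l, fm_r = rng.random((16, 16, 32)), rng.random((16, 16, 32))
fm_c = view_fusion(fm_l, fm_r, init_branch(rng=rng), init_branch(rng=rng))
print(fm_c.shape)  # (16, 16, 32)
```

Since every row of each Softmax weight matrix sums to 1, each fused pixel is a convex combination of left-view pixels plus one of right-view pixels; under this reading the attention decides which positions of each view contribute, which is one way the binocular rivalry idea could be expressed.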
(3) The fused feature map is fed into the multi-scale feature enhancement sub-network based on multi-dimensional attention to predict the quality score of the image.
The network structure of the multi-scale feature enhancement sub-network based on multi-dimensional attention is shown in Fig. 1. The sub-network takes the reshaped 16 × 16 × 32 fused feature map and passes it through three groups of convolution and pooling operations (CCP) to extract feature maps at three scales: 8 × 8 × 64, 4 × 4 × 128, and 2 × 2 × 256. The 2 × 2 × 256 feature map is referred to as the original deep feature map. The feature map at each scale is passed through a multi-dimensional attention module, whose structure is shown in Fig. 2.
In this module, the input feature map undergoes channel dimensionality reduction through three parallel 1 × 1 convolutions, giving three reduced feature maps. Two of them are channel-weighted by a channel attention module, reshaped, matrix-multiplied, and passed through a Softmax activation function to obtain the multi-dimensional attention weight; this weight is applied to the third reduced feature map, and a final 1 × 1 convolution yields the feature map based on the multi-dimensional attention mechanism.
Attention-based feature maps of size 8 × 8 × 64, 4 × 4 × 128, and 2 × 2 × 256 are thus obtained at the three scales, and they are fused by upsampling into an attention-weighted 16 × 16 × 32 feature map. This operation assigns corresponding weights to feature maps of different scales, simulating how the HVS attends to objects of different sizes in an image. Three CCP modules then extract deep features from the attention-weighted feature map; the resulting deep feature map is added to the original deep feature map to obtain an enhanced 2 × 2 × 256 feature map, from which a fully connected layer produces the predicted image quality score.
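The multi-dimensional attention module can be sketched as below. Structurally it resembles a non-local (self-attention) block whose two "query/key" maps are first rescaled by channel attention. The reduction of C to C/2 channels in the three parallel 1 × 1 convolutions, the channel-attention hidden ratio of 4, the row-wise Softmax, and all names are assumptions; the text gives only the sequence of operations and the fact that each scale's output keeps its input shape.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def channel_weight(x, fc1, fc2):
    """Channel attention: GAP -> FC -> ReLU -> FC -> Sigmoid, then rescale channels."""
    s = x.mean(axis=(0, 1))
    return x * sigmoid(np.maximum(s @ fc1, 0.0) @ fc2)


def md_attention(x, p):
    """Multi-dimensional attention on one scale; the output keeps the input shape."""
    h, w, _ = x.shape
    # three parallel 1x1 convolutions reduce C -> C_r channels
    f1, f2, f3 = (x @ p[name] for name in ("w_a", "w_b", "w_c"))
    # channel attention on the first two reduced maps, then reshape to (HW, C_r)
    a = channel_weight(f1, *p["se_a"]).reshape(h * w, -1)
    b = channel_weight(f2, *p["se_b"]).reshape(h * w, -1)
    v = f3.reshape(h * w, -1)
    weight = softmax(a @ b.T, axis=1)       # (HW, HW) multi-dimensional attention weight
    out = (weight @ v).reshape(h, w, -1)    # weight the third reduced map
    return out @ p["w_out"]                 # final 1x1 convolution back to C channels


def init_md(c, rng, ratio=2, se_ratio=4):
    """Random parameters for shape checks; both reduction ratios are assumptions."""
    c_r = c // ratio
    c_h = c_r // se_ratio

    def n(*shape):
        return rng.normal(0.0, 0.1, shape)

    return {
        "w_a": n(c, c_r), "w_b": n(c, c_r), "w_c": n(c, c_r),
        "se_a": (n(c_r, c_h), n(c_h, c_r)),
        "se_b": (n(c_r, c_h), n(c_h, c_r)),
        "w_out": n(c_r, c),
    }


rng = np.random.default_rng(0)
for shape in [(8, 8, 64), (4, 4, 128), (2, 2, 256)]:
    y = md_attention(rng.random(shape), init_md(shape[2], rng))
    print(shape, "->", y.shape)
```

The HW × HW weight lets every spatial position draw on every other position, while the channel attention applied beforehand decides which channels drive that similarity; the two together are what the text calls combining channel and spatial dimensions.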
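The text says only that the three attention maps are "fused by an up-sampling method" into 16 × 16 × 32. The sketch below assumes nearest-neighbour upsampling, channel concatenation (64 + 128 + 256 = 448 channels), and a 1 × 1 projection to 32 channels; it then performs the residual addition with the original deep map and a single fully connected layer (the number of FC layers is an assumption). The three deep CCP modules are passed in as a callable; `toy_extractor` is a hypothetical shape-compatible stand-in, not the real CCP stack.

```python
import numpy as np


def upsample_nearest(x, size):
    """Nearest-neighbour upsampling of an (h, w, c) map to (size, size, c)."""
    f = size // x.shape[0]
    return x.repeat(f, axis=0).repeat(f, axis=1)


def fuse_scales(maps, w_proj, size=16):
    """Upsample every scale to size x size, concatenate, project with a 1x1 conv."""
    up = [upsample_nearest(m, size) for m in maps]
    return np.concatenate(up, axis=-1) @ w_proj   # (16, 16, 448) @ (448, 32)


def predict_score(fused, original_deep, deep_extractor, w_fc, b_fc):
    """Deep features + residual addition + one fully connected layer."""
    deep = deep_extractor(fused)        # stands in for three CCP modules: 16x16x32 -> 2x2x256
    enhanced = deep + original_deep     # element-wise addition with the original deep map
    return float(enhanced.reshape(-1) @ w_fc + b_fc)


def toy_extractor(f):
    """Hypothetical placeholder: 8x8 average pooling, channels tiled 32 -> 256."""
    pooled = f.reshape(2, 8, 2, 8, f.shape[2]).mean(axis=(1, 3))  # (2, 2, 32)
    return np.tile(pooled, (1, 1, 8))                             # (2, 2, 256)


rng = np.random.default_rng(0)
maps = [rng.random((8, 8, 64)), rng.random((4, 4, 128)), rng.random((2, 2, 256))]
fused = fuse_scales(maps, rng.normal(0.0, 0.05, (448, 32)))
original_deep = rng.random((2, 2, 256))
score = predict_score(fused, original_deep, toy_extractor, rng.normal(0.0, 0.01, 1024), 0.0)
print(fused.shape, score)
```

The residual addition requires the deep branch to end at exactly 2 × 2 × 256, which is why the enhanced map has the same size as the original deep feature map.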
(4) A loss function of the network is calculated.
During training, the loss function of the network model uses the Root Mean Square Error (RMSE) plus an L2 regularization term to prevent overfitting, and is computed as follows:
$$L = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(q_i - \hat{q}_i\right)^2} + \alpha \lVert \omega \rVert_2^2 \tag{2}$$
where $N$ denotes the number of image blocks, $q_i$ is the true Differential Mean Opinion Score (DMOS) of the image, and $\hat{q}_i$ is the value predicted by the network model. The second term is the L2 regularization term, in which $\alpha$ is the regularization coefficient and $\omega$ is the weight vector of the network. The loss reflects the difference between the predictions and the true values; over many training iterations the network parameters are updated to minimize it, so that the predicted image quality scores move closer to the true scores, i.e. the network performs better.
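Eq. (2) can be checked numerically with the short function below. It follows the RMSE-plus-L2 form as reconstructed above, including the assumption that the penalty uses the squared L2 norm summed over all weight tensors; the function name is hypothetical. In a deep learning framework the L2 term is often applied as weight decay in the optimizer instead of being added to the loss explicitly.

```python
import numpy as np


def rmse_l2_loss(q_true, q_pred, weights, alpha):
    """Eq. (2): RMSE over N blocks plus alpha times the squared L2 norm of the weights."""
    q_true = np.asarray(q_true, dtype=float)
    q_pred = np.asarray(q_pred, dtype=float)
    rmse = np.sqrt(np.mean((q_true - q_pred) ** 2))
    l2 = sum(float(np.sum(np.square(w))) for w in weights)  # every weight tensor counts
    return float(rmse + alpha * l2)


# three blocks, one of them off by 2 DMOS points; one tiny weight matrix
loss = rmse_l2_loss([1.0, 2.0, 3.0], [1.0, 2.0, 5.0], [np.array([[1.0, 2.0]])], alpha=0.01)
print(round(loss, 4))  # sqrt(4/3) + 0.01 * 5 = 1.2047
```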
Third, image quality is evaluated with the trained network.
(1) The stereo image to be evaluated is preprocessed.
The stereo image to be evaluated is converted to grayscale, its left and right views are each divided into 220 image blocks of 32 × 32 pixels, and 30% of the blocks, i.e. 66 blocks, are selected as the input of the network.
(2) The quality score of the stereo image to be evaluated is computed.
The quality scores that the network outputs for the blocks of the same stereo image are averaged to obtain the quality score of the whole stereo image:
$$Q = \frac{1}{N}\sum_{i=1}^{N}\hat{q}_i \tag{3}$$
where $Q$ is the quality score of the entire stereoscopic image, $N$ is the number of image blocks, and $\hat{q}_i$ is the predicted score of the $i$-th block. SROCC and PLCC are then computed between the final predictions and the true DMOS values of the stereo images to evaluate network performance.
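When the network scores blocks from many test images in one batch, Eq. (3) amounts to grouping block predictions by image and averaging them. A minimal standard-library sketch (the function name and the use of string image IDs are illustrative assumptions):

```python
from collections import defaultdict


def image_scores(image_ids, block_preds):
    """Eq. (3): average the block predictions that belong to the same stereo image."""
    sums, counts = defaultdict(float), defaultdict(int)
    for img, pred in zip(image_ids, block_preds):
        sums[img] += pred
        counts[img] += 1
    return {img: sums[img] / counts[img] for img in sums}


print(image_scores(["a", "a", "b"], [1.0, 3.0, 5.0]))  # {'a': 2.0, 'b': 5.0}
```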
The parameters of the whole network are detailed in Table 1.
TABLE 1 Network architecture parameters
[Table 1 is an image in the original publication and is not reproduced here.]
Example 3
The feasibility of the scheme in Example 1 was verified with specific experiments, described in detail below:
This experiment used two public 3D image databases, LIVE 3D and Waterloo IVC 3D, to test performance. Each database contains images with several distortion types. Image quality is described by Mean Opinion Scores (MOS) or Differential Mean Opinion Scores (DMOS): a larger MOS value indicates better image quality, whereas a lower DMOS value indicates better image quality.
To measure the accuracy, monotonicity, and consistency of an objective evaluation algorithm, two common indices are generally used: the Spearman rank-order correlation coefficient (SROCC) and the Pearson linear correlation coefficient (PLCC). SROCC describes the monotonicity of an image quality assessment algorithm and is expressed as:
$$SROCC = 1 - \frac{6\sum_{i=1}^{I} d_i^2}{I\left(I^2 - 1\right)} \tag{4}$$
In Eq. (4), $d_i$ is the difference between the rank of the objective score of the $i$-th image and the rank of its subjective quality score, and $I$ is the total number of images in the database. PLCC is the linear correlation coefficient between the objective scores produced by the algorithm (after nonlinear regression) and the subjective quality scores of the images, computed as:
$$PLCC = \frac{\sum_{i=1}^{I}\left(q_i - \mu_q\right)\left(SF_i - \mu_{SF}\right)}{\sqrt{\sum_{i=1}^{I}\left(q_i - \mu_q\right)^2}\sqrt{\sum_{i=1}^{I}\left(SF_i - \mu_{SF}\right)^2}} \tag{5}$$
In Eq. (5), $q_i$ and $SF_i$ are the subjective score and the predicted value of the $i$-th image, and $\mu_q$ and $\mu_{SF}$ are their respective means. Both correlation coefficients range from -1 to 1, and larger values indicate better network performance.
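Eqs. (4) and (5) translate directly into NumPy, as sketched below. Two simplifications are deliberate and should be kept in mind: Eq. (4) is the rank-difference formula, which is exact only when there are no tied scores, and the nonlinear regression that normally precedes PLCC is omitted here. For tie-aware results, `scipy.stats.spearmanr` and `scipy.stats.pearsonr` are the standard library routines.

```python
import numpy as np


def ranks(x):
    """1-based ranks; Eq. (4) is exact only when there are no ties."""
    x = np.asarray(x)
    r = np.empty(len(x), dtype=float)
    r[np.argsort(x)] = np.arange(1, len(x) + 1)
    return r


def srocc(objective, subjective):
    """Eq. (4): 1 - 6 * sum(d_i^2) / (I * (I^2 - 1))."""
    d = ranks(objective) - ranks(subjective)
    n = len(d)
    return float(1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1)))


def plcc(subjective, predicted):
    """Eq. (5), applied directly to the predictions (no nonlinear regression here)."""
    q = np.asarray(subjective, dtype=float)
    s = np.asarray(predicted, dtype=float)
    dq, ds = q - q.mean(), s - s.mean()
    return float(np.sum(dq * ds) / np.sqrt(np.sum(dq ** 2) * np.sum(ds ** 2)))


print(srocc([1, 2, 3, 4], [1, 3, 2, 4]), plcc([1, 2, 3, 4], [1, 3, 2, 4]))
```

In the example, swapping two middle items gives rank differences of (0, -1, 1, 0), so SROCC is 1 - 12/60 = 0.8; PLCC on the same values also comes to 4/5.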
SROCC and PLCC are used to measure the consistency between the scores of the objective quality evaluation algorithm and the subjective DMOS scores in the database: the higher the correlation between subjective and objective scores, the better the algorithm performs.
To verify the performance of the invention, 7 mainstream no-reference stereo image quality evaluation algorithms were selected for comparison on the LIVE 3D database: 4 traditional algorithms (3D-AdaBoost, BVCDP, BSFML, and SA) and 3 CNN-based algorithms (DCNN, RM-CNN, and VSM-CNN). Few deep-learning-based methods have been tested on the Waterloo IVC 3D database, so 4 traditional algorithms were chosen for comparison there: SINQ, DBN, BSIQE, and BVCDP. The results are shown in Tables 2 and 3.
TABLE 2 Algorithm performance comparison on the LIVE 3D image database
[Table image not reproduced.]
TABLE 3 Algorithm performance comparison on the Waterloo IVC 3D image database
[Table image not reproduced.]
TABLE 4 Performance comparison by distortion type on the LIVE 3D Phase I image database
[Table image not reproduced.]
TABLE 5 Performance comparison by distortion type on the LIVE 3D Phase II image database
[Table image not reproduced.]
Tables 4 and 5 show the results of the invention for specific distortion types in the LIVE 3D Phase I and LIVE 3D Phase II databases; in each column, the best result is shown in bold. As Tables 4 and 5 show, the invention outperforms all compared methods in both SROCC and PLCC on images with multiple distortion types, and its mean SROCC and PLCC over these distortion types exceed 0.9. Overall, compared with the other networks, the method adapts to a variety of distortion conditions and is highly consistent with human subjective evaluation.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (5)

1. A non-reference stereo image quality evaluation method based on a multi-dimensional attention mechanism is characterized by comprising the following steps:
firstly, preprocessing an original stereo image for training, converting the original stereo image into a gray image and dividing the gray image into non-overlapping small image blocks, giving a real quality fraction of the image to each image block, and randomly selecting a plurality of image blocks as the input of a network model;
in the second step, a convolutional neural network based on multidimensional attention is trained. The method comprises the following steps:
(1) extracting primary features from left and right views by using a group of CCP modules for convolution and pooling operation, and processing the left and right views to obtain a primary feature map;
(2) sending the primary feature maps of the left view and the right view into a view fusion sub-network, and calculating a fusion feature map: the view fusion sub-network comprises a multidimensional attention module, and the module consists of a channel attention module and a space attention module; in the channel attention module, an input primary feature map passes through two same branches, channel dimensionality reduction is carried out in each branch, then global average pooling is carried out, the weight of each channel is obtained through a full-connection layer and a Sigmoid activation function, and the feature map of each channel is weighted to obtain a feature map weighted by channel attention; in a space attention module, performing dimension transformation on the feature maps weighted by the attention of two parallel channels, performing matrix multiplication operation, obtaining the weight of each view combining channel and space attention through a Softmax activation function, and weighting the primary feature maps of the left view and the right view by using the weight to obtain a fusion feature map;
(3) inputting the fused feature map into a multi-scale feature enhancement sub-network based on multi-dimensional attention, and predicting the quality scores of the images: extracting feature maps of the fusion feature map subjected to dimension transformation on three different scales by three groups of CCP modules which are operated in rolling and pooling on the basis of the multi-dimensional attention multi-scale feature enhancement sub-network, wherein the feature map on the minimum scale is called as an original deep fusion feature map; inputting the feature map on each scale into a multi-dimensional attention module, and reducing dimensions through a channel to obtain three dimension-reduced feature maps; the two feature maps after dimensionality reduction are subjected to channel weighting through a channel attention module, then subjected to dimensionality transformation, matrix multiplication operation and Softmax activation function processing to obtain a multidimensional attention weight, and the weight is used for weighting a third feature map in the three feature maps after dimensionality reduction to obtain a feature map based on a multidimensional attention mechanism; fusing the feature maps obtained on three scales based on the multi-dimensional attention mechanism by an up-sampling method, and performing feature extraction on the fused feature maps by using three CCP modules to obtain a deep multi-scale feature enhancement feature map based on the multi-dimensional attention; adding the deep multi-scale feature enhancement feature map based on multi-dimensional attention and the original deep fusion feature map to obtain an enhanced feature map, and sending the enhanced feature map into a full-connection layer to predict the image quality score;
(4) Calculating the loss function of the network and training iteratively: after the predicted image quality score is obtained, the loss function of the network is calculated. It uses the root mean square error (RMSE), with L2 regularization added to prevent overfitting, to measure the difference between the image quality score predicted by the network and the ground-truth image quality score. Over many training iterations, the network parameters are updated to minimize the loss function, so that the predicted scores approach the ground-truth scores, yielding a trained network model.
Third step: evaluating image quality using the trained network.
2. The method of claim 1, wherein the CCP module includes two 3 × 3 convolutional layers and one pooling layer.
3. The non-reference stereo image quality evaluation method according to claim 1, wherein in step (2) of the second step, channel dimensionality reduction is performed in each branch with a 1 × 1 convolution kernel, followed by global average pooling; the weight of each channel is obtained through a fully connected layer and a Sigmoid activation function, and a Scale operation weights each channel of the feature map to obtain the channel-attention-weighted feature map.
4. The non-reference stereo image quality evaluation method according to claim 1, wherein in step (3) of the second step, the feature map at each scale is input into a multi-dimensional attention module, and channel dimensionality reduction is performed by three parallel 1 × 1 convolutions to obtain the three dimension-reduced feature maps.
5. The non-reference stereo image quality evaluation method according to claim 1, wherein the third step comprises: preprocessing the stereo image to be evaluated, inputting the preprocessed stereo image into the network, and averaging the quality scores that the network outputs for the image patches to obtain the quality score of the whole image.
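Step (2) of claim 1 and claim 3 describe the view fusion sub-network only in words, and they do not pin down how the matrix product turns into a per-view weight. The NumPy sketch below shows one possible reading, under stated assumptions: one channel-attention branch per view (1 × 1 convolution, global average pooling, fully connected layer, Sigmoid, Scale), then a per-pixel 2 × 2 view-affinity matrix from a batched matrix multiplication, and a Softmax over the two views that weights the left and right primary feature maps. The function names, the reduced channel count `Cr`, and the random parameters are hypothetical and are not taken from the patent.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(x, w_reduce, w_fc):
    """One channel-attention branch; w_reduce and w_fc are hypothetical parameters.

    x: (C, H, W) primary feature map; w_reduce: (Cr, C) 1x1-conv weights;
    w_fc: (Cr, Cr) fully connected weights.
    """
    reduced = np.einsum("rc,chw->rhw", w_reduce, x)  # 1x1 conv = matmul over channels
    pooled = reduced.mean(axis=(1, 2))               # global average pooling -> (Cr,)
    weights = sigmoid(w_fc @ pooled)                 # FC + Sigmoid -> weights in (0, 1)
    return reduced * weights[:, None, None], weights  # Scale: weight every channel

def fuse_views(f_left, f_right, params):
    """Fuse left/right primary maps with per-pixel view weights (assumed reading)."""
    a, _ = channel_attention(f_left, *params["left"])
    b, _ = channel_attention(f_right, *params["right"])
    cr, h, w = a.shape
    # Dimension transformation: stack both views per pixel -> (HW, 2, Cr).
    v = np.stack([a.reshape(cr, -1).T, b.reshape(cr, -1).T], axis=1)
    affinity = v @ v.transpose(0, 2, 1)             # batched matmul: (HW, 2, 2)
    view_w = softmax(affinity.sum(axis=2), axis=1)  # (HW, 2), Softmax over views
    w_l = view_w[:, 0].reshape(1, h, w)
    w_r = view_w[:, 1].reshape(1, h, w)
    return w_l * f_left + w_r * f_right, view_w

# Toy usage with random, untrained parameters.
rng = np.random.default_rng(0)
C, Cr, H, W = 8, 4, 5, 6
params = {k: (0.1 * rng.standard_normal((Cr, C)), 0.1 * rng.standard_normal((Cr, Cr)))
          for k in ("left", "right")}
fl = rng.standard_normal((C, H, W))
fr = rng.standard_normal((C, H, W))
fused, view_w = fuse_views(fl, fr, params)
```

Because the two view weights at each pixel sum to one, the fused map is a per-pixel convex combination of the two views; feeding the same map in as both views returns that map unchanged.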
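Claim 2 states only that a CCP module contains two 3 × 3 convolutional layers and one pooling layer, and step (3) of claim 1 uses three groups of them to obtain three scales. The following NumPy sketch illustrates that structure. The ReLU activations, 2 × 2 max pooling, the sequential chaining of the three groups, the 8-channel width and the input size are all assumptions made for illustration; the patent does not specify them.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv3x3(x, w, b):
    """'Same'-padded 3x3 convolution (cross-correlation, as in CNN libraries).

    x: (Cin, H, W); w: (Cout, Cin, 3, 3); b: (Cout,).
    """
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # zero padding keeps H and W
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out + b[:, None, None]

def max_pool2(x):
    """2x2 max pooling with stride 2 (an odd trailing row/column is dropped)."""
    c, h, w = x.shape
    h2, w2 = h // 2, w // 2
    return x[:, :2 * h2, :2 * w2].reshape(c, h2, 2, w2, 2).max(axis=(2, 4))

def ccp(x, w1, b1, w2, b2):
    """Conv 3x3 -> ReLU -> conv 3x3 -> ReLU -> pool (ReLU and max pooling assumed)."""
    return max_pool2(relu(conv3x3(relu(conv3x3(x, w1, b1)), w2, b2)))

# Three CCP groups applied in sequence give three scales (sequential order assumed).
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 32, 32))  # stand-in for the reshaped fused feature map
scales, cin = [], x.shape[0]
for _ in range(3):
    w1 = 0.1 * rng.standard_normal((8, cin, 3, 3))
    w2 = 0.1 * rng.standard_normal((8, 8, 3, 3))
    x = ccp(x, w1, np.zeros(8), w2, np.zeros(8))
    scales.append(x)
    cin = 8
# scales[-1] (smallest scale) plays the role of the original deep fusion feature map.
```

Each CCP group halves the spatial size here, so a 32 × 32 input yields maps at 16 × 16, 8 × 8 and 4 × 4.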
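The multi-dimensional attention module of step (3) in claim 1 and claim 4 reads like a non-local (self-attention) block with channel attention added to two of its inputs. The NumPy sketch below is one interpretation under that assumption: three parallel 1 × 1 convolutions produce three reduced maps, the first two are channel-weighted and reshaped to (Cr, HW), their matrix product passes through a row-wise Softmax to give an HW × HW attention matrix, and that matrix re-weights the third map. The names `q`, `k`, `v`, the product order and the random parameters are hypothetical choices, not the patent's specification.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def conv1x1(x, w):
    """1x1 convolution as a matrix product over channels; w: (Cr, C)."""
    return np.einsum("rc,chw->rhw", w, x)

def channel_weight(x, w_fc):
    """GAP -> FC -> Sigmoid -> scale, as in the channel attention module."""
    return x * sigmoid(w_fc @ x.mean(axis=(1, 2)))[:, None, None]

def multidim_attention(x, w_q, w_k, w_v, fc_q, fc_k):
    # Three parallel 1x1 convolutions give three dimension-reduced maps.
    q = channel_weight(conv1x1(x, w_q), fc_q)  # first map, channel-weighted
    k = channel_weight(conv1x1(x, w_k), fc_k)  # second map, channel-weighted
    v = conv1x1(x, w_v)                        # third map, to be re-weighted
    cr, h, w = v.shape
    # Dimension transformation (Cr, H, W) -> (Cr, HW), then matmul -> (HW, HW).
    attn = softmax(q.reshape(-1, h * w).T @ k.reshape(-1, h * w), axis=1)
    # Each output position is an attention-weighted sum over all positions of v.
    out = v.reshape(cr, h * w) @ attn.T
    return out.reshape(cr, h, w), attn

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))  # a feature map at one scale
w_q, w_k, w_v = (0.1 * rng.standard_normal((4, 8)) for _ in range(3))
fc_q, fc_k = (0.1 * rng.standard_normal((4, 4)) for _ in range(2))
out, attn = multidim_attention(x, w_q, w_k, w_v, fc_q, fc_k)
```

Applying the channel weights before the matrix product lets channel importance influence which spatial positions attend to each other, which is one way to read "combining channel and spatial attention".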
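Step (4) of claim 1 combines RMSE with L2 regularization. Written out, one common form is L = sqrt((1/N) Σ_i (ŷ_i − y_i)²) + λ Σ_w ‖w‖², where ŷ_i is the predicted score, y_i the ground-truth score and λ the regularization strength. The patent gives no value for λ and does not say whether a factor of 1/2 is used, so the NumPy sketch below treats `lam` as a hypothetical hyperparameter.

```python
import numpy as np

def rmse_l2_loss(pred, target, weights, lam):
    """RMSE between predicted and ground-truth scores plus lam * sum of squared weights.

    lam (the regularization strength) is a hypothetical hyperparameter; the
    claim does not give its value.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    rmse = np.sqrt(np.mean((pred - target) ** 2))
    l2 = sum(float(np.sum(w ** 2)) for w in weights)
    return rmse + lam * l2

loss = rmse_l2_loss([1.0, 2.0, 3.0], [1.0, 2.0, 5.0], [np.array([1.0, 2.0])], lam=0.01)
# RMSE = sqrt(4 / 3) ~ 1.1547; L2 term = 0.01 * (1 + 4) = 0.05
```

In a deep-learning framework the L2 term is often applied through the optimizer's weight decay instead of being added to the loss explicitly; for plain SGD the two are equivalent up to the scaling of λ.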
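Claim 5 obtains the whole-image score by averaging the network's scores over image patches. The NumPy sketch below shows that averaging step. The preprocessing is not specified in the claims, so the non-overlapping 32 × 32 patches, the co-located left/right pairing and the `dummy_scorer` stand-in for the trained network are hypothetical.

```python
import numpy as np

def extract_patches(img, size, stride):
    """Slide a size x size window with the given stride over an (H, W[, C]) image."""
    h, w = img.shape[:2]
    return [img[i:i + size, j:j + size]
            for i in range(0, h - size + 1, stride)
            for j in range(0, w - size + 1, stride)]

def image_score(left, right, score_patch_pair, size=32, stride=32):
    """Average the per-patch scores over co-located left/right patches."""
    lp = extract_patches(left, size, stride)
    rp = extract_patches(right, size, stride)
    scores = [score_patch_pair(l, r) for l, r in zip(lp, rp)]
    return float(np.mean(scores))

# Stand-in scorer (hypothetical): replace with the trained network's forward pass.
def dummy_scorer(l_patch, r_patch):
    return float(np.abs(l_patch - r_patch).mean())

rng = np.random.default_rng(0)
left = rng.random((64, 96))
right = rng.random((64, 96))
score = image_score(left, right, dummy_scorer)
```

With non-overlapping patches that tile the image exactly, the average of per-patch means equals the global mean, which makes the averaging easy to sanity-check.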
CN202111507792.8A 2021-12-07 2021-12-10 Multi-dimensional attention mechanism-based non-reference stereoscopic image quality evaluation method Active CN114187261B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111487714 2021-12-07
CN2021114877146 2021-12-07

Publications (2)

Publication Number Publication Date
CN114187261A (en) 2022-03-15
CN114187261B (en) 2024-08-27

Family

ID=80543149

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111507792.8A Active CN114187261B (en) 2021-12-07 2021-12-10 Multi-dimensional attention mechanism-based non-reference stereoscopic image quality evaluation method

Country Status (1)

Country Link
CN (1) CN114187261B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110060236A (en) * 2019-03-27 2019-07-26 天津大学 Stereo image quality evaluation method based on deep convolutional neural networks
CN112183645A (en) * 2020-09-30 2021-01-05 深圳龙岗智能视听研究院 Image aesthetic quality evaluation method based on context-aware attention mechanism
CN112634238A (en) * 2020-12-25 2021-04-09 武汉大学 Image quality evaluation method based on attention module
CN112884682A (en) * 2021-01-08 2021-06-01 福州大学 Stereo image color correction method and system based on matching and fusion
CN113706386A (en) * 2021-09-04 2021-11-26 大连钜智信息科技有限公司 Super-resolution reconstruction method based on attention mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
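富振奇; 费延佳; 杨艳; 邵枫: "基于深层特征学习的无参考立体图像质量评价" (No-reference stereoscopic image quality evaluation based on deep feature learning), 光电子・激光 (Journal of Optoelectronics · Laser), no. 05, 15 May 2018 (2018-05-15) *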

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114897854A (en) * 2022-05-20 2022-08-12 辽宁大学 No-reference stereo image quality evaluation method based on dual-stream interactive network
CN115272776A (en) * 2022-09-26 2022-11-01 山东锋士信息技术有限公司 Hyperspectral image classification method based on double-path convolution and double attention and storage medium
CN115272776B (en) * 2022-09-26 2023-01-20 山东锋士信息技术有限公司 Hyperspectral image classification method based on double-path convolution and double attention and storage medium
CN115661911A (en) * 2022-12-23 2023-01-31 四川轻化工大学 Face feature extraction method, device and storage medium

Also Published As

Publication number Publication date
CN114187261B (en) 2024-08-27

Similar Documents

Publication Publication Date Title
CN114187261B (en) Multi-dimensional attention mechanism-based non-reference stereoscopic image quality evaluation method
CN111182292B (en) No-reference video quality evaluation method and system, video receiver and intelligent terminal
CN109360178B (en) Fusion image-based non-reference stereo image quality evaluation method
CN110060236B (en) Stereoscopic image quality evaluation method based on deep convolutional neural network
CN110728656A (en) Meta-learning-based no-reference image quality data processing method and intelligent terminal
CN110516716B (en) No-reference image quality evaluation method based on multi-branch similarity network
CN109872305B (en) No-reference stereo image quality evaluation method based on quality map generation network
CN110570363A (en) Image defogging method based on Cycle-GAN with pyramid pooling and multi-scale discriminator
Yue et al. Blind stereoscopic 3D image quality assessment via analysis of naturalness, structure, and binocular asymmetry
CN108389192A (en) Stereoscopic image comfort evaluation method based on convolutional neural networks
CN111429402B (en) Image quality evaluation method for fusion of advanced visual perception features and depth features
CN108235003B (en) Three-dimensional video quality evaluation method based on 3D convolutional neural network
CN112489164B (en) Image coloring method based on improved depthwise separable convolutional neural network
CN112767385B (en) No-reference image quality evaluation method based on significance strategy and feature fusion
CN112419242A (en) No-reference image quality evaluation method based on self-attention mechanism GAN network
CN109816646B (en) Non-reference image quality evaluation method based on degradation decision logic
Si et al. A no-reference stereoscopic image quality assessment network based on binocular interaction and fusion mechanisms
CN109859166A (en) No-reference 3D image quality evaluation method based on multi-column convolutional neural networks
CN113554599A (en) Video quality evaluation method based on human visual effect
CN115205196A (en) No-reference image quality evaluation method based on twin network and feature fusion
Jiang et al. Stereoscopic image quality assessment by learning non-negative matrix factorization-based color visual characteristics and considering binocular interactions
CN111667407A (en) Image super-resolution method guided by depth information
CN114066812B (en) No-reference image quality evaluation method based on spatial attention mechanism
CN114972232A (en) No-reference image quality evaluation method based on incremental meta-learning
CN106022362A (en) Reference-free image quality objective evaluation method for JPEG2000 compression distortion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant