CN114494044A - Attention mechanism-based image enhancement method and system and related equipment - Google Patents

Info

Publication number
CN114494044A
CN114494044A (application CN202210020052.XA)
Authority
CN
China
Prior art keywords: feature, layer, image, attention mechanism, feature extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210020052.XA
Other languages
Chinese (zh)
Inventor
闫潇宁
郭肖勇
陈晓艳
陈文海
Current Assignee
Shenzhen Anruan Huishi Technology Co ltd
Shenzhen Anruan Technology Co Ltd
Original Assignee
Shenzhen Anruan Huishi Technology Co ltd
Shenzhen Anruan Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Anruan Huishi Technology Co ltd, Shenzhen Anruan Technology Co Ltd filed Critical Shenzhen Anruan Huishi Technology Co ltd
Priority to CN202210020052.XA
Publication of CN114494044A
Legal status: Pending

Classifications

    • GPHYSICS — G06 COMPUTING; CALCULATING OR COUNTING — G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/92
    • G06T2207/10016 Video; Image sequence
    • G06T2207/10024 Color image
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]

Abstract

The invention relates to the application field of artificial intelligence and provides an attention mechanism-based image enhancement method, system and related equipment. The method comprises the following steps: acquiring video data of an actual scene and obtaining scene images by splitting the video by frames; preprocessing the scene images to obtain a paired image data set and dividing it into a training set and a verification set; constructing a neural network model containing an attention mechanism and iteratively training the neural network model with the training set to obtain a low-illumination image enhancement model; and inputting the verification set into the low-illumination image enhancement model, obtaining a quantitative index evaluation, and outputting the low-illumination image enhancement model. The invention enhances the brightness of images with low and uneven illumination.

Description

Attention mechanism-based image enhancement method and system and related equipment
Technical Field
The invention belongs to the field of artificial intelligence technology application, and particularly relates to an attention mechanism-based image enhancement method, system and related equipment.
Background
Images and videos carry rich, detailed information about real scenes, and by capturing and processing image and video data, image-based data analysis systems can be developed for various tasks to improve work efficiency. In practice, however, such systems depend heavily on the quality of the input image or video: they may perform well on high-quality input but poorly on images taken in low-light environments, because a camera that receives too little light during shooting produces dark areas in the image, and those dark areas suffer information loss and noise. Enhancing the image is therefore an essential step before the image is used for data analysis.
Existing mainstream image enhancement methods fall into two main directions: methods based on histogram equalization and methods based on Retinex theory. The former optimizes pixel brightness following the idea of histogram equalization; the latter restores the illumination pattern of a scene and enhances regions of different illumination accordingly. Both directions rely on assumptions about pixel statistics or visual mechanisms that often do not hold in real scenes, and beyond brightness or contrast optimization, factors such as artifacts in dark areas and noise caused by low-light photography are not handled well by these methods.
Disclosure of Invention
The embodiments of the invention provide an attention mechanism-based image enhancement method, system and related equipment, aiming to solve the problem that conventional image enhancement methods cannot enhance the brightness of images with low and uneven illumination.
In a first aspect, an embodiment of the present invention provides an attention mechanism-based image enhancement method, where the method includes:
acquiring video data of an actual scene, and acquiring a scene image by a method of splitting according to frames;
preprocessing the scene image to obtain a paired image data set, and dividing the paired image data set into a training set and a verification set;
constructing a neural network model containing an attention mechanism, and performing iterative training on the neural network model by using the training set to obtain a low-illumination image enhancement model;
and inputting the verification set into the low-illumination image enhancement model, obtaining quantitative index evaluation, and outputting the low-illumination image enhancement model.
Further, the method for preprocessing the scene image specifically comprises the following steps:
performing de-duplication and de-blurring screening on the scene image, then performing gamma correction on the RGB channels of the scene image, with the gamma value selected at random within a preset gamma interval.
Still further, the neural network model including an attention mechanism includes an input layer, a feature extraction layer, an attention mechanism layer, a feature enhancement layer, a feature fusion layer, and an output layer, wherein:
the input layer is used for inputting the scene images in the training set to the neural network model containing the attention mechanism;
the feature extraction layer is used for extracting features of the scene images in the training set and outputting a feature extraction image;
the attention mechanism layer is used for refining and optimizing the features in the feature extraction diagram and outputting an optimized feature extraction diagram;
the feature enhancement layer is used for performing feature enhancement on the optimized feature extraction graph and outputting a feature enhanced image;
the feature fusion layer is used for performing feature fusion on the feature enhanced image to obtain a feature fusion map, and outputting the obtained feature fusion map through the output layer.
Furthermore, the feature extraction layer comprises a plurality of layers, and each feature extraction layer performs feature extraction at a different scale according to its position in the network;
the attention mechanism layer comprises a plurality of layers, each attention mechanism layer is positioned behind one feature extraction layer and takes the output of that feature extraction layer as its input, and every feature extraction layer except the first takes the output of the preceding attention mechanism layer as its input;
the feature enhancement layers are equal in number to the attention mechanism layers, and each feature enhancement layer takes the output of the attention mechanism layer at the same level as its input;
the feature fusion layer comprises one layer, which performs feature fusion on the outputs of all the feature enhancement layers.
Further, the attention mechanism layer includes a spatial attention mechanism and a channel attention mechanism; the spatial attention mechanism and the channel attention mechanism respectively operate on the feature extraction map to obtain a spatial feature weight and a channel feature weight, and a vector weight is calculated for each feature vector in the feature extraction map by combining the spatial feature weight and the channel feature weight. The calculation of the vector weight satisfies the following formula (1):

s_i = f(c_i, v)    (1)

where c_i represents the i-th feature vector in the feature extraction map, v represents the spatial feature weight and channel feature weight obtained by learned computation, and s_i represents the vector weight corresponding to c_i, such that:

if the brightness of the region corresponding to the feature vector is greater than a preset brightness threshold, the value of s_i is not greater than 1;

if the brightness of the region corresponding to the feature vector is less than or equal to the preset brightness threshold, the value of s_i is greater than 1.

Furthermore, for each feature vector in the feature extraction map, the attention mechanism layer multiplies the feature vector by its corresponding vector weight to obtain an optimized refined feature vector c_i'; the calculation of the refined feature vector c_i' satisfies the following formula (2):

c_i' = c_i × s_i    (2)
still further, the quantitative index evaluation comprises at least one of average absolute error, mean square error, peak signal-to-noise ratio, structural similarity, and average brightness.
In a second aspect, an embodiment of the present invention further provides an attention-based image enhancement system, including:
the data acquisition module is used for acquiring video data of an actual scene and obtaining a scene image by a method of splitting according to frames;
the preprocessing module is used for preprocessing the scene image to obtain a paired image data set, and dividing the paired image data set into a training set and a verification set;
the model training module is used for constructing a neural network model containing an attention mechanism and performing iterative training on the neural network model by using the training set to obtain a low-illumination image enhancement model;
and the model quantitative evaluation module is used for inputting the verification set into the low-illumination image enhancement model, acquiring quantitative index evaluation and outputting the low-illumination image enhancement model.
In a third aspect, an embodiment of the present invention further provides a computer device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the image enhancement method as described in any of the above embodiments when executing the computer program.
In a fourth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps in the image enhancement method as described in any one of the above embodiments.
The method has the advantage that, because the attention mechanism refines features between the feature extraction and feature enhancement steps, the constructed low-illumination image enhancement model focuses more on the low-brightness areas of the input image, further optimizing the parts with low and uneven illumination and improving the image enhancement effect.
Drawings
FIG. 1 is a block flow diagram of an attention-based image enhancement method provided by an embodiment of the invention;
FIG. 2 is a schematic structural diagram of a neural network model provided by an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an image enhancement system 200 according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Specifically, referring to fig. 1, fig. 1 is a flowchart of an image enhancement method based on attention mechanism according to an embodiment of the present invention, including the following steps:
s101, acquiring video data of an actual scene, and obtaining a scene image by a method of splitting according to frames.
In the embodiment of the invention, the actual scene may be a scene with a complex environment and many moving objects, such as a campus or a road segment. The video data of the actual scene may be obtained by calling existing surveillance footage or by shooting with other cameras whose positions and viewing angles are fixed. To obtain images of the same scenes under both dim and sufficient illumination, the video data includes at least multiple video segments from time periods with different illumination. After the video data is obtained, it is split into scene images at fixed time intervals by frame splitting. Preferably, video data is collected at key points such as street entrances and campus gates: 1000 segments of 3 seconds each, covering 24 hours of a day, are acquired, and frames are extracted every 0.5 seconds to obtain the scene images.
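The sampling schedule just described — one frame every 0.5 seconds from short clips — can be sketched as follows. This is an illustrative sketch, not code from the patent: the function names are assumptions, and actual frame decoding would be done with a video library such as OpenCV.

```python
# Illustrative sketch (names and structure are assumptions, not from the
# patent): given a clip's frame rate and duration, list the frame indices to
# keep when sampling one frame every 0.5 seconds. Decoding those frames from
# the video file would use a library such as OpenCV.

def sampled_frame_indices(fps: float, duration_s: float, interval_s: float = 0.5):
    """Indices of frames kept when sampling one frame per `interval_s` seconds."""
    step = max(1, round(fps * interval_s))  # frames between two kept samples
    total = int(fps * duration_s)           # total frames in the clip
    return list(range(0, total, step))

# A 3-second clip at 30 fps sampled every 0.5 s keeps 6 frames.
print(sampled_frame_indices(30.0, 3.0))  # [0, 15, 30, 45, 60, 75]
```

At 1000 clips of 3 seconds each, this schedule yields on the order of several thousand scene images before de-duplication.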
S102, preprocessing the scene image to obtain a matched image data set, and dividing the matched image data set into a training set and a verification set.
In the embodiment of the present invention, the method for preprocessing the scene image specifically includes:
performing de-duplication and de-blurring screening on the scene image, then performing gamma correction on the RGB channels of the scene image, with the gamma value selected at random within a preset gamma interval.
De-duplication removes scene images whose pictures do not change within the same time period of the video data, and de-blurring removes scene images in which objects are indistinct and details cannot be confirmed. The preset gamma interval is set to 1.5 to 5 in the embodiment of the invention; within this interval, the exposure of the scene image differs from the original, making the darker and brighter parts of the scene image more prominent. After this preprocessing, the paired image data set is obtained, in which each darker image is paired with a brighter image so that the enhancement effect on the darker image can be compared. The paired image data set is divided into the training set and the verification set. In addition, preferably, 200 nighttime images from the scene images that were not preprocessed are selected as a test set for the neural network model.
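The gamma step can be sketched as below. This is a hedged illustration under stated assumptions: the function and variable names are invented for the example, pixels are plain nested lists normalized to [0, 1] (a real pipeline would use image arrays), and the interval [1.5, 5] is taken from the text above.

```python
# Sketch of the random-gamma darkening step: each channel of a normalized RGB
# pixel is raised to a gamma drawn at random from the preset interval [1.5, 5],
# producing the darker half of a training pair. Names are illustrative
# assumptions, not from the patent.
import random

def random_gamma_darken(pixels, gamma_lo=1.5, gamma_hi=5.0, rng=None):
    """Apply out = in ** gamma (values in [0, 1]) with a randomly chosen gamma."""
    rng = rng or random.Random()
    gamma = rng.uniform(gamma_lo, gamma_hi)
    darkened = [[channel ** gamma for channel in px] for px in pixels]
    return darkened, gamma

# Since gamma > 1 and values lie in [0, 1], every channel gets darker:
# 0.5 ** 1.5 ≈ 0.354 at one end of the interval, 0.5 ** 5 ≈ 0.031 at the other.
```

Passing a seeded `random.Random` makes the augmentation reproducible across runs.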
S103, constructing a neural network model containing an attention mechanism, and performing iterative training on the neural network model by using the training set to obtain a low-illumination image enhancement model.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a neural network model provided in an embodiment of the present invention. The neural network model constructed in the embodiment of the invention is based on MBLLEN (Multi-Branch Low-Light Enhancement Network) and is optimized on that basis. The neural network model includes an input layer, a feature extraction layer, an attention mechanism layer, a feature enhancement layer, a feature fusion layer, and an output layer, where:
the input layer is used for inputting the scene images in the training set into the neural network model containing the attention mechanism;
the Feature extraction layer (FEM) is configured to perform Feature extraction on the scene images in a training set and output a Feature extraction map;
the Attention mechanism layer (Attention) is used for refining and optimizing the features in the feature extraction map and outputting an optimized feature extraction map;
the feature Enhancement layer (EM) is configured to perform feature Enhancement on the optimized feature extraction map and output a feature enhanced image;
the feature Fusion layer (FM) is configured to perform feature Fusion on the feature enhanced image to obtain a feature Fusion map, and output the obtained feature Fusion map through the output layer.
The feature extraction layer comprises a plurality of layers. Each feature extraction layer extracts features from its input by convolution, and each layer extracts features at a different scale according to its position in the network: the size of the convolution block used by a feature extraction layer decreases with network depth, so the feature extraction maps produced by different convolution blocks also differ in size.
The attention mechanism layer comprises a plurality of layers. Each attention mechanism layer is positioned behind one feature extraction layer and takes the output of that feature extraction layer as its input, and every feature extraction layer except the first takes the output of the preceding attention mechanism layer as its input.
Specifically, the attention mechanism layer includes a spatial attention mechanism and a channel attention mechanism; the two mechanisms respectively operate on the feature extraction map to obtain a spatial feature weight and a channel feature weight, and a vector weight is calculated for each feature vector in the feature extraction map by combining the two weights. The vector weight calculation satisfies the following formula (1):

s_i = f(c_i, v)    (1)

where c_i represents the i-th feature vector in the feature extraction map, v represents the spatial feature weight and channel feature weight obtained by learned computation, and s_i represents the vector weight corresponding to c_i, for which:

if the brightness of the region corresponding to the feature vector is greater than a preset brightness threshold, the value of s_i is not greater than 1;

if the brightness of the region corresponding to the feature vector is not greater than the preset brightness threshold, the value of s_i is greater than 1.
In this embodiment of the present invention, the preset brightness threshold is a self-set value, and the preset brightness threshold may be set according to the average brightness of the current scene image, or may be set according to the scene image with another brightness in the paired image data set.
The feature enhancement layers are equal in number to the attention mechanism layers, i.e., the feature enhancement layer comprises n layers, and each feature enhancement layer takes the output of the attention mechanism layer at the same level as its input.
For each feature vector in the feature extraction map, the attention mechanism layer multiplies the feature vector by its corresponding vector weight to obtain an optimized refined feature vector c_i'; the calculation of the refined feature vector c_i' satisfies the following formula (2):

c_i' = c_i × s_i    (2)

Specifically, for feature vectors whose vector weight s_i is not greater than 1, the multiplication causes the attention mechanism layer to reduce the brightness at the feature vector's position; for feature vectors whose vector weight s_i is greater than 1, the multiplication causes the attention mechanism layer to increase the brightness at that position.
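The weighting in formulas (1) and (2) can be illustrated with a toy choice of f. This is an assumption for demonstration only — the patent leaves f to be learned by the spatial and channel attention mechanisms — but it reproduces the stated behavior: weights above 1 for dark regions, at most 1 for bright ones.

```python
# Toy stand-in for the learned function f in formula (1) — an illustrative
# assumption, not the patent's actual mechanism. Feature vectors from regions
# darker than the threshold get a weight above 1 (boosted); brighter regions
# get a weight of at most 1 (attenuated).

def vector_weight(region_brightness, threshold, v=0.5):
    """Toy f(c_i, v): s_i > 1 below the brightness threshold, s_i <= 1 above it."""
    if region_brightness > threshold:
        return 1.0 / (1.0 + v * (region_brightness - threshold))  # <= 1
    return 1.0 + v * (threshold - region_brightness)              # >= 1

def refine(feature_vector, region_brightness, threshold=0.5):
    """Formula (2): c_i' = c_i * s_i, applied element-wise."""
    s_i = vector_weight(region_brightness, threshold)
    return [c * s_i for c in feature_vector]

# A vector from a dark region (brightness 0.2) is amplified; one from a bright
# region (brightness 0.8) is attenuated.
```

In the actual model, v is produced by the attention layers rather than fixed, so the boost and attenuation strengths are learned per image.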
The feature fusion layer comprises one layer, which performs feature fusion on the outputs of all the feature enhancement layers.
Specifically, the outputs of all the feature enhancement layers are feature enhancement maps of different sizes. Because different feature extraction layers produce outputs of different sizes, each feature enhancement map is upsampled and superimposed according to its network level, and finally fused into a feature fusion map of the same size as the scene image originally input to the neural network model, which is output through the output layer.
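The upsample-and-superimpose step can be sketched as below. Nearest-neighbor upsampling and element-wise averaging are assumptions for illustration: the text above specifies upsampling and superposition but not the exact interpolation or combination operators.

```python
# Sketch of the fusion step: enhancement maps at different scales are upsampled
# to a common resolution and combined. Nearest-neighbor interpolation and
# averaging are illustrative assumptions, not taken from the patent.

def upsample_nearest(grid, factor):
    """Nearest-neighbor upsampling of a 2-D grid by an integer factor."""
    out = []
    for row in grid:
        wide = [value for value in row for _ in range(factor)]
        out.extend([wide] * factor)  # repeat each widened row `factor` times
    return out

def fuse(maps_with_factors):
    """Upsample each (map, factor) pair to a common size, then average."""
    ups = [upsample_nearest(grid, factor) for grid, factor in maps_with_factors]
    rows, cols = len(ups[0]), len(ups[0][0])
    return [[sum(u[r][c] for u in ups) / len(ups) for c in range(cols)]
            for r in range(rows)]
```

A 2×2 map upsampled by 2 and a 4×4 map upsampled by 1 then average into a single 4×4 fusion map, mirroring how maps from shallow and deep branches reach the input resolution.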
Preferably, in the embodiment of the present invention, the neural network model is trained at least 200 times by using the training set, and the trained neural network model is output as the low-illumination image enhancement model.
And S104, inputting the verification set into the low-illumination image enhancement model, obtaining quantitative index evaluation, and outputting the low-illumination image enhancement model.
In this step, the low-illumination image enhancement model is evaluated mainly through the verification set. In the embodiment of the invention, the quantitative index evaluation includes Mean Absolute Error (MAE), Mean Square Error (MSE), Peak Signal-to-Noise Ratio (PSNR), Structural Similarity (SSIM), and Average Brightness (AB). The images in the verification set are divided by scene, such as Street-View and Campus-View, and the quantitative index evaluations of the low-illumination image enhancement model in the different scenes are shown in Tables 1 and 2 below.
TABLE 1 Quantitative index evaluation in Street-View

                     MAE      MSE      PSNR     SSIM     AB
MBLLEN               0.0907   0.0148   18.5514  0.9273   0.0221
MBLLEN+attention     0.0557   0.0062   23.0472  0.9530   0.0213

TABLE 2 Quantitative index evaluation in Campus

                     MAE      MSE      PSNR     SSIM     AB
MBLLEN               0.2214   0.1003   10.0190  0.6182   0.1700
MBLLEN+attention     0.0578   0.0063   22.9384  0.9554   0.0215
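Three of the pixel-level metrics reported above follow standard definitions, sketched below for flattened images with values normalized to [0, 1]. This is a common formulation assumed for illustration, not code from the patent; SSIM and average brightness are omitted for brevity.

```python
# Standard definitions of MAE, MSE, and PSNR for flattened images with values
# in [0, 1]. A common formulation assumed for illustration, not from the patent.
import math

def mae(pred, ref):
    """Mean absolute error between two equal-length pixel lists."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)

def mse(pred, ref):
    """Mean squared error between two equal-length pixel lists."""
    return sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred)

def psnr(pred, ref, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    err = mse(pred, ref)
    return float("inf") if err == 0 else 10.0 * math.log10(max_val ** 2 / err)
```

Lower MAE/MSE and higher PSNR indicate a closer match to the bright reference image, which is the direction of improvement the tables show for MBLLEN+attention.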
According to the quantitative evaluation results in Tables 1 and 2, the low-illumination image enhancement model constructed in the embodiment of the invention achieves a better low-illumination enhancement effect than the original MBLLEN model. The low-illumination image enhancement model that meets the preset evaluation standard is then output, and its visual effect is verified on the test set.
It should be noted that although the low-illumination image enhancement model in the embodiment of the invention is based on the MBLLEN model, the way the attention mechanism is deployed in the embodiment of the invention can be applied to any deep model that uses a convolutional neural network as its basic structure; the invention therefore does not limit which model the neural network model for low-illumination image enhancement is built on.
The method has the advantage that, because the attention mechanism refines features between the feature extraction and feature enhancement steps, the constructed low-illumination image enhancement model focuses more on the low-brightness areas of the input image, further optimizing the parts with low and uneven illumination and improving the image enhancement effect.
Referring to fig. 3, fig. 3 is a schematic structural diagram of an image enhancement system 200 according to an embodiment of the present invention, which includes:
a data obtaining module 201, configured to obtain video data of an actual scene, and obtain a scene image by a frame splitting method;
a preprocessing module 202, configured to preprocess the scene image to obtain a paired image data set, and divide the paired image data set into a training set and a verification set;
the model training module 203 is used for constructing a neural network model containing an attention mechanism, and performing iterative training on the neural network model by using the training set to obtain a low-illumination image enhancement model;
and the model quantitative evaluation module 204 is configured to input the verification set into the low-illumination image enhancement model, obtain quantitative index evaluation, and output the low-illumination image enhancement model.
The image enhancement system 200 can implement the steps of the attention mechanism-based image enhancement method in the above embodiments and achieve the same technical effects; with reference to the description in the above embodiments, details are not repeated here.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a computer device provided in an embodiment of the present invention, where the computer device 300 includes: a memory 302, a processor 301, and a computer program stored on the memory 302 and executable on the processor 301.
The processor 301 calls the computer program stored in the memory 302 to execute the steps of the attention mechanism-based image enhancement method provided by the embodiment of the present invention (see fig. 1), which specifically include:
s101, acquiring video data of an actual scene, and obtaining a scene image by a method of splitting according to frames.
S102, preprocessing the scene image to obtain a matched image data set, and dividing the matched image data set into a training set and a verification set.
Further, the method for preprocessing the scene image specifically comprises the following steps:
performing de-duplication and de-blurring screening on the scene image, then performing gamma correction on the RGB channels of the scene image, with the gamma value selected at random within a preset gamma interval.
S103, constructing a neural network model containing an attention mechanism, and performing iterative training on the neural network model by using the training set to obtain a low-illumination image enhancement model.
Still further, the neural network model including an attention mechanism includes an input layer, a feature extraction layer, an attention mechanism layer, a feature enhancement layer, a feature fusion layer, and an output layer, wherein:
the input layer is used for inputting the scene images in the training set to the neural network model containing the attention mechanism;
the feature extraction layer is used for extracting features of the scene images in the training set and outputting a feature extraction image;
the attention mechanism layer is used for refining and optimizing the features in the feature extraction diagram and outputting an optimized feature extraction diagram;
the feature enhancement layer is used for performing feature enhancement on the optimized feature extraction graph and outputting a feature enhanced image;
the feature fusion layer is used for performing feature fusion on the feature enhanced image to obtain a feature fusion map, and outputting the obtained feature fusion map through the output layer.
Furthermore, the feature extraction layer comprises a plurality of layers, and each feature extraction layer performs feature extraction at a different scale according to its position in the network;
the attention mechanism layer comprises a plurality of layers, each attention mechanism layer is positioned behind one feature extraction layer and takes the output of that feature extraction layer as its input, and every feature extraction layer except the first takes the output of the preceding attention mechanism layer as its input;
the feature enhancement layers are equal in number to the attention mechanism layers, and each feature enhancement layer takes the output of the attention mechanism layer at the same level as its input;
the feature fusion layer comprises one layer, which performs feature fusion on the outputs of all the feature enhancement layers.
Further, the attention mechanism layer includes a spatial attention mechanism and a channel attention mechanism; the two mechanisms respectively operate on the feature extraction map to obtain a spatial feature weight and a channel feature weight, and a vector weight is calculated for each feature vector in the feature extraction map by combining the two weights. The calculation of the vector weight satisfies the following formula (1):

s_i = f(c_i, v)    (1)

where c_i represents the i-th feature vector in the feature extraction map, v represents the spatial feature weight and channel feature weight obtained by learned computation, and s_i represents the vector weight corresponding to c_i, such that:

if the brightness of the region corresponding to the feature vector is greater than a preset brightness threshold, the value of s_i is not greater than 1;

if the brightness of the region corresponding to the feature vector is less than or equal to the preset brightness threshold, the value of s_i is greater than 1.

Furthermore, for each feature vector in the feature extraction map, the attention mechanism layer multiplies the feature vector by its corresponding vector weight to obtain an optimized refined feature vector c_i'; the calculation of the refined feature vector c_i' satisfies the following formula (2):

c_i' = c_i × s_i    (2)
and S104, inputting the verification set into the low-illumination image enhancement model, acquiring quantitative index evaluation, and outputting the low-illumination image enhancement model.
Still further, the quantitative index evaluation comprises at least one of mean absolute error, mean square error, peak signal-to-noise ratio, structural similarity, and average brightness.
The computer device 300 according to the embodiment of the present invention can implement the steps in the image enhancement method based on attention mechanism in the above embodiments, and can implement the same technical effects, and reference is made to the description in the above embodiments, and details are not repeated here.
The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements each process and step in the image enhancement method based on the attention mechanism provided in the embodiment of the present invention, and can implement the same technical effect, and in order to avoid repetition, details are not repeated here.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the preferred embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, which are illustrative, but not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. An attention-based image enhancement method, the method comprising:
acquiring video data of an actual scene, and obtaining a scene image by splitting the video data frame by frame;
preprocessing the scene image to obtain a paired image data set, and dividing the paired image data set into a training set and a verification set;
constructing a neural network model containing an attention mechanism, and performing iterative training on the neural network model by using the training set to obtain a low-illumination image enhancement model;
and inputting the verification set into the low-illumination image enhancement model, obtaining quantitative index evaluation, and outputting the low-illumination image enhancement model.
2. The image enhancement method according to claim 1, wherein the method for preprocessing the scene image specifically comprises:
and carrying out duplication removal and deblurring screening on the scene image, then carrying out gamma correction on an RGB channel of the scene image, and carrying out random gamma value selection on the scene image in a preset gamma interval.
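As an illustration outside the claim language, the gamma step of claim 2 could look like the following sketch, which darkens a normal-light image to synthesize the low-illumination half of a paired sample. The interval (2.0, 5.0) and the darkening direction are assumptions, since the claim only specifies a random gamma drawn from a preset interval:

```python
import random
import numpy as np

def gamma_darken(img, gamma_range=(2.0, 5.0)):
    """Pick a random gamma from a preset interval and apply it to every RGB
    channel of a normal-light image in [0, 1]; gamma > 1 darkens, producing
    the low-illumination input of a paired training sample."""
    gamma = random.uniform(*gamma_range)
    dark = np.power(np.clip(img, 0.0, 1.0), gamma)
    return dark, gamma  # (low-light image, gamma actually used)
```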
3. The image enhancement method of claim 1, wherein the neural network model that includes an attention mechanism comprises an input layer, a feature extraction layer, an attention mechanism layer, a feature enhancement layer, a feature fusion layer, and an output layer, wherein:
the input layer is used for inputting the scene images in the training set to the neural network model containing the attention mechanism;
the feature extraction layer is used for extracting features of the scene images in the training set and outputting a feature extraction image;
the attention mechanism layer is used for refining and optimizing the features in the feature extraction diagram and outputting an optimized feature extraction diagram;
the feature enhancement layer is used for performing feature enhancement according to the optimized feature extraction image and outputting a feature enhanced image;
the feature fusion layer is used for performing feature fusion on the feature enhanced image to obtain a feature fusion image, and outputting the obtained feature fusion image through the output layer.
4. The image enhancement method according to claim 3, wherein the feature extraction layer comprises a plurality of layers, and each feature extraction layer performs feature extraction of different sizes according to the precedence order of the network layers;
the attention mechanism layer comprises a plurality of layers; each attention mechanism layer is located after one feature extraction layer and takes the output of the preceding feature extraction layer as its input, and, except for the first feature extraction layer, each feature extraction layer takes the output of the preceding attention mechanism layer as its input;
the feature enhancement layers and the attention mechanism layer have the same number of layers, and each feature enhancement layer takes the output of the attention mechanism layer at the same layer as the input of the feature enhancement layer;
the feature fusion layer comprises a layer for feature fusion of the outputs of all the feature enhancement layers.
5. The image enhancement method according to claim 3, wherein the attention mechanism layer comprises a spatial attention mechanism and a channel attention mechanism; the spatial attention mechanism and the channel attention mechanism respectively operate on the feature extraction map to obtain a spatial feature weight and a channel feature weight, and a vector weight is calculated for each feature vector in the feature extraction map by combining the spatial feature weight and the channel feature weight, the calculation of the vector weight satisfying the following formula (1):

sᵢ = f(cᵢ, v)  (1);

wherein cᵢ represents the i-th feature vector in the feature extraction map, v represents the spatial feature weight and the channel feature weight obtained through learned calculation, and sᵢ represents the vector weight corresponding to cᵢ, and:

if the brightness of the region corresponding to the feature vector is greater than a preset brightness threshold, the value of sᵢ is not greater than 1;

if the brightness of the region corresponding to the feature vector is less than or equal to the preset brightness threshold, the value of sᵢ is greater than 1.
6. The image enhancement method of claim 5, wherein, for each of the feature vectors in the feature extraction map, the attention mechanism layer performs a multiplication according to the vector weight corresponding to that feature vector to obtain an optimized refined feature vector cᵢ′; the calculation of cᵢ′ satisfies the following formula (2):

cᵢ′ = cᵢ × sᵢ  (2).
7. the image enhancement method of claim 1, wherein the quantitative index evaluation comprises at least one of mean absolute error, mean square error, peak signal-to-noise ratio, structural similarity, and average brightness.
8. An attention-based image enhancement system, comprising:
the data acquisition module is used for acquiring video data of an actual scene and obtaining a scene image by a method of splitting according to frames;
the preprocessing module is used for preprocessing the scene image to obtain a paired image data set, and dividing the paired image data set into a training set and a verification set;
the model training module is used for constructing a neural network model containing an attention mechanism and performing iterative training on the neural network model by using the training set to obtain a low-illumination image enhancement model;
and the model quantitative evaluation module is used for inputting the verification set into the low-illumination image enhancement model, acquiring quantitative index evaluation and outputting the low-illumination image enhancement model.
9. A computer device, comprising: memory, processor and computer program stored on the memory and executable on the processor, the processor implementing the steps in the image enhancement method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the image enhancement method according to any one of claims 1 to 7.
CN202210020052.XA 2022-01-06 2022-01-06 Attention mechanism-based image enhancement method and system and related equipment Pending CN114494044A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210020052.XA CN114494044A (en) 2022-01-06 2022-01-06 Attention mechanism-based image enhancement method and system and related equipment


Publications (1)

Publication Number Publication Date
CN114494044A true CN114494044A (en) 2022-05-13

Family

ID=81509919

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210020052.XA Pending CN114494044A (en) 2022-01-06 2022-01-06 Attention mechanism-based image enhancement method and system and related equipment

Country Status (1)

Country Link
CN (1) CN114494044A (en)

Similar Documents

Publication Publication Date Title
CN111193923B (en) Video quality evaluation method and device, electronic equipment and computer storage medium
CN109636754B (en) Extremely-low-illumination image enhancement method based on generation countermeasure network
CN111292264A (en) Image high dynamic range reconstruction method based on deep learning
CN112288658A (en) Underwater image enhancement method based on multi-residual joint learning
CN111064904A (en) Dark light image enhancement method
CN113992861B (en) Image processing method and image processing device
CN107292830B (en) Low-illumination image enhancement and evaluation method
CN111612722B (en) Low-illumination image processing method based on simplified Unet full-convolution neural network
CN111985281B (en) Image generation model generation method and device and image generation method and device
CN110443766B (en) Image processing method and device, electronic equipment and readable storage medium
CN112465727A (en) Low-illumination image enhancement method without normal illumination reference based on HSV color space and Retinex theory
CN113962859A (en) Panorama generation method, device, equipment and medium
Zhang et al. Deep motion blur removal using noisy/blurry image pairs
CN115984570A (en) Video denoising method and device, storage medium and electronic device
CN115880177A (en) Full-resolution low-illumination image enhancement method for aggregating context and enhancing details
CN110838088B (en) Multi-frame noise reduction method and device based on deep learning and terminal equipment
CN111372006A (en) High dynamic range imaging method and system for mobile terminal
CN112801890B (en) Video processing method, device and equipment
CN116433496A (en) Image denoising method, device and storage medium
WO2023110880A1 (en) Image processing methods and systems for low-light image enhancement using machine learning models
CN114494044A (en) Attention mechanism-based image enhancement method and system and related equipment
CN113014745B (en) Video image noise reduction method and device, storage medium and electronic equipment
CN114663300A (en) DCE-based low-illumination image enhancement method, system and related equipment
CN113935910A (en) Image fuzzy length measuring method based on deep learning
CN115311149A (en) Image denoising method, model, computer-readable storage medium and terminal device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination