CN112184566B - Image processing method and system for removing adhered water mist and water drops


Info

Publication number: CN112184566B
Application number: CN202010875220.4A
Authority: CN (China)
Prior art keywords: image, convolution, module, attention, features
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN112184566A (en)
Inventors: 罗家佳, 何达
Current Assignee: Peking University (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Peking University
Priority date: the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed

Events:
    • Application filed by Peking University
    • Priority to CN202010875220.4A
    • Publication of CN112184566A
    • Application granted
    • Publication of CN112184566B
    • Status: Active
    • Anticipated expiration


Classifications

    • G06T 5/73
    • G06F 18/214: Pattern recognition › Analysing › Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation › Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/241: Pattern recognition › Analysing › Classification techniques › Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/045: Computing arrangements based on biological models › Neural networks › Architecture, e.g. interconnection topology › Combinations of networks
    • G06N 3/08: Computing arrangements based on biological models › Neural networks › Learning methods
    • G06T 2207/20081: Indexing scheme for image analysis or image enhancement › Special algorithmic details › Training; Learning
    • G06T 2207/20084: Indexing scheme for image analysis or image enhancement › Special algorithmic details › Artificial neural networks [ANN]

Abstract

The embodiment of the application discloses an image processing method, an image processing system, a device, and a readable storage medium. The method comprises the following steps: inputting an image into a classification convolutional neural network; the classification convolutional neural network classifies the image, generates a spatial attention mask according to the target classification result, superimposes the spatial attention mask on the original input image, and inputs the superimposed result to a feature encoding module; the feature encoding module filters useless features out of the superimposed image and inputs the filtered feature map to a smooth dilated convolution module; the smooth dilated convolution module extracts features from the filtered feature map, enhances the features to be retained, and inputs the processed feature map to a decoding module; the decoding module restores the image from the processed feature map; finally, the restored image is output. A blurred image can thus be sharpened; for example, an unclear image disturbed by adhered water mist and adhered water drops can be restored to a clear image.

Description

Image processing method and system for removing adhered water mist and water drops
Technical Field
The embodiment of the application relates to the technical field of artificial intelligence, in particular to an image processing method and system for removing adhered water mist and water drops.
Background
In recent decades, with the rapid development of artificial intelligence technologies represented by deep learning, many vision-based artificial intelligence applications have become popular (such as face recognition and intelligent beautification), and some applications that will greatly change social life and production are being perfected (such as automatic driving and gesture recognition). These high-level applications all rely on a clear input image. However, in a large number of outdoor scenes, the camera lens that collects images (or the protective glass in front of it) may have water mist or water droplets adhering to it, which seriously degrades the quality of the collected images so that they cannot be used by subsequent algorithms.
This phenomenon is often caused by temperature differences between the two sides of the glass, weather changes, or harsh working environments. Take a vision-based automatic driving algorithm as an example: in winter, if the air conditioner is not turned on, after the occupants have breathed in the vehicle for only a minute or two, the temperature inside the vehicle becomes higher than the temperature outside; the water vapor inside then condenses into water mist on the cold windshield, the automatic driving algorithm can no longer acquire clear images, and a serious traffic accident may result. Take an outdoor security camera as another example: when rain falls suddenly, the protective glass in front of the lens is splashed with raindrops and cools down rapidly, while the temperature inside the camera remains relatively high for a while; water mist then condenses on the inner side of the glass while water drops adhere to the outer side, and the degraded image quality reduces the performance of the security algorithm.
However, although outdoor vision algorithms already have considerable application scenarios, for the past several years industry and academia have not proposed algorithm-level solutions specifically for the coexistence of adhered water mist and adhered water droplets.
Disclosure of Invention
Therefore, the embodiment of the application provides an image processing method and an image processing system for removing adhered water mist and water drops, which can sharpen a blurred image, for example, restore an unclear image disturbed by adhered water mist and water drops to a clear image.
In order to achieve the above object, the embodiment of the present application provides the following technical solutions:
according to a first aspect of an embodiment of the present application, there is provided an image processing method for removing adhered water mist and water drops, the method including:
inputting an image into a classification convolutional neural network;
the classification convolutional neural network classifies the image, generates a spatial attention mask according to the target classification result, superimposes the spatial attention mask on the original input image, and inputs the superimposed result to a feature encoding module;
the feature encoding module filters useless features out of the superimposed image and inputs the filtered feature map to a smooth dilated convolution module;
the smooth dilated convolution module extracts features from the filtered feature map, enhances the features to be retained, and inputs the processed feature map to a decoding module;
the decoding module performs an operation of recovering an image according to the processed feature map;
and outputting the restored image.
Optionally, the classification convolutional neural network classifying the image includes:
judging whether the image is clear or affected by adhered water drops and/or adhered water mist;
the classification convolutional neural network extracts image features with seven convolutional layers whose feature-map sizes decrease successively, converts the two-dimensional features into one-dimensional features with a global pooling layer, and generates the classification output with a fully connected layer containing three neurons.
Optionally, the spatial attention mask is generated from the target classification result according to the following formula:

M_t(x, y) = Σ_k ω_k^t · f_k(x, y)    (1)

where t is the corresponding classification result; M_t is the computed class activation map, i.e. the spatial attention mask; f_k is the k-th channel of the feature map of the last convolutional layer; ω_k^t is the weight of the subsequent fully connected layer corresponding to channel k and class t; and (x, y) are spatial position coordinates.
Optionally, the feature encoding module includes three convolutional layers and three dual-attention mechanism sub-modules, each convolutional layer followed by a dual-attention mechanism sub-module; the convolution stride of the first two convolutional layers is 1 and that of the last convolutional layer is 2, so that the length and width of the feature map propagated in the network are each reduced to half of the original size.
Optionally, the smooth dilated convolution module extracting features from the filtered feature map and enhancing the features to be retained includes:
the smooth dilated convolution feeds the filtered feature map into a dilated convolutional layer, which is smoothed by an ordinary convolutional layer; this is followed by another dilated convolutional layer and another ordinary convolutional layer; finally, the processed feature map is superimposed on the input filtered feature map to form a residual learning mechanism. The smooth dilated convolution module comprises six smooth dilated convolutions, the first three with a dilation rate of 2 and the last three with a dilation rate of 4; each smooth dilated convolution operation is followed by a dual-attention mechanism sub-module.
Optionally, the dual-attention mechanism sub-module is a module combining a channel attention mechanism and a spatial attention mechanism;
in the channel attention mechanism, the input feature map is converted into two one-dimensional features through global maximum pooling and global average pooling respectively, and the two one-dimensional features are processed by two shared fully connected layers, so that different channels of the original input feature map are given different weights; in the spatial attention mechanism, the input feature map is subjected to maximum pooling and average pooling along the channel axis to obtain two single-channel two-dimensional features, which are concatenated and processed by a convolutional layer with a large 7×7 convolution kernel to generate the spatial weight distribution of the original input feature map.
Optionally, the decoding module includes one transposed convolutional layer with a magnification of two and two ordinary convolutional layers; the transposed convolutional layer and the first ordinary convolutional layer are each followed by a dual-attention mechanism sub-module.
According to a second aspect of an embodiment of the present application, there is provided an image processing system for removing adhered water mist and water drops, the system comprising:
an image input module for inputting an image into the classification convolutional neural network;
an additional spatial attention mask generation module for classifying the image with the classification convolutional neural network, generating a spatial attention mask according to the target classification result, superimposing the spatial attention mask on the original input image, and inputting the superimposed result to the feature encoding module;
the feature encoding module, for filtering useless features out of the superimposed image and inputting the filtered feature map to the smooth dilated convolution module;
the smooth dilated convolution module, for extracting features from the filtered feature map, enhancing the features to be retained, and inputting the processed feature map to the decoding module;
the decoding module, for restoring the image from the processed feature map;
and an image output module for outputting the restored image.
According to a third aspect of embodiments of the present application, there is provided a device comprising a data acquisition apparatus, a processor, and a memory; the data acquisition apparatus is used for acquiring data; the memory is used for storing one or more program instructions; the processor is configured to execute the one or more program instructions to perform the method of any of the first aspects.
According to a fourth aspect of embodiments of the present application, there is provided a computer readable storage medium having embodied therein one or more program instructions for performing the method of any of the first aspects.
In summary, the embodiments of the present application provide an image processing method, system, device, and readable storage medium for removing adhered water mist and water drops: an image is input into a classification convolutional neural network; the classification convolutional neural network classifies the image, generates a spatial attention mask according to the target classification result, superimposes the spatial attention mask on the original input image, and inputs the superimposed result to a feature encoding module; the feature encoding module filters useless features out of the superimposed image and inputs the filtered feature map to a smooth dilated convolution module; the smooth dilated convolution module extracts features from the filtered feature map, enhances the features to be retained, and inputs the processed feature map to a decoding module; the decoding module restores the image from the processed feature map; finally, the restored image is output. A blurred image can thus be sharpened; for example, an unclear image disturbed by adhered water mist and adhered water drops can be restored to a clear image.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It will be apparent to those of ordinary skill in the art that the following drawings are exemplary only and that other implementations can be obtained from them without inventive effort.
The structures, proportions, sizes, and the like shown in the present specification are given only for purposes of illustration and description and are not intended to limit the scope of the application, which is defined by the claims; any structural modification, change in proportion, or adjustment of size that does not affect the efficacy or purpose achieved by the present application falls within its scope.
Fig. 1 is a schematic view of scenes disturbed by adhered water mist and water drops in an embodiment of the present application;
FIG. 2 is a diagram illustrating exemplary restoration effects provided by an embodiment of the present application;
Fig. 3 is a schematic flow chart of an image processing method for removing adhered water mist and water drops according to an embodiment of the present application;
FIG. 4 is a general flow chart provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of the classification convolutional neural network for generating the additional spatial attention mask according to an embodiment of the present application;
FIG. 6 is an example of additional spatial attention masks generated for two samples provided by an embodiment of the present application;
FIG. 7 is a block diagram of the smooth dilated convolution operation according to an embodiment of the present application;
FIG. 8 is a schematic diagram of the dual-attention mechanism sub-module according to an embodiment of the present application;
Fig. 9 is a block diagram of an image processing system for removing adhered water mist and water drops according to an embodiment of the present application.
Detailed Description
Other aspects and advantages of the present application will become apparent to those skilled in the art from the following detailed description, which illustrates the application by way of certain specific embodiments, but not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
Technical terms related to the embodiment of the present application are explained below:
Deep learning: deep learning has been a growing research direction in the field of artificial intelligence over the last decade. It uses a multi-layer artificial neural network structure to extract information features layer by layer from a large amount of data and realizes nonlinear fitting and prediction on high-level features. Implementing a deep learning algorithm is usually divided into two stages: training and inference.
Training: training a deep learning algorithm refers to the process in which the algorithm learns from data and continuously adjusts the weight parameters within the various neural network layers to make the error function as small as possible. After training, the weight parameters of all neural network layers are fixed, and the whole network structure together with the fixed weight parameters forms a trained deep learning model file.
Inference: training a deep learning algorithm is usually a research and development process, while inference refers to applying the fully developed (trained) model. At this stage, for any input data, the weights in the artificial neural network no longer change; the input data is processed in sequence according to the network structure, finally producing a nonlinear weighted output.
Convolutional Neural Network (CNN): convolutional neural networks are a large class of methods in deep learning, consisting of several, tens, or even hundreds of convolutional layers stacked. Convolutional neural networks are particularly suitable for processing data in image format. By utilizing the convolutional neural network, the computer algorithm can have the capability of partial human vision, such as image classification and target detection, and can also realize image enhancement functions of image denoising, image amplification and the like.
Feature map: the feature map refers to the input or output of each convolutional layer in the convolutional neural network. The feature map generated by the convolution layer at a shallow position in the convolutional neural network may be low-level features such as edges, corner points and the like of the input image, and the feature map generated by the convolution layer at a deep position may be high-level features which are difficult to intuitively interpret. The output feature map of most convolution layers often has tens or even hundreds of image channels (most color pictures shot by mobile phones contain three channels of red, green and blue).
Attention mechanism: the attention mechanism of convolutional neural networks is a method proposed in recent years for improving network performance and promoting efficient convergence of the training process. It usually sits alongside the original convolutional layers and additionally assigns different weights to different features, so that the convolutional neural network pays more attention to certain features and less attention to unimportant ones. The attention modules that implement attention mechanisms are themselves usually trainable neural network layers.
Spatial attention mechanism: a spatial attention mechanism applies the attention mechanism over the spatial domain of the image. By applying different attention weights to different locations of the input image or feature map, the algorithm can strengthen training for the important locations. For example, to train a deep learning model that recognizes faces in an image, we can design and train the spatial attention module so that it gives little attention to sky areas in the image.
Channel attention mechanism: feature maps often contain tens or even hundreds of channels, each channel may not have exactly the same characteristics, some of which are important for improving performance, and some of which may not be of great importance. The channel attention mechanism is to assign different attention weights to different channels of the feature map by using a trainable channel attention module. In this way the algorithm may focus more on important features, helping to improve the final model performance.
Although outdoor vision algorithms already have considerable application, for years industry and academia did not dedicate algorithm-level work to the problem of coexisting adhered water mist and adhered water droplets.
On the one hand, most rain-related image quality restoration algorithms focus on removing rain streaks from images captured in rainy weather; owing to factors such as the imaging principle and the focal length of the camera, thin line-shaped rain streaks are obviously far different from the scenes shown in fig. 1. Researchers first explored the influence of adhered water drops on captured images in 2013, but, limited by the immaturity of methods such as deep learning at the time, the image restoration algorithm they proposed had a rather limited effect. It was not until 2018 that researchers effectively removed water drops adhering to the lens surface with DeRaindrop, an algorithm combining a recurrent neural network and a convolutional neural network.
In that work, the authors first skillfully constructed a dataset using glass plates in different states. That is, for the same scene, while keeping the camera fixed, a clean glass plate and a glass plate with adhered water drops were placed in front of the camera in turn, so that a clear image and an image disturbed by adhered water drops were captured for the same scene. Next, they subtracted the two pictures to obtain the ground truth of the spatial attention mask, and trained a long short-term memory network to predict the spatial attention mask from the unclear input image. Finally, the mask was superimposed on the original unclear image as the input of a convolutional neural network with an attention mechanism; by training this network, the method successfully removed the adhered water drops from the unclear image and obtained a clearer image.
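A minimal sketch of that densely supervised mask-generation step follows (the pixel-difference threshold of 30 is the value described for DeRaindrop later in this text; the function and array names are illustrative assumptions):

```python
import numpy as np

def raindrop_mask_ground_truth(clear_img: np.ndarray,
                               degraded_img: np.ndarray,
                               threshold: int = 30) -> np.ndarray:
    """Binary attention-mask ground truth from a clear/degraded image pair.

    Both inputs are H x W x 3 uint8 arrays of the same scene; a pixel is
    marked as water-drop-covered where the mean absolute channel
    difference exceeds the fixed threshold (30 in DeRaindrop).
    """
    diff = np.abs(clear_img.astype(np.int16) - degraded_img.astype(np.int16))
    return (diff.mean(axis=2) > threshold).astype(np.float32)
```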
In terms of defogging, existing image defogging algorithms (such as GCANet and FFANet) are all directed at fog in the atmosphere, for example restoring a photograph taken in foggy weather, in which the opposite side of the street can hardly be seen, to a clear image in which the sign of the store opposite can be identified. The embodiment of the application is therefore probably the first algorithm to remove the interference caused by water mist adhering to the shooting device and by the coexistence of adhered water mist and adhered water droplets.
The prior related art mainly comprises two types: one is the adhered water drop removal algorithm DeRaindrop proposed by researchers; the other is the class of common image defogging algorithms (e.g., GCANet, FFANet).
The DeRaindrop method works well for removing adhered water drops from a simple image, but it is difficult to use when adhered water mist and adhered water drops exist simultaneously, and also when only adhered water mist exists. The high performance of the method relies heavily on the additional attention mask fed to the recovery network. Because the mask generation process is based on a dense supervision mechanism, the ground truth of the mask is generated by taking the difference between the pixel values of the clear and unclear images (with the difference threshold set to 30), after which the long short-term memory network can be trained to generate the mask. However, when adhered water mist is present, the pixel-value difference between the clear and unclear images in the areas covered by adhered water drops may be large or small, and the minimum pixel-value difference of the entire image may even exceed 30 (the adhered mist being distributed over the whole image). In other words, that method cannot effectively create the additional attention mask in the scenario targeted by the present technology, which has a significant impact on its performance.
On the other hand, common image defogging algorithms mainly target fog evenly distributed in the atmosphere in foggy weather. Fog in the atmosphere is relatively uniformly distributed, and the optical signal of a distant object is attenuated according to a fairly simple rule; that is, the image characteristics are relatively stable and the scattering of incident light does not change abruptly. When adhered water mist and adhered water droplets coexist, however, the water droplets scatter the incident light drastically before the mist does, causing the acquired image to lose almost all information in the areas where water droplets are present. Such serious information loss and uneven spatial mutation are not considered by common image defogging algorithms. If the neural network framework of a common image defogging algorithm is applied directly to the case of adhered water mist and adhered water drops, its performance is limited and the quality of the restored image is poor.
Therefore, the embodiment of the application mainly solves an image interference problem that was previously ignored by the industry, namely the interference of adhered water mist and adhered water droplets on imaging quality in front of the camera. The technology is the first scheme proposed in the industry for this image interference problem; it designs a high-performance neural network structure according to the characteristics of the interference source, so that adhered water mist and adhered water drops of different degrees can be removed from the image well and adaptively.
The embodiment of the application designs an image restoration algorithm for an image interference commonly encountered in everyday production: adhered water mist and adhered water drops on the lens surface, or on nearby glass, at shooting time. The goal is to restore an unclear image disturbed by adhered water mist and water drops to a clear image.
Such interference often occurs when there is a temperature difference between the two sides of a light-transmitting device, or when water splashes onto one side because of weather changes or a harsh working environment. Fig. 1 shows schematic views of scenes disturbed by adhered water mist and water drops: (a) an image shot while water drops adhere to the lens surface; (b) a perspective view of a car's front windshield with adhered water mist; (c) an image shot while water mist and water drops adhere to the lens surface simultaneously. Besides such relatively small water droplets, larger water drops may merge into a sheet-like water film.
Fig. 2 shows examples of the effect of the present technology in removing adhered water mist and water drops. The first row shows the original input images disturbed by adhered water mist and water drops, and the second row shows the results of restoring the three input images with the present technology.
Fig. 3 shows the flow of an image processing method for removing adhered water mist and water drops provided by an embodiment of the present application. As shown in fig. 3, the method includes:
step 301: the image is input to a classified convolutional neural network.
Step 302: the classification convolutional neural network classifies the images, generates a spatial attention mask according to a target classification result, superimposes the spatial attention mask with the original input image, and inputs the superimposed spatial attention mask to the feature coding module.
Step 303: and the feature coding module filters the useless features in the superimposed image and inputs the filtered feature images to the smooth cavity convolution module.
Step 304: and the smooth cavity convolution module extracts features aiming at the filtered feature map, enhances target retention features and inputs the processed feature map to the decoding module.
Step 305: and the decoding module performs an operation of recovering the image aiming at the processed characteristic diagram.
Step 306: and outputting the restored image.
In one possible implementation, in step 302, the classification convolutional neural network classifying the image includes:
judging whether the image is clear or affected by adhered water drops and/or adhered water mist; the classification convolutional neural network extracts image features with seven convolutional layers whose feature-map sizes decrease successively, converts the two-dimensional features into one-dimensional features with a global pooling layer, and generates the classification output with a fully connected layer containing three neurons.
In one possible implementation, in step 302, the spatial attention mask is generated from the target classification result according to the following formula (1):

M_t(x, y) = Σ_k ω_k^t · f_k(x, y)    (1)

where t is the corresponding classification result; M_t is the computed class activation map, i.e. the spatial attention mask; f_k is the k-th channel of the feature map of the last convolutional layer; ω_k^t is the weight of the subsequent fully connected layer corresponding to channel k and class t; and (x, y) are spatial position coordinates.
In one possible implementation, the feature encoding module includes three convolutional layers and three dual-attention mechanism sub-modules, each convolutional layer followed by a dual-attention mechanism sub-module; the convolution stride of the first two convolutional layers is 1 and that of the last convolutional layer is 2, so that the length and width of the feature map propagated in the network are each reduced to half of the original size.
In one possible implementation, in step 304, the smooth dilated convolution module extracting features from the filtered feature map and enhancing the features to be retained includes:
the smooth dilated convolution feeds the filtered feature map into a dilated convolutional layer, which is smoothed by an ordinary convolutional layer; this is followed by another dilated convolutional layer and another ordinary convolutional layer; finally, the processed feature map is superimposed on the input filtered feature map to form a residual learning mechanism. The smooth dilated convolution module comprises six smooth dilated convolutions, the first three with a dilation rate of 2 and the last three with a dilation rate of 4; each smooth dilated convolution operation is followed by a dual-attention mechanism sub-module.
In one possible implementation, the dual-attention mechanism sub-module is a module combining a channel attention mechanism and a spatial attention mechanism; in the channel attention mechanism, the input feature map is converted into two one-dimensional features through global maximum pooling and global average pooling respectively, and the two one-dimensional features are processed by two shared fully connected layers, so that different channels of the original input feature map are given different weights; in the spatial attention mechanism, the input feature map is subjected to maximum pooling and average pooling along the channel axis to obtain two single-channel two-dimensional features, which are concatenated and processed by a convolutional layer with a large 7×7 convolution kernel to generate the spatial weight distribution of the original input feature map.
In one possible implementation, the decoding module includes one transposed convolutional layer with a magnification of two and two ordinary convolutional layers; the transposed convolutional layer and the first ordinary convolutional layer are each followed by a dual-attention mechanism sub-module.
An overall flowchart of the embodiment of the present application is shown in fig. 4. When an image affected at shooting time by adhered water mist or adhered water drops is input, the additional spatial attention mask generation module first generates a spatial attention mask for the image. The mask is then superimposed on the original input image to form a four-channel tensor, which passes in sequence through the feature encoding module to filter out useless features, through the smooth dilated convolution module to extract features and continuously enhance the features to be retained, and through the decoding module to restore the image, finally yielding a restored image with the adhered water mist and water drops removed.
The core of the additional spatial attention mask generation module is a classification convolutional neural network and its corresponding class activation map computation. The structure of the classification network is shown in fig. 5, a schematic diagram of the classification convolutional neural network used to generate the additional spatial attention mask. It first extracts image features with seven convolutional layers whose feature-map sizes decrease successively, then converts the two-dimensional features into one-dimensional features with a global pooling layer, and finally generates the classification output with a fully connected layer containing three neurons.
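A minimal PyTorch sketch of such a classifier follows; the patent does not give channel widths, kernel sizes, or the exact downsampling scheme, so those details are illustrative assumptions:

```python
import torch
import torch.nn as nn

class MaskClassifier(nn.Module):
    """Classification network of fig. 5: seven convolutional layers with
    successively smaller feature maps (stride 2 assumed), global average
    pooling, and a fully connected layer with three output neurons.
    Channel widths and kernel sizes are illustrative assumptions."""

    def __init__(self, num_classes: int = 3):
        super().__init__()
        chans = [3, 16, 32, 64, 64, 128, 128, 256]  # assumed widths
        layers = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            layers += [nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                       nn.ReLU(inplace=True)]
        self.features = nn.Sequential(*layers)       # seven conv layers
        self.pool = nn.AdaptiveAvgPool2d(1)          # global pooling
        self.fc = nn.Linear(chans[-1], num_classes)  # three neurons

    def forward(self, x):
        f = self.features(x)                  # last-layer feature map f_k
        y = self.fc(self.pool(f).flatten(1))  # classification output
        return y, f                           # f is reused for the mask
```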
For any input image, the classification network automatically determines whether the image is clear, disturbed only by adhered water drops, or contains adhered water mist. According to the classification result, the method uses a class activation map computation step to obtain the different contributions of different spatial positions of the original input image to the classification result, and passes the computed class activation map, as the additional spatial attention mask, to the subsequent modules together with the original image.
After the classification result is obtained, the class activation map can be computed by formula (1).
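A sketch of this computation, given the last-layer feature map and the fully connected layer's weight matrix (the min-max normalization of the mask is an assumed detail, as the patent does not specify how the mask is scaled):

```python
import torch

def class_activation_map(feat: torch.Tensor,
                         fc_weight: torch.Tensor,
                         cls: int) -> torch.Tensor:
    """Formula (1): M_t(x, y) = sum_k w_k^t * f_k(x, y).

    feat      : (N, K, H, W) feature map of the last convolutional layer
    fc_weight : (num_classes, K) weights of the fully connected layer
    cls       : index t of the predicted class
    Returns a (N, 1, H, W) spatial attention mask, min-max normalised
    (the normalisation is an assumption, not stated in the patent).
    """
    mask = torch.einsum('k,nkhw->nhw', fc_weight[cls], feat).unsqueeze(1)
    mn = mask.amin(dim=(2, 3), keepdim=True)
    mx = mask.amax(dim=(2, 3), keepdim=True)
    return (mask - mn) / (mx - mn + 1e-8)
```

Before being superimposed with the original image into the four-channel tensor, the mask would be upsampled back to the input resolution; bilinear interpolation is an assumed choice here.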
Fig. 6 illustrates examples of the additional spatial attention masks generated by the present method for two input samples: (a) a sample with only adhered water drop interference; (b) a sample with both adhered water mist and adhered water drops. As can be seen from fig. 6, the areas where raindrops adhere and the areas of large gradient change under the adhered mist show large values on the mask, which helps the method emphasize the treatment of these areas.
The feature encoding module consists of three convolutional layers and three dual-attention mechanism sub-modules, each convolutional layer followed by a dual-attention mechanism sub-module. The convolution stride of the first two convolutional layers is 1 and that of the last convolutional layer is 2, so that the length and width of the feature map propagated in the network are each reduced to one half of the original size.
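A sketch of this encoder follows; nn.Identity() stands in for the dual-attention sub-module sketched after the description of fig. 8 below, and the channel width of 64 and kernel size are assumptions:

```python
import torch.nn as nn

class FeatureEncoder(nn.Module):
    """Feature encoding module sketch: three convolutional layers with
    strides 1, 1, 2, each followed by a dual-attention sub-module
    (placeholder here). Channel width and kernel size are assumed."""

    def __init__(self, in_ch: int = 4, width: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Identity(),  # dual-attention sub-module
            nn.Conv2d(width, width, 3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Identity(),  # dual-attention sub-module
            nn.Conv2d(width, width, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Identity(),  # dual-attention sub-module
        )

    def forward(self, x):        # x: spatial mask + RGB = 4 channels
        return self.body(x)      # height and width halved by stride 2
```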
The smooth dilated convolution module comprises six smooth dilated convolutions, the first three with a dilation rate of 2 and the last three with a dilation rate of 4. Each smooth dilated convolution operation is followed by a dual-attention mechanism sub-module. The structure realizing smooth dilated convolution is shown in fig. 7.
First, the feature map is fed into a dilated convolutional layer and then smoothed by an ordinary convolutional layer; this is followed by another dilated convolutional layer and another ordinary convolutional layer; finally, the processed feature map is superimposed on the input feature map to form a residual learning mechanism. Compared with ordinary convolution, dilated convolution enlarges the receptive field of the convolution operation and improves the restoration effect without increasing the amount of computation, while the smoothing keeps the large receptive field and suppresses the generation of artifacts.
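One such block can be sketched as follows (channel width assumed, as before):

```python
import torch.nn as nn

class SmoothDilatedBlock(nn.Module):
    """One smooth dilated convolution as in fig. 7: dilated conv ->
    ordinary (smoothing) conv -> dilated conv -> ordinary conv, with a
    residual connection from input to output. Channel width is assumed."""

    def __init__(self, ch: int = 64, dilation: int = 2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=dilation, dilation=dilation),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),   # smoothing convolution
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=dilation, dilation=dilation),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),   # smoothing convolution
        )

    def forward(self, x):
        return x + self.body(x)   # residual learning mechanism

# the module stacks six such blocks (dilation 2, 2, 2, 4, 4, 4),
# each followed by a dual-attention sub-module
```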
The structure of the dual-attention mechanism sub-module is shown in fig. 8; it contains a channel attention mechanism and a spatial attention mechanism. In the architecture diagram of fig. 8, GMP is global maximum pooling; GAP is global average pooling; MPC is maximum pooling along the channel axis; APC is average pooling along the channel axis.
In the channel attention mechanism, the input feature map is converted into two one-dimensional features through global maximum pooling and global average pooling respectively, and the two one-dimensional features are processed by two shared fully connected layers (with 32 and 64 neurons respectively) to give different channels of the original input feature map different weights.
In the spatial attention mechanism, the input feature map is subjected to maximum pooling and average pooling along the channel axis to obtain two single-channel two-dimensional features, which are concatenated and processed by a convolutional layer with a large 7×7 convolution kernel to generate the spatial weight distribution of the original input feature map. The channel attention mechanism screens the more useful channel information in the multi-channel feature map; the learnable spatial attention mechanism finds the different importance of different locations on the feature map.
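The sub-module can be sketched as follows; the 64-channel width carried over from the earlier sketches, and the sigmoid gating and summation of the two pooled branches, are assumptions consistent with the fully connected layer sizes stated above:

```python
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    """Dual-attention sub-module of fig. 8: channel attention (GMP and
    GAP through two shared fully connected layers of 32 and 64 neurons)
    followed by spatial attention (max and average pooling along the
    channel axis, concatenated, then a 7x7 convolution)."""

    def __init__(self, ch: int = 64, hidden: int = 32):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(ch, hidden),   # 32 neurons
                                 nn.ReLU(inplace=True),
                                 nn.Linear(hidden, ch))   # 64 neurons
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        n, c, _, _ = x.shape
        # channel attention: GMP and GAP -> shared MLP -> channel weights
        gmp = self.mlp(x.amax(dim=(2, 3)))
        gap = self.mlp(x.mean(dim=(2, 3)))
        x = x * torch.sigmoid(gmp + gap).view(n, c, 1, 1)
        # spatial attention: MPC and APC -> 7x7 conv -> spatial weights
        mpc = x.amax(dim=1, keepdim=True)
        apc = x.mean(dim=1, keepdim=True)
        return x * torch.sigmoid(self.spatial(torch.cat([mpc, apc], 1)))
```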
The final decoding module consists of one transposed convolutional layer with a magnification of 2 and two ordinary convolutional layers. The transposed convolutional layer and the first ordinary convolutional layer are each followed by a dual-attention mechanism sub-module. In addition, the feature map before it enters the smooth dilated convolution module, the feature map after four smooth dilated convolution and dual-attention modules, and the feature map after all smooth dilated convolutions are added together before being fed into the transposed convolution, forming a residual learning mechanism with a larger perception range.
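A sketch of the decoder, with nn.Identity() again standing in for the dual-attention sub-module and the widths assumed:

```python
import torch.nn as nn

class Decoder(nn.Module):
    """Decoding module sketch: a transposed convolution that doubles the
    spatial size, then two ordinary convolutions; the transposed conv and
    the first ordinary conv are each followed by a dual-attention
    sub-module (placeholder here). Widths are assumptions."""

    def __init__(self, ch: int = 64, out_ch: int = 3):
        super().__init__()
        self.body = nn.Sequential(
            nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1),  # 2x size
            nn.ReLU(inplace=True),
            nn.Identity(),                    # dual-attention sub-module
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Identity(),                    # dual-attention sub-module
            nn.Conv2d(ch, out_ch, 3, padding=1),  # restored RGB image
        )

    def forward(self, x):
        return self.body(x)
```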
The above is a structural description of the various parts of the convolutional neural network employed in the embodiments of the present application. Training the model further requires a dataset with ground truth. That is, for an identical scene, the embodiment of the present application requires a pair of images: a picture disturbed by adhered water mist and water drops (the input) and a corresponding clear picture (the ground truth). During training, the convolutional network described above continuously attempts to process the input image so that it approximates the ground-truth image. To construct such a dataset, the embodiment of the present application places glass plates of different states in front of the camera to artificially construct the two situations: one glass plate carries adhered water mist and water drops of random degree; the other glass plate is very clean.
In order to better train the deep learning model of the embodiment of the application, a perceptual loss is added to the loss function to improve the detail quality of the generated image. The perceptual loss is calculated with a classical pre-trained VGG19 model, as shown in formula (2):

Loss_per = (1 / (x·y·z)) · Σ_{x,y,z} ( f(I_gt) − f(I_out) )²    (2)

where x, y, z represent the three dimensions of the feature map; f(I_gt) is the feature map obtained after the clear ground-truth image passes through seven convolutional layers of the pre-trained VGG19 model; and f(I_out) is the corresponding VGG19 feature map generated from the image restored by the method.
Further, the perceptual loss and the classical mean squared error (MSE) loss are added in a fixed proportion to obtain the total loss function for training the model of the method, as shown in formula (3):

Loss_total = 1.0 × Loss_mse + 0.05 × Loss_per    (3)
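A sketch of this combined loss in PyTorch follows. Cutting torchvision's VGG19 at features[:16] keeps its first seven convolutional layers; that cut point, and skipping the usual ImageNet input normalization, are simplifying assumptions:

```python
import torch
import torch.nn as nn
from torchvision import models

class TotalLoss(nn.Module):
    """Combined loss of formulas (2) and (3): MSE plus a VGG19 perceptual
    loss computed on the features of the first seven conv layers."""

    def __init__(self, w_per: float = 0.05):
        super().__init__()
        vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
        self.vgg7 = vgg.features[:16].eval()   # first 7 conv layers
        for p in self.vgg7.parameters():
            p.requires_grad_(False)            # frozen feature extractor
        self.mse = nn.MSELoss()
        self.w_per = w_per

    def forward(self, out: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
        loss_mse = self.mse(out, gt)                        # MSE loss
        loss_per = self.mse(self.vgg7(out), self.vgg7(gt))  # formula (2)
        return 1.0 * loss_mse + self.w_per * loss_per       # formula (3)
```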
Fig. 2 intuitively shows the image restoration effect of the technology through three examples, and table 1 gives the quantitative statistical results of different methods for removing adhered water mist and water drops, averaged over 72 test samples. In table 1, the two similarity indicators PSNR and SSIM represent the similarity between the restored image and the ground-truth image; a larger value means the two compared images are closer. As can be seen from table 1, before restoration the images disturbed by adhered water mist and water drops score only 17.9213 and 0.6283 on the two indicators against the clear ground truth, while after restoration the indicator values reach 21.8073 and 0.7549. Meanwhile, the present technique also exceeds the three retrained networks used for comparison. Those network structures were originally designed to handle simple adhered water drops or fog in the atmosphere, so even when retrained on the problem addressed by the present method, their effect hardly exceeds the more targeted present technique.
TABLE 1: average PSNR and SSIM over 72 test samples (the rows for the three retrained comparison networks are not reproduced here)

Method               | PSNR    | SSIM
Unrestored input     | 17.9213 | 0.6283
Present technique    | 21.8073 | 0.7549
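As a sketch of how the two indicators can be computed (scikit-image is an assumed tooling choice; the patent does not specify an implementation):

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(restored: np.ndarray, truth: np.ndarray):
    """PSNR and SSIM between a restored image and its clear ground truth,
    both H x W x 3 uint8 arrays; larger values mean closer images."""
    psnr = peak_signal_noise_ratio(truth, restored, data_range=255)
    ssim = structural_similarity(truth, restored,
                                 channel_axis=2, data_range=255)
    return psnr, ssim
```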
It can be seen that the embodiment of the application, through the acquisition scheme for pairs of images with adhered water mist and water drops and corresponding clear images, further adopts a class activation map to introduce a spatial attention mask for the restoration algorithm; the convolutional neural network structure emphasizing the spatial attention mechanism and the channel attention mechanism realizes the removal of adhered water mist and water drops from the image well.
The technical scheme designed to better remove adhered water mist and adhered water drops from images includes the use of a class activation map to introduce additional spatial attention; meanwhile, the convolutional neural network is designed with modules such as the spatial attention mechanism, the channel attention mechanism, and smooth dilated convolution. Each design idea matches the characteristics of the water drop and mist removal task targeted by this technology.
Considering the characteristics of deep learning algorithms, network structures and trained models obtained merely by fine-tuning the network connection pattern, the number of modules, network layer parameters, the loss function, or other training hyper-parameters also fall within the scope of this scheme.
Compared with the prior art, the embodiment of the application is designed and optimized for the more serious image quality degradation caused by adhered water mist interference, and its advantages can be divided into two parts:
First, the additional spatial attention mask, which is critical to the prior DeRaindrop method, is obtained and trained there based on a manually set pixel-difference threshold (namely 30), whereas the additional spatial attention mask used in the embodiment of the present application is generated with the class activation map of a classification network. This generation is based on a weak supervision mechanism: during network training, the developer only needs to provide the category of each image to the algorithm, and no distinguishing threshold needs to be set manually. The method can therefore not only cope with the difficulty that no threshold can be set manually when water mist adheres, but also helps generalize the algorithm to data of various brightness levels, such as sunny days, cloudy days, and night, which is difficult to achieve with the prior threshold-based division method.
Second, besides the more intelligent and more widely applicable additional spatial attention mask, the embodiment of the application also makes extensive use of spatial attention modules and channel attention modules in the design of the convolutional neural network. The convolutional neural network designed in the embodiment of the application can therefore remove adhered water mist and adhered water drops from an image simultaneously, more effectively than the prior art methods.
It should be noted that introducing an additional spatial attention mask is a relatively efficient and interpretable design concept for convolutional neural networks, but from the perspective of performance evaluation it may be replaced by other methods. If the convolutional neural network used for restoring the image has other particularly effective and strong designs in feature extraction, spatial attention modules, and the like, the class activation map serving as the additional attention mask in this technology could possibly be dropped. But this additional attention mask helps improve the interpretability of the deep learning method and also helps promote its adoption in safety-critical areas.
Similarly, each module in the convolutional neural network may be replaced: for example, the skip-connection structure may use slightly different residual learning modules, or different channel or spatial attention mechanisms, and such changes may be combined to obtain schemes with similar effects. Therefore, other algorithms and schemes that achieve the same technical effects are also within the scope of the embodiments of the present application.
In addition to the pure algorithm scheme, a combination of algorithm and hardware can achieve a similar purpose. Assuming all relevant scenes are equipped with a wiper and a demister, a single image quality classification algorithm can determine whether adhered water mist or adhered water drops are currently present. When the classification algorithm makes a positive judgment, the result can be fed back to the demister and the wiper to remove the interference.
In summary, the embodiment of the application provides an image processing method for removing adhered water mist and water drops, which includes inputting an image into a classification convolutional neural network; the classification convolutional neural network classifies the image, generates a spatial attention mask according to the target classification result, superimposes the spatial attention mask on the original input image, and inputs the superimposed result to a feature encoding module; the feature encoding module filters useless features out of the superimposed image and inputs the filtered feature map to a smooth dilated convolution module; the smooth dilated convolution module extracts features from the filtered feature map, enhances the features to be retained, and inputs the processed feature map to a decoding module; the decoding module restores the image from the processed feature map; finally, the restored image is output. A blurred image can thus be sharpened; for example, an unclear image disturbed by adhered water mist and adhered water drops can be restored to a clear image.
Based on the same technical concept, the embodiment of the application also provides an image processing system for removing adhered water mist and water drops. As shown in fig. 9, the system comprises:
The image input module 901 is configured to input an image into the classification convolutional neural network.
The additional spatial attention mask generation module 902 is configured to classify the image with the classification convolutional neural network, generate a spatial attention mask according to the target classification result, superimpose the spatial attention mask on the original input image, and input the superimposed result to the feature encoding module 903.
The feature encoding module 903 is configured to filter useless features out of the superimposed image and input the filtered feature map to the smooth dilated convolution module 904.
The smooth dilated convolution module 904 is configured to extract features from the filtered feature map, enhance the features to be retained, and input the processed feature map to the decoding module 905.
The decoding module 905 is configured to restore the image from the processed feature map.
The image output module 906 is configured to output the restored image.
Based on the same technical concept, the embodiment of the application also provides equipment, which comprises: the device comprises a data acquisition device, a processor and a memory; the data acquisition device is used for acquiring data; the memory is used for storing one or more program instructions; the processor is configured to execute one or more program instructions to perform a method as set forth in any one of the methods above.
Based on the same technical concept, the embodiment of the application further provides a computer readable storage medium, wherein the computer readable storage medium contains one or more program instructions, and the one or more program instructions are used for executing the method according to any one of the methods.
In the present specification, the method embodiments are described in a progressive manner; identical and similar parts of the embodiments can be referred to one another, and each embodiment focuses on its differences from the others. For related details, see the description of the method embodiments.
It should be noted that although the operations of the method of the present application are depicted in the drawings in a particular order, this does not require or imply that the operations be performed in that particular order or that all illustrated operations be performed to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform.
Although the application presents method operational steps as in the embodiments or flowcharts, more or fewer operational steps may be included based on conventional or non-inventive means. The order of steps recited in the embodiments is merely one of many possible execution orders and does not represent a unique order. When executed by an actual apparatus or client product, the methods illustrated in the embodiments or figures may be performed sequentially or in parallel (for example, in a parallel-processor or multi-threaded environment, or even in a distributed data processing environment). The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but possibly also other elements not expressly listed or inherent to it. Without further limitation, the presence of additional identical or equivalent elements in a process, method, article, or apparatus that comprises a recited element is not excluded.
The units, devices or modules etc. set forth in the above embodiments may be implemented in particular by a computer chip or entity or by a product having a certain function. For convenience of description, the above devices are described as being functionally divided into various modules, respectively. Of course, when implementing the present application, the functions of each module may be implemented in the same or multiple pieces of software and/or hardware, or a module implementing the same function may be implemented by multiple sub-modules or a combination of sub-units. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed.
Those skilled in the art will also appreciate that, in addition to implementing the controller in a pure computer readable program code, it is well possible to implement the same functionality by logically programming the method steps such that the controller is in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, etc. Such a controller can be regarded as a hardware component, and means for implementing various functions included therein can also be regarded as a structure within the hardware component. Or even means for achieving the various functions may be regarded as either software modules implementing the methods or structures within hardware components.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by means of software plus a necessary general-purpose hardware platform. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The software product may be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a mobile terminal, a server, a network device, or the like) to execute the methods described in the embodiments, or in parts of the embodiments, of the present application.
The various embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. The application is operational with numerous general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable electronic devices, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
The foregoing description of the embodiments is provided to illustrate the general principles of the application and is not intended to limit the application to the particular embodiments disclosed; any modifications, equivalents, improvements, and the like that fall within the spirit and principles of the application are intended to be included within its scope.

Claims (5)

1. An image processing method, the method comprising:
inputting an image to a classification convolutional neural network;
the classification convolutional neural network classifies the image, judging whether the image is clear or affected by water drops and/or water mist, generates a spatial attention mask according to the target classification result, superimposes the spatial attention mask on the original input image, and inputs the superimposed image to a feature encoding module;
the feature encoding module filters useless features from the superimposed image and inputs the filtered feature map to a smoothed dilated convolution module;
the smoothed dilated convolution module extracts features from the filtered feature map, enhances the features to be retained, and inputs the processed feature map to a decoding module;
the decoding module restores the image from the processed feature map;
outputting the restored image;
wherein the spatial attention mask is generated from the target classification result according to formula (1):

$M_t(x, y) = \sum_k \omega_k^t \, f_k(x, y)$ (1)

wherein t is the corresponding classification result; $M_t$ is the class activation map, i.e., the spatial attention mask; $f_k$ is the k-th channel of the feature map of the last convolution layer; $\omega_k^t$ is the weight of the subsequent fully connected layer corresponding to channel k; and (x, y) are the spatial position coordinates;
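For illustration only (not part of the claimed subject matter), formula (1) can be computed as in the following PyTorch sketch; the tensor shapes and the min-max normalization of the mask are assumptions:

```python
import torch

def spatial_attention_mask(feature_map: torch.Tensor,
                           fc_weight: torch.Tensor,
                           t: int) -> torch.Tensor:
    """Class activation map used as the spatial attention mask, formula (1).

    feature_map: (C, H, W) output f of the last convolution layer
    fc_weight:   (num_classes, C) weights of the subsequent fully connected layer
    t:           index of the predicted class
    """
    # M_t(x, y) = sum_k omega_k^t * f_k(x, y)
    mask = torch.einsum('c,chw->hw', fc_weight[t], feature_map)
    # Rescale to [0, 1] before superimposing on the input image (assumption)
    mask = (mask - mask.min()) / (mask.max() - mask.min() + 1e-8)
    return mask
```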
the feature encoding module comprises three convolution layers and three dual-attention submodules, each convolution layer being followed by a dual-attention submodule; the convolution stride of the first two convolution layers is 1 and that of the last convolution layer is 2, so that the height and width of the feature maps propagated through the network are each reduced to half of their original size;
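A minimal sketch of such a feature encoding module; the 3 × 3 kernels, channel width, ReLU activations, and the 4-channel input (RGB image plus attached mask) are assumptions, and DualAttention is sketched further below under the dual-attention submodule:

```python
import torch.nn as nn

class FeatureEncoder(nn.Module):
    """Three convolution layers, each followed by a dual-attention submodule."""
    def __init__(self, in_ch=4, width=64):  # 4 channels: RGB + attention mask
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, width, 3, stride=1, padding=1)
        self.att1 = DualAttention(width)
        self.conv2 = nn.Conv2d(width, width, 3, stride=1, padding=1)
        self.att2 = DualAttention(width)
        # stride 2: halves the height and width of the feature map
        self.conv3 = nn.Conv2d(width, width, 3, stride=2, padding=1)
        self.att3 = DualAttention(width)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.att1(self.act(self.conv1(x)))
        x = self.att2(self.act(self.conv2(x)))
        x = self.att3(self.act(self.conv3(x)))
        return x
```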
wherein the smoothed dilated convolution module extracting features from the filtered feature map and enhancing the features to be retained comprises:
the smoothed dilated convolution feeds the filtered feature map into a dilated convolution layer, which is smoothed by a plain convolution layer; this is followed by another dilated convolution layer and another plain convolution layer; finally, the processed feature map is superimposed on the input filtered feature map to form a residual learning mechanism; the smoothed dilated convolution module comprises six smoothed dilated convolutions, the dilation rate of the first three being 2 and that of the last three being 4; each smoothed dilated convolution operation is followed by a dual-attention submodule;
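A sketch of one smoothed dilated convolution with its residual connection; only the dilation rates and the layer ordering come from the claim, while the 3 × 3 kernels, channel width, and ReLU activations are assumptions:

```python
import torch.nn as nn

class SmoothedDilatedConvBlock(nn.Module):
    """Dilated conv -> plain (smoothing) conv -> dilated conv -> plain conv,
    with the input added back as a residual connection."""
    def __init__(self, ch=64, dilation=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=dilation, dilation=dilation),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),          # smoothing conv
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=dilation, dilation=dilation),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),          # smoothing conv
        )

    def forward(self, x):
        return x + self.body(x)   # residual learning mechanism

# Six blocks as in the claim: dilation 2 for the first three, 4 for the last three;
# in the full network each block would be followed by a dual-attention submodule.
blocks = [SmoothedDilatedConvBlock(dilation=2 if i < 3 else 4) for i in range(6)]
```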
the dual-attention submodule combines a channel attention mechanism and a spatial attention mechanism; in the channel attention mechanism, the input feature map is converted into two one-dimensional features by global max pooling and global average pooling respectively, and the two one-dimensional features are processed by two shared fully connected layers, so that different channels of the original input feature map are given different weights; in the spatial attention mechanism, the input feature map is max-pooled and average-pooled along the channel axis to obtain two single-channel two-dimensional features, which are concatenated and then processed by a convolution layer with a 7 × 7 kernel to generate the spatial weight distribution of the original input feature map;
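This is the familiar CBAM-style design; a sketch follows, in which the 7 × 7 spatial kernel matches the claim while the reduction ratio r of the shared fully connected layers is an assumption:

```python
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    """Channel attention followed by spatial attention."""
    def __init__(self, ch, r=8):
        super().__init__()
        # Shared MLP applied to both the max- and average-pooled descriptors
        self.mlp = nn.Sequential(
            nn.Linear(ch, ch // r), nn.ReLU(inplace=True), nn.Linear(ch // r, ch))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention: global max / average pooling -> shared FC layers
        mx = self.mlp(x.amax(dim=(2, 3)))
        av = self.mlp(x.mean(dim=(2, 3)))
        x = x * torch.sigmoid(mx + av).view(b, c, 1, 1)
        # Spatial attention: per-pixel max and mean over the channel axis
        s = torch.cat([x.amax(dim=1, keepdim=True),
                       x.mean(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```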
the decoding module comprises a transposed convolution layer that doubles the feature map size and two plain convolution layers; the transposed convolution layer and the first plain convolution layer are each followed by a dual-attention submodule;
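A sketch of such a decoding module; kernel sizes, channel widths, and the absence of an output activation are assumptions:

```python
import torch.nn as nn

class Decoder(nn.Module):
    """Transposed convolution (2x upsampling) plus two plain convolutions,
    with dual attention after the transposed conv and the first plain conv."""
    def __init__(self, ch=64, out_ch=3):
        super().__init__()
        self.up = nn.ConvTranspose2d(ch, ch, kernel_size=4, stride=2, padding=1)
        self.att1 = DualAttention(ch)
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.att2 = DualAttention(ch)
        self.conv2 = nn.Conv2d(ch, out_ch, 3, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.att1(self.act(self.up(x)))
        x = self.att2(self.act(self.conv1(x)))
        return self.conv2(x)      # restored image
```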
wherein, in order to better train the deep learning model of the embodiments of the application, a perceptual loss is added to the loss function to improve the detail quality of the generated image; the perceptual loss is calculated with a classical pretrained VGG19 network, as shown in formula (2):

$\mathrm{Loss}_{per} = \dfrac{1}{x\,y\,z} \sum \left( f(I_{gt}) - f(I_{out}) \right)^2$ (2)

wherein x, y, z respectively denote the three dimensions of the feature map; $f(I_{gt})$ denotes the feature map produced by the first 7 convolution layers of the pretrained VGG19 model when fed the clear ground-truth image; and $f(I_{out})$ likewise denotes the VGG19 feature map of the image restored by the method;
further, the perceptual loss and the classical mean squared error (MSE) loss are combined in a fixed ratio to obtain the total loss function used to train the model of the method, as shown in formula (3):

$\mathrm{Loss}_{total} = 1.0 \times \mathrm{Loss}_{mse} + 0.05 \times \mathrm{Loss}_{per}$ (3)
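A sketch of the combined loss, assuming the perceptual term is the mean squared difference of VGG19 features and that `features[:16]` of torchvision's VGG19 covers its first 7 convolution layers (an assumption about the exact slice):

```python
import torch.nn as nn
from torchvision.models import vgg19

class TotalLoss(nn.Module):
    """MSE loss plus VGG19 perceptual loss, weighted 1.0 : 0.05 per formula (3)."""
    def __init__(self):
        super().__init__()
        # Frozen pretrained VGG19 up to the 7th convolution layer
        self.vgg = vgg19(weights="IMAGENET1K_V1").features[:16].eval()
        for p in self.vgg.parameters():
            p.requires_grad = False
        self.mse = nn.MSELoss()

    def forward(self, out, gt):
        loss_mse = self.mse(out, gt)
        # Perceptual loss: mean squared difference of the VGG19 feature maps,
        # averaged over their x, y, z dimensions as in formula (2)
        loss_per = self.mse(self.vgg(out), self.vgg(gt))
        return 1.0 * loss_mse + 0.05 * loss_per
```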
2. The method of claim 1, wherein the classification convolutional neural network classifying the image comprises:
the classification convolutional neural network extracts image features with seven convolution layers whose feature map sizes are successively reduced, converts the two-dimensional features into a one-dimensional feature with a global pooling layer, and generates the classification output with a fully connected layer containing three neurons.
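A sketch of such a classifier, assuming stride-2 3 × 3 convolutions to shrink the feature map at each layer and illustrative channel widths:

```python
import torch.nn as nn

class ClassifierCNN(nn.Module):
    """Seven convolution layers with shrinking feature maps, global pooling,
    and a 3-neuron fully connected layer (clear / water drops / water mist)."""
    def __init__(self, in_ch=3, num_classes=3):
        super().__init__()
        layers, ch = [], in_ch
        for out_ch in (16, 32, 64, 64, 128, 128, 256):
            # stride 2 so the feature map size shrinks at every layer
            layers += [nn.Conv2d(ch, out_ch, 3, stride=2, padding=1),
                       nn.ReLU(inplace=True)]
            ch = out_ch
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)   # 2-D features -> 1-D vector
        self.fc = nn.Linear(ch, num_classes)

    def forward(self, x):
        x = self.pool(self.features(x)).flatten(1)
        return self.fc(x)
```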
3. An image processing system, the system comprising:
the image input module is configured to input an image to a classification convolutional neural network;
the spatial attention mask generation module is configured to classify the image with the classification convolutional neural network, judging whether the image is clear or affected by adhered water drops and/or adhered water mist, to generate a spatial attention mask according to the target classification result, to superimpose the spatial attention mask on the original input image, and to input the superimposed image to the feature encoding module;
the feature encoding module is configured to filter useless features from the superimposed image and input the filtered feature map to the smoothed dilated convolution module; the feature encoding module comprises three convolution layers and three dual-attention submodules, each convolution layer being followed by a dual-attention submodule; the convolution stride of the first two convolution layers is 1 and that of the last convolution layer is 2, so that the height and width of the feature maps propagated through the network are each reduced to half of their original size;
the smoothed dilated convolution module is configured to extract features from the filtered feature map, enhance the features to be retained, and input the processed feature map to the decoding module; extracting features from the filtered feature map and enhancing the features to be retained comprises: the smoothed dilated convolution feeds the filtered feature map into a dilated convolution layer, which is smoothed by a plain convolution layer; this is followed by another dilated convolution layer and another plain convolution layer; finally, the processed feature map is superimposed on the input filtered feature map to form a residual learning mechanism; the smoothed dilated convolution module comprises six smoothed dilated convolutions, the dilation rate of the first three being 2 and that of the last three being 4; each smoothed dilated convolution operation is followed by a dual-attention submodule;
the decoding module is configured to restore the image from the processed feature map; the decoding module comprises a transposed convolution layer that doubles the feature map size and two plain convolution layers; the transposed convolution layer and the first plain convolution layer are each followed by a dual-attention submodule;
and the image output module is configured to output the restored image.
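Assuming the modules sketched under claims 1 and 2 above, the system could be wired end to end as follows; treating "superimpose" as channel concatenation, bilinearly upsampling the mask to the input resolution, and using batch size 1 are all assumptions:

```python
import torch
import torch.nn.functional as F

def restore(image, classifier, encoder, body, decoder):
    """Assumed end-to-end wiring of the sketched modules (batch size 1).

    body: e.g. nn.Sequential of the six SmoothedDilatedConvBlock instances,
    each followed by a DualAttention submodule.
    """
    logits = classifier(image)                    # clear / drops / mist
    t = int(logits.argmax(dim=1))
    feat = classifier.features(image)             # last conv feature map
    mask = spatial_attention_mask(feat[0], classifier.fc.weight, t)
    # Resize the mask to the input resolution and attach it as a 4th channel
    mask = F.interpolate(mask[None, None], size=image.shape[-2:],
                         mode='bilinear', align_corners=False)
    x = torch.cat([image, mask], dim=1)           # "superimpose" as concat
    return decoder(body(encoder(x)))
```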
4. An apparatus, comprising: a data acquisition device, a processor, and a memory;
wherein the data acquisition device is configured to acquire data; the memory is configured to store one or more program instructions; and the processor is configured to execute the one or more program instructions to perform the method of any one of claims 1-2.
5. A computer-readable storage medium, wherein the computer-readable storage medium contains one or more program instructions for performing the method of any one of claims 1-2.
CN202010875220.4A 2020-08-27 2020-08-27 Image processing method and system for removing adhered water mist and water drops Active CN112184566B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010875220.4A CN112184566B (en) 2020-08-27 2020-08-27 Image processing method and system for removing adhered water mist and water drops

Publications (2)

Publication Number Publication Date
CN112184566A CN112184566A (en) 2021-01-05
CN112184566B true CN112184566B (en) 2023-09-01

Family

ID=73924512

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010875220.4A Active CN112184566B (en) 2020-08-27 2020-08-27 Image processing method and system for removing adhered water mist and water drops

Country Status (1)

Country Link
CN (1) CN112184566B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112767280B (en) * 2021-02-01 2022-06-14 福州大学 Single image raindrop removing method based on loop iteration mechanism

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101448164B1 (en) * 2013-04-22 2014-10-14 금오공과대학교 산학협력단 Method for Image Haze Removal Using Parameter Optimization
CN107798665A (en) * 2017-11-07 2018-03-13 天津大学 Underwater picture Enhancement Method based on structural texture layering
CN108171663A (en) * 2017-12-22 2018-06-15 哈尔滨工业大学 The image completion system for the convolutional neural networks that feature based figure arest neighbors is replaced
CN108846814A (en) * 2018-06-11 2018-11-20 广州视源电子科技股份有限公司 Image processing method, device, readable storage medium storing program for executing and computer equipment
CN110097519A (en) * 2019-04-28 2019-08-06 暨南大学 Double supervision image defogging methods, system, medium and equipment based on deep learning
CN110287801A (en) * 2019-05-29 2019-09-27 中国电子科技集团公司电子科学研究院 A kind of micro- Expression Recognition algorithm
CN111507215A (en) * 2020-04-08 2020-08-07 常熟理工学院 Video target segmentation method based on space-time convolution cyclic neural network and cavity convolution
CN111539887A (en) * 2020-04-21 2020-08-14 温州大学 Neural network image defogging method based on mixed convolution channel attention mechanism and layered learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
An improved fast restoration algorithm for fogged images; Shi Xiaodan; Chen Zhen; Journal of Beihua University (Natural Science Edition) (06); full text *

Also Published As

Publication number Publication date
CN112184566A (en) 2021-01-05

Similar Documents

Publication Publication Date Title
CN106600560B (en) A kind of image defogging method suitable for automobile data recorder
CN110555465B (en) Weather image identification method based on CNN and multi-feature fusion
CN103020992B (en) A kind of video image conspicuousness detection method based on motion color-associations
CN109993804A (en) A kind of road scene defogging method generating confrontation network based on condition
CN111915530A (en) End-to-end-based haze concentration self-adaptive neural network image defogging method
CN110503613B (en) Single image-oriented rain removing method based on cascade cavity convolution neural network
Tang et al. Single image dehazing via lightweight multi-scale networks
CN102665034A (en) Night effect removal method for camera-collected video
CN110097522B (en) Single outdoor image defogging method based on multi-scale convolution neural network
CN111861925A (en) Image rain removing method based on attention mechanism and gate control circulation unit
CN104796582A (en) Video image denoising and enhancing method and device based on random ejection retinex
CN116311254B (en) Image target detection method, system and equipment under severe weather condition
CN111582074A (en) Monitoring video leaf occlusion detection method based on scene depth information perception
Qian et al. CIASM-Net: a novel convolutional neural network for dehazing image
CN112419163B (en) Single image weak supervision defogging method based on priori knowledge and deep learning
CN112184566B (en) Image processing method and system for removing adhered water mist and water drops
CN111598793A (en) Method and system for defogging image of power transmission line and storage medium
Satrasupalli et al. Single Image Haze Removal Based on transmission map estimation using Encoder-Decoder based deep learning architecture
Ling et al. Learning deep transmission network for efficient image dehazing
CN114663352A (en) High-precision detection method and system for defects of power transmission line and storage medium
CN106846274A (en) Haze method is gone with reference to the MSRCR traffic images of HE and guiding filtering
CN116596792B (en) Inland river foggy scene recovery method, system and equipment for intelligent ship
CN110738624B (en) Area-adaptive image defogging system and method
CN112950521B (en) Image defogging method and generator network
Rout et al. A deep learning approach for SAR image fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant