CN117575971A - Image enhancement method, device, equipment and readable storage medium - Google Patents

Image enhancement method, device, equipment and readable storage medium

Info

Publication number
CN117575971A
Authority
CN
China
Prior art keywords
tensor
sliding window
processing
image
stage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311487671.0A
Other languages
Chinese (zh)
Inventor
刘会凯
朱玟谦
付斌
刘程
张澳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lantu Automobile Technology Co Ltd
Original Assignee
Lantu Automobile Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lantu Automobile Technology Co Ltd filed Critical Lantu Automobile Technology Co Ltd
Priority to CN202311487671.0A priority Critical patent/CN117575971A/en
Publication of CN117575971A publication Critical patent/CN117575971A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)

Abstract

An image enhancement method, apparatus, device, and readable storage medium relate to the field of image processing and comprise: performing multi-stage depth semantic feature processing and downsampling convolution processing on an image to be enhanced based on sliding window mechanisms of different scales, so as to output a depth feature tensor; and upsampling the depth feature tensor to output an enhanced image; wherein the sliding window mechanism of the first stage is constructed based on a moving window mechanism. The image enhancement effect can be effectively improved, while the complexity of the model structure, the computational complexity, and the consumption of computing resources are reduced.

Description

Image enhancement method, device, equipment and readable storage medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image enhancement method, apparatus, device, and readable storage medium.
Background
With the wide popularization of visual sensors and the rapid development of artificial intelligence technology, automated visual processing for outdoor scenes has become a research hotspot and challenge in both industry and academia. Image enhancement for varied environments is one of the fastest-developing and most widely applied technologies in the fields of artificial intelligence and intelligent driving, and has important research and application value; the technology aims to process images captured in different environments, such as severe weather, through an algorithmic model, so as to remove weather noise (such as rain, snow, and the like) from the images and output high-quality images.
In the related art, current image enhancement models can only enhance images in a single weather environment, whereas the actual weather environment often contains multiple weather types, so the types and distribution of weather noise are varied and complex; current image enhancement models are therefore not suitable for enhancing images under multiple weather environments and suffer from a poor enhancement effect. In addition, although a few current techniques can preliminarily realize image enhancement under multiple weather conditions, their model structures and computational complexity are high and their consumption of computing resources is large, which affects the real-time performance of image enhancement.
Disclosure of Invention
The application provides an image enhancement method, an image enhancement device, image enhancement equipment and a readable storage medium, which can effectively improve the image enhancement effect and reduce the complexity of a model structure, the complexity of calculation and the consumption of computational resources.
In a first aspect, an embodiment of the present application provides an image enhancement method, including:
performing multi-stage depth semantic feature processing and downsampling convolution processing on the image to be enhanced based on sliding window mechanisms of different scales so as to output depth feature tensors;
upsampling the depth feature tensor to output an enhanced image;
Wherein the sliding window mechanism of the first stage is constructed based on the moving window mechanism.
With reference to the first aspect, in an implementation manner, the multi-stage depth semantic feature processing of the image to be enhanced based on the sliding window mechanism with different scales includes:
performing first-stage deep semantic feature processing on the image to be enhanced based on a sliding window mechanism of a first scale;
performing second-stage deep semantic feature processing on the output result of the first stage based on a second-scale sliding window mechanism;
carrying out third-stage deep semantic feature processing on the output result of the second stage based on a third-scale sliding window mechanism;
carrying out depth semantic feature processing of a fourth stage on the output result of the third stage based on a sliding window mechanism of the first scale;
wherein the third dimension is greater than the second dimension and the second dimension is greater than the first dimension.
With reference to the first aspect, in an implementation manner, the first-scale-based sliding window mechanism performs a first-stage deep semantic feature processing on an image to be enhanced, including:
regularizing the image to be enhanced based on a sliding window mechanism of a first scale to obtain a first regularized tensor;
Performing equal dimension mapping processing on the first regularized tensor to obtain a characteristic tensor;
regularization processing is carried out on the image to be enhanced and the characteristic tensor, so that a second regularized tensor is obtained;
and performing equal-dimension transformation on the second regularized tensor to obtain a target tensor, and repeating the foregoing steps once based on the target tensor to obtain a final target tensor.
With reference to the first aspect, in an implementation manner, the upsampling the depth feature tensor includes:
performing up-sampling processing on the depth characteristic tensor for a plurality of times;
for each upsampling pass, performing deconvolution processing on the result of the previous upsampling pass to obtain a deconvolution tensor;
performing first convolution processing on the deconvolution tensor to obtain a first convolution tensor;
mapping the first convolution tensor through an activation function to obtain an activation tensor;
performing a second convolution process on the activation tensor to obtain a second convolution tensor,
residual connection is carried out on the second convolution tensor, the deconvolution tensor and a downsampling result corresponding to the current upsampling processing scale, and the residual result and the downsampling result corresponding to the next upsampling processing scale are used as input of next upsampling processing;
Wherein, in the first up-sampling process, the object of the deconvolution process is a depth feature tensor.
With reference to the first aspect, in one embodiment, the downsampling convolution process is implemented by a two-dimensional convolution with a stride of 2, a 7×7 convolution kernel, and an output channel dimension twice the input channel dimension.
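As an illustrative aside (not part of the claim), the shape effect of this downsampling convolution can be checked with the standard convolution output-size formula; the padding value of 3 is an assumption chosen so that a 7×7 kernel with stride 2 halves the spatial size exactly:

```python
def conv2d_out_size(size, kernel=7, stride=2, padding=3):
    """Standard 2-D convolution output-size formula: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

def downsample_shape(c, h, w):
    """Downsampling module effect: channels double, each spatial dimension halves."""
    return 2 * c, conv2d_out_size(h), conv2d_out_size(w)

print(downsample_shape(64, 128, 128))  # -> (128, 64, 64)
```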
In a second aspect, an embodiment of the present application provides an image enhancement apparatus, including: an encoder and a decoder;
the encoder is used for carrying out multi-stage depth semantic feature processing and downsampling convolution processing on the image to be enhanced based on sliding window mechanisms of different scales so as to output depth feature tensors;
the decoder is used for carrying out up-sampling processing on the depth characteristic tensor so as to output an enhanced image;
each stage of the encoder comprises a transformer module and a downsampling module which are connected in series, the transformer modules in different stages comprise sliding window multi-head self-attention modules with different scales, one sliding window multi-head self-attention module in the first stage is constructed based on a moving window mechanism, and the transformer modules and the downsampling modules are used for carrying out deep semantic feature processing and downsampling convolution processing on an image to be enhanced.
With reference to the second aspect, in one embodiment, the transformer module includes two attention units connected in series, and each attention unit includes a first layer regularization module, a sliding window multi-head self-attention module, a second layer regularization module and a multi-layer perceptron which are sequentially connected;
the first layer regularization module is used for regularizing the input tensor to obtain a first regularized tensor;
the sliding window multi-head self-attention module is used for carrying out equal-dimension mapping processing on the first regularization tensor to obtain a characteristic tensor;
the second layer regularization module is used for regularizing the input tensor and the characteristic tensor to obtain a second regularized tensor;
the multi-layer perceptron is used for carrying out equal-dimension transformation on the second regularized tensor to obtain a target tensor.
With reference to the second aspect, in one embodiment, a sliding window scale of the sliding window multi-head self-attention module in the first stage is a first scale; the sliding window scale of the sliding window multi-head self-attention module in the second stage is the second scale; the sliding window scale of the sliding window multi-head self-attention module in the third stage is a third scale; the sliding window scale of the sliding window multi-head self-attention module in the fourth stage is the first scale; wherein the third dimension is greater than the second dimension and the second dimension is greater than the first dimension.
With reference to the second aspect, in one embodiment, the decoder is specifically configured to:
performing up-sampling processing on the depth characteristic tensor for a plurality of times;
for each upsampling pass, performing deconvolution processing on the result of the previous upsampling pass to obtain a deconvolution tensor;
performing first convolution processing on the deconvolution tensor to obtain a first convolution tensor;
mapping the first convolution tensor through an activation function to obtain an activation tensor;
performing a second convolution process on the activation tensor to obtain a second convolution tensor,
residual connection is carried out on the second convolution tensor, the deconvolution tensor and a downsampling result corresponding to the current upsampling processing scale, and the residual result and the downsampling result corresponding to the next upsampling processing scale are used as input of next upsampling processing;
wherein, in the first up-sampling process, the object of the deconvolution process is a depth feature tensor.
With reference to the second aspect, in one embodiment, the downsampling module includes a two-dimensional convolution with a stride of 2, a 7×7 convolution kernel, and an output channel dimension twice the input channel dimension.
In a third aspect, embodiments of the present application provide an image enhancement apparatus, including a processor, a memory, and an image enhancement program stored on the memory and executable by the processor, wherein the image enhancement program, when executed by the processor, implements the steps of the image enhancement method as described above.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium having stored thereon an image enhancement program, wherein the image enhancement program, when executed by a processor, implements the steps of the image enhancement method as described above.
The beneficial effects that technical scheme that this application embodiment provided include:
the deep semantic feature processing and downsampling convolution processing performed on the image to be enhanced improve the feature learning capability of the model and robustly generalize over various kinds of environmental noise, so that the learned features not only cover the detailed information in the original image but also, to a certain extent, filter out environmental noise information such as weather; accurate recovery of images is thereby achieved in various severe weather scenes. Moreover, the hierarchical model architecture is efficiently realized through the hybrid window mechanism of different scales, with the moving window mechanism used only in the first stage, which effectively reduces the model structure and computational complexity, avoids the high computing-power consumption of a global-attention-based model architecture, and can significantly improve the image enhancement quality.
Drawings
FIG. 1 is a schematic flow chart of an embodiment of an image enhancement method of the present application;
FIG. 2 is a schematic diagram of an encoder structure involved in an embodiment of the present application;
FIG. 3 is a schematic diagram of a Transformer module according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a specific structure of an encoder according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a deconvolution residual module involved in an embodiment of the present application;
fig. 6 is a schematic diagram of a hardware structure of an image enhancement device according to an embodiment of the present application.
Detailed Description
In order to make the present application solution better understood by those skilled in the art, the following description will clearly and completely describe the technical solution in the embodiments of the present application with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
First, some technical terms in the present application are explained so as to facilitate understanding of the present application by those skilled in the art.
Transformer: a deep learning model widely applied in the field of natural language processing, for example in machine translation, text classification, and question-answering systems.
Swin Transformer: a Transformer-based deep learning model.
Shifting window: the moving window mechanism is a sliding window mechanism proposed by Swin Transformer; the mechanism achieves efficient hierarchical feature expression by performing non-equal-scale division on the global features and then performing multi-head self-attention computation separately within each divided local region.
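A minimal NumPy sketch of the shift idea (our own illustration, not the patent's implementation): shifted windows are commonly realized by cyclically rolling the feature map by half a window before re-partitioning, so that the new windows straddle the previous window boundaries:

```python
import numpy as np

def shift_feature_map(fm, window=4):
    # Cyclic shift by half a window along both spatial axes, so that the
    # windows of the next attention layer straddle previous window borders.
    s = window // 2
    return np.roll(fm, shift=(-s, -s), axis=(1, 2))

fm = np.arange(2 * 8 * 8).reshape(2, 8, 8)  # toy feature map: d=2, H=W=8
shifted = shift_feature_map(fm)
assert shifted.shape == fm.shape  # shifting never changes tensor dimensions
```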
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
In a first aspect, embodiments of the present application provide an image enhancement method.
In an embodiment, referring to fig. 1, fig. 1 is a flowchart illustrating an embodiment of an image enhancement method of the present application. As shown in fig. 1, the image enhancement method includes:
step S10: performing multi-stage depth semantic feature processing and downsampling convolution processing on the image to be enhanced based on sliding window mechanisms of different scales so as to output depth feature tensors; wherein the sliding window mechanism of the first stage is constructed based on the moving window mechanism.
By way of example, it should be appreciated that current image enhancement methods for real environmental scenarios, such as multiple types of severe weather, mainly face two problems: 1) the applicable scenes and the image enhancement effect are limited; that is, current methods mainly propose corresponding model algorithms for a specific low-quality single scene (i.e., a single weather environment, such as a rainy scene, a foggy scene, or a low-illumination scene), and the enhancement effect of these model algorithms is greatly weakened in low-quality scenes that mix multiple kinds of bad weather; 2) the real-time performance and the enhancement effect of the model are difficult to balance; at present, a small number of techniques have begun to enhance images captured in various low-quality scenes in a targeted manner, but in order to strengthen the generalization capability of the model, a complex model structure is often required, so the computational complexity and computing-resource consumption are high, and the real-time requirements of the application are difficult to meet.
In order to solve the above problems, in this embodiment an image enhancement model is constructed based on an encoder-decoder structure, with skip connections between the encoder and the decoder, so that the encoder and the decoder together constitute a U-Net model structure; that is, the encoder and the decoder form the end-to-end image enhancement model through serial and skip connections. Each stage of the encoder comprises a Transformer module and a downsampling module connected in series; the Transformer modules in different stages comprise sliding window multi-head self-attention modules of different scales, and one of the sliding window multi-head self-attention modules in the first stage is constructed based on a moving window mechanism. It can be seen that this embodiment uses a hierarchical Transformer model based on a hybrid window mechanism as the backbone network of the encoder and combines it with the downsampling modules to continuously downsample the image to be enhanced to generate the depth feature tensor.
Because the Transformer is good at capturing dependency relationships among long-range features, it can fully learn the global features of the image data, while downsampling convolution is more advantageous in capturing locally invariant features. This embodiment therefore combines the Transformer and downsampling convolution to perform multi-receptive-field, multi-level feature learning on the image, which improves the feature learning capability of the encoder and learns the deep semantic features of the image, so that the features output by the encoder not only cover the detailed information in the original image but also filter out, to a certain extent, environmental noise information such as weather in the original image, while reducing the complexity of the model structure and the computational complexity of the model.
Specifically, referring to fig. 2, the encoder includes a convolution layer (whose convolution kernel may be set to 7 and whose stride may be set to 2) for performing initial downsampling on the image to be enhanced, and four processing stages (i.e., Stage1 to Stage4), each of which is formed by connecting 1 Transformer module and 1 downsampling module in series; the Transformer modules in different stages use different sliding window mechanisms, and the output of each stage is used as the input of the next stage. For example, for any input image, its image matrix data can be expressed in tensor form as Img ∈ R^(3×H×W), where H and W represent the height and width of the input image respectively; the output of the i-th stage can then be expressed as FM_i ∈ R^(d_i×H_i×W_i), where d_i represents the channel dimension, d_i = 2^i·d, H_i = H/2^i, W_i = W/2^i, i = 1, 2, 3, 4 (d being the channel dimension after the initial convolution). FM_4 is transmitted to the decoder as its input for upsampling, finally obtaining the enhanced image; FM_1, FM_2, and FM_3 are input into the corresponding deconvolution residual modules of the decoder through skip connections.
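A hypothetical sketch of the stage-by-stage shapes implied by the conventions above (treating (d, H, W) as the tensor entering Stage1, and assuming each stage doubles channels and halves each spatial dimension, per the stated rule):

```python
def stage_shapes(d, H, W, stages=4):
    """Output shapes FM_1..FM_4, treating (d, H, W) as the tensor entering
    Stage1: channels double and spatial dimensions halve at every stage."""
    return [(d * 2 ** i, H // 2 ** i, W // 2 ** i) for i in range(1, stages + 1)]

# e.g. a feature tensor of 32 channels at 128 x 128 entering Stage1
for i, shape in enumerate(stage_shapes(32, 128, 128), start=1):
    print(f"FM_{i}: {shape}")
```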
In this embodiment, the scales of the sliding window multi-head self-attention modules in the Transformer modules of different stages are different, so as to efficiently implement a hierarchical Transformer architecture based on a hybrid window mechanism; thus, different sliding window segmentation and fusion strategies can be used at different stages, further enhancing the relationships between different sliding windows. In addition, the hybrid window mechanism can be generalized to many classical models and has high popularization and application value.
In addition, although the shifting window mechanism can effectively enhance the information correlation between different local regions, it inevitably increases the computational complexity of the model; in this embodiment, however, the moving window mechanism is used by only one of the sliding window multi-head self-attention modules of the Transformer module in the first stage, so that the long-range dependencies of the features are captured to the maximum extent with smaller computational complexity, further effectively reducing the computational complexity of the model.
Therefore, the embodiment carries out multi-stage depth semantic feature processing and downsampling convolution processing on the image to be enhanced through the encoders with sliding window mechanisms with different scales in the image enhancement model, so that the image enhancement effect can be effectively improved while the depth feature tensor is output, and the complexity of the model structure, the computational complexity and the consumption of computational power resources are reduced.
Step S20: and carrying out up-sampling processing on the depth characteristic tensor to output an enhanced image.
Illustratively, in this embodiment the decoder upsamples the learned features (i.e., the depth feature tensor) to restore the image to its original size and, in this process, eliminates environmental noise such as weather, thereby outputting an enhanced image having the same spatial and channel dimensions as the image to be enhanced. Thus, this embodiment performs multi-receptive-field, multi-level deep semantic feature learning by mixing the Transformer and downsampling convolution in the encoder, which improves the feature learning capability of the encoder and robustly generalizes over various kinds of environmental noise, so that the features output by the encoder not only cover the detailed information in the original image but also, to a certain extent, filter out environmental noise information such as weather; accurate image recovery is thereby achieved under various environmental scenes such as severe weather. Moreover, the hierarchical Transformer architecture is efficiently realized through the hybrid window mechanism of different scales, with the moving window mechanism used only in the first stage, which effectively reduces the model structure and computational complexity, avoids the high computing-power consumption of a global Transformer, and can significantly improve the image enhancement quality.
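A shape-level sketch of one decoder upsampling step (hedged: the transposed-convolution kernel/stride/padding values below are assumptions chosen so that the spatial size exactly doubles and the channel dimension halves; the text does not fix these values):

```python
def deconv_out_size(size, kernel=2, stride=2, padding=0):
    """Transposed-convolution output size: (n - 1) * stride - 2 * padding + kernel."""
    return (size - 1) * stride - 2 * padding + kernel

def decoder_step(c, h, w):
    # Deconvolution halves the channel dimension and doubles the spatial size;
    # the two following convolutions and the residual connection keep the shape.
    return c // 2, deconv_out_size(h), deconv_out_size(w)

print(decoder_step(512, 8, 8))  # -> (256, 16, 16)
```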
Further, in one embodiment, the downsampling convolution process is implemented by a two-dimensional convolution with a step size of 2, a convolution kernel of 7, and an output channel dimension that is 2 times the input channel dimension.
For example, in this embodiment the downsampling convolution processing is implemented by a two-dimensional convolution with a stride of 2, a 7×7 convolution kernel, and an output channel dimension twice the input channel dimension; that is, the downsampling module may be implemented by two-dimensional convolution filtering with a kernel size of 7, a stride of 2, and an output channel dimension 2 times the input channel dimension. It should be noted that the specific values of the kernel size and the stride given here are only examples, and both can be adjusted to be larger or smaller according to actual requirements. Thus, for any input feature tensor In ∈ R^(d×H×W), the Transformer module maps In to a tensor whose dimension is also R^(d×H×W), and the downsampling module then maps the tensor output by the Transformer module to R^(2d×(H/2)×(W/2)); therefore, each stage reduces the spatial scale of the input feature tensor to 1/4 of the original and enlarges the channel dimension to 2 times the original.
Further, in an embodiment, the multi-stage depth semantic feature processing of the image to be enhanced based on the sliding window mechanism with different scales includes:
Performing first-stage deep semantic feature processing on the image to be enhanced based on a sliding window mechanism of a first scale;
performing second-stage deep semantic feature processing on the output result of the first stage based on a second-scale sliding window mechanism;
carrying out third-stage deep semantic feature processing on the output result of the second stage based on a third-scale sliding window mechanism;
carrying out depth semantic feature processing of a fourth stage on the output result of the third stage based on a sliding window mechanism of the first scale;
wherein the third dimension is greater than the second dimension and the second dimension is greater than the first dimension.
Illustratively, referring to FIG. 3, the deep semantic feature processing of each stage is accomplished in this embodiment by a Transformer module, each consisting of two attention units in series. Each attention unit is composed of a first layer-regularization module, a sliding window multi-head self-attention module, a second layer-regularization module, and a multi-layer perceptron connected in series. It should be appreciated that, for any input tensor In ∈ R^(d×H×W), the sliding window multi-head self-attention module first evenly divides the input tensor into (H/h)×(W/w) windows of size h×w, and then processes each window with the classical multi-head attention mechanism, whose calculation can be expressed in its standard form as:
Q = xW_Q, K = xW_K, V = xW_V; Attention(Q, K, V) = softmax(QKᵀ/√d_k)·V
where Q, K, and V represent the three linear transformations (i.e., the query vector Q, key vector K, and value vector V) and x represents the input feature. Since the spatial dimensions of the input and output tensors of the multi-head attention module do not change, the output tensor of the multi-head attention module within each window has spatial dimension h×w and channel dimension d, thereby yielding (H/h)×(W/w) output tensors of size d×h×w; a dimension conversion operation then produces an output tensor of dimension d×H×W. Through the above steps, the sliding window multi-head self-attention module maps the input tensor In ∈ R^(d×H×W) to a tensor whose dimension is also R^(d×H×W).
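The partition-and-attend computation can be sketched in NumPy as follows (a simplified single-head version in which the Q/K/V projections are taken as the identity, a simplifying assumption rather than the patent's parameterization):

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(In, h, w):
    """Partition a d x H x W tensor into h x w windows, run single-head
    attention softmax(Q K^T / sqrt(d)) V inside each window (with identity
    Q/K/V projections as a simplification), and reassemble the output."""
    d, H, W = In.shape
    out = np.empty_like(In)
    for i in range(0, H, h):
        for j in range(0, W, w):
            x = In[:, i:i + h, j:j + w].reshape(d, h * w).T  # (h*w, d) tokens
            attn = softmax(x @ x.T / np.sqrt(d))             # (h*w, h*w)
            out[:, i:i + h, j:j + w] = (attn @ x).T.reshape(d, h, w)
    return out

y = window_attention(np.random.rand(3, 8, 8), h=4, w=4)
assert y.shape == (3, 8, 8)  # the mapping is equal-dimension, as in the text
```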
In this embodiment, the sliding window dimensions of the sliding window multi-head self-attention module at different stages will be set to different sizes; it should be noted that the specific value settings of the first scale, the second scale, and the third scale may be determined according to actual requirements, as long as the third scale is larger than the second scale and the second scale is larger than the first scale, for example, the first scale h=w=4, the second scale h=w=16, and the third scale h=w=8.
Thus, for Stage1, the feature map is partitioned using a sliding window of h = w = 4, and the shifting window mechanism is used in the second sliding window multi-head self-attention module of the Transformer module, so as to capture the long-range dependencies of the features to the maximum extent with smaller computational complexity.
For Stage2, the feature map is partitioned using a sliding window of h = w = 16, and neither sliding window multi-head self-attention module in Stage2 uses the shifting window mechanism.
For Stage3, the feature map is partitioned using a sliding window of h = w = 8, and neither of the two sliding window multi-head self-attention modules in Stage3 uses the shifting window mechanism, ensuring that the features in each window retain the same receptive field as the features in the windows of Stage2.
For Stage4, the feature map is partitioned using a sliding window of h = w = 4, and neither of the two sliding window multi-head self-attention modules in Stage4 uses the shifting window mechanism, ensuring that the features within each window retain the same receptive field as the features within the windows of Stage3.
It can be seen that the different sliding window designs used in the four stages constitute a hybrid sliding window mechanism, which enables different sliding window mechanisms to be used in the Transformer modules of different stages based on the architecture shown in fig. 2, and can thus constitute an encoder as shown in fig. 4. In summary, this embodiment uses the moving window mechanism only in Stage1 of the encoder, which further enhances the semantic relevance between local regions while reducing the computational complexity and model complexity of the encoder; meanwhile, the balance between the feature expression capability and the computing-power consumption of the encoder is achieved through the controllable receptive-field perception of the mixed sliding windows in the subsequent stages.
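The per-stage schedule described above can be summarized as a small configuration table (the window scales 4/16/8/4 come from the example values in the text; expressing the schedule as data is our own presentation choice):

```python
# Window size and shifted-window usage per encoder stage, as described above.
HYBRID_WINDOW_SCHEDULE = {
    "Stage1": {"window": 4,  "shifted": True},   # moving window used only here
    "Stage2": {"window": 16, "shifted": False},
    "Stage3": {"window": 8,  "shifted": False},
    "Stage4": {"window": 4,  "shifted": False},
}

# Exactly one stage uses the shifting window mechanism.
assert sum(cfg["shifted"] for cfg in HYBRID_WINDOW_SCHEDULE.values()) == 1
```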
Further, in an embodiment, the first-scale based sliding window mechanism performs a first-stage deep semantic feature processing on the image to be enhanced, including:
regularizing the image to be enhanced based on a sliding window mechanism of a first scale to obtain a first regularized tensor;
performing equal dimension mapping processing on the first regularized tensor to obtain a characteristic tensor;
regularization processing is carried out on the image to be enhanced and the characteristic tensor, so that a second regularized tensor is obtained;
and performing equal-dimension transformation on the second regularized tensor to obtain a target tensor, and repeatedly executing the steps once based on the target tensor to obtain a final target tensor.
By way of example, it should be appreciated that the deep semantic feature processing of each stage is accomplished by the attention units in the transformer module in this embodiment. The operating principle of an attention unit is as follows: the input tensor IT ∈ R^(d×h×w) undergoes element-wise regularization in the first layer regularization module to obtain a first regularized tensor; the sliding window multi-head self-attention module then performs equal-dimension mapping processing on the first regularized tensor to obtain a characteristic tensor IT1 ∈ R^(d×h×w); the residual structure then forms the residual connection IT + IT1, yielding the tensor IT2, which serves as the input of the second layer regularization module; the second layer regularization module performs element-wise regularization on IT2 to obtain a second regularized tensor; the second regularized tensor is input into the equal-dimension multi-layer perceptron to obtain the target tensor IT3 ∈ R^(d×h×w); the residual structure then forms the residual connection IT2 + IT3, yielding the tensor IT4, which serves as the input of the next attention unit and ends the workflow of this attention unit. After repeating these steps in the next attention unit, the output result IT5 of the transformer module is obtained. Because the tensor dimensions are unchanged throughout this process, IT5 still lies in R^(d×h×w); after processing by the downsampling module, however, its dimension becomes 2d × (h/2) × (w/2).
It should be noted that, since the principle and flow of the deep semantic feature processing in each stage are similar, for simplicity of description the following takes the deep semantic feature processing of the first stage as an example. In this embodiment, regularization processing is first performed on the input tensor by the first layer regularization module to obtain a first regularized tensor; equal-dimension mapping processing is then performed on the first regularized tensor by the sliding window multi-head self-attention module to obtain a characteristic tensor; regularization processing is then performed on the image to be enhanced and the characteristic tensor by the second layer regularization module to obtain a second regularized tensor; equal-dimension transformation is then performed on the second regularized tensor by the multi-layer perceptron to obtain a target tensor, and the above steps are repeated once based on the target tensor to obtain the final target tensor.
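The LN → W-MSA → residual → LN → MLP → residual dataflow described above can be sketched as follows; identity functions stand in for the sliding window multi-head self-attention module and the multi-layer perceptron, so only the wiring and the unchanged d×h×w shape follow the text, not the actual attention computation.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Regularization module: normalize each spatial position over channels.
    mu = x.mean(axis=0, keepdims=True)
    sigma = x.std(axis=0, keepdims=True)
    return (x - mu) / (sigma + eps)

def attention_unit(it, msa, mlp):
    """One attention unit: LN -> W-MSA -> residual -> LN -> MLP -> residual.

    `msa` and `mlp` are placeholders for the sliding window multi-head
    self-attention module and the equal-dimension multi-layer perceptron.
    """
    it1 = msa(layer_norm(it))   # characteristic tensor IT1, same shape
    it2 = it + it1              # first residual connection -> IT2
    it3 = mlp(layer_norm(it2))  # target tensor IT3, equal-dimension mapping
    return it2 + it3            # second residual connection -> IT4

it = np.random.rand(16, 8, 8)                       # IT in R^(d x h x w)
out = attention_unit(it, lambda x: x, lambda x: x)  # identity stand-ins
print(out.shape)  # (16, 8, 8): dimensions are unchanged throughout
```

Because every sub-module is an equal-dimension mapping, two such units can be chained directly, which is exactly how the transformer module repeats the steps to produce IT5.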
Further, in an embodiment, the upsampling the depth feature tensor includes:
performing up-sampling processing on the depth characteristic tensor for a plurality of times;
performing deconvolution processing on the last upsampling processing result aiming at each upsampling processing to obtain a deconvolution tensor;
performing first convolution processing on the deconvolution tensor to obtain a first convolution tensor;
mapping the first convolution tensor through an activation function to obtain an activation tensor;
performing a second convolution process on the activation tensor to obtain a second convolution tensor,
residual connection is carried out on the second convolution tensor, the deconvolution tensor and a downsampling result corresponding to the current upsampling processing scale, and the residual result and the downsampling result corresponding to the next upsampling processing scale are used as input of next upsampling processing;
wherein, in the first up-sampling process, the object of the deconvolution process is a depth feature tensor.
Illustratively, in this embodiment, a decoder may be constructed by connecting classical deconvolution residual modules in series, and the multiple upsampling processes of the depth feature tensor may be implemented by this decoder. The whole decoder may be formed by connecting 4 deconvolution residual modules in series, so as to restore the output features of the encoder to the input image scale. Referring to fig. 5, each deconvolution residual module includes a deconvolution layer, a first convolution layer, an activation function, a second convolution layer, and a residual block connected in series in order, where the residual block is configured to perform residual connection on the tensor output by the deconvolution layer, the tensor output by the second convolution layer, and the tensor output by the encoder. Since the encoder in the present embodiment has excellent generalization ability and outstanding image feature expression ability, an outstanding image enhancement effect can be achieved even if a general deconvolution residual module is used as the decoder.
It will be appreciated that, except for the first deconvolution residual module, whose input is the output of the entire encoder (i.e. in the first upsampling process the object of the deconvolution process is the depth feature tensor), the inputs of the other three deconvolution residual modules consist of two parts: one is the output of the preceding deconvolution residual module, and the other is the output of the corresponding stage in the encoder. For example, the input of the second deconvolution residual module of the decoder comes from the output of the first deconvolution residual module and the output of encoder Stage 3; the input of the third deconvolution residual module comes from the output of the second deconvolution residual module and the output of encoder Stage 2; and the input of the fourth deconvolution residual module comes from the output of the third deconvolution residual module and the output of encoder Stage 1.
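The skip-connection wiring just described can be written down as a small helper; the string labels are purely descriptive stand-ins for the actual tensors.

```python
def decoder_inputs(module_index):
    """Input sources of the k-th deconvolution residual module (k = 1..4).

    Module 1 takes only the encoder's final depth feature tensor; modules
    2-4 also take the matching encoder stage output as a skip connection
    (module k pairs with encoder Stage 5-k).
    """
    if module_index == 1:
        return ("depth feature tensor",)
    return (f"deconvolution residual module {module_index - 1} output",
            f"encoder Stage {5 - module_index} output")

for k in range(1, 5):
    print(k, decoder_inputs(k))
```

The pairing runs in reverse stage order because each decoder module must match the spatial resolution of the encoder stage whose output it fuses.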
Since the operation principle of each upsampling process is the same, for simplicity of description the second upsampling process is taken as an example. For an arbitrary input tensor In ∈ R^(d×H×W), the second deconvolution residual module first maps In, through a deconvolution with step size 2 and convolution kernel 4, to a tensor Out1 of dimension (d/2) × 2H × 2W. Out1 is then mapped to Out2 by a convolution with kernel 3, step size 1 and output channel dimension d/2; after mapping by the activation function, Out2 passes through a further convolution with kernel 3, step size 1 and output channel dimension d/2 to obtain Out3. Finally, Out1 + Out3 + Out_en is taken as the output of the deconvolution residual module, where Out_en is the output of Stage2 of the encoder.
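The shape bookkeeping of one deconvolution residual module can be checked with the standard (de)convolution output-size formulas; the padding values below are assumptions chosen so that the stride-2, kernel-4 deconvolution exactly doubles H and W and the two 3×3 convolutions preserve that size, which is what the residual connection Out1 + Out3 + Out_en requires.

```python
def deconv_out(n, stride=2, kernel=4, pad=1):
    # Output spatial size of a transposed convolution (standard formula).
    return (n - 1) * stride - 2 * pad + kernel

def conv_out(n, stride=1, kernel=3, pad=1):
    # Output spatial size of an ordinary convolution (standard formula).
    return (n + 2 * pad - kernel) // stride + 1

# Track one deconvolution residual module on an assumed 128 x 16 x 16 input.
d, H, W = 128, 16, 16
H1, W1 = deconv_out(H), deconv_out(W)  # Out1 spatial size: 2H x 2W
H3 = conv_out(conv_out(H1))            # two 3x3, stride-1 convolutions
print((d // 2, H1, W1))  # (64, 32, 32): the (d/2) x 2H x 2W of Out1
print(H3 == H1)          # True: Out3 keeps Out1's size, so the residual fits
```

Under these assumptions all three residual summands share the shape (d/2) × 2H × 2W, matching the output of encoder Stage2.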
In summary, the present embodiment proposes a transformer module based on a hybrid sliding window to construct the infrastructure of an image encoder that maps an image into a deep feature map tensor, and then uses a deconvolution structure as a decoder to restore the dimensions and obtain an image from which environmental noise, such as weather, has been removed. When constructing an image enhancement model using this architecture as the encoder-decoder, model training is also required. Specifically, the input tensor R_i of a training data pair (R_i, Gt_i) is input into the model, and the output result tensor Out_i is obtained after processing by the encoder and decoder; the L1 loss function is then used to calculate the loss between Out_i and the target output tensor Gt_i, the network is back-propagated according to the SGD (stochastic gradient descent) algorithm to optimize the model parameters, and after a certain number of training iterations an image enhancement model meeting the requirements is obtained and training is stopped.
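A minimal sketch of one such training iteration, assuming a toy linear model in place of the encoder-decoder and an illustrative learning rate (neither is specified by the patent):

```python
import numpy as np

def l1_loss(out, gt):
    # Mean absolute error between the model output and the clean target Gt_i.
    return np.abs(out - gt).mean()

# One SGD training iteration on a pair (R_i, Gt_i); the linear "model"
# w * R_i is an illustrative stand-in for the encoder-decoder forward pass.
lr = 1e-3
w = np.ones(4)                        # toy parameter vector
r_i = np.array([0.5, 1.0, 1.5, 2.0])  # toy degraded input R_i
gt_i = np.array([1.0, 0.5, 2.0, 1.0])  # toy clean target Gt_i

out_i = w * r_i                       # forward pass
loss = l1_loss(out_i, gt_i)           # L1 loss between Out_i and Gt_i
grad = np.sign(out_i - gt_i) * r_i / r_i.size  # dL/dw of the L1 loss
w = w - lr * grad                     # SGD parameter update
print(l1_loss(w * r_i, gt_i) < loss)  # True: the step reduced the loss
```

In the real setup this loop runs over many (R_i, Gt_i) pairs until the loss converges, at which point training is stopped.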
It should be noted that, the image enhancement processing method provided in this embodiment can be used not only for enhancing images in multiple severe weather environments, but also for enhancing images in other scenes.
In a second aspect, embodiments of the present application further provide an image enhancement apparatus.
In one embodiment, an image enhancement apparatus includes: an encoder and a decoder;
the encoder is used for carrying out multi-stage depth semantic feature processing and downsampling convolution processing on the image to be enhanced based on sliding window mechanisms of different scales so as to output depth feature tensors;
the decoder is used for carrying out up-sampling processing on the depth characteristic tensor so as to output an enhanced image;
each stage of the encoder comprises a transformer module and a downsampling module which are connected in series, the transformer modules in different stages comprise sliding window multi-head self-attention modules with different scales, one sliding window multi-head self-attention module in the first stage is constructed based on a moving window mechanism, and the transformer modules and the downsampling modules are used for carrying out deep semantic feature processing and downsampling convolution processing on an image to be enhanced.
According to the embodiment, the mixed use of transformer and downsampling convolution in the encoder performs multi-receptive-field, multi-layer deep semantic feature learning, which improves the feature learning capability of the encoder and allows it to generalize robustly to various climatic noises: the features output by the encoder cover the detailed information in the original image while the weather noise information in the original image is filtered to a certain extent, so that accurate recovery of images in various severe weather scenes is achieved. Moreover, through the mixed window mechanism of different scales, with one sliding window multi-head self-attention module in the first stage using a moving window mechanism, a hierarchical transformer architecture is realized efficiently, which effectively reduces the model structure and computational complexity, avoids the high computation consumption of a transformer based on global attention, and can remarkably improve the image enhancement quality.
Further, in an embodiment, the transformer module includes two attention units connected in series, each attention unit includes a first layer regularization module, a sliding window multi-head self-attention module, a second layer regularization module and a multi-layer perceptron connected in sequence;
the first layer regularization module is used for regularizing the input tensor to obtain a first regularized tensor;
The sliding window multi-head self-attention module is used for carrying out equal-dimension mapping processing on the first regularization tensor to obtain a characteristic tensor;
the second layer regularization module is used for regularizing the input tensor and the characteristic tensor to obtain a second regularized tensor;
the multi-layer perceptron is used for carrying out equal-dimension transformation on the second regularized tensor to obtain a target tensor.
Further, in an embodiment, a sliding window scale of the sliding window multi-head self-attention module in the first stage is a first scale; the sliding window scale of the sliding window multi-head self-attention module in the second stage is the second scale; the sliding window scale of the sliding window multi-head self-attention module in the third stage is a third scale; the sliding window scale of the sliding window multi-head self-attention module in the fourth stage is the first scale; wherein the third dimension is greater than the second dimension and the second dimension is greater than the first dimension.
Further, in an embodiment, the decoder is configured to perform a multiple upsampling process on the depth feature tensor, where the decoder includes four deconvolution residual modules connected in series, each deconvolution residual module including a deconvolution layer, a first convolution layer, an activation function, a second convolution layer, and a residual block connected in series in sequence; for each up-sampling process, the deconvolution layer is used for deconvolution processing on the last up-sampling process result to obtain deconvolution tensor; the first convolution layer is used for carrying out first convolution processing on the deconvolution tensor to obtain a first convolution tensor; the activation function is used for mapping the first convolution tensor to obtain an activation tensor; the second convolution layer is used for carrying out second convolution processing on the activation tensor to obtain a second convolution tensor; the residual block is used for carrying out residual connection on the second convolution tensor, the deconvolution tensor and a downsampling result corresponding to the current upsampling processing scale, and taking the residual result and the downsampling result corresponding to the next upsampling processing scale as the input of the next upsampling processing; wherein the object of deconvolution processing of the first deconvolution residual module is a depth feature tensor.
Further, in an embodiment, the downsampling module comprises a two-dimensional convolution with a step size of 2, a convolution kernel of 7, and an output channel dimension that is 2 times the input channel dimension.
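The downsampling module's effect on tensor shape can be verified with the usual convolution output-size formula; the padding of 3 is an assumption chosen so the stride-2, kernel-7 convolution halves H and W exactly, consistent with the stage-to-stage shapes described earlier.

```python
def downsample_shape(c, h, w, kernel=7, stride=2, pad=3):
    """Output shape of the downsampling convolution described above.

    Step size 2, convolution kernel 7, output channels twice the input
    channels; the padding of 3 is an assumption that makes the spatial
    size halve exactly.
    """
    spatial = lambda n: (n + 2 * pad - kernel) // stride + 1
    return 2 * c, spatial(h), spatial(w)

print(downsample_shape(96, 32, 32))  # (192, 16, 16)
```

Applied once per stage, this is what turns a d × h × w transformer output into the 2d × (h/2) × (w/2) input of the next stage.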
In summary, the codec structure in this embodiment has a relatively simple model structure and low computational power consumption, while the encoder has superior feature learning capability and can robustly generalize to various climatic noises, so as to implement denoising and enhancement of images in various severe weather scenes; rainy, snowy, foggy and other images can thus be accurately restored at the same time, completing accurate recovery of the images.
The functional implementation of each module in the image enhancement device corresponds to each step in the image enhancement method embodiment, and the functions and implementation processes thereof are not described in detail herein.
In a third aspect, embodiments of the present application provide an image enhancement apparatus, which may be an apparatus having a data processing function such as a personal computer (personal computer, PC), a notebook computer, a server, or the like.
Referring to fig. 6, fig. 6 is a schematic diagram of a hardware structure of an image enhancement device according to an embodiment of the present application. In an embodiment of the present application, the image enhancement device may include a processor, a memory, a communication interface, and a communication bus.
The communication bus may be of any type for implementing the processor, memory, and communication interface interconnections.
The communication interfaces include input/output (I/O) interfaces, physical interfaces, logical interfaces, and the like for realizing interconnection of devices inside the image enhancement apparatus, and interfaces for realizing interconnection of the image enhancement apparatus with other apparatuses (e.g., other computing apparatuses or user apparatuses). The physical interface may be an ethernet interface, a fiber optic interface, an ATM interface, etc.; the user device may be a Display, a Keyboard (Keyboard), or the like.
The memory may be various types of storage media, such as random access memory (RAM), read-only memory (ROM), non-volatile RAM (NVRAM), flash memory, optical memory, hard disk, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), and the like.
The processor may be a general-purpose processor, and the general-purpose processor may call the image enhancement program stored in the memory and execute the image enhancement method provided in the embodiment of the present application. For example, the general purpose processor may be a central processing unit (central processing unit, CPU). The method executed when the image enhancement program is called may refer to various embodiments of the image enhancement method of the present application, and will not be described herein.
Those skilled in the art will appreciate that the hardware configuration shown in fig. 6 is not limiting of the application and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.
In a fourth aspect, embodiments of the present application also provide a computer-readable storage medium.
The image enhancement program is stored on a storage medium readable by the application, wherein the image enhancement program, when executed by a processor, implements the steps of the image enhancement method as described above.
The method implemented when the image enhancement program is executed may refer to various embodiments of the image enhancement method of the present application, which are not described herein.
It should be noted that, the foregoing embodiment numbers are merely for describing the embodiments, and do not represent the advantages and disadvantages of the embodiments.
The terms "comprising" and "having", and any variations thereof, in the description and claims of the present application and in the foregoing drawings are intended to cover non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements, but may include other steps or elements not listed or inherent to such process, method, article, or apparatus. The terms "first", "second", "third", etc. are used for distinguishing between different objects and not for describing a particular order, and do not limit the objects so described to being different.
In the description of embodiments of the present application, "exemplary," "such as," or "for example," etc., are used to indicate an example, instance, or illustration. Any embodiment or design described herein as "exemplary," "such as" or "for example" is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary," "such as" or "for example," etc., is intended to present related concepts in a concrete fashion.
In the description of the embodiments of the present application, unless otherwise indicated, "/" means "or"; for example, A/B may represent A or B. The term "and/or" merely describes an association relation between associated objects and indicates that three relations may exist; for example, A and/or B may indicate the three cases where A exists alone, A and B exist together, or B exists alone. In addition, in the description of the embodiments of the present application, "plural" means two or more.
In some of the processes described in the embodiments of the present application, a plurality of operations or steps occurring in a particular order are included, but it should be understood that these operations or steps may be performed out of the order in which they occur in the embodiments of the present application or in parallel, the sequence numbers of the operations merely serve to distinguish between the various operations, and the sequence numbers themselves do not represent any order of execution. In addition, the processes may include more or fewer operations, and the operations or steps may be performed in sequence or in parallel, and the operations or steps may be combined.
From the above description of the embodiments, it will be clear to those skilled in the art that the method of the above embodiments may be implemented by means of software plus a necessary general hardware platform, and of course may also be implemented by hardware, although in many cases the former is the preferred implementation. Based on such understanding, the technical solution of the present application, or the part thereof contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) as described above, comprising several instructions for causing a terminal device to perform the method described in the various embodiments of the present application.
The foregoing description is only of the preferred embodiments of the present application, and is not intended to limit the scope of the claims, and all equivalent structures or equivalent processes using the descriptions and drawings of the present application, or direct or indirect application in other related technical fields are included in the scope of the claims of the present application.

Claims (10)

1. An image enhancement method, the image enhancement method comprising:
performing multi-stage depth semantic feature processing and downsampling convolution processing on the image to be enhanced based on sliding window mechanisms of different scales so as to output depth feature tensors;
Upsampling the depth feature tensor to output an enhanced image;
wherein the sliding window mechanism of the first stage is constructed based on the moving window mechanism.
2. The image enhancement method according to claim 1, wherein the multi-stage depth semantic feature processing of the image to be enhanced based on a sliding window mechanism of different scales comprises:
performing first-stage deep semantic feature processing on the image to be enhanced based on a sliding window mechanism of a first scale;
performing second-stage deep semantic feature processing on the output result of the first stage based on a second-scale sliding window mechanism;
carrying out third-stage deep semantic feature processing on the output result of the second stage based on a third-scale sliding window mechanism;
carrying out depth semantic feature processing of a fourth stage on the output result of the third stage based on a sliding window mechanism of the first scale;
wherein the third dimension is greater than the second dimension and the second dimension is greater than the first dimension.
3. The image enhancement method according to claim 2, wherein the first scale based sliding window mechanism performs a first stage of depth semantic feature processing on the image to be enhanced, comprising:
Regularizing the image to be enhanced based on a sliding window mechanism of a first scale to obtain a first regularized tensor;
performing equal dimension mapping processing on the first regularized tensor to obtain a characteristic tensor;
regularization processing is carried out on the image to be enhanced and the characteristic tensor, so that a second regularized tensor is obtained;
and performing equal-dimension transformation on the second regularized tensor to obtain a target tensor, and repeatedly executing the steps once based on the target tensor to obtain a final target tensor.
4. The image enhancement method according to claim 1, wherein the upsampling the depth feature tensor comprises:
performing up-sampling processing on the depth characteristic tensor for a plurality of times;
performing deconvolution processing on the last upsampling processing result aiming at each upsampling processing to obtain a deconvolution tensor;
performing first convolution processing on the deconvolution tensor to obtain a first convolution tensor;
mapping the first convolution tensor through an activation function to obtain an activation tensor;
performing a second convolution process on the activation tensor to obtain a second convolution tensor,
residual connection is carried out on the second convolution tensor, the deconvolution tensor and a downsampling result corresponding to the current upsampling processing scale, and the residual result and the downsampling result corresponding to the next upsampling processing scale are used as input of next upsampling processing;
Wherein, in the first up-sampling process, the object of the deconvolution process is a depth feature tensor.
5. The image enhancement method of claim 1, wherein: the downsampling convolution process is implemented by a two-dimensional convolution with a step size of 2, a convolution kernel of 7, and an output channel dimension that is 2 times the input channel dimension.
6. An image enhancement device, characterized in that the image enhancement device comprises: an encoder and a decoder;
the encoder is used for carrying out multi-stage depth semantic feature processing and downsampling convolution processing on the image to be enhanced based on sliding window mechanisms of different scales so as to output depth feature tensors;
the decoder is used for carrying out up-sampling processing on the depth characteristic tensor so as to output an enhanced image;
each stage of the encoder comprises a transformer module and a downsampling module which are connected in series, the transformer modules in different stages comprise sliding window multi-head self-attention modules with different scales, one sliding window multi-head self-attention module in the first stage is constructed based on a moving window mechanism, and the transformer modules and the downsampling modules are used for carrying out deep semantic feature processing and downsampling convolution processing on an image to be enhanced.
7. The image enhancement apparatus of claim 6, wherein:
the transformer module comprises two attention units connected in series, and each attention unit comprises a first layer regularization module, a sliding window multi-head self-attention module, a second layer regularization module and a multi-layer perceptron which are sequentially connected;
the first layer regularization module is used for regularizing the input tensor to obtain a first regularized tensor;
the sliding window multi-head self-attention module is used for carrying out equal-dimension mapping processing on the first regularization tensor to obtain a characteristic tensor;
the second layer regularization module is used for regularizing the input tensor and the characteristic tensor to obtain a second regularized tensor;
the multi-layer perceptron is used for carrying out equal-dimension transformation on the second regularized tensor to obtain a target tensor.
8. The image enhancement apparatus according to claim 6 or 7, wherein:
the sliding window scale of the sliding window multi-head self-attention module in the first stage is a first scale;
the sliding window scale of the sliding window multi-head self-attention module in the second stage is the second scale;
the sliding window scale of the sliding window multi-head self-attention module in the third stage is a third scale;
The sliding window scale of the sliding window multi-head self-attention module in the fourth stage is the first scale;
wherein the third dimension is greater than the second dimension and the second dimension is greater than the first dimension.
9. An image enhancement device comprising a processor, a memory and an image enhancement program stored on the memory and executable by the processor, wherein the image enhancement program, when executed by the processor, implements the steps of the image enhancement method according to any one of claims 1 to 5.
10. A computer readable storage medium, wherein an image enhancement program is stored on the computer readable storage medium, wherein the image enhancement program, when executed by a processor, implements the steps of the image enhancement method according to any one of claims 1 to 5.
CN202311487671.0A 2023-11-09 2023-11-09 Image enhancement method, device, equipment and readable storage medium Pending CN117575971A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311487671.0A CN117575971A (en) 2023-11-09 2023-11-09 Image enhancement method, device, equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN117575971A true CN117575971A (en) 2024-02-20

Family

ID=89863553



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination