CN116029947B - Complex optical image enhancement method, device and medium for severe environment - Google Patents

Complex optical image enhancement method, device and medium for severe environment

Info

Publication number
CN116029947B
CN116029947B (application CN202310326767.2A)
Authority
CN
China
Prior art keywords
image
image enhancement
network
architecture model
attention module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310326767.2A
Other languages
Chinese (zh)
Other versions
CN116029947A (en)
Inventor
韩光洁
王敏
刁博宇
李超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202310326767.2A priority Critical patent/CN116029947B/en
Publication of CN116029947A publication Critical patent/CN116029947A/en
Application granted granted Critical
Publication of CN116029947B publication Critical patent/CN116029947B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 - Road transport of goods or passengers
    • Y02T 10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 - Engine management systems

Landscapes

  • Image Processing (AREA)

Abstract

The invention discloses a complex optical image enhancement method, device and medium for severe environments. The method specifically comprises the following steps: firstly, an experimental data set is formed from images captured under different illumination factors; then, a plurality of attention modules are constructed to improve image texture details; secondly, a feature-image matching fusion module is constructed to enhance the feature characterization capability and the global perception capability of the model; a multi-scale neural network model is constructed to realize image enhancement under complex and severe illumination scenes; a generative adversarial network is then trained with the data samples, and an adversarial enhancement network model is constructed to output enhanced images; finally, the pre-trained model is deployed on devices through TVM and automatically tuned for the specific device and workload to obtain optimal performance, so that a fast and accurate inference model can be provided on devices with insufficient computing resources. The invention can reconstruct and enhance images distorted and blurred by complex illumination factors and improve the image enhancement effect.

Description

Complex optical image enhancement method, device and medium for severe environment
Technical Field
The invention relates to the field of image processing under complex illumination scenes, in particular to a complex optical image enhancement method, device and medium for severe environments.
Background
With the progress of technology and human development, a large number of edge devices and service platforms use visual technology to guide their operation, performing analysis and decision making through images. However, field conditions are complex: for example, water contains a large amount of suspended particles and silt, natural light scatters in the water, and the acquired images become turbid. Meanwhile, many devices cannot shoot targets at close range due to terrain limitations, or the shooting device or the target object moves rapidly, so some regions of interest in the images have low resolution and lack detail, which affects the corresponding operations. Therefore, the originally captured image needs to undergo image enhancement processing, including denoising, deblurring and color correction, before use.
Existing methods for implementing image enhancement can be divided into three categories: non-model-based methods, model-based methods, and deep-learning-based methods. Compared with traditional image processing methods, deep-learning-based methods have the advantages of simplicity and speed. However, most existing deep-learning-based methods are designed for a single problem, such as image color correction or image denoising, and few methods can handle multiple tasks at the same time. In addition, deep-learning-based methods require a large amount of computation and impose a heavy load on edge devices; in particular, in some severe environments where devices with sufficient computing resources cannot be deployed, models deployed on the device side cannot be used. Image enhancement in a severe scene environment is therefore a significant challenge, in terms of both computational cost and time cost.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a complex optical image enhancement method, device and medium for severe environments.
The aim of the invention is realized by the following technical scheme: a first aspect of an embodiment of the present invention provides a complex optical image enhancement method for a harsh environment, including the steps of:
(1) Collecting images under different illumination conditions and different shooting conditions in a severe environment as image samples;
(2) Preprocessing the image sample acquired in the step (1) to acquire a reference image;
(3) Constructing an attention module in a multi-scale image enhancement network to extract texture information of an original image, wherein the attention module comprises a cross-scale fusion attention module, a channel attention module and a space attention module;
(4) Constructing a feature-image matching fusion module, and constructing an adversarial image enhancement architecture model applicable to severe environments according to the attention modules and the feature-image matching fusion module, wherein the adversarial image enhancement architecture model comprises a generator network for image enhancement and a discriminator network for discriminating images;
(5) Training the adversarial image enhancement architecture model according to the original images acquired in the step (1) and the reference images acquired in the step (2), and optimizing by back propagation using the respective objective loss functions of the generator network and the discriminator network, so as to acquire a converged adversarial image enhancement architecture model;
(6) Replacing the convolution operator in the converged adversarial image enhancement architecture model obtained in the step (5) with the MEC operator to optimize the adversarial image enhancement architecture model, reading the adversarial image enhancement architecture model into a TVM compiler, recompiling it, and transmitting it to a server or edge device to further optimize the adversarial image enhancement architecture model;
(7) Inputting real image samples into the adversarial image enhancement architecture model optimized in the step (6) so as to obtain enhanced images corresponding to the complex optical images.
Optionally, the step (2) includes the sub-steps of:
(2.1) enhancing all original image samples acquired in the step (1) with physics-based enhancement methods that exploit the inherent mathematical relations inside the image, to acquire a first enhanced image;
(2.2) feeding all original image samples acquired in the step (1) into deep-learning-based enhancement methods that achieve the best visual perception performance without scene constraints, to acquire a second enhanced image;
and (2.3) manually comparing the first enhanced image and the second enhanced image, and selecting the reference image suitable for pairing with the original image according to human visual perception.
Optionally, the step (3) comprises the following sub-steps:
(3.1) constructing a top-down and bottom-up cross-scale fusion attention module, wherein the cross-scale fusion attention module comprises an encoder and a decoder, the encoder aggregates features of different layers to obtain intermediate aggregated features and transmits them to the decoder, and the decoder concatenates the high-level features and the intermediate aggregated features to obtain multi-scale fusion features;
(3.2) constructing a channel attention module, extracting hierarchical attribute features in the encoder network with 1×1 and 3×3 convolutions to strengthen the feature representation, adopting a ReLU activation function to correct gradient disappearance, and finally reducing the dimensionality of the channel attention module with a 1×1 convolution to extract high-level attribute features;
(3.3) constructing a spatial attention module, and weighting local features through a pooling connection to extract background information and texture information of the image.
Optionally, the step (4) includes the sub-steps of:
(4.1) constructing a feature-image matching fusion module, and splicing encoding features of different depths with the resized original input image to complete feature-image matching fusion;
(4.2) embedding the cross-scale fusion attention module, the channel attention module, the spatial attention module and the feature-image matching fusion module into a CNN to construct an end-to-end U-shaped encoder-decoder image enhancement network, extracting image features through the encoder, and restoring a clear image with the decoder;
(4.3) taking the U-shaped encoder-decoder image enhancement network as the generator network for image enhancement, and using the discriminator network of PatchGAN to construct the adversarial image enhancement architecture model.
Optionally, the step (5) comprises the sub-steps of:
(5.1) generator network training process: inputting the original image acquired in the step (1) and the reference image acquired in the step (2) into a generator network to generate a new image, and performing back propagation optimization by utilizing a corresponding loss function to acquire a converged generator network;
(5.2) a discriminant network training process: inputting the image generated by the generator network and the reference image obtained in the step (2) into a discriminator network to judge the true and false of the image, and carrying out optimization by using the corresponding loss function back propagation to obtain a converged discriminator network.
Optionally, the loss functions include a Charbonnier loss function, a Universal loss function, and a Perceptual loss function.
Optionally, the step (6) comprises the sub-steps of:
(6.1) replacing the convolution operator in the converged adversarial image enhancement architecture model obtained in the step (5) with the MEC operator, reformulating the convolution as sparse matrix multiplication, and storing the sparse matrices in block compressed sparse row format to optimize the adversarial image enhancement architecture model;
(6.2) deploying the adversarial image enhancement architecture model on a server or an edge device, converting the adversarial image enhancement architecture model optimized in the step (6.1) into the ONNX format, and recompiling it after reading it into a TVM compiler to optimize the adversarial image enhancement architecture model;
(6.3) deploying the adversarial image enhancement architecture model optimized in the step (6.2) on an Nvidia V100 platform, and tuning the high-performance convolutional network to further optimize the adversarial image enhancement architecture model.
Optionally, the adjusting the high-performance convolutional network in the step (6.3) specifically includes: automatic tuning of the device is performed using the auto_tvm in the TVM compiler and the high performance convolutional network is tuned using the auto_scheduler in the TVM compiler.
A second aspect of the embodiments of the present invention provides a complex optical image enhancement device for a harsh environment, including one or more processors configured to implement the complex optical image enhancement method for a harsh environment described above.
A third aspect of the embodiments of the present invention provides a computer-readable storage medium having stored thereon a program for implementing the above-described complex optical image enhancement method for harsh environments when executed by a processor.
The invention has the beneficial effects that it uses an adversarial image enhancement network and adopts multiple attention modules and fusion modules to drive the generator to produce better images; the generator network adopts a deep cross-hierarchy input structure, uses an image fusion technique to cascade the image with the feature layers, and embeds multiple attention modules to improve the enhancement result; in the model inference stage, based on a TVM compiler, inference is accelerated by improving the common convolution operator, and Auto-TVM and Auto-Scheduler in the TVM compiler are used to optimize the convolutional neural network, so that enhanced images can be generated quickly and effectively when device computing resources are insufficient; compared with other deep learning methods, the method can rapidly process images in large batches and performs excellently in severe scene environments.
Drawings
FIG. 1 is a workflow diagram of a complex optical image enhancement method for harsh environments of the present invention;
FIG. 2 is a diagram of the encoder features generating the intermediate multi-scale fusion features in this embodiment;
FIG. 3 is a diagram of the decoder features fusing the intermediate multi-scale features in this embodiment;
FIG. 4 is the channel attention module constructed in this embodiment;
fig. 5 is the spatial attention module constructed in this embodiment;
fig. 6 is the feature-image matching fusion module constructed in this embodiment;
FIG. 7 is the image enhancement neural network model constructed in this embodiment;
fig. 8 is a diagram of the structure and convolution parameters of the discriminator network in this embodiment;
fig. 9 is a schematic structural view of a complex optical image enhancement device for harsh environments of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
The complex optical image enhancement method for severe environments of the present invention, as shown in fig. 1, comprises the steps of:
(1) And acquiring images under different illumination conditions and different shooting conditions in a severe environment as image samples.
It should be noted that, the severe environment includes sea area and land scene, and the images under different illumination conditions and different shooting conditions in the sea area and land scene are collected as the image samples.
(2) Preprocessing the image sample acquired in the step (1) to acquire a reference image.
(2.1) enhancing all original image samples acquired in step (1) with physics-based enhancement methods that exploit the inherent mathematical relations inside the image, to acquire a first enhanced image.
Specifically, all collected original image samples are enhanced with physics-based methods that exploit the inherent mathematical relations inside the image; for example, transform-domain methods such as the wavelet transform and homomorphic filtering can be used to enhance the image locally. The parameters of a dark channel prior algorithm are computed by principal component analysis to enhance the degraded image, and a Gaussian-Laplacian operator, i.e. the second-order differential of a Gaussian operator, is then applied to obtain an image with fuller edge details, thereby producing the first enhanced image.
It should be appreciated that principal component analysis converts a given set of correlated variables into another set of uncorrelated variables by a linear transformation, with the new variables arranged in order of decreasing variance; it is a common mathematical transformation and is not described in detail here.
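As an illustration of the kind of physics-based sharpening step mentioned above, the following is a minimal Python sketch of Laplacian-of-Gaussian edge enhancement; the file names, kernel sizes and the 0.7 weighting are illustrative assumptions, not the parameters of this embodiment.

```python
import cv2
import numpy as np

# Minimal Laplacian-of-Gaussian (LoG) sharpening sketch; file names and
# parameters are illustrative, not the patented configuration.
img = cv2.imread("raw_sample.png").astype(np.float32) / 255.0

blurred = cv2.GaussianBlur(img, (5, 5), sigmaX=1.0)             # Gaussian smoothing
laplacian = cv2.Laplacian(blurred, ddepth=cv2.CV_32F, ksize=3)  # second-order differential

sharpened = np.clip(img - 0.7 * laplacian, 0.0, 1.0)            # emphasize edge details
cv2.imwrite("first_enhanced.png", (sharpened * 255).astype(np.uint8))
```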
(2.2) feeding all of the raw image samples acquired in step (1) into deep-learning-based enhancement methods that achieve the best visual perception performance without scene constraints, to obtain a second enhanced image.
Specifically, the image samples are fed into the currently best-performing underwater image enhancement network models, such as WaterGAN, and enhanced images from the different neural network models are obtained.
It should be appreciated that the best visual perception performance may be judged using the two evaluation indicators UIQM and UCIQE, which are typically dedicated to assessing underwater visual effects.
And (2.3) manually comparing the first enhanced image and the second enhanced image, and selecting a reference image suitable for pairing with the original image according to the human visual perception effect.
(3) An attention module in a multi-scale image enhancement network is constructed to extract texture information of an original image, wherein the attention module comprises a cross-scale fusion attention module, a channel attention module and a spatial attention module.
The three attention modules in the embodiment can more effectively extract texture information during model reasoning, can effectively remove interference caused by light rays, shaking and imaging in a complex scene, and can enhance the global perception capability of an image.
It should be understood that texture information in the original image is extracted so that enhancement processing can be performed.
(3.1) constructing a top-down and bottom-up cross-scale fusion attention module, wherein the cross-scale fusion attention module comprises an encoder and a decoder, the encoder aggregates features of different layers to obtain intermediate aggregated features and transmits them to the decoder, and the decoder concatenates the high-level features and the intermediate aggregated features to obtain the multi-scale fusion features.
Specifically, to address the problems that feature interaction between an end-to-end image encoder and decoder designed on a traditional convolutional neural network is inconvenient and that information cannot circulate effectively, a top-down and bottom-up cross-scale fusion attention module is constructed in the image enhancement network for severe scene environments. Features of different layers are aggregated and fed into the model decoder, and the working principle can be expressed as follows:
$$F_i^{\mathrm{mid}} = \mathrm{Conv}\Big(\mathrm{Concat}\big(E_i,\ \mathrm{Up}(E_j),\ \mathrm{Down}(E_k)\big)\Big)$$
wherein $F_i^{\mathrm{mid}}$ represents the intermediate aggregation feature at the i-th layer in the network, $E_i$ represents the output features of the i-th layer encoder, $E_j$ represents the output features of the j-th layer encoder, $E_k$ represents the output features of the k-th layer encoder, $\mathrm{Up}(\cdot)$ and $\mathrm{Down}(\cdot)$ respectively represent up-sampling and down-sampling of the feature layer, and $\mathrm{Conv}(\cdot)$ represents a convolution operation. Different features are fused in a cascading manner, and a convolution operation brings the fused features back to the resolution of the original encoding-layer feature; the specific flow is shown in fig. 2. When the aggregated features are transmitted to the decoder, the decoder accepts not only the high-level features but also the concatenated intermediate aggregated features, so that both the high-level features and the low-level attributes in the network model are fully exploited:
$$D_i = \mathrm{Conv}\Big(\mathrm{Concat}\big(\mathrm{Up}(D_{i+1}),\ F_i^{\mathrm{mid}}\big)\Big)$$
wherein $D_i$ represents the output features of the i-th layer in the decoder network, and $\mathrm{Up}(D_{i+1})$ denotes the up-sampled high-level features passed from the deeper decoder layer.
In order to relieve the excessive computing burden on the network, low-level features are not cascaded when generating the clear image. The specific flow of the decoder fusing the intermediate features is shown in fig. 3; the intermediate features are still combined with the original image after a bottleneck block.
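For clarity, the following is a minimal PyTorch sketch of the cross-scale aggregation described above, assuming three encoder levels; the channel counts and bilinear resampling are illustrative assumptions rather than the exact configuration of this embodiment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossScaleAggregation(nn.Module):
    def __init__(self, ch_low=32, ch_mid=64, ch_high=128):
        super().__init__()
        # Fuse the concatenated features back to the mid-level channel count
        self.fuse = nn.Conv2d(ch_low + ch_mid + ch_high, ch_mid, kernel_size=3, padding=1)

    def forward(self, e_low, e_mid, e_high):
        # Bring the shallower (higher-resolution) feature down and the deeper
        # (lower-resolution) feature up to the mid-level resolution, then fuse.
        down = F.interpolate(e_low, size=e_mid.shape[-2:], mode="bilinear", align_corners=False)
        up = F.interpolate(e_high, size=e_mid.shape[-2:], mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([down, e_mid, up], dim=1))  # F_i^mid

agg = CrossScaleAggregation()
f_mid = agg(torch.randn(1, 32, 128, 128),   # e_low
            torch.randn(1, 64, 64, 64),     # e_mid
            torch.randn(1, 128, 32, 32))    # e_high
print(f_mid.shape)  # torch.Size([1, 64, 64, 64])
```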
(3.2) constructing a channel attention module, extracting hierarchical attribute features in the encoder network by adopting 1×1 and 3×3 convolutions to strengthen the feature representation, adopting a ReLU activation function to correct gradient disappearance, and finally reducing the dimension of the channel attention module by adopting the 1×1 convolutions to extract high-level attribute features.
In this embodiment, in order to construct inter-channel dependency, the channel attention module is used to emphasize the features, and advanced feature information is selected to suppress invalid features, and the flow is shown in fig. 4. The feature representation is enhanced by extracting the adaptive attribute representation in the network pyramid and the problem of gradient disappearance is corrected using the ReLU activation function. In the module, features in a network are extracted by adopting 1×1 and 3×3 convolution, and two channel basic blocks with similar structures are cascaded:
$$y = x \oplus \big(x \otimes \sigma(f(x))\big)$$
where y represents the output features, x represents the input features of the channel basic block, $f(\cdot)$ denotes the stacked 1×1 and 3×3 convolutions of the attention branch, $\oplus$ represents element-wise addition, $\otimes$ represents element-wise multiplication, and $\sigma$ represents the sigmoid activation function. Finally, a 1×1 convolution reduces the dimensionality of the module, completing cross-channel information interaction while avoiding a large amount of redundant computation.
It should be appreciated that extracting hierarchical attribute features in the encoder network, i.e., extracting adaptive attributes in the network pyramid, aims at extracting features of each layer of the encoder decoder in the overall neural network model, which are referred to as pyramid features because of the different resolutions.
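A minimal PyTorch sketch of a channel attention module consistent with the above description (stacked 1×1 and 3×3 convolutions, ReLU, sigmoid gating, element-wise multiplication and addition, two cascaded basic blocks, and a final 1×1 dimension-reduction convolution) follows; the exact layer ordering and channel counts are assumptions, not the parameters of this embodiment.

```python
import torch
import torch.nn as nn

class ChannelBasicBlock(nn.Module):
    """One channel-attention basic block: y = x + x * sigmoid(f(x))."""
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(                       # attention branch f(.)
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
        )

    def forward(self, x):
        weights = torch.sigmoid(self.f(x))
        return x + x * weights                        # element-wise multiply, then add

class ChannelAttentionModule(nn.Module):
    """Two cascaded basic blocks followed by a 1x1 dimension-reduction convolution."""
    def __init__(self, channels, reduced_channels):
        super().__init__()
        self.blocks = nn.Sequential(ChannelBasicBlock(channels), ChannelBasicBlock(channels))
        self.reduce = nn.Conv2d(channels, reduced_channels, kernel_size=1)

    def forward(self, x):
        return self.reduce(self.blocks(x))

cam = ChannelAttentionModule(64, 32)
print(cam(torch.randn(1, 64, 64, 64)).shape)  # torch.Size([1, 32, 64, 64])
```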
(3.3) constructing a spatial attention module, and weighting the local features through the pooling connection to extract background information and texture information of the image.
In this embodiment, a spatial attention module of the image enhancement network is constructed. Not all feature regions in the network contribute equally to restoring a clear image, so the spatial attention module is built to emphasize the contribution of local regions to the restoration of a clear image; the flow of the module is shown in fig. 5. By distinguishing different regions of the image, the spatial attention module helps the model handle local ranges with large color disturbance, where feature extraction restricted to a neighborhood would otherwise increase the variance or bias the mean. In the spatial attention module, the feature flow process is:
$$y = x \otimes \sigma\Big(\mathrm{Conv}\big(\mathrm{Concat}\big(\mathrm{AvgPool}(x),\ \mathrm{MaxPool}(x)\big)\big)\Big)$$
wherein $\mathrm{AvgPool}(\cdot)$ represents the mean pooling layer and $\mathrm{MaxPool}(\cdot)$ represents the maximum pooling layer. Through this effective pooling connection, the spatial attention module retains more of the background information and texture information of the image while weighting local features, provides genuine low-level supervision signals for the subsequent network, and passes more valuable information to the deeper network.
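The following is a minimal PyTorch sketch of a spatial attention module built from the mean-pooling and max-pooling connection described above; the 7×7 convolution kernel is an illustrative choice and not necessarily that of this embodiment.

```python
import torch
import torch.nn as nn

class SpatialAttentionModule(nn.Module):
    """Weights local features with a map built from mean- and max-pooled channels."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg_pool = torch.mean(x, dim=1, keepdim=True)      # mean pooling layer
        max_pool, _ = torch.max(x, dim=1, keepdim=True)    # maximum pooling layer
        attn = torch.sigmoid(self.conv(torch.cat([avg_pool, max_pool], dim=1)))
        return x * attn                                    # weighted local features

sam = SpatialAttentionModule()
print(sam(torch.randn(1, 64, 64, 64)).shape)  # torch.Size([1, 64, 64, 64])
```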
(4) A feature-image matching fusion module is constructed, and an adversarial image enhancement architecture model applicable to severe environments is constructed according to the attention modules and the feature-image matching fusion module, wherein the adversarial image enhancement architecture model comprises a generator network for image enhancement and a discriminator network for discriminating images.
It should be appreciated that the generator network is used for image enhancement, and that a sharp image may be generated.
And (4.1) constructing a feature-image matching fusion module, and splicing the coding features with different depths and the original input images with the transformed sizes to finish feature-image matching fusion.
In this embodiment, a feature-image matching fusion module is constructed to splice the encoding features of different depths with the resized original input image. The image resolution is 3×640×480, where 3 represents the RGB channels and 640×480 represents the image size. Through convolution and similar operations, the input image generates a feature layer with a resolution of 64×320×240 in the second layer of the network. The image resolution is first reduced using the torch.nn.functional.interpolate() function in PyTorch, where the input parameter is the input image and the scale_factor parameter is the scaling factor; when scale_factor is 0.5, the image size changes from 640×480 to 320×240, i.e. to 0.5 times the original. Features are then extracted from the scaled image through 3×3 and 1×1 convolutions, changing the number of channels from 3 to 61, and this feature layer is cascaded with the image at the channel level with height and width unchanged, yielding 64 channels. The channel attention module is adopted to extract the high-level attributes of the generated feature layer, and a ReLU activation function is added to increase the nonlinearity of the network and avoid gradient disappearance. At this point both feature layers have resolution 64×320×240; the encoding feature map0 in the encoding network and the image feature map1 are multiplied element by element to obtain a new feature layer map2, and map2 and map0 are then added element by element to complete feature-image matching fusion, as shown in fig. 6. The feature-image matching fusion module can remarkably improve the enhancement performance of the network, and can be described by the following formula:
$$F_{\mathrm{out}} = \big(\mathrm{map0} \otimes B(I)\big) \oplus \mathrm{map0}$$
where I represents the reduced-resolution input image, B represents the underlying convolution block that produces the image feature map1, map0 is the encoding feature at the same resolution, $\otimes$ denotes element-wise multiplication and $\oplus$ denotes element-wise addition.
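The following PyTorch sketch illustrates the feature-image matching fusion at the 64×320×240 level described above; the channel-attention branch is replaced here by a simple 1×1 convolution with ReLU as a stand-in, and the exact convolution configuration is an assumption rather than the patented design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureImageFusion(nn.Module):
    """Sketch of feature-image matching fusion at the 64-channel encoder level."""
    def __init__(self, enc_channels=64):
        super().__init__()
        self.extract = nn.Sequential(                 # 3x3 then 1x1 conv: 3 -> 61 channels
            nn.Conv2d(3, 61, kernel_size=3, padding=1),
            nn.Conv2d(61, 61, kernel_size=1),
        )
        self.refine = nn.Sequential(                  # stand-in for the channel attention + ReLU
            nn.Conv2d(enc_channels, enc_channels, kernel_size=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, image, map0):
        # Scale the full-resolution input image down to the feature resolution
        img_small = F.interpolate(image, scale_factor=0.5, mode="bilinear", align_corners=False)
        map1 = torch.cat([self.extract(img_small), img_small], dim=1)   # 61 + 3 = 64 channels
        map1 = self.refine(map1)
        map2 = map0 * map1                            # element-wise multiplication (fusion)
        return map2 + map0                            # element-wise addition

fuse = FeatureImageFusion()
out = fuse(torch.randn(1, 3, 480, 640), torch.randn(1, 64, 240, 320))
print(out.shape)  # torch.Size([1, 64, 240, 320])
```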
And (4.2) embedding a cross-scale fusion attention module, a channel attention module, a space attention module and a feature-image matching fusion module based on a CNN (convolutional neural network) to construct an end-to-end U-shaped encoder-decoder image enhancement network, extracting image features through an encoder, and recovering to be a clear image by using a decoder.
Specifically, in constructing an end-to-end U-type encoder-decoder image enhancement network, image features are extracted by an encoder and then restored to a clear image by a decoding network. In the image enhancement network, a built feature-image matching fusion module, a cross-scale fusion attention module, a channel attention module and a spatial attention module are embedded. The image enhancement network contains convolution kernels with different sizes, including 1×1, 3×3 and 5×5 convolution operations, the receptive fields with different sizes can extract more information from the image, and the large convolution kernels can collect more characteristic information at the same time, but the calculation amount is larger. And various convolution kernels are utilized to extract richer characteristic information, and the characteristic layers are subjected to cascade fusion, so that the visual quality of the final enhanced image is improved.
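As a small sketch of how convolution kernels of different sizes (1×1, 3×3 and 5×5) might extract features and have their feature layers cascade-fused, as described above, the following assumes illustrative branch widths; it is not the exact block used in this embodiment.

```python
import torch
import torch.nn as nn

class MultiKernelBlock(nn.Module):
    """Extracts features with 1x1, 3x3 and 5x5 receptive fields and fuses them by cascading."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        branch_ch = out_ch // 3
        self.b1 = nn.Conv2d(in_ch, branch_ch, kernel_size=1)
        self.b3 = nn.Conv2d(in_ch, branch_ch, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_ch, out_ch - 2 * branch_ch, kernel_size=5, padding=2)
        self.fuse = nn.Conv2d(out_ch, out_ch, kernel_size=1)   # fuse the cascaded branches

    def forward(self, x):
        return self.fuse(torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1))

blk = MultiKernelBlock(64, 96)
print(blk(torch.randn(1, 64, 64, 64)).shape)  # torch.Size([1, 96, 64, 64])
```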
(4.3) Taking the U-shaped encoder-decoder image enhancement network as the generator network for image enhancement, and using the discriminator network of PatchGAN to construct the adversarial image enhancement architecture model.
Specifically, the U-shaped encoder-decoder image enhancement network is used as the generator network of the adversarial image enhancement architecture model, and the discriminator of the PatchGAN network is used to form an adversarial image enhancement network architecture suitable for multiple severe environments; the detailed architecture is shown in fig. 7. PatchGAN divides the input image into blocks and maps the input image to an N×N matrix, paying more attention to local image information; the detailed flow of the discriminator is shown in fig. 8. In the discriminator, the resolution of the image is reduced by four 3×3 convolutions, zero filling is performed with torch.nn.ZeroPad2d() to adjust the resolution of the feature map after convolution, and finally the patch output is obtained by a 3×3 convolution.
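A minimal PyTorch sketch of a PatchGAN-style discriminator following the flow just described (four stride-2 3×3 convolutions, zero padding with torch.nn.ZeroPad2d, and a final 3×3 convolution producing the patch map) follows; the channel widths, LeakyReLU activations and padding amounts are illustrative assumptions, and fig. 8 should be consulted for the exact parameters.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Four stride-2 3x3 convolutions, zero padding, then a 3x3 conv producing an NxN patch map."""
    def __init__(self, in_ch=3, base_ch=64):
        super().__init__()
        layers, ch = [], in_ch
        for mult in (1, 2, 4, 8):                                 # four down-sampling 3x3 convs
            layers += [nn.Conv2d(ch, base_ch * mult, kernel_size=3, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
            ch = base_ch * mult
        layers += [nn.ZeroPad2d((1, 0, 1, 0)),                    # adjust feature-map resolution
                   nn.Conv2d(ch, 1, kernel_size=3, padding=1)]    # per-patch real/fake score
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

disc = PatchDiscriminator()
print(disc(torch.randn(1, 3, 480, 640)).shape)  # torch.Size([1, 1, 31, 41])
```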
(5) The adversarial image enhancement architecture model is trained according to the original images acquired in the step (1) and the reference images acquired in the step (2), and is optimized by back propagation using the objective loss functions, so as to acquire a converged adversarial image enhancement architecture model.
In this embodiment, the adversarial image enhancement architecture model is trained according to the original image acquired in step (1) and the reference image acquired in step (2). The image generated by the generator network and the reference image are simultaneously input into the discriminator network, whose structure is shown in fig. 8, and the discriminator network judges whether the image is real or fake; the networks are then trained by back propagation with their respective objective loss functions, thereby enabling image enhancement. In this embodiment, the adversarial image enhancement architecture model is optimized using the default parameters of the Adam optimizer, the batch size is set to 16, the initial learning rate is set to 0.001, the total number of training iterations is 1000, and the learning rate is adjusted with a simulated annealing algorithm until the model converges, so that a converged adversarial image enhancement architecture model is finally obtained.
(5.1) Generator network training process: the original image acquired in the step (1) and the reference image acquired in the step (2) are input into the generator network, a new image is generated by the generator network, and the generator network is then trained by back propagation with its corresponding loss functions, so that a converged generator network can be acquired. In this embodiment, the generator network is supervised with three common loss functions, namely the Charbonnier loss function, the Universal loss function and the Perceptual loss function. The corresponding loss is obtained through the loss function, and the smaller the value of the loss function, the better. The partial derivative of the loss function with respect to each parameter of the weight matrix is calculated, which gives the influence of that parameter on the change of the loss function.
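As one concrete example, the Charbonnier term can be written as follows (a common formulation; the eps value and the relative weighting of the three loss terms are assumptions not specified here).

```python
import torch
import torch.nn as nn

class CharbonnierLoss(nn.Module):
    """Charbonnier loss: mean of sqrt((pred - target)^2 + eps^2), a smooth L1-like penalty."""
    def __init__(self, eps=1e-3):
        super().__init__()
        self.eps = eps

    def forward(self, pred, target):
        return torch.sqrt((pred - target) ** 2 + self.eps ** 2).mean()

charbonnier = CharbonnierLoss()
generated = torch.rand(4, 3, 240, 320, requires_grad=True)
reference = torch.rand(4, 3, 240, 320)
loss = charbonnier(generated, reference)   # other terms (e.g. perceptual) would be added here
loss.backward()                            # gradients used to update the generator
print(loss.item())
```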
(5.2) Discriminator network training process: the image generated by the generator network and the reference image are input into the discriminator network at the same time, so that the authenticity of the image can be discriminated. The generated image enters the discriminator network and is convolved into a tensor, and the mean square error is calculated between this tensor and a tensor of the same shape whose values are all 0. The reference image is likewise convolved into a tensor after entering the discriminator network, and the mean square error is calculated against a tensor whose values are all 1. The mean square errors calculated for the two images are then summed, and the discriminator is trained by back propagation.
The back-propagation training process is implemented with commonly used loss functions, such as the Charbonnier loss function and the Perceptual loss function. The smaller the value of the loss function, the better. The partial derivative of the loss function with respect to each parameter of the weight matrix is calculated, which gives the influence of that parameter on the change of the loss function.
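The discriminator training step just described can be sketched as follows; the tiny stand-in discriminator and the Adam settings are only there to make the example runnable and are not the configuration of this embodiment.

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()

def discriminator_step(discriminator, optimizer, fake_img, ref_img):
    """One step: MSE against all-zeros for the generated image, all-ones for the reference, summed."""
    optimizer.zero_grad()
    fake_pred = discriminator(fake_img.detach())         # patch-score tensor for generated image
    real_pred = discriminator(ref_img)                   # patch-score tensor for reference image
    loss = mse(fake_pred, torch.zeros_like(fake_pred)) + \
           mse(real_pred, torch.ones_like(real_pred))
    loss.backward()                                      # back-propagation trains the discriminator
    optimizer.step()
    return loss.item()

# Tiny stand-in discriminator just to make the sketch runnable
disc = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
                     nn.Conv2d(16, 1, 3, padding=1))
opt = torch.optim.Adam(disc.parameters(), lr=0.001)
print(discriminator_step(disc, opt, torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)))
```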
(6) The convolution operator in the converged adversarial image enhancement architecture model obtained in the step (5) is replaced with the MEC operator to optimize the adversarial image enhancement architecture model; the adversarial image enhancement architecture model is read into a TVM compiler, recompiled, and transmitted to a server or edge device to further optimize the adversarial image enhancement architecture model.
In order to deploy the adversarial image enhancement architecture model and perform image inference on devices in severe environments, model parameters and computational resources need to be reduced as much as possible. In particular, the convolution operator can be optimized to improve the inference efficiency of the adversarial image enhancement architecture model and reduce computational expense. The pre-trained adversarial image enhancement architecture model is read into a TVM compiler, recompiled, and transmitted to the edge device, where the adversarial image enhancement architecture model is optimized and the related image inference is performed.
(6.1) A large number of convolution operations exist in the converged adversarial image enhancement architecture model obtained in the step (5), and replacing the traditional convolution operator with the MEC operator greatly reduces the number of memory accesses during convolution kernel computation, facilitating deployment of the adversarial image enhancement architecture model. In addition, during inference of the adversarial image enhancement architecture model, the feature layers contain sparse matrices with a large number of zero elements; the convolution is therefore reformulated as sparse GEMM (General Matrix Multiplication), and the sparse matrices are represented and stored more efficiently in block compressed sparse row format.
In this embodiment, the MEC operator is a memory-efficient and fast convolution computation method. Most current convolution implementations adopt an indirect computation mode; although the execution efficiency is good, the memory occupation is too high. The MEC operator splits the computation matrix, reducing the intermediate matrices that must be stored and greatly improving memory occupation and computation speed.
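To illustrate the block compressed sparse row storage and sparse GEMM mentioned above (not the MEC operator itself), a small NumPy/SciPy sketch follows; the matrix sizes and block size are illustrative.

```python
import numpy as np
from scipy.sparse import bsr_matrix

# A mostly-zero matrix standing in for a sparse feature/weight matrix
dense = np.zeros((8, 8), dtype=np.float32)
dense[0:2, 0:2] = 1.0
dense[4:6, 6:8] = 2.0

# Block compressed sparse row (BCSR) storage: only the two non-zero 2x2 blocks are kept
sparse = bsr_matrix(dense, blocksize=(2, 2))
print(sparse.data.shape)                          # (2, 2, 2): two stored 2x2 blocks

# Sparse GEMM: multiply the BCSR matrix by a dense activation matrix
activations = np.random.rand(8, 4).astype(np.float32)
result = sparse.dot(activations)
print(np.allclose(result, dense @ activations))   # True: same result, less storage
```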
(6.2) When the adversarial image enhancement architecture model is deployed on the server or the edge device, the adversarial image enhancement architecture model optimized in the step (6.1) is first converted into the ONNX format, the ONNX-format model is read with the relay.frontend.from_onnx() function, and it is compiled with the relay.build() function, registering the operators in the TVM compiler to generate the TVM module library. Communication between the local machine and the target device is established in TVM through the RPC mechanism, the compiled adversarial image enhancement architecture model is then uploaded to the target device with the upload() function, and the adversarial image enhancement architecture model is deployed and loaded.
It should be appreciated that the relay.frontend.from_onnx() function, the relay.build() function and the load_module() function are functions within the underlying operator library.
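A minimal sketch of this compile-and-deploy flow using the public TVM Python API follows; the model file name, input shape, target string and RPC address are illustrative assumptions, and a reasonably recent TVM version is assumed.

```python
import onnx
import tvm
from tvm import relay, rpc
from tvm.contrib import graph_executor

# Load the exported ONNX model (file name and input shape are illustrative)
onnx_model = onnx.load("enhance_model.onnx")
shape_dict = {"input": (1, 3, 480, 640)}
mod, params = relay.frontend.from_onnx(onnx_model, shape=shape_dict)

# Recompile for the target device with the TVM compiler
target = "llvm"   # e.g. "cuda" for a GPU server
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

# Upload to the remote (edge) device over RPC and load it there
lib.export_library("enhance_model.tar")
remote = rpc.connect("192.168.1.100", 9090)          # RPC server address, illustrative
remote.upload("enhance_model.tar")
rlib = remote.load_module("enhance_model.tar")
dev = remote.cpu(0)
module = graph_executor.GraphModule(rlib["default"](dev))
```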
(6.3) After the adversarial image enhancement architecture model optimized in the step (6.2) is deployed on the Nvidia V100 platform, the high-performance convolutional network is tuned. Automatic tuning for the device is performed with auto_tvm in the TVM compiler: in the tuning options tuning_option{ }, the tuner is set to the xgb algorithm and n_trial is set to 1500; tuning is started with the tune_and_evaluate() function, and the performance of the end-to-end model is evaluated. The auto_scheduler.extract_tasks() function is adopted to extract the search tasks and their weights in the network, the end-to-end latency of the network being approximated by the task latencies weighted by the task weights; the run_tuning() function is used to tune the adversarial image enhancement architecture model, which is automatically adjusted for the specific device and workload, so that inference of the adversarial image enhancement architecture model can be performed quickly and effectively when device computing resources are insufficient.
It should be appreciated that the tune_and_evaluate() function is a commonly used tuning and evaluation function, which in this embodiment is used to start tuning and to evaluate the performance of the end-to-end model; the auto_scheduler.extract_tasks() function is a commonly used task-extraction function of the auto-scheduler, which in this embodiment is used to extract the search tasks and weights in the network; the run_tuning() function is a commonly used tuning routine, which in this embodiment is used to tune the adversarial image enhancement architecture model, automatically adjusting it for the specific device and workload.
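A minimal sketch of the auto_scheduler tuning flow referred to above follows, assuming `mod`, `params` and `target` from the previous compilation sketch; the trial count mirrors the n_trial = 1500 setting, and the other options are illustrative.

```python
import tvm
from tvm import relay, auto_scheduler

# `mod`, `params` and `target` are assumed to come from the ONNX import above
log_file = "enhance_tuning.json"
tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, target)

def run_tuning():
    tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
    tune_option = auto_scheduler.TuningOptions(
        num_measure_trials=1500,                                   # cf. n_trial = 1500 above
        measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
    )
    tuner.tune(tune_option)

run_tuning()

# Re-compile the model with the best schedules found during the search
with auto_scheduler.ApplyHistoryBest(log_file):
    with tvm.transform.PassContext(opt_level=3,
                                   config={"relay.backend.use_auto_scheduler": True}):
        lib = relay.build(mod, target=target, params=params)
```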
(7) The real image samples are input into the adversarial image enhancement architecture model optimized in the step (6) to obtain the enhanced images corresponding to the complex optical images, and clear images are finally output. To verify the quality of the enhanced images, enhanced images generated by other methods are selected for comparison. The peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) of the images are measured, and the images are compared quantitatively.
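The PSNR and SSIM measurements can be computed, for example, with scikit-image (a recent version is assumed); the random arrays below merely stand in for the enhanced and reference images.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# `enhanced` and `reference` are HxWx3 uint8 arrays (illustrative random data here)
rng = np.random.default_rng(0)
enhanced = rng.integers(0, 256, size=(240, 320, 3), dtype=np.uint8)
reference = rng.integers(0, 256, size=(240, 320, 3), dtype=np.uint8)

psnr = peak_signal_noise_ratio(reference, enhanced, data_range=255)
ssim = structural_similarity(reference, enhanced, channel_axis=-1, data_range=255)
print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.4f}")
```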
This embodiment can improve the operational performance of related devices and enhance operational efficiency. The embodiment realizes image enhancement with an adversarial enhancement architecture and adopts multiple attention modules and fusion modules to improve image quality. By improving the network structure, the model training parameters are reduced. A cross-layer image input technique is applied to the network model to improve network performance. On the basis of the TVM platform, an optimized operator replaces the common convolution operator, and model inference is accelerated by improving the memory-access structure and data format. Based on AutoTVM, the convolutional neural network is optimized, so that the image enhancement model can perform inference quickly and effectively when device computing resources are insufficient. Compared with traditional methods, the network is simple to implement, has strong generalization ability, can rapidly process images in large batches, and still performs excellently in severe environments.
The present invention also provides an embodiment of a complex optical image enhancement device for harsh environments, corresponding to the aforementioned embodiment of a complex optical image enhancement method for harsh environments.
Referring to fig. 9, a complex optical image enhancement device for a harsh environment according to an embodiment of the present invention includes one or more processors for implementing the complex optical image enhancement method for a harsh environment in the above embodiment.
The embodiment of the complex optical image enhancement device for severe environments of the present invention can be applied to any device with data processing capability, such as a computer or the like. The apparatus embodiments may be implemented by software, or may be implemented by hardware or a combination of hardware and software. Taking software implementation as an example, the device in a logic sense is formed by reading corresponding computer program instructions in a nonvolatile memory into a memory by a processor of any device with data processing capability. In terms of hardware, as shown in fig. 9, a hardware structure diagram of an arbitrary device with data processing capability where the complex optical image enhancement device for a severe environment of the present invention is located is shown in fig. 9, and in addition to a processor, a memory, a network interface, and a nonvolatile memory shown in fig. 9, the arbitrary device with data processing capability where the device is located in an embodiment generally includes other hardware according to an actual function of the arbitrary device with data processing capability, which is not described herein again.
The implementation process of the functions and roles of each unit in the above device is specifically shown in the implementation process of the corresponding steps in the above method, and will not be described herein again.
For the device embodiments, reference is made to the description of the method embodiments for the relevant points, since they essentially correspond to the method embodiments. The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purposes of the present invention. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
The embodiment of the present invention also provides a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the complex optical image enhancement method for harsh environments in the above-described embodiments.
The computer readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any of the data processing enabled devices described in any of the previous embodiments. The computer readable storage medium may be any device having data processing capability, for example, a plug-in hard disk, a Smart Media Card (SMC), an SD Card, a Flash memory Card (Flash Card), or the like, which are provided on the device. Further, the computer readable storage medium may include both internal storage units and external storage devices of any data processing device. The computer readable storage medium is used for storing the computer program and other programs and data required by the arbitrary data processing apparatus, and may also be used for temporarily storing data that has been output or is to be output.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (7)

1. A method of complex optical image enhancement for harsh environments, comprising the steps of:
(1) Collecting images under different illumination conditions and different shooting conditions in a severe environment as image samples;
(2) Preprocessing the image sample acquired in the step (1) to acquire a reference image;
(3) Constructing an attention module in a multi-scale image enhancement network to extract texture information of an original image, wherein the attention module comprises a cross-scale fusion attention module, a channel attention module and a space attention module;
said step (3) comprises the sub-steps of:
(3.1) constructing a top-down and bottom-up cross-scale fusion attention module, wherein the cross-scale fusion attention module comprises an encoder and a decoder, the encoder aggregates features of different layers to obtain intermediate aggregated features and transmits them to the decoder, and the decoder concatenates the high-level features and the intermediate aggregated features to obtain multi-scale fusion features;
(3.2) constructing a channel attention module, extracting hierarchical attribute features in the encoder network by adopting 1×1 and 3×3 convolutions to enhance feature representation, correcting gradient disappearance by adopting a ReLU activation function, and finally reducing the dimension of the channel attention module by using the 1×1 convolutions to extract advanced attribute features;
(3.3) constructing a spatial attention module, and weighting the local features through pooling connection to extract background information and texture information of the image;
(4) Constructing a feature-image matching fusion module, and constructing an adversarial image enhancement architecture model applicable to severe environments according to the attention modules and the feature-image matching fusion module, wherein the adversarial image enhancement architecture model comprises a generator network for image enhancement and a discriminator network for discriminating images;
(5) Training the adversarial image enhancement architecture model according to the original images acquired in the step (1) and the reference images acquired in the step (2), and optimizing by back propagation using the respective objective loss functions of the generator network and the discriminator network, so as to acquire a converged adversarial image enhancement architecture model;
(6) Replacing the convolution operator in the converged adversarial image enhancement architecture model obtained in the step (5) with the MEC operator to optimize the adversarial image enhancement architecture model, reading the adversarial image enhancement architecture model into a TVM compiler, recompiling it, and transmitting it to a server or edge device to further optimize the adversarial image enhancement architecture model;
said step (6) comprises the sub-steps of:
(6.1) replacing the convolution operator in the converged adversarial image enhancement architecture model obtained in the step (5) with the MEC operator, reformulating the convolution as sparse matrix multiplication, and storing the sparse matrices in block compressed sparse row format to optimize the adversarial image enhancement architecture model;
(6.2) deploying the adversarial image enhancement architecture model on a server or an edge device, converting the adversarial image enhancement architecture model optimized in the step (6.1) into the ONNX format, and recompiling it after reading it into a TVM compiler to optimize the adversarial image enhancement architecture model;
(6.3) deploying the adversarial image enhancement architecture model optimized in the step (6.2) on an Nvidia V100 platform, and tuning the high-performance convolutional network to further optimize the adversarial image enhancement architecture model; wherein the tuning of the high-performance convolutional network specifically comprises: performing automatic tuning for the device with auto_tvm in the TVM compiler, and tuning the high-performance convolutional network with the auto_scheduler in the TVM compiler;
(7) Inputting real image samples into the adversarial image enhancement architecture model optimized in the step (6) so as to obtain enhanced images corresponding to the complex optical images.
2. The complex optical image enhancement method for harsh environments according to claim 1, characterized in that said step (2) comprises the following sub-steps:
(2.1) enhancing all original image samples acquired in the step (1) by using a physical-based enhancement method through inherent mathematical relations inside the image to acquire a first enhanced image;
(2.2) feeding all original image samples acquired in the step (1) into deep-learning-based enhancement methods that achieve the best visual perception performance without scene constraints, to acquire a second enhanced image;
and (2.3) manually comparing the first enhanced image and the second enhanced image, and selecting a reference image suitable for pairing with the original image according to the human visual perception effect.
3. The complex optical image enhancement method for harsh environments according to claim 1, characterized in that said step (4) comprises the following sub-steps:
(4.1) constructing a feature-image matching fusion module, and splicing coding features with different depths and original input images with the transformed sizes to finish feature-image matching fusion;
(4.2) embedding the cross-scale fusion attention module, the channel attention module, the spatial attention module and the feature-image matching fusion module into a CNN to construct an end-to-end U-shaped encoder-decoder image enhancement network, extracting image features through the encoder, and restoring a clear image with the decoder;
(4.3) taking the U-shaped encoder-decoder image enhancement network as the generator network for image enhancement, and using the discriminator network of PatchGAN to construct the adversarial image enhancement architecture model.
4. The complex optical image enhancement method for harsh environments according to claim 1, characterized in that said step (5) comprises the following sub-steps:
(5.1) generator network training process: inputting the original image acquired in the step (1) and the reference image acquired in the step (2) into a generator network to generate a new image, and performing back propagation optimization by utilizing a corresponding loss function to acquire a converged generator network;
(5.2) a discriminant network training process: inputting the image generated by the generator network and the reference image obtained in the step (2) into a discriminator network to judge the true and false of the image, and carrying out optimization by using the corresponding loss function back propagation to obtain a converged discriminator network.
5. The complex optical image enhancement method for harsh environments according to claim 4, wherein the loss functions comprise a Charbonnier loss function, a Universal loss function, and a Perceptual loss function.
6. A complex optical image enhancement device for harsh environments, comprising one or more processors for implementing the complex optical image enhancement method for harsh environments of any one of claims 1-5.
7. A computer readable storage medium, having stored thereon a program which, when executed by a processor, is adapted to carry out the complex optical image enhancement method for harsh environments according to any one of claims 1-5.
CN202310326767.2A 2023-03-30 2023-03-30 Complex optical image enhancement method, device and medium for severe environment Active CN116029947B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310326767.2A CN116029947B (en) 2023-03-30 2023-03-30 Complex optical image enhancement method, device and medium for severe environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310326767.2A CN116029947B (en) 2023-03-30 2023-03-30 Complex optical image enhancement method, device and medium for severe environment

Publications (2)

Publication Number Publication Date
CN116029947A CN116029947A (en) 2023-04-28
CN116029947B true CN116029947B (en) 2023-06-23

Family

ID=86072698

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310326767.2A Active CN116029947B (en) 2023-03-30 2023-03-30 Complex optical image enhancement method, device and medium for severe environment

Country Status (1)

Country Link
CN (1) CN116029947B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116681627B (en) * 2023-08-03 2023-11-24 佛山科学技术学院 Cross-scale fusion self-adaptive underwater image generation countermeasure enhancement method
CN117522754A (en) * 2023-10-25 2024-02-06 广州极点三维信息科技有限公司 Image enhancement method, device, electronic equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111798400A (en) * 2020-07-20 2020-10-20 福州大学 Non-reference low-illumination image enhancement method and system based on generation countermeasure network

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106682586A (en) * 2016-12-03 2017-05-17 北京联合大学 Method for real-time lane line detection based on vision under complex lighting conditions
CN108038846A (en) * 2017-12-04 2018-05-15 国网山东省电力公司电力科学研究院 Transmission line equipment image defect detection method and system based on multilayer convolutional neural networks
US11816790B2 (en) * 2020-03-06 2023-11-14 Nvidia Corporation Unsupervised learning of scene structure for synthetic data generation
GB2604898A (en) * 2021-03-17 2022-09-21 British Broadcasting Corp Imaging processing using machine learning
CN215813842U (en) * 2021-06-22 2022-02-11 四川华能太平驿水电有限责任公司 Hydropower station gate control system based on deep learning
CN115018727A (en) * 2022-06-14 2022-09-06 中国地质大学(武汉) Multi-scale image restoration method, storage medium and terminal
CN115223004A (en) * 2022-06-17 2022-10-21 长安大学 Method for generating confrontation network image enhancement based on improved multi-scale fusion
CN115565056A (en) * 2022-09-27 2023-01-03 中国农业大学 Underwater image enhancement method and system based on condition generation countermeasure network

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111798400A (en) * 2020-07-20 2020-10-20 福州大学 Non-reference low-illumination image enhancement method and system based on generation countermeasure network

Also Published As

Publication number Publication date
CN116029947A (en) 2023-04-28

Similar Documents

Publication Publication Date Title
CN116029947B (en) Complex optical image enhancement method, device and medium for severe environment
CN111709895A (en) Image blind deblurring method and system based on attention mechanism
CN111062880A (en) Underwater image real-time enhancement method based on condition generation countermeasure network
Chen et al. Remote sensing image quality evaluation based on deep support value learning networks
CN113269224B (en) Scene image classification method, system and storage medium
CN113570508A (en) Image restoration method and device, storage medium and terminal
CN115272777B (en) Semi-supervised image analysis method for power transmission scene
Li et al. Densely connected network for impulse noise removal
Yu et al. Two-stage image decomposition and color regulator for low-light image enhancement
CN116309178A (en) Visible light image denoising method based on self-adaptive attention mechanism network
Saleem et al. A non-reference evaluation of underwater image enhancement methods using a new underwater image dataset
CN110942097A (en) Imaging-free classification method and system based on single-pixel detector
Zhang Research on remote sensing image de‐haze based on GAN
Wang et al. Global aligned structured sparsity learning for efficient image super-resolution
Li et al. An end-to-end system for unmanned aerial vehicle high-resolution remote sensing image haze removal algorithm using convolution neural network
Patel et al. A novel approach for semantic segmentation of automatic road network extractions from remote sensing images by modified UNet
Song et al. A Single Image Dehazing Method Based on End-to-End CPAD-Net Network in Deep Learning Environment
Ye et al. A super-resolution method of remote sensing image using transformers
Kumar et al. Underwater Image Enhancement using deep learning
CN114119428B (en) Image deblurring method and device
Lin et al. Ml-capsnet meets vb-di-d: A novel distortion-tolerant baseline for perturbed object recognition
Chen et al. Teacher-guided learning for blind image quality assessment
CN115311149A (en) Image denoising method, model, computer-readable storage medium and terminal device
Alam et al. Identification of empty land based on google earth using convolutional neural network algorithm
Niu et al. Underwater Waste Recognition and Localization Based on Improved YOLOv5.

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant