CN114663292A - Ultra-lightweight picture defogging and identification network model and picture defogging and identification method


Info

Publication number: CN114663292A
Application number: CN202011527239.6A
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: network model, picture, defogging, image, identification
Inventors: 王中风, 王美琪, 苏天祺, 陈思依, 林军
Applicant and assignee: Nanjing University
Legal status: Pending

Classifications

    • G06T 5/73 Deblurring; Sharpening (G06T 5/00 Image enhancement or restoration)
    • G06F 18/253 Fusion techniques of extracted features (G06F 18/00 Pattern recognition)
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 Combinations of networks
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N 3/088 Non-supervised learning, e.g. competitive learning
    • G06T 3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G06T 2200/32 Indexing scheme for image data processing or generation involving image mosaicing
    • G06T 2207/10004 Still image; Photographic image
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]


Abstract

The application discloses an ultra-lightweight picture defogging and identification network model through which defogging and identification of pictures are realized. The network model comprises a bidirectional GAN network model and a target detection network model which are connected in sequence. The bidirectional GAN network model defogs the foggy image and outputs a clear image to the target detection network model for feature recognition. The target detection network model undergoes pruning retraining: the original images of the training set are trained multiple times, the original images are down-sampled by a preset factor before each training, the scaling coefficients of the batch normalization layers are sorted and compared after each training, and the previous-layer convolution kernels corresponding to channels whose scaling coefficients are smaller than a preset scaling threshold are removed to realize pruning. The target detection network model is pruned further on the basis of an existing tiny recognition model, which greatly reduces the scale of the ultra-lightweight picture defogging and identification network model and allows it to be deployed on end-side platforms with limited computing power and power consumption resources.

Description

Ultra-lightweight picture defogging and identification network model and picture defogging and identification method
Technical Field
The application relates to the technical field of picture processing, in particular to an ultra-lightweight picture defogging and identification network model and a picture defogging and identification method.
Background
In haze weather, the clarity of images acquired by a camera is reduced, which makes it difficult for a computer to recognize object features in the images. Therefore, an image needs to be defogged before it is recognized. At present, both the defogging and the recognition of pictures can be realized through neural network models.
For defogging, a commonly used neural network model is the Generative Adversarial Network (GAN). Using an unsupervised learning method, this network model learns the mapping relationship between foggy pictures and clear pictures from a large amount of training data, thereby realizing defogging of foggy pictures. For recognition, a commonly used neural network model is the YOLO (You Only Look Once) target detection model. This network model divides an input image into S x S grid cells, each grid cell is responsible for detecting objects whose centers fall within it, and recognition of image object features is realized by directly regressing target bounding boxes and predicted classes at multiple positions of the input image.
The current YOLO target detection model is large, and a neural network model that applies it to the image defogging and recognition processes is not suitable for deployment on end-side platforms with limited computing power and power consumption resources, such as mobile phones or autonomous-driving devices.
Disclosure of Invention
In order to solve the problem that the existing YOLO target detection model is large, so that a neural network model obtained by applying it to image defogging and identification is not suitable for deployment on end-side platforms with limited computing power and power consumption resources, the application discloses an ultra-lightweight image defogging and identification network model and an image defogging and identification method through the following embodiments.
A first aspect of the application discloses an ultra-lightweight image defogging and identification network model, comprising a bidirectional GAN network model and a target detection network model which are connected in sequence;
the bidirectional GAN network model is used for processing an input picture to be defogged and outputting a clear picture, and the target detection network model is used for carrying out feature recognition processing on the clear picture;
the target detection network model is a Yolo-Tiny-S network model which has undergone pruning retraining; in the pruning retraining process, the original images in the training set of the target detection network model are trained multiple times, the original images are down-sampled by a preset factor before each training, the scaling coefficients of the batch normalization layer are sorted and compared after each training is finished, and the previous-layer convolution kernels corresponding to the channels whose scaling coefficients are smaller than a preset scaling threshold are removed to realize pruning;
the target detection network model comprises a direct feature processing module and a fused feature processing module, wherein the direct feature processing module comprises a front extraction unit, a middle extraction unit and a first rear extraction unit which are connected in sequence, the clear image is input into the target detection network model through the front extraction unit, the fused feature processing module comprises a feature fusion splicing unit and a second rear extraction unit which are connected in sequence, and the output ends of the front extraction unit and the middle extraction unit are connected to the input end of the feature fusion splicing unit.
Optionally, the front extraction unit includes a DBL-S combination subunit and a plurality of MDBL-S combination subunits connected in sequence;
the middle extraction unit comprises a plurality of MDBL-S combination subunits and a DBL-S combination subunit which are sequentially connected;
the first rear extraction unit and the second rear extraction unit respectively comprise one DBL-S combination subunit and one convolution subunit which are sequentially connected;
the feature fusion splicing unit comprises an up-sampling subunit and a feature splicing subunit, wherein the input end of the up-sampling subunit is connected with the output end of the middle extraction unit, and the output ends of the up-sampling subunit and the front extraction unit are connected to the input end of the feature splicing subunit;
the DBL-S combined subunit consists of a DarkNet convolution layer, a batch normalization layer and a leaky rectified linear (Leaky ReLU) layer which are connected in sequence;
and the MDBL-S combined subunit consists of a maximum pooling layer and the DBL-S combined subunit which are sequentially connected.
Optionally, the bidirectional GAN network model includes an input module, a generation module, and a discrimination module;
the input module comprises a clear image input port and a fog image input port, the generation module comprises a first generation unit and a second generation unit, and the discrimination module comprises a first discriminator and a second discriminator;
the clear image input port, the first generation unit and the first discriminator are sequentially connected and used for carrying out feature extraction and reconstruction on the clear image, and the fog image input port, the second generation unit and the second discriminator are sequentially connected and used for carrying out feature extraction and reconstruction on the fog image;
the first generation unit comprises a first encoder, a shared potential space and a first decoder which are connected in sequence, and the second generation unit comprises a second encoder, the shared potential space and a second decoder which are connected in sequence;
the shared potential space is used for storing high-level features and outputting the high-level features to the first decoder and the second decoder, wherein the high-level features comprise high-level features extracted by the first encoder for a clear image and high-level features extracted by the second encoder for a fog image;
the training set and the verification set of the bidirectional GAN network model both comprise paired clear graphs and fog graphs.
Optionally, the first encoder and the second encoder respectively include a first convolution block, a second convolution block, and a first coupling residual block, which are sequentially connected;
the first decoder and the second decoder respectively comprise a second coupling residual block, a first strided convolution block and a second strided convolution block which are connected in sequence;
the output end of the first coupling residual error block is connected to the input end of the shared potential space, and the output end of the shared potential space is connected to the input end of the second coupling residual error block;
the output end of the first convolution block is connected to the input end of the second strided convolution block through a skip connection;
and the output end of the second convolution block is connected to the input end of the first strided convolution block through a skip connection.
Optionally, the first coupling residual block and the second coupling residual block are respectively formed by cascading a plurality of sub-residual blocks;
and the sub-residual block at any stage comprises a first convolution layer, an activation function layer and a second convolution layer which are connected in sequence, and is used for processing the output result of the sub-residual block at the previous stage and outputting the processing result to the sub-residual block at the next stage and the sub-residual block at the stage after next.
Optionally, the first discriminator and the second discriminator each comprise three layers of discrimination networks of different granularities, and each layer of discrimination network comprises three convolutional layers and one activation function layer.
Optionally, the bidirectional GAN network model is optimized by the following loss functions: a generative adversarial loss function, an MSE loss function, and a total variation loss function.
The second aspect of the application discloses a picture defogging and identification method, which comprises the following steps:
extracting depth information corresponding to different pixels in the picture to be defogged;
according to the depth information, performing depth processing on the picture to be defogged;
inputting the depth-processed picture to be defogged into a pre-constructed ultra-lightweight picture defogging and identification network model;
and acquiring the defogging and identification results of the image output by the ultra-lightweight image defogging and identification network model.
Optionally, the extracting depth information corresponding to different pixels in the picture to be defogged includes:
carrying out format conversion on the picture to be defogged, and extracting the brightness and the saturation of the picture to be defogged;
and generating depth information corresponding to different pixels in the picture to be defogged according to the brightness and the saturation.
A third aspect of the present application discloses a computer device comprising:
a memory for storing a computer program;
a processor for implementing the steps of the image defogging and identification method according to the second aspect of the present application when the computer program is executed.
A fourth aspect of the present application discloses a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the picture defogging and identification method according to the second aspect of the present application.
The application discloses an ultra-lightweight picture defogging and identification network model through which defogging and identification of pictures are realized; the network model is composed of a bidirectional GAN network model and a target detection network model which are connected in sequence. The bidirectional GAN network model is used for processing an input picture to be defogged and outputting a clear picture, and the target detection network model is used for performing feature recognition on the clear picture. The target detection network model is a Yolo-Tiny-S network model that has undergone pruning retraining: in the pruning retraining process, the original images in the training set of the target detection network model are trained multiple times, the original images are down-sampled by a preset factor before each training, the scaling coefficients of the batch normalization layer are sorted and compared after each training, and the previous-layer convolution kernels corresponding to channels whose scaling coefficients are smaller than a preset scaling threshold are removed to realize pruning. The target detection network model disclosed by the application is pruned further on the basis of the existing tiny recognition model Yolov3-tiny, which greatly reduces the scale of the ultra-lightweight picture defogging and identification network model, so that it can conveniently be deployed in chips of mobile terminals such as end-side platforms with limited computing and power consumption resources or vehicle-mounted cameras to complete high-speed target detection, making the method more practical.
Drawings
In order to more clearly explain the technical solution of the present application, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious to those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic structural diagram of an ultra-lightweight image defogging and identification network model disclosed in an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a target detection network model disclosed in an embodiment of the present application;
FIG. 3 is a schematic diagram of another structure of a target detection network model disclosed in the embodiments of the present application;
FIG. 4 is a schematic structural diagram of a conventional defogging convolutional neural network;
fig. 5 is a schematic structural diagram of a bidirectional GAN network model disclosed in an embodiment of the present application;
fig. 6 is a schematic structural diagram of another bidirectional GAN network model disclosed in the embodiment of the present application;
fig. 7 is a schematic diagram of a shared potential space in a bidirectional GAN network model disclosed in an embodiment of the present application;
fig. 8 is a schematic structural diagram of a first generating unit and a second generating unit in a bidirectional GAN network model disclosed in an embodiment of the present application;
fig. 9 is a schematic structural diagram of a first coupling residual block and a second coupling residual block in a bidirectional GAN network model disclosed in an embodiment of the present application;
fig. 10 is a schematic structural diagram of a first discriminator and a second discriminator in a bidirectional GAN network model disclosed in an embodiment of the present application;
fig. 11 is a schematic diagram of computing cross-domain conversion consistency for a bidirectional GAN network model according to an embodiment of the present application;
fig. 12 is a schematic application diagram of a bidirectional GAN network model disclosed in the embodiment of the present application;
fig. 13 is a schematic view of a workflow of a method for defogging and identification of an image according to an embodiment of the present application.
Detailed Description
In order to solve the problem that the existing YOLO target detection model is large, so that a neural network model obtained by applying it to image defogging and identification is not suitable for deployment on end-side platforms with limited computing power and power consumption resources, the application discloses an ultra-lightweight image defogging and identification network model and an image defogging and identification method through the following embodiments.
Referring to fig. 1, a network model for defogging and recognition of an ultra-lightweight image disclosed in the first embodiment of the present application includes: the system comprises a bidirectional GAN network model and a target detection network model which are connected in sequence.
The bidirectional GAN network model is used for processing the input picture to be defogged and outputting a clear picture, namely the defogged picture.
The target detection network model is used for performing feature recognition on the clear image. The target detection network model is a Yolo-Tiny-S network model which has undergone pruning retraining. In the pruning retraining process, the original images in the training set of the target detection network model are trained multiple times, the original images are down-sampled by a preset factor before each training, the scaling coefficients of the batch normalization layer are sorted and compared after each training, and the previous-layer convolution kernels corresponding to channels whose scaling coefficients are smaller than a preset scaling threshold are removed to realize pruning.
In this embodiment, in order to enable the target detection network model to better complete rapid detection, the size of the model is compressed and the size of the input picture is reduced, but the convergence performance of the model needs to be ensured in the compression process, so a pruning retraining method is adopted to train the model, and the specific process includes:
Firstly, the original images in the training set are trained directly. The original images have a relatively high resolution; data augmentation is used in the training process, and training pictures of different sizes are produced with a relatively large scaling ratio.
Secondly, the scaling coefficients of the batch normalization layer after training are sorted and compared, and the channels with smaller scaling coefficients are pruned according to the compression requirement. Specifically, the previous-layer convolution kernels corresponding to the channels with smaller scaling coefficients are removed, and only the convolution kernels corresponding to the channels with larger scaling coefficients in the batch normalization layer are retained.
Thirdly, the original images are down-sampled by a factor of two, and the first and second steps are repeated for another round of training. The scaling ratio used for image augmentation of the original images is reduced in this round, which avoids the resolution of certain objects becoming too low during small-image scaling and affecting the convergence of the model. It should be noted that, when sorting and comparing the pruning channels of the batch normalization layer in this round, the proportion of pruned channels is increased so that the channels are reduced further.
Fourthly, the original images are down-sampled by a higher factor, and the first and second steps are repeated for retraining until the final amount of computation and the input image size meet the resource and speed requirements. For example, the input resolution of the original images is reduced from 512 x 512 to 128 x 128, and the number of model channels is reduced to about half of that of the currently common ultra-lightweight target detection model Yolov3-tiny.
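As an illustration of the channel selection used in the second step, the following sketch, assuming a PyTorch-style model, globally sorts the batch normalization scaling coefficients and marks the channels below the threshold for removal; the function name, the prune_ratio parameter and the mask representation are illustrative assumptions rather than details specified in this application.

```python
import torch
import torch.nn as nn

def select_channels_to_prune(model: nn.Module, prune_ratio: float):
    """Globally sort the BN scaling coefficients (gamma) and mark the smallest for pruning."""
    # Gather the absolute scaling coefficients of every batch normalization layer.
    gammas = torch.cat([m.weight.data.abs().flatten()
                        for m in model.modules()
                        if isinstance(m, nn.BatchNorm2d)])
    # Channels whose gamma falls below this threshold are pruned, i.e. the
    # previous-layer convolution kernels feeding those channels are removed.
    threshold = torch.sort(gammas).values[int(prune_ratio * gammas.numel())]

    keep_masks = {}
    for name, m in model.named_modules():
        if isinstance(m, nn.BatchNorm2d):
            keep_masks[name] = m.weight.data.abs() > threshold  # channels to keep
    return threshold, keep_masks
```

In the iterative procedure described above, the retained channels would then be copied into a smaller Yolo-Tiny-S model before the next round of retraining on the further down-sampled images.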
When the model converges to a better precision, the trained ultra-lightweight target detection model Yolo-Tiny-S is connected to an output port of the bidirectional GAN network model, so that a defogging and identification integrated model, namely an ultra-lightweight end-to-end image defogging and identification network model can be formed, and the existing real fog image can be directly identified in an end-to-end mode through the model.
This embodiment provides, for the first time, an ultra-lightweight target detection model Yolo-Tiny-S (a micro-miniature Yolo detection model) and completes target detection in combination with the bidirectional GAN network model. The ultra-lightweight target detection model is obtained from the lightweight target detection model Yolov3-Tiny by iterative, high-proportion channel pruning training based on the batch normalization layer, using small input images.
Referring to fig. 2, the target detection network model includes a direct feature processing module and a fused feature processing module. The direct feature processing module includes a front extraction unit, a middle extraction unit and a first rear extraction unit which are connected in sequence, and the clear image is input into the target detection network model through the front extraction unit. The fused feature processing module comprises a feature fusion splicing unit and a second rear extraction unit which are connected in sequence, and the output ends of the front extraction unit and the middle extraction unit are connected to the input end of the feature fusion splicing unit.
Referring to fig. 3, the feature fusion splicing unit includes an up-sampling subunit and a feature splicing subunit, an input end of the up-sampling subunit is connected to an output end of the middle extraction unit, and an output end of the up-sampling subunit and an output end of the front extraction unit are connected to an input end of the feature splicing subunit.
The front extraction unit comprises a DBL-S combination subunit and a plurality of MDBL-S combination subunits which are sequentially connected. In this embodiment, the front extraction unit includes a DBL-S combination subunit and four MDBL-S combination subunits connected in sequence.
The middle extraction unit comprises a plurality of MDBL-S combination subunits and one DBL-S combination subunit which are sequentially connected. In this embodiment, the middle extraction unit includes two MDBL-S combination sub-units and one DBL-S combination sub-unit connected in sequence.
The first rear extraction unit and the second rear extraction unit both comprise one DBL-S combination subunit and one convolution (Conv) subunit which are connected in sequence.
The DBL-S combined subunit consists of a DarkNet convolution layer, a batch normalization (BN) layer and a leaky rectified linear (Leaky ReLU) layer which are connected in sequence. Here, S denotes "slim", indicating that the corresponding module has been reduced, and the DarkNet convolution layer is a DarkNet convolution whose channels have been reduced.
The MDBL-S combined subunit consists of a maximum pooling layer (Maxpool layer) and the DBL-S combined subunit which are connected in sequence.
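A minimal sketch of these two combined subunits, assuming a PyTorch implementation, is given below; the channel counts, kernel sizes and the Leaky ReLU slope are illustrative assumptions rather than values specified in this embodiment.

```python
import torch.nn as nn

class DBL_S(nn.Module):
    """DarkNet convolution layer + batch normalization layer + Leaky ReLU layer (slimmed channels)."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size, stride,
                      padding=kernel_size // 2, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.1, inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class MDBL_S(nn.Module):
    """Maximum pooling layer followed by a DBL-S combined subunit."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.dbl = DBL_S(in_ch, out_ch)

    def forward(self, x):
        return self.dbl(self.pool(x))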
With reference to fig. 2 and 3, the defogged clear picture is taken as input, feature extraction is completed through the combination of the front extraction unit and the middle extraction unit, and two feature maps of different sizes are then taken from the middle and rear parts of the network respectively for prediction. Specifically, each grid cell on a feature map predicts 3 prediction boxes. Each prediction box requires five basic parameters (x, y, w, h, confidence), namely the predicted x position, predicted y position, predicted box width, predicted box height and prediction confidence, which represent the center position, width and height of the prediction box and the confidence that an object exists in the box, and one probability is output for each category. The feature map taken from the rear part of the network is then up-sampled by the feature splicing subunit and concatenated with the feature map from the middle part of the network (namely the output of the front extraction unit) to form a new fused feature map. The feature map generated directly at the rear of the network (namely the output of the middle extraction unit) and the new fused feature map are fed into the first rear extraction unit and the second rear extraction unit respectively to generate the final predicted classes and regression boxes, which completes the recognition of the defogged clear image. In fig. 3, y1 and y2 denote two branches with the same function; each contains the information of the predicted classification result and the regression box, where the predicted classification result indicates which kind of object is recognized in the image and the regression box marks the position of the detected object.
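The two prediction branches y1 and y2 and the feature fusion by up-sampling and concatenation can be sketched as follows, reusing the DBL_S block from the previous sketch; the channel counts, anchor number and class count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TwoBranchHead(nn.Module):
    """Sketch of the y1/y2 prediction branches and the feature fusion by up-sampling."""
    def __init__(self, mid_ch=128, rear_ch=256, num_classes=20, num_anchors=3):
        super().__init__()
        out_ch = num_anchors * (5 + num_classes)  # 3 boxes x ((x, y, w, h, confidence) + class probs)
        # First rear extraction unit: DBL-S + convolution, applied to the rear feature map.
        self.rear_extract_1 = nn.Sequential(DBL_S(rear_ch, rear_ch), nn.Conv2d(rear_ch, out_ch, 1))
        self.upsample = nn.Upsample(scale_factor=2, mode="nearest")
        # Second rear extraction unit, applied to the fused feature map.
        self.rear_extract_2 = nn.Sequential(DBL_S(rear_ch + mid_ch, mid_ch), nn.Conv2d(mid_ch, out_ch, 1))

    def forward(self, mid_feat, rear_feat):
        y1 = self.rear_extract_1(rear_feat)                              # direct feature branch
        fused = torch.cat([self.upsample(rear_feat), mid_feat], dim=1)   # feature splicing subunit
        y2 = self.rear_extract_2(fused)                                  # fused feature branch
        return y1, y2
```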
For image defogging, a commonly used defogging convolutional neural network at present is the Generative Adversarial Network (GAN). This network model adopts an unsupervised learning method and comprises two modules, a generation module and a discrimination module, whose mutual adversarial (game) learning produces output with high accuracy. The generation module includes an encoder for extracting feature vectors from the input picture and a decoder for recovering low-level features from the feature vectors, as shown in fig. 4.
However, in the prior art, the GAN is only used for the unidirectional conversion from the fog image to the clear image, so it only contains the unidirectional mapping relationship from the fog domain to the clear domain. After the fog image is processed by the encoder and the decoder, the output clear image usually has a halo effect and artificial defects, and the information of the original image is not well retained.
The present embodiment relates to a bidirectional GAN network model, which includes an input module, a generation module, and a discrimination module, referring to fig. 5.
The input module comprises a clear image input port and a fog image input port, the generation module comprises a first generation unit and a second generation unit, and the discrimination module comprises a first discriminator and a second discriminator.
The sharp image input port, the first generation unit and the first discriminator are sequentially connected and used for performing feature extraction and reconstruction on a sharp image, and the fog image input port, the second generation unit and the second discriminator are sequentially connected and used for performing feature extraction and reconstruction on a fog image.
The bidirectional GAN network model shown in FIG. 5 can defog a fog image and can also add fog to a clear image in the reverse direction, which enhances the consistency of changes across the two domains and thereby better constrains the cross-domain conversion, so that in actual application the defogged image is more natural and the defogging effect on real fog images is enhanced. In the existing defogging technology, only a GAN performing the one-way conversion from the fog image to the clear image is used, which can only output the mapping relationship from the fog domain to the clear domain, so the output image has a halo effect and artificial defects, and the information of the original image is not well retained in the end.
Referring to fig. 6, in the bidirectional GAN network model designed in this embodiment, the first generating unit includes a first encoder, a shared potential space, and a first decoder that are sequentially connected, and the second generating unit includes a second encoder, the shared potential space, and a second decoder that are sequentially connected.
The shared potential space is used for storing high-level features, and outputting the high-level features to the first decoder and the second decoder, wherein the high-level features comprise high-level features extracted by the first encoder for a sharp image and high-level features extracted by the second encoder for a fog image.
The training set and the verification set of the bidirectional GAN network model both comprise paired clear graphs and fog graphs.
In this embodiment, it is assumed that the fog picture and the clear picture share one potential (latent) space, through which the feature connection from the fog picture to the clear picture can be completed. Referring to fig. 7, a clear picture X and a fog picture Y are connected by the shared potential space Z, from which the images in both domains can be recovered. Based on this theoretical basis, the bidirectional GAN network model disclosed in this embodiment can handle both real fog images and synthesized fog images well, and restores and reconstructs the fog image with the bidirectional generative adversarial network by first encoding and then decoding. The bidirectional GAN network model is trained and verified with paired clear images and fog images, contains the bidirectional mapping relationship between the fog domain and the clear domain, can process pictures in different domains, and effectively ensures the authenticity of picture reconstruction.
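The way the two generation units are joined through one shared potential space can be sketched as follows; the module names and the way the encoders, shared latent block and decoders are passed in are illustrative assumptions, not a literal specification of this embodiment.

```python
import torch.nn as nn

class BidirectionalGenerator(nn.Module):
    """Two generation units joined through one shared potential (latent) space."""
    def __init__(self, enc_clear, enc_fog, shared_latent, dec_clear, dec_fog):
        super().__init__()
        self.enc_clear, self.enc_fog = enc_clear, enc_fog   # first / second encoder
        self.shared_latent = shared_latent                  # shared potential space
        self.dec_clear, self.dec_fog = dec_clear, dec_fog   # first / second decoder

    def fog_to_clear(self, fog_img):
        # Defogging direction: fog picture -> potential space -> clear picture.
        return self.dec_clear(self.shared_latent(self.enc_fog(fog_img)))

    def clear_to_fog(self, clear_img):
        # Reverse fogging direction: clear picture -> potential space -> fog picture.
        return self.dec_fog(self.shared_latent(self.enc_clear(clear_img)))
```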
This embodiment discloses a bidirectional GAN network model used as a defogging network that effectively handles real dense fog. The network has a novel and effective structure, can extract depth information of fog images at different depths, has great advantages for distant and dense fog, and can greatly improve the accuracy of subsequent object recognition. In previous methods, real fog images could not be used for direct training due to the limitations of the defogging model, so the corresponding defogging results could not adapt well to detection in real fog scenes. The bidirectional GAN network model can be trained end-to-end with real images, so the information of real fog images can be better exploited to improve detection in real fog scenes. Through joint training on real and synthetic data sets, fog information in real scenes is better utilized, the difficulty of network training caused by the lack of paired training data is avoided, and learning from real information allows the subsequent target detection network model to achieve a good generalization effect in real scenes.
Further, referring to fig. 8, the first encoder and the second encoder respectively include a first convolution block, a second convolution block and a first coupling residual block, which are sequentially connected.
The first decoder and the second decoder respectively comprise a second coupling residual block, a first strided convolution block and a second strided convolution block which are connected in sequence.
The output end of the first coupling residual error block is connected to the input end of the shared potential space, and the output end of the shared potential space is connected to the input end of the second coupling residual error block.
The output end of the first convolution block is connected to the input end of the second strided convolution block through a skip connection.
The output end of the second convolution block is connected to the input end of the first strided convolution block through a skip connection.
The first convolution block and the second convolution block are used for extracting high-level features of the input picture, and the first coupling residual block and the second coupling residual block are used for learning the detail information of different features of the picture, which facilitates the restoration and reconstruction of the picture. The first and second strided convolution blocks complete the reconstruction from the high-level features to the output picture (a clear image or a fog image). The encoder and the decoder are tightly connected by skip connections. After each convolution block of the encoder, a skip connection links the output feature map of that convolution block to the input of the corresponding decoder block, before the corresponding strided convolution block, and concatenates the feature maps along the channel dimension, so that the feature maps output by the decoder include the feature information of the original image. The skip connections allow the detail textures of the picture to be learned better, and connecting the two skip connections at a high and a low dimension allows different positional information of the picture to be processed separately, further ensuring the consistency of the output picture with the input picture and the authenticity of the reconstruction.
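A minimal sketch of one encoder/decoder pair with the two skip connections described above is given below, assuming a PyTorch implementation; the channel counts, the use of transposed convolutions for the strided convolution blocks, and the residual/latent modules passed in are illustrative assumptions.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # A down-sampling convolution block of the encoder (stride-2 convolution + activation).
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True))

def strided_conv_block(in_ch, out_ch):
    # A strided convolution block of the decoder, sketched here as a transposed convolution.
    return nn.Sequential(nn.ConvTranspose2d(in_ch, out_ch, 4, stride=2, padding=1), nn.ReLU(inplace=True))

class SkipEncoderDecoder(nn.Module):
    """One encoder/decoder pair with the two skip connections described above."""
    def __init__(self, res_block_enc, res_block_dec, shared_latent, base_ch=64):
        super().__init__()
        self.conv1 = conv_block(3, base_ch)              # first convolution block
        self.conv2 = conv_block(base_ch, base_ch * 2)    # second convolution block
        self.res_enc = res_block_enc                     # first coupling residual block
        self.shared_latent = shared_latent               # shared potential space
        self.res_dec = res_block_dec                     # second coupling residual block
        # Skip connections concatenate encoder feature maps with the decoder input.
        self.up1 = strided_conv_block(base_ch * 2 + base_ch * 2, base_ch)  # first strided conv block
        self.up2 = strided_conv_block(base_ch + base_ch, 3)                # second strided conv block

    def forward(self, x):
        f1 = self.conv1(x)
        f2 = self.conv2(f1)
        z = self.shared_latent(self.res_enc(f2))
        d = self.res_dec(z)
        d = self.up1(torch.cat([d, f2], dim=1))     # skip from the second convolution block
        return self.up2(torch.cat([d, f1], dim=1))  # skip from the first convolution block
```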
Further, referring to fig. 9, the first coupling residual block and the second coupling residual block are respectively formed by cascading a plurality of sub-residual blocks.
The sub-residual block at any stage comprises a first convolution layer, an activation function layer and a second convolution layer which are connected in sequence, and is used for processing the output result of the sub-residual block at the previous stage and outputting the processing result to the sub-residual block at the next stage and the sub-residual block at the stage after next.
In this embodiment, each sub-residual block is composed of two convolution layers and an activation function layer, and the outputs and inputs of a plurality of sub-residual blocks (three are shown in fig. 9) are cascaded with each other. Each sub-residual block processes the input of the current stage and the low-dimensional result of the previous stage to extract feature information of different dimensions, and its calculation result is sent to the next sub-residual block and the one after it. Through this close connection of multiple sub-residual blocks, residual information of fog features with different dimensions can be learned.
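The sub-residual block and its cascade can be sketched as follows; the channel count, the residual addition inside each block, and the exact way the two most recent results are coupled are illustrative assumptions, since the precise coupling is one possible reading of the description above.

```python
import torch.nn as nn

class SubResidualBlock(nn.Module):
    """Convolution layer -> activation function layer -> convolution layer, with a residual addition."""
    def __init__(self, channels=128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class CoupledResidualBlock(nn.Module):
    """Cascade of sub-residual blocks in which each stage also receives the result of the
    stage before the previous one (one possible reading of the coupling described above)."""
    def __init__(self, channels=128, num_blocks=3):
        super().__init__()
        self.blocks = nn.ModuleList(SubResidualBlock(channels) for _ in range(num_blocks))

    def forward(self, x):
        results = [x]
        for block in self.blocks:
            # Combine the two most recent results as the coupled input of this stage.
            coupled = results[-1] if len(results) < 2 else results[-1] + results[-2]
            results.append(block(coupled))
        return results[-1]
```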
The generated picture obtained through the encoding and decoding of the encoder and decoder is input to the discrimination module for judgment. Referring to fig. 10, the first discriminator and the second discriminator each comprise three layers of discrimination networks, and each layer of discrimination network comprises three convolutional layers and one activation function layer. Each layer of the discrimination network down-samples the input of the previous stage; the output results of different granularities from the three layers are compared with the real feature maps to judge whether the network output meets expectations, a penalty is applied to training results that do not meet expectations, the discrimination loss is calculated, and the penalty is propagated back to the network to update its parameters. By adopting a multi-stage discrimination network, the output generated by the bidirectional GAN network model is sampled and compared at three feature levels, from coarse granularity to fine granularity, which increases the discrimination accuracy of the discrimination module.
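The multi-granularity discriminator described above can be sketched as follows; the channel sizes, the choice of a stride-2 convolution for down-sampling and the Leaky ReLU slope are illustrative assumptions.

```python
import torch.nn as nn

class DiscriminationLayer(nn.Module):
    """One layer of the discrimination network: three convolutional layers and one activation
    function layer, down-sampling the input of the previous stage by a factor of two."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.Conv2d(out_ch, out_ch, 4, stride=2, padding=1),  # stride-2 convolution down-samples
            nn.LeakyReLU(0.2, inplace=True),
        )

    def forward(self, x):
        return self.net(x)

class MultiGranularityDiscriminator(nn.Module):
    """Three cascaded discrimination layers; every layer's output is kept so that results of
    coarse and fine granularity can be compared with the real feature maps."""
    def __init__(self, in_ch=3, base_ch=64):
        super().__init__()
        self.layers = nn.ModuleList([
            DiscriminationLayer(in_ch, base_ch),
            DiscriminationLayer(base_ch, base_ch * 2),
            DiscriminationLayer(base_ch * 2, base_ch * 4),
        ])

    def forward(self, x):
        outputs = []
        for layer in self.layers:
            x = layer(x)
            outputs.append(x)
        return outputs
```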
For the bidirectional GAN network model, this embodiment adopts loss functions that help the model perform better; the loss functions used for optimization include a generative adversarial loss function, an MSE loss function and a total variation loss function.
In order to achieve a good training effect, the loss function of the network needs to include the generative adversarial loss, which is derived from whether a picture comes from the generator G(I) or from the real data J. The classical generative adversarial loss is defined as follows:
LGAN(G, D) = EJ(log D(J)) + EI(log(1 - D(G(I))));
With reference to fig. 11, the bidirectional GAN network model generates the adversarial loss in both directions, and the cross-domain conversion consistency is calculated as follows:
Ladv=LGAN(Gc(Ic),Dish)+LGAN(Gh(Ih),Disc);
the LGAN () represents the classical confrontation generation loss, Ih represents the input fog pattern X, Ic represents the input clear pattern Y, Gc () represents the corresponding generation network in the clear scene, Gh () represents the corresponding generation network in the fog scene, Disc represents the discrimination module in the clear scene, and Disc represents the discrimination module in the fog scene. By sharing a potential space theory and combining bidirectional generation with resistance loss, authenticity of a reconstructed picture and similarity of results of cross-domain picture conversion can be guaranteed.
In the training on synthetic fog pictures, an MSE loss function is adopted to ensure that the predicted pictures G(I) are similar to the real (ground truth) picture J. The MSE loss function is shown below:
LMSE=||Gc(Ic)-Jh||1+||Gh(Ih)-Jc||1
where Jh and Jc respectively denote the fog image data and the clear image data in the real scene, and the loss is defined as the pixel difference ||·||1 between the generated fog (clear) picture and the real fog (clear) picture, which measures the error between corresponding pixel points of the pictures.
In the training on fog images, a total variation loss function is adopted to eliminate the associated artificial defects, so that the visual effect of the picture is better and the texture and details of the picture are preserved. The total variation loss function is as follows:
LTV = ||∂hG(I)||1 + ||∂vG(I)||1;
where ∂h and ∂v denote the differences along the horizontal and vertical gradient directions.
Through the above training, when the model converges to a good precision, the obtained training weights can be stored and the final bidirectional GAN defogging network model is obtained; this model can defog an existing fog image in an end-to-end manner.
Referring to fig. 12, when the bidirectional GAN network model disclosed in this embodiment is deployed in an actual application scenario, the defogging process includes the following steps:
1. and inputting a real fog picture in a real scene.
2. Preprocessing the picture, converting the picture from an RGB space to an HSV space, and outputting the depth information of the picture by combining with known parameters.
3. Combining the depth information, the fog image is processed and its encoding is stored into the shared potential space, and the pictures in the shared potential space are decoded to generate a clear defogged picture in the real scene.
In order to enhance the consistency of cross-domain conversion and obtain more realistic defogged pictures, the existing unidirectional GAN network is improved. Based on the same shared potential space, two unidirectional GAN networks, clear image → potential space → fog image and fog image → potential space → clear image, are fused into one module through two groups of encoders and decoders, forming a bidirectional GAN network model capable of generating fog/clear images. The cross-domain conversion losses of fog image → clear image → fog image and clear image → fog image → clear image are calculated through the corresponding discriminators, so that the distributions of the fog and clear images can be learned better.
The bidirectional GAN network model disclosed in this embodiment proposes a bidirectional defogging network that encodes first and then decodes, based on the assumption of a shared potential space. This network can process pictures in different domains, adapts well to picture conversion across domains, can process fog information at different depths, and has strong advantages for distant and dense fog. Skip connections, coupled residual blocks and a multi-layer discrimination structure are applied, which greatly improves the robustness of the network and allows the image distribution of defogged clear images to be fitted more accurately. Real fog information can be used, depth information can be extracted through preprocessing, fog at great depth and concentration can be handled conveniently, and good performance can be achieved in real scenes.
The bidirectional GAN network model disclosed in the above embodiment is utilized to perform defogging processing on a picture, and in practical application, the depth information corresponding to different pixels in the to-be-defogged picture is extracted to perform depth processing on the to-be-defogged picture; and inputting the deeply processed picture to be defogged into a pre-constructed bidirectional GAN network model so as to obtain the defogged picture output by the bidirectional GAN network model.
The application discloses an ultra-lightweight picture defogging and identification network model through which defogging and identification of pictures are realized; the network model is composed of a bidirectional GAN network model and a target detection network model which are connected in sequence. The target detection network model is pruned further on the basis of the existing tiny recognition model Yolov3-tiny, which greatly reduces the scale of the ultra-lightweight picture defogging and identification network model, so that it can conveniently be deployed in chips of mobile terminals such as end-side platforms with limited computing and power consumption resources or vehicle-mounted cameras to complete high-speed target detection, giving the method higher practicability.
The ultra-lightweight picture defogging and identification network model comprises a defogging module and an identification module. The defogging module adopts the bidirectional GAN network model, and the identification module adopts Yolo-Tiny-S as the target detection network model. The two modules can be connected directly after being trained independently, so as to perform fog-scene target detection with a higher recognition rate and higher speed, avoiding the huge manpower and time consumed by labelling a real fog-image data set for the target detection task during joint training. Because the defogging module can learn the real distribution of fog by combining the depth information of real fog images, after the preceding defogging module is connected to the object recognition network, joint training on a target detection data set of real fog images (which is currently lacking) is not needed; the separately trained defogging module and identification module can be connected directly, forming an integrated fog-image recognition system for real scenes.
Referring to fig. 13, a second embodiment of the present application discloses a method for defogging and identification of an image, including:
step S11, extracting depth information corresponding to different pixels in the picture to be defogged.
And step S12, performing depth processing on the picture to be defogged according to the depth information.
Specifically, format conversion is carried out on the picture to be defogged: the picture is converted from the RGB format to the HSV format based on OpenCV, so that the brightness v and the saturation s of the picture to be defogged are extracted. OpenCV is a cross-platform computer vision and machine learning software library distributed under a BSD (open source) license. In shadow detection algorithms, an image in RGB format is often converted into HSV format: for a shadow region, the chromaticity and saturation change little relative to the original image, while the luminance information changes greatly; the H, S and V components obtained by converting from RGB to HSV give the values of chromaticity, saturation and luminance.
And generating depth information corresponding to different pixels in the picture to be defogged according to the brightness v and the saturation s.
In actual operation, according to the brightness v and the saturation s, the depth information d corresponding to different pixels in the picture to be defogged is obtained through an existing depth information formula as a linear calculation with the known parameters θ0, θ1 and θ2.
The depth information calculation formula is as follows:
d(X)=θ0+θ1v(X)+θ2s(X)+ε(X);
where X denotes the picture to be defogged, and ε(X) is a random variable representing the random error of the model; ε can be taken as a random map.
The input original image to be defogged is depth-processed using this depth information, and the appearance of deep areas is strengthened; for example, the brightness and contrast are enhanced for distant regions, so that the distant view becomes clearer.
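As a sketch of the conversion and linear depth estimate above, the following OpenCV-based illustration computes d per pixel; the parameter values θ0, θ1, θ2 are not specified in this application, the input is assumed to be a BGR image as read by cv2.imread, and the optional random-error term is an illustrative assumption.

```python
import cv2
import numpy as np

def estimate_depth(image_bgr, theta0, theta1, theta2, eps_sigma=0.0):
    """Per-pixel depth d(X) = theta0 + theta1*v(X) + theta2*s(X) + eps(X), as in the formula above."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV).astype(np.float32) / 255.0
    s = hsv[:, :, 1]   # saturation
    v = hsv[:, :, 2]   # brightness (value)
    # eps(X): optional random-error term; set eps_sigma to 0 for a deterministic estimate.
    eps = np.random.normal(0.0, eps_sigma, v.shape) if eps_sigma > 0 else 0.0
    return theta0 + theta1 * v + theta2 * s + eps
```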
And step S13, inputting the depth-processed picture to be defogged into the pre-constructed ultra-lightweight picture defogging and identification network model.
And step S14, acquiring the defogging and identification results of the image output by the ultra-lightweight image defogging and identification network model.
The ultra-lightweight picture defogging and identification network model comprises a defogging module and an identification module. The bidirectional GAN network model of the defogging module can extract the depth information of real fog images well and process synthesized fog images; it maps the picture into the shared potential space by first encoding and then decoding, and then reconstructs it. The reconstructed clear image is sent to the identification module, and the Yolo-Tiny-S model of the identification module can effectively identify objects, thereby realizing object identification after defogging.
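The overall pipeline of steps S11 to S14 can be sketched as follows; preprocess_fn, defog_model and detect_model are placeholders for the depth preprocessing, the trained bidirectional GAN network model and the trained Yolo-Tiny-S model, and the two models are assumed to be trained independently and simply chained.

```python
import torch

def defog_and_detect(fog_image, preprocess_fn, defog_model, detect_model):
    """End-to-end sketch: depth preprocessing -> bidirectional GAN defogging -> Yolo-Tiny-S detection."""
    preprocessed = preprocess_fn(fog_image)         # steps S11-S12: depth extraction and depth processing
    with torch.no_grad():
        clear_image = defog_model(preprocessed)     # defogging module outputs a clear picture
        detections = detect_model(clear_image)      # identification module outputs classes and boxes
    return clear_image, detections
```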
The third embodiment of the present application discloses a computer device, including:
a memory for storing a computer program.
A processor, configured to implement the steps of the image defogging and identification method according to the second embodiment of the present application when the computer program is executed.
A fourth embodiment of the present application discloses a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the image defogging and identification method according to the second embodiment of the present application.
Aiming at the problem that the recognition accuracy of fog images in object recognition tasks is low, the application provides through the above embodiments, for the first time, a network model that integrates image defogging and image recognition for a micro-miniature object recognition network. By improving the training model of the end-to-end defogging network, a network model for defogging and identification is provided; it is trained on a mixture of real pictures and synthetic pictures and can extract the depth information of pictures for analysis, so that the end-to-end defogging network can correctly analyze the depth details and dense fog information of a picture, the picture can be defogged better in dense fog scenes, and the defogged result maintains a good effect in the subsequent tiny target detection network.
The present application has been described in detail with reference to specific embodiments and illustrative examples, but the description is not intended to limit the application. Those skilled in the art will appreciate that various equivalent substitutions, modifications or improvements may be made to the presently disclosed embodiments and implementations thereof without departing from the spirit and scope of the present disclosure, and these fall within the scope of the present disclosure. The protection scope of this application is subject to the appended claims.

Claims (10)

1. An ultra-lightweight picture defogging and identification network model, characterized by comprising: a bidirectional GAN network model and a target detection network model which are connected in sequence;
the bidirectional GAN network model is used for processing an input picture to be defogged and outputting a clear picture, and the target detection network model is used for carrying out feature recognition processing on the clear picture;
the target detection network model is a Yolo-Tiny-S network model which has undergone pruning retraining; in the pruning retraining process, the original images in the training set of the target detection network model are trained multiple times, the original images are down-sampled by a preset factor before each training, the scaling coefficients of the batch normalization layer are sorted and compared after each training is finished, and the previous-layer convolution kernels corresponding to channels whose scaling coefficients are smaller than a preset scaling threshold are removed to realize pruning;
the target detection network model comprises a direct feature processing module and a fused feature processing module, the direct feature processing module comprises a front extraction unit, a middle extraction unit and a first rear extraction unit which are connected in sequence, the clear image is input into the target detection network model through the front extraction unit, the fused feature processing module comprises a feature fusion splicing unit and a second rear extraction unit which are connected in sequence, and the output end of the front extraction unit and the output end of the middle extraction unit are connected to the input end of the feature fusion splicing unit.
2. The ultra-lightweight image defogging and identification network model according to claim 1,
the front extraction unit comprises a DBL-S combination subunit and a plurality of MDBL-S combination subunits which are sequentially connected;
the middle extraction unit comprises a plurality of MDBL-S combination subunits and a DBL-S combination subunit which are sequentially connected;
the first rear extraction unit and the second rear extraction unit respectively comprise one DBL-S combination subunit and one convolution subunit which are sequentially connected;
the feature fusion splicing unit comprises an up-sampling subunit and a feature splicing subunit, wherein the input end of the up-sampling subunit is connected with the output end of the middle extraction unit, and the output ends of the up-sampling subunit and the front extraction unit are connected to the input end of the feature splicing subunit;
the DBL-S combination subunit consists of a Darknet convolution layer, a batch normalization layer and a Leaky ReLU (leaky rectified linear unit) layer connected in sequence;
the MDBL-S combination subunit consists of a maximum pooling layer and the DBL-S combination subunit connected in sequence.
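For illustration only, a minimal sketch (not part of the claims) of the DBL-S and MDBL-S combination subunits recited in claim 2, assuming a PyTorch implementation; the kernel sizes, strides and the Leaky ReLU slope are assumptions.

import torch.nn as nn

class DBLS(nn.Sequential):
    """Darknet convolution + batch normalization + Leaky ReLU."""
    def __init__(self, in_ch, out_ch, k=3, s=1):
        super().__init__(
            nn.Conv2d(in_ch, out_ch, k, s, padding=k // 2, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.1, inplace=True),
        )

class MDBLS(nn.Sequential):
    """Maximum pooling followed by a DBL-S combination subunit."""
    def __init__(self, in_ch, out_ch):
        super().__init__(nn.MaxPool2d(2, 2), DBLS(in_ch, out_ch))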
3. The ultra-lightweight picture defogging and identification network model according to claim 1, wherein the bidirectional GAN network model comprises an input module, a generation module and a discrimination module;
the input module comprises a clear image input port and a fog image input port, the generation module comprises a first generation unit and a second generation unit, and the judgment module comprises a first discriminator and a second discriminator;
the clear image input port, the first generation unit and the first discriminator are sequentially connected and used for carrying out feature extraction and reconstruction on the clear image, and the fog image input port, the second generation unit and the second discriminator are sequentially connected and used for carrying out feature extraction and reconstruction on the fog image;
the first generating unit comprises a first encoder, a shared potential space and a first decoder which are connected in sequence, and the second generating unit comprises a second encoder, the shared potential space and a second decoder which are connected in sequence;
the shared potential space is used for storing high-level features and outputting the high-level features to the first decoder and the second decoder, wherein the high-level features comprise high-level features extracted by the first encoder for a clear image and high-level features extracted by the second encoder for a fog image;
the training set and the verification set of the bidirectional GAN network model both comprise paired clear pictures and fog pictures.
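For illustration only, a minimal sketch (not part of the claims) of the generation module of claim 3: two encoder/decoder pairs exchanging features through one shared latent block. The module interfaces, names and output keys are assumptions, not the patented architecture.

import torch.nn as nn

class BidirectionalGenerator(nn.Module):
    def __init__(self, enc_clear, enc_hazy, dec_clear, dec_hazy, shared):
        super().__init__()
        self.enc_clear, self.enc_hazy = enc_clear, enc_hazy
        self.dec_clear, self.dec_hazy = dec_clear, dec_hazy
        self.shared = shared  # shared latent space holding high-level features

    def forward(self, clear=None, hazy=None):
        out = {}
        if clear is not None:
            z = self.shared(self.enc_clear(clear))
            out["clear_rec"] = self.dec_clear(z)   # reconstruct the clear image
        if hazy is not None:
            z = self.shared(self.enc_hazy(hazy))
            out["hazy_rec"] = self.dec_hazy(z)     # reconstruct the fog image
            out["dehazed"] = self.dec_clear(z)     # fog-to-clear translation
        return out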
4. The ultra-lightweight picture defogging and identification network model according to claim 3, wherein said first encoder and said second encoder respectively comprise a first convolution block, a second convolution block and a first coupling residual block which are connected in sequence;
the first decoder and the second decoder respectively comprise a second coupling residual block, a first strided convolution block and a second strided convolution block connected in sequence;
the output end of the first coupling residual block is connected to the input end of the shared potential space, and the output end of the shared potential space is connected to the input end of the second coupling residual block;
the output end of the first convolution block is skip-connected to the input end of the second strided convolution block;
the output end of the second convolution block is skip-connected to the input end of the first strided convolution block.
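For illustration only, a minimal sketch (not part of the claims) of the encoder/decoder pair of claim 4. The coupling residual blocks appear as placeholders, the strided convolution blocks of the decoder are realized here as transposed convolutions, the shared latent space is omitted, and all channel counts are assumptions.

import torch.nn as nn

def coupling_block_placeholder(ch):
    # stands in for the first/second coupling residual block of the claim
    return nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(True),
                         nn.Conv2d(ch, ch, 3, padding=1))

class EncoderDecoder(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.conv1 = nn.Sequential(nn.Conv2d(3, ch, 3, 2, 1), nn.ReLU(True))       # first convolution block
        self.conv2 = nn.Sequential(nn.Conv2d(ch, 2 * ch, 3, 2, 1), nn.ReLU(True))  # second convolution block
        self.res_in = coupling_block_placeholder(2 * ch)    # first coupling residual block
        self.res_out = coupling_block_placeholder(2 * ch)   # second coupling residual block
        self.up1 = nn.Sequential(nn.ConvTranspose2d(2 * ch, ch, 4, 2, 1), nn.ReLU(True))  # first strided block
        self.up2 = nn.ConvTranspose2d(ch, 3, 4, 2, 1)                                     # second strided block

    def forward(self, x):
        f1 = self.conv1(x)
        f2 = self.conv2(f1)
        z = self.res_out(self.res_in(f2))  # shared latent space would sit between these blocks
        y = self.up1(z + f2)               # skip connection from the second convolution block
        return self.up2(y + f1)            # skip connection from the first convolution block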
5. The ultra-lightweight picture defogging and identification network model according to claim 4, wherein the first coupling residual block and the second coupling residual block are each composed of a plurality of cascaded sub-residual blocks;
and the sub-residual block at any stage comprises a first convolution layer, an activation function layer and a second convolution layer connected in sequence, and is used for processing the output of the preceding sub-residual block and outputting the processing result to the subsequent sub-residual blocks.
6. The ultra-lightweight picture defogging and identification network model according to claim 3, wherein the first discriminator and the second discriminator each comprise three discriminator layers and one granularity discriminator layer, and each discriminator layer comprises three convolutional layers and one activation function layer.
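For illustration only, a minimal sketch (not part of the claims) of one discriminator layer as recited in claim 6, three convolution layers followed by one activation function layer; the channel widths, kernel sizes and Leaky ReLU slope are assumptions.

import torch.nn as nn

def discriminator_layer(in_ch, ch=64):
    return nn.Sequential(
        nn.Conv2d(in_ch, ch, 4, 2, 1),
        nn.Conv2d(ch, 2 * ch, 4, 2, 1),
        nn.Conv2d(2 * ch, 1, 4, 1, 1),
        nn.LeakyReLU(0.2, inplace=True),
    )

# e.g. first_discriminator_layer = discriminator_layer(3) for an RGB input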
7. The ultra-lightweight picture defogging and identification network model according to claim 3, wherein the bidirectional GAN network model is optimized using the following loss functions: a generative adversarial loss function, an MSE loss function and a total variation loss function.
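For illustration only, a minimal sketch (not part of the claims) of a composite loss combining the three terms named in claim 7; the loss weights and the least-squares form of the adversarial term are assumptions.

import torch
import torch.nn.functional as F

def total_variation(img):
    tv_h = (img[..., 1:, :] - img[..., :-1, :]).abs().mean()
    tv_w = (img[..., :, 1:] - img[..., :, :-1]).abs().mean()
    return tv_h + tv_w

def generator_loss(disc_fake, dehazed, target, w_adv=1.0, w_mse=10.0, w_tv=0.1):
    adv = F.mse_loss(disc_fake, torch.ones_like(disc_fake))  # adversarial term (LSGAN-style)
    mse = F.mse_loss(dehazed, target)                        # MSE reconstruction term
    tv = total_variation(dehazed)                            # total variation smoothness term
    return w_adv * adv + w_mse * mse + w_tv * tv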
8. A picture defogging and identification method is characterized by comprising the following steps:
extracting depth information corresponding to different pixels in the picture to be defogged;
according to the depth information, performing depth processing on the picture to be defogged;
inputting the deeply processed picture to be defogged into a pre-constructed ultra-lightweight picture defogging and identification network model;
and acquiring the defogging and identification result of the picture output by the ultra-lightweight picture defogging and identification network model.
9. The method according to claim 8, wherein the extracting depth information corresponding to different pixels in the picture to be defogged comprises:
carrying out format conversion on the picture to be defogged, and extracting the brightness and the saturation of the picture to be defogged;
and generating depth information corresponding to different pixels in the picture to be defogged according to the brightness and the saturation.
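For illustration only, a minimal sketch (not part of the claims) of claim 9: convert the picture format, extract brightness and saturation, and derive a per-pixel depth cue. The linear combination and its coefficients follow the commonly used color-attenuation prior and are assumptions, not values taken from this application.

import cv2
import numpy as np

def depth_from_brightness_saturation(bgr: np.ndarray) -> np.ndarray:
    # expects an 8-bit BGR image; format conversion to HSV yields brightness (V) and saturation (S)
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV).astype(np.float32) / 255.0
    s, v = hsv[..., 1], hsv[..., 2]
    # in haze, more distant pixels tend to be brighter and less saturated
    return 0.121779 + 0.959710 * v - 0.780245 * s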
10. A computer device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the picture defogging and identification method according to claim 8 or 9 when executing the computer program.
CN202011527239.6A 2020-12-22 2020-12-22 Ultra-lightweight picture defogging and identification network model and picture defogging and identification method Pending CN114663292A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011527239.6A CN114663292A (en) 2020-12-22 2020-12-22 Ultra-lightweight picture defogging and identification network model and picture defogging and identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011527239.6A CN114663292A (en) 2020-12-22 2020-12-22 Ultra-lightweight picture defogging and identification network model and picture defogging and identification method

Publications (1)

Publication Number Publication Date
CN114663292A true CN114663292A (en) 2022-06-24

Family

ID=82025376

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011527239.6A Pending CN114663292A (en) 2020-12-22 2020-12-22 Ultra-lightweight picture defogging and identification network model and picture defogging and identification method

Country Status (1)

Country Link
CN (1) CN114663292A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115457265A (en) * 2022-08-25 2022-12-09 暨南大学 Image defogging method and system based on generation countermeasure network and multi-scale fusion
CN115457265B (en) * 2022-08-25 2023-08-01 暨南大学 Image defogging method and system based on generation of countermeasure network and multi-scale fusion

Similar Documents

Publication Publication Date Title
Li et al. Underwater image enhancement via medium transmission-guided multi-color space embedding
CN108520503B (en) Face defect image restoration method based on self-encoder and generation countermeasure network
Liu et al. MLFcGAN: Multilevel feature fusion-based conditional GAN for underwater image color correction
CN111062880B (en) Underwater image real-time enhancement method based on condition generation countermeasure network
CN110084757B (en) Infrared depth image enhancement method based on generation countermeasure network
CN110532897B (en) Method and device for recognizing image of part
CN110175986B (en) Stereo image visual saliency detection method based on convolutional neural network
CN112766160A (en) Face replacement method based on multi-stage attribute encoder and attention mechanism
CN111260584A (en) Underwater degraded image enhancement method based on GAN network
Hu et al. Underwater image restoration based on convolutional neural network
CN110689599A (en) 3D visual saliency prediction method for generating countermeasure network based on non-local enhancement
CN115393396B (en) Unmanned aerial vehicle target tracking method based on mask pre-training
CN113449691A (en) Human shape recognition system and method based on non-local attention mechanism
CN113392711A (en) Smoke semantic segmentation method and system based on high-level semantics and noise suppression
CN114881871A (en) Attention-fused single image rain removing method
CN116645569A (en) Infrared image colorization method and system based on generation countermeasure network
CN115588237A (en) Three-dimensional hand posture estimation method based on monocular RGB image
CN115272438A (en) High-precision monocular depth estimation system and method for three-dimensional scene reconstruction
CN114663292A (en) Ultra-lightweight picture defogging and identification network model and picture defogging and identification method
CN115641445B (en) Remote sensing image shadow detection method integrating asymmetric inner convolution and Transformer
CN110120009B (en) Background blurring implementation method based on salient object detection and depth estimation algorithm
CN111898671B (en) Target identification method and system based on fusion of laser imager and color camera codes
CN115147317A (en) Point cloud color quality enhancement method and system based on convolutional neural network
CN114913588A (en) Face image restoration and recognition method applied to complex scene
CN114926348A (en) Device and method for removing low-illumination video noise

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination