CN111445418A - Image defogging method and device and computer equipment - Google Patents


Info

Publication number
CN111445418A
CN111445418A (application CN202010243548.4A)
Authority
CN
China
Prior art keywords
image
defogged
defogging
network
channels
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010243548.4A
Other languages
Chinese (zh)
Other versions
CN111445418B (en)
Inventor
王钰桥
韩岩
谭松波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd
Priority to CN202010243548.4A
Publication of CN111445418A
Application granted
Publication of CN111445418B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T5/00 Image enhancement or restoration
            • G06T5/70 Denoising; Smoothing
            • G06T5/73 Deblurring; Sharpening
          • G06T2207/00 Indexing scheme for image analysis or image enhancement
            • G06T2207/10 Image acquisition modality
              • G06T2207/10004 Still image; Photographic image
            • G06T2207/20 Special algorithmic details
              • G06T2207/20081 Training; Learning
              • G06T2207/20172 Image enhancement details
                • G06T2207/20192 Edge enhancement; Edge preservation
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N20/00 Machine learning
          • G06N3/00 Computing arrangements based on biological models
            • G06N3/02 Neural networks
              • G06N3/04 Architecture, e.g. interconnection topology
                • G06N3/045 Combinations of networks
              • G06N3/08 Learning methods
                • G06N3/084 Backpropagation, e.g. using gradient descent


Abstract

After a to-be-processed foggy image is obtained, its object edge features are extracted and combined with the image's original channels to form the image to be defogged, which is then input into a defogging model for processing. The defogging model can thus take the object edge features into account during defogging, preserving the sharpness of the output fog-free image. Because the original fully connected layers in the model's SE network structure are replaced by convolutional layers, the network parameters are compressed and the efficiency of image defogging is improved; and by computing attention weights over multiple dimensions of the image to be defogged, features useful for defogging are enhanced while less useful ones are suppressed, improving the accuracy of the model and hence both the defogging effect and the processing efficiency.

Description

Image defogging method and device and computer equipment
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image defogging method and apparatus and a computer device.
Background
Haze is a common atmospheric phenomenon produced by small floating particles, such as dust and smoke, suspended in the air. In application scenarios such as video surveillance, remote sensing, and autonomous driving, images captured in a hazy environment suffer from the absorption and scattering of light by these floating particles: the captured images are foggy, with low visibility, dim colors, and low contrast, which seriously degrades subsequent image processing and may even fail to meet the requirements of the application scenario.
For this reason, the prior art usually adopts either an end-to-end network, DehazeNet, or an end-to-end gated context aggregation network, GCANet, to process hazy images. The former estimates the medium transmission of the hazy image and then recovers a clear image through the atmospheric scattering model. The latter predicts the residual between the target clean image (i.e., the defogged image) and the hazy image by aggregating context information and fusing features at different levels, thereby defogging the hazy image and meeting an application scenario's defogging requirements.
However, the defogging effect of existing image defogging methods is limited, and the large number of parameters in the trained network slows image processing, reducing the efficiency of image defogging.
Disclosure of Invention
In view of the above, in order to improve the image defogging processing efficiency and effect, the present application provides an image defogging processing method, including:
acquiring a to-be-processed foggy image;
acquiring the object edge characteristics of the to-be-processed foggy image;
taking the edge characteristics of the object as a new channel of the to-be-processed foggy image to obtain the to-be-defogged image with a specific number of channels;
inputting the image to be defogged into a defogging model for processing, and outputting a fog-free image corresponding to the image to be defogged;
The defogging model is obtained by training, based on a machine learning network model, on sample foggy images having the specific number of channels. The machine learning network model has an SE (Squeeze-and-Excitation) layer network structure in which the fully connected layers are replaced by convolutional layers, enabling the computation of attention weights over multiple dimensions of the input image to be defogged.
In some embodiments, the processing of the image to be defogged by the defogging model comprises:
acquiring, by using the convolutional layer in the SE layer network structure, the channel weight of the image features of each channel of the image to be defogged input into the SE layer network structure, and the pixel weight of each pixel.
In some embodiments, the obtaining, by using a convolutional layer in the SE layer network structure, of the channel weight of each channel's image features and the pixel weight of each pixel of the image to be defogged input into the SE network includes:
performing a convolution operation, via an attention mechanism, on the acquired image features of each channel of the image to be defogged to obtain the channel weight for each channel's image features, and weighting each channel's image features with its channel weight;
performing a convolution operation, via an attention mechanism, on the acquired pixel information of the image to be defogged to obtain pixel weights for the different pixels contained in the image to be defogged, and weighting the pixels of the image to be defogged with these pixel weights.
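The patent does not specify the kernel shapes or layer layouts of the two attention branches, so the two weighting steps above can only be sketched. The following is a minimal pure-Python illustration: a squeeze-style channel attention (global average pooling per channel, then a gate) and a per-pixel attention; a simple sigmoid gate stands in for the learned convolutional layers, and all function names are illustrative assumptions rather than the patent's implementation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(feature_maps):
    """feature_maps: list of C channels, each an H x W list of lists.
    Squeeze: global average pooling per channel. Excitation: a sigmoid
    gate stands in here for the patent's learned convolutional layers."""
    weights = []
    for fm in feature_maps:
        pooled = sum(sum(row) for row in fm) / (len(fm) * len(fm[0]))
        weights.append(sigmoid(pooled))  # one channel weight in (0, 1)
    # weighting operation: scale every pixel of a channel by its weight
    weighted = [[[v * w for v in row] for row in fm]
                for fm, w in zip(feature_maps, weights)]
    return weighted, weights

def pixel_attention(feature_map):
    """Per-pixel weights derived from the pixel values themselves
    (a stand-in for the attention convolution), then the weighting step."""
    weights = [[sigmoid(v) for v in row] for row in feature_map]
    weighted = [[v * w for v, w in zip(row, wrow)]
                for row, wrow in zip(feature_map, weights)]
    return weighted, weights
```

In the trained model the weights would come from learned convolution kernels; what the sketch shows is the structure of the two weighting operations, one per channel and one per pixel.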
In some embodiments, the machine learning network model further includes at least one of a depthwise separable convolutional layer, a downsampling network, and a residual network, and the processing of the image to be defogged by the defogging model further includes one or more of the following:
expanding the channels of the image features of the input image to be defogged through the depthwise separable convolutional layer, and outputting the corresponding image features after channel expansion;
processing the image features of each channel of the input image to be defogged with the convolution kernels of different sizes contained in the downsampling network, and outputting image features for the corresponding numbers of channels;
smoothing the input image features with the smooth residual blocks contained in the residual network, where the two dilated (atrous) convolution layers in each smooth residual block have different dilation rates.
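The smooth residual block is described only as two dilated convolution layers with different dilation rates plus the usual skip connection; kernel sizes and the actual rates are not given. As a hedged illustration, here is a 1-D pure-Python stand-in: a "same"-padded dilated convolution and a block chaining two of them with hypothetical dilation rates 2 and 4.

```python
def dilated_conv1d(x, kernel, dilation):
    """'Same'-padded 1-D dilated convolution (zeros outside the signal)."""
    k = len(kernel)
    out = []
    for i in range(len(x)):
        acc = 0.0
        for t in range(k):
            j = i + (t - k // 2) * dilation  # taps spaced `dilation` apart
            if 0 <= j < len(x):
                acc += kernel[t] * x[j]
        out.append(acc)
    return out

def smooth_residual_block(x):
    """Two dilated convolutions with different dilation rates plus a
    skip connection (hypothetical 1-D stand-in for the 2-D block).
    Identity kernels are used so the behavior is easy to verify."""
    identity_kernel = [0.0, 1.0, 0.0]
    h = dilated_conv1d(x, identity_kernel, dilation=2)
    h = dilated_conv1d(h, identity_kernel, dilation=4)
    return [a + b for a, b in zip(x, h)]  # residual addition
```

Using two different dilation rates enlarges the receptive field without adding parameters, which matches the patent's efficiency theme; the rates 2 and 4 here are assumptions.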
In some embodiments, the processing, by using convolution kernels of different sizes included in the downsampling network, of the image features of the channels of the input image to be defogged, and the outputting of image features for the corresponding numbers of channels, respectively include:
inputting respective image features of a first number of channels into a first convolution kernel with a first size for operation, and outputting image features corresponding to a second number of channels respectively, wherein the second number is greater than the first number, and the first number is greater than the specific number;
inputting the respective image features of the second number of channels into a second convolution kernel with a second size for operation, and outputting image features corresponding to a third number of channels respectively, wherein the third number is equal to the second number;
and inputting the image features of the channels of the third number into a third convolution kernel with a second size for operation, and outputting the image features corresponding to the channels of a fourth number, wherein the fourth number is equal to the first number.
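The claim fixes only the relations among the channel counts (second = third > first = fourth > the specific number) and leaves the actual values and kernel sizes open. The sketch below walks hypothetical shapes through the three stages using the standard output-size formula for a 2-D convolution; the counts 16 and 64, the 7/3 kernel sizes, and the stride-2 choices are assumptions for illustration only.

```python
def conv_shape(channels_in, h, w, channels_out, kernel, stride=1):
    """Output shape of a 2-D convolution with 'same'-style zero padding.
    channels_in is kept in the signature for clarity; only the spatial
    size and the output channel count determine the result."""
    pad = kernel // 2
    h_out = (h + 2 * pad - kernel) // stride + 1
    w_out = (w + 2 * pad - kernel) // stride + 1
    return channels_out, h_out, w_out

SPECIFIC = 4                   # R, G, B plus the object-edge channel
first, second = 16, 64         # hypothetical; second > first > SPECIFIC
third, fourth = second, first  # third == second, fourth == first

shape = (first, 240, 240)                                            # input features
shape = conv_shape(*shape, channels_out=second, kernel=7)            # first size
shape = conv_shape(*shape, channels_out=third, kernel=3, stride=2)   # second size
shape = conv_shape(*shape, channels_out=fourth, kernel=3, stride=2)  # second size again
```

With these assumed settings the features end at 16 channels of 60 x 60, showing how the channel count returns to the first number while the spatial size shrinks.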
In some embodiments, the processing of the image to be dehazed by the dehazing model further comprises:
and carrying out cascade fusion processing on the image features output by the down-sampling network and the image features before being input into the down-sampling network so as to increase the number of the image features input into a network connected with the down-sampling network.
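Cascade fusion as described here amounts to channel-wise concatenation: the features output by the downsampling network are stacked with the features that entered it, so the next network receives more feature maps. A minimal list-based sketch (the channel counts are arbitrary):

```python
def cascade_fuse(features_before, features_after):
    """Channel-wise concatenation: the fused set carries both the
    pre-downsampling features and the downsampling network's output."""
    return features_before + features_after  # lists of channel maps

before = [[[0.1, 0.2]], [[0.3, 0.4]]]               # 2 channels, 1 x 2 maps
after = [[[0.5, 0.6]], [[0.7, 0.8]], [[0.9, 1.0]]]  # 3 channels
fused = cascade_fuse(before, after)                 # 5 channels total
```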
In some embodiments, the obtaining the object edge feature of the to-be-processed hazy image includes:
acquiring the image gradient change of each channel of the to-be-processed foggy image;
and obtaining the average image gradient change of the image gradient changes of a plurality of channels of the to-be-processed foggy image, and determining the object edge characteristics of the to-be-processed foggy image according to the average image gradient change.
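The two steps above, a per-channel image gradient followed by averaging across channels, can be sketched directly. The patent does not name a specific gradient operator (Sobel, etc.), so the simple forward differences below are an assumption.

```python
def gradient_magnitude(channel):
    """Forward-difference gradient magnitude of one channel (H x W)."""
    h, w = len(channel), len(channel[0])
    grad = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            gx = channel[i][j + 1] - channel[i][j] if j + 1 < w else 0.0
            gy = channel[i + 1][j] - channel[i][j] if i + 1 < h else 0.0
            grad[i][j] = (gx * gx + gy * gy) ** 0.5
    return grad

def edge_channel(rgb_channels):
    """Average the per-channel gradient maps into one edge-feature map."""
    grads = [gradient_magnitude(c) for c in rgb_channels]
    h, w = len(grads[0]), len(grads[0][0])
    return [[sum(g[i][j] for g in grads) / len(grads) for j in range(w)]
            for i in range(h)]
```

A flat channel yields zero gradient everywhere, while a brightness step produces a strong response, which is exactly the contour information the edge channel is meant to carry.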
The application also provides an image defogging processing device, the device includes:
the to-be-processed foggy image acquisition module is used for acquiring a to-be-processed foggy image;
the object edge characteristic acquisition module is used for acquiring the object edge characteristics of the to-be-processed foggy image;
the image to be defogged obtaining module is used for taking the edge characteristics of the object as a new channel of the image to be defogged to obtain the image to be defogged with a specific number of channels;
the defogging processing module is used for inputting the image to be defogged into a defogging model for processing and outputting a fog-free image corresponding to the image to be processed;
The defogging model is obtained by training, based on a machine learning network model, on sample foggy images having the specific number of channels. The machine learning network model has an SE (Squeeze-and-Excitation) layer network structure in which the fully connected layers are replaced by convolutional layers, enabling the computation of attention weights over multiple dimensions of the input image to be defogged.
In some embodiments, the defogging processing module includes:
and the attention weight acquiring unit is used for acquiring the channel weight of the image characteristic of each channel corresponding to the image to be defogged which is input into the SE network and the pixel weight of each pixel by utilizing the convolution layer in the SE layer network structure.
The present application further proposes a computer device, the computer device comprising:
a memory for storing a program for implementing the image defogging processing method as described above;
a processor for loading and executing the program stored in the memory to realize the steps of the image defogging processing method.
Therefore, compared with the prior art, the present application provides an image defogging method, device, and computer equipment in which, after the to-be-processed foggy image is obtained, its object edge features are extracted and combined with the image's original channels, and the resulting image to be defogged is input into the defogging model for processing. The defogging model can thus take the object edge features into account during defogging, preserving the sharpness of the output fog-free image. Because the original fully connected layers in the model's SE network structure are replaced by convolutional layers, the network parameters are compressed and the efficiency of image defogging is improved; and by computing attention weights over multiple dimensions of the image to be defogged, features useful for defogging are enhanced while less useful ones are suppressed, improving the accuracy of the model and hence both the defogging effect and the processing efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a schematic view of an application scenario of the image defogging method proposed in the present application;
fig. 2 is a schematic diagram illustrating a hardware structure of a computer device according to an embodiment of the present application;
Fig. 3 is a schematic flow chart showing an alternative example of the image defogging processing method proposed by the present application;
fig. 4 is a schematic flow chart showing still another alternative example of the image defogging processing method proposed by the present application;
FIG. 5 is a schematic diagram showing an alternative structure of an SE layer network structure in a defogging model suitable for the image defogging processing method provided by the application;
fig. 6 is a schematic flowchart showing still another alternative example of the image defogging processing method proposed by the present application;
FIG. 7 is a schematic diagram showing an alternative structure of a depth separable convolution layer in a defogging model suitable for use in the image defogging method proposed in the present application;
FIG. 8a shows a schematic of a prior art downsampling network;
FIG. 8b is a schematic diagram showing an alternative structure of a down-sampling network in a defogging model suitable for the image defogging method proposed in the present application;
FIG. 9 is a schematic diagram showing an alternative structure of a smooth residual block in a residual network in a defogging model suitable for the image defogging method proposed in the present application;
FIG. 10 is a schematic diagram showing an alternative network structure of an SE fusion layer in a defogging model suitable for the image defogging processing method provided by the application;
FIG. 11 is a schematic diagram of an alternative network structure of a defogging model suitable for use in the image defogging method proposed in the present application;
fig. 12 is a schematic structural view showing an alternative example of the image defogging processing device proposed by the present application;
fig. 13 is a schematic structural view showing still another alternative example of the image defogging processing device proposed by the present application;
fig. 14 is a schematic structural diagram showing still another alternative example of the image defogging processing device proposed by the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings. The embodiments and features of the embodiments in the present application may be combined with each other without conflict.
It should be understood that "system", "apparatus", "unit" and/or "module" as used herein is a method for distinguishing different components, elements, parts or assemblies at different levels. However, other words may be substituted by other expressions if they accomplish the same purpose.
As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; the steps and elements do not form an exclusive list, and a method or apparatus may include other steps or elements. An element introduced by the phrase "comprising an …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
In the description of the embodiments herein, "/" means "or" unless otherwise specified, for example, a/B may mean a or B; "and/or" herein is merely an association describing an associated object, and means that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, in the description of the embodiments of the present application, "a plurality" means two or more than two. The terms "first", "second" and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature.
Additionally, flow charts are used herein to illustrate operations performed by systems according to embodiments of the present application. It should be understood that the preceding or following operations are not necessarily performed in the exact order in which they are performed. Rather, the various steps may be processed in reverse order or simultaneously. Meanwhile, other operations may be added to the processes, or a certain step or several steps of operations may be removed from the processes.
In view of the problems described in the background art, this application seeks to improve the network structure used for defogging so as to improve both the defogging effect and the processing efficiency for foggy images. As shown in the scenario diagram of fig. 1, a foggy image can be input directly into a pre-trained defogging model, which outputs the desired fog-free image.
Specifically, this application replaces the fully connected layers in the SE (Squeeze-and-Excitation) layer network structure with convolutional layers to form the initial network structure for model training, i.e., the machine learning network model, thereby compressing the network parameters of the whole model and improving model training efficiency.
Moreover, whereas the fully connected layers in the existing SE layer network structure compute attention weights only for the channel features (a single dimension) of the foggy image, the convolutional layers in the SE layer network structure of this application compute attention weights over multiple dimensions of the foggy image. Defogging is then performed using these multi-dimensional features, so the fog-free image can be obtained more accurately and the defogging effect is improved.
The SE layer network structure is a relatively new image recognition structure that strengthens important features by modeling the correlations among feature channels (one dimension of features), improving accuracy and mitigating the loss caused by the differing importance of feature-map channels during convolution and pooling. In combination with the above analysis, this application models the correlations among features of multiple dimensions, further improving the accuracy of model processing; the operating principle and detailed network structure of the SE layer are not elaborated in this application.
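The parameter-compression argument can be made concrete with a back-of-the-envelope count. A classic SE block applies two fully connected layers (C to C/r to C) to the pooled channel vector; one way a small convolution can undercut that is to emit a single shared attention map from a k x k kernel. The layouts and numbers below (C = 256, r = 16, k = 3) are purely hypothetical, since the patent does not give the actual layer shapes.

```python
def se_fc_params(c, r=16):
    """Weights in the two fully connected layers of a classic SE block:
    a reduction layer (C -> C/r) followed by an expansion layer (C/r -> C)."""
    return c * (c // r) + (c // r) * c

def se_conv_params(c, k=3):
    """Weights in one k x k convolution producing a single attention map
    (a hypothetical convolutional replacement; biases ignored)."""
    return c * 1 * k * k

wide = 256                           # hypothetical channel width
fc_weights = se_fc_params(wide)      # 256*16 + 16*256 = 8192
conv_weights = se_conv_params(wide)  # 256*9 = 2304
```

Under these assumptions the convolutional variant carries roughly a quarter of the weights, while also yielding spatially varying (multi-dimensional) attention, which is the trade the patent describes.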
In order to further improve the model training efficiency and the processing accuracy, the modeling can be realized according to a fusion network formed by a plurality of network structures, and the specific realization process can refer to the description of the corresponding part of the following embodiment, which is not described in detail herein.
Referring to fig. 2, a schematic diagram of the hardware structure of a computer device provided in an embodiment of the present application. In practical applications, the computer device may include, but is not limited to, a server, a server cluster composed of multiple servers, or a terminal device with certain data processing capability, especially image processing capability; the terminal device may include, but is not limited to: smart phones, tablet computers, wearable devices, ultra-mobile personal computers (UMPC), netbooks, desktop computers, and the like. The present application does not limit the product type of the computer device, and the computer device shown in fig. 2 is only an example and should not limit the function or scope of use of the embodiments of the present application.
As shown in fig. 2, the computer device proposed by the present embodiment may include a memory 11 and a processor 12, wherein:
the number of the memory 11 and the processor 12 may be at least one, and both may be connected to a communication bus to realize data interaction therebetween, and the specific implementation process is not described in detail herein.
The memory 11 may be configured to store a program for implementing the image defogging method provided in the present application, and the processor 12 may load and execute the program stored in the memory 11 to implement the steps of the image defogging method provided in any alternative embodiment of the present application, where specific implementation processes may refer to descriptions of corresponding parts of corresponding embodiments below.
In the embodiment of the present application, the memory 11 may include high-speed random access memory and may further include non-volatile memory, such as at least one magnetic disk storage device or other non-volatile solid-state storage device. The processor 12 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a field-programmable gate array (FPGA), or another programmable logic device.
In one possible implementation, the memory 11 may include a program storage area and a data storage area, and the program storage area may store an operating system, and application programs required for at least one function (such as an image display function, an image defogging processing function), a program for implementing the image defogging processing method proposed in the present application, and the like; the data storage area can store data generated in the using process of the electronic equipment, such as acquired fog images to be processed, edge characteristics of objects obtained by processing, fog-free images after defogging processing and the like.
It should be understood that the structure of the computer device shown in fig. 2 does not limit the computer device in the embodiments of the present application; in practical applications, the computer device may include more or fewer components than those shown in fig. 2, or some components may be combined. For example, if the computer device is a terminal device, it may further include: at least one input device such as a keyboard, a mouse, a camera, or a microphone; at least one output device such as a display, a speaker, a vibration mechanism, or an indicator light; and various communication interfaces, which are not listed here one by one.
Referring to fig. 3, a schematic flow chart showing an optional example of the image defogging method provided by the present application, which may be applied to a computer device, where the present application does not limit the product type and the composition structure of the computer device, and may refer to but is not limited to the computer device described in the above embodiments, as shown in fig. 3, the image defogging method provided by the present embodiment may include:
step S11, obtaining a fog image to be processed;
in practical application, when the fogging image needs to be subjected to defogging processing, the embodiment can perform preprocessing on the fogging image, for example, format conversion, size clipping and other operations can be performed on the fogging image, so as to obtain the to-be-processed fogging image meeting the model input requirement.
In some embodiments, the computer device may convert the captured foggy image into an RGB image and then randomly crop it to a specific size, such as 240 × 240 pixels, to obtain a to-be-processed foggy image with three channels, i.e., R, G, and B.
It should be noted that, the implementation process of how to obtain the to-be-processed foggy image after the acquired foggy image is preprocessed is not limited in the present application, and is not limited to the preprocessing manner described above. In addition, the processing procedure for performing the above preprocessing on the foggy image to obtain the to-be-processed foggy image may be implemented by a computer device, or may be executed by another device, and then the obtained to-be-processed foggy image is sent to the computer device.
Step S12, acquiring the object edge characteristics of the fog image to be processed;
in this embodiment, the object edge feature may indicate an outline of each object included in the to-be-processed foggy image, so as to improve a technical problem of image blur after the image defogging processing, that is, to ensure the definition of the image after the defogging processing, and the specific implementation manner of step S12 is not limited in this application.
In some embodiments, since image blur arises when the contours of objects (e.g., items, people) in the image are indistinct, the gray-level change at contour edges is weak, and the sense of depth is poor, this application may use the image gradient to measure the rate of change of image gray levels and thereby determine the contour features between objects. Specifically, an image gradient algorithm may be used to obtain the image gradient change of each channel (e.g., the R, G, and B channels) of the to-be-processed foggy image, and the object edge features of the to-be-processed foggy image are then determined from these gradient changes; the specific calculation process is not detailed in this application.
step S13, taking the edge characteristics of the object as a new channel of the to-be-processed foggy image to obtain the to-be-defogged image with a specific number of channels;
in combination with the above analysis, the number of the characteristic channels of the to-be-processed fogging image may be determined according to the image format thereof, for example, for the to-be-processed fogging image in RGB format, the to-be-processed fogging image may have three channels, i.e., an R channel, a G channel, and a B channel, the pixel value of each pixel point in the to-be-processed image may be determined comprehensively according to the values of the three channels, and the specific obtaining process is not described in detail.
In order to improve the reliability and accuracy of the image defogging process, and thus the definition of the obtained fog-free image, this embodiment may treat the object edge feature of the image to be defogged as a one-dimensional feature and combine it with the three original channels (i.e., a three-dimensional feature) to form the multi-dimensional image feature of the image to be defogged, so the specific number of channels may be 4.
Of course, the image defogging process may also be implemented in combination with other dimensional features of the image to be processed as needed, in which case the specific number increases correspondingly. The present application does not limit the specific value of the specific number, which may be determined according to the actual requirements of the image processing; the embodiments of the present application take 4 dimensional features, i.e., 4 channels, as an example for explanation.
It can be seen that the image to be defogged obtained in this embodiment may be a 240 × 240 × 4 image, that is, an image to be defogged of size 240 × 240 having 4 channels (i.e., 4 dimensional features), but the application is not limited to this image size or this number of feature dimensions.
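The gradient extraction, channel averaging, and channel stacking described above can be sketched in a few lines; this is a minimal numpy illustration, and the array names and the use of `np.gradient` as the image gradient algorithm are assumptions rather than the patent's own implementation:

```python
import numpy as np

# Hypothetical 240 x 240 RGB foggy image with values in [0, 1].
rng = np.random.default_rng(0)
foggy = rng.random((240, 240, 3))

# Per-channel gradient magnitude via central differences, then averaged
# over the R, G, B channels to form one edge-feature map.
grads = [np.hypot(*np.gradient(foggy[:, :, c])) for c in range(3)]
edge_feature = np.mean(grads, axis=0)          # shape (240, 240)

# Append the edge feature as a fourth channel: a 240 x 240 x 4 input tensor.
to_defog = np.concatenate([foggy, edge_feature[:, :, None]], axis=2)
print(to_defog.shape)  # (240, 240, 4)
```

The original three color channels are left untouched; the fourth channel simply carries the contour information alongside them.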
And step S14, inputting the image to be defogged into the defogging model for processing, and outputting a fog-free image corresponding to the image to be processed.
It should be noted that, in combination with the above analysis, the defogging model may be obtained by training sample foggy images with a specific number of channels (e.g., 4 channels) based on a machine learning network model; the obtaining process of a sample foggy image may refer to the obtaining process of the image to be defogged and is not repeated in this embodiment. The specific training process of the defogging model is also not detailed here; reference may be made to, but is not limited to, the description of the corresponding embodiment below.
The SE layer network structure utilizes convolution layers to calculate an attention weight for each of multiple dimensions of the input image to be defogged, such as the channel dimension, the pixel dimension, and the like.
Based on this, after the obtained image to be defogged is input into the defogging model, its processing in the defogging model may include: using the convolution layers in the SE layer network structure to obtain the channel weight of the image feature of each channel of the image to be defogged input into the SE layer network structure, and the pixel weight of each pixel; the specific calculation process is not detailed in this application. In this way, when the defogging model processes the channel-dimension features and the pixel-dimension features, it performs the corresponding weighting operations to reflect the correlation between each image feature and the defogging process, and then realizes the defogging according to that correlation, thereby ensuring the definition and reliability of the obtained defogged image.
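The conv-based attention idea can be sketched as follows, under the assumption that 1 × 1 convolutions reduce to per-position matrix multiplies; the weight shapes, the random initialization, and the exact placement of the ReLU/tanh activations are illustrative, not the patent's precise layer configuration:

```python
import numpy as np

rng = np.random.default_rng(1)
H, W, C = 8, 8, 4          # tiny stand-in for a 240 x 240 x 4 feature map
x = rng.random((H, W, C))

# Channel attention: global average pool -> 1x1 conv -> ReLU -> 1x1 conv -> tanh.
squeeze = x.mean(axis=(0, 1))                  # (C,): one value per channel
W1 = rng.standard_normal((C, C))               # a 1x1 conv over channels is a matmul
W2 = rng.standard_normal((C, C))
hidden = np.maximum(0.0, squeeze @ W1)         # ReLU
channel_w = np.tanh(hidden @ W2)               # (C,): attention weight per channel

# Pixel attention: a 1x1 conv collapsing channels gives one weight per pixel.
w_pix = rng.standard_normal((C, 1))
pixel_w = np.tanh(x.reshape(-1, C) @ w_pix).reshape(H, W)  # (H, W)

# Weighting operation: rescale channels, then rescale pixels.
y = x * channel_w[None, None, :] * pixel_w[:, :, None]
print(y.shape)  # (8, 8, 4)
```

Because the learned weights are produced by convolutions rather than fully connected layers, the parameter count stays independent of the spatial size of the input.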
In summary, after the foggy image to be processed is obtained, it is not directly input into the defogging model for defogging. Instead, the object edge feature of the foggy image to be processed is extracted and used as a new feature channel, which is combined with the original feature channels to generate the image to be defogged; the image to be defogged is then input into the defogging model for processing, so that the object edge feature can be taken into account during defogging and the definition of the output defogged image is ensured. Meanwhile, convolution layers replace the original fully connected layers in the SE network structure of the model, which compresses the network parameters and improves the efficiency of the image defogging process; and the attention weights of multiple dimensions of the image to be defogged are calculated, not limited to the channel dimension, so that the defogging model can, according to the respective attention weights of the important features in multiple dimensions, enhance the features useful for defogging and suppress the less useful ones. This improves the accuracy of model processing, the defogging effect, and the defogging processing efficiency.
Referring to fig. 4, a schematic flow chart illustrating yet another optional example of the image defogging method provided by the present application is shown, and the present embodiment may be an optional detailed implementation manner of the image defogging method provided by the foregoing embodiment, and as shown in fig. 4, the method may include:
step S21, obtaining a fog image to be processed;
regarding the implementation process of step S21, reference may be made to the description of the corresponding part of step S11 in the above embodiment, and this embodiment is not described again.
Step S22, acquiring the image gradient change of each channel of the fog image to be processed;
the method for acquiring the image gradient change is not limited; the foggy image to be processed may be processed according to an image gradient algorithm to obtain the image gradient change of each channel. In this case, when the image gradient value is calculated, the color value of the corresponding channel of the image to be processed is used instead of the gray value, so as to obtain the image gradient changes corresponding to the three channels R, G, and B of the image to be processed. The specific calculation process is not described in detail in this application.
Step S23, obtaining average image gradient change of a plurality of channels of the fog image to be processed, and determining the object edge characteristic of the fog image to be processed according to the average image gradient change;
because the pixel value of each pixel point in the image to be processed is determined by the color values of the three RGB channels, after the image gradient changes of the three channels are determined, they may be averaged to represent the gradient change of the pixel values over the entire foggy image to be processed, and the object edge feature of the foggy image to be processed may be determined accordingly.
It should be noted that steps S22 and S23 are only one implementation of step S12, and the implementation is not limited to that described in this embodiment.
step S24, taking the edge characteristics of the object as a new channel of the to-be-processed foggy image to obtain the to-be-defogged image with a specific number of channels;
according to the method and the device, the object edge characteristics are added on the basis of the original color characteristics of the to-be-processed foggy image, so that the color characteristics of the image and the object edge characteristics can be comprehensively considered in the subsequent image defogging process, and the definition of the obtained defogged image is ensured.
Step S25, inputting the image to be defogged into a defogging model, and performing convolution operation on the image characteristics of each channel corresponding to the acquired image to be defogged through an attention mechanism to obtain the channel weight corresponding to the image characteristics of each channel;
step S26, performing convolution operation on the acquired pixel information of the image to be defogged through an attention mechanism to obtain pixel weights of different pixels contained in the image to be defogged;
in connection with the description of the defogging model in the above embodiment, the attention weights in steps S25 and S26 are obtained by the SE layer network structure. It should be understood that the SE layer network structure may not be the only structure of the defogging model; that is, the network structure of the defogging model may include other network layers or network structures, such as a convolution layer, a pooling layer, and the like. The calculation of the attention weights is therefore only one step in the defogging of the input image by the defogging model, and the execution order of this step is not limited. Only the operation process of the improved SE layer network structure adopted by the defogging model is described here; the other defogging processing steps of the defogging model are not described one by one.
Based on the above analysis, the calculation of the channel weights and pixel weights described above is only one processing step in the image defogging performed by the defogging model, and this step may be located at the start, in the middle, or at the end of the entire defogging process. The image to be defogged on which the convolution operation is performed is therefore the image input into the SE layer network structure: this may be the image to be defogged input into the defogging model itself, or the image output after that input has been processed by other network layers or network structures of the defogging model, depending on the specific network structure of the defogging model. The present application does not limit the direct source of the image to be defogged obtained in steps S25 and S26.
In combination with the above operation principle on the SE layer network structure, referring to the SE layer network structure schematic diagram shown in fig. 5, the SE layer network structure provided in this embodiment needs to implement attention weight calculation from different dimensions, which is described here by taking attention weight calculation of two dimensions of a channel and a pixel as an example, and the attention weight calculation processes for other dimensions are similar, and detailed description is not given in this application.
As shown in fig. 5, in the SE layer network structure, the serial numbers represent network layers as follows: the network layer corresponding to serial number ① may represent an average pooling layer, the network layer corresponding to serial number ② may represent a convolution layer, the network layer corresponding to serial number ③ may represent a ReLU (Rectified Linear Unit function) layer, that is, a linear activation layer, and the network layer corresponding to serial number ④ may represent a tanh layer, that is, a non-linear activation layer.
In addition, in the SE layer network structure, the pooling layer may reduce the size of the model, increase the computation speed, and improve the robustness of the extracted features. In this embodiment, average pooling is adopted for the feature extraction processing, but the invention is not limited to this pooling manner; likewise, the operation of the activation layers in the SE layer network structure may be determined by the selected activation functions and is not limited to those shown in fig. 5. The present application does not describe the operation process of each network layer constituting the SE layer network structure in detail.
With reference to the SE layer network structure shown in fig. 5, after the image to be defogged input into the defogging model is processed by its other network layers, more channels than the specific number may be obtained. For example, 4 channels are initially input into the defogging model, and an image to be defogged with 64 channels is obtained after processing; this 64-channel image is then input into the SE layer network structure, and the channel weights of the 64 channels may be calculated by the left-side network structure shown in fig. 5, that is, the correlation between the image features of the 64 channels and the image defogging task is determined, so as to enhance the image features with high correlation and suppress those with low correlation.
For example, if the input to the left-side network of the SE layer network structure is a 240 × 240 × 64 image to be defogged, step S25 may perform a convolution operation on the image features of the 64 channels, for example using a 1 × 1 convolution kernel, to obtain a channel weight for each channel; that is, the left-side network outputs 64 channel weight values, and each weight value represents the degree of correlation between the image feature of the corresponding channel and the defogging of the image.
Similarly, in calculating the pixel weights with the right-side network structure shown in fig. 5, the attention mechanism is also used to obtain the correlation of each pixel of the image to be defogged, as input into the right-side network, with the defogging process; that is, a pixel weight is obtained for each pixel, and the magnitude of the pixel weight indicates the correlation of the corresponding pixel with the defogging process.
And step S27, performing weighted operation on the image characteristics of each channel by using the channel weight, performing weighted operation on the pixels of the acquired image to be defogged by using the pixel weight, and performing defogging processing according to the obtained weighted operation result to obtain a fog-free image.
Following the above analysis, and referring to the SE layer network structure shown in fig. 5, after the left-side network outputs the channel weights, the weights may be multiplied by the corresponding channels of the image to be defogged input to the left-side network and then summed, that is, a weighting operation is performed, so as to enhance the image features of at least one channel useful for the defogging process in the obtained image to be defogged, thereby further improving the image defogging effect.
After the image features of the useful channels in the image to be defogged are enhanced by the left-side network of fig. 5, the resulting image to be defogged is input into the right-side network shown in fig. 5, which continues to enhance the pixel features useful for the image defogging process and suppress the pixel features that are not. This further improves the defogging effect and the definition of the processed image, and thus the definition of the fog-free image.
As described above, since the SE layer network structure proposed in the present application is part of the defogging model, the weighting calculation in step S27 is likewise one processing step in the defogging performed by the model. In the image output after this processing, the image features useful for defogging are enhanced and those that are not are suppressed; thereafter, the other network layers in the defogging model continue the defogging process on the image to obtain the desired fog-free image. The subsequent defogging of the image output by the SE layer network structure is not detailed in this application and may be determined according to the specific network structure of the defogging model.
In summary, for the image to be defogged that is input to the defogging model in this embodiment, in addition to the image features of the three RGB channels initially provided, the object edge feature is also extracted as the image feature of a fourth channel, so that the input to the defogging model includes at least four channels of image features.
In addition, when the image to be defogged is input into the defogging model for defogging, the channel weight of each channel and the pixel weight of each pixel of the image are obtained by convolution operations through an attention mechanism. Compared with the fully-connected-layer approach adopted in the prior art, which obtains only the channel weights of the channels, this not only compresses the network parameters but also improves the processing efficiency.
In some embodiments, the above defogging model of the present application may further include any combination of a depth separable convolution layer, a down-sampling network, and a residual network. Accordingly, as shown in fig. 6, the processing of the image to be defogged by the defogging model may further include any combination of the following steps:
step S31, expanding the image characteristics of each channel corresponding to the input image to be defogged through the depth separable convolution layer, and outputting the corresponding image characteristics after the channels are expanded;
the convolution layers in the defogging model of the present application may include some ordinary convolution layers and some depthwise separable convolution (Depthwise separable convolution) layers. Depthwise separable convolution is one kind of separable convolution and differs from spatially separable convolution: in deep learning, a depthwise separable convolution performs a spatial convolution independently on each channel, followed by a pointwise convolution across channels.
Referring to the comparison of different convolution layer structures shown in fig. 7, a depthwise separable convolution (such as the structure shown in the second row of fig. 7) decomposes a conventional convolution (such as the structure shown in the first row of fig. 7) into a depthwise convolution plus a 1 × 1 pointwise convolution (such as the structure shown in the third row of fig. 7). Based on this, the image to be defogged input into the defogging model may be input into the depthwise separable convolution layer for processing; by the principle of the convolution operation, this can increase the number of channels of the image to be defogged. For example, an input 240 × 240 × 4 image to be defogged may be processed by the depthwise separable convolution layer to obtain a 240 × 240 × 64 image, that is, the 4 input channels are expanded to 64 output channels, but the application is not limited to this. The specific implementation of step S31 may be determined according to the operation principle of the depthwise separable convolution, which is not described in detail in this embodiment.
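The parameter savings behind this decomposition, and the two stages themselves, can be sketched as follows; this is an illustrative numpy version with "valid" padding (the output shrinks by k − 1), and the 4→64 channel figures reuse the example values above:

```python
import numpy as np

# Parameter count: a standard 3x3 conv vs. depthwise(3x3) + pointwise(1x1).
k, c_in, c_out = 3, 4, 64
standard = k * k * c_in * c_out                  # 3*3*4*64 = 2304
separable = k * k * c_in + c_in * c_out          # 36 + 256  = 292
print(standard, separable)

rng = np.random.default_rng(2)
x = rng.random((10, 10, c_in))
dw = rng.standard_normal((k, k, c_in))           # one 3x3 filter per channel
pw = rng.standard_normal((c_in, c_out))          # 1x1 conv mixes the channels

# Depthwise stage: each channel is convolved with its own filter, kept separate.
H, W = x.shape[0] - k + 1, x.shape[1] - k + 1
depthwise = np.empty((H, W, c_in))
for i in range(H):
    for j in range(W):
        patch = x[i:i + k, j:j + k, :]           # (k, k, c_in)
        depthwise[i, j] = (patch * dw).sum(axis=(0, 1))

# Pointwise stage: the 1x1 conv expands 4 channels to 64.
pointwise = depthwise @ pw                       # (H, W, c_out)
print(pointwise.shape)  # (8, 8, 64)
```

Even at only 4 input channels the separable form needs roughly an eighth of the parameters; the gap widens as the channel counts grow.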
As described above for the convolution layers in the defogging model, the convolution processing of the input image to be defogged may include, in addition to the operation of the depthwise separable convolution layer, the operations of other types of convolution layers on the image input into them, such as the ordinary convolution layers listed above, hole convolution layers, and the like.
Here, hole convolution (Dilated Convolution) injects holes into the standard convolution map to increase the receptive field. Compared with ordinary convolution, hole convolution introduces a new convolution-layer parameter called the "expansion rate" (dilation rate), which defines the spacing between the values sampled by the convolution kernel when processing data. The present application does not detail the convolution operation of hole convolution on the image.
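The effect of the expansion rate on the receptive field follows a standard formula, k_eff = k + (k − 1)(d − 1); a quick check (the function name is illustrative):

```python
# Effective kernel size of a dilated ("hole") convolution: the expansion
# rate d spreads a k x k kernel over a larger window without adding weights.
def effective_kernel(k: int, d: int) -> int:
    return k + (k - 1) * (d - 1)

for d in (1, 2, 4):
    print(d, effective_kernel(3, d))  # 3 -> 3, 5, 9
```

So a 3 × 3 kernel with expansion rate 4 covers a 9 × 9 window while still using only 9 parameters.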
Combining the above analysis: compared with a traditional defogging model that uses only ordinary convolution layers, the depthwise separable convolution layer added in this embodiment compresses the network parameters for equivalent training, improving the model training efficiency and, at the same time, the image defogging processing efficiency.
Step S32, utilizing convolution kernels with different sizes contained in the downsampling network to process the image characteristics of each channel corresponding to the input image to be defogged, and outputting the image characteristics corresponding to the channels with the same number;
in a deep learning network, a sampling layer is realized using pooling-related techniques. Its purpose is to reduce the dimensionality of the features while retaining the effective information, avoiding overfitting to a certain extent, and also to provide a degree of invariance to rotation, translation, stretching, deformation, and the like. Common sampling modes include maximum-value sampling, average-value sampling, sum-area sampling, random-area sampling, and the like; common pooling modes are similar, such as maximum pooling, average pooling, and random pooling.
In this embodiment, the down-sampling network may perform dimension reduction on the pixel characteristics of the input image to be defogged, for example, after performing down-sampling on 240 × 240 (the unit of which may be a pixel) image to be defogged, 120 × 120 (the unit of which may be a pixel) image to be defogged is obtained, and the application is not limited with respect to the specific sampling mode adopted by the down-sampling network.
Compared with the traditional down-sampling network structure, which performs dimension-reduction processing on the input image using convolution kernels of a single size, the present application forms the down-sampling network structure from a plurality of convolution kernels of different sizes, realizing dimension reduction in the pixel dimension for each channel of the input image to be defogged and outputting an image with the same number of channels as the input.
In one possible implementation, consider the conventional down-sampling network structure shown in fig. 8a: the down-sampling structure in a conventional defogging model is generally formed by an ordinary convolution layer with a 3 × 3 convolution kernel and a step size of 2, with 64 input channels and 64 or 128 output channels. Fig. 8a is described only by taking the structure in which both the input and output channel numbers are 64 as an example; the input channels may be obtained by expansion in a convolution layer, and the specific process is not described in detail.
Research shows that the image output by the traditional down-sampling structure exhibits a certain distortion, reducing the accuracy and fidelity of the output image. To solve this problem, the present application adopts convolution kernels of different sizes and continuously changes the number of channels of the image, so as to keep more features in the image and improve its fidelity. Referring to fig. 8b, a schematic diagram of a down-sampling network structure suitable for the image defogging process of the present application: after an image to be defogged with 64 channels (i.e., the number of input feature maps) is input into the down-sampling network, the output of the first convolution layer with a 3 × 3 kernel is still 64 channels; a convolution layer with a 1 × 1 kernel then expands the 64 input channels to 192; the next convolution layer with a 3 × 3 kernel keeps the output at 192 channels; and finally a convolution layer with a 3 × 3 kernel reduces the number of channels from 192 back to 64, the same as the number of channels input into the down-sampling network.
Therefore, in this embodiment the down-sampling network is constructed from convolution kernels of different sizes, and different convolution kernels process the feature channels of the image differently; even convolution kernels of the same size may process the input channels differently, that is, output different numbers of channels. Through this variable-channel processing mode, the features in the image to be defogged can be extracted more comprehensively, the distortion of the obtained image is reduced, and the definition of the image output by the model is improved.
Based on the above analysis, the specific implementation process of step S32 may include:
the image features of each of a first number of channels are input into a first convolution kernel of a first size for operation, and image features corresponding to a second number of channels are output, where the second number is greater than the first number and the first number is greater than the specific number. Following the above example, the first number may be 64, and the first convolution kernel of the first size may be the second convolution kernel from the left in fig. 8b, that is, the 1 × 1 convolution kernel; in this case, the second number may be 192 and the specific number may be 4, and so on.
And inputting the image characteristics of the second quantity of channels into a second convolution kernel with a second size for operation, and outputting the image characteristics corresponding to a third quantity of channels respectively, wherein the third quantity is equal to the second quantity. As an example, the second convolution kernel of the second size may be the third convolution kernel on the left side of fig. 8b, and the second number and the third number may be 192, but are not limited thereto.
And inputting the image characteristics of the channels of the third quantity into a third convolution kernel with a second size for operation, and outputting the image characteristics corresponding to the channels of the fourth quantity respectively, wherein the fourth quantity is equal to the first quantity. As an example, the second-sized third convolution kernel of this step may be the fourth convolution kernel on the left side of fig. 8b, and the third number may be 192 and the fourth number may be 64, but is not limited thereto.
Before the image features are input into the first convolution kernel of the first size, as shown in fig. 8b, the image features of a fifth number of channels are input into a fourth convolution kernel of the second size for operation, and image features corresponding to a sixth number of channels are output. Here the fifth number and the sixth number are the same as the first number, and as shown in fig. 8b, all of them may be 64; in this example, the fourth convolution kernel of the second size may be the first convolution kernel from the left in fig. 8b, that is, the 3 × 3 convolution kernel.
The structure of the downsampling network proposed in the present application is not limited to the network structure shown in fig. 8b, and the sizes of the various convolution kernels included in the downsampling network and the values of the input channel and the output channel of each convolution layer are not limited, and can be flexibly adjusted according to actual requirements.
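The variable-channel sequence described above (3 × 3 at 64→64, then 1 × 1 at 64→192, then 3 × 3 at 192→192, then 3 × 3 at 192→64) can be verified with a few lines of shape bookkeeping; the layer tuples reuse the example values from the text, and strides and padding are left out as assumptions:

```python
# (kernel size, input channels, output channels) for each convolution layer
# of the variable-channel down-sampling stack, in order.
layers = [(3, 64, 64), (1, 64, 192), (3, 192, 192), (3, 192, 64)]

channels = 64  # channels fed into the down-sampling network
for k, c_in, c_out in layers:
    # Each layer must accept exactly what the previous layer produced.
    assert channels == c_in, "channel mismatch between consecutive layers"
    channels = c_out

print(channels)  # ends at 64, matching the input channel count
```

The expand-then-contract shape (64 → 192 → 64) is what lets the network touch more intermediate features before reducing back to the original channel count.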
In step S33, the input image features are smoothed using a plurality of smooth residual blocks included in the residual network, where each smooth residual block includes two hole convolution layers having different expansion values.
In a convolutional neural network, the residual network has the characteristic of being easy to optimize, and its accuracy can be improved by increasing the equivalent depth. It may be composed of a plurality of residual blocks connected by skip connections, which alleviates the vanishing-gradient problem caused by increasing depth in a deep neural network; the specific operation principle of the residual network is not detailed in this application.
In this embodiment, the residual network in the defogging model of the present application may be formed by 6 smooth residual blocks, and the structure of each smooth residual block may refer to the structure shown in fig. 9, where the serial numbers represent network layers as follows: the network layer corresponding to serial number ⑤ may represent a grouped convolution layer, the network layer corresponding to serial number ⑥ may represent a hole convolution layer, the network layer corresponding to serial number ⑦ may represent an instance normalization layer, and the network layer corresponding to serial number ⑧ may represent an SE layer (that is, the SE layer network structure above; for its composition, refer to the description of the corresponding part of the above embodiment).
Based on the structure of the smooth residual block shown in fig. 9, each smooth residual block in the residual network of the present application may be processed by two hole convolution layers, so as to increase the receptive field in image feature acquisition and compensate for the reduced receptive field caused by removing down-sampling. The two hole convolution layers of each smooth residual block have different expansion values and may be designed with staggered expansion values; for example, the expansion values of the two hole convolution layers of the 6 smooth residual blocks may be (2,3), (3,4), (4,5), and so on in sequence, but are not limited to these values.
It can be seen that, compared with a network structure formed by a plurality of smooth residual blocks with gradually increasing expansion values in a conventional residual network, such as (2,2), (4,4), and the like, designing the two hole convolution layers in each smooth residual block with the staggered expansion values described above makes the output image features smoother and reduces the grid effect, improving the definition of the obtained fog-free image. The detailed implementation of the residual operation is not described in this application.
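The receptive-field consequence of a dilation schedule is easy to compute: with stride 1, each stacked 3 × 3 hole convolution with expansion value d adds (k − 1)·d to the field. The sketch below compares a staggered schedule like the one above against a doubled (2,2), (4,4)-style schedule of the same depth; both schedules here are illustrative three-block examples, not the patent's full 6-block configuration:

```python
# Receptive field of a stack of dilated kxk convolutions at stride 1.
def receptive_field(dilations, k=3):
    return 1 + sum((k - 1) * d for d in dilations)

staggered = [2, 3, 3, 4, 4, 5]   # pairs (2,3), (3,4), (4,5)
doubled = [2, 2, 4, 4, 8, 8]     # conventional repeated/doubling pattern

print(receptive_field(staggered), receptive_field(doubled))
```

Both styles grow the field substantially; the staggered pairs avoid repeating the same expansion value back to back, which is what causes the sampling grid of consecutive layers to align and produce gridding artifacts.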
In some embodiments, the defogging model of the present application may, according to actual needs, include other network layers or network structures in addition to those described above, such as an SE fusion layer, a deconvolution layer, and the like. It should be understood that the network structure of the defogging model is formed by fusing the plurality of network layers and/or network structures listed above, rather than by splicing them sequentially in series; the present application does not limit the specific implementation of the fusion process or the specific structure of the defogging model after fusion.
The network structure of the SE fusion layer may be as shown in fig. 10, where each serial number in fig. 10 corresponds to the network layer below it, and the content represented by each serial number may refer to the description of the corresponding serial number above. As shown in fig. 10, the three parallel network structures on the right side are identical to each other and to the SE layer network structure, in which two convolution layers replace the two conventional fully connected layers, compressing the network parameters and improving processing efficiency.
The deconvolution layer may also be called a transposed convolution. Its operation principle is similar to that of a convolution layer but runs in the opposite direction: the forward propagation process of a convolution layer is the backward propagation process of a deconvolution layer, and the backward propagation process of a convolution layer is the forward propagation process of a deconvolution layer.
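The relationship between a stride-2 convolution and its matching transposed convolution can be shown concretely. This sketch assumes PyTorch; the kernel size, padding, and channel count are assumptions for illustration.

```python
import torch
import torch.nn as nn

# A stride-2 convolution halves the spatial size; the matching transposed
# convolution restores it (output_padding resolves the rounding ambiguity).
down = nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1)
up = nn.ConvTranspose2d(64, 64, kernel_size=3, stride=2,
                        padding=1, output_padding=1)

x = torch.randn(1, 64, 240, 240)
h = down(x)          # 240 -> 120
y = up(h)            # 120 -> 240
print(tuple(h.shape), tuple(y.shape))  # (1, 64, 120, 120) (1, 64, 240, 240)
```

This mirrors the 240 × 240 → 120 × 120 → 240 × 240 path described for fig. 11 below.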
Combining the descriptions of the network structure of the defogging model in the foregoing embodiments, the present application may obtain a network structure of the defogging model as shown in fig. 11, although it is not limited to the structure shown there. As shown in fig. 11, the network layer below sequence number ⑨ may represent a depth separable convolution layer, whose specific network structure and image processing process can be found in the corresponding parts of the foregoing embodiments; as an example, the number of channels of this layer's output image may be 64. The network layer below sequence number ⑩ may represent any smooth residual block in the residual network; as shown in fig. 11, the residual network may include 6 smooth residual blocks, and the input and output channels of each smooth residual block may both be 64, i.e. the dimension of the input and output image features may be 64. The network layer below the next sequence number may represent the SE fusion layer, whose network structure can be found in the description of the corresponding parts of the above embodiments, and the network layer below the sequence number after that may represent a deconvolution layer. For the content represented by the network layers below the other sequence numbers in the network structure of the defogging model shown in fig. 11, reference may be made to the descriptions of those sequence numbers above, which are not repeated in this application.
As shown in fig. 11, again taking an input image to be defogged with a feature size of 240 × 240 as an example, the downsampling process produces image features with a size of 120 × 120, and the subsequent upsampling process restores the 240 × 240 size, so that a final fog-free image of 240 × 240 is output. For the specific processing of the input image to be defogged by the defogging model, the network layers in fig. 11 may be applied in turn from left to right, and the specific processing can be determined from the operation principles of the corresponding network layers, which are not detailed in this application.
With reference to the network structure shown in fig. 11, while the defogging model processes the image to be defogged, the image features output by the downsampling network and the image features from before the downsampling network may be fused by channel-wise cascading, increasing the number of image features fed into the network connected after the downsampling network. That is, through a skip connection, the original-size features of the input image to be defogged may be added to the downsampled image features and then processed by the subsequent network layers, enriching the number of features extracted at the original image scale and improving the clarity of the resulting defogged image. The skip-connection locations can be determined from fig. 11 and are not detailed in this application.
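The channel-wise cascade fusion of the skip connection can be sketched as follows. This assumes PyTorch and that the two feature maps have matching spatial sizes at the fusion point (e.g. after the upsampling path); the channel counts are assumptions for illustration.

```python
import torch

# Original-resolution features carried around the down/up-sampling path
# are concatenated back along the channel axis before later layers.
orig_feats = torch.randn(1, 4, 240, 240)    # features before downsampling
path_feats = torch.randn(1, 64, 240, 240)   # features after the down/up path

fused = torch.cat([path_feats, orig_feats], dim=1)  # channel-wise cascade
print(tuple(fused.shape))  # (1, 68, 240, 240)
```

Concatenation (rather than addition) is what "increases the number of image features" fed into the next network, as the text describes.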
It should be noted that the network structure in the defogging model provided in the present application is not limited to the network structure shown in fig. 11, and may be appropriately adjusted according to actual needs, and the details of the present application are not described in detail.
Based on the above description of the network structure of the defogging model, a machine learning network model with the corresponding structure can be constructed in advance, and the defogging model can be obtained by iteratively training it on sample foggy images; the operations of the corresponding network layers during training can be found in the corresponding parts of the embodiments above. In general, the present application trains on 8 sample foggy images at a time, e.g. an 8 × 240 × 240 × 4 batch is input to the constructed machine learning network model at a time: 8 sample foggy images, each with a feature size of 240 × 240 and 4 channels. These parameters are not limiting, and the number of sample foggy images per training step and the training parameters may be chosen according to the actual situation. The specific training process of the defogging model is not detailed in this application.
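The batch layout just described can be written down explicitly. The channels-first (N × C × H × W) ordering below is an assumption about the framework convention, not something the text specifies.

```python
import numpy as np

# A training batch as described: 8 sample foggy images, each 240 x 240
# with 4 channels (RGB plus the edge-feature channel).
batch = np.zeros((8, 4, 240, 240), dtype=np.float32)  # N x C x H x W
print(batch.shape)   # (8, 4, 240, 240)
print(batch.nbytes)  # 8*4*240*240 * 4 bytes = 7372800
```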
Referring to fig. 12, which is a schematic structural diagram of an optional example of the image defogging processing device provided by the present application, the device may be applied to a computer device and, as shown in fig. 12, may include:
a to-be-processed foggy image acquisition module 21, configured to acquire a to-be-processed foggy image;
an object edge feature obtaining module 22, configured to obtain an object edge feature of the to-be-processed foggy image;
in some embodiments, the object edge feature obtaining module 22 may include:
the image gradient change unit is used for acquiring the image gradient change of each channel of the to-be-processed foggy image;
and the object edge feature acquisition unit is used for acquiring the average image gradient change of the image gradient changes of a plurality of channels of the to-be-processed foggy image, and determining the object edge feature of the to-be-processed foggy image according to the average image gradient change.
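The two units above can be sketched together. This is a minimal NumPy illustration of the idea, using simple finite differences as the gradient; the text does not prescribe a particular gradient operator, so that choice is an assumption.

```python
import numpy as np

def edge_feature(img: np.ndarray) -> np.ndarray:
    """Average the per-channel gradient magnitude of an H x W x C image
    to obtain a single H x W object-edge map."""
    grads = []
    for c in range(img.shape[2]):
        gy, gx = np.gradient(img[:, :, c].astype(np.float64))
        grads.append(np.hypot(gx, gy))   # gradient magnitude per channel
    return np.mean(grads, axis=0)        # average over the channels

rgb = np.random.rand(240, 240, 3)        # to-be-processed foggy image
edges = edge_feature(rgb)
hazy_4ch = np.dstack([rgb, edges])       # edge map appended as a 4th channel
print(hazy_4ch.shape)  # (240, 240, 4)
```

The stacked 4-channel result corresponds to the "image to be defogged with a specific number of channels" produced by module 23 below.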
A to-be-defogged image obtaining module 23, configured to use the object edge feature as a new channel of the to-be-processed foggy image to obtain to-be-defogged images with a specific number of channels;
the defogging processing module 24 is configured to input the image to be defogged into a defogging model for processing, and output a non-fog image corresponding to the image to be processed;
the defogging model is obtained by training, based on a machine learning network model, on sample foggy images having the specific number of channels; the machine learning network model has an SE (squeeze-and-excitation) layer network structure in which the layers are convolution layers, so as to compute attention weights for each of multiple dimensions of the input image to be defogged.
In some embodiments, the defogging processing module 24 may include:
and the attention weight acquiring unit is used for acquiring, by using the convolution layers in the SE layer network structure, the channel weight of the image feature of each channel of the image to be defogged input into the SE layer network structure, and the pixel weight of each pixel.
In one possible implementation manner, as shown in fig. 13, the attention weight obtaining unit may include:
the channel weight acquiring unit 241 is configured to perform convolution operation on the image features of each channel corresponding to the acquired image to be defogged through an attention mechanism to obtain a channel weight corresponding to the image feature of each channel, so as to perform weighting operation on the image features of each channel by using the channel weight;
the pixel weight obtaining unit 242 is configured to perform convolution operation on the obtained pixel information of the image to be defogged through an attention mechanism to obtain pixel weights of different pixels included in the image to be defogged, so as to perform weighting operation on the obtained pixel of the image to be defogged by using the pixel weights.
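The per-pixel branch of unit 242 can be sketched as below. This assumes PyTorch, and the single 1x1 convolution producing one weight map is an assumed minimal form of the pixel attention, not the patent's exact structure.

```python
import torch
import torch.nn as nn

class PixelAttention(nn.Module):
    """Spatial (per-pixel) attention: a 1x1 convolution maps the feature
    maps to a one-channel weight map in (0, 1) that scales every pixel."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):
        w = torch.sigmoid(self.conv(x))  # one weight per pixel position
        return x * w                     # broadcast across all channels

pa = PixelAttention(64)
x = torch.randn(1, 64, 60, 60)
print(tuple(pa(x).shape))  # (1, 64, 60, 60)
```

Together with the channel weights of unit 241 (see the SE sketch above the claims-free description), this realizes attention over both of the dimensions the text mentions.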
In some embodiments, if the machine learning network model further includes at least a plurality of combinations of a depth separable convolution layer, a downsampling network, and a residual network, as shown in fig. 14, the defogging processing module 24 may further include:
a feature extension unit 243, configured to perform extension processing on the image features of each channel corresponding to the input image to be defogged through the depth separable convolution layer, and output the image features corresponding to the extended channels;
a downsampling processing unit 244, configured to process, by using convolution kernels of different sizes included in the downsampling network, image features of each channel corresponding to the input image to be defogged, and output image features corresponding to the same number of channels;
in one possible implementation, the downsampling processing unit 244 may include:
the first processing unit is used for inputting the image features of the channels of the first number into a first convolution kernel with a first size for operation and outputting the image features corresponding to the channels of the second number respectively, wherein the second number is greater than the first number, and the first number is greater than the specific number;
the second processing unit is used for inputting the image characteristics of the second number of channels into a second convolution kernel with a second size for operation and outputting the image characteristics corresponding to a third number of channels, wherein the third number is equal to the second number;
and the third processing unit is used for inputting the image characteristics of the third number of channels into a third convolution kernel with a second size for operation and outputting the image characteristics corresponding to a fourth number of channels, wherein the fourth number is equal to the first number.
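The three processing units above can be sketched end to end. This assumes PyTorch; the concrete kernel sizes, strides, and channel counts ("first number" 32, "second number" 64) are assumptions chosen only to satisfy the stated relations (second > first, third = second, fourth = first).

```python
import torch
import torch.nn as nn

first, second = 32, 64   # "first number" < "second number" of channels
# Stage 1: larger (first-size) kernel, expands channels and downsamples.
stage1 = nn.Conv2d(first, second, kernel_size=5, stride=2, padding=2)
# Stages 2 and 3: smaller (second-size) kernels; stage 3 returns to the
# original ("fourth" = "first") channel count.
stage2 = nn.Conv2d(second, second, kernel_size=3, stride=1, padding=1)
stage3 = nn.Conv2d(second, first, kernel_size=3, stride=1, padding=1)

x = torch.randn(1, first, 240, 240)
y = stage3(stage2(stage1(x)))
print(tuple(y.shape))  # (1, 32, 120, 120)
```

Expanding the channels before shrinking them back is the "changing the feature channel" strategy the summary credits with retaining more image features during downsampling.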
A smoothing unit 245, configured to perform smoothing processing on the input image feature by using a plurality of smoothing residual blocks included in the residual network, where the two void convolution layers included in the smoothing residual blocks have different expansion values.
In still other embodiments, as shown in fig. 14, the defogging processing module 24 may further include:
a cascade fusion processing unit 246, configured to cascade and fuse the image features output by the downsampling network with the image features from before the downsampling network, so as to increase the number of image features input into the network connected to the downsampling network.
It should be noted that, various modules, units, and the like in the embodiments of the foregoing apparatuses may be stored in the memory as program modules, and the processor executes the program modules stored in the memory to implement corresponding functions, and for the functions implemented by the program modules and their combinations and the achieved technical effects, reference may be made to the description of corresponding parts in the embodiments of the foregoing methods, which is not described in detail in this embodiment.
An embodiment of the present application further provides a storage medium storing a computer program that can be called and executed by a processor to implement the steps of the image defogging processing method described in any of the above method embodiments; for the specific implementation process, reference may be made to the description of the corresponding method embodiment, which is not repeated here.
Referring to fig. 2, an embodiment of the present application further provides a computer device, and regarding the implementation of the composition structure and the function of the computer device, reference may be made to the description of the embodiment of the computer device, which is not described herein again.
In summary, after acquiring a foggy image, the computer device may process it into a to-be-processed foggy image in RGB format with a specific size, acquire the object edge features in it, and use those features as a new channel to obtain a 4-channel image to be defogged, which is then input into the defogging model with the network structure described above to obtain a fog-free image of the same size. As described for the network structure of the defogging model, replacing the two fully connected layers in the SE layer network structure with convolution layers compresses the network parameters and improves processing efficiency while computing attention weights over multiple dimensions of the image to be defogged; features more useful for image defogging are thereby promoted and less useful features suppressed, improving the accuracy of the model and hence the defogging effect and efficiency.
In addition, according to actual needs, at least some of the dilated convolutions in the defogging model can be replaced with depth separable convolutions to compress network parameters and further improve defogging efficiency. In the downsampling process, convolution kernels of different sizes are used and the feature channels are varied, which retains more of the image's features and improves the fidelity of the resulting image. The smooth residual blocks designed with staggered dilation values smooth the input image features, reducing the grid effect and improving image clarity.
Finally, it should be noted that, in the present specification, the embodiments are described in a progressive or parallel manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. The device and the computer equipment disclosed by the embodiment correspond to the method disclosed by the embodiment, so that the description is relatively simple, and the relevant points can be referred to the method part for description.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An image defogging processing method, comprising:
acquiring a to-be-processed foggy image;
acquiring the object edge characteristics of the to-be-processed foggy image;
taking the edge characteristics of the object as a new channel of the to-be-processed foggy image to obtain the to-be-defogged image with a specific number of channels;
inputting the image to be defogged into a defogging model for processing, and outputting a fog-free image corresponding to the image to be defogged;
the defogging model is obtained by training, based on a machine learning network model, on sample foggy images having the specific number of channels; the machine learning network model has an SE (squeeze-and-excitation) layer network structure in which the layers are convolution layers, so as to compute attention weights for each of multiple dimensions of the input image to be defogged.
2. The method of claim 1, the processing of the image to be dehazed by the dehazing model comprising:
and acquiring the channel weight of the image characteristics of each channel corresponding to the to-be-defogged image input into the SE layer network structure and the pixel weight of each pixel by using the convolution layer in the SE layer network structure.
3. The method according to claim 2, wherein the obtaining, by using a convolutional layer in the SE layer network structure, a channel weight of an image feature of each channel corresponding to the image to be defogged input to the SE network and a pixel weight of each pixel includes:
performing convolution operation on the image characteristics of each channel corresponding to the acquired image to be defogged through an attention mechanism to obtain channel weights corresponding to the image characteristics of each channel, and performing weighting operation on the image characteristics of each channel by using the channel weights;
performing convolution operation on the pixel information of the acquired image to be defogged through an attention mechanism to obtain pixel weights of different pixels contained in the image to be defogged, and performing weighting operation on the pixels of the acquired image to be defogged by utilizing the pixel weights.
4. The method according to any of claims 1 to 3, wherein the machine learning network model further comprises at least a plurality of combinations of depth separable convolutional layers, down-sampling networks, residual error networks, and the processing of the image to be defogged by the defogging model further comprises a plurality of combinations of:
expanding the image characteristics of each channel corresponding to the input image to be defogged through the depth separable convolution layer, and outputting the corresponding image characteristics after the channels are expanded;
processing the image characteristics of each channel corresponding to the input image to be defogged by utilizing convolution kernels with different sizes contained in the downsampling network, and outputting the image characteristics corresponding to the channels with the same number;
and smoothing the input image characteristics by utilizing a plurality of smooth residual blocks contained in the residual network, wherein the expansion values of two void convolution layers contained in the smooth residual blocks are different.
5. The method according to claim 4, wherein the processing, by using convolution kernels of different sizes included in the downsampling network, the image features of the channels corresponding to the input image to be defogged, and outputting the image features corresponding to the same number of channels respectively, includes:
inputting respective image features of a first number of channels into a first convolution kernel with a first size for operation, and outputting image features corresponding to a second number of channels respectively, wherein the second number is greater than the first number, and the first number is greater than the specific number;
inputting the respective image features of the second number of channels into a second convolution kernel with a second size for operation, and outputting image features corresponding to a third number of channels respectively, wherein the third number is equal to the second number;
and inputting the image features of the channels of the third number into a third convolution kernel with a second size for operation, and outputting the image features corresponding to the channels of a fourth number, wherein the fourth number is equal to the first number.
6. The method of claim 4, the processing of the image to be dehazed by the dehazing model further comprising:
and carrying out cascade fusion processing on the image features output by the down-sampling network and the image features before being input into the down-sampling network so as to increase the number of the image features input into a network connected with the down-sampling network.
7. The method of claim 1, the obtaining object edge features of the fog image to be processed, comprising:
acquiring the image gradient change of each channel of the to-be-processed foggy image;
and obtaining the average image gradient change of the image gradient changes of a plurality of channels of the to-be-processed foggy image, and determining the object edge characteristics of the to-be-processed foggy image according to the average image gradient change.
8. An image defogging processing device, the device comprising:
the to-be-processed foggy image acquisition module is used for acquiring a to-be-processed foggy image;
the object edge characteristic acquisition module is used for acquiring the object edge characteristics of the to-be-processed foggy image;
the image to be defogged obtaining module is used for taking the edge characteristics of the object as a new channel of the image to be defogged to obtain the image to be defogged with a specific number of channels;
the defogging processing module is used for inputting the image to be defogged into a defogging model for processing and outputting a fog-free image corresponding to the image to be processed;
the defogging model is obtained by training, based on a machine learning network model, on sample foggy images having the specific number of channels; the machine learning network model has an SE (squeeze-and-excitation) layer network structure in which the layers are convolution layers, so as to compute attention weights for each of multiple dimensions of the input image to be defogged.
9. The apparatus of claim 8, the defogging processing module comprising:
and the attention weight acquiring unit is used for acquiring the channel weight of the image characteristic of each channel corresponding to the image to be defogged which is input into the SE network and the pixel weight of each pixel by utilizing the convolution layer in the SE layer network structure.
10. A computer device, the computer device comprising:
a memory for storing a program for implementing the image defogging processing method according to any one of claims 1 to 7;
a processor for loading and executing the program stored in the memory to realize the steps of the image defogging processing method according to any one of claims 1 to 7.
CN202010243548.4A 2020-03-31 2020-03-31 Image defogging processing method and device and computer equipment Active CN111445418B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010243548.4A CN111445418B (en) 2020-03-31 2020-03-31 Image defogging processing method and device and computer equipment

Publications (2)

Publication Number Publication Date
CN111445418A true CN111445418A (en) 2020-07-24
CN111445418B CN111445418B (en) 2024-05-28

Family

ID=71650971

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010243548.4A Active CN111445418B (en) 2020-03-31 2020-03-31 Image defogging processing method and device and computer equipment

Country Status (1)

Country Link
CN (1) CN111445418B (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111932365A (en) * 2020-08-11 2020-11-13 武汉谦屹达管理咨询有限公司 Financial credit investigation system and method based on block chain
CN112132761A (en) * 2020-09-16 2020-12-25 厦门大学 Single image defogging method based on cyclic context aggregation network
CN112183360A (en) * 2020-09-29 2021-01-05 上海交通大学 Lightweight semantic segmentation method for high-resolution remote sensing image
CN112561819A (en) * 2020-12-17 2021-03-26 温州大学 Self-filtering image defogging algorithm based on self-supporting model
CN112614072A (en) * 2020-12-29 2021-04-06 北京航空航天大学合肥创新研究院 Image restoration method and device, image restoration equipment and storage medium
CN112651891A (en) * 2020-12-18 2021-04-13 贵州宇鹏科技有限责任公司 Image defogging method based on deep learning
CN113139922A (en) * 2021-05-31 2021-07-20 中国科学院长春光学精密机械与物理研究所 Image defogging method and defogging device
CN113344806A (en) * 2021-07-23 2021-09-03 中山大学 Image defogging method and system based on global feature fusion attention network
WO2022095253A1 (en) * 2020-11-04 2022-05-12 常州工学院 Method for removing cloud and haze on basis of depth channel sensing
CN115668272A (en) * 2020-12-24 2023-01-31 京东方科技集团股份有限公司 Image processing method and apparatus, computer readable storage medium
CN116129379A (en) * 2022-12-28 2023-05-16 国网安徽省电力有限公司芜湖供电公司 Lane line detection method in foggy environment
CN117151990A (en) * 2023-06-28 2023-12-01 西南石油大学 Image defogging method based on self-attention coding and decoding
CN117690128A (en) * 2024-02-04 2024-03-12 武汉互创联合科技有限公司 Embryo cell multi-core target detection system, method and computer readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130071043A1 (en) * 2011-09-08 2013-03-21 Fujitsu Limited Image defogging method and system
CN109447962A (en) * 2018-10-22 2019-03-08 天津工业大学 A kind of eye fundus image hard exudate lesion detection method based on convolutional neural networks
CN110097519A (en) * 2019-04-28 2019-08-06 暨南大学 Double supervision image defogging methods, system, medium and equipment based on deep learning
CN110880165A (en) * 2019-10-15 2020-03-13 杭州电子科技大学 Image defogging method based on contour and color feature fusion coding

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XIE Huaiqi et al.: "Video Human Behavior Recognition Based on Channel Attention Mechanism", Electronic Technology & Software Engineering *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111932365A (en) * 2020-08-11 2020-11-13 武汉谦屹达管理咨询有限公司 Financial credit investigation system and method based on block chain
CN112132761A (en) * 2020-09-16 2020-12-25 厦门大学 Single image defogging method based on cyclic context aggregation network
CN112132761B (en) * 2020-09-16 2023-07-14 厦门大学 Single image defogging method based on cyclic context aggregation network
CN112183360B (en) * 2020-09-29 2022-11-08 上海交通大学 Lightweight semantic segmentation method for high-resolution remote sensing image
CN112183360A (en) * 2020-09-29 2021-01-05 上海交通大学 Lightweight semantic segmentation method for high-resolution remote sensing image
WO2022095253A1 (en) * 2020-11-04 2022-05-12 常州工学院 Method for removing cloud and haze on basis of depth channel sensing
CN112561819A (en) * 2020-12-17 2021-03-26 温州大学 Self-filtering image defogging algorithm based on self-supporting model
CN112651891A (en) * 2020-12-18 2021-04-13 贵州宇鹏科技有限责任公司 Image defogging method based on deep learning
CN115668272A (en) * 2020-12-24 2023-01-31 京东方科技集团股份有限公司 Image processing method and apparatus, computer readable storage medium
CN112614072A (en) * 2020-12-29 2021-04-06 北京航空航天大学合肥创新研究院 Image restoration method and device, image restoration equipment and storage medium
CN113139922B (en) * 2021-05-31 2022-08-02 中国科学院长春光学精密机械与物理研究所 Image defogging method and defogging device
CN113139922A (en) * 2021-05-31 2021-07-20 中国科学院长春光学精密机械与物理研究所 Image defogging method and defogging device
CN113344806A (en) * 2021-07-23 2021-09-03 中山大学 Image defogging method and system based on global feature fusion attention network
CN116129379A (en) * 2022-12-28 2023-05-16 国网安徽省电力有限公司芜湖供电公司 Lane line detection method in foggy environment
CN116129379B (en) * 2022-12-28 2023-11-07 国网安徽省电力有限公司芜湖供电公司 Lane line detection method in foggy environment
CN117151990A (en) * 2023-06-28 2023-12-01 西南石油大学 Image defogging method based on self-attention coding and decoding
CN117151990B (en) * 2023-06-28 2024-03-22 西南石油大学 Image defogging method based on self-attention coding and decoding
CN117690128A (en) * 2024-02-04 2024-03-12 武汉互创联合科技有限公司 Embryo cell multi-core target detection system, method and computer readable storage medium
CN117690128B (en) * 2024-02-04 2024-05-03 武汉互创联合科技有限公司 Embryo cell multi-core target detection system, method and computer readable storage medium

Also Published As

Publication number Publication date
CN111445418B (en) 2024-05-28

Similar Documents

Publication Publication Date Title
CN111445418A (en) Image defogging method and device and computer equipment
CN109858461B (en) Method, device, equipment and storage medium for counting dense population
CN109271933B (en) Method for estimating three-dimensional human body posture based on video stream
CN112308200B (en) Searching method and device for neural network
CN109919032B (en) Video abnormal behavior detection method based on motion prediction
US20150030237A1 (en) Image restoration cascade
CN111340077B (en) Attention mechanism-based disparity map acquisition method and device
CN110532959B (en) Real-time violent behavior detection system based on two-channel three-dimensional convolutional neural network
CN112036381B (en) Visual tracking method, video monitoring method and terminal equipment
CN110809126A (en) Video frame interpolation method and system based on adaptive deformable convolution
CN108875931A (en) Neural metwork training and image processing method, device, system
CN113066034A (en) Face image restoration method and device, restoration model, medium and equipment
CN110634103A (en) Image demosaicing method based on generation of countermeasure network
JP2017068608A (en) Arithmetic unit, method and program
CN114140346A (en) Image processing method and device
Conde et al. Lens-to-lens bokeh effect transformation. NTIRE 2023 challenge report
CN115082306A (en) Image super-resolution method based on blueprint separable residual error network
JP2019197445A (en) Image recognition device, image recognition method, and program
CN113538402B (en) Crowd counting method and system based on density estimation
CN112633260B (en) Video motion classification method and device, readable storage medium and equipment
Zhang et al. A cross-scale framework for low-light image enhancement using spatial–spectral information
CN112686828A (en) Video denoising method, device, equipment and storage medium
Zhang et al. Iterative multi‐scale residual network for deblurring
CN112115786A (en) Monocular vision odometer method based on attention U-net
CN114612305A (en) Event-driven video super-resolution method based on stereogram modeling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant