CN115641584B - Foggy day image identification method and device

Foggy day image identification method and device

Info

Publication number
CN115641584B
Authority
CN
China
Prior art keywords
feature
convolution
map
attention
fusion
Prior art date
Legal status
Active
Application number
CN202211671845.4A
Other languages
Chinese (zh)
Other versions
CN115641584A (en)
Inventor
辛贵鹏
彭峰
吉鑫钰
吕小磊
杨旺
褚端峰
Current Assignee
Wuhan Shentu Zhihang Technology Co ltd
Original Assignee
Wuhan Shentu Zhihang Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Wuhan Shentu Zhihang Technology Co ltd
Priority to CN202211671845.4A
Publication of CN115641584A
Application granted
Publication of CN115641584B

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a foggy day image identification method and a foggy day image identification device, wherein the foggy day image identification method comprises the following steps: acquiring a foggy day image to be identified; inputting the foggy day image to be recognized into a foggy day image recognition model to obtain a recognition result; the foggy day image recognition model comprises a backbone network, a domain self-adaptive network, a neck network and a head network; a backbone network obtains a plurality of feature extraction graphs; the domain self-adaptive network obtains a plurality of domain label feature maps according to the plurality of feature extraction maps; the neck network comprises a multi-attention module, a multi-scale feature fusion module and a plurality of dynamic convolution modules, wherein the multi-attention module obtains an attention feature map according to a domain label feature map, the multi-scale feature fusion module extracts a plurality of features of the attention feature map or the domain label feature map and performs feature fusion to obtain a fusion feature map, and the dynamic convolution module performs dynamic convolution on the fusion feature map to obtain a convolution feature map; the head network is used for obtaining a recognition result according to the convolution characteristic graph. The invention improves the identification precision and the identification efficiency.

Description

Foggy day image identification method and device
Technical Field
The invention relates to the technical field of image recognition, in particular to a foggy day image recognition method and device.
Background
At present, most visual detection algorithms work well only in fog-free scenes. In a haze weather environment, the quality of images captured by a vehicle-mounted camera is seriously degraded, so that a trained detection model cannot efficiently identify the resulting low-quality foggy images. In order to eliminate the adverse effect of the foggy environment on detector performance and improve foggy day target detection, existing foggy day target detection methods can be divided into two main categories: one splits the foggy day target detection problem into two stages, first restoring the foggy image and then performing target detection on the defogged clean image; the other draws on transfer learning and treats foggy day target detection as a process of transferring from target detection in a normal weather environment to a foggy weather environment.
These two prior arts have the following technical problems: 1. In the traditional foggy day target detection method (the split method), the primary task is to restore the foggy image, so the quality of the defogged image directly affects the detection effect of the target detector. Moreover, as the scene keeps changing, the generalization capability of the model is poor, that is, robustness is weak, which leads to inaccurate identification results and low identification efficiency. 2. Current domain-adaptation-based methods treat the foggy image as a single overall feature distribution, so the domain classifier extracts features insufficiently; in addition, target detection datasets captured in real foggy environments are scarce and lack accurate annotation, which makes the identification result inaccurate.
Therefore, a foggy day image identification method and device are urgently needed to solve the technical problems in the prior art that the identification result for a foggy day image to be identified is inaccurate and the identification efficiency is low.
Disclosure of Invention
In view of the above, there is a need to provide a foggy day image identification method and device, so as to solve the technical problems in the prior art that the identification result for a foggy day image to be identified is inaccurate and the identification efficiency is low.
In one aspect, the invention provides a foggy day image identification method, which comprises the following steps:
acquiring a foggy day image to be identified;
inputting the foggy day image to be recognized into a well-trained foggy day image recognition model to obtain a recognition result;
the foggy day image recognition model comprises a backbone network, a domain self-adaptive network, a neck network and a head network; the backbone network is used for obtaining a plurality of characteristic extraction graphs according to the foggy day image to be identified; the domain self-adaptive network is used for obtaining a plurality of domain label feature maps carrying domain labels according to the plurality of feature extraction maps; the neck network comprises a multi-attention module, a multi-scale feature fusion module and a plurality of dynamic convolution modules, wherein the multi-attention module is used for obtaining a plurality of attention feature maps according to the plurality of domain label feature maps, the multi-scale feature fusion module is used for extracting a plurality of features of the plurality of attention feature maps or the plurality of domain label feature maps and fusing the features to obtain a fusion feature map, and the dynamic convolution module is used for dynamically convolving the fusion feature map to obtain a convolution feature map; the head network is used for obtaining the identification result according to the convolution feature map.
In some possible implementations, the plurality of feature extraction maps include a first feature extraction map, a second feature extraction map, and a third feature extraction map; the backbone network comprises an initial convolution attention module, a maximum pooling layer, a first backbone block, a second backbone block, a third backbone block and a fourth backbone block which are connected in sequence;
the foggy day image to be identified is sequentially processed by the initial convolution attention module, the maximum pooling layer, the first backbone block and the second backbone block to obtain the first feature extraction map;
the third backbone block is used for performing feature extraction on the first feature extraction map to obtain the second feature extraction map;
the fourth backbone block is used for performing feature extraction on the second feature extraction map to obtain the third feature extraction map.
In some possible implementations, the first, second, third, and fourth backbone blocks each include a first backbone unit and a plurality of second backbone units, the first backbone unit including a first convolution attention module, a first dynamic convolution layer, a first batch normalization layer, a second dynamic convolution layer, a second batch normalization layer, and a first activation function layer; the second backbone unit comprises a second convolution attention module, a third dynamic convolution layer, a third batch normalization layer and a second activation function layer;
the initial convolution attention module, the first convolution attention module and the second convolution attention module respectively comprise an attention dynamic convolution layer, an attention batch normalization layer and an attention activation function layer which are sequentially connected.
In some possible implementations, the domain adaptation network includes a first domain adaptation module corresponding to the first feature extraction map, a second domain adaptation module corresponding to the second feature extraction map, and a third domain adaptation module corresponding to the third feature extraction map; the multi-attention module includes a first multi-attention unit corresponding to the first feature extraction map and a second multi-attention unit corresponding to the second feature extraction map; the multi-scale feature fusion module comprises a first multi-scale feature fusion unit corresponding to the first feature extraction map, a second multi-scale feature fusion unit corresponding to the second feature extraction map and a third multi-scale feature fusion unit corresponding to the third feature extraction map;
the first domain adaptation module is used for determining a first domain label feature map of the first feature extraction map;
the second domain adaptation module is used for determining a second domain label feature map of the second feature extraction map;
the third domain adaptation module is used for determining a third domain label feature map of the third feature extraction map;
the first multiple attention unit is used for obtaining a first attention feature map according to the first domain label feature map;
the second multi-attention unit is used for obtaining a second attention feature map according to the second domain label feature map;
the first multi-scale feature fusion unit is used for extracting a plurality of features of the first attention feature map and performing feature fusion to obtain a first fusion feature map;
the second multi-scale feature fusion unit is used for extracting a plurality of features of the second attention feature map and performing feature fusion to obtain a second fusion feature map;
the third multi-scale feature fusion unit is used for extracting a plurality of features of the third domain label feature map and performing feature fusion to obtain a third fusion feature map.
In some possible implementations, the first and second multi-attention units each include a parallel channel attention submodule, a region attention submodule, and a joint convolution layer, the channel attention submodule including a global average pooling layer, a first channel fully-connected layer, a second channel fully-connected layer, a first channel activation function layer, a third channel fully-connected layer, and a second channel activation function layer; the region attention submodule comprises a first region convolution layer, a first region activation function layer, a second region convolution layer, a second region activation function layer, a third region convolution layer and a third region activation function layer;
the channel attention submodule is used for extracting channel attention features of the domain label feature maps;
the region attention submodule is used for extracting region attention features of the plurality of domain label feature maps;
and the joint convolution layer is used for carrying out convolution operation on the channel attention feature, the region attention feature and the domain label feature map to obtain the attention feature map.
In some possible implementations, the first multi-scale feature fusion unit, the second multi-scale feature fusion unit, and the third multi-scale feature fusion unit each include: a first scale feature extraction subunit, a second scale feature extraction subunit, a third scale feature extraction subunit, a fourth scale feature extraction subunit, a first multi-scale feature fusion layer, a multi-scale convolution layer, and a second multi-scale feature fusion layer;
the first scale feature extraction subunit, the second scale feature extraction subunit, the third scale feature extraction subunit, and the fourth scale feature extraction subunit are respectively configured to extract multi-scale features of the attention feature map or the domain label feature map, and correspondingly obtain a first scale feature map, a second scale feature map, a third scale feature map, and a fourth scale feature map;
the first multi-scale feature fusion layer is used for fusing the first scale feature map, the second scale feature map and the third scale feature map to obtain an initial feature fusion map;
the multi-scale convolution layer is used for performing convolution on the initial feature fusion graph to obtain a multi-scale convolution feature graph;
and the second multi-scale feature fusion layer is used for fusing the multi-scale convolution feature map and the fourth scale feature map to obtain the fused feature map.
In some possible implementations, the first domain adaptation module, the second domain adaptation module, and the third domain adaptation module each include a gradient inversion layer, a first domain adaptive convolution layer, a second domain adaptive convolution layer, and a domain classifier connected in sequence.
In some possible implementations, the dynamic convolution module includes a first dynamic convolution unit connected to the first multi-scale feature fusion unit, a second dynamic convolution unit connected to the second multi-scale feature fusion unit, a third dynamic convolution unit connected to the third multi-scale feature fusion unit, and a fourth dynamic convolution unit and a fifth dynamic convolution unit sequentially connected to the third dynamic convolution unit; the neck network further comprises a first upsampling layer connected with the third dynamic convolution unit and a second upsampling layer connected with the second dynamic convolution unit;
the third dynamic convolution unit is used for performing dynamic convolution on the third fusion feature map to obtain a third convolution feature map;
the first upsampling layer is used for upsampling the third convolution feature map to obtain a first upsampling feature map;
the second dynamic convolution unit is used for performing dynamic convolution on the second fusion feature map and the first upsampling feature map to obtain a second convolution feature map;
the second upsampling layer is used for upsampling the second convolution feature map to obtain a second upsampling feature map;
the first dynamic convolution unit is used for performing dynamic convolution on the first fusion feature map and the second upsampling feature map to obtain a first convolution feature map;
the fourth dynamic convolution unit is used for performing dynamic convolution on the third convolution feature map to obtain a fourth convolution feature map;
and the fifth dynamic convolution unit is used for performing dynamic convolution on the fourth convolution feature map to obtain a fifth convolution feature map.
In some possible implementations, the first dynamic convolution unit includes a dynamic global average pooling layer, a first dynamic fully-connected layer, a first dynamic activation function layer, a second dynamic fully-connected layer, a second dynamic activation function layer, three parallel weighted dynamic convolution layers, a dynamic joint convolution layer, a dynamic batch normalization layer, and a third dynamic activation function layer.
In another aspect, the invention further provides a foggy day image recognition device, which comprises:
the image acquisition unit is used for acquiring a foggy day image to be identified;
the image recognition unit is used for inputting the foggy day image to be recognized into a well-trained foggy day image recognition model to obtain a recognition result;
the foggy day image recognition model comprises a backbone network, a domain self-adaptive network, a neck network and a head network; the backbone network is used for obtaining a plurality of feature extraction maps according to the foggy day image to be identified; the domain self-adaptive network is used for obtaining a plurality of domain label feature maps carrying domain labels according to the plurality of feature extraction maps; the neck network comprises a multiple attention module, a multi-scale feature fusion module and a plurality of dynamic convolution modules, wherein the multiple attention module is used for obtaining a plurality of attention feature maps according to the plurality of domain label feature maps, the multi-scale feature fusion module is used for extracting a plurality of features of the plurality of attention feature maps or the plurality of domain label feature maps and carrying out feature fusion to obtain a fusion feature map, and the dynamic convolution module is used for carrying out dynamic convolution on the fusion feature map to obtain a convolution feature map; the head network is used for obtaining the identification result according to the convolution feature map.
The beneficial effects of adopting the above embodiments are as follows: in the foggy day image identification method, the neck network is arranged to include the dynamic convolution module; unlike traditional convolution, which is independent of the input and shares convolution kernel parameters across all samples, the dynamic convolution module can learn specific convolution kernel parameters for different types of targets in the image, improving the accuracy and precision of feature extraction, so that the detection accuracy of the identification result can be improved. Furthermore, the method abandons the cascade detection approach of first defogging the foggy image to be identified and then detecting, and instead uses an end-to-end model to identify the foggy image directly, which improves detection efficiency. Furthermore, the neck network includes the multi-scale feature fusion module, which further enhances the expression capability of shallow network features, so that the accuracy of the identification result can be further improved.
Moreover, the domain adaptive network provided by the invention obtains a plurality of domain label feature maps according to a plurality of feature extraction maps, realizes a domain adaptive method at a feature level, can solve the technical problems of shortage of real data sets in foggy days and no labels in the prior art, and reduces the domain deviation of normal weather environment and foggy day environment, thereby further improving the identification precision of an identification result.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flow chart of an embodiment of a foggy day image identification method provided by the present invention;
FIG. 2 is a schematic structural diagram of an embodiment of a foggy day image recognition model provided by the present invention;
fig. 3 is a schematic structural diagram of an embodiment of a first backbone unit provided in the present invention;
FIG. 4 is a schematic structural diagram of an initial convolution attention module, a first convolution attention module, and a second convolution attention module according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an embodiment of a first multi-attention unit and a second multi-attention unit provided in the present invention;
fig. 6 is a schematic structural diagram of an embodiment of a first multi-scale feature fusion unit, a second multi-scale feature fusion unit, and a third multi-scale feature fusion unit provided in the present invention;
FIG. 7 is a schematic structural diagram of an embodiment of a first domain adaptation module, a second domain adaptation module, and a third domain adaptation module provided in the present invention;
FIG. 8 is a schematic structural diagram of a first dynamic convolution unit according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of an embodiment of the foggy day image recognition device provided by the invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor systems and/or microcontroller systems.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The embodiment of the invention provides a foggy day image identification method and a foggy day image identification device, which are respectively explained below.
Before the embodiments are described, the applicable scenario of the embodiments of the invention is introduced: the embodiments of the invention target three classes of objects in a dynamic traffic environment, namely motor vehicles, non-motor vehicles and pedestrians, which are the objects to be identified.
Fig. 1 is a schematic flow diagram of an embodiment of a foggy day image recognition method provided by the present invention, and fig. 2 is a schematic structural diagram of an embodiment of a foggy day image recognition model provided by the present invention, as shown in fig. 1 and fig. 2, the foggy day image recognition method includes:
s101, obtaining an image of the foggy day to be identified;
s102, inputting the foggy day image to be recognized into a well-trained foggy day image recognition model to obtain a recognition result;
the foggy day image recognition model comprises a backbone network, a domain self-adaptive network, a neck network and a head network; the backbone network is used for obtaining a plurality of feature extraction maps according to the foggy day image to be identified; the domain self-adaptive network is used for obtaining a plurality of domain label feature maps carrying domain labels according to the plurality of feature extraction maps; the neck network comprises a multi-attention module, a multi-scale feature fusion module and a plurality of dynamic convolution modules, wherein the multi-attention module is used for obtaining a plurality of attention feature maps according to the plurality of domain label feature maps, the multi-scale feature fusion module is used for extracting a plurality of features of the plurality of attention feature maps or the plurality of domain label feature maps and carrying out feature fusion to obtain a fusion feature map, and the dynamic convolution module is used for carrying out dynamic convolution on the fusion feature map to obtain a convolution feature map; the head network is used for obtaining a recognition result according to the convolution feature map.
Compared with the prior art, in the foggy day image identification method provided by the embodiment of the invention, the neck network includes the dynamic convolution module; unlike traditional convolution, which is independent of the input and shares convolution kernel parameters across all samples, the dynamic convolution module can learn specific convolution kernel parameters for different types of targets in the image, improving the accuracy and precision of feature extraction, so that the detection precision of the identification result can be improved. Furthermore, the embodiment of the invention abandons the cascade detection approach of first defogging the foggy image to be identified and then detecting, and instead uses an end-to-end model to identify the foggy image directly, thereby improving detection efficiency. Furthermore, the neck network includes the multi-attention module, which screens out key features while reducing the amount of calculation and suppresses background noise, and the multi-scale feature fusion module, which further enhances the expression capability of shallow network features, so that the accuracy of the identification result can be further improved.
In addition, the domain adaptive network provided by the embodiment of the invention obtains a plurality of domain label feature maps according to a plurality of feature extraction maps, realizes a domain adaptive method at a feature level, can solve the technical problems of shortage of real data sets in foggy days and no labels in the prior art, reduces the domain deviation of normal weather environments and foggy day environments, and further improves the identification precision of identification results.
It should be understood that: the foggy day image to be identified in step S101 may be acquired by an image acquisition device, or a historically stored foggy day image to be identified may be retrieved from a storage medium.
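To make the end-to-end data flow concrete before the individual networks are detailed, the following is a minimal PyTorch sketch of the model skeleton. The class name and constructor arguments are illustrative assumptions, as is treating the domain branch as auxiliary (a gradient-reversal branch forwards features unchanged); this is a reading of the described architecture, not the patented implementation itself.

```python
import torch.nn as nn

class FoggyDayRecognitionModel(nn.Module):
    """Sketch of the fig. 2 data flow: backbone -> (domain branch, neck) -> head.
    `backbone`, `domain_modules`, `neck`, and `head` are placeholders for the
    networks detailed in the remainder of this description."""
    def __init__(self, backbone, domain_modules, neck, head):
        super().__init__()
        self.backbone = backbone
        self.domain_modules = nn.ModuleList(domain_modules)
        self.neck = neck
        self.head = head

    def forward(self, foggy_image):
        f1, f2, f3 = self.backbone(foggy_image)  # three feature extraction maps
        # Per-level domain-label predictions (consumed by the training loss;
        # the gradient reversal inside each module only affects gradients)
        domain_preds = [m(f) for m, f in zip(self.domain_modules, (f1, f2, f3))]
        conv_maps = self.neck(f1, f2, f3)        # five convolution feature maps
        return self.head(conv_maps), domain_preds
```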
In some embodiments of the present invention, the plurality of feature extraction maps includes a first feature extraction map, a second feature extraction map, and a third feature extraction map; as shown in fig. 2, the backbone network includes an initial convolution attention module, a maximum pooling layer, a first backbone block, a second backbone block, a third backbone block, and a fourth backbone block, which are connected in sequence; the first backbone block, the second backbone block, the third backbone block and the fourth backbone block each comprise a first backbone unit and a plurality of second backbone units;
the foggy day image to be identified is sequentially processed by an initial convolution attention module, a maximum pooling layer, the first backbone block and the second backbone block to obtain the first feature extraction map;
the third backbone block is used for performing feature extraction on the first feature extraction map to obtain the second feature extraction map;
and the fourth backbone block is used for performing feature extraction on the second feature extraction map to obtain the third feature extraction map.
In a specific embodiment of the present invention, the first backbone block comprises two second backbone units, the second backbone block comprises three second backbone units, the third backbone block comprises five second backbone units, and the fourth backbone block comprises two second backbone units.
In some embodiments of the present invention, as shown in fig. 3, the first backbone unit includes a first convolution attention module, a first dynamic convolution layer, a first batch normalization layer, a second dynamic convolution layer, a second batch normalization layer, and a first activation function layer. The second backbone unit comprises a second convolution attention module, a third dynamic convolution layer, a third batch normalization layer and a second activation function layer.
Specifically, as shown in fig. 4, the initial convolution attention module, the first convolution attention module, and the second convolution attention module each include an attention dynamic convolution layer, an attention batch normalization layer, and an attention activation function layer, which are connected in sequence.
The activation function of the first activation function layer, the second activation function layer and the attention activation function layer is any one of Sigmoid, Tanh, ReLU, LeakyReLU, ELU, Maxout and the like.
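As an illustration of the backbone building blocks just described, the following is a minimal PyTorch sketch of the convolution attention module (fig. 4) and the first backbone unit (fig. 3). Plain nn.Conv2d layers stand in for the patent's dynamic convolutions, and the channel counts, stride, and ReLU choice are assumptions.

```python
import torch.nn as nn

class ConvAttentionModule(nn.Module):
    # attention dynamic convolution -> attention batch normalization ->
    # attention activation function (fig. 4); a plain Conv2d keeps the
    # sketch runnable in place of the dynamic convolution
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class FirstBackboneUnit(nn.Module):
    # first convolution attention module -> dynamic conv -> batch norm ->
    # dynamic conv -> batch norm -> activation (fig. 3)
    def __init__(self, in_ch, out_ch, stride=2):
        super().__init__()
        self.cam = ConvAttentionModule(in_ch, in_ch)
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU()

    def forward(self, x):
        x = self.bn1(self.conv1(self.cam(x)))
        return self.act(self.bn2(self.conv2(x)))
```

The second backbone unit follows the same pattern with a single dynamic convolution, batch normalization, and activation after its convolution attention module.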
In some embodiments of the present invention, as shown in FIG. 2, the domain adaptation network includes a first domain adaptation module corresponding to the first feature extraction map, a second domain adaptation module corresponding to the second feature extraction map, and a third domain adaptation module corresponding to the third feature extraction map; the multi-attention module comprises a first multi-attention unit corresponding to the first feature extraction diagram and a second multi-attention unit corresponding to the second feature extraction diagram; the multi-scale feature fusion module comprises a first multi-scale feature fusion unit corresponding to the first feature extraction graph, a second multi-scale feature fusion unit corresponding to the second feature extraction graph and a third multi-scale feature fusion unit corresponding to the third feature extraction graph;
the first domain self-adapting module is used for determining a first domain label feature map of the first feature extraction map;
the second domain self-adapting module is used for determining a second domain label feature map of the second feature extraction map;
the third domain self-adaptive module is used for determining a third domain label feature map of a third feature extraction map;
the first multi-attention unit is used for obtaining a first attention feature map according to the first domain label feature map;
the second multi-attention unit is used for obtaining a second attention feature map according to the second domain label feature map;
the first multi-scale feature fusion unit is used for extracting a plurality of features of the first attention feature map and performing feature fusion to obtain a first fusion feature map;
the second multi-scale feature fusion unit is used for extracting a plurality of features of the second attention feature map and performing feature fusion to obtain a second fusion feature map;
and the third multi-scale feature fusion unit is used for extracting a plurality of features of the third domain label feature map and performing feature fusion to obtain a third fusion feature map.
In some embodiments of the present invention, as shown in fig. 5, each of the first multi-attention unit and the second multi-attention unit includes a parallel channel attention submodule, a region attention submodule, and a joint convolution layer, where the channel attention submodule includes a global average pooling layer, a first channel fully-connected layer, a second channel fully-connected layer, a first channel activation function layer, a third channel fully-connected layer, and a second channel activation function layer; the region attention submodule includes a first region convolution layer, a first region activation function layer, a second region convolution layer, a second region activation function layer, a third region convolution layer and a third region activation function layer;
the channel attention submodule is used for extracting channel attention features of the domain label feature maps;
the region attention submodule is used for extracting region attention features of the plurality of domain label feature maps;
and the joint convolution layer is used for carrying out convolution operation on the channel attention feature, the region attention feature and the domain label feature map to obtain an attention feature map.
According to the embodiment of the invention, by arranging the parallel channel attention submodule and the region attention submodule, key channel information can be extracted by the channel attention submodule, and spatial noise information can be suppressed by the region attention submodule. Then, the joint convolution layer performs a convolution operation on the channel attention feature, the region attention feature and the domain label feature map and re-weights them to obtain the attention feature map, which amplifies key feature information to the maximum extent while suppressing background noise. Therefore, the accuracy of the extracted attention feature map is improved, and the accuracy of the recognition result can be further improved.
Specifically, the activation function of the first channel activation function layer, the first region activation function layer, and the second region activation function layer is ReLU, and the activation function of the second channel activation function layer and the third region activation function layer is Sigmoid.
In the embodiment of the invention, the channel attention submodule first uses the global average pooling layer to average all pixel values of each channel feature map, generating a pooled feature map, and then generates a one-dimensional channel attention feature through the first channel fully-connected layer, the second channel fully-connected layer, the first channel activation function layer, the third channel fully-connected layer and the second channel activation function layer. The region attention submodule first compresses the channels with a first region convolution layer whose convolution kernel size is 1 × 1, reducing the channel dimension and the amount of computation, and then extracts the spatial information of key regions with the second and third region convolution layers, whose convolution kernel size is 3 × 3, together with the first, second and third region activation function layers; a 3 × 3 convolution kernel reduces the amount of calculation while preserving the receptive field.
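The following PyTorch sketch illustrates the multi-attention unit just described. The reduction ratio, the multiplicative re-weighting before the joint convolution, and the class name are assumptions layered on the structure of fig. 5.

```python
import torch.nn as nn

class MultiAttentionUnit(nn.Module):
    """Parallel channel and region attention followed by a joint convolution
    that re-weights and merges them with the input feature map (fig. 5)."""
    def __init__(self, ch, reduction=16):
        super().__init__()
        # Channel attention: GAP -> FC -> FC -> ReLU -> FC -> Sigmoid
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.channel_fc = nn.Sequential(
            nn.Linear(ch, ch // reduction),
            nn.Linear(ch // reduction, ch // reduction),
            nn.ReLU(),
            nn.Linear(ch // reduction, ch),
            nn.Sigmoid(),
        )
        # Region attention: 1x1 conv compresses channels, two 3x3 convs
        # extract the spatial information of key regions
        self.region = nn.Sequential(
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(),
            nn.Conv2d(ch // reduction, ch // reduction, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch // reduction, 1, 3, padding=1), nn.Sigmoid(),
        )
        # Joint convolution merges the re-weighted feature map
        self.joint = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        b, c, _, _ = x.shape
        w_ch = self.channel_fc(self.gap(x).view(b, c)).view(b, c, 1, 1)
        w_sp = self.region(x)               # (B, 1, H, W) spatial weights
        return self.joint(x * w_ch * w_sp)  # re-weighted attention feature map
```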
In some embodiments of the present invention, as shown in fig. 6, the first multi-scale feature fusion unit, the second multi-scale feature fusion unit, and the third multi-scale feature fusion unit each include: the device comprises a first scale feature extraction subunit, a second scale feature extraction subunit, a third scale feature extraction subunit, a fourth scale feature extraction subunit, a first multi-scale feature fusion layer, a multi-scale convolutional layer and a second multi-scale feature fusion layer;
the first scale feature extraction subunit, the second scale feature extraction subunit, the third scale feature extraction subunit and the fourth scale feature extraction subunit are respectively used for extracting multi-scale features of the attention feature map or the domain label feature map, and correspondingly obtaining a first scale feature map, a second scale feature map, a third scale feature map and a fourth scale feature map;
the first multi-scale feature fusion layer is used for fusing the first scale feature map, the second scale feature map and the third scale feature map to obtain an initial feature fusion map;
the multi-scale convolution layer is used for performing convolution on the initial feature fusion graph to obtain a multi-scale convolution feature graph;
and the second multi-scale feature fusion layer is used for fusing the multi-scale convolution feature map and the fourth scale feature map to obtain a fusion feature map.
According to the embodiment of the invention, by arranging the four branches of the first scale feature extraction subunit, the second scale feature extraction subunit, the third scale feature extraction subunit and the fourth scale feature extraction subunit, the multi-scale receptive field features can be fused, and the semantic information and the texture features in the shallow feature map are fully utilized, so that the shallow feature map is efficiently utilized. Therefore, the aim of utilizing the shallow feature map to the maximum extent and further enhancing the information in the shallow feature can be achieved, the information of the small target is prevented from being covered by noise, and the accuracy of feature extraction is further improved.
In a specific embodiment of the present invention, as shown in fig. 6, the first scale feature extraction subunit includes three first scale convolution layers with convolution kernel sizes of 1 × 1, 3 × 3, and 1 × 1, respectively; the second scale feature extraction subunit comprises four second scale convolution layers with convolution kernel sizes of 1 × 1, 3 × 3, 5 × 5 and 1 × 1 respectively; the third scale feature extraction subunit comprises an average pooling layer with the size of 3 multiplied by 3 and a third scale convolution layer with the convolution kernel size of 1 multiplied by 1; the fourth scale feature extraction subunit comprises a fourth scale convolution layer with the convolution kernel size of 1 × 1.
The embodiment of the invention first merges the multi-scale feature maps output by the 3 × 3 convolution, 5 × 5 convolution and 3 × 3 pooling branches. Second, the channels are compressed using a 1 × 1 convolution. Parallel convolution kernels of different sizes capture target features under different receptive fields, so multi-scale receptive field features are fused and the shallow feature map is used efficiently. Finally, the output is summed with the feature map produced by the 1 × 1 convolution branch to generate the multi-scale fused feature map. This strengthens the feature expression of smaller targets, making the foggy day image recognition model more sensitive to them and improving its recognition precision, thereby improving the recognition precision of the foggy day image identification method.
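A minimal PyTorch sketch of the four-branch fusion just described follows; the kernel sizes mirror fig. 6, while the internal branch widths and class name are assumptions.

```python
import torch
import torch.nn as nn

def conv(in_ch, out_ch, k):
    # same-padding convolution helper
    return nn.Conv2d(in_ch, out_ch, k, padding=k // 2)

class MultiScaleFeatureFusion(nn.Module):
    """Four parallel scale branches, concat-fusion of branches 1-3,
    1x1 compression, then summation with the 1x1 shortcut branch."""
    def __init__(self, ch):
        super().__init__()
        # Branch 1: 1x1 -> 3x3 -> 1x1
        self.b1 = nn.Sequential(conv(ch, ch // 2, 1), conv(ch // 2, ch // 2, 3),
                                conv(ch // 2, ch, 1))
        # Branch 2: 1x1 -> 3x3 -> 5x5 -> 1x1
        self.b2 = nn.Sequential(conv(ch, ch // 2, 1), conv(ch // 2, ch // 2, 3),
                                conv(ch // 2, ch // 2, 5), conv(ch // 2, ch, 1))
        # Branch 3: 3x3 average pooling -> 1x1
        self.b3 = nn.Sequential(nn.AvgPool2d(3, stride=1, padding=1),
                                conv(ch, ch, 1))
        # Branch 4: 1x1 shortcut
        self.b4 = conv(ch, ch, 1)
        # First fusion (concat) -> multi-scale 1x1 convolution
        self.fuse_conv = conv(3 * ch, ch, 1)

    def forward(self, x):
        fused = torch.cat([self.b1(x), self.b2(x), self.b3(x)], dim=1)
        # Second fusion: sum with the 1x1 shortcut branch
        return self.fuse_conv(fused) + self.b4(x)
```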
In some embodiments of the present invention, as shown in fig. 7, the first domain adaptation module, the second domain adaptation module, and the third domain adaptation module each include a gradient inversion layer, a first domain adaptive convolution layer, a second domain adaptive convolution layer, and a domain classifier, which are connected in sequence.
The gradient inversion layer passes values through unchanged during forward propagation of network training and reverses the sign of the gradient during backward propagation; its function is to maximize the loss of the domain adaptation module. The first domain adaptive convolution layer, the second domain adaptive convolution layer and the domain classifier are used to predict the class probability that a feature extraction map belongs to a certain domain, namely the domain label.
According to the embodiment of the invention, by arranging the first domain self-adaptive convolution layer and the second domain self-adaptive convolution layer, the extracted high-level features can cover more useful information after the feature extraction graph is subjected to convolution operation, and meanwhile, the differences of image styles, positions or illumination conditions and the like between the source domain (images in normal weather) and the target domain (images in fog days) can be greatly reduced, so that the individual target differences can be amplified, the common features between domains can be extracted, and the domain classifier is more accurate.
In an embodiment of the present invention, the depth of the feature extraction map is reduced to 128 after the first domain adaptive convolution layer, and the number of channels of the feature extraction map is reduced to 1 after the second domain adaptive convolution layer.
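A minimal PyTorch sketch of the domain adaptation module follows. The gradient inversion layer and the two channel-reducing convolutions (to 128 channels, then to 1) follow the description above; the Sigmoid domain classifier and the scaling factor lamb are assumptions.

```python
import torch
import torch.nn as nn

class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; negates (and optionally scales)
    gradients in the backward pass, maximizing the domain loss."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lamb * grad_out, None

class DomainAdaptationModule(nn.Module):
    """GRL -> conv to 128 channels -> conv to 1 channel -> domain
    classifier, per fig. 7."""
    def __init__(self, in_ch, lamb=1.0):
        super().__init__()
        self.lamb = lamb
        self.conv1 = nn.Conv2d(in_ch, 128, 1)  # depth reduced to 128
        self.conv2 = nn.Conv2d(128, 1, 1)      # channels reduced to 1
        self.classifier = nn.Sigmoid()         # probability of the foggy domain

    def forward(self, x):
        x = GradientReversal.apply(x, self.lamb)
        return self.classifier(self.conv2(self.conv1(x)))
```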
In some embodiments of the present invention, as shown in fig. 2, the dynamic convolution module includes a first dynamic convolution unit connected to the first multi-scale feature fusion unit, a second dynamic convolution unit connected to the second multi-scale feature fusion unit, a third dynamic convolution unit connected to the third multi-scale feature fusion unit, and a fourth dynamic convolution unit and a fifth dynamic convolution unit sequentially connected to the third dynamic convolution unit; the neck network further comprises a first upsampling layer connected with the third dynamic convolution unit and a second upsampling layer connected with the second dynamic convolution unit;
the third dynamic convolution unit is used for performing dynamic convolution on the third fusion feature map to obtain a third convolution feature map;
the first upsampling layer is used for upsampling the third convolution feature map to obtain a first upsampling feature map;
the second dynamic convolution unit is used for performing dynamic convolution on the second fusion feature map and the first upsampling feature map to obtain a second convolution feature map;
the second upsampling layer is used for upsampling the second convolution feature map to obtain a second upsampling feature map;
the first dynamic convolution unit is used for performing dynamic convolution on the first fusion feature map and the second upsampling feature map to obtain a first convolution feature map;
the fourth dynamic convolution unit is used for performing dynamic convolution on the third convolution feature map to obtain a fourth convolution feature map;
and the fifth dynamic convolution unit is used for performing dynamic convolution on the fourth convolution feature map to obtain a fifth convolution feature map.
According to the embodiment of the invention, by arranging the first upsampling layer and the second upsampling layer, the weight of small targets in the convolution feature maps can be further increased and their feature expression strengthened, which further improves the sensitivity of the foggy day image recognition model to small targets and thus the recognition precision of the foggy day image identification method.
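The neck data flow described above can be sketched as follows. Merging the upsampled map with the fused map by element-wise addition is an assumption, and a plain convolution stands in for the dynamic convolution unit (itself sketched after the routing equations below).

```python
import torch.nn as nn
import torch.nn.functional as F

def dyn_conv_unit(ch):
    # stand-in for the dynamic convolution unit; a plain 3x3 convolution
    # keeps this example runnable
    return nn.Conv2d(ch, ch, 3, padding=1)

class Neck(nn.Module):
    """Fig. 2 neck: three fused feature maps in, five convolution feature
    maps out. Assumes each fused map has twice the spatial size of the next;
    any stride inside the fourth/fifth units is omitted for brevity."""
    def __init__(self, ch):
        super().__init__()
        self.dc1 = dyn_conv_unit(ch)
        self.dc2 = dyn_conv_unit(ch)
        self.dc3 = dyn_conv_unit(ch)
        self.dc4 = dyn_conv_unit(ch)
        self.dc5 = dyn_conv_unit(ch)

    def forward(self, fuse1, fuse2, fuse3):
        c3 = self.dc3(fuse3)                        # third convolution map
        up1 = F.interpolate(c3, scale_factor=2)     # first upsampling layer
        c2 = self.dc2(fuse2 + up1)                  # second convolution map
        up2 = F.interpolate(c2, scale_factor=2)     # second upsampling layer
        c1 = self.dc1(fuse1 + up2)                  # first convolution map
        c4 = self.dc4(c3)                           # fourth convolution map
        c5 = self.dc5(c4)                           # fifth convolution map
        return c1, c2, c3, c4, c5
```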
In some embodiments of the present invention, as shown in fig. 8, the first dynamic convolution unit includes a dynamic global average pooling layer, a first dynamic fully-connected layer, a first dynamic activation function layer, a second dynamic fully-connected layer, a second dynamic activation function layer, three parallel weighted dynamic convolution layers, a dynamic joint convolution layer, a dynamic batch normalization layer, and a third dynamic activation function layer.
The three weighted dynamic convolution layers respectively represent convolution layers for the three categories of motor vehicles, non-motor vehicles and pedestrians.
The activation function of the first dynamic activation function layer and the third dynamic activation function layer is ReLU, and the activation function of the second dynamic activation function layer is Sigmoid.
The basic assumption of conventional convolution is that the convolution kernel parameters are shared by all samples: once trained, the kernel parameters are fixed and treat all input samples alike. Taking the three major research categories of motor vehicles, non-motor vehicles and pedestrians, a traditional convolution kernel acts as a single generalist expert Conv across the three fields of motor vehicle (x1), non-motor vehicle (x2) and pedestrian (x3) recognition, and for any input xi the output satisfies: output = Conv(xi). Consequently, when receiving features of objects from different classes in the upper layer, this "absolute fairness" causes the inter-class differences between classes whose features originally differed greatly to shrink sharply once represented by the same trained kernel parameters. The weighted dynamic convolution layers in the embodiment of the invention replace the single generalist expert with the most authoritative expert in each respective field: motor vehicle recognition Conv1, non-motor vehicle recognition Conv2, and pedestrian recognition Conv3. In the first dynamic convolution unit, the convolution kernel of each weighted dynamic convolution layer has the same dimensions as a conventional static convolution kernel, but its parameters are obtained by a transformation of the input. The process can be described as:
$$\mathrm{Output}=\mathrm{Conv}_{\mathrm{joint}}\left(\mathrm{Concat}\left(W_{1}\cdot \mathrm{Conv}_{1}(x),\ W_{2}\cdot \mathrm{Conv}_{2}(x),\ W_{3}\cdot \mathrm{Conv}_{3}(x)\right)\right)$$
where W_1, W_2 and W_3 respectively denote the weights of the motor vehicle recognition expert Conv_1, the non-motor vehicle recognition expert Conv_2 and the pedestrian recognition expert Conv_3, and Concat denotes the channel-wise superposition of the three expert outputs.
In the first dynamic convolution unit, the feature map from the upper layer contains not only target feature information but possibly also a large amount of background-noise features, and as the convolution layers grow deeper the feature information is continuously compressed, leaving less and less key target information. This is fatal for small targets, whose tiny features are directly masked by the background noise. It is therefore necessary to suppress information of non-interesting targets and raise the saliency of target features, so that each expert recognizes its field more accurately and reliably and the weight information is updated. The pooled output is then passed through the dynamic fully-connected layers and normalized by a Sigmoid, yielding three weight parameters. The process can be described as:
$$r(x)=\mathrm{Sigmoid}\left(R\cdot \mathrm{GAP}(x)\right)$$
$$\left(W_{1},\ W_{2},\ W_{3}\right)=r(x)$$
where GAP denotes global average pooling and R is a matrix of learned routing weights, allowing adaptation to the local receptive field using the global receptive field context and mapping the aggregated input to the 3 expert weights.
Through this process, the weight distributions W_1, W_2 and W_3 corresponding to the three domain experts can be solved.
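Putting the routing and the three experts together, the following is a minimal PyTorch sketch of the first dynamic convolution unit (fig. 8). The reduction ratio and the concatenation-then-joint-convolution fusion are assumptions consistent with the description above.

```python
import torch
import torch.nn as nn

class DynamicConvUnit(nn.Module):
    """Routing-weighted three-expert dynamic convolution (fig. 8)."""
    def __init__(self, ch, reduction=4):
        super().__init__()
        # Routing branch: GAP -> FC -> ReLU -> FC -> Sigmoid -> 3 weights
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.route = nn.Sequential(
            nn.Linear(ch, ch // reduction), nn.ReLU(),
            nn.Linear(ch // reduction, 3), nn.Sigmoid(),
        )
        # Three parallel experts: motor vehicle, non-motor vehicle, pedestrian
        self.experts = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=1) for _ in range(3))
        self.joint = nn.Conv2d(3 * ch, ch, 1)  # dynamic joint convolution
        self.bn = nn.BatchNorm2d(ch)
        self.act = nn.ReLU()

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.route(self.gap(x).view(b, c))       # (B, 3) expert weights
        outs = [w[:, i].view(b, 1, 1, 1) * conv(x)   # W_i * Conv_i(x)
                for i, conv in enumerate(self.experts)]
        y = self.joint(torch.cat(outs, dim=1))       # Concat superposition
        return self.act(self.bn(y))
```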
It should be understood that: the structures of the second dynamic convolution unit, the third dynamic convolution unit, the fourth dynamic convolution unit, and the fifth dynamic convolution unit are the same as the structure of the first dynamic convolution unit, which is not described in detail herein.
In some embodiments of the present invention, as shown in fig. 2, the head network includes a first head predicting module, a second head predicting module, a third head predicting module, a fourth head predicting module, and a fifth head predicting module, the first head predicting module is configured to predict the first convolution feature map and output a first prediction result, the second head predicting module is configured to predict the second convolution feature map and output a second prediction result, the third head predicting module is configured to predict the third convolution feature map and output a third prediction result, the fourth head predicting module is configured to predict the fourth convolution feature map and output a fourth prediction result, and the fifth head predicting module is configured to predict the fifth convolution feature map and output a fifth prediction result. And finally, the head network obtains an identification result according to the first prediction result, the second prediction result, the third prediction result, the fourth prediction result and the fifth prediction result.
In order to verify the improvement in feature characterization capability of the foggy day image identification model provided by the embodiment of the invention after adding the multiple attention module, the multi-scale feature fusion module and the dynamic convolution modules, the recognition results on normal weather images of four models are compared: the traditional FCOS model; FCOS + DyConv, which adds the dynamic convolution modules; FCOS + DAM + MSF, which adds the multiple attention module and the multi-scale feature fusion module; and DF-FCOS, which adds the dynamic convolution modules, the multiple attention module and the multi-scale feature fusion module. The results are shown in Table 1:
TABLE 1 comparison of detection accuracy of different recognition models in normal weather environment
(Table 1 appears as an image in the original publication; its numerical values are not reproduced here.)
As can be seen from Table 1, when Dy-Conv is used in place of the conventional convolution Conv for feature extraction in the backbone network of FCOS, while the designed feature enhancement method (DAM combined with MSF) is used for target detection, the improved network not only enhances the feature representation capability of the backbone network, but also makes the key feature map information sent to the detection part more prominent, thereby improving the recognition precision on normal weather images.
In order to verify the superiority of the foggy day image recognition model provided by the embodiment of the invention for foggy image recognition after adding the domain adaptive network, the detection accuracies of five recognition models are compared on the Cityscapes and Foggy Cityscapes datasets: Faster R-CNN without a domain adaptive network, DF-FCOS, Faster R-CNN with a domain adaptive network (DA-Faster R-CNN), YOLO with a domain adaptive network (DA-YOLO), and FCOS with a domain adaptive network (DA-FCOS). The comparison of detection accuracies is shown in Table 2:
TABLE 2 comparison of detection accuracy of different recognition models in foggy weather environment
(Table 2 appears as an image in the original publication; its numerical values are not reproduced here.)
As can be seen from Table 2, when the foggy day image recognition model provided by the embodiment of the invention is used for foggy day target detection, the detection precision for each class of target is higher than that of the other recognition models, and the mean Average Precision (mAP) over all classes is also higher. Therefore, recognizing targets with the foggy day image recognition model provided by the embodiment of the invention can greatly improve the accuracy and precision of recognition results for dynamic traffic targets in a foggy environment.
In order to better implement the foggy day image recognition method in the embodiment of the present invention, on the basis of the foggy day image recognition method, correspondingly, an embodiment of the present invention further provides a foggy day image recognition apparatus, as shown in fig. 9, the foggy day image recognition apparatus 900 includes:
an image obtaining unit 901, configured to obtain a foggy day image to be identified;
the image recognition unit 902 is used for inputting the foggy day image to be recognized into a well-trained foggy day image recognition model to obtain a recognition result;
the foggy day image recognition model comprises a backbone network, a domain self-adaptive network, a neck network and a head network; the backbone network is used for obtaining a plurality of feature extraction maps according to the foggy day image to be identified; the domain self-adaptive network is used for obtaining a plurality of domain label feature maps carrying domain labels according to the plurality of feature extraction maps; the neck network comprises a multi-attention module, a multi-scale feature fusion module and a plurality of dynamic convolution modules, wherein the multi-attention module is used for obtaining a plurality of attention feature maps according to the plurality of domain label feature maps, the multi-scale feature fusion module is used for extracting a plurality of features of the plurality of attention feature maps or the plurality of domain label feature maps and carrying out feature fusion to obtain a fusion feature map, and the dynamic convolution module is used for carrying out dynamic convolution on the fusion feature map to obtain a convolution feature map; the head network is used for obtaining the recognition result according to the convolution feature map.
The foggy day image recognition apparatus 900 provided in the foregoing embodiment may implement the technical solutions described in the foregoing embodiments of foggy day image recognition methods, and the specific implementation principles of the modules or units may refer to the corresponding contents in the foregoing embodiments of foggy day image recognition methods, and are not described herein again.
The foggy day image identification method and device provided by the invention are described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the embodiments is only intended to help understand the method and its core idea. Meanwhile, for those skilled in the art, there may be variations in specific implementation and application scope according to the idea of the invention. In summary, the content of this specification should not be construed as limiting the invention.

Claims (9)

1. A foggy day image identification method is characterized by comprising the following steps:
acquiring a foggy day image to be identified;
inputting the foggy day image to be recognized into a foggy day image recognition model with complete training to obtain a recognition result;
the foggy day image recognition model comprises a backbone network, a domain self-adaptive network, a neck network and a head network; the backbone network is used for obtaining a plurality of characteristic extraction graphs according to the foggy day image to be identified; the domain self-adaptive network is used for obtaining a plurality of domain label feature maps carrying domain labels according to the plurality of feature extraction maps; the neck network comprises a multiple attention module, a multi-scale feature fusion module and a plurality of dynamic convolution modules, wherein the multiple attention module is used for obtaining a plurality of attention feature maps according to the plurality of domain label feature maps, the multi-scale feature fusion module is used for extracting a plurality of features of the plurality of attention feature maps or the plurality of domain label feature maps and carrying out feature fusion to obtain a fusion feature map, and the dynamic convolution module is used for carrying out dynamic convolution on the fusion feature map to obtain a convolution feature map; the head network is used for obtaining the identification result according to the convolution feature map;
the plurality of feature extraction maps comprise a first feature extraction map, a second feature extraction map and a third feature extraction map;
the domain self-adaptive network includes a first domain adaptation module corresponding to the first feature extraction map, a second domain adaptation module corresponding to the second feature extraction map, and a third domain adaptation module corresponding to the third feature extraction map; the multiple attention module includes a first multiple attention unit corresponding to the first feature extraction map and a second multiple attention unit corresponding to the second feature extraction map; the multi-scale feature fusion module comprises a first multi-scale feature fusion unit corresponding to the first feature extraction map, a second multi-scale feature fusion unit corresponding to the second feature extraction map and a third multi-scale feature fusion unit corresponding to the third feature extraction map;
the first domain adaptation module is used for determining a first domain label feature map of the first feature extraction map;
the second domain adaptation module is used for determining a second domain label feature map of the second feature extraction map;
the third domain adaptation module is used for determining a third domain label feature map of the third feature extraction map;
the first multiple attention unit is used for obtaining a first attention feature map according to the first domain label feature map;
the second multiple attention unit is used for obtaining a second attention feature map according to the second domain label feature map;
the first multi-scale feature fusion unit is used for extracting a plurality of features of the first attention feature map and performing feature fusion to obtain a first fusion feature map;
the second multi-scale feature fusion unit is used for extracting a plurality of features of the second attention feature map and performing feature fusion to obtain a second fusion feature map;
the third multi-scale feature fusion unit is used for extracting a plurality of features of the third domain label feature map and performing feature fusion to obtain a third fusion feature map.
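By way of non-limiting illustration, the per-scale branching recited above may be sketched as follows; the function and variable names (da1, attn1, msff1 and so on) are hypothetical stand-ins for the modules of the claim.

    # f1, f2, f3: the three feature extraction maps produced by the backbone
    def neck_inputs(f1, f2, f3, da1, da2, da3, attn1, attn2, msff1, msff2, msff3):
        d1, d2, d3 = da1(f1), da2(f2), da3(f3)  # domain label feature maps
        a1, a2 = attn1(d1), attn2(d2)           # attention feature maps for the first two scales
        fuse1 = msff1(a1)                       # first fusion feature map
        fuse2 = msff2(a2)                       # second fusion feature map
        fuse3 = msff3(d3)                       # third scale fuses the domain label feature map directly
        return fuse1, fuse2, fuse3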
2. The foggy day image identification method of claim 1, wherein the backbone network comprises an initial convolution attention module, a maximum pooling layer, a first backbone block, a second backbone block, a third backbone block, and a fourth backbone block connected in sequence;
the foggy day image to be identified is sequentially processed by the initial convolution attention module, the maximum pooling layer, the first backbone block and the second backbone block to obtain the first feature extraction map;
the third backbone block is used for performing feature extraction on the first feature extraction map to obtain the second feature extraction map;
the fourth backbone block is used for performing feature extraction on the second feature extraction map to obtain the third feature extraction map.
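By way of non-limiting illustration, the backbone of this claim may be sketched as follows, assuming the stem, pooling layer and backbone blocks are supplied as modules; the taps after the second, third and fourth blocks yield the three feature extraction maps.

    import torch.nn as nn

    class Backbone(nn.Module):
        # Hypothetical sketch of the claim-2 backbone; block internals are given in claim 3.
        def __init__(self, stem, pool, block1, block2, block3, block4):
            super().__init__()
            self.stem, self.pool = stem, pool
            self.block1, self.block2 = block1, block2
            self.block3, self.block4 = block3, block4

        def forward(self, x):
            x = self.pool(self.stem(x))       # initial convolution attention module + max pooling
            f1 = self.block2(self.block1(x))  # first feature extraction map
            f2 = self.block3(f1)              # second feature extraction map
            f3 = self.block4(f2)              # third feature extraction map
            return f1, f2, f3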
3. The foggy day image recognition method of claim 2, wherein the first, second, third, and fourth backbone blocks each comprise a first backbone unit and a plurality of second backbone units, the first backbone unit comprising a first convolution attention module, a first dynamic convolution layer, a first batch normalization layer, a second dynamic convolution layer, a second batch normalization layer, and a first activation function layer; the second backbone unit comprises a second convolution attention module, a third dynamic convolution layer, a third batch normalization layer and a second activation function layer;
the initial convolution attention module, the first convolution attention module and the second convolution attention module respectively comprise an attention dynamic convolution layer, an attention batch normalization layer and an attention activation function layer which are sequentially connected.
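By way of non-limiting illustration, a convolution attention module of this claim may be sketched as the following three-layer chain; the 3x3 kernel size is an assumption, and a plain Conv2d stands in for the attention dynamic convolution layer whose internals are detailed in claim 8.

    import torch.nn as nn

    def conv_attention_module(in_ch, out_ch):
        # Hypothetical sketch: dynamic convolution -> batch normalization -> activation.
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),  # stand-in for the attention dynamic convolution layer
            nn.BatchNorm2d(out_ch),                              # attention batch normalization layer
            nn.ReLU(inplace=True),                               # attention activation function layer
        )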
4. The foggy day image recognition method of claim 1, wherein the first and second multiple attention units each comprise a channel attention submodule and a region attention submodule arranged in parallel, and a joint convolution layer; the channel attention submodule comprises a global average pooling layer, a first channel fully connected layer, a second channel fully connected layer, a first channel activation function layer, a third channel fully connected layer and a second channel activation function layer; the region attention submodule comprises a first region convolution layer, a first region activation function layer, a second region convolution layer, a second region activation function layer, a third region convolution layer and a third region activation function layer;
the channel attention submodule is used for extracting channel attention features of the plurality of domain label feature maps;
the region attention submodule is used for extracting region attention features of the plurality of domain label feature maps;
and the joint convolution layer is used for performing a convolution operation on the channel attention feature, the region attention feature and the domain label feature map to obtain the attention feature map.
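By way of non-limiting illustration, a multiple attention unit may be sketched as follows; the reduction ratio, kernel sizes, choice of activations and the element-wise combination of the three inputs before the joint convolution are all assumptions, and 1x1 convolutions stand in for the fully connected layers so that tensors stay four-dimensional.

    import torch.nn as nn

    class MultipleAttentionUnit(nn.Module):
        # Hypothetical sketch of claim 4: parallel channel and region attention,
        # combined by a joint convolution layer.
        def __init__(self, ch, reduction=4):
            super().__init__()
            self.channel = nn.Sequential(                        # channel attention submodule
                nn.AdaptiveAvgPool2d(1),                         # global average pooling layer
                nn.Conv2d(ch, ch // reduction, 1),               # first channel fully connected layer
                nn.Conv2d(ch // reduction, ch // reduction, 1),  # second channel fully connected layer
                nn.ReLU(inplace=True),                           # first channel activation function layer
                nn.Conv2d(ch // reduction, ch, 1),               # third channel fully connected layer
                nn.Sigmoid(),                                    # second channel activation function layer
            )
            self.region = nn.Sequential(                         # region attention submodule
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(ch, 1, 3, padding=1), nn.Sigmoid(),
            )
            self.joint = nn.Conv2d(ch, ch, 1)                    # joint convolution layer

        def forward(self, d):
            c = self.channel(d)           # channel attention feature, shape (B, C, 1, 1)
            r = self.region(d)            # region attention feature, shape (B, 1, H, W)
            return self.joint(d * c * r)  # joint convolution over the combined features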
5. The foggy day image recognition method according to claim 1, wherein the first multi-scale feature fusion unit, the second multi-scale feature fusion unit, and the third multi-scale feature fusion unit each comprise: the device comprises a first scale feature extraction subunit, a second scale feature extraction subunit, a third scale feature extraction subunit, a fourth scale feature extraction subunit, a first multi-scale feature fusion layer, a multi-scale convolutional layer and a second multi-scale feature fusion layer;
the first scale feature extraction subunit, the second scale feature extraction subunit, the third scale feature extraction subunit and the fourth scale feature extraction subunit are respectively used for extracting multi-scale features of the attention feature map or the domain label feature map, and correspondingly obtaining a first scale feature map, a second scale feature map, a third scale feature map and a fourth scale feature map;
the first multi-scale feature fusion layer is used for fusing the first scale feature map, the second scale feature map and the third scale feature map to obtain an initial feature fusion map;
the multi-scale convolution layer is used for performing convolution on the initial feature fusion graph to obtain a multi-scale convolution feature graph;
and the second multi-scale feature fusion layer is used for fusing the multi-scale convolution feature map and the fourth scale feature map to obtain the fusion feature map.
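By way of non-limiting illustration, a multi-scale feature fusion unit may be sketched as follows; dilated 3x3 convolutions are assumed as the scale feature extraction subunits and element-wise addition as both fusion layers.

    import torch.nn as nn

    class MultiScaleFeatureFusionUnit(nn.Module):
        # Hypothetical sketch of claim 5.
        def __init__(self, ch):
            super().__init__()
            self.s1 = nn.Conv2d(ch, ch, 3, padding=1, dilation=1)  # first scale feature extraction subunit
            self.s2 = nn.Conv2d(ch, ch, 3, padding=2, dilation=2)  # second scale feature extraction subunit
            self.s3 = nn.Conv2d(ch, ch, 3, padding=3, dilation=3)  # third scale feature extraction subunit
            self.s4 = nn.Conv2d(ch, ch, 1)                         # fourth scale feature extraction subunit
            self.conv = nn.Conv2d(ch, ch, 3, padding=1)            # multi-scale convolution layer

        def forward(self, x):
            f = self.s1(x) + self.s2(x) + self.s3(x)  # first multi-scale feature fusion layer
            f = self.conv(f)                          # multi-scale convolution feature map
            return f + self.s4(x)                     # second multi-scale feature fusion layer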
6. The foggy day image recognition method of claim 1, wherein the first domain adaptation module, the second domain adaptation module and the third domain adaptation module each comprise a gradient reversal layer, a first domain adaptive convolution layer, a second domain adaptive convolution layer and a domain classifier connected in sequence.
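By way of non-limiting illustration, a domain adaptation module may be sketched as follows; the 3x3 kernels, the two-class classifier head, and the convention that the module forwards the input feature map to the neck while the domain prediction only feeds an adversarial training loss are assumptions. The gradient reversal layer is the standard construction: identity in the forward pass, negated gradient in the backward pass.

    import torch
    import torch.nn as nn

    class GradientReversal(torch.autograd.Function):
        # Identity in the forward pass; gradient multiplied by -alpha on the way back.
        @staticmethod
        def forward(ctx, x, alpha):
            ctx.alpha = alpha
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad):
            return -ctx.alpha * grad, None

    class DomainAdaptationModule(nn.Module):
        # Hypothetical sketch of claim 6.
        def __init__(self, ch, alpha=1.0):
            super().__init__()
            self.alpha = alpha
            self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)  # first domain adaptive convolution layer
            self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)  # second domain adaptive convolution layer
            self.classifier = nn.Conv2d(ch, 2, 1)         # domain classifier (clear source vs. foggy target)

        def forward(self, x):
            r = GradientReversal.apply(x, self.alpha)
            domain_logits = self.classifier(self.conv2(self.conv1(r)))
            return x, domain_logits  # feature map for the neck, prediction for the domain loss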
7. The foggy day image identification method according to claim 1, wherein the plurality of dynamic convolution modules comprise a first dynamic convolution unit connected with the first multi-scale feature fusion unit, a second dynamic convolution unit connected with the second multi-scale feature fusion unit, a third dynamic convolution unit connected with the third multi-scale feature fusion unit, and a fourth dynamic convolution unit and a fifth dynamic convolution unit connected in sequence after the third dynamic convolution unit; the neck network further comprises a first upsampling layer connected with the third dynamic convolution unit and a second upsampling layer connected with the second dynamic convolution unit;
the third dynamic convolution unit is used for performing dynamic convolution on the third fusion feature map to obtain a third convolution feature map;
the first upsampling layer is used for upsampling the third convolution feature map to obtain a first upsampling feature map;
the second dynamic convolution unit is used for performing dynamic convolution on the second fusion feature map and the first upsampling feature map to obtain a second convolution feature map;
the second upsampling layer is used for upsampling the second convolution feature map to obtain a second upsampling feature map;
the first dynamic convolution unit is used for performing dynamic convolution on the first fusion feature map and the second upsampling feature map to obtain a first convolution feature map;
the fourth dynamic convolution unit is used for performing dynamic convolution on the third convolution feature map to obtain a fourth convolution feature map;
and the fifth dynamic convolution unit is used for performing dynamic convolution on the fourth convolution feature map to obtain a fifth convolution feature map.
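By way of non-limiting illustration, the top-down data flow of this claim may be sketched as follows; channel-wise concatenation is assumed wherever a dynamic convolution unit takes two inputs, and all names are hypothetical.

    import torch

    def neck_flow(fuse1, fuse2, fuse3,
                  dconv1, dconv2, dconv3, dconv4, dconv5, upsample1, upsample2):
        p3 = dconv3(fuse3)                          # third convolution feature map
        u1 = upsample1(p3)                          # first upsampling feature map
        p2 = dconv2(torch.cat([fuse2, u1], dim=1))  # second convolution feature map
        u2 = upsample2(p2)                          # second upsampling feature map
        p1 = dconv1(torch.cat([fuse1, u2], dim=1))  # first convolution feature map
        p4 = dconv4(p3)                             # fourth convolution feature map
        p5 = dconv5(p4)                             # fifth convolution feature map
        return p1, p2, p3, p4, p5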
8. The foggy day image identification method according to claim 7, wherein the first dynamic convolution unit comprises a dynamic global average pooling layer, a first dynamic fully connected layer, a first dynamic activation function layer, a second dynamic fully connected layer, a second dynamic activation function layer, three parallel weight dynamic convolution layers, a dynamic joint convolution layer, a dynamic batch normalization layer and a third dynamic activation function layer.
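By way of non-limiting illustration, a dynamic convolution unit may be sketched as follows, in the spirit of attention-weighted dynamic convolution: a pooled attention branch produces one weight per parallel kernel, and the weighted sum passes through the joint convolution, batch normalization and activation. The reduction ratio, kernel size and softmax weighting are assumptions.

    import torch.nn as nn

    class DynamicConvolutionUnit(nn.Module):
        # Hypothetical sketch of claim 8.
        def __init__(self, in_ch, out_ch, k=3):
            super().__init__()
            self.attn = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),        # dynamic global average pooling layer
                nn.Linear(in_ch, in_ch // 4), nn.ReLU(),      # first dynamic fully connected layer + activation
                nn.Linear(in_ch // 4, 3), nn.Softmax(dim=1),  # second dynamic fully connected layer + activation
            )
            self.branches = nn.ModuleList(
                [nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False) for _ in range(3)]
            )                                          # three parallel weight dynamic convolution layers
            self.joint = nn.Conv2d(out_ch, out_ch, 1)  # dynamic joint convolution layer
            self.bn = nn.BatchNorm2d(out_ch)           # dynamic batch normalization layer
            self.act = nn.ReLU(inplace=True)           # third dynamic activation function layer

        def forward(self, x):
            w = self.attn(x)  # per-sample weights for the three kernels, shape (B, 3)
            y = sum(w[:, i].view(-1, 1, 1, 1) * b(x) for i, b in enumerate(self.branches))
            return self.act(self.bn(self.joint(y)))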
9. A foggy day image recognition device, comprising:
the image acquisition unit is used for acquiring the foggy day image to be identified;
the image recognition unit is used for inputting the foggy day image to be recognized into a fully trained foggy day image recognition model to obtain a recognition result;
the foggy day image recognition model comprises a backbone network, a domain self-adaptive network, a neck network and a head network; the backbone network is used for obtaining a plurality of feature extraction maps according to the foggy day image to be identified; the domain self-adaptive network is used for obtaining a plurality of domain label feature maps carrying domain labels according to the plurality of feature extraction maps; the neck network comprises a multiple attention module, a multi-scale feature fusion module and a plurality of dynamic convolution modules, wherein the multiple attention module is used for obtaining a plurality of attention feature maps according to the plurality of domain label feature maps, the multi-scale feature fusion module is used for extracting a plurality of features of the plurality of attention feature maps or the plurality of domain label feature maps and performing feature fusion to obtain a fusion feature map, and the dynamic convolution module is used for performing dynamic convolution on the fusion feature map to obtain a convolution feature map; the head network is used for obtaining the recognition result according to the convolution feature map;
the plurality of feature extraction maps comprise a first feature extraction map, a second feature extraction map and a third feature extraction map;
the domain self-adaptive network comprises a first domain adaptation module corresponding to the first feature extraction map, a second domain adaptation module corresponding to the second feature extraction map, and a third domain adaptation module corresponding to the third feature extraction map; the multiple attention module includes a first multiple attention unit corresponding to the first feature extraction map and a second multiple attention unit corresponding to the second feature extraction map; the multi-scale feature fusion module comprises a first multi-scale feature fusion unit corresponding to the first feature extraction map, a second multi-scale feature fusion unit corresponding to the second feature extraction map and a third multi-scale feature fusion unit corresponding to the third feature extraction map;
the first domain adaptation module is used for determining a first domain label feature map of the first feature extraction map;
the second domain adaptation module is used for determining a second domain label feature map of the second feature extraction map;
the third domain adaptation module is used for determining a third domain label feature map of the third feature extraction map;
the first multiple attention unit is used for obtaining a first attention feature map according to the first domain label feature map;
the second multiple attention unit is used for obtaining a second attention feature map according to the second domain label feature map;
the first multi-scale feature fusion unit is used for extracting a plurality of features of the first attention feature map and performing feature fusion to obtain a first fusion feature map;
the second multi-scale feature fusion unit is used for extracting a plurality of features of the second attention feature map and performing feature fusion to obtain a second fusion feature map;
the third multi-scale feature fusion unit is used for extracting a plurality of features of the third domain label feature map and performing feature fusion to obtain a third fusion feature map.
CN202211671845.4A 2022-12-26 2022-12-26 Foggy day image identification method and device Active CN115641584B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211671845.4A 2022-12-26 2022-12-26 Foggy day image identification method and device

Publications (2)

Publication Number Publication Date
CN115641584A 2023-01-24
CN115641584B 2023-04-14

Family

ID=84950036

Country Status (1)

Country Link
CN (1) CN115641584B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117590761B (en) * 2023-12-29 2024-04-19 广东福临门世家智能家居有限公司 Door opening state detection method and system for smart home

Citations (1)

Publication number Priority date Publication date Assignee Title
CN114926747A (en) * 2022-05-31 2022-08-19 常州大学 Remote sensing image directional target detection method based on multi-feature aggregation and interaction

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
CN111523493A (en) * 2020-04-27 2020-08-11 东南数字经济发展研究院 Target detection algorithm for foggy weather image
CN111860670B (en) * 2020-07-28 2022-05-17 平安科技(深圳)有限公司 Domain adaptive model training method, image detection method, device, equipment and medium
US20220188595A1 (en) * 2020-12-16 2022-06-16 Microsoft Technology Licensing, Llc Dynamic matrix convolution with channel fusion
CN112633149B (en) * 2020-12-22 2022-08-16 南京邮电大学 Domain-adaptive foggy-day image target detection method and device
CN113449691A (en) * 2021-07-21 2021-09-28 天津理工大学 Human shape recognition system and method based on non-local attention mechanism
CN114445664A (en) * 2022-01-25 2022-05-06 重庆邮电大学 Image classification and identification method and device based on adaptive dynamic convolution network and computer equipment
CN114724015A (en) * 2022-03-18 2022-07-08 南京大学 Target detection method for reducing labeling requirements based on active domain adaptive learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant