CN115861841A - SAR image target detection method combined with lightweight large convolution kernel - Google Patents

Info

Publication number
CN115861841A
CN115861841A
Authority
CN
China
Prior art keywords
layer, convolution, SAR image, branch, convolution kernel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211573253.9A
Other languages
Chinese (zh)
Inventor
李钊
孙晓晖
许涛
刘永涛
田西兰
杨雪亚
刘小平
常沛
高晶晶
张玉营
李玉景
朱程涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 38 Research Institute
Original Assignee
CETC 38 Research Institute
Application filed by CETC 38 Research Institute
Priority to CN202211573253.9A
Publication of CN115861841A

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a SAR image target detection method that incorporates a lightweight large convolution kernel, belonging to the technical field of SAR image target detection. The invention provides a lightweight large-convolution-kernel layer that expands the receptive field while greatly reducing the number of model parameters compared with a conventional convolution kernel, so that detection accuracy is preserved and the lightweight model is easy to deploy on embedded devices. The strategy designed by the invention uses a multi-branch model during training, so that more features can be learned through the additional branches, and uses a single-branch model during testing, effectively reducing the memory-access time cost incurred by multiple branches.

Description

SAR image target detection method combined with lightweight large convolution kernel
Technical Field
The invention relates to the technical field of SAR image target detection, and in particular to a SAR image target detection method that incorporates a lightweight large convolution kernel.
Background
China's aerospace reconnaissance equipment now offers multi-source, multi-band, multi-mode, multi-application, high-resolution ground imaging. How to process and interpret this image information quickly and effectively on an embedded platform has become a pressing problem. Although existing neural network models can extract useful information from SAR images, their parameter counts are often too large for efficient porting to an embedded platform; at the same time, a large parameter count slows down inference and cannot meet the strict timeliness requirements of reconnaissance.
At present, lightweight neural network methods fall mainly into three categories. The first is to train a large model and then make a small model approach its representational capability through devices such as control variables and loss functions. The second is Neural Architecture Search (NAS): a search space and a search strategy are defined, candidate models satisfying the constraints are found in the space according to the search strategy, each is evaluated, and the evaluation feedback drives the next round of search. The third is to design a lightweight model manually, where optimization of individual modules reduces the parameter count while maintaining accuracy.
Both knowledge distillation and neural architecture search present problems on embedded devices. Knowledge distillation is difficult to carry out: an additional teacher network must be designed, the difficulty of distillation varies across tasks, and there is no guarantee that the distilled small model will be usable. Neural architecture search consumes large amounts of computing resources, and the resulting models have poor interpretability. Manual lightweight model design is therefore the approach best suited to deploying a neural network model on embedded devices in the current environment.
Manual lightweight design has long been the preferred scheme for deploying models on mobile terminal devices. In particular, depthwise separable convolution is a common lightweight technique; its best-known embodiment, the MobileNet series, has now reached its third generation, and its performance and degree of lightweighting are widely praised in academia and industry.
Neural networks follow an inherent principle: the larger the receptive field, the more global information can be captured, which benefits a series of downstream tasks such as target detection, semantic segmentation and pose recognition. However, the means proposed by previous research for enlarging the receptive field, such as depthwise separable convolution, often still use kernels that are not truly "large" — for example 3×3 kernels, or dilated variants of them, whose receptive field remains small (on the order of 5).
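As an aside (not part of the patent text), the receptive field of a stack of convolutions can be checked with the standard recurrence: each layer adds (k_eff − 1) · jump to the receptive field, where k_eff = d·(k − 1) + 1 for dilation d. A minimal sketch:

```python
def effective_kernel(k: int, dilation: int = 1) -> int:
    """Effective kernel size of a dilated convolution."""
    return dilation * (k - 1) + 1

def receptive_field(layers) -> int:
    """Receptive field of a stack of conv layers.

    Each layer is a (kernel, stride, dilation) tuple."""
    rf, jump = 1, 1
    for k, stride, dilation in layers:
        rf += (effective_kernel(k, dilation) - 1) * jump
        jump *= stride
    return rf

# Two stacked 3x3 convolutions reach a receptive field of 5; five of
# them are needed to reach the 11x11 receptive field discussed later.
print(receptive_field([(3, 1, 1)] * 2))  # 5
print(receptive_field([(3, 1, 1)] * 5))  # 11
```

This also shows why dilation alone grows the field cheaply: a single 3×3 kernel with dilation 3 already has an effective kernel size of 7.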
For the above reasons, porting a SAR image target detection model to embedded devices remains a challenge, and a SAR image target detection method incorporating a lightweight large convolution kernel is therefore proposed.
Disclosure of Invention
The technical problem to be solved by the invention is: how to lighten a SAR image target detection model for embedded devices, reducing the model's parameter count and inference time while preserving recognition accuracy, by providing a SAR image target detection method that incorporates a lightweight large convolution kernel.
The invention solves the technical problems through the following technical scheme, and the invention comprises the following steps:
s1: data set collection and production
Collecting an SAR image, preprocessing the SAR image to generate an SAR image data set, and dividing the SAR image data set into a training set and a test set;
s2; model training
Training the model by utilizing the SAR image in the training set to obtain a trained SAR image target detection model;
s3: model reasoning
And after the model is trained, inputting the SAR image of the test set into the SAR image target detection model for reasoning to obtain a detection result.
Further, in step S1, the preprocessing includes quantization, normalization, data augmentation and cropping of the SAR image.
Further, in step S2, the SAR image target detection model comprises a backbone network, a multi-scale feature interaction structure and three detection heads; the backbone network is connected to the multi-scale feature interaction structure, and the multi-scale feature interaction structure is connected to each of the three detection heads.
Furthermore, the backbone network comprises an initialization layer, three multi-branch large-convolution-kernel blocks, three transition layers and a spatial pyramid pooling layer; the multi-scale feature interaction structure comprises two upsampling layers with an attention mechanism and two downsampling layers with an attention mechanism; and each detection head comprises ordinary convolutional layers and a dimension conversion module. The input SAR image is processed by the initialization layer into a feature map of a set size and then passed in turn through the three groups of multi-branch large-convolution-kernel block plus transition layer; each transition layer halves the length and width of the feature map and doubles the number of channels, so the three transition layers yield feature maps of three sizes, denoted t1, t2 and t3. The feature map t3 is input into the spatial pyramid pooling layer to obtain high-level semantic information s3; s3 is spliced with the output of the last transition layer to form the feature map ts3. The feature map ts3 passes through one upsampling layer with attention to form the feature map up2, and up2 passes through the other upsampling layer with attention to form the feature map up1. The feature maps up1, t2 and up2 are spliced and passed through one downsampling layer with attention to form the feature map d2; d2 is spliced with ts3 and passed through the other downsampling layer with attention to form the feature map d3. The feature maps up1, d2 and d3 are the detection feature maps at three scales and, as the output of the multi-scale feature interaction structure, are input into the three detection heads respectively; each detection head predicts category and position through its ordinary convolutional layers and dimension conversion module.
Furthermore, the initialization layer comprises two ordinary convolutional layers and two depthwise separable convolutional layers with a group number of two; the first ordinary convolutional layer, the first depthwise separable convolutional layer, the second ordinary convolutional layer and the second depthwise separable convolutional layer are connected in sequence.
Furthermore, the multi-branch large-convolution-kernel block comprises three ordinary convolutional layers, two GELU-activation-and-BN layers and one large-kernel convolution block. The first ordinary convolutional layer, the first GELU-activation-and-BN layer, the large-kernel convolution block and the second ordinary convolutional layer are connected in sequence to form the main path; the input feature map is superimposed directly onto the output to form one branch; at the same time, the input feature map also passes through an ordinary convolutional layer and the second GELU-activation-and-BN layer, and the resulting feature map is superimposed onto the output to form another branch.
Furthermore, an "ordinary convolutional layer" in the present invention refers to a convolutional layer with a kernel size of 3×3 or 1×1; this type of convolutional layer is commonly used in various neural network models, hence the name.
Furthermore, the large-kernel convolution block comprises a depthwise separable convolutional layer, a dilated convolutional layer and an ordinary convolutional layer connected in sequence; the equivalent kernel size of the large-kernel convolution block is larger than 5×5, and a convolutional layer with a larger receptive field is realized by superimposing the feature maps of several different receptive fields.
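A hedged PyTorch sketch of this block follows (not part of the patent text). The dilated layer is taken as a 3×3 kernel with dilation 3, which is the reading consistent with both the receptive field of 11 and the 9MN parameter count stated in the detailed description; class and attribute names are our own.

```python
import torch
import torch.nn as nn

class LargeKernelConvBlock(nn.Module):
    """Sketch of the Large Kernel Convolution Block (LKCB):
    a depthwise separable conv, a dilated conv, then a 1x1 conv,
    so that several small receptive fields stack into one large one.
    Exact kernel/dilation choices are inferred from the text."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # depthwise 5x5 + pointwise 1x1 (depthwise separable convolution)
        self.dw = nn.Conv2d(in_ch, in_ch, 5, padding=2, groups=in_ch)
        self.pw = nn.Conv2d(in_ch, out_ch, 1)
        # dilated 3x3 with dilation 3: effective kernel 7, pushing the
        # stacked receptive field to 5 + (7 - 1) = 11
        self.dilated = nn.Conv2d(out_ch, out_ch, 3, padding=3, dilation=3)
        # final ordinary 1x1 convolution
        self.proj = nn.Conv2d(out_ch, out_ch, 1)

    def forward(self, x):
        return self.proj(self.dilated(self.pw(self.dw(x))))

x = torch.randn(1, 32, 40, 40)
y = LargeKernelConvBlock(32, 64)(x)
print(y.shape)  # torch.Size([1, 64, 40, 40])
```

The padding values are chosen so that the spatial size is preserved, matching the block's use inside a stage whose downsampling is delegated to the transition layer.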
Furthermore, the transition layer adopts a sub-pixel sampling strategy: the feature map of size H × W × C is split into small blocks consisting of several grids, the blocks are re-spliced into several sub-feature maps according to the position of each block, and the number of channels is then reduced through an ordinary convolutional layer before output.
Furthermore, the upsampling layer with attention expands the lower-level feature map to twice its size by neighborhood interpolation, splices it with the feature map of the same size, and finally inputs the result into an attention layer for channel-level attention weighting. The downsampling layer with attention first halves the length and width with an ordinary convolutional layer, then splices the result with the feature map of the same level, and finally inputs it into the attention layer; the attention layer realizes attention weighting through the coexistence of channel attention and axial attention.
Further, in step S3, the multi-branch model in the multi-branch large-convolution-kernel block is converted into a single-branch model at inference time. The conversion is as follows: the parameters trained in the multi-branch large-convolution-kernel block — the main path containing the large-kernel convolution block, the branch containing the second GELU-activation-and-BN layer, and the branch carrying the input feature map — are retained; owing to the scalability of convolution kernels, the two branches are expanded into convolution kernel parameters equivalent to an 11 × 11 receptive field, so that the kernels of the main path, the GELU-and-BN branch and the input branch all have the same size; finally the kernels are merged by element-wise addition (add).
Compared with the prior art, the invention has the following advantages:
1. A target detection algorithm for SAR images using a larger convolution kernel (11 × 11) is realized. A lightweight large-convolution-kernel layer is provided that expands the receptive field while greatly reducing the model parameters relative to a conventional convolution kernel, so that detection accuracy is ensured and the lightweight model can be conveniently deployed on embedded devices.
2. The designed strategy uses a multi-branch model during training, so that more features can be learned through the additional branches, and uses a single-branch model during testing, effectively reducing the memory-access time cost incurred by multiple branches.
Drawings
FIG. 1 is an overall architecture diagram of a SAR image target detection model incorporating a lightweight large convolution kernel in an embodiment of the present invention;
FIG. 2a is a schematic structural diagram of an initialization layer in an embodiment of the invention;
FIG. 2b is a diagram illustrating the structure of a multi-branch large convolution kernel block according to an embodiment of the present invention;
FIG. 2c is a diagram illustrating the structure of a large kernel convolution block according to an embodiment of the present invention;
FIG. 3a is a schematic flow chart of an implementation of a spatial pyramid pooling layer in an embodiment of the present invention;
FIG. 3b is a schematic diagram of an upsampling layer with an attention mechanism in an embodiment of the present invention;
FIG. 3c is a schematic diagram of a downsampling layer with attention mechanism in an embodiment of the present invention;
FIG. 3d is a schematic structural diagram of an attention layer in an embodiment of the present invention;
FIG. 4 is a flow chart illustrating an implementation of a conversion layer with sub-pixel sampling according to an embodiment of the present invention;
FIG. 5 is a schematic flow chart of an implementation of converting a multi-branch module into a single-branch module according to an embodiment of the present invention.
Detailed Description
The following examples are given for the detailed implementation and specific operation of the present invention, but the scope of the present invention is not limited to the following examples.
This embodiment provides a technical scheme: a SAR image target detection method incorporating a lightweight large convolution kernel, comprising three steps — data collection and preparation, model training, and inference — as follows:
(1) Collection and production of the data set: SAR images are collected with both large and small targets in mind; after collection, the images are put through a series of preprocessing steps (including quantization, data augmentation, cropping, etc.) and cut to a size of 640 × 640 to suit training;
(2) Training of the model: the cropped training set is trained with the proposed SAR image target detection algorithm incorporating a lightweight large convolution kernel. The algorithm uses a model built from large-scale convolution kernels and a carefully designed hierarchy, so that the model maintains a large receptive field while reducing the parameter count as much as possible. At the same time, the invention combines a sub-pixel downsampling layer with a multi-branch structure during training to ensure that the model obtains more effective information;
(3) Inference of the model: the trained model discards the multi-branch structure and uses a single-branch structure at inference time. The invention restores the multi-branch feature maps to a single branch in a principled way, ensuring that the model has learned sufficient features while, because the number of branches is reduced, shortening the time the model spends holding the current feature map when invoking commands such as add and concatenate (concat). This effectively reduces inference time and keeps the model highly real-time on embedded devices.
The preprocessing in step (1) comprises a series of methods commonly used in image processing and neural networks, including but not limited to SAR image quantization, normalization, data augmentation and cropping. Many data augmentation methods exist, including but not limited to horizontal and vertical translation and rotation, horizontal and vertical flipping, gray-level contrast transformation, affine transformation, random target pasting, and the like.
As shown in FIG. 1, the SAR image target detection model incorporating a lightweight large convolution kernel proposed in step (2) comprises the following parts: a backbone network (backbone), a multi-scale feature interaction structure (neck) and detection heads (detect head). The main improvement of the invention is that the backbone uses a multi-branch network formed by stacking lightweight large convolution kernels, ensuring a richer receptive field; at the same time, the common transition-layer means such as pooling and large-stride convolution are replaced by the sub-pixel downsampling proposed by the invention, preserving more effective information. The backbone network consists of an initialization layer (init), multi-branch large-convolution-kernel blocks (stage), transition layers (transition) and Spatial Pyramid Pooling (SPP). In the multi-scale feature interaction part, the invention designs upsampling and downsampling layers with an attention mechanism, so that the model makes full use of effective information at different scales. Finally, the invention uses a dual-branch detection head to regress category and coordinates separately, avoiding the coupling of regression targets caused by a single-branch detection head.
The specific detection flow of the target detection algorithm provided by the invention is as follows:
1. Input the images, uniformly cropped to 640 × 640 × 3, into the model;
2. Backbone network: after the initialization layer the feature map size is 160 × 160 × 32; the feature map is then passed in turn through the multi-branch large-convolution-kernel blocks and transition layers, each transition layer halving the feature map size and doubling the number of channels. After the three transition layers the feature map sizes are 80 × 80 × 64, 40 × 40 × 128 and 20 × 20 × 256, denoted t1, t2 and t3 respectively. The 20 × 20 × 256 feature map t3 is then input into the spatial pyramid pooling layer to obtain more effective high-level semantic information s3, and the two feature maps are spliced to form the feature map ts3 of size 20 × 20 × 512.
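The shape bookkeeping above can be verified with a quick pure-Python check (not part of the patent text): each transition halves height and width and doubles channels, starting from the 160 × 160 × 32 output of the initialization layer.

```python
def transition(shape):
    """One transition layer: halve H and W, double C."""
    h, w, c = shape
    return (h // 2, w // 2, c * 2)

shape = (160, 160, 32)     # after the initialization layer
stages = []
for _ in range(3):         # three transition layers
    shape = transition(shape)
    stages.append(shape)

print(stages)
# [(80, 80, 64), (40, 40, 128), (20, 20, 256)]  -> t1, t2, t3
```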
3. Multi-scale feature interaction structure: ts3 passes through an upsampling layer with attention (up-attention) to form the feature map up2 of size 40 × 40 × 128, and up2 passes through an upsampling layer with attention once more to form the feature map up1 of size 80 × 80 × 64. The feature map up1 is then spliced with t2 and up2 and passed through a downsampling layer with attention (down-attention) to form the feature map d2; d2 is spliced with ts3 and passed through another downsampling layer with attention to form the feature map d3. The feature maps up1, d2 and d3 are the detection feature maps at different scales and serve as the output of the multi-scale feature interaction structure;
4. Detection heads: the three feature maps up1, d2 and d3 of different scales are input into the three detection heads respectively, and each detection head predicts category and position through two branches.
As shown in FIG. 2a, the initialization layer used in step (2) contains four convolutional layers: one ordinary 3×3 convolutional layer, one ordinary 1×1 convolutional layer, and two depthwise separable convolutional layers with a group number of 2. After this series of operations, the length and width of the feature map are each halved twice (640 → 160).
As shown in FIG. 2b, the multi-branch large-convolution-kernel block in step (2) comprises, on its main path, a leading 1×1 convolutional layer, a GELU activation and BN (Batch Normalization) layer, and a Large Kernel Convolution Block (LKCB). It also contains two branches: one superimposes the input feature map directly onto the output, and the other passes the input feature map through an ordinary 1×1 convolutional layer followed by a GELU activation and BN layer before superimposing the result onto the output. In this way the multi-branch block retains both a highly effective feature map obtained from a very large receptive field and a relatively shallow feature map.
As shown in FIG. 2c, the Large Kernel Convolution Block (LKCB) in step (2) is a convolutional layer in which a large receptive field is realized by superimposing feature maps of several different receptive fields. To realize a convolutional layer with a receptive field of 11, the invention uses the following layered scheme: the first layer is a depthwise separable convolutional layer with a kernel size of 5, the second layer is a dilated convolutional layer with a kernel size of 3 and a dilation rate of 3 (effective kernel size 7), and the third layer is an ordinary 1×1 convolutional layer. This scheme greatly reduces the parameter count while retaining a large receptive field; the parameter counts are calculated as follows:
For a conventional 11 × 11 convolutional layer with M input channels and N output channels, the parameter count of the single layer is 11² × M × N, i.e. 121MN.
If, following the currently popular scheme, a cascade of 3 × 3 convolutions is used to reach an 11 × 11 receptive field, then five 3 × 3 convolutional layers are required, with a parameter count of 3² × M × N × 5, i.e. 45MN.
If dilated convolutions with a kernel size of 3 and a dilation rate of 3 are used, two of them must be stacked together with an additional 1 × 1 convolutional layer, giving a parameter count of 3² × M × N × 2 + 1² × M × N, i.e. 19MN.
With the LKCB proposed by the invention, one depthwise separable convolutional layer with a kernel size of 5 is required, with a parameter count of 5² × M + M × N; one dilated convolutional layer with a kernel size of 3 and a dilation rate of 3, with a parameter count of 3² × M × N; and one ordinary 1 × 1 convolutional layer, with a parameter count of 1² × M × N. The total parameter count is therefore (25 + 11N)M.
In the model the minimum value of N is 64. Computed at this minimum, an ordinary 11 × 11 convolutional layer needs 7744M parameters, the 3 × 3 cascade needs 2880M in total, the dilated-convolution scheme needs 1216M, and the proposed LKCB needs only 729M — fewer than all the other schemes, with the LKCB's advantage growing as N increases. This provides convenience for carrying the SAR image target detection model on embedded devices.
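The four parameter counts can be checked mechanically; a small pure-Python sketch (not part of the patent text; the dilated layer is taken as 3 × 3, the reading consistent with the 19MN and (25 + 11N)M figures):

```python
M = 1  # results below are per input channel (multiples of M)

def params(N):
    """Parameter counts of the four schemes for N output channels."""
    ordinary_11 = 11**2 * M * N                   # 121MN
    cascade_3x3 = 3**2 * M * N * 5                # 45MN
    dilated     = 3**2 * M * N * 2 + 1 * M * N    # 19MN
    lkcb        = 5**2 * M + 11 * M * N           # (25 + 11N)M
    return ordinary_11, cascade_3x3, dilated, lkcb

print(params(64))  # (7744, 2880, 1216, 729)
```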
As shown in FIG. 3a, the SPP layer in step (2) is a typical spatial pyramid pooling layer to which the invention makes some improvements while retaining the pooling: the input feature map undergoes pooling operations with pooling scales of 5 × 5, 9 × 9 and 13 × 13 as well as an ordinary 3 × 3 convolution, the results are then spliced, and finally an ordinary 1 × 1 convolutional layer reduces the number of channels to 256 for output.
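A hedged PyTorch sketch of this modified SPP layer (not part of the patent text; the use of stride-1 max pooling and the class/attribute names are assumptions):

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Sketch of the modified spatial pyramid pooling layer:
    parallel pooling at scales 5/9/13 plus an ordinary 3x3 conv,
    splicing, then a 1x1 conv reducing channels to 256."""
    def __init__(self, in_ch: int, out_ch: int = 256):
        super().__init__()
        # stride-1, same-padding max pooling keeps the spatial size
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in (5, 9, 13)
        )
        self.conv3 = nn.Conv2d(in_ch, in_ch, 3, padding=1)
        self.reduce = nn.Conv2d(in_ch * 4, out_ch, 1)

    def forward(self, x):
        feats = [p(x) for p in self.pools] + [self.conv3(x)]
        return self.reduce(torch.cat(feats, dim=1))

y = SPP(256)(torch.randn(1, 256, 20, 20))
print(y.shape)  # torch.Size([1, 256, 20, 20])
```

With a 20 × 20 × 256 input (the backbone's t3), the output keeps the spatial size and the 256 channels, matching the s3 described in the text.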
As shown in FIG. 4, the transition layer in step (2) serves to reduce the length and width of the feature map. The invention adopts a sub-pixel sampling strategy: the feature map of size H × W × C is split into small blocks consisting of 2 × 2 grids, and the blocks are re-spliced by position into 4 sub-feature maps, i.e. a tensor of size H/2 × W/2 × 4C; the number of channels is then reduced through an ordinary 1 × 1 convolutional layer, giving a final output of size H/2 × W/2 × 2C.
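The sub-pixel downsampling described above matches what PyTorch exposes as `PixelUnshuffle`; a sketch of one transition layer under that assumption (this pairing is our reading of the patent, not a confirmed implementation detail):

```python
import torch
import torch.nn as nn

class SubPixelTransition(nn.Module):
    """H x W x C -> H/2 x W/2 x 2C via pixel unshuffle + 1x1 conv."""
    def __init__(self, channels: int):
        super().__init__()
        # rearranges each 2x2 spatial grid into channels: C -> 4C
        self.unshuffle = nn.PixelUnshuffle(2)
        # ordinary 1x1 convolution reducing 4C channels to 2C
        self.reduce = nn.Conv2d(4 * channels, 2 * channels, 1)

    def forward(self, x):
        return self.reduce(self.unshuffle(x))

y = SubPixelTransition(32)(torch.randn(1, 32, 160, 160))
print(y.shape)  # torch.Size([1, 64, 80, 80])
```

Unlike pooling or strided convolution, no pixel is discarded before the learned 1 × 1 reduction, which is the "more effective information" argument made in the text.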
The structures of the upsampling and downsampling layers with attention in step (2) are shown in FIGS. 3b and 3c. The upsampling layer expands the lower-level feature map to twice its size by neighborhood interpolation, splices it with the feature map of the same size, and finally inputs the result into the attention layer for channel-level attention weighting; the downsampling layer first halves the length and width with an ordinary convolutional layer of stride 2, then splices the result with the feature map of the same level, and finally inputs it into the attention layer. As for the attention mechanism, the attention layer of the invention combines channel attention and axial attention, as shown in FIG. 3d. Axial attention applies axial average pooling and max pooling to the input feature map, splices the results, passes them through an ordinary 7 × 7 convolutional layer to obtain an axial weight map, and multiplies this map with the input feature map to obtain the axial feature map. Channel attention likewise screens the feature map with a weight layer after average pooling and max pooling: a channel attention map is learned through a two-layer fully connected network (MLP), and the learned map is multiplied with the input feature map to obtain the channel-direction feature map. Finally, the channel-direction and axial feature maps are spliced, and an ordinary 1 × 1 convolutional layer reduces the dimension back to that of the input feature map to give the final output.
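A hedged sketch of this attention layer (not part of the patent text): the channel branch follows the familiar CBAM-style avg/max pool + shared MLP pattern, the axial branch pools along the channel axis and applies a 7 × 7 convolution, and the two results are spliced and reduced by a 1 × 1 convolution. Names, the reduction ratio, and the sigmoid gating are assumptions.

```python
import torch
import torch.nn as nn

class ChannelAxialAttention(nn.Module):
    """Sketch of the attention layer combining channel and axial
    attention; the exact fusion in the patent's FIG. 3d may differ."""
    def __init__(self, ch: int, reduction: int = 8):
        super().__init__()
        # channel attention: avg/max pooled vectors -> shared 2-layer MLP
        self.mlp = nn.Sequential(
            nn.Linear(ch, ch // reduction), nn.ReLU(),
            nn.Linear(ch // reduction, ch),
        )
        # axial attention: channel-wise avg/max maps -> 7x7 conv
        self.spatial = nn.Conv2d(2, 1, 7, padding=3)
        # splice both directions and reduce back to the input dimension
        self.fuse = nn.Conv2d(2 * ch, ch, 1)

    def forward(self, x):
        b, c, _, _ = x.shape
        w = torch.sigmoid(
            self.mlp(x.mean(dim=(2, 3))) + self.mlp(x.amax(dim=(2, 3)))
        ).view(b, c, 1, 1)
        chan = x * w                              # channel-direction map
        s = torch.cat([x.mean(1, keepdim=True),
                       x.amax(1, keepdim=True)], dim=1)
        axial = x * torch.sigmoid(self.spatial(s))  # axial map
        return self.fuse(torch.cat([chan, axial], dim=1))

y = ChannelAxialAttention(64)(torch.randn(1, 64, 40, 40))
print(y.shape)  # torch.Size([1, 64, 40, 40])
```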
The final detection head of step (2) adopts a design similar to the YOLO series: the model outputs three groups of feature maps of different scales, 80 × 80 × 256, 40 × 40 × 256 and 20 × 20 × 256. The invention designs a dual-branch detection head in which 1 × 1 convolutions split the feature map of each scale into two branches, and dimension transformation is carried out with 1 × 1 convolutional layers whose channel numbers are 3N, 12 and 3 (3 means each point can regress anchor boxes of three sizes, N is the number of target classes, 12 = 3 × 4 represents the xy coordinates of the upper-left and lower-right corners of the target, and 3 = 3 × 1 represents the intersection-over-union of the target with the ground truth), so that the target classes, coordinates and intersection-over-union represented by the three groups of feature maps are predicted.
The method of converting the multi-branch model in the multi-branch large-convolution-kernel block into a single-branch model in step (3) is shown in FIG. 5. First, the parameters obtained after training are retained for the middle branch of FIG. 5 (the GELU activation + BN branch) and the right-hand branch (the 1 × 1 convolution + BN + GELU branch). Owing to the scalability of convolution kernels, these parameters are expanded into convolution kernel parameters equivalent to an 11 × 11 receptive field (whether a 3 × 3 or a 1 × 1 convolution, each can be expanded to the equivalent 11 × 11 kernel), so that the kernels of the middle and right branches have the same size as the main path and can be merged by element-wise addition (add); this is the multi-branch-to-single-branch strategy used at inference. Note that all merged parameters of the model are convolution kernel parameters, not biases (bias). The method effectively retains the features of the trained large-kernel model, transfers memory efficiently, and shortens the time spent storing the current weights when invoking commands such as addition and splicing, thereby accelerating inference.
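The kernel-merging step can be demonstrated in miniature (not part of the patent text; kernel sizes here are illustrative, in the style of RepVGG-type reparameterization): a 1 × 1 kernel is zero-padded to the size of the larger kernel, the kernels are added, and the merged single convolution reproduces the sum of the two branches exactly, because convolution is linear in the kernel.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# two parallel branches with different kernel sizes
conv3 = nn.Conv2d(8, 8, 3, padding=1, bias=False)
conv1 = nn.Conv2d(8, 8, 1, bias=False)

x = torch.randn(1, 8, 16, 16)
multi_branch = conv3(x) + conv1(x)          # training-time computation

# pad the 1x1 kernel to 3x3 (left/right/top/bottom) and add the kernels
merged_kernel = conv3.weight + F.pad(conv1.weight, [1, 1, 1, 1])
single_branch = F.conv2d(x, merged_kernel, padding=1)  # inference-time

print(torch.allclose(multi_branch, single_branch, atol=1e-5))  # True
```

The single merged convolution avoids holding two intermediate feature maps and the add call that combines them, which is the memory-access saving the patent claims.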
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. A SAR image target detection method combined with a lightweight large convolution kernel is characterized by comprising the following steps:
S1: Data set collection and production
Collect SAR images, preprocess them to generate a SAR image data set, and divide the data set into a training set and a test set;
S2: Model training
Train the model with the SAR images in the training set to obtain a trained SAR image target detection model;
S3: Model inference
After the model is trained, input the SAR images of the test set into the SAR image target detection model for inference to obtain the detection results.
2. The SAR image target detection method combined with the lightweight large convolution kernel according to claim 1, characterized in that: in step S1, the preprocessing includes quantization, normalization, data augmentation, and cropping of the SAR image.
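A minimal sketch of two of the preprocessing steps named in claim 2, quantization and normalization. The bit depth, the max-based scaling, and the function name are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def preprocess_sar(img, bits=8):
    """Quantize a raw SAR amplitude image to 2**bits levels, then
    normalize to [0, 1]. Assumes img has at least one positive value."""
    levels = 2 ** bits - 1
    img = np.clip(img, 0.0, img.max())          # clip dynamic range
    quantized = np.round(img / img.max() * levels)  # integer quantization
    return quantized / levels                    # rescale to [0, 1]

raw = np.array([[0.0, 0.5], [1.0, 2.0]])
out = preprocess_sar(raw)
# out lies in [0, 1] with the maximum amplitude mapped to exactly 1.0
```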
3. The SAR image target detection method combined with the lightweight large convolution kernel according to claim 1, characterized in that: in step S2, the SAR image target detection model comprises a backbone network, a multi-scale feature interaction structure, and three detection heads, wherein the backbone network is connected to the multi-scale feature interaction structure, and the multi-scale feature interaction structure is connected to each of the three detection heads.
4. The SAR image target detection method combined with the lightweight large convolution kernel according to claim 3, characterized in that: the backbone network comprises an initialization layer, three multi-branch large convolution kernel blocks, three conversion layers, and a spatial pyramid pooling layer; the multi-scale feature interaction structure comprises two up-sampling layers with an attention mechanism and two down-sampling layers with an attention mechanism; each detection head comprises a common convolution layer and a dimension conversion module; the input SAR image is processed by the initialization layer into a feature map of a set size, which then passes sequentially through the three groups of multi-branch large convolution kernel blocks and conversion layers; each conversion layer halves the length and width of the feature map and doubles the number of channels, so the three conversion layers produce feature maps at three sizes, denoted t1, t2 and t3; the feature map t3 is input into the spatial pyramid pooling layer to obtain high-level semantic information s3, and s3 is spliced with the feature map output by the last conversion layer to form the feature map ts3; ts3 passes through one up-sampling layer with the attention mechanism to form the feature map up2, and up2 passes through the other up-sampling layer with the attention mechanism to form the feature map up1; the feature maps up1, t2 and up2 are spliced and pass through one down-sampling layer with the attention mechanism to form the feature map d2, and d2 is spliced with ts3 and passes through the other down-sampling layer with the attention mechanism to form the feature map d3; finally, the three feature maps of different scales obtained after the multi-scale interaction are respectively input into the three detection heads, each of which applies its common convolution layer and dimension conversion module to output the detection result at that scale.
5. The SAR image target detection method combined with the lightweight large convolution kernel according to claim 4, characterized in that: the initialization layer comprises two common convolution layers and two depthwise separable convolution layers with a group number of two, connected in the order: first common convolution layer, first depthwise separable convolution layer, second common convolution layer, second depthwise separable convolution layer.
6. The SAR image target detection method combined with the lightweight large convolution kernel according to claim 5, characterized in that: the multi-branch large convolution kernel block comprises three common convolution layers, two GELU activation + BN layers, and a large-kernel convolution block; the first common convolution layer, the first GELU activation + BN layer, the large-kernel convolution block and the second common convolution layer are connected in sequence to form the main path; the input feature map is added directly to the output to form one branch; meanwhile, the input feature map passes through the third common convolution layer and the second GELU activation + BN layer, and the result is added to the output to form another branch.
7. The SAR image target detection method combined with the lightweight large convolution kernel according to claim 6, characterized in that: the large-kernel convolution block comprises a depthwise separable convolution layer, a dilated convolution layer and a common convolution layer connected in sequence; the convolution kernel size of the large-kernel convolution block is larger than 5*5, and a convolution layer with a larger receptive field is realized by superposing feature maps from several different receptive fields.
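The stacking in claim 7 reaches a large receptive field without a single large dense kernel. The helper below, an illustrative assumption rather than anything from the patent, computes the effective receptive field of a stack of (possibly dilated) convolutions:

```python
def stacked_receptive_field(layers):
    """layers: list of (kernel_size, dilation, stride) tuples, applied in order.
    Returns the receptive field on the input, using the standard recurrence:
    the field grows by the dilated kernel footprint times the accumulated stride."""
    rf, jump = 1, 1
    for k, d, s in layers:
        eff_k = d * (k - 1) + 1      # footprint of a dilated kernel
        rf += (eff_k - 1) * jump     # growth contributed by this layer
        jump *= s                    # accumulated stride ("jump") on the input
    return rf

# e.g. a 5*5 depthwise convolution followed by a 3*3 dilation-3 convolution
# (both stride 1) already covers an 11 x 11 field:
rf = stacked_receptive_field([(5, 1, 1), (3, 3, 1)])
```

Under this assumed configuration the stack matches the 11 x 11 equivalence used in the re-parameterization step, while costing far fewer parameters than a dense 11*11 kernel.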
8. The SAR image target detection method combined with the lightweight large convolution kernel according to claim 7, characterized in that: the conversion layer adopts a sub-pixel sampling strategy; the sub-pixel sampling process splits the feature map of dimension H x W x C into small blocks, each consisting of several grids, splices the blocks into several sub-feature maps according to each block's position, and then reduces the number of output channels through a common convolution layer.
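The sub-pixel sampling of claim 8 is a space-to-depth rearrangement. A minimal sketch, assuming a 2x2 grid size (the actual grid size is not specified in the claim) and omitting the final channel-reducing convolution:

```python
import numpy as np

def space_to_depth(x, r=2):
    """Split an (H, W, C) map into r*r sub-maps and stack them on the
    channel axis, giving (H/r, W/r, C*r*r) without discarding any pixel."""
    h, w, c = x.shape
    x = x.reshape(h // r, r, w // r, r, c)       # expose the r x r grids
    return x.transpose(0, 2, 1, 3, 4).reshape(h // r, w // r, c * r * r)

x = np.arange(16, dtype=np.float32).reshape(4, 4, 1)
y = space_to_depth(x)
# (4, 4, 1) -> (2, 2, 4): each output position holds its original 2x2 grid
```

Unlike strided convolution or pooling, this down-sampling is lossless: every input value survives in some channel, which is why a convolution afterwards can still reduce the channel count without first losing spatial information.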
9. The SAR image target detection method combined with the lightweight large convolution kernel according to claim 8, characterized in that: the up-sampling layer with the attention mechanism doubles the size of the lower-level feature map by nearest-neighbour interpolation, splices it with the feature map of the same size, and finally inputs the result into the attention layer for channel-level attention weighting; the down-sampling layer with the attention mechanism first halves the length and width through a common convolution layer, then splices the result with the feature map of the same level, and finally inputs it into the attention layer for channel-level attention weighting; the attention layer realizes the weighting through a combination of channel attention and axial attention.
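A minimal sketch of the nearest-neighbour doubling step in claim 9 (the splicing and attention weighting are omitted; the function name is an illustrative assumption):

```python
import numpy as np

def upsample_nearest_2x(x):
    """Double H and W of an (H, W, C) feature map by repeating each pixel,
    i.e. nearest-neighbour (neighbourhood) interpolation."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

x = np.array([[[1.0], [2.0]],
              [[3.0], [4.0]]])          # (2, 2, 1) lower-level feature map
y = upsample_nearest_2x(x)              # (4, 4, 1)
# each input pixel becomes a 2x2 block, e.g. y[0:2, 0:2] is all 1.0
```

After this doubling, the map has the same spatial size as the feature map one level up, so the two can be spliced channel-wise before attention weighting.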
10. The SAR image target detection method combined with the lightweight large convolution kernel according to claim 1 or 9, characterized in that: in the training process of step S2, the multi-branch large convolution kernel block model is used for training; during inference in step S3, the multi-branch model in the multi-branch large convolution kernel block is converted into a single-branch model in the following way: the parameters obtained after training the multi-branch large convolution kernel block, formed by the large-kernel convolution block as the main path, the second GELU activation + BN layer as one branch and the input feature map as the other branch, are retained; because convolution kernels can be zero-padded without changing their output, the two branches are expanded into convolution kernel parameters equivalent to an 11 x 11 receptive field, so that the kernel sizes of the main path and the two branches are consistent; finally, the convolution kernels are summed in an add manner.
CN202211573253.9A 2022-12-08 2022-12-08 SAR image target detection method combined with lightweight large convolution kernel Pending CN115861841A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211573253.9A CN115861841A (en) 2022-12-08 2022-12-08 SAR image target detection method combined with lightweight large convolution kernel

Publications (1)

Publication Number Publication Date
CN115861841A true CN115861841A (en) 2023-03-28

Family

ID=85671216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211573253.9A Pending CN115861841A (en) 2022-12-08 2022-12-08 SAR image target detection method combined with lightweight large convolution kernel

Country Status (1)

Country Link
CN (1) CN115861841A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117036937A (en) * 2023-07-21 2023-11-10 山东省计算中心(国家超级计算济南中心) Blind road direction identification and flaw detection method based on Internet of things and deep learning
CN117036937B (en) * 2023-07-21 2024-01-26 山东省计算中心(国家超级计算济南中心) Blind road direction identification and flaw detection method based on Internet of things and deep learning

Similar Documents

Publication Publication Date Title
Yu et al. MSTNet: A multilevel spectral–spatial transformer network for hyperspectral image classification
CN113409191B (en) Lightweight image super-resolution method and system based on attention feedback mechanism
CN109472298A (en) Depth binary feature pyramid for the detection of small scaled target enhances network
CN114565860B (en) Multi-dimensional reinforcement learning synthetic aperture radar image target detection method
CN111144329A (en) Light-weight rapid crowd counting method based on multiple labels
CN113344188A (en) Lightweight neural network model based on channel attention module
CN112132844A (en) Recursive non-local self-attention image segmentation method based on lightweight
CN111832453A (en) Unmanned scene real-time semantic segmentation method based on double-path deep neural network
CN113298235A (en) Neural network architecture of multi-branch depth self-attention transformation network and implementation method
CN110782430A (en) Small target detection method and device, electronic equipment and storage medium
CN116740527A (en) Remote sensing image change detection method combining U-shaped network and self-attention mechanism
CN113989169A (en) Expansion convolution accelerated calculation method and device
CN115861841A (en) SAR image target detection method combined with lightweight large convolution kernel
CN116740516A (en) Target detection method and system based on multi-scale fusion feature extraction
CN116090517A (en) Model training method, object detection device, and readable storage medium
Ye et al. Light-YOLOv5: A lightweight algorithm for improved YOLOv5 in PCB defect detection
CN113609904B (en) Single-target tracking algorithm based on dynamic global information modeling and twin network
CN117372777A (en) Compact shelf channel foreign matter detection method based on DER incremental learning
CN117218643A (en) Fruit identification method based on lightweight neural network
CN116468902A (en) Image processing method, device and non-volatile computer readable storage medium
CN115331261A (en) Mobile terminal real-time human body detection method and system based on YOLOv6
CN114821224A (en) Method and system for amplifying railway image style conversion data
Zhao et al. Lightweight anchor-free one-level feature indoor personnel detection method based on transformer
Li et al. Underwater object detection based on improved SSD with convolutional block attention
Hou et al. A single-stage multi-class object detection method for remote sensing images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination