CN114708511B - Remote sensing image target detection method based on multi-scale feature fusion and feature enhancement - Google Patents

Remote sensing image target detection method based on multi-scale feature fusion and feature enhancement

Info

Publication number
CN114708511B
CN114708511B (application CN202210614648.2A)
Authority
CN
China
Prior art keywords
feature
fusion
module
features
attention
Prior art date
Legal status
Active
Application number
CN202210614648.2A
Other languages
Chinese (zh)
Other versions
CN114708511A (en
Inventor
符颖 (Fu Ying)
王坤 (Wang Kun)
文武 (Wen Wu)
吴锡 (Wu Xi)
周激流 (Zhou Jiliu)
Current Assignee
Chengdu University of Information Technology
Original Assignee
Chengdu University of Information Technology
Priority date
Filing date
Publication date
Application filed by Chengdu University of Information Technology filed Critical Chengdu University of Information Technology
Priority to CN202210614648.2A priority Critical patent/CN114708511B/en
Publication of CN114708511A publication Critical patent/CN114708511A/en
Application granted granted Critical
Publication of CN114708511B publication Critical patent/CN114708511B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention relates to a remote sensing image target detection method based on multi-scale feature fusion and feature enhancement. An adaptive multi-scale feature fusion module performs the feature fusion and uses additional lateral connections during the fusion process, increasing the exchange between adjacent features, making full use of the extracted multi-scale features and enriching the feature information; skip connections are also added so that the original features take part in the fusion process, improving the multi-scale feature expression capability of the network. Multi-branch dilated convolutions with different dilation rates in the attention feature enhancement module obtain receptive fields of different sizes, so that when objects of different sizes are present in the remote sensing image the features of targets at different scales can be extracted simultaneously, improving the generalization ability of the network with respect to target scale. A mixed attention mechanism module enhances the feature information of the targets while weakening background and noise information.

Description

Remote sensing image target detection method based on multi-scale feature fusion and feature enhancement
Technical Field
The invention relates to the field of remote sensing image processing, in particular to a remote sensing image target detection method based on multi-scale feature fusion and feature enhancement.
Background
With the increasing maturity of remote sensing technology, many satellites and aviation sensors can obtain remote sensing images with higher resolution, can provide visual and clear earth surface information, and have great significance for earth surface observation. The remote sensing image target detection is used as an important task in the field of remote sensing image interpretation analysis, the position and the category of an interested target can be positioned and marked from a wide visual field range, and the method is an important basis for application such as city planning, land utilization, traffic dispersion, military monitoring and the like. In recent years, with the rapid development of deep learning, a neural network obtains features with stronger semantic representation capability by performing multilayer convolution operation on an image, so that the target detection performance is further improved.
Deep-learning-based target detection methods can be classified along two axes. According to whether region-of-interest extraction is required, they are divided into two-stage and single-stage detection; according to whether anchor boxes must be preset, they are divided into anchor-based and keypoint-based detection, the latter also being called anchor-free detection. Two-stage target detection completes the detection process in two phases: regions of interest are first extracted, and each region is then further detected and identified. Two-stage methods achieve higher precision, but because the regions of interest must be extracted first and each region must be classified and regressed separately, extra computation is introduced, the speed is limited, and the methods are difficult to apply in systems with strict real-time requirements. Single-stage target detection completes the whole process in one stage; it is fast and basically meets real-time requirements, but its detection precision is slightly lower than that of multi-stage detection. Most detection methods need to extract anchor boxes and use them as initial detection boxes for further fine adjustment: the position, shape and size of each anchor box are adjusted by regressing the offset between the center points of the ground-truth box and the anchor box and the corresponding width and height scaling ratios, so that the anchor box gradually coincides with the ground-truth box. The advantage of anchor-based detection is that the network outputs are relative values with respect to the anchor boxes, the value range is small, training is easy and convergence is fast. However, the detection process relies on matching against anchor boxes: designing the anchors requires considerable manual intervention and prior knowledge for each specific task, and parameter tuning is tedious; objects with unusual aspect ratios are difficult to match, leading to missed detections; and the large number of anchor boxes also causes problems such as high memory consumption and high time complexity. In response to these problems, keypoint-based detection has recently become popular. It classifies and regresses targets directly at the pixel level, avoids introducing anchor boxes and thereby alleviates the series of problems they cause; it dispenses with the manual and tedious anchor design process, avoids matching steps such as intersection-over-union (IoU) computation during detection, reduces the amount of computation, achieves higher precision and has become a hotspot of current research. Nevertheless, target detection in remote sensing images still faces the following problems:
1. After multiple pooling operations the information of small-scale targets is severely lost, so the detection of small targets needs to be improved. Remote sensing images contain a large number of small-size targets such as airplanes, automobiles and ships, and detecting these small targets is difficult.
2. When objects of different sizes are present in an image the detection effect is poor. The target scale in remote sensing images varies greatly, and the sizes of targets of different categories, or of the same category collected at different resolutions, differ considerably; an ordinary convolution has a limited receptive field and performs poorly when the scale of the detected targets varies widely.
3. When targets are densely distributed in an image the detection precision decreases. Targets of interest in remote sensing images, such as vehicles and ships, are often densely distributed and may be closely arranged and irregularly placed, so several target instances may fall into the same region of interest; background noise is then easily introduced and the detection precision drops.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a remote sensing image target detection method based on multi-scale feature fusion and feature enhancement. An adaptive multi-scale feature fusion module performs top-down feature fusion on features of different resolutions while lateral connections increase the exchange between adjacent features; an attention feature enhancement module then combines multi-branch dilated convolution with an attention mechanism to improve the generalization ability of the network with respect to target scale, enhance effective feature information and improve the target detection capability. The method specifically comprises the following steps:
Step 1: extract features from the input remote sensing image. The remote sensing image is fed into a ResNet backbone network, and the last four layers of the ResNet network, through multiple groups of convolution and pooling operations, output a group of multi-scale feature maps with different resolutions, {C1, C2, C3, C4}.
Step 2: adjust the number of feature channels. Each map of {C1, C2, C3, C4} is subjected to one 1x1 convolution so that its number of channels matches that of the shallowest feature map C1, giving the feature map group {P1, P2, P3, P4}.
Step 3: use the adaptive multi-scale feature fusion module to fuse the feature map group {P1, P2, P3, P4} obtained in step 2, comprising a first top-down fusion stage, a bottom-up fusion stage and a second top-down fusion stage, specifically:
Step 31: the first top-down fusion stage introduces lateral connections into the fusion process and fuses gradually, starting from the deepest feature map P4. P4 is up-sampled and fused with P3 to obtain P3_1, P3 is up-sampled and fused with P2 to obtain P2_1, and P2 is up-sampled and fused with P1 to obtain P1_1, completing the first forward propagation. Then P3_1 is up-sampled and fused with P2_1 to obtain P2_2, and P2_1 is up-sampled and fused with P1_1 to obtain P1_2, completing the second forward propagation. Finally P2_2 is up-sampled and fused with P1_2 to obtain P1_3, completing the third forward propagation and yielding the feature map group {P1_3, P2_2, P3_1, P4}.
Step 32: the bottom-up fusion stage starts from the shallowest feature P1_3 of the feature map group obtained in step 31. P1_3 is down-sampled by a factor of two and fused with P2 and P2_2 to obtain N2; N2 is down-sampled by a factor of two and fused with P3 and P3_1 to obtain N3; finally N3 is down-sampled by a factor of two and fused with P4 to obtain N4, yielding the feature map group {P1_3, N2, N3, N4}.
Step 33: in the second top-down fusion stage, the feature maps obtained in step 32 are, starting from the deepest feature N4, up-sampled and added layer by layer to obtain a high-resolution first feature map P_out of size P/4.
Step 4: the first feature map P_out obtained in step 33 is input into an attention feature enhancement module for feature enhancement. The attention feature enhancement module comprises a multi-branch dilated convolution module and a mixed attention mechanism module; each branch of the multi-branch dilated convolution module has a different dilation rate, and the features of the first feature map P_out after convolution with the different dilation rates are fused to obtain a second feature map F1.
Step 5: the second feature map F1 is input into the mixed attention mechanism module to suppress background and noise. The mixed attention mechanism module comprises a channel-domain attention module and a spatial-domain attention module; the second feature map F1 is processed by the channel-domain attention module and the spatial-domain attention module to obtain a third feature map F_out.
Step 6: obtain the final detection result through classification and regression. The third feature map F_out output in step 5 is passed through three 3x3 convolution branches to obtain a center point prediction result, a center point offset prediction result and a target width and height prediction result, and the final prediction result is obtained by fusing the three prediction results.
According to a preferred embodiment, the specific process by which the mixed attention mechanism module obtains the third feature map F_out comprises the following steps:
Step 51: the second feature map F1 obtained in step 4 is input into the channel-domain attention module. Global average pooling (GAP) first sums and averages all feature values of each channel, converting each two-dimensional feature map into a real number and giving a C×1×1 vector, where C is the number of channels. The global average pooling GAP and global max pooling GMP are used simultaneously along the channel dimension, and the two pooled vectors are respectively sent into 2 fully connected layers for training and learning, giving 2 one-dimensional channel weight sequences. The 2 groups of channel weight sequences are added and mapped to [0, 1] by a Sigmoid activation function, finally giving 1 group of weight sequences, which is used to weight the second feature map F1, obtaining an intermediate feature map F_c and completing the channel-domain attention operation;
Step 52: the intermediate feature map F_c is passed through the global average pooling GAP and the global max pooling GMP to obtain 2 single-channel feature maps; the 2 single-channel feature maps are concatenated along the channel dimension and a convolution operation gives the spatial-domain attention feature map, which is mapped to [0, 1] by a Sigmoid activation function to obtain the spatial-domain attention weights. These weights are multiplied with the intermediate feature map F_c to obtain the final third feature map F_out.
According to a preferred embodiment, the feature fusion module in step 3 further takes into account that features of different resolutions contribute differently to the fused features, and adds learnable weight coefficients to achieve adaptive fusion, thereby improving the scale invariance of the features. The specific implementation is as follows:
First, the resolutions of the multi-scale features to be fused are adjusted to be consistent by the following means:
(1) in the first top-down stage, deep features are up-sampled by a factor of two using nearest-neighbor interpolation;
(2) in the bottom-up stage, shallow features are down-sampled by a factor of two using max pooling. The adjusted features are then multiplied by their corresponding weight coefficients and added element by element, and the result is finally fused through a Swish activation function, convolution and batch normalization.
Compared with the prior art, the invention has the beneficial effects that:
1. During feature fusion the adaptive multi-scale feature fusion module AMFF uses more lateral connections, increasing the exchange between adjacent features, making full use of the extracted multi-scale features and enriching the feature information. Skip connections are also added so that the original features take part in the fusion process, avoiding the information loss caused by repeated up- and down-sampling and improving the multi-scale feature expression capability of the network.
2. The invention takes into account that features of different resolutions contribute differently to the fused features and introduces learnable weight coefficients to achieve adaptive fusion, improving the scale invariance of the features.
3. The attention feature enhancement module AFE uses multi-branch dilated convolutions with different dilation rates to obtain receptive fields of different sizes, so that when objects of different sizes are present in a remote sensing image the features of targets at different scales can be extracted simultaneously, improving the generalization ability of the network with respect to target scale.
4. For the noise introduced by feature fusion and the multi-branch dilated convolutions, the channel-domain and spatial-domain attention modules enhance the feature information of the targets while weakening background and noise information.
Drawings
FIG. 1 is a schematic diagram of the structure of a remote sensing image target detection network according to the present invention;
FIG. 2 is a schematic structural diagram of an adaptive multi-scale feature fusion module AMFF according to the present invention;
FIG. 3 is a flow diagram of the adaptive feature fusion process of the present invention;
FIG. 4 is a schematic diagram of the convolution structure of multi-branched holes with different expansion ratios according to the present invention;
FIG. 5 is a schematic diagram of an AFE of the attentive feature enhancement module of the present invention;
FIG. 6 is a schematic structural diagram of a CBAM of the hybrid attention mechanism of the present invention, FIG. 6a is a schematic structural diagram of a channel domain attention module, and FIG. 6b is a schematic structural diagram of a spatial domain attention module;
FIG. 7 is a graph comparing the results of experiments performed on DIOR data sets by the method of the present invention, FIG. 7a is the results of a CenterNet test example, and FIG. 7b is the results of a test example by the method of the present invention;
FIG. 8 is a graph comparing the results of experiments performed by the method of the present invention on the NWPU VHR-10 data set; FIG. 8a shows the results of a CenterNet test example and FIG. 8b the results of a test example of the method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the accompanying drawings in combination with the embodiments. It is to be understood that these descriptions are only illustrative and are not intended to limit the scope of the present invention. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present invention.
The following detailed description is made with reference to the accompanying drawings.
AMFF in the invention represents an adaptive multi-scale feature fusion module.
The AFE in the present invention represents an attention feature enhancement module.
Some symbols in the drawings attached to the specification are described below: Conv denotes an ordinary convolution, and 3x3 denotes a convolution kernel size of 3x3; DepthConv denotes a depthwise separable convolution; a convolution marked 3x3, r=12 denotes a dilated convolution with a 3x3 kernel and a dilation rate of 12; BN is batch normalization; Swish, ReLU and Sigmoid denote activation functions.
In view of the above problems in the prior art, the invention provides a remote sensing image target detection method based on multi-scale feature fusion and feature enhancement; a schematic structural diagram of the detection network is shown in FIG. 1. The adaptive multi-scale feature fusion module performs top-down feature fusion on features of different resolutions while additional lateral connections increase the exchange between adjacent features; the attention feature enhancement module then uses multi-branch dilated convolutions with different dilation rates to obtain different receptive fields, improving the target detection capability. The method is specifically as follows:
step 1: extracting the characteristics of the input remote sensing image, inputting the remote sensing image into the main network by adopting a ResNet network through the main network, and outputting the remote sensing image in the last four layers of the ResNet network to obtain multi-scale characteristic graph groups with different resolutions through operations such as multiple groups of convolution, pooling and the like
Figure 353087DEST_PATH_IMAGE001
(ii) a Wherein, the multi-scale feature map group
Figure DEST_PATH_IMAGE026
Are one quarter, one eighth, one sixteenth and one thirty half of the input remote sensing image, respectively. P in fig. 1 refers to the size of the input remote sensing image.
Step 2: because the features extracted by the backbone network have too many channels and contain much redundant information, the number of feature channels is adjusted. Each map of the multi-scale feature map group {C1, C2, C3, C4} is subjected to one 1x1 convolution operation so that its number of channels matches that of the shallowest feature map C1, giving the feature map group {P1, P2, P3, P4}.
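For illustration only, the following PyTorch sketch shows one possible reading of steps 1 and 2; it is not taken from the patent. A torchvision ResNet-18 backbone is assumed here so that the shallowest map C1 has 64 channels, matching the 64-channel feature sizes quoted later for the attention module; the backbone depth and the class names Backbone and ChannelAdjust are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class Backbone(nn.Module):
    """ResNet backbone returning the four multi-scale maps {C1, C2, C3, C4}."""
    def __init__(self):
        super().__init__()
        net = resnet18(pretrained=False)
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        self.stages = nn.ModuleList([net.layer1, net.layer2, net.layer3, net.layer4])

    def forward(self, x):
        x = self.stem(x)                       # 1/4 of the input resolution
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)                    # C1: 1/4, C2: 1/8, C3: 1/16, C4: 1/32
        return feats

class ChannelAdjust(nn.Module):
    """Step 2: one 1x1 convolution per scale, matching the channel count of C1."""
    def __init__(self, in_channels=(64, 128, 256, 512), out_channels=64):
        super().__init__()
        self.reduce = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)

    def forward(self, feats):                  # feats = [C1, C2, C3, C4]
        return [conv(f) for conv, f in zip(self.reduce, feats)]  # [P1, P2, P3, P4]

# Example: a 512x512 image gives P1..P4 with spatial sizes 128, 64, 32 and 16.
img = torch.randn(1, 3, 512, 512)
P1, P2, P3, P4 = ChannelAdjust()(Backbone()(img))
```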
Step 3: the adaptive multi-scale feature fusion module is used to fuse the feature map group {P1, P2, P3, P4} obtained in step 2, comprising a first top-down fusion stage, a bottom-up fusion stage and a second top-down fusion stage; FIG. 2 is a schematic structural diagram of the adaptive multi-scale feature fusion module AMFF of the invention. Specifically:
Step 31: the first top-down fusion stage introduces lateral connections into the fusion process and fuses gradually, starting from the deepest feature map P4. P4 is up-sampled and fused with P3 to obtain P3_1, P3 is up-sampled and fused with P2 to obtain P2_1, and P2 is up-sampled and fused with P1 to obtain P1_1, completing the first forward propagation. Then P3_1 is up-sampled and fused with P2_1 to obtain P2_2, and P2_1 is up-sampled and fused with P1_1 to obtain P1_2, completing the second forward propagation. Finally P2_2 is up-sampled and fused with P1_2 to obtain P1_3, completing the third forward propagation and yielding the feature map group {P1_3, P2_2, P3_1, P4}.
In the first top-down fusion stage, after double upsampling of deeper features, the deeper features are fused with the features with the same resolution, so that the communication between adjacent features is increased, the information of the features is retained, the semantic information of deep features is effectively introduced into shallow features, and the semantic information of shallow high-resolution features is increased.
Step 32: the bottom-up fusion stage starts from the shallowest feature P1_3 of the feature map group obtained in step 31. P1_3 is down-sampled by a factor of two and fused with P2 and P2_2 to obtain N2; N2 is down-sampled by a factor of two and fused with P3 and P3_1 to obtain N3; finally N3 is down-sampled by a factor of two and fused with P4 to obtain N4, yielding the feature map group {P1_3, N2, N3, N4}.
To avoid the loss of target information caused by the repeated up-sampling in the top-down path of step 31, the network introduces this bottom-up path and adds skip connections so that the original input feature map group {P1, P2, P3, P4} takes part in the layer-by-layer fusion that produces {P1_3, N2, N3, N4}.
Step 33: in the second top-down fusion stage, the feature maps obtained in step 32 are, starting from the deepest feature N4, up-sampled and added layer by layer to obtain a high-resolution first feature map P_out of size P/4.
In prediction, a feature map of higher resolution is usually used to improve the detection performance for small-size targets. A top-down path is therefore added to the result of step 32: starting from the deepest feature N4, the maps are up-sampled and added layer by layer to obtain the high-resolution first feature map P_out of size P/4, which keeps the details and position information of small targets while carrying richer semantic information, thus improving the detection precision for small targets. P/4 represents one quarter of the input image size.
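As an illustrative sketch only, the following PyTorch code shows one reading of the three-stage fusion topology of steps 31 to 33. For brevity every fusion here is a plain element-wise addition followed by a 3x3 convolution; the adaptive weighting of equations (1) and (2) described below is omitted, and the class name AMFF and the choice of 64 channels are assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

def up2(x):    # x2 nearest-neighbor up-sampling
    return F.interpolate(x, scale_factor=2, mode="nearest")

def down2(x):  # x2 down-sampling by max pooling
    return F.max_pool2d(x, kernel_size=2, stride=2)

class AMFF(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # one fusion convolution per fused node (9 nodes in this sketch)
        self.convs = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1) for _ in range(9))

    def forward(self, P1, P2, P3, P4):
        c = iter(self.convs)
        # --- first top-down stage (three forward propagations) ---
        P3_1 = next(c)(up2(P4) + P3)
        P2_1 = next(c)(up2(P3) + P2)
        P1_1 = next(c)(up2(P2) + P1)
        P2_2 = next(c)(up2(P3_1) + P2_1)
        P1_2 = next(c)(up2(P2_1) + P1_1)
        P1_3 = next(c)(up2(P2_2) + P1_2)
        # --- bottom-up stage with skip connections to the original P maps ---
        N2 = next(c)(down2(P1_3) + P2 + P2_2)
        N3 = next(c)(down2(N2) + P3 + P3_1)
        N4 = next(c)(down2(N3) + P4)
        # --- second top-down stage: up-sample and add layer by layer ---
        P_out = up2(up2(up2(N4) + N3) + N2) + P1_3   # resolution P/4
        return P_out
```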
The multi-scale feature fusion module also takes into account that features of different resolutions contribute differently to the fused features; learnable weight coefficients are added to achieve adaptive fusion and improve the scale invariance of the features. The specific implementation is as follows:
The fusion strategy of the invention first adjusts the resolutions of the multi-scale features to be fused to be consistent, by the following means: (1) in the first top-down fusion stage, deep features are up-sampled by a factor of two using nearest-neighbor interpolation; (2) in the bottom-up fusion stage, shallow features are down-sampled by a factor of two using max pooling. The adjusted features are then multiplied by their corresponding weight coefficients and added element by element, and the result is finally fused through a Swish activation function, convolution and batch normalization.
Equation (1) is a brief representation of a node to be fused during the fusion process, where F denotes the feature node currently being fused, F_i denotes one of the feature nodes pointing to the current node F, w_i is the weight coefficient of that node with respect to F, and M is the total number of nodes pointing to F:
F = Σ_{i=1}^{M} w_i · F_i    (1)
w_i is computed as in equation (2) and is defined as the share of node F_i among all M nodes pointing to F, so that each weight lies between 0 and 1; ε = 0.0001 is an extremely small value set to prevent the learned weight coefficients from becoming unstable. In addition, before w_i is computed, a ReLU activation is applied to the learned coefficients α_i to ensure that w_i is not negative:
w_i = ReLU(α_i) / ( Σ_{j=1}^{M} ReLU(α_j) + ε )    (2)
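For illustration only, the following PyTorch sketch shows one adaptively weighted fusion node following equations (1) and (2) and the fusion procedure described above (weighted sum, Swish activation, depthwise separable convolution and batch normalization). The class names and the use of nn.SiLU for the Swish activation are assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise convolution followed by a 1x1 pointwise convolution."""
    def __init__(self, channels):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class WeightedFusionNode(nn.Module):
    """Fuses resolution-aligned inputs with the normalized weights of eq. (2)."""
    def __init__(self, num_inputs, channels, eps=1e-4):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(num_inputs))  # learnable coefficients
        self.eps = eps
        self.act = nn.SiLU()                                # Swish activation
        self.conv = DepthwiseSeparableConv(channels)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, inputs):            # list of tensors with equal resolution
        w = torch.relu(self.alpha)        # keep the coefficients non-negative
        w = w / (w.sum() + self.eps)      # equation (2)
        fused = sum(wi * x for wi, x in zip(w, inputs))     # equation (1)
        return self.bn(self.conv(self.act(fused)))          # Swish -> conv -> BN
```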
Taking the P_out node as an example, the fusion is performed as in equation (3); the fusion process is shown in FIG. 3:
P_out = BN( DepthConv( Swish( w_1 · P1_3 + w_2 · UpSample(Q_2) ) ) )    (3)
where UpSample denotes two-times up-sampling by nearest-neighbor interpolation, which keeps the resolutions of the features to be fused consistent, Q_2 denotes the fused feature obtained at the preceding (P/8) level of the second top-down path, and w_1 and w_2 are the weight coefficients of the node. Depthwise separable convolution (DepthConv) is used in the AMFF module. Unlike an ordinary convolution, a depthwise separable convolution first performs the convolution on each channel independently and then uses a 1x1 convolution to expand along the depth (channel) dimension and complete the operation; this effectively reduces the time consumption and parameter count of the convolution and improves the detection efficiency of the network.
Step 4: the first feature map P_out obtained in step 3 is input into the attention feature enhancement module for feature enhancement. The attention feature enhancement module comprises a multi-branch dilated convolution module and a mixed attention mechanism module; each branch of the multi-branch dilated convolution module has a different dilation rate, and the features of the first feature map P_out after convolution with the different dilation rates are fused to obtain a second feature map F1. The multi-branch dilated convolutions with different dilation rates acquire receptive fields of different sizes and capture multi-scale context information; specifically, the dilation rates r used are 12, 24 and 36.
In a convolutional neural network, a fixed-size receptive field is unfavorable for detecting objects of different sizes and is particularly unfriendly to remote sensing images, in which the target scale changes drastically. In semantic segmentation, dilated convolution is usually used to enlarge the receptive field so that every pixel in the image can be classified accurately, for example the atrous spatial pyramid pooling (ASPP) module in DeepLab; however, because the target scale in remote sensing images varies more severely than in natural images, ASPP is not directly applicable to remote sensing target detection. Inspired by ASPP, the invention adopts dilated convolution branches with a larger span of dilation rates to obtain receptive fields of different sizes, improving the detection performance of the network for targets of different scales. FIG. 4 is a schematic diagram of the multi-branch dilated convolution with different dilation rates and shows the receptive fields of the dilated convolutions at the different rates.
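The following PyTorch sketch illustrates one possible multi-branch dilated convolution module for step 4: three parallel 3x3 branches with dilation rates 12, 24 and 36, channel-wise concatenation, a 1x1 fusion convolution and a residual connection back to the input, as described for the AFE module later in the text. The class name is an assumption and this is not the patented implementation.

```python
import torch
import torch.nn as nn

class MultiBranchDilatedConv(nn.Module):
    def __init__(self, channels=64, rates=(12, 24, 36)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r) for r in rates)
        self.fuse = nn.Conv2d(channels * len(rates), channels, 1)

    def forward(self, x):
        y = torch.cat([b(x) for b in self.branches], dim=1)  # concat along channels
        return self.fuse(y) + x                               # 1x1 fusion + residual
```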
Meanwhile, remote sensing images have complex scenes, targets may be densely arranged, and missed and false detections can occur; in addition, a great deal of noise is introduced by the feature fusion of step 3 and the multi-branch dilated convolution of step 4. To address this, the invention applies the mixed spatial- and channel-domain attention module CBAM after the multi-branch dilated convolution to suppress background and noise: channel-domain attention makes the network focus on the feature maps of effective channels, spatial-domain attention focuses on the positions that are helpful to the task, and their serial combination highlights effective information and enhances the features.
Step 5: the second feature map F1 is input into the mixed attention mechanism module to suppress background and noise. The mixed attention mechanism module comprises a channel-domain attention module and a spatial-domain attention module; its structure is shown in FIG. 6, where FIG. 6a is a schematic structural diagram of the channel-domain attention module and FIG. 6b is a schematic structural diagram of the spatial-domain attention module. C denotes the number of channels, W and H denote the width and height of the feature map, and FC1 and FC2 are fully connected layers.
Step 51: the second feature map F1 obtained in step 4 is input into the channel-domain attention module. Global average pooling (GAP) first sums and averages all feature values of each channel, converting each two-dimensional feature map into a real number and giving a C×1×1 vector. Because global max pooling (GMP) is also beneficial for screening effective feature information, GAP and GMP are used simultaneously along the channel dimension. The two pooled vectors are respectively sent into 2 fully connected layers for training and learning, giving 2 one-dimensional channel weight sequences; the 2 groups of weight sequences are added and mapped to [0, 1] by a Sigmoid activation function, finally giving 1 group of weight sequences, which is used to weight the second feature map F1, obtaining the intermediate feature map F_c and completing the channel-domain attention operation.
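As an illustrative sketch only, the following PyTorch code shows a CBAM-style channel-domain attention module consistent with step 51; the 64-to-4-to-64 fully connected bottleneck mirrors the sizes given in the next paragraph, while the class name and the reduction ratio of 16 are assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels=64, reduction=16):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)          # global average pooling
        self.gmp = nn.AdaptiveMaxPool2d(1)          # global max pooling
        self.fc = nn.Sequential(                    # shared fully connected layers
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):                           # x: (B, C, H, W)
        b, c, _, _ = x.shape
        avg = self.fc(self.gap(x).view(b, c))       # (B, C)
        mx = self.fc(self.gmp(x).view(b, c))        # (B, C)
        w = torch.sigmoid(avg + mx).view(b, c, 1, 1)  # channel weights in [0, 1]
        return x * w                                # re-weight the input features
```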
Specifically, the second feature map F1 obtained in step 4 is input into the channel-domain attention module. F1 passes through two parallel max-pooling and average-pooling operations, giving two vectors of size 64×1×1; the two vectors are then fed into a shared fully connected block, in which the first fully connected layer compresses the number of channels to 4 and the second expands it back to 64, giving 2 one-dimensional channel weight sequences. The 2 groups of weight sequences are added and their values are mapped to [0, 1] by a Sigmoid activation function, finally giving 1 group of weight sequences, which is used to weight the second feature map F1, obtaining the intermediate feature map F_c and completing the channel-domain attention operation.
Step 52: unlike channel-domain attention, which attends to channel information, spatial-domain attention mainly attends to position information. The intermediate feature map F_c is passed through GAP and GMP to obtain 2 single-channel feature maps; the 2 single-channel feature maps are concatenated along the channel dimension and a convolution operation gives the spatial-domain attention feature map, which is mapped to [0, 1] by a Sigmoid activation function to obtain the spatial-domain attention weights. These weights are multiplied with the intermediate feature map F_c to obtain the final third feature map F_out.
Specifically, the intermediate feature map F_c is turned into two tensors of size H×W×1 by max pooling and average pooling along the channel dimension, and the two tensors are stacked together by a channel-dimension concatenation so that the number of channels becomes 2. A convolution operation then reduces the number of channels to 1 while keeping H and W unchanged, and a Sigmoid activation function maps the values to [0, 1], giving the spatial-domain attention weights. Finally, the intermediate feature map F_c is multiplied by these weights to obtain the third feature map F_out of size 64×H×W, completing the spatial-domain attention operation.
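For illustration only, the following PyTorch sketch shows a spatial-domain attention module consistent with step 52: channel-wise average and max maps are concatenated, convolved down to a single channel and passed through a Sigmoid to give the position weights. The 7x7 kernel size is an assumption, since the text only specifies a convolution operation.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                                  # x: (B, C, H, W)
        avg = x.mean(dim=1, keepdim=True)                  # (B, 1, H, W), channel-wise average
        mx, _ = x.max(dim=1, keepdim=True)                 # (B, 1, H, W), channel-wise maximum
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # (B, 1, H, W) weights
        return x * w                                       # spatially re-weighted features
```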
The structure of the attention feature enhancement module AFE is shown in FIG. 5. The input features pass through three dilated convolution branches with 3x3 kernels and dilation rates r=12, r=24 and r=36; the outputs of the three branches are concatenated along the channel dimension, fused by a 1x1 convolution and added element by element to the original input features; finally the CBAM mixed attention module suppresses background and noise and enhances the feature information.
Step 6: the final detection result is obtained through classification and regression. The third feature map output in step 5 is passed through three 3x3 convolution branches to obtain a center point prediction result, a center point offset prediction result and a target width and height prediction result, and the final prediction result is obtained by combining the target center point, the center point offset and the target width and height. The feature sizes of the center point prediction, the center point offset prediction and the width/height prediction are (C, P/4, P/4), (2, P/4, P/4) and (2, P/4, P/4) respectively, where C is the number of target categories in the detected images and P is the size of the input image.
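The following PyTorch sketch illustrates three 3x3 convolution heads consistent with step 6, producing the center point, center point offset and width/height predictions on the P/4 feature map; the channel count of 64 and the example class count of 20 (as in the DIOR data set) are assumptions, and this is not the patented implementation.

```python
import torch
import torch.nn as nn

class DetectionHeads(nn.Module):
    """Three 3x3 convolution branches on the P/4 feature map F_out."""
    def __init__(self, channels=64, num_classes=20):   # e.g. 20 classes for DIOR
        super().__init__()
        self.center = nn.Conv2d(channels, num_classes, 3, padding=1)  # (C, P/4, P/4)
        self.offset = nn.Conv2d(channels, 2, 3, padding=1)            # (2, P/4, P/4)
        self.size = nn.Conv2d(channels, 2, 3, padding=1)              # (2, P/4, P/4)

    def forward(self, f_out):
        heat = torch.sigmoid(self.center(f_out))   # center point score map
        return heat, self.offset(f_out), self.size(f_out)
```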
To further illustrate the effectiveness of the proposed method, the evaluation criterion uses the mean average precision (mAP) widely adopted in target detection, which is the mean of the average precision (AP) over all classes. The AP of a class is the area under the curve drawn, over the range 0 to 1, from its precision and recall values. Precision and recall are computed as in equations (4) and (5):
Precision = TP / (TP + FP)    (4)
Recall = TP / (TP + FN)    (5)
where TP denotes a true positive, i.e. a sample judged positive by the model that is indeed a real target, FN denotes a false negative and FP denotes a false positive.
The average precision AP of a single class is computed as in equation (6) and the mean average precision mAP as in equation (7), where C is the number of classes taking part in the calculation, P is the precision and R is the recall:
AP = ∫_0^1 P(R) dR    (6)
mAP = (1 / C) Σ_{i=1}^{C} AP_i    (7)
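As a brief illustrative sketch, the following Python code computes the metrics of equations (4) to (7); the AP here uses simple trapezoidal integration of the precision-recall curve, whereas published benchmarks often use interpolated variants.

```python
import numpy as np

def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0    # equation (4)
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0       # equation (5)
    return precision, recall

def average_precision(recalls, precisions):
    """Area under the precision-recall curve, equation (6); inputs sorted by recall."""
    r = np.concatenate(([0.0], recalls, [1.0]))
    p = np.concatenate(([1.0], precisions, [0.0]))
    return float(np.trapz(p, r))

def mean_average_precision(per_class_ap):
    """Equation (7): mean of the per-class AP values."""
    return float(np.mean(per_class_ap))
```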
Platform and system setup for the experiments: the CPU is an AMD Ryzen 5 3600X 6-Core, the GPU is an NVIDIA GeForce RTX 3090, the operating system is Ubuntu 20.04, the deep learning framework is PyTorch 1.8 and the Python version is 3.8. SGD is selected as the network optimizer, the momentum is set to 0.9, the initial learning rate to 0.01 and the weight decay to 0.0001; a step strategy is used to adjust the learning rate, the batch size is 18, 24 epochs are run in total, and the learning rate is reduced at the 18th and 22nd epochs. The same data enhancement methods are used in all experiments, including color-gamut enhancement, affine transformation, random cropping, random rotation, random scale transformation and random flipping.
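For illustration only, the training configuration quoted above can be expressed in PyTorch as follows; the model is a placeholder and the decay factor of 0.1 for the step schedule is an assumption, since the text does not state it.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 3, 3)   # placeholder standing in for the detection network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)
# learning rate reduced at the 18th and 22nd of 24 epochs (decay factor assumed 0.1)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[18, 22], gamma=0.1)

for epoch in range(24):
    # ... one training epoch with batch size 18 goes here ...
    scheduler.step()
```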
The method is compared with several existing classical target detection algorithms on the DIOR and NWPU VHR-10 public remote sensing image data sets. Under the same configuration it is compared on the NWPU VHR-10 data set with the YOLOv3, Faster R-CNN with FPN, Cascade R-CNN with FPN, RetinaNet and CenterNet networks; the results are shown in Table 1.
TABLE 1 comparison of the detection results of different algorithms on the NWPUVHR-10 dataset
Note: bold indicates the best value in each column and underlining indicates the second-best value.
FIG. 7 compares detection examples on the DIOR data set with the baseline CenterNet network: FIG. 7a shows the CenterNet detection results and FIG. 7b the results of the method of the invention.
FIG. 8 compares detection examples on the NWPU VHR-10 data set with CenterNet: FIG. 8a shows the CenterNet detection results and FIG. 8b the results of the method of the invention. As can be seen from FIG. 7 and FIG. 8, the detection precision and accuracy of the method of the invention are clearly better than those of the comparison method, especially for small targets.
It should be noted that the above-mentioned embodiments are exemplary, and that those skilled in the art, having benefit of the present disclosure, may devise various arrangements that are within the scope of the present disclosure and that fall within the scope of the invention. It should be understood by those skilled in the art that the present specification and figures are illustrative only and are not limiting upon the claims. The scope of the invention is defined by the claims and their equivalents.

Claims (3)

1. A remote sensing image target detection method based on multi-scale feature fusion and feature enhancement, characterized in that an adaptive multi-scale feature fusion module performs top-down feature fusion on features of different resolutions while using lateral connections to increase the exchange between adjacent features, and an attention feature enhancement module then combines multi-branch dilated convolution with an attention mechanism to improve the generalization ability of the network with respect to target scale, enhance effective feature information and improve the target detection capability, the method specifically comprising the following steps:
step 1: extracting features of an input remote sensing image: the remote sensing image is input into a ResNet backbone network, and the last four layers of the ResNet network, through multiple groups of convolution and pooling operations, output a group of multi-scale feature maps of different resolutions {C1, C2, C3, C4};
step 2: adjusting the number of feature channels: each map of the multi-scale feature map group {C1, C2, C3, C4} is subjected to one 1x1 convolution operation so that its number of channels matches that of the shallowest feature map C1, obtaining a feature map group {P1, P2, P3, P4};
step 3: using the adaptive multi-scale feature fusion module to perform feature fusion on the feature map group {P1, P2, P3, P4} obtained in step 2, comprising a first top-down fusion stage, a bottom-up fusion stage and a second top-down fusion stage, specifically:
step 31: in the first top-down fusion stage, lateral connections are introduced into the fusion process and fusion proceeds gradually from the deepest feature map P4: P4 is up-sampled and fused with P3 to obtain P3_1, P3 is up-sampled and fused with P2 to obtain P2_1, and P2 is up-sampled and fused with P1 to obtain P1_1, completing the first forward propagation; then P3_1 is up-sampled and fused with P2_1 to obtain P2_2, and P2_1 is up-sampled and fused with P1_1 to obtain P1_2, completing the second forward propagation; finally P2_2 is up-sampled and fused with P1_2 to obtain P1_3, completing the third forward propagation and obtaining a feature map group {P1_3, P2_2, P3_1, P4};
step 32: in the bottom-up fusion stage, starting from the shallowest feature P1_3 of the feature map group obtained in step 31, P1_3 is down-sampled by a factor of two and fused with P2 and P2_2 to obtain N2, N2 is down-sampled by a factor of two and fused with P3 and P3_1 to obtain N3, and finally N3 is down-sampled by a factor of two and fused with P4 to obtain N4, obtaining a feature map group {P1_3, N2, N3, N4};
step 33: in the second top-down fusion stage, the feature maps obtained in step 32 are, starting from the deepest feature N4, up-sampled and added layer by layer to obtain a high-resolution first feature map P_out of size P/4;
step 4: the first feature map P_out obtained in step 33 is input into the attention feature enhancement module for feature enhancement, the attention feature enhancement module comprising a multi-branch dilated convolution module and a mixed attention mechanism module, each branch of the multi-branch dilated convolution module having a different dilation rate, and the features of the first feature map P_out after convolution with the different dilation rates are fused to obtain a second feature map F1;
step 5: the second feature map F1 is input into the mixed attention mechanism module to suppress background and noise, the mixed attention mechanism module comprising a channel-domain attention module and a spatial-domain attention module, and the second feature map F1 is processed by the channel-domain attention module and the spatial-domain attention module to obtain a third feature map F_out;
step 6: a final detection result is obtained through classification and regression: the third feature map F_out output in step 5 is passed through three 3x3 convolution branches to obtain a center point prediction result, a center point offset prediction result and a target width and height prediction result, and the final prediction result is obtained by fusing the three prediction results.
2. The remote sensing image target detection method based on multi-scale feature fusion and feature enhancement according to claim 1, characterized in that the specific process by which the mixed attention mechanism module obtains the third feature map F_out comprises the following steps:
step 51: the second feature map F1 obtained in step 4 is input into the channel-domain attention module; a global average pooling GAP first sums and averages all feature values of each channel, converting each two-dimensional feature map into a real number and obtaining a C×1×1 vector, C denoting the number of channels; the global average pooling GAP and a global max pooling GMP are used simultaneously along the channel dimension, the two pooled vectors are respectively sent into 2 fully connected layers for training and learning to obtain 2 one-dimensional channel weight sequences, the 2 groups of channel weight sequences are added and mapped to [0, 1] by a Sigmoid activation function, finally obtaining 1 group of weight sequences, which is used to weight the second feature map F1 to obtain an intermediate feature map F_c, completing the channel-domain attention operation;
step 52: the intermediate feature map F_c is passed through the global average pooling GAP and the global max pooling GMP to respectively obtain 2 single-channel feature maps, the 2 single-channel feature maps are concatenated along the channel dimension and a convolution operation gives a spatial-domain attention feature map, which is mapped to [0, 1] by a Sigmoid activation function to obtain the spatial-domain attention weights; the spatial-domain attention weights are multiplied with the intermediate feature map F_c to obtain the final third feature map F_out.
3. The remote sensing image target detection method based on multi-scale feature fusion and feature enhancement according to claim 2, characterized in that the feature fusion module in step 3 further takes into account that features of different resolutions contribute differently to the fused features and adds learnable weight coefficients to achieve adaptive fusion, thereby improving the scale invariance of the features, the specific implementation being as follows:
first, the resolutions of the multi-scale features to be fused are adjusted to be consistent by the following means:
(1) in the first top-down stage, deep features are up-sampled by a factor of two using nearest-neighbor interpolation;
(2) in the bottom-up stage, shallow features are down-sampled by a factor of two using max pooling; the adjusted features are then multiplied by their corresponding weight coefficients and added element by element, and finally fused through a Swish activation function, convolution and batch normalization.
CN202210614648.2A 2022-06-01 2022-06-01 Remote sensing image target detection method based on multi-scale feature fusion and feature enhancement Active CN114708511B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210614648.2A CN114708511B (en) 2022-06-01 2022-06-01 Remote sensing image target detection method based on multi-scale feature fusion and feature enhancement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210614648.2A CN114708511B (en) 2022-06-01 2022-06-01 Remote sensing image target detection method based on multi-scale feature fusion and feature enhancement

Publications (2)

Publication Number Publication Date
CN114708511A CN114708511A (en) 2022-07-05
CN114708511B true CN114708511B (en) 2022-08-16

Family

ID=82177099

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210614648.2A Active CN114708511B (en) 2022-06-01 2022-06-01 Remote sensing image target detection method based on multi-scale feature fusion and feature enhancement

Country Status (1)

Country Link
CN (1) CN114708511B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115187603A (en) * 2022-09-13 2022-10-14 国网浙江省电力有限公司 Power equipment detection method and device based on deep neural network
CN115565077A (en) * 2022-09-29 2023-01-03 哈尔滨天枢问道技术有限公司 Remote sensing image small target detection algorithm based on spatial feature integration
CN116051984B (en) * 2022-12-20 2023-07-04 中国科学院空天信息创新研究院 Weak and small target detection method based on Transformer
CN116563615B (en) * 2023-04-21 2023-11-07 南京讯思雅信息科技有限公司 Bad picture classification method based on improved multi-scale attention mechanism
CN117132870B (en) * 2023-10-25 2024-01-26 西南石油大学 Wing icing detection method combining CenterNet and mixed attention
CN117237830B (en) * 2023-11-10 2024-02-20 湖南工程学院 Unmanned aerial vehicle small target detection method based on dynamic self-adaptive channel attention
CN117726954B (en) * 2024-02-09 2024-04-30 成都信息工程大学 Sea-land segmentation method and system for remote sensing image

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110046598A (en) * 2019-04-23 2019-07-23 中南大学 The multiscale space of plug and play and channel pay attention to remote sensing image object detection method
CN110909642A (en) * 2019-11-13 2020-03-24 南京理工大学 Remote sensing image target detection method based on multi-scale semantic feature fusion
CN111179217A (en) * 2019-12-04 2020-05-19 天津大学 Attention mechanism-based remote sensing image multi-scale target detection method
CN111311518A (en) * 2020-03-04 2020-06-19 清华大学深圳国际研究生院 Image denoising method and device based on multi-scale mixed attention residual error network
CN111754404A (en) * 2020-06-18 2020-10-09 重庆邮电大学 Remote sensing image space-time fusion method based on multi-scale mechanism and attention mechanism
CN112215207A (en) * 2020-11-10 2021-01-12 中国人民解放军战略支援部队信息工程大学 Remote sensing image airplane target detection method combining multi-scale and attention mechanism
CN112580664A (en) * 2020-12-15 2021-03-30 哈尔滨理工大学 Small target detection method based on SSD (solid State disk) network
CN114359709A (en) * 2021-12-07 2022-04-15 北京北方智图信息技术有限公司 Target detection method and device for remote sensing image

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110046598A (en) * 2019-04-23 2019-07-23 中南大学 The multiscale space of plug and play and channel pay attention to remote sensing image object detection method
CN110909642A (en) * 2019-11-13 2020-03-24 南京理工大学 Remote sensing image target detection method based on multi-scale semantic feature fusion
CN111179217A (en) * 2019-12-04 2020-05-19 天津大学 Attention mechanism-based remote sensing image multi-scale target detection method
CN111311518A (en) * 2020-03-04 2020-06-19 清华大学深圳国际研究生院 Image denoising method and device based on multi-scale mixed attention residual error network
CN111754404A (en) * 2020-06-18 2020-10-09 重庆邮电大学 Remote sensing image space-time fusion method based on multi-scale mechanism and attention mechanism
CN112215207A (en) * 2020-11-10 2021-01-12 中国人民解放军战略支援部队信息工程大学 Remote sensing image airplane target detection method combining multi-scale and attention mechanism
CN112580664A (en) * 2020-12-15 2021-03-30 哈尔滨理工大学 Small target detection method based on SSD (solid State disk) network
CN114359709A (en) * 2021-12-07 2022-04-15 北京北方智图信息技术有限公司 Target detection method and device for remote sensing image

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"A road extraction method for high-resolution satellite images" (一种高分辨率卫星图像道路提取方法); Yan Meijuan, Wei Min, Wen Wu; Journal of Chengdu University of Information Technology (成都信息工程大学学报); 2022-02-15; pp. 46-50 *
Guokai Zhang, Weizhe Xu, Wei Zhao, Chenxi Huang, Eddie Ng Yk; "A Multiscale Attention Network for Remote Sensing Scene Images Classification"; IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing; 2021 *

Also Published As

Publication number Publication date
CN114708511A (en) 2022-07-05

Similar Documents

Publication Publication Date Title
CN114708511B (en) Remote sensing image target detection method based on multi-scale feature fusion and feature enhancement
CN108509978B (en) Multi-class target detection method and model based on CNN (CNN) multi-level feature fusion
CN109902806B (en) Method for determining target bounding box of noise image based on convolutional neural network
CN110443805B (en) Semantic segmentation method based on pixel density
CN112084868A (en) Target counting method in remote sensing image based on attention mechanism
CN114445430B (en) Real-time image semantic segmentation method and system for lightweight multi-scale feature fusion
CN110826411B (en) Vehicle target rapid identification method based on unmanned aerial vehicle image
CN115223057B (en) Target detection unified model for multimodal remote sensing image joint learning
CN115035361A (en) Target detection method and system based on attention mechanism and feature cross fusion
CN112966747A (en) Improved vehicle detection method based on anchor-frame-free detection network
CN114898284B (en) Crowd counting method based on feature pyramid local difference attention mechanism
CN113052200A (en) Sonar image target detection method based on yolov3 network
CN113205103A (en) Lightweight tattoo detection method
CN115908772A (en) Target detection method and system based on Transformer and fusion attention mechanism
CN115187786A (en) Rotation-based CenterNet2 target detection method
CN112215199A (en) SAR image ship detection method based on multi-receptive-field and dense feature aggregation network
CN116486080A (en) Lightweight image semantic segmentation method based on deep learning
CN115984323A (en) Two-stage fusion RGBT tracking algorithm based on space-frequency domain equalization
CN112668532A (en) Crowd counting method based on multi-stage mixed attention network
CN116402761A (en) Photovoltaic panel crack detection method based on double-channel multi-scale attention mechanism
CN112132746B (en) Small-scale pedestrian target rapid super-resolution method for intelligent roadside equipment
CN112686233B (en) Lane line identification method and device based on lightweight edge calculation
CN115223033A (en) Synthetic aperture sonar image target classification method and system
CN113537032A (en) Diversity multi-branch pedestrian re-identification method based on picture block discarding
CN112733934A (en) Multi-modal feature fusion road scene semantic segmentation method in complex environment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant