CN111738110A - Remote sensing image vehicle target detection method based on multi-scale attention mechanism - Google Patents

Remote sensing image vehicle target detection method based on multi-scale attention mechanism

Info

Publication number
CN111738110A
CN111738110A (application CN202010521480.1A)
Authority
CN
China
Prior art keywords
feature
feature map
network
attention
channel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010521480.1A
Other languages
Chinese (zh)
Inventor
门飞飞
李训根
马琪
潘勉
吕帅帅
李子璇
张战
刘爱林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University
Priority to CN202010521480.1A
Publication of CN111738110A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G06V 20/13 Satellite images
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/048 Activation functions
    • G06N 3/08 Learning methods


Abstract

The invention discloses a remote sensing image vehicle target detection method based on a multi-scale attention mechanism, which comprises the following steps: S1, extracting features from the original picture with a multilayer convolutional neural network, and constructing a bottom-up pyramid network from the generated feature maps of different scales; S2, for the constructed pyramid network, performing top-down feature fusion, in which a channel attention operation is applied to each higher-level feature map in turn before it is fused into the lower-level feature map; S3, acquiring spatial attention information from the fused lower-level feature map and fusing it back into the original lower-level features; S4, generating a large number of candidate boxes from preset sizes and aspect ratios, selecting the feature map to use according to the size of the ground-truth box of the detection target, and labeling candidate boxes as positive or negative according to their intersection-over-union (IoU) with the ground-truth box; and S5, directly predicting the category information and regression information of the positive-sample candidate boxes, and filtering overlapping same-category candidate boxes with non-maximum suppression.

Description

Remote sensing image vehicle target detection method based on multi-scale attention mechanism
Technical Field
The invention belongs to the technical field of image processing of deep learning, and particularly relates to a remote sensing image vehicle target detection method based on a multi-scale attention mechanism.
Background
With the development of remote sensing satellite technology, large numbers of remote sensing pictures spanning space and time can be acquired easily. Remote sensing images offer a new perspective for analyzing ground vehicles: detecting vehicle targets from an aerial viewpoint supports urban intelligent transportation, urban traffic planning, military target detection and tracking, cross-regional remote monitoring, and similar tasks, in which recognizing and detecting vehicle targets is an important and fundamental capability. The quality of a remote sensing image varies with the acquisition platform and acquisition mode, and different ground sampling distances give the same target different scales, which poses challenges for detection, especially of small targets.
Traditional methods that recognize vehicles in remote sensing images with hand-crafted features are difficult to design and achieve low recognition rates; they struggle to identify vehicles accurately in small, dense vehicle regions and to suppress interference from complex ground environments.
With the development of deep learning, semantic information about vehicle targets can be acquired easily by training a deep neural network, yet accurately locating each vehicle remains a non-trivial challenge. The feature pyramid built on a deep neural network is widely used for detecting multi-scale and small targets: feature maps of different scales are selected for detection according to the area of the target, which yields a certain improvement. However, because most vehicle targets are small, they concentrate on the lower-level features, and lower-level features obtained by simple upsampling and addition often lack rich semantic information.
Disclosure of Invention
In view of these technical problems, the invention provides a remote sensing image vehicle target detection method based on a multi-scale attention mechanism. Because vehicle targets are small, the lower-level features of the feature pyramid are strengthened with attention mechanisms: by fusing a channel attention mechanism and a spatial attention mechanism into the lower-level feature maps, the lower-level features carry different weights over channel and spatial information, provide more accurate semantic information for target recognition and detection in the subsequent network, and reduce the interference of background information in the remote sensing image with the vehicle target.
In order to solve the technical problems, the invention adopts the following technical scheme:
a remote sensing image vehicle target detection method based on a multi-scale attention mechanism comprises the following steps:
s1, extracting the features of the original picture by using a multilayer convolutional neural network, and constructing a bottom-up pyramid network from the generated feature pictures with different scales;
s2, for the constructed pyramid network, realizing the feature fusion from top to bottom, and in the fusion process, sequentially carrying out channel attention operation on the high-level feature map and fusing the high-level feature map to the low-level feature map;
s3, acquiring the spatial attention information of the fused low-level feature map, and fusing the spatial attention information into the original low-level feature;
s4, generating a large number of candidate frames according to the preset size, proportion and the like, determining a used feature map according to the size of the real frame of the detection target, and judging the positive and negative of the candidate frames according to the intersection ratio of the real frame and the candidate frames;
and S5, directly predicting the category information and regression information of the obtained positive sample candidate frame, and filtering the obtained overlapping region same category candidate frame by using a non-maximum inhibition method to obtain a final detection result.
Preferably, S1 comprises: selecting ResNet-50 as the backbone convolutional neural network; the picture passes through the network and feature maps of different scales are output at different layers, each successive feature map being produced by passing the previous one through further layers of the network; at this point the feature maps have different channel counts, higher-level features having more channels but a smaller spatial scale; the channel counts of the different feature maps are therefore unified first, as follows:

P_i = Conv_{3×3}(C_i, 256, 3, 1, 1)    (1)

where P_i denotes the feature map of the i-th layer, Conv_{3×3} denotes a 3×3 convolutional layer, and C_i denotes the i-th feature map obtained by passing the input picture through ResNet-50. Inside Conv_{3×3}, C_i is the input feature map, 256 is the number of output channels, 3 is the size of the convolution kernel, the first 1 is the stride of the kernel, and the second 1 is the amount of boundary padding applied to the feature map.
Preferably, S2 comprises: each feature-map fusion always operates on one higher-level feature and one lower-level feature; the top-level feature map P4 is passed through unchanged, and the next-highest feature map P3 fuses the information coming from P4. Channel maximum pooling and channel average pooling are first applied to the higher-level feature map, and the two pooled results are concatenated and fed into a 1×1 convolution, giving a feature block with 256 channels and spatial size 1×1; this feature block is then multiplied channel-wise with the lower-level feature map to obtain a lower-level feature map containing channel attention, the process being expressed as:

P̃_{i-1} = Conv_{1×1}(cat(Cmaxpool(P_i), Cavgpool(P_i))) ⊗ P_{i-1} + upsample(P_i)    (2)

where P̃_{i-1} denotes the feature map obtained after fusing the channel attention of P_i, P_{i-1} is the feature map of the layer below P_i, Conv_{1×1} denotes a 1×1 convolution operation, cat() denotes concatenation of feature maps, Cmaxpool() denotes channel maximum pooling, Cavgpool() denotes channel average pooling, ⊗ denotes channel-wise multiplication, and upsample() denotes upsampling of the feature map.
Preferably, S3 comprises: first, spatial maximum pooling is applied to the feature map obtained in the previous step, giving a feature block of unchanged spatial scale with 1 channel; an average-pooled feature block is obtained at the same time; the two feature blocks are concatenated and fed into a convolution block with kernel size 1×1, giving a feature block with 1 channel that fuses the spatial information of the feature map;

next, a Sigmoid() activation function maps the value of each pixel in the feature block into the range 0 to 1; finally, the feature map is multiplied element-wise by this feature block to obtain the final result, and the process can be expressed as:

P̂_{i-1} = Sigmoid(Conv_{1×1}(cat(Smaxpool(P̃_{i-1}), Savgpool(P̃_{i-1})))) ⊗ P̃_{i-1}    (3)

where P̂_{i-1} denotes the feature map finally obtained through both channel attention and spatial attention, Smaxpool() denotes spatial maximum pooling, Savgpool() denotes spatial average pooling, and Sigmoid() denotes Sigmoid activation of the feature block obtained after the convolution.
Preferably, S4 comprises: after the feature pyramid is generated and the attention information fused, the network has, from top to bottom, a plurality of 256-channel feature maps of different scales; a large number of candidate boxes is generated in the input remote sensing picture by a candidate-region generation method; after candidate boxes extending beyond the picture boundary are filtered out, each candidate box is labeled positive or negative according to its intersection-over-union with the ground-truth box of a vehicle target in the picture. The region of a positive-sample candidate box is considered to contain a vehicle target.
Preferably, S5 comprises: the feature maps obtained in S3 are fed into two sub-networks: a target-box category prediction sub-network and a target-box regression sub-network. The category prediction sub-network applies several convolutions to the input feature map and outputs a feature block of unchanged scale with 2 channels, where 2 indicates the two predicted categories, vehicle target and non-vehicle target; the regression sub-network applies several convolutions to the input feature map and outputs a feature block of unchanged scale with 4 channels, where 4 is the number of regression parameters of the target box.
The invention has the following beneficial effects:
(1) When using the feature pyramid, the embodiment of the invention takes the attention information of the vehicle target in the feature map into account, and uses the fused attention information to extract the important information of the vehicle target in both the spatial and channel dimensions of the feature map.
(2) By fusing the two attention mechanisms into the feature pyramid network, the embodiment of the invention improves the precision and recall of remote sensing image target detection without greatly increasing the memory footprint or running time of the network.
Drawings
FIG. 1 is a schematic diagram of a remote sensing image vehicle target detection method based on a multi-scale attention mechanism according to the invention;
FIG. 2 is a schematic diagram of a method of incorporating attention into a pyramid of features according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 shows a remote sensing image vehicle target detection method based on a multi-scale attention mechanism, which comprises the following steps:
and S1, performing feature extraction on the original picture by using a multilayer convolutional neural network, and constructing a bottom-up pyramid network from the generated feature maps with different scales.
As a specific implementation, ResNet-50 is selected as the backbone convolutional neural network. As shown on the left side of fig. 1, the picture passes through the network and feature maps of different scales are output at different layers, each successive feature map being produced by passing the previous one through further layers. The feature maps have different channel counts: higher-level features have more channels but a smaller spatial scale. The channel counts of the different feature maps are unified first. The process is as follows:

P_i = Conv_{3×3}(C_i, 256, 3, 1, 1)    (1)

where P_i denotes the feature map of the i-th layer, Conv_{3×3} denotes a 3×3 convolutional layer, and C_i denotes the i-th feature map obtained by passing the input picture through ResNet-50. Inside Conv_{3×3}, C_i is the input feature map, 256 is the number of output channels, 3 is the size of the convolution kernel, the first 1 is the stride of the kernel, and the second 1 is the amount of boundary padding applied to the feature map.
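The parameter choice in equation (1) (kernel 3, stride 1, padding 1) leaves the spatial size of each feature map unchanged, so only the channel count is unified to 256. This can be checked with the standard convolution output-size arithmetic (the helper name below is ours, not the patent's):

```python
def conv2d_out_size(h, w, kernel=3, stride=1, padding=1):
    """Spatial output size of a 2-D convolution (floor division, as in PyTorch)."""
    h_out = (h + 2 * padding - kernel) // stride + 1
    w_out = (w + 2 * padding - kernel) // stride + 1
    return h_out, w_out

# A 3x3 convolution with stride 1 and padding 1 keeps the spatial size,
# so only the channel count changes (here, unified to 256).
print(conv2d_out_size(50, 75))  # -> (50, 75)
```

The same formula shows why a stride-2 convolution (as used inside ResNet-50 between stages) halves the spatial scale while the pyramid levels keep their 256-channel width.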
And S2, realizing top-down feature fusion for the constructed pyramid network. In the fusion process, channel attention operation is carried out on the high-level feature map in sequence and the high-level feature map is fused to the low-level feature map.
As a specific implementation, each feature-map fusion always operates on one higher-level feature and one lower-level feature. As shown on the right side of fig. 1, the top-level feature map P4 is passed through unchanged, and the next-highest feature map P3 fuses the information coming from P4. As shown on the left side of fig. 2, channel maximum pooling and channel average pooling are first applied to the higher-level feature map, and the two pooled results are concatenated and fed into a 1×1 convolution, giving a feature block with 256 channels and spatial size 1×1. This feature block is then multiplied channel-wise with the lower-level feature map to obtain the lower-level feature map containing channel attention. The process can be expressed as:

P̃_{i-1} = Conv_{1×1}(cat(Cmaxpool(P_i), Cavgpool(P_i))) ⊗ P_{i-1} + upsample(P_i)    (2)

where P̃_{i-1} denotes the feature map obtained after fusing the channel attention of P_i, P_{i-1} is the feature map of the layer below P_i, Conv_{1×1} denotes a 1×1 convolution operation, cat() denotes concatenation of feature maps, Cmaxpool() denotes channel maximum pooling, Cavgpool() denotes channel average pooling, ⊗ denotes channel-wise multiplication, and upsample() denotes upsampling of the feature map.
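The channel-attention fusion of equation (2) can be sketched in NumPy. This is an illustrative stand-in, not the patent's implementation: the learned 1×1 convolution is replaced by a fixed random projection, the upsampling is nearest-neighbour, and all helper names are ours.

```python
import numpy as np

def channel_attention_fuse(p_hi, p_lo):
    """Sketch of Eq. (2): fuse the channel attention of a high-level map P_i
    into the lower-level map P_{i-1}. Shapes are (C, H, W)."""
    c = p_hi.shape[0]
    # Global max / average pooling per channel -> two length-C descriptors.
    mx = p_hi.max(axis=(1, 2))
    av = p_hi.mean(axis=(1, 2))
    # cat() followed by the 1x1 convolution: stood in for by a fixed
    # random linear map from 2C -> C channels (assumed, not from the patent).
    rng = np.random.default_rng(0)
    w = rng.standard_normal((c, 2 * c)) / np.sqrt(2 * c)
    attn = (w @ np.concatenate([mx, av]))[:, None, None]   # (C, 1, 1) block
    # Channel-wise multiplication, then add the 2x-upsampled high-level map.
    up = p_hi.repeat(2, axis=1).repeat(2, axis=2)          # nearest-neighbour upsample
    return p_lo * attn + up

p4 = np.ones((256, 8, 8), dtype=np.float32)
p3 = np.ones((256, 16, 16), dtype=np.float32)
out = channel_attention_fuse(p4, p3)
print(out.shape)  # -> (256, 16, 16)
```

The key shape bookkeeping is that the attention block is (C, 1, 1), so multiplying it with P_{i-1} reweights channels without touching spatial positions, while the upsampled P_i contributes the top-down semantic signal.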
And S3, acquiring the spatial attention information of the fused low-level feature map, and fusing the spatial attention information into the original low-level feature.
As a specific implementation, as shown on the right side of fig. 2, spatial maximum pooling is first applied to the feature map obtained in the previous step, giving a feature block of unchanged scale with 1 channel; an average-pooled feature block is obtained at the same time. The two feature blocks are concatenated and fed into a convolution block with kernel size 1×1, giving a feature block with 1 channel that fuses the spatial information of the feature map.

Then, a Sigmoid() activation function maps the value of each pixel in the feature block into the range 0 to 1. Finally, the feature map is multiplied element-wise by this feature block to obtain the final result. The process can be expressed as:

P̂_{i-1} = Sigmoid(Conv_{1×1}(cat(Smaxpool(P̃_{i-1}), Savgpool(P̃_{i-1})))) ⊗ P̃_{i-1}    (3)

where P̂_{i-1} denotes the feature map finally obtained through both channel attention and spatial attention, Smaxpool() denotes spatial maximum pooling, Savgpool() denotes spatial average pooling, and Sigmoid() denotes Sigmoid activation of the feature block obtained after the convolution.
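Equation (3) can likewise be sketched in NumPy. Again this is an assumption-laden illustration: the learned 1×1 convolution over the two pooled channels is replaced by fixed averaging weights, and the function name is ours.

```python
import numpy as np

def spatial_attention(p):
    """Sketch of Eq. (3): spatial attention over a (C, H, W) feature map.
    Max- and average-pool across the channel axis, concatenate the two
    1-channel maps, reduce them back to one channel (a stand-in for the
    1x1 convolution, with assumed weights [0.5, 0.5]), squash to (0, 1)
    with a sigmoid, and multiply back onto the input."""
    mx = p.max(axis=0, keepdims=True)            # (1, H, W)
    av = p.mean(axis=0, keepdims=True)           # (1, H, W)
    stacked = np.concatenate([mx, av], axis=0)   # (2, H, W), i.e. cat()
    # 1x1 conv from 2 channels to 1, then sigmoid -> per-pixel mask in (0, 1).
    mask = 1.0 / (1.0 + np.exp(-(0.5 * stacked[0] + 0.5 * stacked[1])))
    return p * mask                              # broadcast over channels

feat = np.random.default_rng(1).standard_normal((256, 16, 16))
sa_out = spatial_attention(feat)
print(sa_out.shape)  # -> (256, 16, 16)
```

Because the mask lies strictly inside (0, 1), the operation can only attenuate activations; pixels whose pooled responses are large keep most of their magnitude, which is the intended spatial emphasis.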
At S4, a large number of candidate frames are generated by a predetermined size, ratio, or the like. And determining the used characteristic diagram according to the size of the real frame of the detection target. And judging the positive and negative of the candidate frame through the intersection ratio of the real frame and the candidate frame.
As a specific implementation, after the feature pyramid is generated and the attention information fused, the network has, from top to bottom, 256-channel feature maps of different scales. A large number of candidate boxes is generated in the input remote sensing picture by a candidate-region generation method; after candidate boxes extending beyond the picture boundary are filtered out, each candidate box is labeled positive or negative according to its intersection-over-union with the ground-truth box of a vehicle target in the picture. The region of a positive-sample candidate box is considered to contain a vehicle target.
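The positive/negative labeling by intersection-over-union can be sketched in plain Python (the helper names are ours; the 0.5 threshold matches the experiment described later):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def label_candidates(candidates, gt_box, pos_thresh=0.5):
    """Mark each candidate box positive when its IoU with the ground-truth
    box reaches the threshold."""
    return [iou(c, gt_box) >= pos_thresh for c in candidates]

gt = (10, 10, 30, 30)
cands = [(10, 10, 30, 30), (12, 12, 32, 32), (40, 40, 60, 60)]
print(label_candidates(cands, gt))  # -> [True, True, False]
```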
S5, the category information and regression information of the positive-sample candidate boxes are predicted directly, and overlapping same-category candidate boxes are filtered with non-maximum suppression to obtain the final detection result.

As a specific implementation, the feature maps obtained in S3 are fed into two sub-networks: a target-box category prediction sub-network and a target-box regression sub-network. The category prediction sub-network applies several convolutions to the input feature map and outputs feature blocks of unchanged scale with 2 channels (2 indicates the two predicted categories, vehicle target and non-vehicle target). The regression sub-network applies several convolutions to the input feature map and outputs feature blocks of unchanged scale with 4 channels (4 is the number of regression parameters of the target box).
To verify the effectiveness of the proposed scheme, the following simulation experiment was performed.
First, the pre-trained ResNet-50 model provided by torchvision is loaded to initialize the network parameters; the processed, labeled remote sensing picture is input into the neural network, and feature maps of different scales and channel counts are extracted from the picture. A feature pyramid network is then formed in the manner of S1.
Then, attention information fusion is performed for each feature map in the pyramid except the top layer. Global channel maximum pooling and global channel average pooling are first applied to the higher-level feature map; the two pooled results are concatenated and passed through a 1×1 convolution to obtain the channel attention information (a 256-channel block of spatial size 1×1), and this attention block is multiplied channel-wise with the lower-level feature map. The higher-level feature map is then upsampled by a factor of 2 and added to the lower-level feature map that has been fused with the channel attention information.
Next, spatial maximum pooling and spatial average pooling are applied to the feature map containing channel attention information obtained in the previous step. Likewise, the concatenated feature blocks are passed through a 1×1 convolution, reducing the channel count to 1. A sigmoid() activation function then maps the value of each pixel of the resulting spatial attention block into the range 0 to 1; the closer a pixel value is to 1, the higher its importance. Finally, this block is multiplied with the feature map, giving a feature map that carries both channel attention and spatial attention information.
Then, the subsequent category prediction sub-network and target-box regression sub-network are generated for each feature map. The input feature map has size W×H×256, and two FCN-like sub-networks produce feature blocks of W×H×2 and W×H×4 respectively. Meanwhile, a large number of candidate boxes is generated on the feature maps of each scale; a candidate box is taken as a positive example when its intersection-over-union with a ground-truth box in the picture reaches a threshold (here 0.5).
Finally, after the feature map of the positive-sample candidate boxes is determined, the two sub-networks following that layer's feature map compute the network loss: Focal Loss for the category prediction sub-network and SmoothL1Loss for the target-box regression sub-network. During the inference phase, the sub-networks output the target boxes and their confidences. Confidences are screened with a threshold of 0.05, and non-maximum suppression with an overlap threshold of 0.5 filters out low-confidence target boxes in overlapping regions.
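The greedy non-maximum suppression used for the final filtering can be sketched as follows. This is an illustrative implementation with the 0.5 overlap threshold from the text; per-class application and the 0.05 confidence pre-filter are omitted for brevity, and the function names are ours.

```python
def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box, drop
    boxes overlapping it above iou_thresh, repeat on the remainder.
    Returns the indices of the kept boxes."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep

boxes = [(10, 10, 30, 30), (12, 12, 32, 32), (100, 100, 120, 120)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # -> [0, 2]
```

Here the second box overlaps the first with IoU above 0.5 and is suppressed, while the distant third box survives, which is exactly the overlapping-same-category filtering described above.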
In addition, because vehicle targets in remote sensing images are small, their scale and sharpness deviate considerably across satellite images with different sampling distances and regions, and the background of typical vehicle regions is complex, which interferes with vehicle detection. The method combines these characteristics of vehicle targets in remote sensing images: by fusing several attention mechanisms into the feature pyramid, it strengthens the semantic information of the lower-level features, makes the parts of the feature map that represent vehicles more prominent in both channel and space, and weakens the influence of background noise on the detection result.
In conclusion, by exploiting the data characteristics of vehicle targets in remote sensing images, the invention further improves vehicle detection performance in remote sensing images.
It is to be understood that the exemplary embodiments described herein are illustrative and not restrictive. Although one or more embodiments of the present invention have been described with reference to the accompanying drawings, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the following claims.

Claims (6)

1. A remote sensing image vehicle target detection method based on a multi-scale attention mechanism, characterized by comprising the following steps:
S1, extracting features from the original picture with a multilayer convolutional neural network, and constructing a bottom-up pyramid network from the generated feature maps of different scales;
S2, for the constructed pyramid network, performing top-down feature fusion, in which a channel attention operation is applied to each higher-level feature map in turn before it is fused into the lower-level feature map;
S3, acquiring spatial attention information from the fused lower-level feature map and fusing it back into the original lower-level features;
S4, generating a large number of candidate boxes from preset sizes and aspect ratios, selecting the feature map to use according to the size of the ground-truth box of the detection target, and labeling candidate boxes as positive or negative according to their intersection-over-union (IoU) with the ground-truth box;
and S5, directly predicting the category information and regression information of the positive-sample candidate boxes, and filtering overlapping same-category candidate boxes with non-maximum suppression to obtain the final detection result.
2. The remote sensing image vehicle target detection method based on the multi-scale attention mechanism according to claim 1, wherein said S1 comprises: selecting ResNet-50 as the backbone convolutional neural network; the picture passes through the network and feature maps of different scales are output at different layers, each successive feature map being produced by passing the previous one through further layers of the network; at this point the feature maps have different channel counts, higher-level features having more channels but a smaller spatial scale; the channel counts of the different feature maps are unified first, as follows:

P_i = Conv_{3×3}(C_i, 256, 3, 1, 1)    (1)

where P_i denotes the feature map of the i-th layer, Conv_{3×3} denotes a 3×3 convolutional layer, and C_i denotes the i-th feature map obtained by passing the input picture through ResNet-50; inside Conv_{3×3}, C_i is the input feature map, 256 is the number of output channels, 3 is the size of the convolution kernel, the first 1 is the stride of the kernel, and the second 1 is the amount of boundary padding applied to the feature map.
3. The remote sensing image vehicle target detection method based on the multi-scale attention mechanism according to claim 1, wherein said S2 comprises: each feature-map fusion always operates on one higher-level feature and one lower-level feature; the top-level feature map P4 is passed through unchanged, and the next-highest feature map P3 fuses the information coming from P4; channel maximum pooling and channel average pooling are first applied to the higher-level feature map, and the two pooled results are concatenated and fed into a 1×1 convolution, giving a feature block with 256 channels and spatial size 1×1; this feature block is then multiplied channel-wise with the lower-level feature map to obtain the lower-level feature map containing channel attention, the process being expressed as:

P̃_{i-1} = Conv_{1×1}(cat(Cmaxpool(P_i), Cavgpool(P_i))) ⊗ P_{i-1} + upsample(P_i)    (2)

where P̃_{i-1} denotes the feature map obtained after fusing the channel attention of P_i, P_{i-1} is the feature map of the layer below P_i, Conv_{1×1} denotes a 1×1 convolution operation, cat() denotes concatenation of feature maps, Cmaxpool() denotes channel maximum pooling, Cavgpool() denotes channel average pooling, ⊗ denotes channel-wise multiplication, and upsample() denotes upsampling of the feature map.
4. The remote sensing image vehicle target detection method based on the multi-scale attention mechanism as claimed in claim 1, wherein said S3 comprises: firstly, spatial maximum pooling is performed on the feature map obtained in the previous step, producing a feature block of unchanged scale with 1 channel, and a spatially average-pooled feature block is obtained at the same time; the two feature blocks are concatenated and fed into a convolution block with a kernel size of 1 × 1 to obtain a feature block with 1 channel, which fuses the spatial information of the feature map;

secondly, the Sigmoid() activation function is used to activate the pixel values of this feature block to between 0 and 1;

finally, the final result is obtained by element-wise multiplication of the feature map with this feature block, and the process can be expressed in the following form:

P_i^{cs} = Sigmoid(Conv_{1×1}(cat(S_maxpool(P_i^c), S_avgpool(P_i^c)))) ⊗ P_i^c

wherein P_i^{cs} represents the feature map finally obtained through the channel attention and the spatial attention, S_maxpool() represents spatial maximum pooling, S_avgpool() represents spatial average pooling, and Sigmoid() represents the Sigmoid activation of the feature block obtained after the convolution.
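A minimal NumPy sketch of this spatial-attention step; the random 1 × 1-convolution weights are an assumption, and the Sigmoid is written out explicitly.

```python
import numpy as np

rng = np.random.default_rng(1)

def spatial_attention(feat, conv_w):
    """Spatial attention as described in claim 4 (sketch): spatial max and
    average pooling across channels give two 1 x H x W blocks; after
    concatenation a 1x1 convolution reduces them to one channel, a Sigmoid
    maps the values into (0, 1), and the mask reweights the feature map."""
    maxp = feat.max(axis=0, keepdims=True)    # spatial max pooling  -> (1,H,W)
    avgp = feat.mean(axis=0, keepdims=True)   # spatial average pooling
    stacked = np.concatenate([maxp, avgp], axis=0)                   # (2,H,W)
    logits = conv_w @ stacked.reshape(2, -1)                         # 1x1 conv
    mask = (1.0 / (1.0 + np.exp(-logits))).reshape(1, *feat.shape[1:])
    return feat * mask                        # element-wise multiplication

feat = rng.random((256, 8, 8))
w = rng.random((1, 2))                        # illustrative 1x1-conv weights
out = spatial_attention(feat, w)
assert out.shape == feat.shape
assert float(out.max()) <= float(feat.max())  # mask values lie in (0, 1)
```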
5. The remote sensing image vehicle target detection method based on the multi-scale attention mechanism as claimed in claim 1, wherein said S4 comprises: after the feature pyramid is generated and the attention information is fused, the network has, from top to bottom, a number of feature maps of different scales, each with 256 channels; a large number of candidate boxes is generated on the input remote sensing picture by the region proposal method, and after candidate boxes extending beyond the picture boundary are filtered out, each candidate box is judged positive or negative according to its intersection-over-union with the ground-truth boxes of the vehicle targets in the input remote sensing picture. The region of a positive-sample candidate box is considered to contain a vehicle target.
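The boundary filtering and IoU-based labelling above can be sketched in plain Python; the 0.7/0.3 thresholds and the helper names are assumptions, as the claim does not state them.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def label_candidates(candidates, gt_boxes, width, height,
                     pos_thr=0.7, neg_thr=0.3):
    """Filter boxes crossing the picture boundary, then mark each remaining
    box positive (1), negative (0), or ignored (-1) by its best IoU with any
    ground-truth vehicle box. The thresholds are illustrative assumptions."""
    labels = []
    for c in candidates:
        if c[0] < 0 or c[1] < 0 or c[2] > width or c[3] > height:
            continue  # candidate extends beyond the picture boundary
        best = max((iou(c, g) for g in gt_boxes), default=0.0)
        labels.append((c, 1 if best >= pos_thr
                       else (0 if best <= neg_thr else -1)))
    return labels

gt = [(10, 10, 30, 30)]
cands = [(10, 10, 30, 30), (100, 100, 120, 120), (-5, 0, 10, 10)]
out = label_candidates(cands, gt, width=128, height=128)
assert len(out) == 2      # the out-of-boundary box was filtered out
assert out[0][1] == 1     # perfect overlap -> positive sample
assert out[1][1] == 0     # no overlap -> negative sample
```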
6. The remote sensing image vehicle target detection method based on the multi-scale attention mechanism as claimed in claim 1, wherein said S5 comprises: the feature maps obtained in S3 are fed into two sub-networks, a target-box class prediction sub-network and a target-box regression sub-network; the class prediction sub-network applies several convolutions to the input feature map to obtain a feature block of unchanged scale with 2 channels, where 2 indicates the two predicted classes, vehicle target and non-vehicle target; the regression sub-network applies several convolutions to the input feature map to obtain a feature block of unchanged scale with 4 channels, where 4 indicates the number of regression parameters of the target box.
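Shape-wise, the two heads can be sketched as below; standing in for the stacks of convolutions, a per-pixel linear map is used as a simplifying assumption, keeping only the claim's channel counts (2 for classification, 4 for regression).

```python
import numpy as np

rng = np.random.default_rng(2)

def head(feat, w):
    """Per-pixel linear map standing in for the stack of convolutions:
    spatial scale is unchanged, only the channel count changes."""
    c_in, h, wd = feat.shape
    return (w @ feat.reshape(c_in, -1)).reshape(w.shape[0], h, wd)

feat = rng.random((256, 8, 8))               # a pyramid feature map from S3
cls_out = head(feat, rng.random((2, 256)))   # 2 channels: vehicle / non-vehicle
reg_out = head(feat, rng.random((4, 256)))   # 4 channels: box regression params
assert cls_out.shape == (2, 8, 8)
assert reg_out.shape == (4, 8, 8)
```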
CN202010521480.1A 2020-06-10 2020-06-10 Remote sensing image vehicle target detection method based on multi-scale attention mechanism Pending CN111738110A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010521480.1A CN111738110A (en) 2020-06-10 2020-06-10 Remote sensing image vehicle target detection method based on multi-scale attention mechanism


Publications (1)

Publication Number Publication Date
CN111738110A true CN111738110A (en) 2020-10-02

Family

ID=72648522

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010521480.1A Pending CN111738110A (en) 2020-06-10 2020-06-10 Remote sensing image vehicle target detection method based on multi-scale attention mechanism

Country Status (1)

Country Link
CN (1) CN111738110A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2780595A1 (en) * 2011-06-22 2012-12-22 Roman Palenychka Method and multi-scale attention system for spatiotemporal change determination and object detection
CN110084210A (en) * 2019-04-30 2019-08-02 电子科技大学 The multiple dimensioned Ship Detection of SAR image based on attention pyramid network
CN110533084A (en) * 2019-08-12 2019-12-03 长安大学 A kind of multiscale target detection method based on from attention mechanism
CN110909642A (en) * 2019-11-13 2020-03-24 南京理工大学 Remote sensing image target detection method based on multi-scale semantic feature fusion
CN111179217A (en) * 2019-12-04 2020-05-19 天津大学 Attention mechanism-based remote sensing image multi-scale target detection method


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Pang Lixin; Gao Fan; He Dahai; Li Manqin; Liu Fangyao: "A small-target detection method based on attention-mechanism RetinaNet" *
Shen Wenxiang; Qin Pinle; Zeng Jianchao: "Indoor crowd detection network based on multi-level features and hybrid attention mechanism" *

Cited By (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112489001B (en) * 2020-11-23 2023-07-25 石家庄铁路职业技术学院 Tunnel water seepage detection method based on improved deep learning
CN112489001A (en) * 2020-11-23 2021-03-12 石家庄铁路职业技术学院 Tunnel water seepage detection method based on improved deep learning
CN112396115A (en) * 2020-11-23 2021-02-23 平安科技(深圳)有限公司 Target detection method and device based on attention mechanism and computer equipment
WO2021208726A1 (en) * 2020-11-23 2021-10-21 平安科技(深圳)有限公司 Target detection method and apparatus based on attention mechanism, and computer device
CN112396115B (en) * 2020-11-23 2023-12-22 平安科技(深圳)有限公司 Attention mechanism-based target detection method and device and computer equipment
CN112560907A (en) * 2020-12-02 2021-03-26 西安电子科技大学 Limited pixel infrared unmanned aerial vehicle target detection method based on mixed domain attention
CN112560907B (en) * 2020-12-02 2024-05-28 西安电子科技大学 Finite pixel infrared unmanned aerial vehicle target detection method based on mixed domain attention
CN112529005B (en) * 2020-12-11 2022-12-06 西安电子科技大学 Target detection method based on semantic feature consistency supervision pyramid network
CN112529005A (en) * 2020-12-11 2021-03-19 西安电子科技大学 Target detection method based on semantic feature consistency supervision pyramid network
CN112633352A (en) * 2020-12-18 2021-04-09 浙江大华技术股份有限公司 Target detection method and device, electronic equipment and storage medium
CN112633352B (en) * 2020-12-18 2023-08-29 浙江大华技术股份有限公司 Target detection method and device, electronic equipment and storage medium
CN112633158A (en) * 2020-12-22 2021-04-09 广东电网有限责任公司电力科学研究院 Power transmission line corridor vehicle identification method, device, equipment and storage medium
CN112800964A (en) * 2021-01-27 2021-05-14 中国人民解放军战略支援部队信息工程大学 Remote sensing image target detection method and system based on multi-module fusion
CN112906718A (en) * 2021-03-09 2021-06-04 西安电子科技大学 Multi-target detection method based on convolutional neural network
CN112906718B (en) * 2021-03-09 2023-08-22 西安电子科技大学 Multi-target detection method based on convolutional neural network
CN112990299A (en) * 2021-03-11 2021-06-18 五邑大学 Depth map acquisition method based on multi-scale features, electronic device and storage medium
CN112990299B (en) * 2021-03-11 2023-10-17 五邑大学 Depth map acquisition method based on multi-scale features, electronic equipment and storage medium
CN113111718A (en) * 2021-03-16 2021-07-13 苏州海宸威视智能科技有限公司 Fine-grained weak-feature target emergence detection method based on multi-mode remote sensing image
CN113128575A (en) * 2021-04-01 2021-07-16 西安电子科技大学广州研究院 Target detection sample balancing method based on soft label
CN113065601A (en) * 2021-04-12 2021-07-02 陕西理工大学 Deep learning forest fire abnormity detection method based on genetic algorithm optimization
CN113255443A (en) * 2021-04-16 2021-08-13 杭州电子科技大学 Pyramid structure-based method for positioning time sequence actions of graph attention network
CN113255443B (en) * 2021-04-16 2024-02-09 杭州电子科技大学 Graph annotation meaning network time sequence action positioning method based on pyramid structure
CN113095265A (en) * 2021-04-21 2021-07-09 西安电子科技大学 Fungal target detection method based on feature fusion and attention
CN113011443B (en) * 2021-04-23 2022-06-03 电子科技大学 Key point-based target detection feature fusion method
CN113011443A (en) * 2021-04-23 2021-06-22 电子科技大学 Key point-based target detection feature fusion method
CN113361428B (en) * 2021-06-11 2023-03-24 浙江澄视科技有限公司 Image-based traffic sign detection method
CN113361428A (en) * 2021-06-11 2021-09-07 浙江澄视科技有限公司 Image-based traffic sign detection method
CN113469287A (en) * 2021-07-27 2021-10-01 北京信息科技大学 Spacecraft multi-local component detection method based on instance segmentation network
CN113658114A (en) * 2021-07-29 2021-11-16 南京理工大学 Contact net opening pin defect target detection method based on multi-scale cross attention
CN113743521A (en) * 2021-09-10 2021-12-03 中国科学院软件研究所 Target detection method based on multi-scale context sensing
CN113743521B (en) * 2021-09-10 2023-06-27 中国科学院软件研究所 Target detection method based on multi-scale context awareness
CN114092813A (en) * 2021-11-25 2022-02-25 中国科学院空天信息创新研究院 Industrial park image extraction method, model, electronic equipment and storage medium
CN114092813B (en) * 2021-11-25 2022-08-05 中国科学院空天信息创新研究院 Industrial park image extraction method and system, electronic equipment and storage medium
CN113920468A (en) * 2021-12-13 2022-01-11 松立控股集团股份有限公司 Multi-branch pedestrian detection method based on cross-scale feature enhancement
CN113920468B (en) * 2021-12-13 2022-03-15 松立控股集团股份有限公司 Multi-branch pedestrian detection method based on cross-scale feature enhancement
CN114332644A (en) * 2021-12-30 2022-04-12 北京建筑大学 Large-view-field traffic density acquisition method based on video satellite data
CN114445801A (en) * 2022-01-25 2022-05-06 杭州飞步科技有限公司 Lane line detection method based on cross-layer optimization
CN116091787A (en) * 2022-10-08 2023-05-09 中南大学 Small sample target detection method based on feature filtering and feature alignment
CN115984661B (en) * 2023-03-20 2023-08-29 北京龙智数科科技服务有限公司 Multi-scale feature map fusion method, device, equipment and medium in target detection
CN115984661A (en) * 2023-03-20 2023-04-18 北京龙智数科科技服务有限公司 Multi-scale feature map fusion method, device, equipment and medium in target detection
CN117496132A (en) * 2023-12-29 2024-02-02 数据空间研究院 Scale sensing detection method for small-scale target detection
CN117689880A (en) * 2024-02-01 2024-03-12 东北大学 Method and system for target recognition in biomedical images based on machine learning
CN117689880B (en) * 2024-02-01 2024-04-16 东北大学 Method and system for target recognition in biomedical images based on machine learning

Similar Documents

Publication Publication Date Title
CN111738110A (en) Remote sensing image vehicle target detection method based on multi-scale attention mechanism
CN108764063B (en) Remote sensing image time-sensitive target identification system and method based on characteristic pyramid
CN112818903B (en) Small sample remote sensing image target detection method based on meta-learning and cooperative attention
CN109086668B (en) Unmanned aerial vehicle remote sensing image road information extraction method based on multi-scale generation countermeasure network
CN110119148B (en) Six-degree-of-freedom attitude estimation method and device and computer readable storage medium
CN108305260B (en) Method, device and equipment for detecting angular points in image
CN110516514B (en) Modeling method and device of target detection model
CN112084869A (en) Compact quadrilateral representation-based building target detection method
CN113609896A (en) Object-level remote sensing change detection method and system based on dual-correlation attention
CN113901900A (en) Unsupervised change detection method and system for homologous or heterologous remote sensing image
CN113313094B (en) Vehicle-mounted image target detection method and system based on convolutional neural network
CN115631344B (en) Target detection method based on feature self-adaptive aggregation
CN113255589A (en) Target detection method and system based on multi-convolution fusion network
CN114119610B (en) Defect detection method based on rotating target detection
CN114494870A (en) Double-time-phase remote sensing image change detection method, model construction method and device
CN110909656B (en) Pedestrian detection method and system integrating radar and camera
CN115984537A (en) Image processing method and device and related equipment
CN115620141A (en) Target detection method and device based on weighted deformable convolution
CN111860411A (en) Road scene semantic segmentation method based on attention residual error learning
CN115661767A (en) Image front vehicle target identification method based on convolutional neural network
CN113378642B (en) Method for detecting illegal occupation buildings in rural areas
CN114724021A (en) Data identification method and device, storage medium and electronic device
CN115376094B (en) Scale-perception neural network-based road surface identification method and system for unmanned sweeper
CN111160282A (en) Traffic light detection method based on binary Yolov3 network
CN115984712A (en) Multi-scale feature-based remote sensing image small target detection method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination