CN111738110A - Remote sensing image vehicle target detection method based on multi-scale attention mechanism - Google Patents
Remote sensing image vehicle target detection method based on multi-scale attention mechanism
- Publication number
- CN111738110A CN111738110A CN202010521480.1A CN202010521480A CN111738110A CN 111738110 A CN111738110 A CN 111738110A CN 202010521480 A CN202010521480 A CN 202010521480A CN 111738110 A CN111738110 A CN 111738110A
- Authority
- CN
- China
- Prior art keywords
- feature
- feature map
- network
- attention
- channel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
- G06V20/13—Satellite images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Astronomy & Astrophysics (AREA)
- Remote Sensing (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a remote sensing image vehicle target detection method based on a multi-scale attention mechanism, which comprises the following steps: S1, extracting features of the original picture with a multilayer convolutional neural network, and constructing a bottom-up pyramid network from the generated feature maps of different scales; S2, for the constructed pyramid network, performing top-down feature fusion, during which channel attention is computed on each high-level feature map in turn and fused into the low-level feature map; S3, acquiring the spatial attention information of the fused low-level feature map and fusing it into the original low-level features; S4, generating a large number of candidate boxes according to preset sizes and aspect ratios, determining the feature map to use according to the size of the ground-truth box of the detection target, and labeling candidate boxes as positive or negative according to their intersection-over-union (IoU) with the ground-truth box; and S5, directly predicting the category information and regression information of the positive-sample candidate boxes, and filtering overlapping same-category candidate boxes with non-maximum suppression.
Description
Technical Field
The invention belongs to the technical field of deep-learning image processing, and particularly relates to a remote sensing image vehicle target detection method based on a multi-scale attention mechanism.
Background
With the development of remote sensing satellite technology, large numbers of remote sensing pictures spanning space and time can be acquired easily. Remote sensing images provide a new visual angle for analyzing ground vehicles. Detecting vehicle targets from an aerial perspective supports tasks such as urban intelligent traffic, urban traffic planning, military target detection and tracking, and cross-regional remote monitoring, and the identification and detection of vehicle targets is an important and fundamental capability in all of these tasks. The quality of a remote sensing image varies with its acquisition platform and acquisition mode, and different ground sampling distances give the same target different scales, which challenges the detection of targets in general and small targets in particular.
Traditional methods that identify vehicles in remote sensing images with hand-crafted features are difficult to design and have low recognition rates; they struggle to accurately identify vehicles in small, dense vehicle target areas and to suppress interference from complex ground environments.
With the development of deep learning, vehicle target semantic information can be acquired easily by training a deep neural network, yet accurately locating vehicles remains a non-trivial challenge. Feature pyramids built on deep neural networks are widely used for detecting multi-scale and small targets: feature maps of different scales are selected for detection according to the area of the target, which yields some improvement. However, because most vehicle targets are small, they concentrate on the lower-layer features, and lower-layer features obtained by simple upsampling and addition often lack rich semantic features.
Disclosure of Invention
In view of the above technical problems, the invention provides a remote sensing image vehicle target detection method based on a multi-scale attention mechanism. Aiming at the characteristically small size of vehicle targets, the low-level features of the feature pyramid are strengthened with an attention mechanism. By fusing a channel attention mechanism and a spatial attention mechanism into the lower-layer feature maps, the low-level features carry different weights over channel and spatial information, providing more accurate semantic information for target identification and detection by the subsequent network and reducing the interference of background information in the remote sensing image with the vehicle target.
In order to solve the technical problems, the invention adopts the following technical scheme:
a remote sensing image vehicle target detection method based on a multi-scale attention mechanism comprises the following steps:
S1, extracting features of the original picture with a multilayer convolutional neural network, and constructing a bottom-up pyramid network from the generated feature maps of different scales;
S2, for the constructed pyramid network, performing top-down feature fusion, during which channel attention is computed on each high-level feature map in turn and fused into the low-level feature map;
S3, acquiring the spatial attention information of the fused low-level feature map, and fusing it into the original low-level features;
S4, generating a large number of candidate boxes according to preset sizes and aspect ratios, determining the feature map to use according to the size of the ground-truth box of the detection target, and labeling candidate boxes as positive or negative according to their intersection-over-union (IoU) with the ground-truth box;
and S5, directly predicting the category information and regression information of the positive-sample candidate boxes, and filtering overlapping same-category candidate boxes with non-maximum suppression to obtain the final detection result.
Preferably, the S1 includes: selecting ResNet-50 as the basic convolutional neural network and passing the picture through the network, which outputs feature maps of different scales at different layers; each subsequent feature map is computed from the previous one by further network layers. The different feature maps have different channel numbers at this point: the higher the layer, the more channels but the smaller the scale. The channel numbers of the different feature maps are therefore unified first, as follows:
Pi = Conv3×3(Ci, 256, 3, 1, 1) (1)
where Pi represents the feature map of the i-th layer, Conv3×3 denotes a 3×3 convolutional layer, and Ci denotes the i-th feature map obtained by passing the input picture through ResNet-50. In Conv3×3(Ci, 256, 3, 1, 1), Ci is the input feature map, 256 is the number of output channels, 3 is the size of the convolution kernel, the first 1 is the stride of the convolution kernel, and the second 1 is the amount of boundary padding applied to the feature map.
Preferably, the S2 includes: each feature map fusion always operates on one high-level feature and one low-level feature; the highest-level feature map P4 is passed through unchanged, and the next-highest feature map P3 fuses information from feature map P4. Channel max pooling and channel average pooling are applied to the high-level feature map first, and the two pooled results are passed through a 1×1 convolution to obtain a feature block with 256 channels and 1×1 scale; next, this feature block is multiplied channel-wise with the low-level feature map to obtain a low-level feature map containing channel attention, and the upsampled high-level feature map is added. The process is expressed as follows:
Ai = Conv1×1(cat(Cmaxpool(Pi), Cavgpool(Pi)))
P′i-1 = Pi-1 ⊗ Ai + upsample(Pi) (2)
where Ai is the channel-attention block computed from the high-level feature map Pi, P′i-1 represents the low-level feature map after fusing channel attention, Pi-1 is the feature map of the layer below Pi, ⊗ denotes channel-wise multiplication, Conv1×1 denotes a 1×1 convolution, cat() denotes concatenation of feature maps, Cmaxpool() denotes channel max pooling, Cavgpool() denotes channel average pooling, and upsample() denotes upsampling of the feature map.
Preferably, the S3 includes: first performing spatial max pooling on the feature map obtained in the previous step, yielding a feature block with unchanged scale and 1 channel, and simultaneously obtaining the spatially average-pooled feature block; the two feature blocks are concatenated and fed into a convolution block with kernel size 1×1, giving a feature block with 1 channel that fuses the spatial information of the feature map;
secondly, a Sigmoid() activation function maps the value of each pixel in the feature block into the interval (0, 1); finally, the feature map is multiplied element-wise by this feature block to obtain the final result. The process can be expressed as follows:
P″i = P′i ⊗ Sigmoid(Conv1×1(cat(Smaxpool(P′i), Savgpool(P′i)))) (3)
where P″i represents the feature map finally obtained through both channel attention and spatial attention, P′i is the feature map obtained in S2, Smaxpool() denotes spatial max pooling, Savgpool() denotes spatial average pooling, and Sigmoid() denotes sigmoid activation of the feature block obtained after convolution.
Preferably, the S4 includes: after the feature pyramid is generated and attention information fused, the network has several 256-channel feature maps of different scales from top to bottom. A large number of candidate boxes are generated in the input remote sensing picture by a candidate region generation method; after filtering candidate boxes that exceed the picture boundary, each candidate box is labeled positive or negative according to its intersection-over-union (IoU) with the target box of a vehicle target in the input remote sensing picture. A positive-sample candidate box is considered to contain a vehicle target.
Preferably, the S5 includes: the feature maps obtained in S3 are fed into two sub-networks, a target box category prediction sub-network and a target box regression sub-network. The category prediction sub-network applies multiple convolutions to the input feature map, producing a feature block of unchanged scale with 2 channels, where 2 indicates the two predicted categories: vehicle target and non-vehicle target. The regression sub-network applies multiple convolutions to the input feature map, producing a feature block of unchanged scale with 4 channels, where 4 is the number of regression parameters of the target box.
The invention has the following beneficial effects:
(1) The embodiment of the invention considers the attention information of the vehicle target in the feature map when using the feature pyramid, and uses the fused attention information to extract the important information of the vehicle target in the spatial and channel dimensions of the feature map.
(2) The embodiment of the invention fuses two attention mechanisms into the feature pyramid network, improving the precision and recall of remote sensing image target detection without greatly increasing the memory footprint and running time of the network.
Drawings
FIG. 1 is a schematic diagram of a remote sensing image vehicle target detection method based on a multi-scale attention mechanism according to the invention;
FIG. 2 is a schematic diagram of a method of incorporating attention into a pyramid of features according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 shows a remote sensing image vehicle target detection method based on a multi-scale attention mechanism, which includes the following steps:
S1, feature extraction is performed on the original picture with a multilayer convolutional neural network, and a bottom-up pyramid network is constructed from the generated feature maps of different scales.
As a specific implementation, ResNet-50 is selected as the basic convolutional neural network. As shown on the left side of Fig. 1, the picture passes through the network and feature maps of different scales are output at different layers; each subsequent feature map is computed from the previous one by further network layers. The feature maps have different channel numbers: the higher the layer, the more channels but the smaller the scale. The channel numbers of the different feature maps are unified first, as follows:
Pi = Conv3×3(Ci, 256, 3, 1, 1) (1)
where Pi represents the feature map of the i-th layer, Conv3×3 denotes a 3×3 convolutional layer, and Ci denotes the i-th feature map obtained by passing the input picture through ResNet-50. In Conv3×3(Ci, 256, 3, 1, 1), Ci is the input feature map, 256 is the number of output channels, 3 is the size of the convolution kernel, the first 1 is the stride of the convolution kernel, and the second 1 is the amount of boundary padding applied to the feature map.
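For illustration only (the patent presents no code), the channel-unification step of equation (1) can be sketched in PyTorch. The tuple of input channel counts is an assumption about which ResNet-50 stage outputs are used; all names are illustrative:

```python
import torch
import torch.nn as nn

# Illustrative sketch of equation (1): unify each backbone feature map C_i
# to 256 channels with a 3x3 convolution (stride 1, padding 1), so the
# spatial scale is preserved while channel counts become uniform.
# (256, 512, 1024, 2048) are the standard ResNet-50 stage output widths.
lateral_convs = nn.ModuleList([
    nn.Conv2d(in_channels=c, out_channels=256, kernel_size=3, stride=1, padding=1)
    for c in (256, 512, 1024, 2048)
])

# Example: a C3-like map with 512 channels keeps its spatial size
c3 = torch.randn(1, 512, 100, 100)
p3 = lateral_convs[1](c3)
print(tuple(p3.shape))  # (1, 256, 100, 100)
```

Because stride is 1 and padding matches the 3×3 kernel, only the channel dimension changes.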
S2, top-down feature fusion is performed on the constructed pyramid network. In the fusion process, channel attention is computed on each high-level feature map in turn and fused into the low-level feature map.
As a specific implementation, each feature map fusion always operates on one high-level feature and one low-level feature. As shown on the right side of Fig. 1, the highest-level feature map P4 is passed through unchanged, and the next-highest feature map P3 fuses the information from feature map P4. As shown on the left side of Fig. 2, channel max pooling and channel average pooling are applied to the high-level feature map first, and the two pooled results are passed through a 1×1 convolution to obtain a feature block with 256 channels and a scale of 1×1. Next, this feature block is multiplied channel-wise with the low-level feature map to obtain the low-level feature map containing channel attention, and the upsampled high-level feature map is added. The process can be expressed as follows:
Ai = Conv1×1(cat(Cmaxpool(Pi), Cavgpool(Pi)))
P′i-1 = Pi-1 ⊗ Ai + upsample(Pi) (2)
where Ai is the channel-attention block computed from the high-level feature map Pi, P′i-1 represents the low-level feature map after fusing channel attention, Pi-1 is the feature map of the layer below Pi, ⊗ denotes channel-wise multiplication, Conv1×1 denotes a 1×1 convolution, cat() denotes concatenation of feature maps, Cmaxpool() denotes channel max pooling, Cavgpool() denotes channel average pooling, and upsample() denotes upsampling of the feature map.
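The fusion step above can be sketched in PyTorch as follows. This is an illustrative reconstruction, not the patent's implementation: the sigmoid on the attention block is an assumption borrowed from common channel-attention designs (the text does not state the activation), and the nearest-neighbor upsampling mode is likewise assumed:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttentionFusion(nn.Module):
    """Illustrative sketch of the top-down fusion step: a channel-attention
    block is computed from the higher-level map, multiplied channel-wise
    into the lower-level map, and the upsampled higher-level map is added."""

    def __init__(self, channels=256):
        super().__init__()
        # 1x1 conv mixing the concatenated max- and avg-pooled vectors
        self.conv = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, p_high, p_low):
        # Global max / average pooling -> two (N, 256, 1, 1) feature blocks
        mx = F.adaptive_max_pool2d(p_high, 1)
        avg = F.adaptive_avg_pool2d(p_high, 1)
        attn = torch.sigmoid(self.conv(torch.cat([mx, avg], dim=1)))
        # Channel-wise multiplication, then add the upsampled high-level map
        up = F.interpolate(p_high, size=p_low.shape[-2:], mode="nearest")
        return p_low * attn + up

fusion = ChannelAttentionFusion()
p4 = torch.randn(1, 256, 50, 50)    # higher-level map, half resolution
p3 = torch.randn(1, 256, 100, 100)  # lower-level map
fused = fusion(p4, p3)
```

The output keeps the lower-level map's spatial size, so the result can serve as the next P′i-1 in the top-down pass.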
S3, the spatial attention information of the fused low-level feature map is acquired and fused into the original low-level features.
As a specific implementation, as shown on the right side of Fig. 2, spatial max pooling is first applied to the feature map obtained in the previous step, producing a feature block with unchanged scale and 1 channel; the spatially average-pooled feature block is obtained at the same time. The two feature blocks are concatenated and fed into a convolution block with kernel size 1×1, giving a feature block with 1 channel that fuses the spatial information of the feature map.
Then a Sigmoid() activation function maps the value of each pixel into the interval (0, 1). Finally, the feature map is multiplied element-wise by this feature block to obtain the final result. The process can be expressed as follows:
P″i = P′i ⊗ Sigmoid(Conv1×1(cat(Smaxpool(P′i), Savgpool(P′i)))) (3)
where P″i represents the feature map finally obtained through both channel attention and spatial attention, P′i is the feature map obtained in S2, Smaxpool() denotes spatial max pooling, Savgpool() denotes spatial average pooling, and Sigmoid() denotes sigmoid activation of the feature block obtained after convolution.
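The spatial-attention step can be sketched as follows; this is an illustrative reconstruction in PyTorch, not the patent's code, with the 1×1 kernel taken from the text:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Illustrative sketch of the spatial-attention step: per-pixel max and
    mean over channels give two 1-channel maps, a 1x1 convolution reduces
    their concatenation to one channel, a sigmoid maps each pixel into
    (0, 1), and the result re-weights the input feature map."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=1)  # kernel size 1x1 per the text

    def forward(self, x):
        mx, _ = x.max(dim=1, keepdim=True)  # spatial max pooling: (N, 1, H, W)
        avg = x.mean(dim=1, keepdim=True)   # spatial average pooling: (N, 1, H, W)
        attn = torch.sigmoid(self.conv(torch.cat([mx, avg], dim=1)))
        return x * attn

sa = SpatialAttention()
x = torch.randn(2, 256, 64, 64)
y = sa(x)
```

The single-channel attention map broadcasts across all 256 channels, so the output shape equals the input shape.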
S4, a large number of candidate boxes are generated according to preset sizes and aspect ratios. The feature map to use is determined by the size of the ground-truth box of the detection target, and candidate boxes are labeled positive or negative by their intersection-over-union (IoU) with the ground-truth box.
As a specific implementation, after generating the feature pyramid and fusing attention information, the network has 256-channel feature maps of different scales from top to bottom. A large number of candidate boxes are generated in the input remote sensing picture by a candidate region generation method; after candidate boxes exceeding the picture boundary are filtered out, each candidate box is labeled positive or negative according to its IoU with the target box of a vehicle target in the input picture. A positive-sample candidate box is considered to contain a vehicle target.
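The IoU-based labeling can be sketched as follows. This is an illustrative implementation; the 0.5 threshold matches the value used in the simulation experiment, and the example boxes are made up:

```python
import torch

def box_iou(boxes1, boxes2):
    """Plain intersection-over-union between two sets of (x1, y1, x2, y2)
    boxes, returned as a (len(boxes1), len(boxes2)) matrix."""
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
    lt = torch.max(boxes1[:, None, :2], boxes2[None, :, :2])  # intersection top-left
    rb = torch.min(boxes1[:, None, 2:], boxes2[None, :, 2:])  # intersection bottom-right
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area1[:, None] + area2[None, :] - inter)

# Label candidate boxes positive when IoU with any ground-truth box >= 0.5
anchors = torch.tensor([[0., 0., 10., 10.], [20., 20., 30., 30.]])
gt = torch.tensor([[0., 0., 10., 10.]])
iou = box_iou(anchors, gt)
positive = iou.max(dim=1).values >= 0.5
print(positive.tolist())  # [True, False]
```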
S5, the category information and regression information of the positive-sample candidate boxes are directly predicted, and overlapping same-category candidate boxes are filtered with non-maximum suppression to obtain the final detection result.
As a specific implementation, the feature maps obtained in step 3 are fed into two sub-networks: a target box category prediction sub-network and a target box regression sub-network. The category prediction sub-network applies multiple convolutions to the input feature map, producing feature blocks of unchanged scale with 2 channels (2 indicates the two predicted categories: vehicle target and non-vehicle target). The regression sub-network applies multiple convolutions to the input feature map, producing feature blocks of unchanged scale with 4 channels (4 is the number of regression parameters of the target box).
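The two prediction heads can be sketched in PyTorch as follows. The depth of the head (four intermediate convolutions) and its width are assumptions, since the text only says the feature map is convolved "multiple" times:

```python
import torch
import torch.nn as nn

def make_head(out_channels, in_channels=256, width=256, num_convs=4):
    """Illustrative prediction head: several 3x3 convolutions that preserve
    the spatial scale, ending in out_channels (2 for the category head,
    4 for the box regression head)."""
    layers = []
    c = in_channels
    for _ in range(num_convs):
        layers += [nn.Conv2d(c, width, kernel_size=3, padding=1),
                   nn.ReLU(inplace=True)]
        c = width
    layers.append(nn.Conv2d(c, out_channels, kernel_size=3, padding=1))
    return nn.Sequential(*layers)

cls_head = make_head(2)  # 2 channels: vehicle target vs. non-vehicle target
reg_head = make_head(4)  # 4 channels: box regression parameters
feat = torch.randn(1, 256, 64, 64)
cls_out, reg_out = cls_head(feat), reg_head(feat)
```

All convolutions use padding 1 with a 3×3 kernel, so the W×H scale of the input feature map is unchanged and only the channel count differs between the two heads.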
To verify the validity of the inventive scheme, the following simulation experiment was performed.
First, the pre-trained ResNet-50 model provided by torchvision is loaded to initialize the network parameters; the processed, labeled remote sensing picture is input into the neural network, and feature maps of different scales and channel numbers are extracted. A feature pyramid network is formed in the manner of step 1.
Then, attention information is fused into every feature map in the feature pyramid except the highest layer. The high-level feature map first undergoes global channel max pooling and global channel average pooling. The concatenated result is passed through a 1×1 convolution to obtain the channel attention block, which is multiplied channel-wise with the low-level feature map. The high-level feature map is then upsampled 2× and added to the low-level feature map fused with the channel attention information.
Next, the feature map containing channel attention information obtained in the previous step undergoes spatial max pooling and spatial average pooling. Likewise, the concatenated feature blocks are passed through a 1×1 convolution, reducing the channel number to 1. A sigmoid() activation function then maps the value of each pixel in the resulting spatial attention block into the interval (0, 1); the closer a pixel is to 1, the higher its importance. Finally, this feature block is multiplied with the feature map, yielding a feature map carrying both channel attention and spatial attention information.
Then, a subsequent category prediction sub-network and target box regression sub-network are generated for each feature map. The input feature map has size W×H×256; two FCN-like sub-networks produce feature blocks of W×H×2 (category scores) and W×H×4 (box regression parameters), respectively. Meanwhile, a large number of candidate boxes are generated on the feature maps of different scales, and each candidate box is labeled positive or negative by its intersection-over-union with the ground-truth boxes in the picture (threshold 0.5 here).
Finally, after the feature map of each positive-sample candidate box is determined, the two sub-networks following that layer's feature map compute the network loss: Focal Loss for the category prediction sub-network and Smooth L1 loss for the target box regression sub-network. In the inference phase, the sub-networks output the target boxes and their confidences. Confidences are screened with a threshold of 0.05, and non-maximum suppression with a threshold of 0.5 filters out overlapping low-confidence target boxes.
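A self-contained sketch of this post-processing (the 0.05 confidence threshold and 0.5 suppression threshold come from the text; everything else, including the example boxes, is illustrative):

```python
import torch

def nms(boxes, scores, iou_thr=0.5, score_thr=0.05):
    """Illustrative non-maximum suppression: drop boxes below the confidence
    threshold, then greedily keep the highest-scoring box and suppress
    remaining boxes whose IoU with it exceeds iou_thr."""
    keep_mask = scores >= score_thr
    boxes, scores = boxes[keep_mask], scores[keep_mask]
    order = scores.argsort(descending=True)
    kept = []
    while order.numel() > 0:
        i = order[0].item()
        kept.append(i)
        if order.numel() == 1:
            break
        rest = order[1:]
        # IoU of the kept box against all remaining boxes
        lt = torch.max(boxes[i, :2], boxes[rest, :2])
        rb = torch.min(boxes[i, 2:], boxes[rest, 2:])
        wh = (rb - lt).clamp(min=0)
        inter = wh[:, 0] * wh[:, 1]
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thr]
    return boxes[kept], scores[kept]

boxes = torch.tensor([[0., 0., 10., 10.], [1., 1., 11., 11.], [50., 50., 60., 60.]])
scores = torch.tensor([0.9, 0.8, 0.7])
kept_boxes, kept_scores = nms(boxes, scores)
```

Here the second box overlaps the first with IoU ≈ 0.68 > 0.5 and is suppressed, while the distant third box survives.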
In addition, because vehicle targets in remote sensing images are small, their scale and sharpness deviate considerably across satellite images with different sampling distances and regions, and the background around typical vehicle regions is complex, interfering with vehicle detection. The method combines these characteristics of vehicle targets in remote sensing images: by fusing multiple attention mechanisms into the feature pyramid, it strengthens the semantic information of low-level features, makes the vehicle-relevant parts of the feature map more prominent in both channel and space, and weakens the influence of background noise on the detection result.
In conclusion, the invention further improves the vehicle detection performance in the remote sensing image by combining the data characteristics of the vehicle target in the remote sensing image.
It is to be understood that the exemplary embodiments described herein are illustrative and not restrictive. Although one or more embodiments of the present invention have been described with reference to the accompanying drawings, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the following claims.
Claims (6)
1. A remote sensing image vehicle target detection method based on a multi-scale attention mechanism is characterized by comprising the following steps:
S1, extracting features of the original picture with a multilayer convolutional neural network, and constructing a bottom-up pyramid network from the generated feature maps of different scales;
S2, for the constructed pyramid network, performing top-down feature fusion, during which channel attention is computed on each high-level feature map in turn and fused into the low-level feature map;
S3, acquiring the spatial attention information of the fused low-level feature map, and fusing it into the original low-level features;
S4, generating a large number of candidate boxes according to preset sizes and aspect ratios, determining the feature map to use according to the size of the ground-truth box of the detection target, and labeling candidate boxes as positive or negative according to their intersection-over-union (IoU) with the ground-truth box;
and S5, directly predicting the category information and regression information of the positive-sample candidate boxes, and filtering overlapping same-category candidate boxes with non-maximum suppression to obtain the final detection result.
2. The remote sensing image vehicle target detection method based on the multi-scale attention mechanism as claimed in claim 1, wherein said S1 comprises: selecting ResNet-50 as the basic convolutional neural network and passing the picture through the network, which outputs feature maps of different scales at different layers, each subsequent feature map being computed from the previous one by further network layers; the different feature maps have different channel numbers at this point, with higher layers having more channels but smaller scale; the channel numbers of the different feature maps are first unified, as follows:
Pi = Conv3×3(Ci, 256, 3, 1, 1) (1)
where Pi represents the feature map of the i-th layer, Conv3×3 denotes a 3×3 convolutional layer, and Ci denotes the i-th feature map obtained by passing the input picture through ResNet-50; in Conv3×3(Ci, 256, 3, 1, 1), Ci is the input feature map, 256 is the number of output channels, 3 is the size of the convolution kernel, the first 1 is the stride of the convolution kernel, and the second 1 is the amount of boundary padding applied to the feature map.
3. The remote sensing image vehicle target detection method based on the multi-scale attention mechanism as claimed in claim 1, wherein said S2 comprises: each feature map fusion always operates on one high-level feature and one low-level feature; the highest-level feature map P4 is passed through unchanged, and the next-highest feature map P3 fuses information from feature map P4; channel max pooling and channel average pooling are applied to the high-level feature map first, and the two pooled results are passed through a 1×1 convolution to obtain a feature block with 256 channels and 1×1 scale; next, this feature block is multiplied channel-wise with the low-level feature map to obtain a low-level feature map containing channel attention, and the upsampled high-level feature map is added; the process is expressed as follows:
Ai = Conv1×1(cat(Cmaxpool(Pi), Cavgpool(Pi)))
P′i-1 = Pi-1 ⊗ Ai + upsample(Pi) (2)
where Ai is the channel-attention block computed from the high-level feature map Pi, P′i-1 represents the low-level feature map after fusing channel attention, Pi-1 is the feature map of the layer below Pi, ⊗ denotes channel-wise multiplication, Conv1×1 denotes a 1×1 convolution, cat() denotes concatenation of feature maps, Cmaxpool() denotes channel max pooling, Cavgpool() denotes channel average pooling, and upsample() denotes upsampling of the feature map.
4. The remote sensing image vehicle target detection method based on the multi-scale attention mechanism as claimed in claim 1, wherein said S3 comprises: first performing spatial max pooling on the feature map obtained in the previous step, yielding a feature block with unchanged scale and 1 channel, and simultaneously obtaining the spatially average-pooled feature block; the two feature blocks are concatenated and fed into a convolution block with kernel size 1×1, giving a feature block with 1 channel that fuses the spatial information of the feature map;
secondly, a Sigmoid() activation function maps the value of each pixel in the feature block into the interval (0, 1);
finally, the feature map is multiplied element-wise by this feature block to obtain the final result, and the process can be expressed as follows:
P″i = P′i ⊗ Sigmoid(Conv1×1(cat(Smaxpool(P′i), Savgpool(P′i)))) (3)
where P″i represents the feature map finally obtained through both channel attention and spatial attention, P′i is the feature map obtained in S2, Smaxpool() denotes spatial max pooling, Savgpool() denotes spatial average pooling, and Sigmoid() denotes sigmoid activation of the feature block obtained after convolution.
5. The remote sensing image vehicle target detection method based on the multi-scale attention mechanism as claimed in claim 1, wherein said S4 comprises: after the feature pyramid is generated and attention information fused, the network has several 256-channel feature maps of different scales from top to bottom; a large number of candidate boxes are generated in the input remote sensing picture by a candidate region generation method; after filtering candidate boxes that exceed the picture boundary, each candidate box is labeled positive or negative according to its intersection-over-union with the target box of a vehicle target in the input remote sensing picture; a positive-sample candidate box is considered to contain a vehicle target.
6. The remote sensing image vehicle target detection method based on the multi-scale attention mechanism as claimed in claim 1, wherein said S5 comprises: sending the plurality of feature maps obtained in S3 into two sub-networks, a target-box class prediction sub-network and a target-box regression sub-network; the class prediction sub-network performs multiple convolutions on the input feature map to obtain a feature block with unchanged scale and a channel number of 2, the 2 indicating that there are two predicted classes, vehicle target and non-vehicle target; the regression sub-network performs multiple convolutions on the input feature map to obtain a feature block with unchanged scale and a channel number of 4, the 4 indicating the number of regression parameters of the target box.
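The channel counts of the two sub-networks in claim 6 can be demonstrated with the toy head below. This is a sketch, not the patented network: a real head stacks several convolutions, while here a single 1 x 1-convolution step (a matrix product per pixel) with random placeholder weights is enough to show that the scale is unchanged and only the channel count differs between the two heads.

```python
import numpy as np

rng = np.random.default_rng(0)

def head(feat, out_channels, weights):
    """One 1x1-convolution step of a prediction head: (C, H, W) -> (out_channels, H, W).
    The spatial scale H x W is preserved; only the channel count changes."""
    c, h, w = feat.shape
    return (weights @ feat.reshape(c, h * w)).reshape(out_channels, h, w)

feat = rng.normal(size=(256, 16, 16))                 # one pyramid level, 256 channels
cls_out = head(feat, 2, rng.normal(size=(2, 256)))    # 2 classes: vehicle / non-vehicle
box_out = head(feat, 4, rng.normal(size=(4, 256)))    # 4 target-box regression parameters
print(cls_out.shape, box_out.shape)  # (2, 16, 16) (4, 16, 16)
```

The same pair of heads is applied to every pyramid level, so each spatial position of each feature map yields a 2-way class score and 4 box-regression parameters.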
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010521480.1A CN111738110A (en) | 2020-06-10 | 2020-06-10 | Remote sensing image vehicle target detection method based on multi-scale attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111738110A true CN111738110A (en) | 2020-10-02 |
Family
ID=72648522
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010521480.1A Pending CN111738110A (en) | 2020-06-10 | 2020-06-10 | Remote sensing image vehicle target detection method based on multi-scale attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111738110A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2780595A1 (en) * | 2011-06-22 | 2012-12-22 | Roman Palenychka | Method and multi-scale attention system for spatiotemporal change determination and object detection |
CN110084210A (en) * | 2019-04-30 | 2019-08-02 | 电子科技大学 | The multiple dimensioned Ship Detection of SAR image based on attention pyramid network |
CN110533084A (en) * | 2019-08-12 | 2019-12-03 | 长安大学 | A kind of multiscale target detection method based on from attention mechanism |
CN110909642A (en) * | 2019-11-13 | 2020-03-24 | 南京理工大学 | Remote sensing image target detection method based on multi-scale semantic feature fusion |
CN111179217A (en) * | 2019-12-04 | 2020-05-19 | 天津大学 | Attention mechanism-based remote sensing image multi-scale target detection method |
2020-06-10: CN application CN202010521480.1A filed in China (publication CN111738110A; legal status: active, Pending)
Non-Patent Citations (2)
Title |
---|
Pang Lixin; Gao Fan; He Dahai; Li Manqin; Liu Fangyao: "A Small Target Detection Method Based on the Attention-Mechanism RetinaNet" *
Shen Wenxiang; Qin Pinle; Zeng Jianchao: "Indoor Crowd Detection Network Based on Multi-Level Features and a Hybrid Attention Mechanism" *
Cited By (43)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112489001B (en) * | 2020-11-23 | 2023-07-25 | 石家庄铁路职业技术学院 | Tunnel water seepage detection method based on improved deep learning |
CN112489001A (en) * | 2020-11-23 | 2021-03-12 | 石家庄铁路职业技术学院 | Tunnel water seepage detection method based on improved deep learning |
CN112396115A (en) * | 2020-11-23 | 2021-02-23 | 平安科技(深圳)有限公司 | Target detection method and device based on attention mechanism and computer equipment |
WO2021208726A1 (en) * | 2020-11-23 | 2021-10-21 | 平安科技(深圳)有限公司 | Target detection method and apparatus based on attention mechanism, and computer device |
CN112396115B (en) * | 2020-11-23 | 2023-12-22 | 平安科技(深圳)有限公司 | Attention mechanism-based target detection method and device and computer equipment |
CN112560907A (en) * | 2020-12-02 | 2021-03-26 | 西安电子科技大学 | Limited pixel infrared unmanned aerial vehicle target detection method based on mixed domain attention |
CN112560907B (en) * | 2020-12-02 | 2024-05-28 | 西安电子科技大学 | Finite pixel infrared unmanned aerial vehicle target detection method based on mixed domain attention |
CN112529005B (en) * | 2020-12-11 | 2022-12-06 | 西安电子科技大学 | Target detection method based on semantic feature consistency supervision pyramid network |
CN112529005A (en) * | 2020-12-11 | 2021-03-19 | 西安电子科技大学 | Target detection method based on semantic feature consistency supervision pyramid network |
CN112633352A (en) * | 2020-12-18 | 2021-04-09 | 浙江大华技术股份有限公司 | Target detection method and device, electronic equipment and storage medium |
CN112633352B (en) * | 2020-12-18 | 2023-08-29 | 浙江大华技术股份有限公司 | Target detection method and device, electronic equipment and storage medium |
CN112633158A (en) * | 2020-12-22 | 2021-04-09 | 广东电网有限责任公司电力科学研究院 | Power transmission line corridor vehicle identification method, device, equipment and storage medium |
CN112800964A (en) * | 2021-01-27 | 2021-05-14 | 中国人民解放军战略支援部队信息工程大学 | Remote sensing image target detection method and system based on multi-module fusion |
CN112906718A (en) * | 2021-03-09 | 2021-06-04 | 西安电子科技大学 | Multi-target detection method based on convolutional neural network |
CN112906718B (en) * | 2021-03-09 | 2023-08-22 | 西安电子科技大学 | Multi-target detection method based on convolutional neural network |
CN112990299A (en) * | 2021-03-11 | 2021-06-18 | 五邑大学 | Depth map acquisition method based on multi-scale features, electronic device and storage medium |
CN112990299B (en) * | 2021-03-11 | 2023-10-17 | 五邑大学 | Depth map acquisition method based on multi-scale features, electronic equipment and storage medium |
CN113111718A (en) * | 2021-03-16 | 2021-07-13 | 苏州海宸威视智能科技有限公司 | Fine-grained weak-feature target emergence detection method based on multi-mode remote sensing image |
CN113128575A (en) * | 2021-04-01 | 2021-07-16 | 西安电子科技大学广州研究院 | Target detection sample balancing method based on soft label |
CN113065601A (en) * | 2021-04-12 | 2021-07-02 | 陕西理工大学 | Deep learning forest fire abnormity detection method based on genetic algorithm optimization |
CN113255443A (en) * | 2021-04-16 | 2021-08-13 | 杭州电子科技大学 | Pyramid structure-based method for positioning time sequence actions of graph attention network |
CN113255443B (en) * | 2021-04-16 | 2024-02-09 | 杭州电子科技大学 | Graph annotation meaning network time sequence action positioning method based on pyramid structure |
CN113095265A (en) * | 2021-04-21 | 2021-07-09 | 西安电子科技大学 | Fungal target detection method based on feature fusion and attention |
CN113011443B (en) * | 2021-04-23 | 2022-06-03 | 电子科技大学 | Key point-based target detection feature fusion method |
CN113011443A (en) * | 2021-04-23 | 2021-06-22 | 电子科技大学 | Key point-based target detection feature fusion method |
CN113361428B (en) * | 2021-06-11 | 2023-03-24 | 浙江澄视科技有限公司 | Image-based traffic sign detection method |
CN113361428A (en) * | 2021-06-11 | 2021-09-07 | 浙江澄视科技有限公司 | Image-based traffic sign detection method |
CN113469287A (en) * | 2021-07-27 | 2021-10-01 | 北京信息科技大学 | Spacecraft multi-local component detection method based on instance segmentation network |
CN113658114A (en) * | 2021-07-29 | 2021-11-16 | 南京理工大学 | Contact net opening pin defect target detection method based on multi-scale cross attention |
CN113743521A (en) * | 2021-09-10 | 2021-12-03 | 中国科学院软件研究所 | Target detection method based on multi-scale context sensing |
CN113743521B (en) * | 2021-09-10 | 2023-06-27 | 中国科学院软件研究所 | Target detection method based on multi-scale context awareness |
CN114092813A (en) * | 2021-11-25 | 2022-02-25 | 中国科学院空天信息创新研究院 | Industrial park image extraction method, model, electronic equipment and storage medium |
CN114092813B (en) * | 2021-11-25 | 2022-08-05 | 中国科学院空天信息创新研究院 | Industrial park image extraction method and system, electronic equipment and storage medium |
CN113920468A (en) * | 2021-12-13 | 2022-01-11 | 松立控股集团股份有限公司 | Multi-branch pedestrian detection method based on cross-scale feature enhancement |
CN113920468B (en) * | 2021-12-13 | 2022-03-15 | 松立控股集团股份有限公司 | Multi-branch pedestrian detection method based on cross-scale feature enhancement |
CN114332644A (en) * | 2021-12-30 | 2022-04-12 | 北京建筑大学 | Large-view-field traffic density acquisition method based on video satellite data |
CN114445801A (en) * | 2022-01-25 | 2022-05-06 | 杭州飞步科技有限公司 | Lane line detection method based on cross-layer optimization |
CN116091787A (en) * | 2022-10-08 | 2023-05-09 | 中南大学 | Small sample target detection method based on feature filtering and feature alignment |
CN115984661B (en) * | 2023-03-20 | 2023-08-29 | 北京龙智数科科技服务有限公司 | Multi-scale feature map fusion method, device, equipment and medium in target detection |
CN115984661A (en) * | 2023-03-20 | 2023-04-18 | 北京龙智数科科技服务有限公司 | Multi-scale feature map fusion method, device, equipment and medium in target detection |
CN117496132A (en) * | 2023-12-29 | 2024-02-02 | 数据空间研究院 | Scale sensing detection method for small-scale target detection |
CN117689880A (en) * | 2024-02-01 | 2024-03-12 | 东北大学 | Method and system for target recognition in biomedical images based on machine learning |
CN117689880B (en) * | 2024-02-01 | 2024-04-16 | 东北大学 | Method and system for target recognition in biomedical images based on machine learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111738110A (en) | Remote sensing image vehicle target detection method based on multi-scale attention mechanism | |
CN108764063B (en) | Remote sensing image time-sensitive target identification system and method based on characteristic pyramid | |
CN112818903B (en) | Small sample remote sensing image target detection method based on meta-learning and cooperative attention | |
CN109086668B (en) | Unmanned aerial vehicle remote sensing image road information extraction method based on multi-scale generation countermeasure network | |
CN110119148B (en) | Six-degree-of-freedom attitude estimation method and device and computer readable storage medium | |
CN108305260B (en) | Method, device and equipment for detecting angular points in image | |
CN110516514B (en) | Modeling method and device of target detection model | |
CN112084869A (en) | Compact quadrilateral representation-based building target detection method | |
CN113609896A (en) | Object-level remote sensing change detection method and system based on dual-correlation attention | |
CN113901900A (en) | Unsupervised change detection method and system for homologous or heterologous remote sensing image | |
CN113313094B (en) | Vehicle-mounted image target detection method and system based on convolutional neural network | |
CN115631344B (en) | Target detection method based on feature self-adaptive aggregation | |
CN113255589A (en) | Target detection method and system based on multi-convolution fusion network | |
CN114119610B (en) | Defect detection method based on rotating target detection | |
CN114494870A (en) | Double-time-phase remote sensing image change detection method, model construction method and device | |
CN110909656B (en) | Pedestrian detection method and system integrating radar and camera | |
CN115984537A (en) | Image processing method and device and related equipment | |
CN115620141A (en) | Target detection method and device based on weighted deformable convolution | |
CN111860411A (en) | Road scene semantic segmentation method based on attention residual error learning | |
CN115661767A (en) | Image front vehicle target identification method based on convolutional neural network | |
CN113378642B (en) | Method for detecting illegal occupation buildings in rural areas | |
CN114724021A (en) | Data identification method and device, storage medium and electronic device | |
CN115376094B (en) | Scale-perception neural network-based road surface identification method and system for unmanned sweeper | |
CN111160282A (en) | Traffic light detection method based on binary Yolov3 network | |
CN115984712A (en) | Multi-scale feature-based remote sensing image small target detection method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||