CN116883859A - Remote sensing image target detection method based on YOLOv7-RS - Google Patents

Remote sensing image target detection method based on YOLOv7-RS Download PDF

Info

Publication number
CN116883859A
Authority
CN
China
Prior art keywords
remote sensing image
loss
yolov7
target detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310818961.2A
Other languages
Chinese (zh)
Inventor
梁琦
曹亚明
杨晓文
薛红新
贾彩琴
郭磊
孙福盛
焦世超
赵融
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North University of China
Original Assignee
North University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North University of China
Priority to CN202310818961.2A
Publication of CN116883859A
Legal status: Pending

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of computer vision, and particularly relates to a remote sensing image target detection method based on YOLOv7-RS. To improve the accuracy of target detection in remote sensing images, the invention designs a remote sensing image target detection network based on YOLOv7-RS, in which a D-ELAN module is redesigned, a SimAM attention mechanism is fused into the backbone network, the CIOU loss function is replaced with the SIOU loss function, and the positive and negative sample allocation strategy is optimized.

Description

Remote sensing image target detection method based on YOLOv7-RS
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a remote sensing image target detection method based on YOLOv7-RS.
Background
With the continuous development of remote sensing technology, remote sensing image target detection has become an important research direction in the field of remote sensing image interpretation. Target detection in remote sensing images is of great significance in fields such as military operations and national defense security: improving detection accuracy makes it possible to quickly pick out the target information of interest from large volumes of image data, strengthening intelligence-gathering capability.
In recent years, the rapid development of deep learning has provided strong technical support for feature extraction from remote sensing images. Most deep-learning-based target detection methods use a convolutional neural network as the backbone, because convolutional neural networks automatically extract high-level semantic features and therefore offer stronger feature representation than traditional hand-crafted feature extraction; their ability to learn features actively is also a major advantage in the big-data era. The rapid development of convolutional neural networks has solved many problems in the field of computer vision and achieved great success in image target detection. However, because remote sensing targets exhibit multiple scales, arbitrary rotation angles, complex scenes, and similar characteristics, and high-quality labeled samples are limited, deep learning still faces great challenges in remote sensing image target detection.
Current deep-learning-based target detection algorithms are mainly divided into two-stage and single-stage detectors. The YOLO series is a typical family of single-stage detectors. YOLOv1, first proposed in 2015, effectively addressed the slow network inference of two-stage detection. YOLOv2 improved on it in three respects (faster, more classes, more accurate) and expanded the set of recognizable objects to 9000 categories, hence its alternative name YOLO9000. YOLOv3 introduced the feature pyramid network (FPN) and the residual backbone Darknet-53, supporting detection at three different scales and realizing multi-scale fusion. YOLOv4 and YOLOv5 combined popular techniques such as Weighted Residual Connections (WRC), Cross-Stage Partial connections (CSP), and Mosaic data augmentation to further improve detection accuracy and speed. YOLOX adopted an anchor-free design and replaced the coupled detection head of YOLOv5 with a decoupled head, improving the network's convergence speed, and proposed the positive/negative sample matching strategy SimOTA on the basis of OTA. YOLOv6, a target detection framework developed and optimized by Meituan's vision intelligence team, is widely used in industry. YOLOv7, released in July 2022, proposed the E-ELAN architecture and auxiliary training modules for network performance, further improving the speed and accuracy of the algorithm.
Existing research has produced many achievements and advances, but problems remain that require further study: YOLO-based target detection algorithms perform well on natural images, but remote sensing images are formed by a different imaging process and exhibit complex, varied backgrounds and large scale differences, so YOLO-based target detection on remote sensing images performs poorly.
Disclosure of Invention
The invention aims to solve the problem of poor detection accuracy for remote sensing image targets. It provides a remote sensing image target detection method based on YOLOv7-RS and designs a YOLOv7-RS remote sensing image target detection network that can efficiently address the problems present in remote sensing image target detection.
In order to achieve the above purpose, the invention adopts the following technical scheme:
a remote sensing image target detection method based on YOLOv7-RS comprises the following steps:
step 1, acquiring a remote sensing image and preprocessing the remote sensing image;
step 2, constructing a remote sensing image target detection model based on a YOLOv7-RS network structure;
and step 3, inputting the preprocessed remote sensing image and a weight file into the constructed model and detecting targets in the remote sensing image.
Further, in the step 1, the remote sensing image is preprocessed, specifically: the acquired remote sensing image is scaled to 640x640, and any shortfall is completed by pixel filling.
Further, the YOLOv7-RS network structure in the step 2 includes a D-ELAN module and a SIOU loss function part, and comprises an Input stage, a Backbone network (Backbone) stage, a Neck network (Neck) stage, and a Head network (Head) stage;
the D-ELAN module is designed according to the split gradient flow idea of CSPNet: the first branch directly passes through a 1x1 convolution; the second branch, on that basis, passes through three groups of two 3x3 convolutions; finally, the output of the 1x1 convolution and the outputs of the three groups of 3x3 convolutions are concatenated, so that the feature extraction capability is improved by raising block utilization and increasing network depth.
Still further, in the backbone network stage, a three-dimensional attention module SimAM which fuses channel attention and spatial attention is introduced, with the calculation formula:

$$\tilde{X} = \operatorname{sigmoid}\left(\frac{1}{E}\right) \odot X \quad (1)$$

where $\tilde{X}$ is the output feature, $X$ is the input feature, and $E$ groups the minimal energy functions $e_t^{*}$ of all neurons across the channel and spatial dimensions; the minimal energy function $e_t^{*}$ of an individual neuron is shown in formula (2):

$$e_t^{*} = \frac{4\left(\hat{\sigma}^{2} + \lambda\right)}{\left(t - \hat{\mu}\right)^{2} + 2\hat{\sigma}^{2} + 2\lambda} \quad (2)$$

where $t$ is the target neuron, $\lambda$ is a hyperparameter, $\hat{\mu}$ is the mean of all neurons on a single channel, and $\hat{\sigma}^{2}$ is the variance of all neurons on a single channel, as shown in formulas (3) and (4):

$$\hat{\mu} = \frac{1}{M}\sum_{i=1}^{M} X_i \quad (3)$$

$$\hat{\sigma}^{2} = \frac{1}{M}\sum_{i=1}^{M}\left(X_i - \hat{\mu}\right)^{2} \quad (4)$$

where $M$ represents the number of neurons on each channel and $X_i$ represents the $i$-th neuron of the input feature map on a single channel.
Still further, in the SIOU loss function, the angle deviation between the real frame and the predicted frame is defined as an angle loss, and the calculation of a distance loss is added; that is, the SIOU loss function consists of four parts, the angle loss (Angle cost), the distance loss (Distance cost), the shape loss (Shape cost), and the IOU loss, calculated as follows:

$$L_{SIOU} = 1 - IOU + \frac{\Delta + \Omega}{2} \quad (5)$$

where $IOU$ is the IOU loss, $\Delta$ is the distance loss, and $\Omega$ is the shape loss; the three are calculated as:

$$\Delta = \sum_{t=x,y}\left(1 - e^{-\gamma\rho_t}\right), \qquad \gamma = 2 - \Lambda \quad (6)$$

$$\Lambda = 1 - 2\sin^{2}\left(\arcsin\frac{C_h}{\sigma} - \frac{\pi}{4}\right) \quad (7)$$

$$\rho_x = \left(\frac{x_{gt} - x}{C_w}\right)^{2}, \qquad \rho_y = \left(\frac{y_{gt} - y}{C_h}\right)^{2} \quad (8)$$

$$\Omega = \sum_{t=w,h}\left(1 - e^{-\omega_t}\right)^{\theta}, \qquad \omega_w = \frac{\lvert w - w_{gt}\rvert}{\max(w, w_{gt})}, \qquad \omega_h = \frac{\lvert h - h_{gt}\rvert}{\max(h, h_{gt})} \quad (9)$$

where $\Lambda$ is the angle loss; $C_h$ and $C_w$ are the height and width of the minimum enclosing rectangle of the real frame and the predicted frame; $\gamma = 2 - \Lambda$ is the coefficient that shifts weight between the angle and distance terms; $\rho_x$ is the proportion that the difference between the center abscissas of the real frame and the predicted frame occupies in $C_w$, and $\rho_y$ the proportion that the difference of the center ordinates occupies in $C_h$; $x_{gt}$ and $y_{gt}$ are the abscissa and ordinate of the center point of the real frame, and $x$ and $y$ those of the predicted frame; $\sigma$ is the distance between the center points of the real frame and the predicted frame; $w_{gt}$ and $h_{gt}$ are the width and height of the real frame, and $w$ and $h$ those of the predicted frame; $\omega_w$ is the proportion of the difference between the widths of the real frame and the predicted frame in the larger of the two, and $\omega_h$ the corresponding proportion for the heights; $\theta$ controls the degree of attention paid to the shape loss.
Still further, only positive samples participate in the calculation of the SIOU loss function.
Furthermore, the positive and negative sample allocation strategy is optimized for the SIOU loss: on the basis of the positive and negative sample allocation strategy of YOLOv7, the rotation invariance of remote sensing image targets is comprehensively considered, and the number of positive-sample candidate boxes is increased from three to four.
Compared with the prior art, the invention has the following advantages:
(1) To remedy the insufficient capability of the YOLOv7 network to extract remote sensing image features, the ELAN module is redesigned as the D-ELAN module.
(2) To reduce the interference of background noise in remote sensing images, the SimAM attention mechanism is fused into the YOLOv7 network, so that the network attends to the more valuable information in the image.
(3) To increase the convergence speed of the network, the CIOU loss function is replaced with the SIOU loss function.
(4) To solve the problem of missed detection when small targets are densely arranged in remote sensing images, the positive and negative sample allocation strategy is optimized.
(5) The YOLOv7-RS provided by the invention outperforms most existing methods and shows competitive detection capability on the NWPU VHR-10 and DOTA datasets; it adapts well to the complexity and diversity of remote sensing images, demonstrating the effectiveness of the method.
Drawings
FIG. 1 is a flow chart of a remote sensing image target detection method based on YOLOv 7-RS;
FIG. 2 is a network structure diagram of a remote sensing image target detection method based on YOLOv7-RS of the invention;
FIG. 3 is a block-diagram comparison of the D-ELAN module of the invention and the ELAN module of YOLOv7;
FIG. 4 is a diagram showing parameters of a real frame and a predicted frame in a SIOU according to the present invention;
FIG. 5 is a schematic diagram of positive and negative sample strategy optimization;
FIG. 6 is a comparison of visualization results on the NWPU VHR-10 dataset;
FIG. 7 is a comparison of visualization results on the DOTA dataset.
Detailed Description
The present invention will be described more fully hereinafter in order to facilitate an understanding of the present invention. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
Example 1
As shown in FIG. 1, the method for detecting the target of the remote sensing image based on the YOLOv7-RS comprises the following steps:
step 1, acquiring a remote sensing image and preprocessing the remote sensing image, namely scaling the image to 640x640;
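The patent gives no code for this step; as an illustrative sketch (not part of the claimed subject matter), the scale-and-pad preprocessing could look as follows in Python, assuming OpenCV-style image arrays and a gray padding value of 114, both assumptions not specified by the invention:

```python
import cv2
import numpy as np

def letterbox(image: np.ndarray, size: int = 640, pad_value: int = 114) -> np.ndarray:
    """Scale the longer side to `size`, then complete the short side by pixel filling."""
    h, w = image.shape[:2]
    scale = size / max(h, w)
    resized = cv2.resize(image, (int(round(w * scale)), int(round(h * scale))))
    canvas = np.full((size, size, 3), pad_value, dtype=image.dtype)  # pixel filling
    top = (size - resized.shape[0]) // 2
    left = (size - resized.shape[1]) // 2
    canvas[top:top + resized.shape[0], left:left + resized.shape[1]] = resized
    return canvas
```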
step 2, constructing a remote sensing image target detection model based on the YOLOv7-RS network structure (shown in FIG. 2), where the YOLOv7-RS network structure includes a D-ELAN module and a SIOU loss function part, and comprises an input stage, a backbone network stage, a neck network stage, and a head network stage;
wherein the D-ELAN (Deep-ELAN) module is designed according to the split gradient flow idea of CSPNet: the first branch directly passes through a 1x1 convolution; the second branch, on that basis, passes through three groups of two 3x3 convolutions; finally, the output of the 1x1 convolution and the outputs of the three groups of 3x3 convolutions are concatenated, so that the feature extraction capability is improved by raising block utilization and increasing network depth. A block-diagram comparison of the D-ELAN module and the ELAN module is shown in FIG. 3.
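A possible PyTorch rendering of this block is sketched below; the channel widths, the BN+SiLU convolution unit, the final fusing 1x1 convolution, and the use of separate initial 1x1 convolutions for the two branches are our assumptions, and FIG. 3 remains the authoritative structure:

```python
import torch
import torch.nn as nn

class Conv(nn.Module):
    """Convolution + BatchNorm + SiLU, the usual YOLOv7 building block."""
    def __init__(self, c_in, c_out, k=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class DELAN(nn.Module):
    """Branch 1: a single 1x1 conv. Branch 2: a 1x1 conv followed by three
    groups of two 3x3 convs. The 1x1 output and the three group outputs
    are concatenated, then fused by a final 1x1 conv."""
    def __init__(self, c_in, c_mid, c_out):
        super().__init__()
        self.cv1 = Conv(c_in, c_mid, 1)   # branch 1
        self.cv2 = Conv(c_in, c_mid, 1)   # entry of branch 2
        self.groups = nn.ModuleList(
            nn.Sequential(Conv(c_mid, c_mid, 3), Conv(c_mid, c_mid, 3))
            for _ in range(3)
        )
        self.fuse = Conv(4 * c_mid, c_out, 1)

    def forward(self, x):
        outs = [self.cv1(x)]
        y = self.cv2(x)
        for g in self.groups:
            y = g(y)
            outs.append(y)   # keep each group's output for the concatenation
        return self.fuse(torch.cat(outs, dim=1))
```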
In the backbone network stage, a three-dimensional attention module SimAM which fuses channel attention and spatial attention is introduced, with the calculation formula:

$$\tilde{X} = \operatorname{sigmoid}\left(\frac{1}{E}\right) \odot X \quad (1)$$

where $\tilde{X}$ is the output feature, $X$ is the input feature, and $E$ groups the minimal energy functions $e_t^{*}$ of all neurons across the channel and spatial dimensions; the minimal energy function $e_t^{*}$ of an individual neuron is shown in formula (2):

$$e_t^{*} = \frac{4\left(\hat{\sigma}^{2} + \lambda\right)}{\left(t - \hat{\mu}\right)^{2} + 2\hat{\sigma}^{2} + 2\lambda} \quad (2)$$

where $t$ is the target neuron, $\lambda$ is a hyperparameter, $\hat{\mu}$ is the mean of all neurons on a single channel, and $\hat{\sigma}^{2}$ is the variance of all neurons on a single channel, as shown in formulas (3) and (4):

$$\hat{\mu} = \frac{1}{M}\sum_{i=1}^{M} X_i \quad (3)$$

$$\hat{\sigma}^{2} = \frac{1}{M}\sum_{i=1}^{M}\left(X_i - \hat{\mu}\right)^{2} \quad (4)$$

where $M$ represents the number of neurons on each channel and $X_i$ represents the $i$-th neuron of the input feature map on a single channel.
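For reference, formulas (1)-(4) can be realized as a parameter-free PyTorch module; the sketch below follows the published SimAM implementation, where λ = 1e-4 is a commonly used default (an assumption here) and the M−1 denominator mirrors that implementation:

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free 3-D attention: weight every neuron by the inverse of
    its minimal energy e_t* (formula (2)) passed through a sigmoid."""
    def __init__(self, lam: float = 1e-4):
        super().__init__()
        self.lam = lam

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        n = h * w - 1                                  # neurons per channel, minus the target
        mu = x.mean(dim=(2, 3), keepdim=True)          # formula (3)
        d = (x - mu) ** 2
        var = d.sum(dim=(2, 3), keepdim=True) / n      # formula (4)
        inv_energy = d / (4 * (var + self.lam)) + 0.5  # proportional to 1/e_t*
        return x * torch.sigmoid(inv_energy)           # formula (1)
```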
In the SIOU loss function, the angle deviation between the real frame and the predicted frame is defined as an angle loss, and the calculation of a distance loss is added; that is, the SIOU loss function consists of four parts, the angle loss, the distance loss, the shape loss, and the IOU loss, calculated as follows:

$$L_{SIOU} = 1 - IOU + \frac{\Delta + \Omega}{2} \quad (5)$$

where $IOU$ is the IOU loss, $\Delta$ is the distance loss, and $\Omega$ is the shape loss; the three are calculated as:

$$\Delta = \sum_{t=x,y}\left(1 - e^{-\gamma\rho_t}\right), \qquad \gamma = 2 - \Lambda \quad (6)$$

$$\Lambda = 1 - 2\sin^{2}\left(\arcsin\frac{C_h}{\sigma} - \frac{\pi}{4}\right) \quad (7)$$

$$\rho_x = \left(\frac{x_{gt} - x}{C_w}\right)^{2}, \qquad \rho_y = \left(\frac{y_{gt} - y}{C_h}\right)^{2} \quad (8)$$

$$\Omega = \sum_{t=w,h}\left(1 - e^{-\omega_t}\right)^{\theta}, \qquad \omega_w = \frac{\lvert w - w_{gt}\rvert}{\max(w, w_{gt})}, \qquad \omega_h = \frac{\lvert h - h_{gt}\rvert}{\max(h, h_{gt})} \quad (9)$$

where $\Lambda$ is the angle loss; $C_h$ and $C_w$ are the height and width of the minimum enclosing rectangle of the real frame and the predicted frame; $\gamma = 2 - \Lambda$ is the coefficient that shifts weight between the angle and distance terms; $\rho_x$ is the proportion that the difference between the center abscissas of the real frame and the predicted frame occupies in $C_w$, and $\rho_y$ the proportion that the difference of the center ordinates occupies in $C_h$; $x_{gt}$ and $y_{gt}$ are the abscissa and ordinate of the center point of the real frame, and $x$ and $y$ those of the predicted frame; $\sigma$ is the distance between the center points of the real frame and the predicted frame; $w_{gt}$ and $h_{gt}$ are the width and height of the real frame, and $w$ and $h$ those of the predicted frame; $\omega_w$ is the proportion of the difference between the widths of the real frame and the predicted frame in the larger of the two, and $\omega_h$ the corresponding proportion for the heights; $\theta$ controls the degree of attention paid to the shape loss, with a parameter range of [2,6]. A schematic of these parameters is shown in fig. 4, where the lower-left box is the predicted frame and the upper-right box is the real frame.
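Under these definitions, the SIOU loss for axis-aligned (cx, cy, w, h) boxes could be sketched as follows; the epsilon guards and the choice θ = 4 within the stated [2,6] range are our assumptions:

```python
import math
import torch

def siou_loss(pred: torch.Tensor, target: torch.Tensor, theta: float = 4.0, eps: float = 1e-7):
    """pred, target: (N, 4) tensors of (cx, cy, w, h) boxes.
    Returns the per-box loss 1 - IOU + (distance + shape) / 2, formula (5)."""
    px, py, pw, ph = pred.unbind(-1)
    gx, gy, gw, gh = target.unbind(-1)

    # IOU term
    inter_w = (torch.min(px + pw / 2, gx + gw / 2) - torch.max(px - pw / 2, gx - gw / 2)).clamp(min=0)
    inter_h = (torch.min(py + ph / 2, gy + gh / 2) - torch.max(py - ph / 2, gy - gh / 2)).clamp(min=0)
    inter = inter_w * inter_h
    iou = inter / (pw * ph + gw * gh - inter + eps)

    # width/height of the minimum enclosing rectangle
    cw = torch.max(px + pw / 2, gx + gw / 2) - torch.min(px - pw / 2, gx - gw / 2)
    ch = torch.max(py + ph / 2, gy + gh / 2) - torch.min(py - ph / 2, gy - gh / 2)

    # angle cost, formula (7)
    sigma = torch.sqrt((gx - px) ** 2 + (gy - py) ** 2) + eps
    sin_alpha = ((gy - py).abs() / sigma).clamp(-1, 1)
    angle = 1 - 2 * torch.sin(torch.arcsin(sin_alpha) - math.pi / 4) ** 2

    # distance cost, formulas (6) and (8)
    gamma = 2 - angle
    rho_x = ((gx - px) / (cw + eps)) ** 2
    rho_y = ((gy - py) / (ch + eps)) ** 2
    dist = (1 - torch.exp(-gamma * rho_x)) + (1 - torch.exp(-gamma * rho_y))

    # shape cost, formula (9)
    omega_w = (pw - gw).abs() / (torch.max(pw, gw) + eps)
    omega_h = (ph - gh).abs() / (torch.max(ph, gh) + eps)
    shape = (1 - torch.exp(-omega_w)) ** theta + (1 - torch.exp(-omega_h)) ** theta

    return 1 - iou + (dist + shape) / 2
```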
Only positive samples participate in the calculation of the SIOU loss function, and the problem of missed detection for densely arranged small targets in remote sensing images is addressed by optimizing the positive and negative sample allocation strategy: on the basis of the YOLOv7 allocation strategy, the rotation invariance of remote sensing targets is comprehensively considered, and the number of positive-sample candidate boxes is increased from three to four. As shown in fig. 5, under a 45-degree rotation of the remote sensing image, this reduces the positive-sample loss rate from 46% to 28%; a sketch of the candidate selection follows below.
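The selection rule sketched here (the containing grid cell plus the k nearest axis neighbors, ranked by the center offsets) is our reading of the YOLOv5/YOLOv7-style assignment, not a verbatim reproduction of the invention's rule:

```python
def candidate_cells(cx: float, cy: float, k_neighbors: int = 3):
    """Positive-sample candidate grid cells for a ground-truth center
    (cx, cy) given in grid units: the containing cell plus the k nearest
    of its four axis neighbors. k_neighbors=2 reproduces YOLOv7's three
    candidates; k_neighbors=3 yields the four candidates used here."""
    gi, gj = int(cx), int(cy)
    fx, fy = cx - gi, cy - gj          # fractional offsets inside the cell
    neighbors = {
        (gi - 1, gj): fx,              # distance toward the left neighbor
        (gi + 1, gj): 1 - fx,          # toward the right neighbor
        (gi, gj - 1): fy,              # toward the upper neighbor
        (gi, gj + 1): 1 - fy,          # toward the lower neighbor
    }
    nearest = sorted(neighbors, key=neighbors.get)[:k_neighbors]
    return [(gi, gj)] + nearest
```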
And step 3, inputting the preprocessed remote sensing image and the weight file into the constructed model and detecting targets in the remote sensing image; the weight file in this embodiment is the optimal weight file obtained by iterative training for 300 epochs on the NWPU VHR-10 and DOTA datasets.
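Step 3 then amounts to loading the trained weights and running a forward pass; in sketch form (the model object and file paths are placeholders, `letterbox` is the preprocessing sketch from step 1, and post-processing such as NMS depends on the concrete codebase):

```python
import cv2
import torch

def detect(model: torch.nn.Module, weight_path: str, image_path: str):
    """Load trained weights, preprocess one image as in step 1, and run
    the detector; NMS and score thresholds are omitted here."""
    model.load_state_dict(torch.load(weight_path, map_location="cpu"))
    model.eval()
    img = letterbox(cv2.imread(image_path))   # 640x640 scale-and-pad from step 1
    x = torch.from_numpy(img).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    with torch.no_grad():
        return model(x)                       # raw head outputs, pre-NMS
```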
Example 2
The remote sensing image target detection network model provided by the invention is applied to the NWPU VHR-10 data set, and experiments show that the network model is effective.
The NWPU VHR-10 dataset was published by Northwestern Polytechnical University in 2014, with images extracted from Google Earth and the Vaihingen dataset. It covers 10 categories, airplane (PL), ship (SH), storage tank (ST), baseball diamond (BD), tennis court (TC), basketball court (BC), ground track field (GTF), harbor (HA), bridge (BR), and vehicle (VE), in 800 remote sensing images (including 150 background images). The annotations use the HBB (Horizontal Bounding Boxes) format, for a total of 3651 instances. 90% of the dataset was randomly split off as the training set and 10% as the test set.
mAP (mean Average Precision) is commonly used to measure the overall performance of a model on a target detection task; mAP is the mean of the average precision AP (Average Precision) over the categories in the dataset. For each category, a Precision-Recall curve can be drawn over the coordinate range from 0 to 1, and the area enclosed by the curve and the coordinate axes is the average precision, as shown in formula (10):

$$AP = \int_{0}^{1} P(R)\, dR \quad (10)$$

where Precision is the proportion of true positives among the samples the detector predicts as positive, as shown in formula (11), and Recall is the proportion of correctly predicted positive samples among all positive samples, as shown in formula (12):

$$Precision = \frac{TP}{TP + FP} \quad (11)$$

$$Recall = \frac{TP}{TP + FN} \quad (12)$$

where TP denotes true positives, FN denotes false negatives, and FP denotes false positives.
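A compact all-point AP computation matching formulas (10)-(12) is sketched below; a full evaluation additionally requires IOU matching between predictions and ground-truth boxes, which is omitted here:

```python
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """scores: confidence of each prediction; is_tp: 1 if the prediction was
    matched to a ground-truth box, else 0; num_gt: number of ground truths.
    Returns the area under the precision-recall curve, formula (10)."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_tp, dtype=float)[order]
    fp = 1.0 - tp
    tp_cum, fp_cum = np.cumsum(tp), np.cumsum(fp)
    recall = tp_cum / max(num_gt, 1)                        # formula (12)
    precision = tp_cum / np.maximum(tp_cum + fp_cum, 1e-9)  # formula (11)
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]          # precision envelope
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))
```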
The YOLOv7-RS proposed by the invention was experimentally compared with the SSD, Faster R-CNN, YOLOv3, YOLOv4, YOLOv5s, and YOLOv7 algorithms on the NWPU VHR-10 dataset; the results are shown in Table 1.
TABLE 1 experimental results of different algorithms on NWPU VHR-10 dataset
As shown in Table 1, YOLOv7-RS improves mAP by 14.3%, 10.9%, 20.3%, 6.3%, 5.3%, and 2.6% over SSD, Faster R-CNN, YOLOv3, YOLOv4, YOLOv5s, and YOLOv7, respectively. The detection accuracy of YOLOv7-RS is above 89% in every category, indicating good overall accuracy; for airplane (PL) and storage tank (ST) targets its accuracy is the best among the compared algorithms, reaching 99.6%. Compared with the original YOLOv7, detection accuracy improves for airplane (PL), storage tank (ST), tennis court (TC), basketball court (BC), and vehicle (VE).
Through a large number of experiments, the detection results of YOLOv7 and YOLOv7-RS were compared, and two groups of visualization results were selected for analysis, as shown in FIG. 6; the left side shows the detection results of the YOLOv7 algorithm and the right side those of the YOLOv7-RS algorithm.
In FIG. 6(a), YOLOv7 falsely detects a yellow landmark as an airplane, and in FIG. 6(b) it misses a bridge. YOLOv7-RS detects these targets accurately, effectively improving the detection effect under complex backgrounds.
Example 3
The remote sensing image target detection network model provided by the invention is applied to the DOTA dataset, and experiments demonstrate the effectiveness of the network model.
The DOTA v1.0 dataset comes from Google Earth, GF-2 and JL-1 satellite images provided by the China Centre for Resources Satellite Data and Application, and aerial images provided by CycloMedia B.V. It covers 15 categories, plane (PL), ship (SH), small vehicle (SV), large vehicle (LV), storage tank (ST), tennis court (TC), ground track field (GTF), bridge (BR), roundabout (RA), swimming pool (SP), baseball diamond (BD), basketball court (BC), harbor (HA), helicopter (HC), and soccer ball field (SBF), in 2806 aerial images from different sensors and platforms, with image sizes ranging from 800x800 to 4000x4000, for a total of 188282 instances. In this embodiment, DOTA_devkit is used to preprocess the HBB-annotated dataset: the original images are cropped into 1024x1024 sub-images with a 200-pixel overlap, and cropped images that do not reach the specified size are completed by pixel filling. After processing, the training set contains 15749 images and the test set contains 5297 images.
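The cropping described here might be implemented along the following lines; this is a sketch of the sliding-window split (DOTA_devkit's own split tooling is the authoritative implementation, and the gray padding value is an assumption):

```python
import numpy as np

def split_image(image: np.ndarray, tile: int = 1024, overlap: int = 200, pad_value: int = 114):
    """Cut an image into tile x tile crops sharing `overlap` pixels;
    border crops that fall short of the tile size are completed by
    pixel filling. Returns a list of ((top, left), crop) pairs."""
    stride = tile - overlap
    h, w = image.shape[:2]
    tiles = []
    for top in range(0, max(h - overlap, 1), stride):
        for left in range(0, max(w - overlap, 1), stride):
            crop = image[top:top + tile, left:left + tile]
            if crop.shape[:2] != (tile, tile):   # border tile: pad to full size
                canvas = np.full((tile, tile, 3), pad_value, dtype=image.dtype)
                canvas[:crop.shape[0], :crop.shape[1]] = crop
                crop = canvas
            tiles.append(((top, left), crop))
    return tiles
```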
The results of the experimental comparison of the proposed YOLOv7-RS with the SSD, Faster R-CNN, YOLOv3, YOLOv4, YOLOv5s, and YOLOv7 algorithms on the DOTA dataset are shown in Table 2.
Table 2 comparison of results of different models on DOTA dataset
As can be seen from Table 2, YOLOv7-RS improves mAP by 21.7%, 32.1%, 9.6%, 5.7%, 4.6%, and 2.4% over SSD, Faster R-CNN, YOLOv3, YOLOv4, YOLOv5s, and YOLOv7, respectively. Its detection accuracy on baseball diamond (BD), bridge (BR), large vehicle (LV), soccer ball field (SBF), and roundabout (RA) is the best among the compared algorithms. Compared with the original YOLOv7, accuracy drops by 0.1-0.2% in the three categories of tennis court (TC), basketball court (BC), and storage tank (ST), while the remaining categories improve markedly.
Through a large number of experiments, the detection results of YOLOv7 and YOLOv7-RS were compared, and two groups of visualization results were selected for analysis, as shown in FIG. 7; the left side shows the detection results of the YOLOv7 algorithm and the right side those of the YOLOv7-RS algorithm.
In FIG. 7(a), YOLOv7 detects 5 harbors; in FIG. 7(b), it detects 153 small vehicles and 3 large vehicles. YOLOv7-RS detects 5 harbors and 6 small vehicles in FIG. 7(a), and 272 small vehicles and 4 large vehicles in FIG. 7(b); YOLOv7-RS thus effectively alleviates missed detection under complex backgrounds and dense arrangements of small targets.
In conclusion, the YOLOv7-RS proposed by the invention outperforms most existing methods, reaching 95.4% and 74.1% mAP on the NWPU VHR-10 and DOTA datasets respectively; it adapts well to the complexity and diversity of remote sensing images, demonstrating the effectiveness of the method.

Claims (7)

1. A remote sensing image target detection method based on YOLOv7-RS is characterized by comprising the following steps:
step 1, acquiring a remote sensing image and preprocessing the remote sensing image;
step 2, constructing a remote sensing image target detection model based on a YOLOv7-RS network structure;
and step 3, inputting the preprocessed remote sensing image and the weight file into the constructed model, and detecting the target of the remote sensing image.
2. The YOLOv7-RS-based remote sensing image target detection method according to claim 1, wherein the preprocessing of the remote sensing image in step 1 is specifically: scaling the acquired remote sensing image to 640x640 and completing any shortfall by pixel filling.
3. The method for detecting the target of the remote sensing image based on YOLOv7-RS according to claim 1, wherein the YOLOv7-RS network structure in step 2 comprises a D-ELAN module, a SIOU loss function part, an input stage, a backbone network stage, a neck network stage, and a head network stage;
the first branch of the D-ELAN module directly passes through a 1x1 convolution; the second branch, on that basis, passes through three groups of two 3x3 convolutions; finally, the output of the 1x1 convolution and the outputs of the three groups of 3x3 convolutions are concatenated, so that the feature extraction capability is improved by raising block utilization and increasing network depth.
4. A remote sensing image target detection method based on YOLOv7-RS according to claim 3, wherein in the backbone network stage a three-dimensional attention module SimAM which fuses channel attention and spatial attention is introduced, with the calculation formula:

$$\tilde{X} = \operatorname{sigmoid}\left(\frac{1}{E}\right) \odot X \quad (1)$$

where $\tilde{X}$ is the output feature, $X$ is the input feature, and $E$ groups the minimal energy functions $e_t^{*}$ of all neurons across the channel and spatial dimensions; the minimal energy function $e_t^{*}$ of an individual neuron is shown in formula (2):

$$e_t^{*} = \frac{4\left(\hat{\sigma}^{2} + \lambda\right)}{\left(t - \hat{\mu}\right)^{2} + 2\hat{\sigma}^{2} + 2\lambda} \quad (2)$$

where $t$ is the target neuron, $\lambda$ is a hyperparameter, $\hat{\mu}$ is the mean of all neurons on a single channel, and $\hat{\sigma}^{2}$ is the variance of all neurons on a single channel, as shown in formulas (3) and (4):

$$\hat{\mu} = \frac{1}{M}\sum_{i=1}^{M} X_i \quad (3)$$

$$\hat{\sigma}^{2} = \frac{1}{M}\sum_{i=1}^{M}\left(X_i - \hat{\mu}\right)^{2} \quad (4)$$

where $M$ represents the number of neurons on each channel and $X_i$ represents the $i$-th neuron of the input feature map on a single channel.
5. A remote sensing image target detection method based on YOLOv7-RS according to claim 3, wherein in the SIOU loss function the angle deviation between the real frame and the predicted frame is defined as an angle loss and the calculation of a distance loss is added, that is, the SIOU loss function consists of four parts, the angle loss, the distance loss, the shape loss, and the IOU loss, calculated as follows:

$$L_{SIOU} = 1 - IOU + \frac{\Delta + \Omega}{2} \quad (5)$$

where $IOU$ is the IOU loss, $\Delta$ is the distance loss, and $\Omega$ is the shape loss; the three are calculated as:

$$\Delta = \sum_{t=x,y}\left(1 - e^{-\gamma\rho_t}\right), \qquad \gamma = 2 - \Lambda \quad (6)$$

$$\Lambda = 1 - 2\sin^{2}\left(\arcsin\frac{C_h}{\sigma} - \frac{\pi}{4}\right) \quad (7)$$

$$\rho_x = \left(\frac{x_{gt} - x}{C_w}\right)^{2}, \qquad \rho_y = \left(\frac{y_{gt} - y}{C_h}\right)^{2} \quad (8)$$

$$\Omega = \sum_{t=w,h}\left(1 - e^{-\omega_t}\right)^{\theta}, \qquad \omega_w = \frac{\lvert w - w_{gt}\rvert}{\max(w, w_{gt})}, \qquad \omega_h = \frac{\lvert h - h_{gt}\rvert}{\max(h, h_{gt})} \quad (9)$$

where $\Lambda$ is the angle loss; $C_h$ and $C_w$ are the height and width of the minimum enclosing rectangle of the real frame and the predicted frame; $\gamma = 2 - \Lambda$ is the coefficient that shifts weight between the angle and distance terms; $\rho_x$ is the proportion that the difference between the center abscissas of the real frame and the predicted frame occupies in $C_w$, and $\rho_y$ the proportion that the difference of the center ordinates occupies in $C_h$; $x_{gt}$ and $y_{gt}$ are the abscissa and ordinate of the center point of the real frame, and $x$ and $y$ those of the predicted frame; $\sigma$ is the distance between the center points of the real frame and the predicted frame; $w_{gt}$ and $h_{gt}$ are the width and height of the real frame, and $w$ and $h$ those of the predicted frame; $\omega_w$ is the proportion of the difference between the widths of the real frame and the predicted frame in the larger of the two, and $\omega_h$ the corresponding proportion for the heights; $\theta$ controls the degree of attention paid to the shape loss.
6. A YOLOv7-RS-based remote sensing image target detection method according to claim 3, wherein only positive samples participate in the calculation of the SIOU loss function.
7. The YOLOv7-RS-based remote sensing image target detection method according to claim 6, wherein the positive and negative sample allocation strategy is optimized for the SIOU loss: on the basis of the positive and negative sample allocation strategy of YOLOv7, the rotation invariance of remote sensing image targets is comprehensively considered, and the number of positive-sample candidate boxes is increased from three to four.
CN202310818961.2A 2023-07-05 2023-07-05 Remote sensing image target detection method based on YOLOv7-RS Pending CN116883859A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310818961.2A CN116883859A (en) 2023-07-05 2023-07-05 Remote sensing image target detection method based on YOLOv7-RS

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310818961.2A CN116883859A (en) 2023-07-05 2023-07-05 Remote sensing image target detection method based on YOLOv7-RS

Publications (1)

Publication Number Publication Date
CN116883859A true CN116883859A (en) 2023-10-13

Family

ID=88259690

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310818961.2A Pending CN116883859A (en) 2023-07-05 2023-07-05 Remote sensing image target detection method based on YOLOv7-RS

Country Status (1)

Country Link
CN (1) CN116883859A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117611998A (en) * 2023-11-22 2024-02-27 盐城工学院 Optical remote sensing image target detection method based on improved YOLOv7

Similar Documents

Publication Publication Date Title
CN110443208A (en) A kind of vehicle target detection method, system and equipment based on YOLOv2
CN110175576A (en) A kind of driving vehicle visible detection method of combination laser point cloud data
CN111914795A (en) Method for detecting rotating target in aerial image
CN113420607A (en) Multi-scale target detection and identification method for unmanned aerial vehicle
CN113807464B (en) Unmanned aerial vehicle aerial image target detection method based on improved YOLO V5
CN109034035A (en) Pedestrian's recognition methods again based on conspicuousness detection and Fusion Features
CN113160246A (en) Image semantic segmentation method based on depth supervision
Zheng et al. A review of remote sensing image object detection algorithms based on deep learning
CN116883859A (en) Remote sensing image target detection method based on YOLOv7-RS
Han et al. Research on remote sensing image target recognition based on deep convolution neural network
Liu et al. CAFFNet: channel attention and feature fusion network for multi-target traffic sign detection
CN115937552A (en) Image matching method based on fusion of manual features and depth features
CN112529065A (en) Target detection method based on feature alignment and key point auxiliary excitation
Chen et al. Object detection of optical remote sensing image based on improved faster RCNN
CN114519819B (en) Remote sensing image target detection method based on global context awareness
CN112102241A (en) Single-stage remote sensing image target detection algorithm
CN116721398A (en) Yolov5 target detection method based on cross-stage route attention module and residual information fusion module
CN116385876A (en) Optical remote sensing image ground object detection method based on YOLOX
CN114494893B (en) Remote sensing image feature extraction method based on semantic reuse context feature pyramid
Zhao et al. Street-view Change Detection via Siamese Encoder-decoder Structured Convolutional Neural Networks.
CN115100516A (en) Relation learning-based remote sensing image target detection method
Schuegraf et al. Deep Learning for the Automatic Division of Building Constructions into Sections on Remote Sensing Images
CN116385477A (en) Tower image registration method based on image segmentation
CN115035429A (en) Aerial photography target detection method based on composite backbone network and multiple measuring heads
CN115100428A (en) Target detection method using context sensing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination