CN116343027A - YOLOv5 remote sensing image target detection method utilizing attention mechanism fusion - Google Patents

YOLOv5 remote sensing image target detection method utilizing attention mechanism fusion

Info

Publication number
CN116343027A
CN116343027A (application CN202310177081.1A)
Authority
CN
China
Prior art keywords
remote sensing
sensing image
yolov5
attention mechanism
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310177081.1A
Other languages
Chinese (zh)
Inventor
Wang Longbo
Liu Jianhui
Zhang Beibei
Jiang Gangwu
Ma Shunshun
Wei Xiangpo
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information Engineering University of PLA Strategic Support Force filed Critical Information Engineering University of PLA Strategic Support Force
Priority to CN202310177081.1A priority Critical patent/CN116343027A/en
Publication of CN116343027A publication Critical patent/CN116343027A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a YOLOv5 remote sensing image target detection method using attention mechanism fusion, which comprises the following steps: step 1: constructing a remote sensing image target detection network, wherein an attention mechanism is fused at any one of the backbone layer, the neck layer and the output end of YOLOv5; fusing the attention mechanism in the backbone layer of YOLOv5 specifically comprises: adding an attention module after each CSP structure in the backbone layer; step 2: designing a loss function, and training the remote sensing image target detection network based on the loss function to obtain a remote sensing image target detection network model; step 3: inputting the remote sensing image to be detected into the trained target detection network model to obtain a detection result.

Description

YOLOv5 remote sensing image target detection method utilizing attention mechanism fusion
Technical Field
The invention relates to the technical field of image processing, in particular to a YOLOv5 remote sensing image target detection method using attention mechanism fusion.
Background
The detection and identification of targets in remote sensing images using target detection technology has become a current research hotspot. Under the challenge of massive data, it is increasingly difficult to satisfy both the accuracy and the timeliness of target detection. Traditional remote sensing image target detection algorithms depend on manual design, suffer from poor real-time performance and low detection precision, and struggle to meet practical application demands. Therefore, as research into deep learning has deepened, target detection technology has gradually shifted from conventional labor-intensive techniques to deep learning. In terms of processing flow, deep-learning-based target detection algorithms fall into two classes: two-stage and single-stage detection algorithms. A two-stage algorithm first generates regions to be detected and then performs detection and judgment on the targets within them; its detection precision is therefore high and it suits high-precision detection scenes, but excessive model parameters and a complex pipeline make its timeliness poor. Typical two-stage algorithms include R-CNN, Fast R-CNN and Faster R-CNN. A single-stage target detection algorithm completes the generation, classification and regression of the regions to be detected in one step, so it offers high real-time performance and suits scenes such as real-time target detection; representative algorithms include SSD and the YOLO series. YOLOv5 follows the design of YOLOv4 and is optimized with a lighter network design, an adaptive anchoring method and a GIoU loss function; it is currently a relatively mature single-stage detection algorithm that balances detection efficiency and accuracy. However, in existing detection tasks the algorithm still faces many problems. For example, the complex backgrounds, varied scales and mutual occlusion of remote sensing image targets greatly increase the difficulty of the detection task and limit the detection precision of the algorithm, which has led many researchers to improve the YOLOv5 algorithm.
For example, Document 1 (Tian Heng, Wang Ling, Wang Peng, et al. Research on target detection algorithm based on improved YOLOv5 [J]. Computer Engineering and Applications, 2022, 58(13): 63-73) proposes a lightweight improved model YOLOv-G, which improves the feature pyramid structure of YOLOv5 and integrates a parallel-mode attention mechanism into the backbone network, thereby improving detection performance. Document 2 (Zhao Rui, Liu Hui, Liu Peilin, et al. Safety helmet detection algorithm based on improved YOLOv5s [J/OL]. Journal of Beijing University of Aeronautics and Astronautics: 1-16 [2022-10-02]. DOI: 10.13700/j.bh.1001-5965.2021.0595) uses a DenseBlock module to replace the slice structure in the YOLOv5 backbone network and adds an SE-Net channel attention module at the neck, improving the algorithm's detection capability in densely distributed target scenes. Both works improve the YOLOv5 algorithm by adding attention mechanisms and effectively raise detection precision in some scenes, but they still struggle to meet the target detection field's demands for speed and accuracy; the inventor believes that the core problem of these documents is that they neglect the influence of the position in the network structure at which the attention mechanism is fused.
Disclosure of Invention
In order to further meet the high requirements of the target detection field for detection efficiency and detection accuracy, the invention provides a YOLOv5 remote sensing image target detection method using attention mechanism fusion.
The YOLOv5 remote sensing image target detection method using attention mechanism fusion provided by the invention comprises the following steps:
step 1: constructing a remote sensing image target detection network, wherein an attention mechanism is fused at any one of the backbone layer, the neck layer and the output end of YOLOv5; fusing the attention mechanism in the backbone layer of YOLOv5 specifically comprises: adding an attention module after each CSP structure in the backbone layer;
step 2: designing a loss function, and training the remote sensing image target detection network based on the loss function to obtain a remote sensing image target detection network model;
step 3: and inputting the remote sensing image to be detected into the trained target detection network model to obtain a detection result.
Further, in step 1, fusing the attention mechanism in the neck layer of YOLOv5 specifically comprises: selecting three Concat structures in the neck layer and adding one attention module before or after each selected Concat structure.
Further, in step 1, fusing the attention mechanism at the output end of YOLOv5 specifically comprises: adding one attention module after each Conv layer at the output end.
Further, the attention mechanism or module employs any one of CA, SE, ECA and CBAM attention.
Further, in step 2, the CIoU_Loss function is adopted as the loss function.
The invention has the beneficial effects that:
fusing an attention mechanism into YOLOv5, especially fusing the CA attention mechanism into the backbone layer, can effectively improve the detection precision of the whole method;
on the basis of the fused attention mechanism, the CIoU_Loss function is adopted; combining the two improvements can further effectively reduce the number of false and missed detections in the target detection task, improve the positioning accuracy of the target bounding box, and increase the detection speed.
Drawings
Fig. 1 is a schematic flow chart of a method for detecting a target of a YOLOv5 remote sensing image by using attention mechanism fusion according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of fusing the CA attention mechanism at different positions of YOLOv5s according to an embodiment of the present invention: (a) at the backbone layer of YOLOv5s; (b) at the neck layer of YOLOv5s; (c) at the output end of YOLOv5s;
FIG. 3 is a diagram showing the positional relationship between a predicted frame and a real frame according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a real frame including a prediction frame according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a CIoU loss function according to an embodiment of the present invention;
fig. 6 is a schematic diagram of examples of different targets in an RSOD dataset according to an embodiment of the present invention: (a) an aircraft image; (b) oil drum images; (c) an overpass image; (d) a playground image;
FIG. 7 is a visual comparison of the detection results of the method of the present invention and existing methods according to an embodiment of the present invention: (a) SSD; (b) YOLOv3; (c) YOLOv5s; (d) the YOLOv5s_CA_CIoU of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions in the embodiments of the present invention will be clearly described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
As shown in fig. 1, an embodiment of the present invention provides a method for detecting a target of YOLOv5 remote sensing image by using attention mechanism fusion, which includes the following steps:
s101: constructing a remote sensing image target detection network, wherein the remote sensing image target detection network comprises a fusion attention mechanism at any one position of a backbone layer, a neck layer and an output end of the YOLOv 5;
specifically, due to the complexity and diversity of the remote sensing image, the YOLOv5 algorithm is directly applied to the target detection task, so that the conditions that dense targets are difficult to detect, the positioning accuracy of multi-scale targets is low, small targets are easy to miss detection and error detection and the like can occur, the effectiveness of the target detection model is greatly reduced, and therefore, further structural optimization and adjustment of the YOLOv5 network are required. The YOLOv5 includes 4 versions in total, and in this embodiment, a basic YOLOv5s version is adopted to construct a remote sensing image target detection network.
The attention mechanism mainly acts on the feature map, so fusing an attention mechanism at a suitable position of the network can effectively improve the network's feature extraction capability. However, since the backbone layer, the neck layer and the output end of YOLOv5s each process features differently, the improvement brought by fusing attention at different positions of the YOLOv5s network also differs. Meanwhile, since the input end of YOLOv5s performs operations such as data preprocessing, which are unrelated to the extraction or fusion of target features, this embodiment does not consider fusing an attention mechanism at the input end.
Further, there are various types of attention mechanisms; any of CA, SE, ECA and CBAM attention can be employed. To maximize the performance of the remote sensing image target detection network, this embodiment preferably fuses CA attention into the YOLOv5s network. For convenience of description, the network with the CA attention mechanism fused into the backbone layer of YOLOv5s is denoted the YOLOv5s_BackBone_CA model; the network with the CA attention mechanism fused into the neck layer is denoted the YOLOv5s_Neck_CA model; and the network with the CA attention mechanism fused into the output end is denoted the YOLOv5s_Prediction_CA model.
Fusing the CA attention mechanism into the backbone layer of YOLOv5s specifically comprises: adding one CA attention module after each CSP structure in the backbone layer, as shown in FIG. 2(a).
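As an illustration of this insertion pattern, the following is a minimal PyTorch sketch of a coordinate attention (CA) block, following the published coordinate attention design; the reduction ratio, the minimum hidden width and the Hardswish activation are assumptions of this sketch, not values fixed by the patent:

```python
import torch
import torch.nn as nn

class CoordAtt(nn.Module):
    # Coordinate attention: factorizes attention into 1-D pooling along H and W,
    # so positional information along both axes is preserved.
    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # (N, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # (N, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):
        n, c, h, w = x.shape
        x_h = self.pool_h(x)                      # (N, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)  # (N, C, W, 1)
        y = torch.cat([x_h, x_w], dim=2)          # (N, C, H+W, 1)
        y = self.act(self.bn1(self.conv1(y)))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        y_w = y_w.permute(0, 1, 3, 2)             # back to (N, mid, 1, W)
        a_h = torch.sigmoid(self.conv_h(y_h))     # height attention, (N, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w))     # width attention, (N, C, 1, W)
        return x * a_h * a_w
```

The block preserves the input shape (N, C, H, W), so it can be appended directly after a CSP structure without changing the rest of the network.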
Fusing the CA attention mechanism into the neck layer of YOLOv5s specifically comprises: selecting three Concat structures in the neck layer and adding one CA attention module before or after each selected Concat structure, as shown in FIG. 2(b).
Fusing the CA attention mechanism at the output end of YOLOv5s specifically comprises: adding one CA attention module after each Conv layer at the output end, as shown in FIG. 2(c).
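In all three variants the wiring is the same: the output of an existing block (CSP, Concat or Conv) is passed through an attention module before flowing onward. A hypothetical sketch of that wiring (the real YOLOv5s model is assembled from a YAML configuration, so the stand-in modules here are illustrative only; CoordAtt is the sketch given above):

```python
import torch
import torch.nn as nn

class BlockWithAttention(nn.Module):
    """Run an existing block, then an attention module on its output."""
    def __init__(self, block: nn.Module, attn: nn.Module):
        super().__init__()
        self.block = block
        self.attn = attn

    def forward(self, x):
        return self.attn(self.block(x))

# Stand-in for a CSP/C3 block with 64 output channels.
csp = nn.Conv2d(64, 64, kernel_size=3, padding=1)
fused = BlockWithAttention(csp, CoordAtt(64))
out = fused(torch.randn(1, 64, 32, 32))  # shape unchanged: (1, 64, 32, 32)
```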
S102: designing a loss function, and training the remote sensing image target detection network based on the loss function to obtain a remote sensing image target detection network model;
in this embodiment, the original LOSS function giou_loss of YOLOv5s is used as the LOSS function. The formula is shown as formula (1).
$$\mathrm{GIoU}=\mathrm{IoU}-\frac{\left|C\setminus(A\cup B)\right|}{\left|C\right|},\qquad \mathrm{Loss}_{\mathrm{GIoU}}=1-\mathrm{GIoU}\tag{1}$$
In the formula, A and B represent the prediction frame and the real frame, and C represents the minimum bounding box of A and B; the specific positional relationship is shown in FIG. 3.
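As a concrete reading of formula (1), the following is a sketch of the GIoU loss for axis-aligned boxes given as (x1, y1, x2, y2); the eps guard against division by zero is an implementation assumption:

```python
import torch

def giou_loss(pred, target, eps=1e-7):
    # pred, target: (N, 4) boxes as (x1, y1, x2, y2); A = pred, B = target.
    ix1, iy1 = torch.max(pred[:, 0], target[:, 0]), torch.max(pred[:, 1], target[:, 1])
    ix2, iy2 = torch.min(pred[:, 2], target[:, 2]), torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)
    area_a = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_b = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_a + area_b - inter
    iou = inter / (union + eps)
    # C: minimum bounding box enclosing both A and B
    cx1, cy1 = torch.min(pred[:, 0], target[:, 0]), torch.min(pred[:, 1], target[:, 1])
    cx2, cy2 = torch.max(pred[:, 2], target[:, 2]), torch.max(pred[:, 3], target[:, 3])
    area_c = (cx2 - cx1) * (cy2 - cy1)
    giou = iou - (area_c - union) / (area_c + eps)  # |C \ (A ∪ B)| / |C|
    return 1.0 - giou  # Loss_GIoU = 1 - GIoU
```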
S103: and inputting the remote sensing image to be detected into the trained target detection network model to obtain a detection result.
In order to verify the detection performance of the remote sensing image target detection network provided by the embodiment of the invention, the following experimental data are also provided.
To explore the difference in improvement effect among the three fusion positions, comparison experiments were performed on the obtained attention fusion models and the original YOLOv5s model using the RSOD dataset; the experimental results are shown in Table 1. The definitions of the evaluation indexes in Table 1 are given in Example 3 and are not repeated here.
TABLE 1 results of CA attention module fusion experiments
[Table 1 is reproduced in the original as an image: P, R and mAP50 of YOLOv5s, YOLOv5s_BackBone_CA, YOLOv5s_Neck_CA and YOLOv5s_Prediction_CA; the relative changes are discussed below.]
As shown in Table 1, because fusing CA attention in the backbone layer makes full use of the contour and positioning information of the target and thus more effectively suppresses complex background information in the feature map, the mAP50 of the YOLOv5s_BackBone_CA model is far higher than that of the other two fusion models and is 2.5% higher than that of the original YOLOv5s model, indicating that fusing CA attention in the backbone layer brings the best improvement and greatly raises detection precision. Compared with the original YOLOv5s model, fusing CA attention at the neck layer improves mAP50 by 1.1%, which shows that CA attention at the neck layer can effectively enhance the feature extraction capability of the network; however, because features are transmitted and fused in the neck layer, part of the information is lost, so the improvement is smaller than at the backbone layer. Since feature extraction and fusion are already complete when features reach the output end, the receptive field shrinks and semantic information is lost; the YOLOv5s_Prediction_CA model obtained by fusing CA attention at the output end therefore drops 2.1% in mAP50 compared with the original YOLOv5s model, although its precision improves. From this it can be seen that in this embodiment the backbone layer of YOLOv5s is the best position for fusing CA attention.
Building on fusing the attention module at the YOLOv5s backbone layer position, and to further verify the effectiveness of CA attention fusion there, the CA attention in the model was replaced with SE, ECA and CBAM attention respectively, yielding three new models: YOLOv5s_BackBone_SE, YOLOv5s_BackBone_ECA and YOLOv5s_BackBone_CBAM. Comparative experiments were performed on the RSOD dataset; the experimental results are shown in Table 2.
Table 2 results of comparative experiments fusing different attention modules
[Table 2 is reproduced in the original as an image: P, R and mAP50 of the backbone-layer fusion models with SE, ECA, CBAM and CA attention.]
As shown in Table 2, the improved models obtained by fusing SE, ECA and CBAM attention respectively at the backbone layer of YOLOv5s all improve mAP50, while the model fusing CA attention achieves the highest P, R and mAP50, which proves the effectiveness of fusing CA attention at the backbone layer of the network.
Example 2
The GIoU_Loss introduces a minimum circumscribed rectangle on the basis of IoU_Loss, but since GIoU_Loss only considers the degree of coincidence between the real frame and the prediction frame, it cannot describe the regression relation of the target frame well. Moreover, when the target prediction frame lies entirely within the real frame, i.e. B ∩ A = A, GIoU_Loss cannot distinguish the positions of different prediction frames, as shown in FIG. 4.
Therefore, on the basis of Embodiment 1 above, this embodiment differs in that it modifies the loss function: the more complete CIoU_Loss is selected as the loss function of YOLOv5s.
CIoU_Loss solves the above problems of GIoU_Loss by taking the scale information of the bounding box into account and adding losses for the scale and aspect ratio of the detection frame, which makes the prediction frame conform better to the real frame and achieves an effective fit between the prediction frame and the real frame. CIoU_Loss is illustrated in FIG. 5.
The CIoU_Loss is calculated as follows.
$$\mathrm{Loss}_{\mathrm{CIoU}}=1-\mathrm{IoU}+\frac{\rho^{2}\left(b,b^{gt}\right)}{c^{2}}+\alpha v$$

$$v=\frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}}-\arctan\frac{w}{h}\right)^{2}$$

$$\alpha=\frac{v}{(1-\mathrm{IoU})+v}$$

where $b$ and $b^{gt}$ denote the center points of the prediction frame and the real frame, $\rho(\cdot)$ is the Euclidean distance between them, $c$ is the diagonal length of the minimum bounding box covering both frames, and $w^{gt}/h^{gt}$ and $w/h$ represent the aspect ratios of the target frame and the prediction frame, respectively.
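A sketch of this standard CIoU formulation in PyTorch (the eps guards and the no-grad treatment of α are implementation assumptions); the example at the end reproduces the FIG. 4 situation, where two same-size prediction frames inside the real frame receive identical GIoU_Loss but different CIoU_Loss:

```python
import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    # pred, target: (N, 4) boxes given as (x1, y1, x2, y2).
    pw, ph = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    tw, th = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    # IoU of prediction and real frames
    ix1, iy1 = torch.max(pred[:, 0], target[:, 0]), torch.max(pred[:, 1], target[:, 1])
    ix2, iy2 = torch.min(pred[:, 2], target[:, 2]), torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)
    # rho^2: squared distance between the two box centers
    rho2 = ((pred[:, 0] + pred[:, 2]) - (target[:, 0] + target[:, 2])) ** 2 / 4 \
         + ((pred[:, 1] + pred[:, 3]) - (target[:, 1] + target[:, 3])) ** 2 / 4
    # c^2: squared diagonal of the minimum bounding box of both frames
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps
    # v: aspect-ratio consistency term; alpha: its trade-off weight
    v = (4 / math.pi ** 2) * (torch.atan(tw / (th + eps)) - torch.atan(pw / (ph + eps))) ** 2
    with torch.no_grad():
        alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v

# FIG. 4 situation: both predictions lie inside the real frame and have equal
# area, so GIoU_Loss cannot separate them, while CIoU_Loss penalizes the
# off-center one through the rho^2 / c^2 term:
gt = torch.tensor([[0.0, 0.0, 10.0, 10.0]])
centered = torch.tensor([[4.0, 4.0, 6.0, 6.0]])
cornered = torch.tensor([[0.0, 0.0, 2.0, 2.0]])
print(ciou_loss(centered, gt), ciou_loss(cornered, gt))  # ~0.96 vs ~1.12
```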
Through these multidimensional considerations, CIoU_Loss improves the bounding-box regression performance and the positioning precision of the model, so the regression effect of the prediction frame is better, convergence is faster, and robustness for multi-scale target detection is enhanced.
Example 3
This embodiment uses the RSOD dataset to train and test the method of the invention. First, the effectiveness of the two improvement points (the fused attention mechanism and the modified loss function) was evaluated by an ablation experiment. The method of the invention was then compared with the SSD, YOLOv3 and original YOLOv5s algorithms, and part of the test results were selected for visualization to verify the effectiveness of the method.
(I) Experimental data and environment
The experiment uses the RSOD dataset, which contains images with features at different scales, 2326 images in total covering four classes of targets. FIG. 6(a) shows an aircraft target; FIG. 6(b) shows oil drum targets, which are closely arranged and of different sizes; FIG. 6(c) shows an overpass target, whose background information is more complex; FIG. 6(d) shows a playground target image.
The experiment uses a Windows 10 64-bit operating system; the GPU is a GeForce RTX 3080 Ti; Python 3.8 is selected; the programming platform is PyCharm; and the deep learning framework is PyTorch 1.8.0 with CUDA 11.1. The number of iterations (Epoch) is set to 150 and the Batch Size to 16. The specific experimental environment configuration is shown in Table 3.
TABLE 3 Experimental Environment
Parameter                 Configuration
Operating system          Windows 10 (64-bit)
GPU                       GeForce RTX 3080 Ti
Language                  Python 3.8
Programming platform      PyCharm
Deep learning framework   PyTorch 1.8.0 / CUDA 11.1
Epoch                     150
Batch Size                16
(II) evaluation index
The improved algorithm is evaluated from the two angles of detection accuracy and timeliness. The accuracy indexes are the mean average precision (mAP) and the average precision (AP); the timeliness index is the number of image frames processed per second (FPS). The calculation formulas of the respective indexes are shown below.
$$P=\frac{TP}{TP+FP}$$

$$R=\frac{TP}{TP+FN}$$

$$AP=\int_{0}^{1}P(R)\,dR$$

$$mAP=\frac{1}{N}\sum_{i=1}^{N}AP_{i}$$

$$FPS=\frac{\mathrm{figureNumber}}{\mathrm{totalTime}}$$
Wherein P represents precision and R represents recall; AP is the area enclosed by the P-R curve; the mAP value is the average of the APs of all classes; TP represents the number of correctly detected frames; FP represents the number of falsely detected frames; FN represents the number of undetected ground-truth (GT) boxes; figureNumber represents the total number of detected pictures; and totalTime represents the total detection time.
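A sketch of how AP (and hence mAP50) can be computed from ranked detections under these definitions; the all-point interpolation of the P-R curve is an assumption of this sketch, since the text does not fix an integration scheme:

```python
import numpy as np

def average_precision(scores, is_tp, num_gt):
    # scores: detection confidences; is_tp: 1 if a detection matches a GT box
    # at IoU >= 0.5 (the mAP50 setting), else 0; num_gt: number of GT boxes.
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_tp, dtype=float)[order]
    tp_cum = np.cumsum(tp)
    fp_cum = np.cumsum(1.0 - tp)
    precision = tp_cum / (tp_cum + fp_cum)          # P = TP / (TP + FP)
    recall = tp_cum / max(num_gt, 1)                # R = TP / (TP + FN)
    # Area under the P-R curve (all-point interpolation)
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]  # make precision non-increasing
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

# mAP is the mean of the per-class APs, e.g. for the four RSOD classes:
# mAP50 = np.mean([ap_aircraft, ap_oiltank, ap_overpass, ap_playground])
```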
IoU = 0.5 is a common standard for testing algorithm performance and reflects the comprehensive classification capability of an algorithm over various targets, so mAP50 is used as the mAP evaluation index.
(III) ablation experiments
To verify the effectiveness of the two improvement modules, CA attention and CIoU_Loss, in the model, an ablation experiment was performed on the improved algorithm based on YOLOv5s; the experimental results are shown in Table 4.
Table 4 ablation experimental results
[Table 4 is reproduced in the original as an image: P, R and mAP50 of YOLOv5s, YOLOv5s_CA, YOLOv5s_CIoU and YOLOv5s_CA_CIoU; the key values are quoted below.]
As can be seen from the experimental results in Table 4, the YOLOv5s_CA model obtained by fusing the CA attention module alone and the YOLOv5s_CIoU model obtained by introducing the CIoU_Loss function alone both improve the mAP50 of the original YOLOv5s algorithm, demonstrating the effectiveness of each improvement module. Taken together, even though the precision of the improved algorithm YOLOv5s_CA_CIoU is slightly lower than that of the YOLOv5s_CIoU model obtained by introducing the CIoU_Loss function alone, the recall and mAP50 of YOLOv5s_CA_CIoU reach 87.4% and 91.1% respectively, improvements of 1.8% and 2.9% over the original YOLOv5s algorithm. This indicates that the combination of fusing CA attention and replacing the loss function with CIoU_Loss yields the best improvement, further verifying the effectiveness of the method.
(IV) comparative experiments
In order to more comprehensively verify the effectiveness of the method and further evaluate its improvements in detection precision, speed and other respects, the method is compared with the YOLOv3, SSD and YOLOv5s algorithms. Experiments were performed under the same training conditions using the same dataset; the results are shown in Table 5.
Table 5 comparison of results of mainstream algorithm detection
[Table 5 is reproduced in the original as an image: mAP50, parameter quantity and FPS of SSD, YOLOv3, YOLOv5s and YOLOv5s_CA_CIoU; the key values are quoted below.]
As can be seen from Table 5, the detection accuracy of the YOLOv5s_CA_CIoU of the invention is the highest: its mAP50 reaches 91.1%, improvements of 8.9%, 5.3% and 2.9% over the SSD, YOLOv3 and YOLOv5s algorithms respectively. Even though the parameter quantity of the method is slightly larger than that of the YOLOv5s algorithm, its detection speed (FPS) is improved, showing that the method obtains higher detection precision at the cost of a small increase in parameters. The SSD algorithm has relatively weak feature extraction capability due to its relatively simple network structure, so its detection accuracy is limited when facing detection tasks with complex target backgrounds. The YOLOv3 algorithm enhances feature fusion, so its detection capability improves over the SSD model, but its detection precision is still far lower than that of the method of the invention.
To further evaluate the detection effect of the method, the detection results of some remote sensing images in the RSOD dataset are selected for visual comparison, and the detection effects on the four targets (aircraft, oil drum, overpass and playground) are evaluated in terms of missed detections, false detections, the positioning accuracy of the target bounding box, and the like. As shown in FIG. 7, the visual detection results of the four algorithms SSD, YOLOv3, YOLOv5s and YOLOv5s_CA_CIoU are arranged from left to right, where a box with the largest gray value corresponds to a correct detection, a box with the middle gray value to a false detection, and a box with the smallest gray value to a missed detection.
As can be seen from the visualization results, for aircraft detection the aircraft targets in the image are compactly arranged and of different sizes: the SSD algorithm in FIG. 7(a) exhibits missed detections; the YOLOv3 algorithm in FIG. 7(b) exhibits both missed and false detections, misjudging the blank area between two aircraft as an aircraft target; the YOLOv5s algorithm in FIG. 7(c) also exhibits false and missed detections and fails to detect the small aircraft target on the right side of the image; while the improved algorithm YOLOv5s_CA_CIoU in FIG. 7(d) has neither false nor missed detections, proving that its detection accuracy in small-target scenes is greatly improved over the original YOLOv5s algorithm. For oil drum detection, the SSD algorithm in FIG. 7(a) and the YOLOv3 algorithm in FIG. 7(b) exhibit missed detections, failing to detect some oil drum targets; the YOLOv5s algorithm in FIG. 7(c) exhibits false detections, misjudging blank areas between objects as oil drum targets; while the improved algorithm YOLOv5s_CA_CIoU in FIG. 7(d) accurately detects all targets, proving that the method effectively improves the target detection capability of the original YOLOv5s algorithm in dense scenes. For overpass and playground detection, compared with the other three algorithms, the improved algorithm YOLOv5s_CA_CIoU has no missed or false detections and positions the target bounding boxes more accurately.
In conclusion, compared with the SSD, YOLOv3 and YOLOv5s algorithms, the YOLOv5s_CA_CIoU of the method of the invention achieves higher detection precision and more accurate bounding-box positioning of detected targets.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (5)

1. A YOLOv5 remote sensing image target detection method using attention mechanism fusion, characterized by comprising the following steps:
step 1: constructing a remote sensing image target detection network, wherein an attention mechanism is fused at any one of the backbone layer, the neck layer and the output end of YOLOv5; fusing the attention mechanism in the backbone layer of YOLOv5 specifically comprises: adding an attention module after each CSP structure in the backbone layer;
step 2: designing a loss function, and training the remote sensing image target detection network based on the loss function to obtain a remote sensing image target detection network model;
step 3: and inputting the remote sensing image to be detected into the trained target detection network model to obtain a detection result.
2. The YOLOv5 remote sensing image target detection method using attention mechanism fusion according to claim 1, wherein in step 1, fusing the attention mechanism in the neck layer of YOLOv5 specifically comprises: selecting three Concat structures in the neck layer and adding one attention module before or after each selected Concat structure.
3. The YOLOv5 remote sensing image target detection method using attention mechanism fusion according to claim 1, wherein in step 1, fusing the attention mechanism at the output end of YOLOv5 specifically comprises: adding one attention module after each Conv layer at the output end.
4. The YOLOv5 remote sensing image target detection method using attention mechanism fusion according to any one of claims 1 to 3, wherein the attention mechanism or attention module employs any one of CA, SE, ECA and CBAM attention.
5. The YOLOv5 remote sensing image target detection method using attention mechanism fusion according to claim 1, wherein in step 2, the CIoU_Loss function is adopted as the loss function.
CN202310177081.1A 2023-02-28 2023-02-28 YOLOv5 remote sensing image target detection method utilizing attention mechanism fusion Pending CN116343027A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310177081.1A CN116343027A (en) 2023-02-28 2023-02-28 YOLOv5 remote sensing image target detection method utilizing attention mechanism fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310177081.1A CN116343027A (en) 2023-02-28 2023-02-28 YOLOv5 remote sensing image target detection method utilizing attention mechanism fusion

Publications (1)

Publication Number Publication Date
CN116343027A (en) 2023-06-27

Family

ID=86876727

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310177081.1A Pending CN116343027A (en) 2023-02-28 2023-02-28 YOLOv5 remote sensing image target detection method utilizing attention mechanism fusion

Country Status (1)

Country Link
CN (1) CN116343027A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116645502A (en) * 2023-07-27 2023-08-25 云南大学 Power transmission line image detection method and device and electronic equipment
CN116645502B (en) * 2023-07-27 2023-10-13 云南大学 Power transmission line image detection method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination