CN115272242B - YOLOv5-based optical remote sensing image target detection method

YOLOv5-based optical remote sensing image target detection method

Info

Publication number
CN115272242B
Authority
CN
China
Prior art keywords
remote sensing
optical remote
image
features
detected
Prior art date
Legal status
Active
Application number
CN202210909740.1A
Other languages
Chinese (zh)
Other versions
CN115272242A (en)
Inventor
侯彪
李智德
汤奇
任仲乐
任博
杨晨
焦李成
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date
Filing date
Publication date
Application filed by Xidian University
Priority to CN202210909740.1A
Publication of CN115272242A
Application granted
Publication of CN115272242B
Legal status: Active


Classifications

    • G06T 7/0002: Image analysis; inspection of images, e.g. flaw detection
    • G06N 3/084: Neural network learning methods; backpropagation, e.g. using gradient descent
    • G06V 10/764: Image or video recognition using pattern recognition or machine learning; classification, e.g. of video objects
    • G06V 10/806: Fusion of extracted features at the sensor, preprocessing, feature-extraction or classification level
    • G06V 10/82: Image or video recognition using neural networks
    • G06T 2207/10032: Image acquisition modality; satellite or aerial image; remote sensing
    • G06V 2201/07: Target detection

Abstract

The invention relates to a YOLOv5-based optical remote sensing image target detection method, characterized by comprising the following steps. Step 1: acquiring an optical remote sensing image to be detected, the image containing a target to be detected. Step 2: cutting the optical remote sensing image to be detected into a plurality of optical remote sensing sub-images to be detected. Step 3: inputting each optical remote sensing sub-image to be detected into a pre-trained YOLOv5 target detection model to obtain a corresponding sub-image detection result, the detection result comprising a target detection frame and a classification-intersection ratio. Step 4: merging the sub-image detection results to obtain the detection result of the optical remote sensing image to be detected. The YOLOv5 target detection model comprises a cascaded backbone network, neck network and detection head, wherein the neck network is a CSP-BiFPN network. The YOLOv5-based optical remote sensing image target detection method has higher detection precision and a stronger ability to distinguish targets of different scales.

Description

YOLOv5-based optical remote sensing image target detection method
Technical Field
The invention belongs to the technical field of aircraft detection in optical remote sensing images, and particularly relates to a YOLOv5-based optical remote sensing image target detection method.
Background
Conventional target detection methods are typically designed around manually extracted features, are tailored to a particular scene, and require extensive parameter optimization, so they generalize poorly. Such methods are not suitable for optical remote sensing images, whose scenes are increasingly complex.
With the rapid development of deep learning, features extracted by convolutional neural networks generalize far better than features extracted by traditional manual methods. Current target detection models generally comprise two parts. The first part consists of the networks used for feature extraction, namely a backbone network and a neck network: the backbone network is usually pre-trained on a large-scale image dataset to obtain better generalization and a stronger feature extraction capability, while the neck network fuses feature layers of different downsampling magnifications so that targets of different sizes can be identified and located. The second part is the detection head, which performs classification and coordinate regression on the extracted features.
Target detection models may be further divided into single-stage and two-stage models according to whether candidate regions are pre-extracted. The two-stage models, represented by Faster R-CNN, improve precision but cannot match the inference speed of single-stage models, and the anchor boxes of their RPN networks must be set manually, which is a further drawback. The most representative single-stage models are YOLO, SSD and RetinaNet. Single-stage models are simpler than two-stage models, performing classification and regression directly on the extracted feature maps; they therefore offer relatively fast inference, at the cost of somewhat lower precision than two-stage models.
The neck network adopted by most current target detection models, including the YOLOv5 model, combines a feature pyramid network FPN with a path aggregation network PAN. Such neck networks simply add together input features of different downsampling magnifications without considering how much each input contributes to the final fused features, so the benefit of feature fusion is not fully realized and the model's ability to distinguish targets of different scales is reduced. In addition, the classification branch and the regression branch in the detection head of current target detection models are usually independent, with no direct connection between them; as a result, a prediction may have a high classification score but a badly deviated detection frame, or an accurate detection frame but a low classification score, lowering model precision.
Disclosure of Invention
In order to solve the above problems in the prior art, the invention provides a YOLOv5-based optical remote sensing image target detection method. The technical problems to be solved by the invention are realized by the following technical scheme:
The invention provides a YOLOv5-based optical remote sensing image target detection method, which comprises the following steps:
step 1: acquiring an optical remote sensing image to be detected, wherein the optical remote sensing image to be detected contains a target to be detected;
step 2: cutting the optical remote sensing image to be detected into a plurality of optical remote sensing sub-images to be detected;
step 3: inputting the optical remote sensing sub-image to be detected into a pre-trained YOLOv5 target detection model to obtain a corresponding sub-image detection result, wherein the detection result comprises a target detection frame and a classification-intersection ratio;
step 4: combining the sub-image detection results to obtain a detection result of the optical remote sensing image to be detected;
the YOLOv5 target detection model comprises a backbone network, a neck network and a detection head which are cascaded, wherein the neck network is a CSP-BiFPN network.
In one embodiment of the present invention, the backbone network is configured to perform feature extraction on the input optical remote sensing sub-image to be detected, so as to obtain a high-level semantic feature, a middle-level semantic feature and a low-level semantic feature corresponding to the optical remote sensing sub-image to be detected;
the neck network is used for fusing the high-level semantic features, the middle-level semantic features and the low-level semantic features to obtain high-level semantic fusion features, middle-level semantic fusion features and low-level semantic fusion features;
the detection head is used for determining and outputting a detection result of the optical remote sensing sub-image to be detected according to the high-level semantic fusion feature, the middle-level semantic fusion feature and the low-level semantic fusion feature.
In one embodiment of the present invention, the CSP-BiFPN network comprises a plurality of CSP-BiFPN sub-networks connected in series, each CSP-BiFPN sub-network comprising a high-level CSP2-n unit, a middle-level first CSP2-n unit, a middle-level second CSP2-n unit and a low-level CSP2-n unit, wherein,
the low-level CSP2-n unit performs feature fusion on the low-level semantic features and the intermediate fusion features subjected to the up-sampling operation to obtain the low-level semantic fusion features;
the middle-layer first CSP2-n unit performs feature fusion on the high-layer semantic features subjected to the up-sampling operation and the middle-layer semantic features to obtain middle fusion features;
the middle-layer second CSP2-n unit performs feature fusion on the middle-layer semantic features, the middle fusion features and the lower-layer semantic fusion features subjected to downsampling operation to obtain the middle-layer semantic fusion features;
and the high-level CSP2-n unit performs feature fusion on the high-level semantic features and the middle-level semantic fusion features subjected to the downsampling operation to obtain the high-level semantic fusion features.
In one embodiment of the present invention, the detection head includes a regression branch that outputs the target detection frame of the optical remote sensing sub-image to be detected and a classification branch that outputs the classification-intersection ratio of the optical remote sensing sub-image to be detected.
In one embodiment of the present invention, the YOLOv5 target detection model is obtained by training on a plurality of training image samples and the label corresponding to each training image sample, wherein the labels comprise target coordinate labels and target classification labels.
In one embodiment of the invention, the target classification label is a continuous number between 0 and 1.
In one embodiment of the present invention, the classification loss function of the YOLOv5 object detection model is:
Loss = -|y - σ|^β ((1 - y)·log(1 - σ) + y·log σ);
wherein y represents the classification-intersection ratio, σ represents the output of the classification branch of the detection head, and β represents a modulating factor;
the calculation formula of the classification-intersection ratio of the training image sample is as follows: y=a×l;
wherein a represents the intersection-over-union IoU between the predicted coordinates of the training image sample and the corresponding target coordinate label, the predicted coordinates being obtained by decoding the output of the regression branch of the detection head, and l represents the target classification label of the training image sample.
In one embodiment of the present invention, the step 4 includes:
step 4.1: filtering and de-duplicating the sub-image detection results;
step 4.2: and merging the sub-image detection results after the filtering and de-duplication treatment to obtain the detection result of the optical remote sensing image to be detected.
Compared with the prior art, the invention has the beneficial effects that:
1. In the YOLOv5-based optical remote sensing image target detection method, target detection is completed using a trained YOLOv5 target detection model whose neck network is a CSP-BiFPN network, combining the bidirectional feature pyramid network BiFPN with cross-stage-partial CSP convolution. The BiFPN introduces learnable parameters to automatically learn the importance of features at different downsampling magnifications, makes full use of the information among features of different downsampling magnifications, and improves the distinguishability of the features at each level, so targets of different sizes in the optical remote sensing image can be better distinguished. The CSP convolution adopts a bottleneck connection: the input features are split along the channel dimension into two parts, features are extracted from each part by different convolution operations, the two parts are concatenated along the channel dimension, and a final convolution layer extracts features once more. This greatly strengthens the feature extraction capability, generates more discriminative semantic fusion features, and provides strong support for the model to further distinguish targets of different scales.
2. According to the method for detecting the optical remote sensing image target based on the YOLOv5, the detection head classification branch of the YOLOv5 target detection model is converted from direct prediction type information into prediction classification-cross ratio, the relevance of classification branch output and regression branch output is further enhanced, and the new loss function guide model is used for combining the classification information and coordinate regression information, so that the detection precision of the model is improved.
The foregoing is merely an overview of the technical solution of the present invention. In order that the technical means of the invention may be understood more clearly and implemented in accordance with the content of the specification, preferred embodiments of the invention are described in detail below with reference to the accompanying drawings.
Drawings
FIG. 1 is a schematic diagram of an optical remote sensing image target detection method based on YOLOv5 according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for detecting an optical remote sensing image target based on YOLOv5 according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a CSP-BiFPN subnetwork according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a CSP2-n unit according to an embodiment of the present invention.
Detailed Description
In order to further explain the technical means and effects adopted by the invention to achieve the preset aim, the following describes in detail an optical remote sensing image target detection method based on YOLOv5 according to the invention with reference to the attached drawings and the detailed description.
The foregoing and other features, aspects, and advantages of the present invention will become more apparent from the following detailed description of the preferred embodiments when taken in conjunction with the accompanying drawings. The technical means and effects adopted by the present invention to achieve the intended purpose can be more deeply and specifically understood through the description of the specific embodiments, however, the attached drawings are provided for reference and description only, and are not intended to limit the technical scheme of the present invention.
Example 1
Referring to FIG. 1 and FIG. 2: FIG. 1 is a schematic diagram of a YOLOv5-based optical remote sensing image target detection method according to an embodiment of the present invention, and FIG. 2 is a flowchart of the method. As shown in the figures, the YOLOv5-based optical remote sensing image target detection method of this embodiment includes:
step 1: acquiring an optical remote sensing image to be detected, wherein the optical remote sensing image to be detected contains a target to be detected;
step 2: cutting the optical remote sensing image to be detected into a plurality of optical remote sensing sub-images to be detected;
specifically, the optical remote sensing image to be detected is cut so that the cut image is adapted to the YOLOv5 target detection model, and in this embodiment, the optical remote sensing image to be detected is cut into 1024 x 1024 optical remote sensing sub-images to be detected.
Step 3: inputting an optical remote sensing sub-image to be detected into a pre-trained YOLOv5 target detection model to obtain a corresponding sub-image detection result, wherein the detection result comprises a target detection frame and a classification-intersection ratio;
in this embodiment, the YOLOv5 target detection model includes a cascaded backbone network, a neck network, and a detection head.
The backbone network is used for extracting features of the input optical remote sensing sub-image to be detected, and obtaining high-level semantic features, middle-level semantic features and low-level semantic features corresponding to the optical remote sensing sub-image to be detected.
The features output by the network layers of the backbone network are divided into low-level, middle-level and high-level semantic features according to downsampling magnification. Low-level semantic features are those output by network layers with a lower downsampling magnification, high-level semantic features are those output by network layers with a higher downsampling magnification, and middle-level semantic features are those output by network layers whose downsampling magnification lies in between; the deeper the network, the more semantic information its extracted features carry.
In this embodiment, the low-level, middle-level and high-level semantic features are the features output by the backbone network layers with downsampling magnifications of 8, 16 and 32, respectively.
In this embodiment, the backbone network of the YOLOv5 target detection model is the backbone network of YOLOv5 itself, and the specific structure is not described here again.
Further, the neck network is used for fusing the high-level semantic features, the middle-level semantic features and the low-level semantic features to obtain high-level semantic fusion features, middle-level semantic fusion features and low-level semantic fusion features.
In this embodiment, the neck network of the YOLOv5 target detection model is a CSP-BiFPN network. The CSP-BiFPN network comprises a plurality of CSP-BiFPN sub-networks connected in series, each comprising a high-level CSP2-n unit, a middle-level first CSP2-n unit, a middle-level second CSP2-n unit and a low-level CSP2-n unit.
In this embodiment, the CSP-BiFPN network includes three structurally identical CSP-BiFPN sub-networks connected in series to form the complete CSP-BiFPN network; the structure of a CSP-BiFPN sub-network is shown in FIG. 3.
Since the YOLOv5 backbone network outputs three levels of features, namely high-level, middle-level and low-level semantic features, the neck network of this embodiment correspondingly outputs three levels of fusion features: high-level, middle-level and low-level semantic fusion features.
That is, a CSP-BiFPN sub-network receives as input three feature maps of different downsampling magnifications and outputs three fused feature maps at those same magnifications, the downsampling magnifications corresponding one-to-one between input and output.
Specifically, for a CSP-BiFPN sub-network connected with a backbone network, a low-level CSP2-n unit thereof performs feature fusion on low-level semantic features and intermediate fusion features subjected to up-sampling operation to obtain low-level semantic fusion features; the middle-layer first CSP2-n unit performs feature fusion on the high-layer semantic features and the middle-layer semantic features subjected to the up-sampling operation to obtain middle fusion features; the middle layer second CSP2-n unit performs feature fusion on the middle layer semantic features, the middle fusion features and the lower layer semantic fusion features subjected to downsampling operation to obtain middle layer semantic fusion features; and the high-level CSP2-n unit performs feature fusion on the high-level semantic features and the middle-level semantic fusion features subjected to the downsampling operation to obtain the high-level semantic fusion features.
Taking the intermediate fusion feature as an example, its calculation formula is:

F_inter = CSP2-n((w_1 · UP(F_high) + w_2 · F_mid) / (w_1 + w_2 + ε))

wherein CSP2-n represents the middle-level first CSP2-n unit, F_high represents the input high-level semantic features, F_mid represents the input middle-level semantic features, ε = 0.0001, UP represents a bilinear-interpolation upsampling operation, w_1 represents the contribution weight of the high-level semantic features to the intermediate fusion features, and w_2 represents the contribution weight of the middle-level semantic features to the intermediate fusion features.
In this embodiment, w_1 and w_2 are learnable parameters in the training process of the YOLOv5 target detection model, with initial values of 1.
Similarly, contribution weights must be set for the low-level semantic features and the intermediate fusion features toward the low-level semantic fusion features; for the middle-level semantic features, the intermediate fusion features and the low-level semantic fusion features toward the middle-level semantic fusion features; and for the high-level semantic features and the middle-level semantic fusion features toward the high-level semantic fusion features. These 7 weights are likewise learnable parameters in the training process of the YOLOv5 target detection model, with initial values of 1.
It should be noted that the other CSP-BiFPN sub-networks in the CSP-BiFPN network fuse their input features in the same manner, which is not repeated here; their corresponding weights are also set as learnable parameters with initial values of 1. A sketch of this weighted fusion follows.
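The following PyTorch sketch shows one way the two-input case of this learnable weighted fusion could be realized; the module name is illustrative, and the optional ReLU on the weights is an assumption rather than something the patent specifies:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class WeightedFusion2(nn.Module):
        """Fuses an upsampled high-level feature with a middle-level feature
        using learnable contribution weights w1 and w2 (initial value 1), then
        refines the result with a CSP2-n unit."""

        def __init__(self, csp2n, eps=1e-4):
            super().__init__()
            self.w = nn.Parameter(torch.ones(2))  # [w1, w2], initialized to 1
            self.eps = eps                        # epsilon = 0.0001
            self.csp2n = csp2n

        def forward(self, f_high, f_mid):
            # UP: bilinear-interpolation upsampling to the middle-level size.
            up = F.interpolate(f_high, size=f_mid.shape[-2:], mode="bilinear",
                               align_corners=False)
            w = self.w  # F.relu(self.w) would keep the weights non-negative
            fused = (w[0] * up + w[1] * f_mid) / (w[0] + w[1] + self.eps)
            return self.csp2n(fused)

The three-input fusion performed by the middle-level second CSP2-n unit follows the same pattern with three weights.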
Specifically, as shown in FIG. 4, the CSP2-n unit adopts a bottleneck connection and comprises a first branch and a second branch. The input features are split along the channel dimension into two parts, which are fed to the first and second branches respectively. The features entering the first branch pass through one CBH convolution layer, then through 3 cascaded CBH convolution layers, and are finally output through a two-dimensional convolution layer (Conv2D); the features entering the second branch are output through a single two-dimensional convolution layer. The outputs of the two branches are concatenated in the channel dimension and then passed through a BN layer (batch normalization), an L-ReLU layer and a CBH convolution layer to extract features once more.
In this embodiment, a CBH convolution layer consists of a cascaded two-dimensional convolution layer, BN layer and H-swish layer. The last CBH convolution layer in the CSP2-n unit has 256 convolution kernels, the remaining CBH convolution layers have 128, and all convolution kernels are 3×3. L-ReLU and H-swish denote activation functions.
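Under the structure just described, a CSP2-n unit might be sketched in PyTorch as below. The 128/256 kernel counts and the 3×3 kernels follow this embodiment, while the even channel split and the LeakyReLU slope standing in for the L-ReLU layer are assumptions:

    import torch
    import torch.nn as nn

    class CBH(nn.Module):
        """CBH convolution layer: Conv2D + BN + H-swish, 3x3 kernels."""
        def __init__(self, c_in, c_out):
            super().__init__()
            self.block = nn.Sequential(
                nn.Conv2d(c_in, c_out, 3, padding=1, bias=False),
                nn.BatchNorm2d(c_out),
                nn.Hardswish(),
            )

        def forward(self, x):
            return self.block(x)

    class CSP2n(nn.Module):
        """Bottleneck-style CSP2-n unit: two branches over a channel split,
        concatenated along channels and refined once more."""
        def __init__(self, c_in, c_mid=128, c_out=256):
            super().__init__()
            c1 = c_in // 2  # channels routed to the first branch
            self.branch1 = nn.Sequential(
                CBH(c1, c_mid),                                           # one CBH layer
                CBH(c_mid, c_mid), CBH(c_mid, c_mid), CBH(c_mid, c_mid),  # 3 cascaded CBH layers
                nn.Conv2d(c_mid, c_mid, 3, padding=1, bias=False),        # Conv2D
            )
            self.branch2 = nn.Conv2d(c_in - c1, c_mid, 3, padding=1, bias=False)
            self.post = nn.Sequential(
                nn.BatchNorm2d(2 * c_mid),
                nn.LeakyReLU(0.1),      # stands in for the L-ReLU layer
                CBH(2 * c_mid, c_out),  # last CBH layer: 256 kernels
            )

        def forward(self, x):
            c1 = x.shape[1] // 2
            y = torch.cat([self.branch1(x[:, :c1]), self.branch2(x[:, c1:])], dim=1)
            return self.post(y)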
The neck network of the YOLOv5 target detection model of this embodiment is thus a CSP-BiFPN network, combining the bidirectional feature pyramid network BiFPN with cross-stage-partial CSP convolution. The BiFPN introduces learnable parameters to automatically learn the importance of features at different downsampling magnifications, makes full use of the information among features of different downsampling magnifications, and improves the distinguishability of the features at each level, so targets of different sizes in the optical remote sensing image can be better distinguished. The bottleneck connection of the CSP convolution greatly strengthens the feature extraction capability, generates more discriminative semantic fusion features, and provides strong support for the model to further distinguish targets of different scales.
Further, the detection head is used for determining and outputting a detection result of the optical remote sensing sub-image to be detected according to the high-level semantic fusion feature, the middle-level semantic fusion feature and the low-level semantic fusion feature.
In this embodiment, the detection head includes a regression branch and a classification branch: the regression branch outputs the target detection frame of the optical remote sensing sub-image to be detected, and the classification branch outputs its classification-intersection ratio. The classification-intersection ratio represents the joint distribution between the predicted class of a target and the IoU between its predicted and true coordinates.
For clarity, the training procedure of the YOLOv5 target detection model is described as follows. First, a training data set is acquired, comprising a number of training image samples together with the target coordinate label and target classification label of each sample. Note that, unlike the original YOLOv5, whose target classification labels are the discrete values 0 and 1, the target classification label of a training image sample in this embodiment is a continuous number between 0 and 1. The training data set is input into the YOLOv5 target detection model described above, the loss value during training is calculated with a loss function, and the model parameters, comprising the YOLOv5 network parameters and the contribution weights of the various fusion features, are optimized with a stochastic gradient descent (SGD) optimizer. When the loss value calculated after a batch of training image samples has been input falls below a preset threshold, the YOLOv5 target detection model is considered converged and training is complete.
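A highly simplified sketch of that procedure follows. Only the SGD optimizer and the convergence test on the loss value come from the text; the hyperparameter values are placeholders, and model, train_loader and compute_loss stand for the YOLOv5 target detection model, the training data set and the loss described next:

    import torch

    def train(model, train_loader, compute_loss, max_epochs=300, loss_threshold=0.05):
        """Optimize the model with SGD until the loss falls below a preset
        threshold (hyperparameter values here are illustrative only)."""
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
        loss = torch.tensor(float("inf"))
        for epoch in range(max_epochs):
            for images, targets in train_loader:
                optimizer.zero_grad()
                loss = compute_loss(model(images), targets)
                loss.backward()    # backpropagate the loss
                optimizer.step()   # update network parameters and fusion weights
            if loss.item() < loss_threshold:
                break              # model considered converged
        return model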
In this embodiment, the classification loss function of the YOLOv5 target detection model is a quality focal loss function:
Loss = -|y - σ|^β ((1 - y)·log(1 - σ) + y·log σ)    (2)
where y represents the classification-intersection ratio, σ represents the output of the classification branch of the detection head, and β represents a modulating factor, typically set to 2.
The classification-intersection ratio of a training image sample is calculated as: y = a × l;
wherein a represents the intersection-over-union IoU between the predicted coordinates of the training image sample and the corresponding target coordinate label, the predicted coordinates being obtained by decoding the output of the regression branch of the detection head, and l represents the target classification label of the training image sample.
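A direct PyTorch transcription of this quality focal loss might read as below, assuming sigma is the sigmoid output of the classification branch and y the classification-intersection ratio target; the clamping for numerical stability is our addition:

    import torch

    def quality_focal_loss(sigma, y, beta=2.0):
        """Loss = -|y - sigma|^beta * ((1 - y) * log(1 - sigma) + y * log(sigma)),
        with y = IoU(decoded predicted box, coordinate label) * classification label."""
        sigma = sigma.clamp(1e-6, 1 - 1e-6)       # keep the logarithms finite
        modulating = (y - sigma).abs().pow(beta)  # |y - sigma|^beta
        bce = (1 - y) * torch.log(1 - sigma) + y * torch.log(sigma)
        return -(modulating * bce).mean()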
In this embodiment, the classification branch of the YOLOv5 detection head is thus converted from directly predicting class information to predicting the classification-intersection ratio, which further strengthens the coupling between the classification-branch and regression-branch outputs, and the original binary cross-entropy loss function is replaced by the quality focal loss function. The quality focal loss function is designed so that the model better learns to predict the classification-intersection ratio and is guided to combine classification information with coordinate-regression information, thereby improving the precision of the model.
Step 4: merging the sub-image detection results to obtain a detection result of the optical remote sensing image to be detected;
specifically, step 4 includes:
step 4.1: filtering and de-duplicating the sub-image detection results;
in this embodiment, for the sub-image detection result, the target with smaller prediction probability is filtered, the target with larger probability is retained, and the overlapped prediction target is removed by a non-maximum suppression algorithm.
Step 4.2: and merging the sub-image detection results after the filtering and de-duplication treatment to obtain the detection result of the optical remote sensing image to be detected.
According to the YOLOv5-based optical remote sensing image target detection method of this embodiment, target detection is completed with a trained YOLOv5 target detection model whose neck network is a CSP-BiFPN network. On the one hand, the CSP-BiFPN network introduces learnable parameters to automatically learn the importance of features at different downsampling magnifications, making full use of the information among them, improving the distinguishability of the features at each level, and thereby better distinguishing targets of different sizes in the optical remote sensing image. On the other hand, the CSP-BiFPN network greatly strengthens the feature extraction capability, generating more discriminative semantic fusion features and providing strong support for the model to further distinguish targets of different scales.
In addition, according to the YOLOv5-based optical remote sensing image target detection method, the classification branch of the YOLOv5 detection head is converted from directly predicting class information to predicting the classification-intersection ratio, which further strengthens the coupling between the classification-branch and regression-branch outputs and improves the precision of the model.
It should be noted that in this document relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that an article or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in an article or apparatus that comprises the element. The terms "connected" or "connected," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect.
The foregoing is a further detailed description of the invention in connection with the preferred embodiments, and it is not intended that the invention be limited to the specific embodiments described. It will be apparent to those skilled in the art that several simple deductions or substitutions may be made without departing from the spirit of the invention, and these should be considered to be within the scope of the invention.

Claims (5)

1. A YOLOv5-based optical remote sensing image target detection method, characterized by comprising the following steps:
step 1: acquiring an optical remote sensing image to be detected, wherein the optical remote sensing image to be detected contains a target to be detected;
step 2: cutting the optical remote sensing image to be detected into a plurality of optical remote sensing sub-images to be detected;
step 3: inputting the optical remote sensing sub-image to be detected into a pre-trained YOLOv5 target detection model to obtain a corresponding sub-image detection result, wherein the detection result comprises a target detection frame and a classification-intersection ratio;
step 4: combining the sub-image detection results to obtain a detection result of the optical remote sensing image to be detected;
the YOLOv5 target detection model comprises a backbone network, a neck network and a detection head which are cascaded, wherein the neck network is a CSP-BiFPN network;
the backbone network is used for extracting the characteristics of the input optical remote sensing sub-image to be detected to obtain high-level semantic characteristics, middle-level semantic characteristics and low-level semantic characteristics corresponding to the optical remote sensing sub-image to be detected;
the neck network is used for fusing the high-level semantic features, the middle-level semantic features and the low-level semantic features to obtain high-level semantic fusion features, middle-level semantic fusion features and low-level semantic fusion features; the CSP-BiFPN network comprises a plurality of CSP-BiFPN subnetworks connected in series, wherein the CSP-BiFPN subnetworks comprise a high-level CSP2-n unit, a middle-level first CSP2-n unit, a middle-level second CSP2-n unit and a low-level CSP2-n unit,
the low-level CSP2-n unit performs feature fusion on the low-level semantic features and the intermediate fusion features subjected to the up-sampling operation to obtain the low-level semantic fusion features; the middle-layer first CSP2-n unit performs feature fusion on the high-layer semantic features subjected to the up-sampling operation and the middle-layer semantic features to obtain middle fusion features; the middle-layer second CSP2-n unit performs feature fusion on the middle-layer semantic features, the middle fusion features and the lower-layer semantic fusion features subjected to downsampling operation to obtain the middle-layer semantic fusion features; the high-level CSP2-n unit performs feature fusion on the high-level semantic features and the middle-level semantic fusion features subjected to downsampling operation to obtain the high-level semantic fusion features;
the detection head is used for determining and outputting a detection result of the optical remote sensing sub-image to be detected according to the high-level semantic fusion feature, the middle-level semantic fusion feature and the low-level semantic fusion feature; the detection head comprises a regression branch and a classification branch, the regression branch outputs a target detection frame of the optical remote sensing sub-image to be detected, and the classification branch outputs a classification-intersection ratio of the optical remote sensing sub-image to be detected.
2. The YOLOv5-based optical remote sensing image target detection method of claim 1, wherein the YOLOv5 target detection model is obtained by training on a plurality of training image samples and the label corresponding to each training image sample, the labels comprising target coordinate labels and target classification labels.
3. The YOLOv5-based optical remote sensing image target detection method of claim 2, wherein the target classification label is a continuous number between 0 and 1.
4. The YOLOv5-based optical remote sensing image target detection method of claim 2, wherein the classification loss function of the YOLOv5 target detection model is:
Loss = -|y - σ|^β ((1 - y)·log(1 - σ) + y·log σ);
wherein y represents the classification-intersection ratio, σ represents the output of the classification branch of the detection head, and β represents a modulating factor;
the calculation formula of the classification-intersection ratio of the training image sample is as follows: y=a×l;
wherein a represents an intersection ratio IoU of the predicted coordinates of the training image sample and the corresponding target coordinate label, the predicted coordinates are obtained by decoding the output of the regression branch of the detection head, and l represents the target classification label of the training image sample.
5. The YOLOv5-based optical remote sensing image target detection method according to claim 1, wherein the step 4 comprises:
step 4.1: filtering and de-duplicating the sub-image detection results;
step 4.2: and merging the sub-image detection results after the filtering and de-duplication treatment to obtain the detection result of the optical remote sensing image to be detected.
CN202210909740.1A 2022-07-29 2022-07-29 YOLOv5-based optical remote sensing image target detection method Active CN115272242B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210909740.1A CN115272242B (en) 2022-07-29 2022-07-29 YOLOv5-based optical remote sensing image target detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210909740.1A CN115272242B (en) 2022-07-29 2022-07-29 YOLOv5-based optical remote sensing image target detection method

Publications (2)

Publication Number Publication Date
CN115272242A CN115272242A (en) 2022-11-01
CN115272242B 2024-02-27

Family

ID=83746239

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210909740.1A Active CN115272242B (en) 2022-07-29 2022-07-29 YOLOv5-based optical remote sensing image target detection method

Country Status (1)

Country Link
CN (1) CN115272242B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116385903B (en) * 2023-05-29 2023-09-19 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Anti-distortion on-orbit target detection method and model for 1-level remote sensing data

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110633661A (en) * 2019-08-31 2019-12-31 南京理工大学 Semantic segmentation fused remote sensing image target detection method
CN110909642A (en) * 2019-11-13 2020-03-24 南京理工大学 Remote sensing image target detection method based on multi-scale semantic feature fusion
CN111091105A (en) * 2019-12-23 2020-05-01 郑州轻工业大学 Remote sensing image target detection method based on new frame regression loss function
CN111126202A (en) * 2019-12-12 2020-05-08 天津大学 Optical remote sensing image target detection method based on void feature pyramid network
WO2020232905A1 (en) * 2019-05-20 2020-11-26 平安科技(深圳)有限公司 Superobject information-based remote sensing image target extraction method, device, electronic apparatus, and medium
CN112364719A (en) * 2020-10-23 2021-02-12 西安科锐盛创新科技有限公司 Method for rapidly detecting remote sensing image target
CN112507777A (en) * 2020-10-10 2021-03-16 厦门大学 Optical remote sensing image ship detection and segmentation method based on deep learning
CN112668390A (en) * 2020-11-17 2021-04-16 福建省星云大数据应用服务有限公司 High-efficiency single remote sensing image target detection method and system
CN113569194A (en) * 2021-06-10 2021-10-29 中国人民解放军海军工程大学 Rotating rectangular box representation and regression method for target detection
CN114359565A (en) * 2021-12-14 2022-04-15 阿里巴巴(中国)有限公司 Image detection method, storage medium and computer terminal


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Improved YOLOv5 with BiFPN on PCB Defect Detection; Xiaoqi Wang et al.; 2021 2nd International Conference on Artificial Intelligence and Computer Engineering (ICAICE); Abstract, Sections I-IV *
Lightweight Remote Sensing Image Target Detection Model Based on YOLOX-Tiny; Lang Lei et al.; Laser & Optoelectronics Progress; 1-18 *
Traffic Police Gesture Recognition Based on an Improved YOLOv5 Algorithm; Wang Xin et al.; Electronic Measurement Technology; Abstract, Sections 0-4 *
Ship Classification and Detection Method for Optical Remote Sensing Images Based on Improved YOLOv5s; Zhou Qikai et al.; Laser & Optoelectronics Progress; Abstract, Sections 1-5 *
Small Target Detection Network Based on Adaptive Feature Enhancement; Wu Mengmeng et al.; Laser & Optoelectronics Progress; 1-14 *

Also Published As

Publication number Publication date
CN115272242A (en) 2022-11-01

Similar Documents

Publication Publication Date Title
WO2020221298A1 (en) Text detection model training method and apparatus, text region determination method and apparatus, and text content determination method and apparatus
EP3964998A1 (en) Text processing method and model training method and apparatus
CN110263786B (en) Road multi-target identification system and method based on feature dimension fusion
CN110796199B (en) Image processing method and device and electronic medical equipment
CN111651474B (en) Method and system for converting natural language into structured query language
CN111931859B (en) Multi-label image recognition method and device
CN107992937B (en) Unstructured data judgment method and device based on deep learning
CN112381763A (en) Surface defect detection method
CN114187311A (en) Image semantic segmentation method, device, equipment and storage medium
CN111612789A (en) Defect detection method based on improved U-net network
CN109766918B (en) Salient object detection method based on multilevel context information fusion
CN115272242B (en) YOLOv 5-based optical remote sensing image target detection method
CN112381837A (en) Image processing method and electronic equipment
CN110533068B (en) Image object identification method based on classification convolutional neural network
CN115393606A (en) Method and system for image recognition
CN114373092A (en) Progressive training fine-grained vision classification method based on jigsaw arrangement learning
CN112085164A (en) Area recommendation network extraction method based on anchor-frame-free network
CN111340124A (en) Method and device for identifying entity category in image
CN115830342A (en) Method and device for determining detection frame, storage medium and electronic device
US9378466B2 (en) Data reduction in nearest neighbor classification
CN115457385A (en) Building change detection method based on lightweight network
CN111126513B (en) Universal object real-time learning and recognition system and learning and recognition method thereof
CN113032612A (en) Construction method of multi-target image retrieval model, retrieval method and device
Yang et al. Road Damage Detection and Classification Based on Multi-Scale Contextual Features
CN113963150B (en) Pedestrian re-identification method based on multi-scale twin cascade network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant