CN115272242A - YOLOv5-based optical remote sensing image target detection method - Google Patents

YOLOv5-based optical remote sensing image target detection method

Info

Publication number
CN115272242A
Authority
CN
China
Prior art keywords: remote sensing, optical remote, features, detected, target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210909740.1A
Other languages
Chinese (zh)
Other versions
CN115272242B (en)
Inventor
侯彪
李智德
汤奇
任仲乐
任博
杨晨
焦李成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN202210909740.1A
Publication of CN115272242A
Application granted
Publication of CN115272242B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/0002: Inspection of images, e.g. flaw detection
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/10: Image acquisition modality
    • G06T 2207/10032: Satellite or aerial image; Remote sensing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07: Target detection

Abstract

The invention relates to a YOLOv5-based optical remote sensing image target detection method, characterized by comprising the following steps. Step 1: acquiring an optical remote sensing image to be detected, wherein the optical remote sensing image to be detected contains a target to be detected. Step 2: cutting the optical remote sensing image to be detected into a plurality of optical remote sensing sub-images to be detected. Step 3: inputting the optical remote sensing sub-images to be detected into a pre-trained YOLOv5 target detection model to obtain corresponding sub-image detection results, wherein each detection result comprises a target detection frame and a classification-IoU score. Step 4: merging the sub-image detection results to obtain the detection result of the optical remote sensing image to be detected. The YOLOv5 target detection model comprises a backbone network, a neck network and a detection head connected in cascade, wherein the neck network is a CSP-BiFPN network. The YOLOv5-based optical remote sensing image target detection method achieves higher detection precision and a stronger ability to distinguish targets of different scales.

Description

YOLOv5-based optical remote sensing image target detection method
Technical Field
The invention belongs to the technical field of airplane detection in optical remote sensing images, and particularly relates to a YOLOv5-based optical remote sensing image target detection method.
Background
Traditional target detection methods are usually designed around manually extracted features, are tailored to specific scenes, and require extensive parameter tuning, so they generalize poorly. Faced with optical remote sensing images of increasingly complex scenes, traditional methods are no longer applicable.
With the rapid development of deep learning, features extracted by convolutional neural networks generalize far better than features extracted by traditional manual methods. Current object detection models typically comprise two parts. The first part extracts features and consists of a backbone network and a neck network: the backbone network is usually pre-trained on a large-scale image dataset to obtain better generalization and a stronger feature extraction capability, while the neck network fuses feature layers of different down-sampling magnifications so that targets of different sizes can be recognized and localized. The second part is the detection head, which performs classification and coordinate regression on the extracted features.
Target detection models can be divided into single-stage and two-stage models according to whether candidate regions are pre-extracted. Two-stage models, represented by Faster R-CNN, achieve higher precision but infer more slowly than single-stage models, and their RPN requires anchor boxes to be set manually. The most representative single-stage models are YOLO, SSD and RetinaNet. Compared with two-stage models, single-stage models are simpler, performing classification and regression directly on the extracted feature maps; their advantage is relatively fast inference, at the cost of slightly lower precision than two-stage models.
The neck networks adopted by most current target detection models, including YOLOv5, are the feature pyramid network (FPN) and the path aggregation network (PAN). These neck networks simply add input features of different down-sampling magnifications without considering each input's contribution to the final fused feature, so the advantage of feature fusion is not fully exploited and the model's ability to distinguish targets of different scales is reduced. In addition, the classification branch and the regression branch in current detection heads are usually independent, with no direct connection between them; as a result, a prediction may have a high classification score but a badly deviated detection frame, or an accurate detection frame but a low classification score, which lowers model precision.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a YOLOv5-based optical remote sensing image target detection method. The technical problem to be solved by the invention is addressed by the following technical scheme:
the invention provides an optical remote sensing image target detection method based on YOLOv5, which comprises the following steps:
Step 1: acquiring an optical remote sensing image to be detected, wherein the optical remote sensing image to be detected contains a target to be detected;
Step 2: cutting the optical remote sensing image to be detected into a plurality of optical remote sensing sub-images to be detected;
Step 3: inputting the optical remote sensing sub-images to be detected into a pre-trained YOLOv5 target detection model to obtain corresponding sub-image detection results, wherein each detection result comprises a target detection frame and a classification-IoU score;
Step 4: merging the sub-image detection results to obtain the detection result of the optical remote sensing image to be detected;
the YOLOv5 target detection model comprises a backbone network, a neck network and a detection head which are connected in a cascade mode, wherein the neck network is a CSP-BiFPN network.
In an embodiment of the invention, the backbone network is configured to perform feature extraction on the input optical remote sensing sub-image to be detected to obtain high-level semantic features, middle-level semantic features and low-level semantic features corresponding to the optical remote sensing sub-image to be detected;
the neck network is used for fusing the high-level semantic features, the middle-level semantic features and the low-level semantic features to obtain high-level semantic fusion features, middle-level semantic fusion features and low-level semantic fusion features;
the detection head is used for determining and outputting the detection result of the optical remote sensing sub-image to be detected according to the high-level, middle-level and low-level semantic fusion features.
In one embodiment of the present invention, the CSP-BiFPN network includes a plurality of CSP-BiFPN sub-networks connected in series, each CSP-BiFPN sub-network including a high-level CSP2-n unit, a middle-level first CSP2-n unit, a middle-level second CSP2-n unit, and a low-level CSP2-n unit, wherein,
the low-level CSP2-n unit performs feature fusion on the low-level semantic features and the up-sampled intermediate fusion features to obtain the low-level semantic fusion features;
the middle-level first CSP2-n unit performs feature fusion on the up-sampled high-level semantic features and the middle-level semantic features to obtain the intermediate fusion features;
the middle-level second CSP2-n unit performs feature fusion on the middle-level semantic features, the intermediate fusion features and the down-sampled low-level semantic fusion features to obtain the middle-level semantic fusion features;
and the high-level CSP2-n unit performs feature fusion on the high-level semantic features and the down-sampled middle-level semantic fusion features to obtain the high-level semantic fusion features.
In one embodiment of the invention, the detection head comprises a regression branch and a classification branch: the regression branch outputs the target detection frame of the optical remote sensing sub-image to be detected, and the classification branch outputs the classification-IoU score of the optical remote sensing sub-image to be detected.
In one embodiment of the present invention, the YOLOv5 target detection model is trained based on a plurality of training image samples and a label corresponding to each training image sample, where the label includes a target coordinate label and a target classification label.
In one embodiment of the invention, the target classification labels are continuous values between 0 and 1.
In one embodiment of the present invention, the classification loss function of the YOLOv5 target detection model is:
Loss = -|y - σ|^β · ((1 - y)·log(1 - σ) + y·log σ);
wherein y represents the classification-IoU score, σ represents the output of the classification branch of the detection head, and β represents the modulating factor;
the classification-IoU score of a training image sample is calculated as: y = A × l;
wherein A represents the intersection-over-union (IoU) between the predicted coordinates of the training image sample and the corresponding target coordinate label, the predicted coordinates being obtained by decoding the output of the regression branch of the detection head, and l represents the target classification label of the training image sample.
In one embodiment of the present invention, step 4 comprises:
Step 4.1: filtering and de-duplicating the sub-image detection results;
Step 4.2: merging the filtered and de-duplicated sub-image detection results to obtain the detection result of the optical remote sensing image to be detected.
Compared with the prior art, the invention has the beneficial effects that:
1. The YOLOv5-based optical remote sensing image target detection method completes target detection on optical remote sensing images with a trained YOLOv5 target detection model whose neck network is a CSP-BiFPN network, combining the bidirectional feature pyramid network (BiFPN) with cross-stage partial (CSP) convolution. The BiFPN introduces learnable parameters that automatically learn the importance of features at different down-sampling magnifications, makes full use of the information among these features, improves the discriminability of the features at each level, and can better distinguish targets of different sizes in optical remote sensing images. The CSP convolution adopts a bottleneck connection: it divides the input features into two parts along the channel dimension, extracts features from each part through different convolution operations, concatenates the two results along the channel dimension, and finally applies a further convolution layer to extract features again. This greatly enhances the feature extraction capability and produces more discriminative semantic fusion features, providing strong support for the model to further distinguish targets of different scales.
2. In the YOLOv5-based optical remote sensing image target detection method, the classification branch of the detection head of the YOLOv5 target detection model is converted from directly predicting class information to predicting the classification-IoU score, which strengthens the relation between the outputs of the classification branch and the regression branch; a new loss function guides the model to combine classification information with coordinate regression information, thereby improving the detection precision of the model.
The foregoing description is only an overview of the technical solutions of the present invention. In order that the technical means of the present invention may be more clearly understood and implemented in accordance with the content of the description, and in order that the above and other objects, features and advantages of the present invention may be more readily apparent, preferred embodiments are described in detail below with reference to the accompanying drawings.
Drawings
Fig. 1 is a schematic diagram of an optical remote sensing image target detection method based on YOLOv5 according to an embodiment of the present invention;
Fig. 2 is a flowchart of the YOLOv5-based optical remote sensing image target detection method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a CSP-BiFPN sub-network provided in an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a CSP2-n unit according to an embodiment of the present invention.
Detailed Description
In order to further explain the technical means and effects adopted by the present invention to achieve its intended object, the YOLOv5-based optical remote sensing image target detection method of the present invention is described in detail below with reference to the accompanying drawings and specific embodiments.
The foregoing and other technical matters, features and effects of the present invention will be apparent from the following detailed description of the embodiments, to be read in connection with the accompanying drawings. The specific embodiments allow a deeper and more concrete understanding of the technical means and effects adopted by the present invention to achieve its predetermined purpose; however, the accompanying drawings are provided for reference and illustration only and are not intended to limit the technical scheme of the present invention.
Embodiment 1
Referring to fig. 1 and fig. 2: fig. 1 is a schematic diagram of the YOLOv5-based optical remote sensing image target detection method according to an embodiment of the present invention, and fig. 2 is its flowchart. As shown in the figures, the YOLOv5-based optical remote sensing image target detection method of this embodiment includes:
Step 1: acquiring an optical remote sensing image to be detected, wherein the optical remote sensing image to be detected contains a target to be detected;
Step 2: cutting the optical remote sensing image to be detected into a plurality of optical remote sensing sub-images to be detected;
Specifically, the optical remote sensing image to be detected is cut so that the cut images fit the input of the YOLOv5 target detection model. In this embodiment, the image is cut into optical remote sensing sub-images of size 1024 × 1024.
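To make the cropping in step 2 concrete, the following is a minimal Python/NumPy sketch of such a tiling routine. The helper name, the default overlap of 0, and the zero-padding of edge tiles are illustrative assumptions; the patent itself only specifies the 1024 × 1024 sub-image size.

    import numpy as np

    def crop_to_subimages(image: np.ndarray, tile: int = 1024, overlap: int = 0):
        """Yield (x0, y0, sub_image) tiles that together cover the whole image."""
        h, w = image.shape[:2]
        step = tile - overlap
        for y0 in range(0, max(h - overlap, 1), step):
            for x0 in range(0, max(w - overlap, 1), step):
                sub = image[y0:y0 + tile, x0:x0 + tile]
                # Zero-pad edge tiles so every detector input is exactly tile x tile.
                if sub.shape[0] < tile or sub.shape[1] < tile:
                    padded = np.zeros((tile, tile) + image.shape[2:], dtype=image.dtype)
                    padded[:sub.shape[0], :sub.shape[1]] = sub
                    sub = padded
                yield x0, y0, sub

The (x0, y0) offsets are kept so that sub-image detection frames can be mapped back to full-image coordinates when the results are merged in step 4.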
Step 3: inputting the optical remote sensing sub-images to be detected into the pre-trained YOLOv5 target detection model to obtain the corresponding sub-image detection results, wherein each detection result comprises a target detection frame and a classification-IoU score;
in this embodiment, the YOLOv5 target detection model includes a backbone network, a neck network, and a detection head in cascade.
The backbone network is used for performing feature extraction on the input optical remote sensing sub-images to be detected to obtain the high-level, middle-level and low-level semantic features corresponding to each sub-image.
The features output by the network layers of the backbone are divided into low-level, middle-level and high-level semantic features according to their down-sampling magnification: low-level semantic features are output by network layers with a low down-sampling magnification, high-level semantic features by layers with a high down-sampling magnification, and middle-level semantic features by layers whose down-sampling magnification lies in between. The deeper the network layer, the more semantic information it extracts.
In this embodiment, the low-level, middle-level and high-level semantic features are the outputs of the backbone network layers with down-sampling magnifications of 8, 16 and 32, respectively.
In this embodiment, the backbone network of the YOLOv5 target detection model is the backbone network of YOLOv5 itself, and its specific structure is not repeated here.
Further, the neck network is used for fusing the high-level semantic features, the middle-level semantic features and the low-level semantic features to obtain high-level semantic fusion features, middle-level semantic fusion features and low-level semantic fusion features.
In this embodiment, the neck network of the YOLOv5 target detection model is a CSP-BiFPN network. The CSP-BiFPN network comprises a plurality of CSP-BiFPN sub-networks connected in series, and each CSP-BiFPN sub-network comprises a high-level CSP2-n unit, a middle-level first CSP2-n unit, a middle-level second CSP2-n unit and a low-level CSP2-n unit.
In this embodiment, the CSP-BiFPN network comprises three CSP-BiFPN sub-networks with the same structure connected in series; the structure of a CSP-BiFPN sub-network is shown in fig. 3.
Because the backbone network of YOLOv5 outputs three levels of features, namely the high-level, middle-level and low-level semantic features, the neck network of this embodiment is designed to output the corresponding three levels of fusion features: the high-level, middle-level and low-level semantic fusion features.
That is, each CSP-BiFPN sub-network receives three feature maps of different down-sampling magnifications as input and outputs three fused feature maps, with a one-to-one correspondence of down-sampling magnifications between input and output.
Specifically, for the CSP-BiFPN sub-network connected to the backbone network: the low-level CSP2-n unit fuses the low-level semantic features with the up-sampled intermediate fusion features to obtain the low-level semantic fusion features; the middle-level first CSP2-n unit fuses the up-sampled high-level semantic features with the middle-level semantic features to obtain the intermediate fusion features; the middle-level second CSP2-n unit fuses the middle-level semantic features, the intermediate fusion features and the down-sampled low-level semantic fusion features to obtain the middle-level semantic fusion features; and the high-level CSP2-n unit fuses the high-level semantic features with the down-sampled middle-level semantic fusion features to obtain the high-level semantic fusion features.
Taking the intermediate fusion feature as an example, its calculation formula is:
F_int = CSP2-n((w1 · UP(F_high) + w2 · F_mid) / (w1 + w2 + ε))    (1)

wherein F_int denotes the intermediate fusion feature, CSP2-n denotes the middle-level first CSP2-n unit, F_high denotes the input high-level semantic features, F_mid denotes the input middle-level semantic features, ε = 0.0001, UP denotes the bilinear-interpolation up-sampling operation, w1 denotes the contribution weight of the high-level semantic features to the intermediate fusion feature, and w2 denotes the contribution weight of the middle-level semantic features to the intermediate fusion feature.
In this embodiment, w1 and w2 are learnable parameters in the training process of the YOLOv5 target detection model, with initial values of 1.
Similarly, contribution weights must be set for the low-level semantic features and the up-sampled intermediate fusion features with respect to the low-level semantic fusion features; for the middle-level semantic features, the intermediate fusion features and the down-sampled low-level semantic fusion features with respect to the middle-level semantic fusion features; and for the high-level semantic features and the down-sampled middle-level semantic fusion features with respect to the high-level semantic fusion features. In this embodiment, these 7 weights are likewise learnable parameters in the training process of the YOLOv5 target detection model, each with an initial value of 1.
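As an illustration of this weighted fusion, below is a minimal PyTorch sketch of a fusion node with learnable contribution weights initialized to 1. The module name is an invention of this sketch, and the ReLU used to keep the weights non-negative is an assumption carried over from the standard BiFPN formulation rather than something the patent states.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class WeightedFusion(nn.Module):
        """Fuse same-shape feature maps with learnable, normalized weights."""
        def __init__(self, n_inputs: int, eps: float = 1e-4):
            super().__init__()
            self.w = nn.Parameter(torch.ones(n_inputs))  # initial values of 1
            self.eps = eps                               # epsilon = 0.0001

        def forward(self, feats):
            w = torch.relu(self.w)  # non-negativity, assumed as in standard BiFPN
            # (w1 * F1 + w2 * F2 + ...) / (w1 + w2 + ... + eps)
            return sum(wi * f for wi, f in zip(w, feats)) / (w.sum() + self.eps)

    # Intermediate fusion feature: fuse UP(F_high) with F_mid, then apply CSP2-n.
    fuse = WeightedFusion(2)
    f_high = torch.randn(1, 256, 16, 16)   # high-level features (down-sampling 32)
    f_mid = torch.randn(1, 256, 32, 32)    # middle-level features (down-sampling 16)
    pre_csp = fuse([F.interpolate(f_high, scale_factor=2, mode="bilinear"), f_mid])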
It should be noted that the other CSP-BiFPN sub-networks in the CSP-BiFPN network fuse their input features in a similar manner, which is not repeated here. Their corresponding weights are likewise learnable parameters in the training process of the YOLOv5 target detection model, with initial values set to 1.
Specifically, the structure of the CSP2-n unit is shown in fig. 4. The CSP2-n unit adopts a bottleneck connection and comprises a first branch and a second branch. The input feature is divided into two features along the channel dimension, which are input into the first branch and the second branch respectively. The feature entering the first branch passes through one CBH convolutional layer, then through 3 cascaded CBH convolutional layers, and is finally output by a two-dimensional convolutional layer (Conv2D); the feature entering the second branch is output after a convolution by a single two-dimensional convolutional layer. The outputs of the two branches are concatenated along the channel dimension and then passed, in sequence, through a BN layer (batch normalization), an L-ReLU layer and a CBH convolutional layer to extract features again.
In this embodiment, a CBH convolutional layer consists of a cascaded two-dimensional convolutional layer, BN layer and H-swish layer. The last CBH convolutional layer in the CSP2-n unit has 256 convolution kernels, the remaining CBH convolutional layers each have 128, and all kernels are of size 3 × 3. L-ReLU and H-swish denote activation functions.
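For reference, below is a minimal PyTorch sketch of the CSP2-n unit as described above: channel split, a CBH stack plus a plain Conv2D on the first branch, a single Conv2D on the second branch, then channel concatenation followed by BN, L-ReLU and a final 256-kernel CBH layer. The padding, bias settings and LeakyReLU slope are assumptions not fixed by the text.

    import torch
    import torch.nn as nn

    class CBH(nn.Module):
        """CBH convolutional layer: Conv2D + BN + H-swish."""
        def __init__(self, c_in: int, c_out: int, k: int = 3):
            super().__init__()
            self.conv = nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False)
            self.bn = nn.BatchNorm2d(c_out)
            self.act = nn.Hardswish()

        def forward(self, x):
            return self.act(self.bn(self.conv(x)))

    class CSP2n(nn.Module):
        def __init__(self, c_in: int = 256, c_mid: int = 128, c_out: int = 256):
            super().__init__()
            half = c_in // 2
            # First branch: one CBH, 3 cascaded CBH layers, then a plain Conv2D.
            self.branch1 = nn.Sequential(
                CBH(half, c_mid),
                CBH(c_mid, c_mid), CBH(c_mid, c_mid), CBH(c_mid, c_mid),
                nn.Conv2d(c_mid, c_mid, 3, padding=1, bias=False),
            )
            # Second branch: a single plain Conv2D.
            self.branch2 = nn.Conv2d(half, c_mid, 3, padding=1, bias=False)
            # After channel concatenation: BN -> L-ReLU -> final CBH (256 kernels).
            self.bn = nn.BatchNorm2d(2 * c_mid)
            self.act = nn.LeakyReLU(0.1)
            self.final = CBH(2 * c_mid, c_out)

        def forward(self, x):
            x1, x2 = x.chunk(2, dim=1)  # split the input features by channel
            y = torch.cat([self.branch1(x1), self.branch2(x2)], dim=1)
            return self.final(self.act(self.bn(y)))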
The neck network of the YOLOv5 target detection model in this embodiment is a CSP-BiFPN network, which combines the bidirectional feature pyramid network (BiFPN) with cross-stage partial (CSP) convolution. The BiFPN introduces learnable parameters that automatically learn the importance of features at different down-sampling magnifications, makes full use of the information among these features, improves the discriminability of the features at each level, and can better distinguish targets of different sizes in optical remote sensing images. The CSP convolution adopts a bottleneck connection: it divides the input features into two parts along the channel dimension, extracts features from each part through different convolution operations, concatenates the two results along the channel dimension, and finally applies a further convolution layer to extract features again. This greatly enhances the feature extraction capability and produces more discriminative semantic fusion features, providing strong support for the model to further distinguish targets of different scales.
Furthermore, the detection head is used for determining and outputting the detection result of the optical remote sensing sub-image to be detected according to the high-level, middle-level and low-level semantic fusion features.
In this embodiment, the detection head comprises a regression branch and a classification branch: the regression branch outputs the target detection frame of the optical remote sensing sub-image to be detected, and the classification branch outputs the classification-IoU score of the optical remote sensing sub-image to be detected. The classification-IoU score represents the joint distribution of the predicted class of a target and the IoU between its predicted coordinates and its real coordinates.
For clarity, the training process of the YOLOv5 target detection model is described as follows. First, a training dataset is obtained, comprising a plurality of training image samples together with the target coordinate label and target classification label of each sample. Note that unlike the original YOLOv5, whose target classification labels are the discrete values 0 and 1, in this embodiment the target classification label of a training image sample is a continuous value between 0 and 1. The training dataset is input into the YOLOv5 target detection model described above, the loss value during training is computed with the loss function, and the model parameters are optimized with a stochastic gradient descent (SGD) optimizer; the model parameters comprise the network structure parameters of YOLOv5 and the contribution weights set for each fusion feature. When the loss value computed after a batch of training image samples has been input into the model falls below a preset threshold, the YOLOv5 target detection model is considered converged and training is complete.
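Schematically, that training procedure reduces to the loop below. The model, data and loss here are deliberately simple stand-ins so the snippet stays self-contained; only the SGD optimizer, the back-propagation step and the loss-threshold convergence test come from the description above.

    import torch
    import torch.nn as nn

    model = nn.Conv2d(3, 16, 3, padding=1)   # stand-in for the YOLOv5 detection model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    loss_threshold = 0.05                    # assumed preset convergence threshold

    for step in range(1000):
        images = torch.randn(4, 3, 64, 64)    # stand-in training batch
        targets = torch.randn(4, 16, 64, 64)  # stand-in labels
        loss = nn.functional.mse_loss(model(images), targets)  # stand-in for the QFL-based loss
        optimizer.zero_grad()
        loss.backward()      # back-propagation
        optimizer.step()     # SGD parameter update
        if loss.item() < loss_threshold:  # converged per the criterion above
            break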
In this embodiment, the classification loss function of the YOLOv5 target detection model is the Quality Focal Loss:
Loss = -|y - σ|^β · ((1 - y)·log(1 - σ) + y·log σ)    (2)
where y denotes the classification-IoU score, σ denotes the output of the classification branch of the detection head, and β denotes the modulating factor, normally set to 2.
The classification-IoU score of a training image sample is calculated as: y = A × l;
wherein A represents the intersection-over-union (IoU) between the predicted coordinates of the training image sample and the corresponding target coordinate label (the predicted coordinates are obtained by decoding the output of the regression branch of the detection head), and l represents the target classification label of the training image sample.
In this embodiment, the classification branch of the YOLOv5 detection head is converted from directly predicting class information to predicting the classification-IoU score, further strengthening the relation between the outputs of the classification branch and the regression branch, and the original binary cross-entropy loss function is replaced with the Quality Focal Loss. The Quality Focal Loss is designed so that the model better learns to predict the classification-IoU score; it guides the model to combine classification information with coordinate regression information, thereby improving model precision.
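The Quality Focal Loss and the classification-IoU target y = A × l can be sketched in PyTorch as follows. The (x1, y1, x2, y2) box format, the clamping for numerical stability, and the sample values are illustrative assumptions.

    import torch

    def quality_focal_loss(sigma: torch.Tensor, y: torch.Tensor, beta: float = 2.0):
        """Loss = -|y - sigma|^beta * ((1 - y) * log(1 - sigma) + y * log(sigma))."""
        sigma = sigma.clamp(1e-6, 1 - 1e-6)  # avoid log(0)
        ce = (1 - y) * torch.log(1 - sigma) + y * torch.log(sigma)
        return -(torch.abs(y - sigma) ** beta * ce).mean()

    def iou(box_a: torch.Tensor, box_b: torch.Tensor) -> torch.Tensor:
        """IoU of axis-aligned boxes given as (x1, y1, x2, y2)."""
        lt = torch.maximum(box_a[..., :2], box_b[..., :2])
        rb = torch.minimum(box_a[..., 2:], box_b[..., 2:])
        inter = (rb - lt).clamp(min=0).prod(-1)
        area_a = (box_a[..., 2:] - box_a[..., :2]).prod(-1)
        area_b = (box_b[..., 2:] - box_b[..., :2]).prod(-1)
        return inter / (area_a + area_b - inter + 1e-6)

    # Target y = A * l: IoU of the decoded prediction vs. the coordinate label,
    # scaled by the continuous classification label l in [0, 1].
    pred_box = torch.tensor([10.0, 10.0, 50.0, 50.0])
    gt_box = torch.tensor([12.0, 8.0, 48.0, 52.0])
    l = torch.tensor(1.0)
    y = iou(pred_box, gt_box) * l
    loss = quality_focal_loss(torch.tensor(0.7), y)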
Step 4: merging the sub-image detection results to obtain the detection result of the optical remote sensing image to be detected;
Specifically, step 4 comprises:
Step 4.1: filtering and de-duplicating the sub-image detection results;
In this embodiment, targets with a low predicted probability are filtered out of the sub-image detection results, targets with a high probability are retained, and overlapping predicted targets are removed with a non-maximum suppression algorithm.
Step 4.2: merging the filtered and de-duplicated sub-image detection results to obtain the detection result of the optical remote sensing image to be detected.
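A schematic sketch of this merging step, using torchvision's non-maximum suppression, is given below. The score threshold, IoU threshold and per-sub-image result format are illustrative assumptions (the patent does not give these values), and NMS is applied class-agnostically for brevity.

    import torch
    from torchvision.ops import nms

    def merge_subimage_results(results, score_thr: float = 0.3, iou_thr: float = 0.5):
        """results: list of (x0, y0, boxes[N, 4], scores[N]) per sub-image."""
        all_boxes, all_scores = [], []
        for x0, y0, boxes, scores in results:
            keep = scores > score_thr  # drop targets with low predicted probability
            boxes, scores = boxes[keep], scores[keep]
            # Shift boxes back into full-image coordinates using the tile offset.
            boxes = boxes + torch.tensor([x0, y0, x0, y0], dtype=boxes.dtype)
            all_boxes.append(boxes)
            all_scores.append(scores)
        boxes = torch.cat(all_boxes)
        scores = torch.cat(all_scores)
        keep = nms(boxes, scores, iou_thr)  # remove overlapping duplicate predictions
        return boxes[keep], scores[keep]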
In the YOLOv5-based optical remote sensing image target detection method of this embodiment, target detection on the optical remote sensing image is completed with the trained YOLOv5 target detection model, whose neck network is a CSP-BiFPN network. On the one hand, the CSP-BiFPN network introduces learnable parameters that automatically learn the importance of features at different down-sampling magnifications, makes full use of the information among these features, improves the discriminability of the features at each level, and can better distinguish targets of different sizes in the optical remote sensing image; on the other hand, the CSP-BiFPN network greatly enhances the feature extraction capability and generates more discriminative semantic fusion features, providing strong support for the model to further distinguish targets of different scales.
In addition, in the YOLOv5-based optical remote sensing image target detection method of this embodiment, the classification branch of the YOLOv5 detection head is converted from directly predicting class information to predicting the classification-IoU score, which further strengthens the relation between the outputs of the classification branch and the regression branch and thereby improves model precision.
It should be noted that, in this document, relational terms such as first and second are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual such relationship or order between these entities or actions. Furthermore, the terms "comprises", "comprising" and any other variation thereof are intended to cover a non-exclusive inclusion, such that an article or device comprising a list of elements includes not only those elements but may also include other elements not expressly listed. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the article or device comprising that element. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections and may include electrical connections, whether direct or indirect.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments, and the specific implementation of the invention should not be construed as limited to these descriptions. Those of ordinary skill in the art to which the invention belongs may make several simple deductions or substitutions without departing from the concept of the invention, all of which shall be regarded as falling within the protection scope of the invention.

Claims (8)

1. A YOLOv5-based optical remote sensing image target detection method, characterized by comprising the following steps:
step 1: acquiring an optical remote sensing image to be detected, wherein the optical remote sensing image to be detected contains a target to be detected;
step 2: cutting the optical remote sensing image to be detected into a plurality of optical remote sensing sub-images to be detected;
step 3: inputting the optical remote sensing sub-images to be detected into a pre-trained YOLOv5 target detection model to obtain corresponding sub-image detection results, wherein each detection result comprises a target detection frame and a classification-IoU score;
step 4: merging the sub-image detection results to obtain the detection result of the optical remote sensing image to be detected;
the YOLOv5 target detection model comprises a backbone network, a neck network and a detection head which are cascaded, wherein the neck network is a CSP-BiFPN network.
2. The YOLOv5-based optical remote sensing image target detection method according to claim 1,
wherein the backbone network is used for extracting features from the input optical remote sensing sub-images to be detected to obtain high-level semantic features, middle-level semantic features and low-level semantic features corresponding to the optical remote sensing sub-images to be detected;
the neck network is used for fusing the high-level semantic features, the middle-level semantic features and the low-level semantic features to obtain high-level semantic fusion features, middle-level semantic fusion features and low-level semantic fusion features;
the detection head is used for determining and outputting the detection result of the optical remote sensing sub-image to be detected according to the high-level, middle-level and low-level semantic fusion features.
3. The YOLOv5-based optical remote sensing image target detection method according to claim 2, wherein the CSP-BiFPN network comprises a plurality of CSP-BiFPN sub-networks connected in series, each CSP-BiFPN sub-network comprising a high-level CSP2-n unit, a middle-level first CSP2-n unit, a middle-level second CSP2-n unit and a low-level CSP2-n unit, wherein,
the low-level CSP2-n unit performs feature fusion on the low-level semantic features and the up-sampled intermediate fusion features to obtain the low-level semantic fusion features;
the middle-level first CSP2-n unit performs feature fusion on the up-sampled high-level semantic features and the middle-level semantic features to obtain the intermediate fusion features;
the middle-level second CSP2-n unit performs feature fusion on the middle-level semantic features, the intermediate fusion features and the down-sampled low-level semantic fusion features to obtain the middle-level semantic fusion features;
and the high-level CSP2-n unit performs feature fusion on the high-level semantic features and the down-sampled middle-level semantic fusion features to obtain the high-level semantic fusion features.
4. The YOLOv5-based optical remote sensing image target detection method according to claim 2, wherein the detection head comprises a regression branch and a classification branch, the regression branch outputs the target detection frame of the optical remote sensing sub-image to be detected, and the classification branch outputs the classification-IoU score of the optical remote sensing sub-image to be detected.
5. The YOLOv5-based optical remote sensing image target detection method according to claim 1, wherein the YOLOv5 target detection model is trained based on a plurality of training image samples and a label corresponding to each training image sample, the label comprising a target coordinate label and a target classification label.
6. The YOLOv5-based optical remote sensing image target detection method according to claim 5, wherein the target classification label is a continuous value between 0 and 1.
7. The YOLOv5-based optical remote sensing image target detection method according to claim 5, wherein the classification loss function of the YOLOv5 target detection model is:
Loss = -|y - σ|^β · ((1 - y)·log(1 - σ) + y·log σ);
wherein y represents the classification-IoU score, σ represents the output of the classification branch of the detection head, and β represents the modulating factor;
the classification-IoU score of a training image sample is calculated as: y = A × l;
wherein A represents the intersection-over-union (IoU) between the predicted coordinates of the training image sample and the corresponding target coordinate label, the predicted coordinates being obtained by decoding the output of the regression branch of the detection head, and l represents the target classification label of the training image sample.
8. The YOLOv5-based optical remote sensing image target detection method according to claim 1, wherein step 4 comprises:
step 4.1: filtering and de-duplicating the sub-image detection results;
step 4.2: merging the filtered and de-duplicated sub-image detection results to obtain the detection result of the optical remote sensing image to be detected.
CN202210909740.1A 2022-07-29 2022-07-29 YOLOv5-based optical remote sensing image target detection method Active CN115272242B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210909740.1A CN115272242B (en) 2022-07-29 2022-07-29 YOLOv5-based optical remote sensing image target detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210909740.1A CN115272242B (en) 2022-07-29 2022-07-29 YOLOv5-based optical remote sensing image target detection method

Publications (2)

Publication Number Publication Date
CN115272242A (en) 2022-11-01
CN115272242B CN115272242B (en) 2024-02-27

Family

ID=83746239

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210909740.1A Active CN115272242B (en) 2022-07-29 2022-07-29 YOLOv 5-based optical remote sensing image target detection method

Country Status (1)

Country Link
CN (1) CN115272242B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116385903A (en) * 2023-05-29 2023-07-04 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Anti-distortion on-orbit target detection method and model for 1-level remote sensing data

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110633661A (en) * 2019-08-31 2019-12-31 南京理工大学 Semantic segmentation fused remote sensing image target detection method
CN110909642A (en) * 2019-11-13 2020-03-24 南京理工大学 Remote sensing image target detection method based on multi-scale semantic feature fusion
CN111091105A (en) * 2019-12-23 2020-05-01 郑州轻工业大学 Remote sensing image target detection method based on new frame regression loss function
CN111126202A (en) * 2019-12-12 2020-05-08 天津大学 Optical remote sensing image target detection method based on void feature pyramid network
WO2020232905A1 (en) * 2019-05-20 2020-11-26 平安科技(深圳)有限公司 Superobject information-based remote sensing image target extraction method, device, electronic apparatus, and medium
CN112364719A (en) * 2020-10-23 2021-02-12 西安科锐盛创新科技有限公司 Method for rapidly detecting remote sensing image target
CN112507777A (en) * 2020-10-10 2021-03-16 厦门大学 Optical remote sensing image ship detection and segmentation method based on deep learning
CN112668390A (en) * 2020-11-17 2021-04-16 福建省星云大数据应用服务有限公司 High-efficiency single remote sensing image target detection method and system
CN113569194A (en) * 2021-06-10 2021-10-29 中国人民解放军海军工程大学 Rotating rectangular box representation and regression method for target detection
CN114359565A (en) * 2021-12-14 2022-04-15 阿里巴巴(中国)有限公司 Image detection method, storage medium and computer terminal

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020232905A1 (en) * 2019-05-20 2020-11-26 平安科技(深圳)有限公司 Superobject information-based remote sensing image target extraction method, device, electronic apparatus, and medium
CN110633661A (en) * 2019-08-31 2019-12-31 南京理工大学 Semantic segmentation fused remote sensing image target detection method
CN110909642A (en) * 2019-11-13 2020-03-24 南京理工大学 Remote sensing image target detection method based on multi-scale semantic feature fusion
CN111126202A (en) * 2019-12-12 2020-05-08 天津大学 Optical remote sensing image target detection method based on void feature pyramid network
CN111091105A (en) * 2019-12-23 2020-05-01 郑州轻工业大学 Remote sensing image target detection method based on new frame regression loss function
CN112507777A (en) * 2020-10-10 2021-03-16 厦门大学 Optical remote sensing image ship detection and segmentation method based on deep learning
CN112364719A (en) * 2020-10-23 2021-02-12 西安科锐盛创新科技有限公司 Method for rapidly detecting remote sensing image target
CN112668390A (en) * 2020-11-17 2021-04-16 福建省星云大数据应用服务有限公司 High-efficiency single remote sensing image target detection method and system
CN113569194A (en) * 2021-06-10 2021-10-29 中国人民解放军海军工程大学 Rotating rectangular box representation and regression method for target detection
CN114359565A (en) * 2021-12-14 2022-04-15 阿里巴巴(中国)有限公司 Image detection method, storage medium and computer terminal

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
XIAOQI WANG et al.: "Improved YOLOv5 with BiFPN on PCB Defect Detection", 2021 2ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTER ENGINEERING (ICAICE)
吴萌萌 et al.: "Small Target Detection Network Based on Adaptive Feature Enhancement", Laser & Optoelectronics Progress, pages 1-14
周旗开 et al.: "Ship Classification and Detection Method for Optical Remote Sensing Images Based on Improved YOLOv5s", Laser & Optoelectronics Progress, pages 1-5
王新 et al.: "Traffic Police Gesture Recognition Based on an Improved YOLOv5 Algorithm", Electronic Measurement Technology, pages 0-4
郎磊 et al.: "Lightweight Remote Sensing Image Target Detection Model Based on YOLOX-Tiny", Laser & Optoelectronics Progress, pages 1-18

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116385903A (en) * 2023-05-29 2023-07-04 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Anti-distortion on-orbit target detection method and model for 1-level remote sensing data
CN116385903B (en) * 2023-05-29 2023-09-19 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Anti-distortion on-orbit target detection method and model for 1-level remote sensing data

Also Published As

Publication number Publication date
CN115272242B (en) 2024-02-27

Similar Documents

Publication Publication Date Title
CN110610166B (en) Text region detection model training method and device, electronic equipment and storage medium
CN112380921A (en) Road detection method based on Internet of vehicles
CN111950453A (en) Optional-shape text recognition method based on selective attention mechanism
CN113936256A (en) Image target detection method, device, equipment and storage medium
CN111651474B (en) Method and system for converting natural language into structured query language
CN111368636B (en) Object classification method, device, computer equipment and storage medium
CN111260666B (en) Image processing method and device, electronic equipment and computer readable storage medium
CN111931859B (en) Multi-label image recognition method and device
CN111062451A (en) Image description generation method based on text guide graph model
CN111612789A (en) Defect detection method based on improved U-net network
CN112381837A (en) Image processing method and electronic equipment
CN114996511A (en) Training method and device for cross-modal video retrieval model
CN115272242B (en) YOLOv 5-based optical remote sensing image target detection method
CN111898608B (en) Natural scene multi-language character detection method based on boundary prediction
CN116258931B (en) Visual finger representation understanding method and system based on ViT and sliding window attention fusion
CN113780241B (en) Acceleration method and device for detecting remarkable object
CN116188361A (en) Deep learning-based aluminum profile surface defect classification method and device
CN115457385A (en) Building change detection method based on lightweight network
CN115270754A (en) Cross-modal matching method, related device, electronic equipment and storage medium
CN114140806A (en) End-to-end real-time exercise detection method
CN115063831A (en) High-performance pedestrian retrieval and re-identification method and device
CN114743045A (en) Small sample target detection method based on double-branch area suggestion network
CN113223006A (en) Lightweight target semantic segmentation method based on deep learning
Yang et al. Road Damage Detection and Classification Based on Multi-Scale Contextual Features
CN113963150B (en) Pedestrian re-identification method based on multi-scale twin cascade network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant