CN113971764A - Remote sensing image small target detection method based on improved YOLOv3


Info

Publication number
CN113971764A
CN113971764A
Authority
CN
China
Prior art keywords
training
yolov3
remote sensing
network
picture
Prior art date
Legal status
Granted
Application number
CN202111269827.9A
Other languages
Chinese (zh)
Other versions
CN113971764B (en)
Inventor
李国强
常轩
Current Assignee
Yanshan University
Original Assignee
Yanshan University
Priority date
Filing date
Publication date
Application filed by Yanshan University
Priority to CN202111269827.9A
Publication of CN113971764A
Application granted
Publication of CN113971764B
Legal status: Active
Anticipated expiration

Classifications

    • G06F18/241: Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N3/045: Neural networks; architectures, e.g. interconnection topology; combinations of networks
    • G06N3/084: Neural networks; learning methods; backpropagation, e.g. using gradient descent


Abstract

The invention discloses a small-target detection method for remote sensing images based on an improved YOLOv3, belonging to the technical field of deep learning and target detection. The method comprises: preprocessing the data set; optimizing the YOLOv3 network by adding a dilated convolution group module, a feature enhancement module and a channel attention module to the Neck; online data enhancement; forward inference; improving the loss function; and loading into the network the YOLOv3 model with the highest detection precision and recall on the validation set. By adding the dilated convolution group module, the feature enhancement module and the channel attention module to the original YOLOv3 network and improving the loss function, the detection network's performance is markedly improved: targets in remote sensing images are detected more comprehensively and with higher precision, and both the training speed and the overall detection precision are increased.

Description

Remote sensing image small target detection method based on improved YOLOv3
Technical Field
The invention relates to the field of deep learning and target detection, in particular to a remote sensing image small target detection method based on improved YOLOv3.
Background
With the development of deep learning and neural networks, computer vision has advanced rapidly. Within this field, target detection and recognition techniques are widely studied and applied in practice, bringing great convenience to daily life. For example, applied on unmanned aerial vehicles, they can automatically identify specific targets in remote sensing images and replace manual labor in efficiently completing repetitive work. However, much target detection work still faces the following problems:
1. most targets are small, covering only dozens of pixels, which makes them hard to find and identify;
2. backgrounds are complex and interference factors are numerous, such as shooting angle, illumination change, similar targets and object occlusion, which easily cause misjudgment and hinder detection.
After comparing several conventional target detection networks in common use, the YOLOv3 (You Only Look Once v3) detection algorithm was selected: it detects quickly, its recognition accuracy is high, and better detection performance can be obtained by improving on this basis. The ideas for improving the network are as follows:
1. In the basic backbone network, as the depth increases, each down-sampling enlarges the receptive field but shrinks the image and reduces its resolution, so some target features are lost even after up-sampling. Dilated convolution can solve this: it enlarges the receptive field, letting the model observe a larger range of the picture and detect targets more comprehensively, and the larger feature maps contain more target information, which benefits localization and classification.
2. When people look at things, they first scan the whole area and then focus on the parts of interest, extracting the useful information from a complex background; this is the idea behind attention mechanisms. Computer vision has many similarities with human vision, and the basic idea is to let the machine learn to suppress interfering factors and capture key information. Using such an algorithm for detection improves accuracy to a certain extent.
3. In the loss function between the real box and the prediction box of YOLOv3, the coordinate loss uses mean squared error while the confidence and class losses use cross entropy, the parts being summed into an overall error. Computed this way, the loss cannot fully reflect the positional relationship and overlap of the two boxes, nor the direction in which the predicted box should move, so the loss function needs to be improved.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a remote sensing image small target detection method based on improved YOLOv3, in which the improved YOLOv3 detection network has markedly better performance, detects targets in remote sensing images more comprehensively and with higher precision, and trains faster.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
a remote sensing image small target detection method based on improved YOLOv3 comprises the following steps:
step 1, data set preprocessing: acquiring training remote sensing images to form a data set, converting the format of the data set, randomly dividing it into a training-validation set and a test set, and evaluating the YOLOv3 network model by cross validation during training;
step 2, optimizing the YOLOv3 network: adding a dilated convolution group module, a feature enhancement module and a channel attention module to the Neck;
step 3, online data enhancement: randomly selecting the same number of pictures from the training set each time and inputting them into the optimized YOLOv3 network after online image data enhancement;
step 4, forward inference: the Head of the optimized YOLOv3 network infers object coordinates and classes from the fused features, producing the bounding-box coordinates, object classes and confidences of the enclosed target objects;
step 5, improving the loss function: iteratively training and updating parameters according to the loss values, and evaluating on the validation set after each iteration;
step 6, finishing training: selecting the optimized YOLOv3 network model with the highest detection precision and recall on the validation set and loading it into the network.
The technical scheme of the invention is further improved as follows: in step 1, the data set preprocessing specifically includes: converting the annotation information of the data set into VOC format and randomly dividing it into a training-validation set and a test set; the sets do not interfere with each other and share no pictures, which prevents data contamination;
the YOLOv3 network model is evaluated by cross validation during training: the training-validation set is first randomly divided at a ratio of 8:1 into a training set and a validation set; the training set is used for model training and weight updating, and the validation set is used to evaluate the YOLOv3 network model obtained after each round of training. A sketch of this split is given below.
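A minimal sketch of the split just described. The 9:1 training-validation/test ratio used here is an assumption (the source text only preserves the 8:1 train/validation ratio), and the function name and seed are illustrative, not part of the patent.

```python
import random

def split_dataset(image_ids, seed=0):
    rng = random.Random(seed)
    ids = list(image_ids)
    rng.shuffle(ids)

    n_test = len(ids) // 10            # assumed 9:1 train-val vs. test split
    test_set, trainval = ids[:n_test], ids[n_test:]

    n_val = len(trainval) // 9         # 8:1 train vs. validation, per the text
    val_set, train_set = trainval[:n_val], trainval[n_val:]
    return train_set, val_set, test_set
```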
The technical scheme of the invention is further improved as follows: in step 2,
the dilated convolution group module can adapt to multi-scale image input and enlarge the network's receptive field;
the feature enhancement module can fuse shallow features, which are rich in object position information but poor in semantic information, with deep features, which are rich in object semantic information but poor in position information, thereby fusing features of different resolutions;
the channel attention module can eliminate interference, extract from a complex background the object feature information most critical to detection, and assign a weight to each channel of the features to strengthen the global features.
The technical scheme of the invention is further improved as follows: the calculation formula of the channel attention mechanism is as follows:
global average pooling:

$$ y = \frac{1}{W \times H} \sum_{i=1}^{H} \sum_{j=1}^{W} x_{i,j} $$

wherein W, H represent the width and height of the feature map, and $x_{i,j}$ represents the value of the point in row i, column j on each channel of the feature map;

channel convolution:

$$ \omega_i = \sigma\Big( \sum_{j=1}^{k} \alpha_j\, y_j \Big), \quad y_j \in \Omega_i^{k}, \qquad \omega = \sigma\big(\mathrm{C1D}_k(y)\big) $$

in the formula, because global average pooling has already been performed, i = 1; $y_j$ denotes the j-th channel, $\Omega_i^{k}$ denotes the set of k channels adjacent to $y_j$, $\alpha_j$ denotes the j-th channel weight, $\sigma$ denotes the sigmoid function, and $\omega_i$ denotes the i-th weight;

solving for k:

$$ k = \psi(C) = \left| \frac{\log_2 C}{\gamma} + \frac{b}{\gamma} \right|_{odd} $$

where C represents the given number of channels, $\gamma$ and b equal 2 and 1 respectively, and $|\cdot|_{odd}$ denotes taking the nearest odd number.
The technical scheme of the invention is further improved as follows: in step 3, the online data enhancement techniques comprise photometric distortion, geometric distortion, simulated occlusion and multi-image fusion;
photometric distortion mainly changes the pixel values of the picture, for example: random brightness variation, random contrast variation, random saturation variation, random chromaticity variation and random noise addition;
geometric distortion mainly changes the shape of the picture, for example: random cropping, random rotation and random angle transformation;
simulated occlusion means randomly erasing small blocks of the picture, i.e. setting their pixels to pure black;
multi-image fusion means randomly cutting a common part of one picture to replace the part at the same position in another picture, or overlaying two pictures and superimposing their pixel values.
The technical scheme of the invention is further improved as follows: in step 5, the position loss in the loss function is calculated with DIOU, which considers three factors, namely the overlap between the detection box and the real box, the direction in which the prediction box should move, and the positional relationship of the two boxes, and can directly minimize the distance between the two target boxes, so convergence is faster;
the improved loss function formula is:

$$ Loss = \frac{1}{N} \big( Loss_{DIOU} + Loss_{confi} + Loss_{cls} \big) $$

in the formula, $Loss_{DIOU}$ is the total DIOU loss on one picture, $Loss_{confi}$ is the total confidence loss on one picture, $Loss_{cls}$ is the total class loss on one picture, and N is the number of targets on one picture.
The technical scheme of the invention is further improved as follows: in step 6, the model with the highest precision and recall when evaluated on the validation set is selected and loaded into the network, and the detection effect of the model is obtained on the test set.
Due to the adoption of the above technical scheme, the invention achieves the following technical progress:
1. The feature enhancement module fuses shallow features carrying more position information with deep features carrying more semantic information, increasing the amount of information available to the Head inference layer.
2. The dilated convolution group module enlarges the receptive field without changing the resolution; the channel attention mechanism lets the network extract more detection information from complex backgrounds and helps eliminate interference.
3. After the loss function is improved, the predicted box fits the target more closely.
4. The improved YOLOv3 detection network performs markedly better, detects targets in remote sensing images more comprehensively and with higher precision, and trains faster.
Drawings
FIG. 1 is the overall structure diagram of YOLOv3 in the present invention;
FIG. 2 is the overall scheme of the improvements to YOLOv3 in the present invention;
FIG. 3 is a diagram of the SPP module in the present invention;
FIG. 4 is a diagram of the RFB module in the present invention;
FIG. 5 is a diagram of the SFM dilated convolution group module used in the present invention;
FIG. 6 is a diagram of the CSP module in the present invention;
FIG. 7 is a diagram of the FEM feature enhancement module used in the present invention;
FIG. 8 is a diagram of the ECA channel attention module used in the present invention;
FIG. 9 is a diagram of the IOU and DIOU in the present invention.
Detailed Description
The invention is described in further detail below with reference to the following figures and examples:
the invention provides a remote sensing image small target detection method based on improved YOLOv3, which improves training speed and overall accuracy by improving a loss function and adding a feature enhancement module channel attention mechanism module in a YOLOv3 original network (shown in figure 1), and the improved scheme is shown in figure 2.
In this patent application:
SPP is the abbreviation of Spatial Pyramid Pooling;
RFB is the abbreviation of Receptive Field Block;
SFM is the abbreviation of Spatial and Field Model;
CSP is the abbreviation of Cross Stage Partial connections;
FEM is the abbreviation of Feature Enhanced Model;
ECA is the abbreviation of Efficient Channel Attention;
IOU is the abbreviation of Intersection over Union;
DIOU is the abbreviation of Distance IOU.
As shown in FIGS. 2-9, a remote sensing image small target detection method based on improved YOLOv3 includes randomly dividing the data set into a training-validation set and a test set, and evaluating the model by cross validation during training. Online data enhancement, comprising geometric distortion, photometric distortion, simulated occlusion and picture fusion, is applied to the training pictures. The basic backbone network Darknet-53 extracts object features from shallow to deep in the enhanced pictures. A feature enhancement module FEM is added to the Neck to fuse features of different resolutions; a dilated convolution group SFM is added to adapt to multi-scale input and expand the receptive field; and an ECA channel attention module is embedded to weight each channel of the features and strengthen the global features. The Head infers object coordinates and classes from the fused features. During training, the Head's inference result is compared with the target label and the error is computed with the error function, in which the original MSE (mean squared error) is replaced with the DIOU loss; the parameters to be updated are then updated by error back-propagation and stochastic gradient descent. During testing, NMS is applied to the Head's result so that each prediction box corresponds to only one target, and the accuracy is computed by comparing all prediction boxes with the real boxes. The detection process is similar to the test process, except that accuracy is no longer computed and the prediction boxes are displayed directly on the picture.
Examples
A remote sensing image small target detection method based on improved YOLOv3 specifically comprises the following steps:
step 1: acquiring training remote sensing images to form a data set, converting the labeled information of the data set into a VOC format, and carrying out the following steps of: the proportion of 1 is randomly divided into a training verification set and a test set, the sets are not interfered with each other, and the data are prevented from being polluted because of no same picture. Cross validation is adopted during training, namely a training validation set is firstly processed according to the following steps of 8: the proportion of 1 is randomly divided into a training set and a verification set, the training set is used for model training and weight updating, and the verification set is used for model evaluation obtained when each round of training is finished.
Step 2: improving the Neck. As the Darknet-53 network deepens, the extracted target features change from concrete to abstract.
In order to adapt to targets of inconsistent position and size at different resolutions while expanding the receptive field, and drawing on the ideas of the SPP and RFB networks, the SFM is proposed, as shown in FIG. 5: the input features pass through three dilated convolution branches with different dilation rates, the three results are concatenated with the original features on the channel dimension, and the fused features are passed to the next module. A sketch of this structure is given below.
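A minimal PyTorch sketch of the SFM idea just described, given as an assumption-laden illustration rather than the patent's exact design: the kernel size (3x3), the dilation rates (1, 3, 5) and the 1x1 fusion convolution are not specified in the text.

```python
import torch
import torch.nn as nn

class SFM(nn.Module):
    """Dilated convolution group: three parallel dilated branches
    concatenated with the input on the channel dimension."""
    def __init__(self, channels, dilations=(1, 3, 5)):
        super().__init__()
        # padding == dilation keeps the spatial size for a 3x3 kernel.
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations
        )
        # 1x1 conv restores the channel count after concatenation (assumed).
        self.fuse = nn.Conv2d(channels * (len(dilations) + 1), channels, 1)

    def forward(self, x):
        out = torch.cat([x] + [branch(x) for branch in self.branches], dim=1)
        return self.fuse(out)
```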
Shallow features are rich in object position information but poor in semantic information, while deep features are rich in object semantic information but poor in position information. To let the two complement each other and make the target information more comprehensive, the feature enhancement module FEM is proposed, drawing on the idea of the CSP module. The FEM passes the 4x down-sampled feature through a 3x3 convolution kernel and concatenates the result with the 8x down-sampled feature on the channel dimension, passes the 8x down-sampled feature through a 3x3 convolution kernel and concatenates the result with the 16x down-sampled feature, and then feeds each of the two concatenated features through its own 1x1 convolution kernel into the next module. A sketch of this wiring is given below.
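A minimal PyTorch sketch of the FEM wiring just described. The channel counts and the stride-2 setting of the 3x3 convolutions (needed so the shallower map matches the deeper map's resolution before concatenation) are assumptions not fixed by the text.

```python
import torch
import torch.nn as nn

class FEM(nn.Module):
    def __init__(self, c4, c8, c16):
        super().__init__()
        # 3x3 stride-2 convs bring the shallower map to the deeper resolution (assumed).
        self.down4 = nn.Conv2d(c4, c8, 3, stride=2, padding=1)
        self.down8 = nn.Conv2d(c8, c16, 3, stride=2, padding=1)
        # One 1x1 conv per concatenated branch, as described in the text.
        self.out8 = nn.Conv2d(2 * c8, c8, 1)
        self.out16 = nn.Conv2d(2 * c16, c16, 1)

    def forward(self, f4, f8, f16):
        # f4, f8, f16: features at 4x, 8x and 16x down-sampling.
        p8 = self.out8(torch.cat([self.down4(f4), f8], dim=1))
        p16 = self.out16(torch.cat([self.down8(f8), f16], dim=1))
        return p8, p16
```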
The channel attention mechanism ECA is applied after the two concatenated features to expand the observation range of each feature pixel; the ECA module is shown in FIG. 8. Each extracted feature map has many channels, and the features on the channels combine into the complete object features. To eliminate interference and extract key information, the ECA channel attention mechanism is embedded: channels carrying important feature components are given large weights, unimportant channels are given small weights, and the weighted features are then summed. The Neck is the second part of the YOLOv3 detection network, labeled in FIG. 1; the modified Neck is labeled in FIG. 2.
The ECA module implements a local cross-channel interaction strategy without dimensionality reduction: it adaptively selects the size of a one-dimensional convolution kernel and fuses the feature information of every k adjacent channels. First, global average pooling is applied to the input; then the one-dimensional convolution completes the channel convolution; next, the channel convolution result is passed through a Sigmoid function to obtain the weight of each channel; finally, each layer of the input features is multiplied by the weight of the corresponding layer. The ECA modules are embedded at the positions shown in FIG. 2, behind the two FEM modules respectively.
The formula for ECA is:
global average pooling:

$$ y = \frac{1}{W \times H} \sum_{i=1}^{H} \sum_{j=1}^{W} x_{i,j} $$

wherein W, H represent the width and height of the feature map, and $x_{i,j}$ represents the value of the point in row i, column j on each channel of the feature map.

channel convolution:

$$ \omega_i = \sigma\Big( \sum_{j=1}^{k} \alpha_j\, y_j \Big), \quad y_j \in \Omega_i^{k}, \qquad \omega = \sigma\big(\mathrm{C1D}_k(y)\big) $$

in the formula, because global average pooling has already been performed, i = 1; $y_j$ represents the j-th channel; $\Omega_i^{k}$ denotes the set of k channels adjacent to $y_j$; $\alpha_j$ represents the j-th channel weight; $\sigma$ denotes the sigmoid function; $\omega_i$ represents the i-th weight.

solving for k:

$$ k = \psi(C) = \left| \frac{\log_2 C}{\gamma} + \frac{b}{\gamma} \right|_{odd} $$

where C represents the given number of channels; $\gamma$ and b equal 2 and 1 respectively; $|\cdot|_{odd}$ denotes taking the nearest odd number.
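A minimal PyTorch sketch of the ECA computation defined by the formulas above: global average pooling, a k-sized one-dimensional convolution across channels, a sigmoid, then channel-wise re-weighting. The module name and defaults are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        # k = |log2(C)/gamma + b/gamma|, rounded to the nearest odd number.
        k = int(abs(math.log2(channels) / gamma + b / gamma))
        k = k if k % 2 == 1 else k + 1
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):                   # x: (N, C, H, W)
        y = x.mean(dim=(2, 3))              # global average pooling -> (N, C)
        y = self.conv(y.unsqueeze(1))       # 1D conv over channels -> (N, 1, C)
        w = torch.sigmoid(y).squeeze(1)     # per-channel weights -> (N, C)
        return x * w[:, :, None, None]      # re-weight each channel
```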
Step 3: the online data enhancement techniques mainly comprise photometric distortion, geometric distortion, simulated occlusion and multi-image fusion. Photometric distortion mainly changes the pixel values of the picture, for example: random brightness variation, random contrast variation, random saturation variation, random chromaticity variation and random noise addition. Geometric distortion mainly changes the shape of the picture, for example: random cropping, random rotation and random angle transformation. Simulated occlusion randomly erases small blocks of the picture (i.e. sets their pixels to pure black). Multi-image fusion randomly cuts a common part of one picture to replace the part at the same position in another picture, or overlays two pictures and superimposes their pixel values; such techniques include Mix_Up, Cut_Mix and style_transfer_GAN. The purposes of data enhancement are: 1. to increase the amount of training data and improve the generalization ability of the model; 2. to increase the diversity of the pictures and improve the robustness of the model. A sketch of these augmentations follows.
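A minimal sketch of the four augmentation families, using torchvision transforms where ready-made ones exist; the parameter values are illustrative assumptions, and Mix_Up is shown by hand. Box labels would also need adjusting for the geometric transforms; that bookkeeping is omitted here.

```python
import torchvision.transforms as T

photometric = T.ColorJitter(brightness=0.3, contrast=0.3,
                            saturation=0.3, hue=0.1)   # photometric distortion
geometric = T.Compose([T.RandomRotation(15),
                       T.RandomResizedCrop(416)])      # geometric distortion
occlusion = T.RandomErasing(p=0.5, value=0)            # simulated occlusion (black patch)

def mix_up(img_a, img_b, lam=0.5):
    """Multi-image fusion: overlay two image tensors pixel-wise (Mix_Up style)."""
    return lam * img_a + (1.0 - lam) * img_b
```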
Step 4: forward inference and result output.
The Head is the third part of the YOLOv3 detection network, labeled in FIG. 1. From the features fused by the Neck it outputs a tensor of S x S x (3 x (4 + 1 + 20)): each picture is mapped to S x S cells, each cell contains three detection results, and each detection result comprises the 4 coordinates of one object's detection box, 1 confidence and 20 class prediction scores.
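A small sketch of how such a Head tensor can be unpacked; the layout (boxes, then confidence, then class scores) matches the description above, while the function and variable names are assumptions.

```python
import torch

def split_head_output(pred, num_classes=20):
    # pred: (N, 3 * (4 + 1 + num_classes), S, S) raw Head output.
    n, _, s, _ = pred.shape
    pred = pred.view(n, 3, 5 + num_classes, s, s)
    boxes = pred[:, :, 0:4]        # 4 detection-box coordinates per prediction
    conf = pred[:, :, 4:5]         # 1 objectness confidence
    cls_scores = pred[:, :, 5:]    # 20 class prediction scores
    return boxes, conf, cls_scores
```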
Step 5: calculating errors and updating parameters. Steps 1 to 4 form the forward part of the YOLOv3 network; the error between the inference result and the corresponding label is computed, propagated backwards along the forward path, and all weight parameters are updated along the gradient direction, one forward-backward pass constituting one iteration. Errors are divided into position error, confidence error and class error. The position error uses the DIOU loss; the error calculation of the other parts is unchanged. DIOU can directly minimize the distance between the two target boxes and therefore converges faster.
The IOU calculation formula is:
$$ IOU = \frac{|A \cap B|}{|A \cup B|} $$

in the formula, A and B are the real box and the prediction box, and IOU is the ratio of the intersection of the two boxes' areas to their union.
The DIOU calculation formula is as follows:
$$ DIOU = IOU - \frac{\rho^2(b_A, b_B)}{c^2} $$

in the formula, $\rho^2(b_A, b_B)$ is the squared Euclidean distance between the center points of the two boxes, and c is the diagonal length of the smallest rectangle enclosing the two boxes, as shown in FIG. 9.
The DIOU (position) loss for a single prediction box is:

$$ loss_{DIOU} = 1 - DIOU $$

A sketch of this computation is given below.
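A minimal PyTorch sketch of the per-box DIOU loss defined above. Boxes are assumed to be (x1, y1, x2, y2) tensors; the patent does not fix a box format.

```python
import torch

def diou_loss(pred, target, eps=1e-7):
    # IOU term: intersection over union of the two boxes.
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # rho^2: squared distance between the two box centers.
    rho2 = ((pred[:, 0] + pred[:, 2] - target[:, 0] - target[:, 2]) ** 2 +
            (pred[:, 1] + pred[:, 3] - target[:, 1] - target[:, 3]) ** 2) / 4

    # c^2: squared diagonal of the smallest rectangle enclosing both boxes.
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps

    return 1.0 - (iou - rho2 / c2)      # loss_DIOU = 1 - DIOU, per box
```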
the total DIOU loss on a picture is:
$$ Loss_{DIOU} = \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \big( 1 - DIOU_{ij} \big) $$

in the formula, $\lambda_{coord}$ represents the weight occupied by the DIOU loss.
Total confidence loss on one picture:
$$ Loss_{confi} = -\sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \Big[ \hat{C}_i^j \log C_i^j + \big(1-\hat{C}_i^j\big) \log\big(1-C_i^j\big) \Big] - \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \Big[ \hat{C}_i^j \log C_i^j + \big(1-\hat{C}_i^j\big) \log\big(1-C_i^j\big) \Big] $$

in the formula, the loss is divided into a confidence loss with target (first term) and a confidence loss without target (second term); $C_i^j$ represents the confidence of the j-th prediction box of the i-th cell; $\hat{C}_i^j$ represents the confidence of the j-th real box of the i-th cell; $\mathbb{1}_{ij}^{obj}$ equals 1 when the i-th cell contains a target and 0 otherwise; $\mathbb{1}_{ij}^{noobj}$ equals 1 when the j-th prediction box of the i-th cell contains no target and 0 otherwise; $\lambda_{noobj}$ represents the confidence error weight without target.
Total class loss on one picture:
$$ Loss_{cls} = -\sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \sum_{c \in classes} \Big[ \hat{p}_i^j(c) \log p_i^j(c) + \big(1-\hat{p}_i^j(c)\big) \log\big(1-p_i^j(c)\big) \Big] $$

in the formula, $c \in classes$ ranges over the 20 object classes; $p_i^j(c)$ represents the predicted probability of class c in the j-th prediction box of the i-th cell; $\hat{p}_i^j(c)$ refers to the j-th real box of the i-th cell, where the probability of the true class is 1 and the other class positions are 0.
The total loss on one picture is:
$$ Loss = \frac{1}{N} \big( Loss_{DIOU} + Loss_{confi} + Loss_{cls} \big) $$

in the formula, N is the number of targets on one picture.
The initial learning rate is set to 0.0001, the learning rate decay is set to 0.995, Batch_size is set to 10, the save interval is 2 iterations, the number of partial-weight training rounds is set to 35, the number of full-weight fine-tuning rounds is set to 15, the test score threshold is set to 0.3, the test IOU threshold is set to 0.5, and every few rounds the input picture size is randomly selected as 32 multiplied by a number from 10 to 19 (i.e. 320 to 608).
The training method adopts stochastic gradient descent, error back-propagation and learning rate decay, sketched below.
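A minimal sketch wiring the listed hyper-parameters into an SGD training loop with exponential learning-rate decay. `model`, `train_loader` and `compute_total_loss` are assumed placeholders (the total loss being the DIOU + confidence + class loss above), and the momentum value is also an assumption.

```python
import torch

def train(model, train_loader, compute_total_loss, rounds=35 + 15):
    # rounds = 35 partial-weight + 15 full-weight fine-tuning rounds, per the text.
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.995)
    for _ in range(rounds):
        for images, targets in train_loader:   # Batch_size = 10 per the settings above
            optimizer.zero_grad()
            loss = compute_total_loss(model(images), targets)
            loss.backward()                    # error back-propagation
            optimizer.step()                   # stochastic gradient descent step
        scheduler.step()                       # learning-rate decay per round
```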
The model obtained from each round of training is evaluated for precision and recall on the validation set, and the best-evaluated model is tested on the test set to measure its generalization ability.
Step 6: the best-evaluated weights are loaded and a picture is input; the network produces many boxes, which are filtered with NMS (non-maximum suppression) to keep the boxes with the highest confidence, and the enclosing boxes, target classes and confidences are displayed on the picture. A sketch of this filtering step follows.
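A minimal sketch of the detection-time filtering just described, using torchvision's NMS. The score threshold 0.3 and IOU threshold 0.5 come from the settings listed in step 5; the function signature is an assumption.

```python
import torch
from torchvision.ops import nms

def detect(boxes, scores, score_thr=0.3, iou_thr=0.5):
    # boxes: (M, 4) in (x1, y1, x2, y2); scores: (M,) confidence per box.
    keep = scores > score_thr                  # drop low-confidence boxes first
    boxes, scores = boxes[keep], scores[keep]
    kept = nms(boxes, scores, iou_thr)         # non-maximum suppression
    return boxes[kept], scores[kept]
```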

Claims (7)

1. A remote sensing image small target detection method based on improved YOLOv3, characterized in that the method comprises the following steps:
step 1, data set preprocessing: acquiring training remote sensing images to form a data set, converting the format of the data set, randomly dividing it into a training-validation set and a test set, and evaluating the YOLOv3 network model by cross validation during training;
step 2, optimizing the YOLOv3 network: adding a dilated convolution group module, a feature enhancement module and a channel attention module to the Neck;
step 3, online data enhancement: randomly selecting the same number of pictures from the training set each time and inputting them into the optimized YOLOv3 network after online image data enhancement;
step 4, forward inference: the Head of the optimized YOLOv3 network infers object coordinates and classes from the fused features, producing the bounding-box coordinates, object classes and confidences of the enclosed target objects;
step 5, improving the loss function: iteratively training and updating parameters according to the loss values, and evaluating on the validation set after each iteration;
step 6, finishing training: selecting the optimized YOLOv3 network model with the highest detection precision and recall on the validation set and loading it into the network.
2. The method for detecting small targets in remote sensing images based on improved YOLOv3 as claimed in claim 1, wherein: in step 1, the data set preprocessing specifically includes: converting the annotation information of the data set into VOC format and randomly dividing it into a training-validation set and a test set; the sets do not interfere with each other and share no pictures, which prevents data contamination;
the YOLOv3 network model is evaluated by cross validation during training: the training-validation set is first randomly divided at a ratio of 8:1 into a training set and a validation set; the training set is used for model training and weight updating, and the validation set is used to evaluate the model obtained after each round of training.
3. The method for detecting small targets in remote sensing images based on improved YOLOv3 as claimed in claim 1, wherein: in step 2,
the dilated convolution group module can adapt to multi-scale image input and enlarge the network's receptive field;
the feature enhancement module can fuse shallow features, which are rich in object position information but poor in semantic information, with deep features, which are rich in object semantic information but poor in position information, thereby fusing features of different resolutions;
the channel attention module can eliminate interference, extract from a complex background the object feature information most critical to detection, and assign a weight to each channel of the features to strengthen the global features.
4. The method for detecting the small target of the remote sensing image based on the improved YOLOv3 as claimed in claim 3, wherein: the calculation formula of the channel attention mechanism module is as follows:
global average pooling:

$$ y = \frac{1}{W \times H} \sum_{i=1}^{H} \sum_{j=1}^{W} x_{i,j} $$

wherein W, H represent the width and height of the feature map, and $x_{i,j}$ represents the value of the point in row i, column j on each channel of the feature map;

channel convolution:

$$ \omega_i = \sigma\Big( \sum_{j=1}^{k} \alpha_j\, y_j \Big), \quad y_j \in \Omega_i^{k}, \qquad \omega = \sigma\big(\mathrm{C1D}_k(y)\big) $$

in the formula, because global average pooling has already been performed, i = 1; $y_j$ denotes the j-th channel, $\Omega_i^{k}$ denotes the set of k channels adjacent to $y_j$, $\alpha_j$ denotes the j-th channel weight, $\sigma$ denotes the sigmoid function, and $\omega_i$ denotes the i-th weight;

solving for k:

$$ k = \psi(C) = \left| \frac{\log_2 C}{\gamma} + \frac{b}{\gamma} \right|_{odd} $$

where C represents the given number of channels, $\gamma$ and b equal 2 and 1 respectively, and $|\cdot|_{odd}$ denotes taking the nearest odd number.
5. The method for detecting small targets in remote sensing images based on improved YOLOv3 as claimed in claim 1, wherein: in step 3, the online data enhancement techniques comprise photometric distortion, geometric distortion, simulated occlusion and multi-image fusion;
photometric distortion mainly changes the pixel values of the picture, for example: random brightness variation, random contrast variation, random saturation variation, random chromaticity variation and random noise addition;
geometric distortion mainly changes the shape of the picture, for example: random cropping, random rotation and random angle transformation;
simulated occlusion means randomly erasing small blocks of the picture, i.e. setting their pixels to pure black;
multi-image fusion means randomly cutting a common part of one picture to replace the part at the same position in another picture, or overlaying two pictures and superimposing their pixel values.
6. The method for detecting small targets in remote sensing images based on improved YOLOv3 as claimed in claim 1, wherein: in step 5, the position loss in the loss function is calculated with DIOU, which considers three factors, namely the overlap between the detection box and the real box, the direction in which the prediction box should move, and the positional relationship of the two boxes, and can directly minimize the distance between the two target boxes, so convergence is faster;
the improved loss function formula is:

$$ Loss = \frac{1}{N} \big( Loss_{DIOU} + Loss_{confi} + Loss_{cls} \big) $$

in the formula, $Loss_{DIOU}$ is the total DIOU loss on one picture, $Loss_{confi}$ is the total confidence loss on one picture, $Loss_{cls}$ is the total class loss on one picture, and N is the number of targets on one picture.
7. The method for detecting small targets in remote sensing images based on improved YOLOv3 as claimed in claim 1, wherein: in step 6, the model with the highest precision and recall when evaluated on the validation set is selected and loaded into the network, and the detection effect of the model is obtained on the test set.
CN202111269827.9A 2021-10-29 2021-10-29 Remote sensing image small target detection method based on improved YOLOv3 Active CN113971764B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111269827.9A CN113971764B (en) Remote sensing image small target detection method based on improved YOLOv3

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111269827.9A CN113971764B (en) Remote sensing image small target detection method based on improved YOLOv3

Publications (2)

Publication Number Publication Date
CN113971764A (en) 2022-01-25
CN113971764B (en) 2024-05-14

Family

ID=79588953

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111269827.9A Active CN113971764B (en) Remote sensing image small target detection method based on improved YOLOv3

Country Status (1)

Country Link
CN (1) CN113971764B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114155365A (en) * 2022-02-07 2022-03-08 北京航空航天大学杭州创新研究院 Model training method, image processing method and related device
CN114692826A (en) * 2022-03-02 2022-07-01 华南理工大学 Light-weight target detection system without prior frame
CN114882241A (en) * 2022-05-20 2022-08-09 东南大学 Target detection method under complex background based on convolution attention mechanism
CN118230079A (en) * 2024-05-27 2024-06-21 中国科学院西安光学精密机械研究所 Detection method for remote sensing small target based on improved YOLO

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111723748A (en) * 2020-06-22 2020-09-29 电子科技大学 Infrared remote sensing image ship detection method
WO2020215236A1 (en) * 2019-04-24 2020-10-29 哈尔滨工业大学(深圳) Image semantic segmentation method and system
CN112529090A (en) * 2020-12-18 2021-03-19 天津大学 Small target detection method based on improved YOLOv3
CN112733749A (en) * 2021-01-14 2021-04-30 青岛科技大学 Real-time pedestrian detection method integrating attention mechanism
CN112819100A (en) * 2021-03-01 2021-05-18 深圳中湾智能科技有限公司 Multi-scale target detection method and device for unmanned aerial vehicle platform
CN112818903A (en) * 2020-12-10 2021-05-18 北京航空航天大学 Small sample remote sensing image target detection method based on meta-learning and cooperative attention
WO2021139069A1 (en) * 2020-01-09 2021-07-15 南京信息工程大学 General target detection method for adaptive attention guidance mechanism

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020215236A1 (en) * 2019-04-24 2020-10-29 哈尔滨工业大学(深圳) Image semantic segmentation method and system
WO2021139069A1 (en) * 2020-01-09 2021-07-15 南京信息工程大学 General target detection method for adaptive attention guidance mechanism
CN111723748A (en) * 2020-06-22 2020-09-29 电子科技大学 Infrared remote sensing image ship detection method
CN112818903A (en) * 2020-12-10 2021-05-18 北京航空航天大学 Small sample remote sensing image target detection method based on meta-learning and cooperative attention
CN112529090A (en) * 2020-12-18 2021-03-19 天津大学 Small target detection method based on improved YOLOv3
CN112733749A (en) * 2021-01-14 2021-04-30 青岛科技大学 Real-time pedestrian detection method integrating attention mechanism
CN112819100A (en) * 2021-03-01 2021-05-18 深圳中湾智能科技有限公司 Multi-scale target detection method and device for unmanned aerial vehicle platform

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
周敏; 史振威; 丁火平: "Convolutional neural network method for aircraft target classification in remote sensing images" (遥感图像飞机目标分类的卷积神经网络方法), Journal of Image and Graphics (中国图象图形学报), vol. 22, no. 5, 31 December 2017 (2017-12-31) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114155365A (en) * 2022-02-07 2022-03-08 北京航空航天大学杭州创新研究院 Model training method, image processing method and related device
CN114692826A (en) * 2022-03-02 2022-07-01 华南理工大学 Light-weight target detection system without prior frame
CN114882241A (en) * 2022-05-20 2022-08-09 东南大学 Target detection method under complex background based on convolution attention mechanism
CN118230079A (en) * 2024-05-27 2024-06-21 中国科学院西安光学精密机械研究所 Detection method for remote sensing small target based on improved YOLO
CN118230079B (en) * 2024-05-27 2024-08-30 中国科学院西安光学精密机械研究所 Detection method for remote sensing small target based on improved YOLO

Also Published As

Publication number Publication date
CN113971764B (en) 2024-05-14

Similar Documents

Publication Publication Date Title
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN107945204B (en) Pixel-level image matting method based on generation countermeasure network
CN111310862B (en) Image enhancement-based deep neural network license plate positioning method in complex environment
CN113569667B (en) Inland ship target identification method and system based on lightweight neural network model
CN113971764B (en) Remote sensing image small target detection method based on improved YOLOv3
CN111091105A (en) Remote sensing image target detection method based on new frame regression loss function
CN110738697A (en) Monocular depth estimation method based on deep learning
CN111625608B (en) Method and system for generating electronic map according to remote sensing image based on GAN model
CN111652321A (en) Offshore ship detection method based on improved YOLOV3 algorithm
CN110060237A (en) A kind of fault detection method, device, equipment and system
CN111612807A (en) Small target image segmentation method based on scale and edge information
CN107424159A (en) Image, semantic dividing method based on super-pixel edge and full convolutional network
CN114663346A (en) Strip steel surface defect detection method based on improved YOLOv5 network
CN111523553A (en) Central point network multi-target detection method based on similarity matrix
CN115222946B (en) Single-stage instance image segmentation method and device and computer equipment
CN111127472B (en) Multi-scale image segmentation method based on weight learning
CN116645592B (en) Crack detection method based on image processing and storage medium
CN117237808A (en) Remote sensing image target detection method and system based on ODC-YOLO network
CN113743417A (en) Semantic segmentation method and semantic segmentation device
CN107341440A (en) Indoor RGB D scene image recognition methods based on multitask measurement Multiple Kernel Learning
CN117372898A (en) Unmanned aerial vehicle aerial image target detection method based on improved yolov8
CN115238758A (en) Multi-task three-dimensional target detection method based on point cloud feature enhancement
CN117292117A (en) Small target detection method based on attention mechanism
CN117437615A (en) Foggy day traffic sign detection method and device, storage medium and electronic equipment
CN113159158A (en) License plate correction and reconstruction method and system based on generation countermeasure network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant