CN113971764A - Remote sensing image small target detection method based on improved YOLOv3 - Google Patents
- Publication number
- CN113971764A (application number CN202111269827.9A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The invention discloses a remote sensing image small target detection method based on improved YOLOv3, belonging to the technical field of deep learning and target detection. The method comprises: preprocessing the data set; optimizing the YOLOv3 network by adding a dilated convolution group module, a feature enhancement module and a channel attention mechanism module to the Neck; online data enhancement; forward inference; improving the loss function; and selecting the YOLOv3 network model with the highest detection precision and recall on the verification set and loading it into the network. By adding the dilated convolution group module, the feature enhancement module and the channel attention mechanism module to the original YOLOv3 network and improving the loss function, the invention markedly improves the performance of the YOLOv3 detection network: targets in remote sensing images are detected more comprehensively and with higher precision, and both training speed and overall detection precision are improved.
Description
Technical Field
The invention relates to the field of deep learning and target detection, in particular to a remote sensing image small target detection method based on improved YOLOv3.
Background
With the development of deep learning and neural networks, computer vision has advanced rapidly. Within this field, target detection and recognition technologies have been widely studied and put into practice, bringing great convenience to daily life. Applied to unmanned aerial vehicles, for example, they can automatically identify specific targets in remote sensing images and efficiently take over repetitive manual work. However, many target detection tasks face the following problems:
1. Most targets are small, only a few dozen pixels in size, which makes them hard to find and identify;
2. The background is complex and there are many interference factors, such as shooting angle, illumination changes, similar-looking targets and occlusion, which easily lead to misjudgment and hinder detection.
After comparing several commonly used target detection networks, the YOLOv3 (You Only Look Once v3) detection algorithm was selected. It detects quickly and recognizes accurately, and better detection performance can be obtained by improving on it. The ideas for improving the network are as follows:
1. In the backbone network, each down-sampling step enlarges the receptive field as depth increases, but it also shrinks the image and lowers the resolution, so some target features are lost even if up-sampling is applied afterwards. Dilated convolution addresses this: it enlarges the receptive field, so the model can observe a larger region of the picture and detect targets more comprehensively, while larger feature maps retain more target information, which benefits localization and classification.
2. When people look at a scene, they first take in the whole area and then focus on the parts of interest, thereby extracting more useful information from a complex background. This is the idea behind attention mechanism algorithms. Computer vision resembles human vision in many ways, and the basic idea is to let the machine learn to ignore interfering factors and capture key information. Using such an algorithm for detection improves accuracy to some extent.
3. In the YOLOv3 loss function between the ground-truth box and the prediction box, the coordinate loss uses the mean squared error while the confidence loss and the category loss use cross entropy; these are added to obtain the overall error. Computed this way, the loss cannot fully reflect the positional relationship and degree of overlap of the two boxes, or the direction in which the prediction box needs to move toward the ground-truth box, so the loss function needs to be improved.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a remote sensing image small target detection method based on improved YOLOv3, such that the improved YOLOv3 detection network performs markedly better, detects targets in remote sensing images more comprehensively and with higher precision, and trains faster.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
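a remote sensing image small target detection method based on improved YOLOv3 comprises the following steps:
step 1, preprocessing the data set: acquiring remote sensing images for training to form a data set, converting the format of the data set, randomly dividing the data set into a training-verification set and a test set, and evaluating the YOLOv3 network model by cross validation during training;
step 2, optimizing the YOLOv3 network: adding a dilated convolution group module, a feature enhancement module and a channel attention mechanism module to the Neck;
step 3, online data enhancement: randomly selecting the same number of pictures from the training set each time, applying online image data enhancement to them, and inputting them into the optimized YOLOv3 network;
step 4, forward inference: the Head of the optimized YOLOv3 network infers object coordinates and classes from the fused features to obtain the coordinates of the box surrounding each target object, the object class and the confidence;
step 5, improving the loss function: iteratively training and updating the parameters according to the loss function value, and evaluating on the verification set after each iteration;
step 6, finishing training: selecting the optimized YOLOv3 network model with the highest detection precision and recall on the verification set and loading it into the network.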
The technical scheme of the invention is further improved as follows: in step 1, the data set preprocessing specifically comprises: converting the labeling information of the data set into VOC format, and randomly dividing the data set into a training-verification set and a test set at a ratio of …:1, the sets being independent of one another and sharing no picture, so that data contamination is avoided;
the YOLOv3 network model is evaluated by cross validation during training, that is, the training-verification set is first randomly divided into a training set and a verification set at a ratio of 8:1, the training set is used for model training and weight updating, and the verification set is used to evaluate the YOLOv3 network model obtained after each round of training.
The technical scheme of the invention is further improved as follows: in step 2,
the dilated convolution group module adapts to multi-scale image input and enlarges the receptive field of the network;
the feature enhancement module fuses shallow features, which are rich in object position information but poor in semantic information, with deep features, which are rich in object semantic information but poor in position information, thereby fusing features of different resolutions;
the channel attention mechanism module suppresses interference, extracts from a complex background the object feature information most critical to detection, and assigns a weight to each channel of the features to strengthen global features.
The technical scheme of the invention is further improved as follows: the calculation formula of the channel attention mechanism is as follows:
global average pooling:
$$y = \frac{1}{WH}\sum_{i=1}^{H}\sum_{j=1}^{W} x_{i,j}$$
where $W$ and $H$ are the width and height of the feature map, and $x_{i,j}$ is the value at row $i$, column $j$ on each channel of the feature map;
channel convolution:
$$\omega = \sigma(\mathrm{C1D}_k(y))$$
where, because global average pooling has been applied, $i = 1$; $y_j$ denotes the $j$-th channel; the convolution for $y_j$ runs over the set of its $k$ adjacent channels; $\alpha_j$ denotes the $j$-th channel weight; $\sigma$ denotes the sigmoid function; $\omega_i$ denotes the $i$-th weight; and $\mathrm{C1D}_k$ denotes a one-dimensional convolution with kernel size $k$;
solving for $k$:
$$k = \left|\frac{\log_2 c}{\gamma} + \frac{b}{\gamma}\right|_{\mathrm{odd}}$$
where $c$ is the given number of channels, $\gamma$ and $b$ equal 2 and 1 respectively, and $|\cdot|_{\mathrm{odd}}$ denotes the nearest odd number.
The technical scheme of the invention is further improved as follows: in step 3, the online data enhancement techniques comprise photometric distortion, geometric distortion, simulated occlusion and multi-image fusion;
photometric distortion mainly changes the pixel values of the picture, for example: random brightness change, random contrast change, random saturation change, random chromaticity change, and added random noise;
geometric distortion mainly changes the shape of the picture, for example: random cropping, random rotation and random angle;
simulated occlusion means randomly erasing small blocks of the picture, that is, setting the pixels of those blocks to pure black;
multi-image fusion means randomly cutting a region out of one image and using it to replace the region at the same position in another image, or overlaying two images and superimposing their pixel values.
The technical scheme of the invention is further improved as follows: in step 5, the position loss in the loss function is calculated using DIOU, which takes into account three factors, namely the degree of overlap, the direction of overlap and the positional relationship between the detection box and the ground-truth box, and directly minimizes the distance between the two boxes, so that convergence is faster;
the improved loss function formula is:
where $Loss_{DIOU}$ is the total DIOU loss on one picture, $Loss_{confi}$ is the total confidence loss on one picture, $Loss_{cls}$ is the total class loss on one picture, and $N$ is the number of targets on one picture.
The technical scheme of the invention is further improved as follows: in step 6, the model with the best precision and recall on the verification set is selected and loaded into the network, and its detection performance on the test set is obtained.
By adopting the above technical scheme, the invention achieves the following technical progress:
1. By using the feature enhancement module, the invention fuses shallow features carrying more position information with deep features carrying more semantic information, thereby increasing the amount of information available to the Head inference layer.
2. The invention uses the dilated convolution group module to enlarge the receptive field without changing the resolution; the channel attention mechanism enables the network to extract more detection-relevant information from complex background information, which helps to eliminate interference.
3. After the loss function is improved, the predicted box fits the target more closely.
4. The improved YOLOv3 detection network performs markedly better, detects targets in remote sensing images more comprehensively and with higher precision, and trains faster.
Drawings
FIG. 1 is the original structure diagram of YOLOv3 in the present invention;
FIG. 2 is the overall scheme of the improvement to YOLOv3 in the present invention;
FIG. 3 is a diagram of the SPP module in the present invention;
FIG. 4 is a diagram of the RFB module in the present invention;
FIG. 5 is a diagram of the SFM dilated convolution group module used in the present invention;
FIG. 6 is a diagram of the CSP module in the present invention;
FIG. 7 is a diagram of the FEM feature enhancement module used in the present invention;
FIG. 8 is a diagram of the ECA channel attention module used in the present invention;
FIG. 9 is a diagram of IOU and DIOU in the present invention.
Detailed Description
The invention is described in further detail below with reference to the following figures and examples:
The invention provides a remote sensing image small target detection method based on improved YOLOv3, which improves training speed and overall accuracy by improving the loss function and by adding a dilated convolution group module, a feature enhancement module and a channel attention mechanism module to the original YOLOv3 network (shown in FIG. 1); the improved scheme is shown in FIG. 2.
In this patent application:
SPP is the abbreviation of Spatial Pyramid Pooling;
RFB is the abbreviation of Receptive Field Block;
SFM is the abbreviation of Spatial and Field Model;
CSP is the abbreviation of Cross Stage Partial connections;
FEM is the abbreviation of Feature Enhanced Model;
ECA is the abbreviation of Efficient Channel Attention;
IOU is the abbreviation of Intersection over Union;
DIOU is the abbreviation of Distance IOU.
As shown in FIGS. 2-9, in the remote sensing image small target detection method based on improved YOLOv3, the data set is randomly divided into a training-verification set and a test set, and the model is evaluated by cross validation during training. Online data enhancement, comprising geometric distortion, photometric distortion, simulated occlusion and picture fusion, is applied to the pictures used for training. The backbone network Darknet-53 extracts object features, from shallow to deep, from the enhanced pictures. In the Neck, the feature enhancement module FEM is added to fuse features of different resolutions; the dilated convolution group SFM is added to adapt to multi-scale input and enlarge the receptive field; and the ECA channel attention module is embedded to assign a weight to each channel of the features and strengthen global features. The Head infers object coordinates and classes from the fused features. During training, the Head's output is compared with the target labels and the error is computed by the error function, in which the original MSE (mean squared error) term is replaced by the DIOU loss; the trainable parameters are then updated by error back-propagation and stochastic gradient descent. During testing, NMS is applied to the Head's output so that each target keeps only one prediction box, and accuracy is computed by comparing all prediction boxes with the ground-truth boxes. Detection proceeds like testing, except that accuracy is no longer computed and the prediction boxes are drawn directly on the picture.
Examples
A remote sensing image small target detection method based on improved YOLOv3 specifically comprises the following steps:
Step 1: acquire remote sensing images for training to form a data set, convert the labeling information of the data set into VOC format, and randomly divide the data set into a training-verification set and a test set at a ratio of …:1; the sets are independent of one another and share no picture, so that data contamination is avoided. Cross validation is used during training: the training-verification set is first randomly divided into a training set and a verification set at a ratio of 8:1; the training set is used for model training and weight updating, and the verification set is used to evaluate the model obtained at the end of each round of training.
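The two-stage split in step 1 can be sketched as follows. This is a minimal illustration, not the patent's own code: the helper `split_ids`, the picture names and the seed are hypothetical, and the 9:1 ratio used for the first split is only a placeholder because the training-verification to test ratio is not legible in the text. The 8:1 ratio of the second split is taken from the description.

```python
import random

def split_ids(ids, first_parts, second_parts, rng):
    """Randomly split ids into two disjoint lists in the ratio first_parts:second_parts."""
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = len(shuffled) * first_parts // (first_parts + second_parts)
    return shuffled[:cut], shuffled[cut:]

rng = random.Random(0)  # fixed seed so the split is reproducible
image_ids = [f"img_{i:04d}" for i in range(900)]  # hypothetical picture names

# Placeholder 9:1 ratio (assumption): the source does not preserve this value.
trainval, test = split_ids(image_ids, 9, 1, rng)
# 8:1 train/verification split, as stated in the description.
train, val = split_ids(trainval, 8, 1, rng)

print(len(train), len(val), len(test))  # 720 90 90
```

Because every picture is assigned exactly once after shuffling, the three sets share no picture, which is the contamination guarantee the step asks for.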
Step 2: improve the Neck. As the Darknet-53 network deepens, the extracted target features become fewer and change from concrete to abstract.
To cope with targets whose positions and sizes vary across resolutions while also enlarging the receptive field, the SFM is proposed, drawing on the ideas of the SPP and RFB networks. As shown in FIG. 5, the input features pass through three dilated convolution branches with different dilation rates; the three results are concatenated with the original features along the channel dimension, and the fused features are passed to the next module.
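The branch-and-concatenate structure of the SFM can be sketched in NumPy as below. This is only a sketch under stated assumptions: the description does not give the dilation rates, kernel sizes or channel counts, so the rates `(1, 2, 3)`, the 3 × 3 kernels, the random weights and the names `dilated_conv3x3` and `sfm_block` are all hypothetical.

```python
import numpy as np

def dilated_conv3x3(x, weight, dilation):
    """3x3 dilated convolution with zero padding that keeps the spatial size.

    x: (C_in, H, W) feature map; weight: (C_out, C_in, 3, 3).
    """
    _, h, w = x.shape
    d = dilation
    xp = np.pad(x, ((0, 0), (d, d), (d, d)))
    out = np.zeros((weight.shape[0], h, w))
    for ky in range(3):
        for kx in range(3):
            # Window seen by kernel tap (ky, kx): taps are spaced d pixels apart.
            patch = xp[:, ky * d:ky * d + h, kx * d:kx * d + w]
            out += np.einsum("oc,chw->ohw", weight[:, :, ky, kx], patch)
    return out

def sfm_block(x, branch_weights, dilations=(1, 2, 3)):
    """Concatenate the input with three dilated branches along the channel axis."""
    branches = [dilated_conv3x3(x, w, d) for w, d in zip(branch_weights, dilations)]
    return np.concatenate([x] + branches, axis=0)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 8))                                   # 4-channel 8x8 feature map
branch_weights = [rng.normal(size=(2, 4, 3, 3)) for _ in range(3)]
fused = sfm_block(x, branch_weights)
print(fused.shape)  # (10, 8, 8): 4 original channels + 3 branches x 2 channels
```

A 3 × 3 kernel with dilation rate d covers a (2d + 1) × (2d + 1) window, so the three branches see progressively larger context while the output keeps the input resolution.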
Shallow features are rich in object position information but poor in semantic information, whereas deep features are rich in object semantic information but poor in position information. To make the feature information of the two complement each other, the feature enhancement module FEM is proposed, drawing on the idea of the CSP module, so that the target information is more comprehensive. The FEM passes the 4x down-sampled feature through a 3 × 3 convolution kernel and concatenates it with the 8x down-sampled feature along the channel dimension, passes the 8x down-sampled feature through a 3 × 3 convolution kernel and concatenates it with the 16x down-sampled feature along the channel dimension, and then passes each of the two concatenated features through two 1 × 1 convolution kernels before feeding them to the next module.
The ECA channel attention mechanism is applied after the two concatenated features to expand the viewing range of each feature pixel; the ECA module is shown in FIG. 8. Each extracted feature map has multiple channels, and the features on all channels together make up the complete object feature. To eliminate interference and extract key information, the ECA channel attention mechanism is embedded: channels carrying important feature components are given large weights, channels carrying unimportant components are given small weights, and the weighted sum is then taken. The Neck is the second part of the YOLOv3 detection network, labeled Neck in FIG. 1. The improved Neck is labeled Neck in FIG. 2.
The ECA module implements a local cross-channel interaction strategy without dimensionality reduction; it adaptively selects the size k of a one-dimensional convolution kernel and fuses the feature information of each group of k adjacent channels. First, global average pooling is applied to the input; next, the channel convolution is performed with the one-dimensional convolution kernel; then the result of the channel convolution is passed through a Sigmoid function to obtain the weight of each channel; finally, each channel of the input features is multiplied by its corresponding weight. As shown in FIG. 2, an ECA module is embedded after each of the two FEM modules.
The formula for ECA is:
global average pooling:
$$y = \frac{1}{WH}\sum_{i=1}^{H}\sum_{j=1}^{W} x_{i,j}$$
where $W$ and $H$ are the width and height of the feature map, and $x_{i,j}$ is the value at row $i$, column $j$ on each channel of the feature map.
Channel convolution:
$$\omega = \sigma(\mathrm{C1D}_k(y))$$
where, because global average pooling has been applied, $i = 1$; $y_j$ denotes the $j$-th channel; the convolution for $y_j$ runs over the set of its $k$ adjacent channels; $\alpha_j$ denotes the $j$-th channel weight; $\sigma$ denotes the sigmoid function; $\omega_i$ denotes the $i$-th weight; and $\mathrm{C1D}_k$ denotes a one-dimensional convolution with kernel size $k$.
Solving for $k$:
$$k = \left|\frac{\log_2 c}{\gamma} + \frac{b}{\gamma}\right|_{\mathrm{odd}}$$
where $c$ is the given number of channels; $\gamma$ and $b$ equal 2 and 1 respectively; and $|\cdot|_{\mathrm{odd}}$ denotes the nearest odd number.
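The ECA formulas above can be traced step by step in a short NumPy sketch. It is illustrative only: the function names are hypothetical, the kernel weights are arbitrary, and the way "nearest odd" is rounded (truncate, then add 1 if the result is even) follows a common implementation convention that the text does not spell out, so it is an assumption.

```python
import math
import numpy as np

def eca_kernel_size(channels, gamma=2, b=1):
    """k = |log2(c)/gamma + b/gamma|_odd, with gamma=2 and b=1 as in the text."""
    t = int(abs(math.log2(channels) / gamma + b / gamma))
    return t if t % 2 == 1 else t + 1  # assumed rounding rule for 'nearest odd'

def eca(x, kernel):
    """Apply channel attention to x of shape (C, H, W) with a shared 1D kernel (odd length k)."""
    k = kernel.shape[0]
    y = x.mean(axis=(1, 2))                    # global average pooling -> (C,)
    y_pad = np.pad(y, k // 2)                  # zero-pad so every channel has k neighbours
    z = np.array([kernel @ y_pad[c:c + k] for c in range(y.shape[0])])  # C1D_k(y)
    omega = 1.0 / (1.0 + np.exp(-z))           # sigmoid -> one weight per channel
    return x * omega[:, None, None], omega

print(eca_kernel_size(64), eca_kernel_size(256), eca_kernel_size(512))  # 3 5 5

rng = np.random.default_rng(0)
features = rng.normal(size=(64, 8, 8))
kernel = rng.normal(size=eca_kernel_size(64))  # arbitrary weights, for illustration
weighted, omega = eca(features, kernel)
```

Since the same k weights slide over all channels, the module adds only k parameters regardless of the number of channels, which is why no dimensionality reduction is needed.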
Step 3: online data enhancement mainly comprises photometric distortion, geometric distortion, simulated occlusion and multi-image fusion. Photometric distortion mainly changes the pixel values of the picture, for example: random brightness change, random contrast change, random saturation change, random chromaticity change, and added random noise. Geometric distortion mainly changes the shape of the picture, for example: random cropping, random rotation and random angle. Simulated occlusion means randomly erasing small blocks of the picture (that is, setting the pixels of those blocks to pure black). Multi-image fusion means randomly cutting a region out of one image and using it to replace the region at the same position in another image, or overlaying two images and superimposing their pixel values; such techniques include Mix_Up, Cut_Mix and style_transfer_GAN. The purposes of data enhancement are: 1. to increase the amount of training data and improve the generalization ability of the model; 2. to increase the diversity of the pictures and improve the robustness of the model.
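A few of the enhancement operations named in step 3 can be sketched on NumPy arrays as below. The function names, the value ranges (`max_delta`, `max_frac`) and the assumption that pixel values lie in [0, 1] are illustrative choices, not values from the text. Box labels are left out for brevity; in a real detection pipeline they would have to be transformed together with the pixels.

```python
import numpy as np

def random_brightness(img, rng, max_delta=0.2):
    """Photometric distortion: shift all pixel values by one random offset."""
    delta = rng.uniform(-max_delta, max_delta)
    return np.clip(img + delta, 0.0, 1.0)

def random_erase(img, rng, max_frac=0.25):
    """Simulated occlusion: set one randomly placed block to pure black."""
    h, w = img.shape[:2]
    eh = rng.integers(1, max(1, int(h * max_frac)) + 1)
    ew = rng.integers(1, max(1, int(w * max_frac)) + 1)
    top = rng.integers(0, h - eh + 1)
    left = rng.integers(0, w - ew + 1)
    out = img.copy()
    out[top:top + eh, left:left + ew] = 0.0
    return out

def mix_up(img_a, img_b, lam):
    """Multi-image fusion by overlaying: blend two images pixel by pixel."""
    return lam * img_a + (1.0 - lam) * img_b

def cut_mix(img_a, img_b, box):
    """Multi-image fusion by cutting: paste a region of img_a onto the same position of img_b."""
    top, left, bottom, right = box
    out = img_b.copy()
    out[top:bottom, left:right] = img_a[top:bottom, left:right]
    return out

rng = np.random.default_rng(0)
white = np.ones((16, 16, 3))
black = np.zeros((16, 16, 3))
occluded = random_erase(white, rng)
blended = mix_up(white, black, lam=0.7)
pasted = cut_mix(white, black, box=(2, 3, 6, 9))
```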
Step 4: perform forward inference and output the result.
The Head is the third part of the YOLOv3 detection network, labeled Head in FIG. 1. From the features fused by the Neck it outputs a tensor of size S × S × (3 × (4 + 1 + 20)): each picture is mapped to S × S cells, each cell holds three detection results, and each detection result consists of 4 detection box coordinates, 1 confidence value and prediction scores for the 20 object categories.
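The layout of the Head output can be made concrete with a small reshaping sketch. The grid size `S = 13` is only an example value (an assumption, corresponding to a 416-pixel input with 32x down-sampling), and the ordering of the 25 values within each detection result is likewise assumed.

```python
import numpy as np

S, BOXES_PER_CELL, NUM_CLASSES = 13, 3, 20    # S is an illustrative value
DEPTH = BOXES_PER_CELL * (4 + 1 + NUM_CLASSES)  # 3 x (4 + 1 + 20) = 75

raw = np.zeros((S, S, DEPTH))                 # stand-in for one Head output
pred = raw.reshape(S, S, BOXES_PER_CELL, 4 + 1 + NUM_CLASSES)

box_coords = pred[..., :4]     # 4 box coordinates per detection result
confidence = pred[..., 4]      # 1 confidence value per detection result
class_scores = pred[..., 5:]   # 20 category scores per detection result

print(DEPTH, box_coords.shape, confidence.shape, class_scores.shape)
# 75 (13, 13, 3, 4) (13, 13, 3) (13, 13, 3, 20)
```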
Step 5: calculate the error and update the parameters. Steps 1 to 4 form the forward inference part of the YOLOv3 network. The error between the inference result and the corresponding label is computed and propagated backward along the forward path, and all weight parameters are updated along the gradient direction; one forward pass followed by one backward pass constitutes one iteration. The error is divided into position error, confidence error and category error. The position error uses the DIOU loss; the other parts of the error are calculated as before. DIOU directly minimizes the distance between the two boxes and therefore converges faster.
The IOU calculation formula is:
$$IOU = \frac{|A \cap B|}{|A \cup B|}$$
where $A$ and $B$ are the ground-truth box and the prediction box, and IOU is the ratio of the area of the intersection of the two boxes to the area of their union.
The DIOU calculation formula is:
$$DIOU = IOU - \frac{\rho^2(b_A, b_B)}{c^2}$$
where $\rho^2(b_A, b_B)$ is the squared Euclidean distance between the center points $b_A$ and $b_B$ of the two boxes, and $c^2$ is the squared length of the diagonal of the smallest rectangle enclosing the two boxes, as shown in FIG. 9.
The DIOU loss (position loss) for a single prediction box is:
$$loss_{DIOU} = 1 - DIOU$$
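The IOU and DIOU formulas can be checked numerically with a few lines of Python. The corner format `(x1, y1, x2, y2)` and the function names are assumptions made for this sketch.

```python
def diou(box_a, box_b):
    """DIOU of two boxes given as (x1, y1, x2, y2)."""
    # Intersection and union areas give the plain IOU.
    iw = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    iou = inter / (area_a + area_b - inter)
    # rho^2: squared distance between the two box centers.
    rho2 = ((box_a[0] + box_a[2]) / 2 - (box_b[0] + box_b[2]) / 2) ** 2 \
         + ((box_a[1] + box_a[3]) / 2 - (box_b[1] + box_b[3]) / 2) ** 2
    # c^2: squared diagonal of the smallest rectangle enclosing both boxes.
    c2 = (max(box_a[2], box_b[2]) - min(box_a[0], box_b[0])) ** 2 \
       + (max(box_a[3], box_b[3]) - min(box_a[1], box_b[1])) ** 2
    return iou - rho2 / c2

def diou_loss(box_a, box_b):
    return 1.0 - diou(box_a, box_b)

overlapping = diou_loss((0, 0, 2, 2), (1, 1, 3, 3))  # IOU = 1/7, rho^2/c^2 = 2/18
disjoint = diou_loss((0, 0, 1, 1), (2, 0, 3, 1))     # IOU = 0,  rho^2/c^2 = 4/10
```

For the disjoint pair the IOU is 0 wherever the boxes sit, yet the DIOU loss still grows with the center distance (here it is 1.4), so a non-overlapping prediction continues to receive a signal pulling it toward the ground-truth box.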
The total DIOU loss on one picture is:
where $\lambda_{coord}$ is the weight of the DIOU loss.
The total confidence loss on one picture is:
where the loss is divided into a confidence loss for boxes with a target (first row) and a confidence loss for boxes without a target (second row). Its terms are: the confidence of the j-th prediction box of the i-th cell; the confidence of the j-th ground-truth box of the i-th cell; an indicator that is 1 when the i-th cell contains a target and 0 otherwise; an indicator that is 1 when the j-th prediction box of the i-th cell contains no target and 0 otherwise; and $\lambda_{noobj}$, the weight of the confidence error for boxes without a target.
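Since the confidence loss formula itself is not reproduced in the text, the sketch below shows only the commonly used form that matches the description: a cross-entropy term over boxes with a target plus a weighted cross-entropy term over boxes without one. It should be read as an assumption about the general shape of the loss, not as the patent's exact formula; the function name and the value `lambda_noobj=0.5` are hypothetical.

```python
import numpy as np

def confidence_loss(pred_conf, true_conf, obj_mask, lambda_noobj=0.5):
    """Binary cross-entropy confidence loss split into target / no-target parts.

    pred_conf, true_conf, obj_mask: arrays with one entry per prediction box;
    obj_mask is 1 where the box is responsible for a target and 0 otherwise.
    lambda_noobj=0.5 is an illustrative value, not taken from the text.
    """
    eps = 1e-7
    p = np.clip(pred_conf, eps, 1.0 - eps)      # avoid log(0)
    bce = -(true_conf * np.log(p) + (1.0 - true_conf) * np.log(1.0 - p))
    noobj_mask = 1.0 - obj_mask
    obj_term = (obj_mask * bce).sum()                      # first row: with target
    noobj_term = lambda_noobj * (noobj_mask * bce).sum()   # second row: without target
    return float(obj_term + noobj_term)

obj_mask = np.array([1.0, 0.0, 0.0, 0.0])    # one box with a target, three without
pred_conf = np.full(4, 0.5)                  # an undecided network
loss = confidence_loss(pred_conf, obj_mask, obj_mask)
```

Down-weighting the no-target term matters because the vast majority of prediction boxes in a picture contain no target and would otherwise dominate the loss.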
The total class loss on one picture is:
where $c \in classes$ ranges over the 20 object categories; the loss compares the predicted probability of each category in the j-th prediction box of the i-th cell with the corresponding probability in the j-th ground-truth box of the i-th cell, which is 1 for the true category and 0 for all other categories.
The total loss on one picture is:
where $N$ is the number of targets on one picture.
The initial learning rate is set to 0.0001, the learning rate decay to 0.995, Batch_size to 10, the number of save iterations to 2, the number of rounds that update part of the weights to 35, and the number of rounds that fine-tune all weights to 15; the test score threshold is set to 0.3 and the test IOU threshold to 0.5; every few rounds, the input picture size is randomly chosen as a multiple of 32, from 10 × 32 to 19 × 32.
Training uses stochastic gradient descent, error back-propagation and learning rate decay.
The model obtained in each round of training is evaluated for precision and recall on the verification set, and the best-evaluated model is tested on the test set to measure its generalization ability.
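The training settings listed above can be collected into a small configuration sketch. The dictionary keys and helper names are hypothetical, and two details are assumptions because the text does not state them: that the decay factor 0.995 is applied once per round as an exponential schedule, and that the input size is drawn uniformly.

```python
import random

CONFIG = {
    "initial_lr": 0.0001,
    "lr_decay": 0.995,
    "batch_size": 10,
    "partial_update_rounds": 35,   # rounds that update only part of the weights
    "fine_tune_rounds": 15,        # rounds that fine-tune all weights
    "test_score_threshold": 0.3,
    "test_iou_threshold": 0.5,
}

def learning_rate(round_index, cfg=CONFIG):
    """Exponentially decayed learning rate (per-round decay is an assumption)."""
    return cfg["initial_lr"] * cfg["lr_decay"] ** round_index

def pick_input_size(rng):
    """Multi-scale training: a random multiple of 32 between 10 x 32 and 19 x 32."""
    return 32 * rng.randint(10, 19)  # randint includes both endpoints

rng = random.Random(0)
total_rounds = CONFIG["partial_update_rounds"] + CONFIG["fine_tune_rounds"]
sizes = [pick_input_size(rng) for _ in range(total_rounds)]
final_lr = learning_rate(total_rounds - 1)
```

The resulting input sizes range from 320 to 608 pixels, all divisible by 32.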
Step 6: load the best-evaluated weights and input a picture. The network outputs many boxes, which are filtered with NMS (non-maximum suppression) to keep the boxes with the highest confidence; the bounding boxes, target categories and confidence values are then displayed on the picture.
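The NMS filtering in step 6 can be sketched as a greedy procedure. This is a generic, class-agnostic version written for illustration; reusing the test score threshold 0.3 and the test IOU threshold 0.5 from the training settings as NMS thresholds is an assumption, as are the function names and the `(x1, y1, x2, y2)` box format.

```python
import numpy as np

def iou_with(box, boxes):
    """IOU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, score_thresh=0.3, iou_thresh=0.5):
    """Return indices of kept boxes, highest confidence first."""
    order = np.where(scores >= score_thresh)[0]      # drop low-confidence boxes
    order = order[np.argsort(-scores[order])]        # sort by descending confidence
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # Discard remaining boxes that overlap the kept box too strongly.
        order = rest[iou_with(boxes[best], boxes[rest]) <= iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30], [0, 0, 5, 5]], dtype=float)
scores = np.array([0.9, 0.8, 0.7, 0.2])
print(nms(boxes, scores))  # [0, 2]
```

In the example, the second box overlaps the first with an IOU of about 0.68 and is suppressed, the third box is far away and is kept, and the fourth falls below the score threshold.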
Claims (7)
1. A remote sensing image small target detection method based on improved YOLOv3, characterized in that the method comprises the following steps:
step 1, preprocessing the data set: acquiring remote sensing images for training to form a data set, converting the format of the data set, randomly dividing the data set into a training-verification set and a test set, and evaluating the YOLOv3 network model by cross validation during training;
step 2, optimizing the YOLOv3 network: adding a dilated convolution group module, a feature enhancement module and a channel attention mechanism module to the Neck;
step 3, online data enhancement: randomly selecting the same number of pictures from the training set each time, applying online image data enhancement to them, and inputting them into the optimized YOLOv3 network;
step 4, forward inference: the Head of the optimized YOLOv3 network infers object coordinates and classes from the fused features to obtain the coordinates of the box surrounding each target object, the object class and the confidence;
step 5, improving the loss function: iteratively training and updating the parameters according to the loss function value, and evaluating on the verification set after each iteration;
step 6, finishing training: selecting the optimized YOLOv3 network model with the highest detection precision and recall on the verification set and loading it into the network.
2. The remote sensing image small target detection method based on improved YOLOv3 as claimed in claim 1, wherein: in step 1, the data set preprocessing specifically comprises: converting the labeling information of the data set into VOC format, and randomly dividing the data set into a training-verification set and a test set at a ratio of …:1, the sets being independent of one another and sharing no picture, so that data contamination is avoided;
the YOLOv3 network model is evaluated by cross validation during training, that is, the training-verification set is first randomly divided into a training set and a verification set at a ratio of 8:1, the training set is used for model training and weight updating, and the verification set is used to evaluate the model obtained after each round of training.
3. The remote sensing image small target detection method based on improved YOLOv3 as claimed in claim 1, wherein: in step 2,
the dilated convolution group module adapts to multi-scale image input and enlarges the receptive field of the network;
the feature enhancement module fuses shallow features, which are rich in object position information but poor in semantic information, with deep features, which are rich in object semantic information but poor in position information, thereby fusing features of different resolutions;
the channel attention mechanism module suppresses interference, extracts from a complex background the object feature information most critical to detection, and assigns a weight to each channel of the features to strengthen global features.
4. The method for detecting the small target of the remote sensing image based on the improved YOLOv3 as claimed in claim 3, wherein: the calculation formula of the channel attention mechanism module is as follows:
global average pooling:
$$y = \frac{1}{WH}\sum_{i=1}^{H}\sum_{j=1}^{W} x_{i,j}$$
where W and H denote the width and height of the feature map, and $x_{i,j}$ denotes the value at row i, column j on each channel of the feature map;
channel convolution:
$$\omega = \sigma(\mathrm{C1D}_k(y))$$
where, because global average pooling has been applied, i is 1; $y_j$ denotes the j-th channel, and the one-dimensional convolution $\mathrm{C1D}_k$ acts on the set of k channels adjacent to $y_j$; $\alpha_j$ denotes the weight of the j-th channel, $\sigma$ denotes the sigmoid function, and $\omega_i$ denotes the i-th weight;
solving for k:
$$k = \left|\frac{\log_2 c}{\gamma} + \frac{b}{\gamma}\right|_{odd}$$
where c denotes the given number of channels, $\gamma$ and b are equal to 2 and 1 respectively, and odd denotes the nearest odd number.
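The channel attention computation above can be sketched in plain Python. Several choices here are assumptions rather than details from the claim: the "nearest odd number" is obtained by truncating and then moving up to the next odd value when the result is even (the convention used in the public ECA-Net code), the 1-D convolution is zero-padded at the channel boundaries, and the kernel weights are passed in by the caller instead of being learned.

```python
import math

def eca_kernel_size(channels, gamma=2, b=1):
    """Kernel size k for the 1-D channel convolution, with gamma = 2 and b = 1."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1  # assumption: round up to an odd value

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def eca_weights(feature_map, conv_kernel):
    """feature_map: C channels, each an H x W nested list.
    conv_kernel: k weights shared across all channels (hypothetical values).
    Returns one attention weight in (0, 1) per channel."""
    # Global average pooling: one scalar per channel.
    y = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    k = len(conv_kernel)
    pad = k // 2
    padded = [0.0] * pad + y + [0.0] * pad  # zero padding along the channel axis
    # 1-D convolution over k neighbouring channels, followed by sigmoid.
    return [sigmoid(sum(conv_kernel[t] * padded[c + t] for t in range(k)))
            for c in range(len(y))]

print(eca_kernel_size(64), eca_kernel_size(256), eca_kernel_size(512))  # 3 5 5

# Four 2x2 channels with constant values 0, 1, 2, 3 and a uniform 3-tap kernel.
fmap = [[[float(c)] * 2 for _ in range(2)] for c in range(4)]
weights = eca_weights(fmap, [1 / 3, 1 / 3, 1 / 3])
scaled = [[[w * v for v in row] for row in ch] for w, ch in zip(weights, fmap)]
```

Multiplying each channel by its weight, as in `scaled`, is how the attention output would be applied to the features.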
5. The method for detecting the small target of the remote sensing image based on the improved YOLOv3 as claimed in claim 1, wherein: in step 3, the online data augmentation techniques comprise photometric distortion, geometric distortion, simulated occlusion and multi-image fusion;
the photometric distortion mainly changes the pixel values of the picture, for example: random brightness variation, random contrast variation, random saturation variation, random chromaticity variation, and random noise addition;
the geometric distortion mainly changes the shape of the picture, for example: random cropping, random rotation and random angle;
the simulated occlusion refers to randomly erasing small blocks in the picture, that is, setting the pixels of those blocks to pure black;
the multi-image fusion refers to randomly cropping a region from one picture and using it to replace the region at the same position in another picture, or superimposing two pictures and blending their pixels.
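The simulated occlusion and the two forms of multi-image fusion can be sketched on small grayscale images stored as nested lists. This is an illustration under simplifying assumptions: the function names, block sizes and blending factor are hypothetical, images are single-channel and of equal size, and the matching update of bounding-box labels that a detector would need is left out.

```python
import random

def random_erase(img, block_h, block_w, rng):
    """Simulated occlusion: set one randomly placed block to 0 (pure black)."""
    h, w = len(img), len(img[0])
    top = rng.randrange(h - block_h + 1)
    left = rng.randrange(w - block_w + 1)
    out = [row[:] for row in img]  # copy so the input stays unchanged
    for r in range(top, top + block_h):
        for c in range(left, left + block_w):
            out[r][c] = 0
    return out

def paste_region(src, dst, block_h, block_w, rng):
    """Fusion by replacement: copy a random block of src onto the same position in dst."""
    h, w = len(dst), len(dst[0])
    top = rng.randrange(h - block_h + 1)
    left = rng.randrange(w - block_w + 1)
    out = [row[:] for row in dst]
    for r in range(top, top + block_h):
        out[r][left:left + block_w] = src[r][left:left + block_w]
    return out

def blend(img_a, img_b, alpha=0.5):
    """Fusion by superposition: pixel-wise weighted sum of two images."""
    return [[alpha * a + (1 - alpha) * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]

rng = random.Random(0)
white = [[255] * 8 for _ in range(8)]
gray = [[100] * 8 for _ in range(8)]
erased = random_erase(white, 3, 3, rng)       # nine pixels become black
pasted = paste_region(gray, white, 4, 4, rng)  # a 4x4 gray patch on white
mixed = blend(white, gray, alpha=0.5)          # every pixel becomes 177.5
```

Because the augmentation is applied online, each training epoch would draw fresh random positions instead of storing the altered pictures.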
6. The method for detecting the small target of the remote sensing image based on the improved YOLOv3 as claimed in claim 1, wherein: in step 5, the position loss in the loss function is calculated using DIOU, which takes into account three factors, namely the degree of overlap, the direction of overlap and the positional relationship between the detection box and the ground-truth box, and directly minimizes the distance between the two boxes, so that convergence is faster;
the modified loss function formula is:
in the formula, $Loss_{DIOU}$ is the total DIOU loss on a picture, $Loss_{conf}$ is the total confidence loss on a picture, $Loss_{cls}$ is the total class loss on a picture, and n is the number of targets on a picture.
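The position term can be illustrated with the commonly published form of the DIoU loss (written DIOU in the claim), $1 - IoU + \rho^2 / c^2$, where $\rho$ is the distance between the two box centres and c is the diagonal of the smallest box enclosing both. The sketch below covers a single pair of boxes only; how the patent combines this term with the confidence and class losses is not reproduced here, and the `(x1, y1, x2, y2)` box format is an assumption.

```python
def diou_loss(box_a, box_b):
    """DIoU loss for two boxes given as (x1, y1, x2, y2) with x2 > x1 and y2 > y1."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection over union.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union
    # Squared distance between the two box centres.
    d2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both boxes.
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 + (max(ay2, by2) - min(ay1, by1)) ** 2
    return 1.0 - iou + d2 / c2

same = diou_loss((0, 0, 2, 2), (0, 0, 2, 2))    # identical boxes: zero loss
apart = diou_loss((0, 0, 1, 1), (2, 0, 3, 1))   # disjoint boxes: IoU is 0
print(same, round(apart, 3))  # 0.0 1.4
```

Unlike a plain IoU loss, the distance term still changes when the boxes do not overlap, which is why the disjoint pair above scores above 1 and can be pulled together by gradient descent.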
7. The method for detecting the small target of the remote sensing image based on the improved YOLOv3 as claimed in claim 1, wherein: in step 6, the model with the highest evaluated precision and recall on the validation set is selected and loaded into the network, and its detection results on the test set are obtained.
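Selecting the checkpoint to load can be sketched as below. The claim does not say how precision and recall are combined into one ranking, so this example assumes the F1 score (their harmonic mean) as the criterion; the record layout, file names and metric values are made up for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (assumed ranking criterion)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def select_best(records):
    """Return the validation record with the highest F1 score."""
    return max(records, key=lambda rec: f1_score(rec["precision"], rec["recall"]))

# Hypothetical per-epoch validation results.
history = [
    {"epoch": 10, "weights": "epoch_10.pth", "precision": 0.80, "recall": 0.70},
    {"epoch": 20, "weights": "epoch_20.pth", "precision": 0.85, "recall": 0.82},
    {"epoch": 30, "weights": "epoch_30.pth", "precision": 0.90, "recall": 0.60},
]
best = select_best(history)
print(best["weights"])  # epoch_20.pth
```

The selected weights would then be loaded once and evaluated on the held-out test set, which plays no part in the selection.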
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111269827.9A CN113971764B (en) | 2021-10-29 | 2021-10-29 | Remote sensing image small target detection method based on improvement YOLOv3 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111269827.9A CN113971764B (en) | 2021-10-29 | 2021-10-29 | Remote sensing image small target detection method based on improvement YOLOv3 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113971764A (en) | 2022-01-25 |
CN113971764B (en) | 2024-05-14 |
Family
ID=79588953
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111269827.9A Active CN113971764B (en) | 2021-10-29 | 2021-10-29 | Remote sensing image small target detection method based on improvement YOLOv3 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113971764B (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020215236A1 (en) * | 2019-04-24 | 2020-10-29 | 哈尔滨工业大学(深圳) | Image semantic segmentation method and system |
WO2021139069A1 (en) * | 2020-01-09 | 2021-07-15 | 南京信息工程大学 | General target detection method for adaptive attention guidance mechanism |
CN111723748A (en) * | 2020-06-22 | 2020-09-29 | 电子科技大学 | Infrared remote sensing image ship detection method |
CN112818903A (en) * | 2020-12-10 | 2021-05-18 | 北京航空航天大学 | Small sample remote sensing image target detection method based on meta-learning and cooperative attention |
CN112529090A (en) * | 2020-12-18 | 2021-03-19 | 天津大学 | Small target detection method based on improved YOLOv3 |
CN112733749A (en) * | 2021-01-14 | 2021-04-30 | 青岛科技大学 | Real-time pedestrian detection method integrating attention mechanism |
CN112819100A (en) * | 2021-03-01 | 2021-05-18 | 深圳中湾智能科技有限公司 | Multi-scale target detection method and device for unmanned aerial vehicle platform |
Non-Patent Citations (1)
Title |
---|
周敏;史振威;丁火平: "遥感图像飞机目标分类的卷积神经网络方法", 中国图象图形学报, vol. 22, no. 5, 31 December 2017 (2017-12-31) * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114155365A (en) * | 2022-02-07 | 2022-03-08 | 北京航空航天大学杭州创新研究院 | Model training method, image processing method and related device |
CN114692826A (en) * | 2022-03-02 | 2022-07-01 | 华南理工大学 | Light-weight target detection system without prior frame |
CN114882241A (en) * | 2022-05-20 | 2022-08-09 | 东南大学 | Target detection method under complex background based on convolution attention mechanism |
CN118230079A (en) * | 2024-05-27 | 2024-06-21 | 中国科学院西安光学精密机械研究所 | Detection method for remote sensing small target based on improved YOLO |
CN118230079B (en) * | 2024-05-27 | 2024-08-30 | 中国科学院西安光学精密机械研究所 | Detection method for remote sensing small target based on improved YOLO |
Also Published As
Publication number | Publication date |
---|---|
CN113971764B (en) | 2024-05-14 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109584248B (en) | Infrared target instance segmentation method based on feature fusion and dense connection network | |
CN107945204B (en) | Pixel-level image matting method based on generation countermeasure network | |
CN111310862B (en) | Image enhancement-based deep neural network license plate positioning method in complex environment | |
CN113569667B (en) | Inland ship target identification method and system based on lightweight neural network model | |
CN113971764B (en) | Remote sensing image small target detection method based on improvement YOLOv3 | |
CN111091105A (en) | Remote sensing image target detection method based on new frame regression loss function | |
CN110738697A (en) | Monocular depth estimation method based on deep learning | |
CN111625608B (en) | Method and system for generating electronic map according to remote sensing image based on GAN model | |
CN111652321A (en) | Offshore ship detection method based on improved YOLOV3 algorithm | |
CN110060237A (en) | A kind of fault detection method, device, equipment and system | |
CN111612807A (en) | Small target image segmentation method based on scale and edge information | |
CN107424159A (en) | Image, semantic dividing method based on super-pixel edge and full convolutional network | |
CN114663346A (en) | Strip steel surface defect detection method based on improved YOLOv5 network | |
CN111523553A (en) | Central point network multi-target detection method based on similarity matrix | |
CN115222946B (en) | Single-stage instance image segmentation method and device and computer equipment | |
CN111127472B (en) | Multi-scale image segmentation method based on weight learning | |
CN116645592B (en) | Crack detection method based on image processing and storage medium | |
CN117237808A (en) | Remote sensing image target detection method and system based on ODC-YOLO network | |
CN113743417A (en) | Semantic segmentation method and semantic segmentation device | |
CN107341440A (en) | Indoor RGB D scene image recognition methods based on multitask measurement Multiple Kernel Learning | |
CN117372898A (en) | Unmanned aerial vehicle aerial image target detection method based on improved yolov8 | |
CN115238758A (en) | Multi-task three-dimensional target detection method based on point cloud feature enhancement | |
CN117292117A (en) | Small target detection method based on attention mechanism | |
CN117437615A (en) | Foggy day traffic sign detection method and device, storage medium and electronic equipment | |
CN113159158A (en) | License plate correction and reconstruction method and system based on generation countermeasure network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||