CN115223009A - Small target detection method and device based on improved YOLOv5 - Google Patents


Info

Publication number
CN115223009A
Authority
CN
China
Prior art keywords
module
network model
csp
improved
yolov5
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210780605.1A
Other languages
Chinese (zh)
Inventor
马显龙 (Ma Xianlong)
郭晨鋆 (Guo Chenyun)
周帅 (Zhou Shuai)
曹占国 (Cao Zhanguo)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Electric Power Research Institute of Yunnan Power Grid Co Ltd
Original Assignee
Electric Power Research Institute of Yunnan Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Electric Power Research Institute of Yunnan Power Grid Co Ltd filed Critical Electric Power Research Institute of Yunnan Power Grid Co Ltd
Priority to CN202210780605.1A
Publication of CN115223009A
Legal status: Pending

Classifications

    • G06V 10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N 3/08 — Computing arrangements based on biological models; neural networks; learning methods
    • G06V 10/806 — Fusion of extracted features at the sensor, preprocessing, feature-extraction or classification level
    • G06V 10/82 — Image or video recognition or understanding using neural networks
    • G06V 2201/07 — Target detection


Abstract

The invention relates to a small target detection method and device based on improved YOLOv5. The method comprises: improving the YOLOv5 network model to obtain an improved YOLOv5 network model; acquiring a small target image data set and dividing it into a training sample set and a test sample set; inputting the training sample set into the improved YOLOv5 network model for training to obtain pre-training weights, and adjusting the improved YOLOv5 network model according to the pre-training weights; and inputting the test sample set into the adjusted improved YOLOv5 network model to obtain a detection result. By improving the existing YOLOv5 network model, the invention improves the accuracy and real-time performance of detection, reduces the consumption of human resources, and effectively improves monitoring efficiency.

Description

Small target detection method and device based on improved YOLOv5
Technical Field
The invention belongs to the technical field of detection, and particularly relates to a small target detection method and device based on improved YOLOv5.
Background
At present, economic development and social production depend on a safe and stable supply of electric power, and the safe operation of the power grid is the basic guarantee of power supply in the power system. Because cities develop rapidly, many small substations sit in city centers, where airborne foreign objects are inevitable — most commonly small objects such as unmanned aerial vehicles and kites. The power system is key infrastructure supporting economic and social development and guaranteeing people's basic livelihood, and the normal operation of modern society depends highly on the reliable supply of electric power. Therefore, to guarantee the safety of the substation and a reliable, efficient and stable operating environment, foreign-object intrusion must be effectively monitored and detected, so that problems at the substation can be discovered in time and solved as early as possible.
In the related art, existing small target detection methods generally suffer from low detection accuracy and low speed.
Disclosure of Invention
In view of the above, the present invention aims to provide a small target detection method and device based on improved YOLOv5, so as to solve the problems of low detection accuracy and low speed of small target detection methods in the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme: a small target detection method based on improved YOLOv5 comprises the following steps:
improving the YOLOv5 network model to obtain an improved YOLOv5 network model;
acquiring a small target image data set, and dividing the image data set into a training sample set and a testing sample set;
inputting the training sample set into an improved YOLOv5 network model for training to obtain a pre-training weight, and adjusting the improved YOLOv5 network model according to the pre-training weight;
and inputting the test sample set into the adjusted improved YOLOv5 network model to obtain a detection result.
Further, the YOLOv5 network model includes: the input end, the backbone network, the Neck network, the prediction network and the output end are connected in sequence; the method for improving the YOLOv5 network model to obtain the improved YOLOv5 network model comprises the following steps:
adding a first Transformer module in the backbone network, and adding a plurality of CBAM modules and a second Transformer module in the Neck network.
Further, the backbone network further includes:
the system comprises a Focus module, a first CSP module, a second CSP module, a third CSP module and an SPP module;
specifically, the input end of the Focus module is connected with the input end of the network model, the output end of the Focus module is connected with the input end of the first CSP module, the output end of the first CSP module is connected with the input end of the second CSP module, the output end of the second CSP module is connected with the input end of the third CSP module, the output end of the third CSP module is connected with the input end of the SPP module, the output end of the SPP module is connected with the input end of the first Transformer module, and the output end of the first Transformer module is connected with the Neck network.
Furthermore, there are three CBAM modules: a first CBAM module, a second CBAM module and a third CBAM module; the Neck network further comprises: a fourth CSP module, a fifth CSP module, a sixth CSP module, a first Concat module, a second Concat module, a third Concat module and a fourth Concat module;
the output end of the first Transformer module is connected with the input end of the fourth CSP module through a first Concat module after being subjected to upsampling, and the output end of the fourth CSP module is connected with a fifth CSP module through a second Concat module after being subjected to upsampling;
the input end of the first CBAM module is connected with the output end of the second CSP module and the output end of the third CSP module, and the output end of the first CBAM module is connected with the fifth CSP module through the second Concat module;
the input end of the second CBAM module is connected with the output end of the fourth CSP module, and the output end of the second CBAM module is connected with the output end of the fifth CSP module and then is sent to the sixth CSP module through the third Concat module;
the input end of the third CBAM module is connected with the output end of the first Transformer module, and the output end of the third CBAM module is connected with the output end of the sixth CSP module and then is sent to the second Transformer module through the fourth Concat module;
and the output ends of the fifth CSP module, the sixth CSP module and the second Transformer module are all connected with the input end of the prediction layer.
Further, the method also comprises the following steps: determining the detection precision of the improved YOLOv5 network model; the method comprises the following steps:
obtaining evaluation indexes of the improved YOLOv5 network model, wherein the evaluation indexes comprise precision, recall rate, mean average precision and average frame rate;
and determining the detection precision of the improved YOLOv5 network model according to the evaluation index.
Further, the first or second Transformer module comprises:
a multi-head attention layer and a fully connected layer, connected in a residual manner.
Further, before inputting the training sample set into the improved YOLOv5 network model for training, the method further includes:
and performing data enhancement on the training sample set and the testing sample set by adopting a Mosaic-9 data enhancement mode.
Further, the image size of the training sample set and the test sample set is 608 × 608.
The embodiment of the present application provides a small target detection device based on improved YOLOv5, including:
the improvement module is used for improving the YOLOv5 network model to obtain an improved YOLOv5 network model;
the acquisition module is used for acquiring a small target image data set and dividing the image data set into a training sample set and a test sample set;
the training module is used for inputting the training sample set into an improved YOLOv5 network model for training to obtain a pre-training weight, and adjusting the improved YOLOv5 network model according to the pre-training weight;
and the detection module is used for inputting the test sample set into the adjusted improved YOLOv5 network model to obtain a detection result.
Further, the YOLOv5 network model includes: the input end, the backbone network, the Neck network, the prediction network and the output end are connected in sequence; the method for improving the YOLOv5 network model to obtain the improved YOLOv5 network model comprises the following steps:
adding a first Transformer module in the backbone network, and adding a plurality of CBAM modules and a second Transformer module in the Neck network.
By adopting the technical scheme, the invention can achieve the following beneficial effects:
the invention provides a small target detection method and a device based on improved YOLOv5, which comprises the steps of improving a YOLOv5 network model to obtain an improved YOLOv5 network model; acquiring a small target image data set, and dividing the image data set into a training sample set and a testing sample set; inputting the training sample set into an improved YOLOv5 network model for training to obtain a pre-training weight, and adjusting the improved YOLOv5 network model according to the pre-training weight; and inputting the test sample set into the adjusted improved YOLOv5 network model to obtain a detection result. According to the method and the device, the existing YOLOv5 network model is improved, so that the accuracy and the instantaneity of detection work are improved, the human resource loss is reduced, the accuracy of detection of small targets is effectively improved, and the monitoring efficiency is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram of the steps of the improved YOLOv5-based small target detection method of the present invention;
FIG. 2 is a schematic diagram of the Mosaic-9 data enhancement provided by the present invention;
FIG. 3 is a schematic structural diagram of the improved YOLOv5 network model provided by the present invention;
FIG. 4 is a schematic structural diagram of the improved YOLOv5-based small target detection device of the present invention;
FIG. 5 is a schematic diagram of the hardware structure of an environment implementing the improved YOLOv5-based small target detection method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described in detail below. It should be apparent that the described embodiments are only some embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the examples given herein without making any creative effort, shall fall within the protection scope of the present invention.
A specific method and apparatus for detecting a small target based on improved YOLOv5 provided in the embodiments of the present application will be described with reference to the accompanying drawings.
As shown in fig. 1, the method for detecting a small target based on improved YOLOv5 provided in the embodiment of the present application includes:
s101, improving the YOLOv5 network model to obtain an improved YOLOv5 network model;
it can be understood that the application is an improvement on the basis of the existing YOLOv5 network model, and an improved YOLOv5 network model with higher performance is obtained.
S102, acquiring a small target image data set, and dividing the image data set into a training sample set and a testing sample set;
small targets in this application refer to small volumes of target objects, e.g. birds, kites, drones, etc., whose images are determined as image data sets with image sizes 608 x 608 for the training sample set and the test sample set. And then dividing the image data set into a training sample set and a testing sample set, wherein the training sample set is used for training the improved YOLOv5 network model, and the testing sample set is used for testing the trained improved YOLOv5 network model to obtain a testing result.
S103, inputting the training sample set into an improved YOLOv5 network model for training to obtain a pre-training weight, and adjusting the improved YOLOv5 network model according to the pre-training weight;
the improved YOLOv5 network model is trained by adopting a training sample set to adjust the pre-training weight of the improved YOLOv5 network model, so that the parameters in the improved YOLOv5 network model achieve the optimal effect, and the improved YOLOv5 network model is more accurate in target detection.
S104, inputting the test sample set into the adjusted improved YOLOv5 network model to obtain a detection result.
And finally, inputting the test sample set into the adjusted improved YOLOv5 network model for testing to obtain a detection result.
The working principle of the improved YOLOv5-based small target detection method is as follows: first, the existing YOLOv5 network model is improved to obtain an improved YOLOv5 network model with higher performance; then the training sample set is input into the improved YOLOv5 network model for training to obtain pre-training weights; finally, the pre-training weights are applied to the improved YOLOv5 network model to detect the test sample set and obtain a detection result.
In some embodiments, before inputting the training sample set into the improved YOLOv5 network model for training, the method further includes:
and performing data enhancement on the training sample set and the test sample set by adopting a Mosaic-9 data enhancement mode.
Specifically, the YOLOv5 network model is a deep learning network. Compared with traditional manually designed features, features learned automatically through deep neural networks are superior in expressive ability, robustness and richness. However, a deep learning network needs several times more samples than a traditional target detection method, and when the number of samples is insufficient, the network's feature learning of the detected target is not concentrated enough. Data enhancement is a technique for improving performance and reducing generalization error when training neural network models for computer vision problems. When the deep learning network model is used for prediction, image data expansion of the test data set allows the model to predict multiple different versions of each image to obtain better prediction performance, so Mosaic data enhancement is adopted.
In this application, the Mosaic-9 mode is used to enhance the training sample set and the test sample set: nine pictures are randomly cropped, randomly scaled and randomly arranged, and then combined into one picture. As shown in fig. 2, the nine pictures are spliced after transformations such as flipping and zooming; each picture has its corresponding identification frame, and after the nine pictures are spliced, a new picture and the identification frames corresponding to it are obtained.
The specific process is as follows: first, a batch of data is taken from the total data set; nine pictures are randomly taken from it each time, randomly cropped and spliced at random positions, and synthesized into a new picture. This is repeated batch-size times (the batch size being the number of data samples captured in one training step), finally yielding a batch of new Mosaic-enhanced pictures that are passed to the subsequent network for training. Because nine pictures are learned at once, the background of the detected objects is greatly enriched; during data normalization, the data of nine pictures are calculated at once, which improves training speed and reduces memory requirements, making the Mosaic-9 enhancement effect more remarkable. This application applies Mosaic-9 data enhancement to the training sample set to expand the sample size and alleviate the reduced network generalization caused by an insufficient data set.
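The nine-picture stitching described above can be sketched as follows — a minimal NumPy illustration that only performs the 3 × 3 grid assembly with nearest-neighbour resizing; the random cropping, flipping and bounding-box relabeling of the real Mosaic-9 pipeline are omitted, and the function name `mosaic9` is an assumption:

```python
import numpy as np

def mosaic9(images, out_size=608):
    """Stitch nine images into one Mosaic-9 style training sample.

    Each image is nearest-neighbour resized into one cell of a 3x3 grid.
    (With out_size=608 the grid covers 606 pixels, leaving a thin zero
    border; the real augmentation handles sizing differently.)"""
    cell = out_size // 3
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    for k, img in enumerate(images[:9]):
        h, w = img.shape[:2]
        ys = np.arange(cell) * h // cell   # nearest-neighbour row indices
        xs = np.arange(cell) * w // cell   # nearest-neighbour column indices
        patch = img[ys][:, xs]             # resized cell-sized patch
        r, c = divmod(k, 3)
        canvas[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell] = patch
    return canvas
```

In a full pipeline the identification frames (bounding boxes) of each source picture would be rescaled and offset into the coordinates of the stitched canvas.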
In some embodiments, as shown in fig. 3, the YOLOv5 network model includes: the input end, the backbone network, the Neck network, the prediction network and the output end are connected in sequence; the method for improving the YOLOv5 network model to obtain the improved YOLOv5 network model comprises the following steps:
adding a first Transformer module in the backbone network, and adding a plurality of CBAM modules and a second Transformer module in the hack network.
Specifically, in this application, the improved YOLOv5 incorporates a Transformer module in the backbone network and modifies the structure of the Neck network to merge a hybrid attention module (the CBAM module) and a Transformer module, which improves the generalization performance of the network.
It can be understood that the core idea of visual attention is to find the correlation between features in the original data and then highlight certain important features, e.g. channel attention, pixel attention, multi-order attention. CBAM (Convolutional Block Attention Module) is a module that combines the spatial attention and channel attention mechanisms. The channel attention is computed as follows: each channel of the input feature map undergoes maximum pooling and average pooling simultaneously; the resulting intermediate vectors pass through a Multi-Layer Perceptron (MLP) designed with only one hidden layer; finally, the feature vectors output by the MLP are added element-wise and passed through a Sigmoid activation to obtain the channel attention. The spatial attention is realized as follows: the feature map adjusted by channel attention undergoes maximum and average pooling, followed by a convolution operation, and the convolution result is activated by Sigmoid to obtain the spatial attention.
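The channel- and spatial-attention computation described above can be sketched in NumPy as follows — a minimal illustration, not the patent's implementation; the weight shapes, the ReLU hidden layer and the single 7 × 7 kernel are simplifying assumptions (the real CBAM uses learned convolution filters and a shared MLP trained end to end):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """Channel attention: shared two-layer MLP over max- and avg-pooled
    per-channel vectors of a (C, H, W) feature map, then Sigmoid."""
    avg = x.mean(axis=(1, 2))                      # (C,)
    mx = x.max(axis=(1, 2))                        # (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)   # ReLU hidden layer
    return sigmoid(mlp(avg) + mlp(mx))             # (C,)

def spatial_attention(x, kernel):
    """Spatial attention: convolve the stacked [max, mean] channel-pooled
    maps with a 7x7 kernel, then Sigmoid."""
    pooled = np.stack([x.max(axis=0), x.mean(axis=0)])   # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    p = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    h, w = x.shape[1:]
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(p[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)                                   # (H, W)

def cbam(x, w1, w2, kernel):
    """Apply channel attention, then spatial attention, to x of shape (C, H, W)."""
    x = x * channel_attention(x, w1, w2)[:, None, None]
    return x * spatial_attention(x, kernel)[None]
```

Both attention maps lie in (0, 1) after the Sigmoid, so they rescale rather than replace the input features.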
In some embodiments, the backbone network further includes:
the system comprises a Focus module, a first CSP module, a second CSP module, a third CSP module and an SPP module;
specifically, the input end of the Focus module is connected with the input end of the network model, the output end of the Focus module is connected with the input end of the first CSP module, the output end of the first CSP module is connected with the input end of the second CSP module, the output end of the second CSP module is connected with the input end of the third CSP module, the output end of the third CSP module is connected with the input end of the SPP module, the output end of the SPP module is connected with the input end of the first Transformer module, and the output end of the first Transformer module is connected with the Neck network.
Specifically, in the Backbone portion of the present application, the YOLOv5 network uses a Focus module and CSPNet (Cross Stage Partial Network) modules. The CSP module greatly reduces the amount of calculation while enhancing the learning performance of the whole convolutional neural network. The Focus module performs a slicing operation on the picture, expanding the input channels to 4 times the original, which achieves downsampling while reducing calculation and increasing speed. The SPP (Spatial Pyramid Pooling) module effectively increases the receptive range of the trunk features.
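The Focus slicing operation described above — sampling every second pixel into four slices and concatenating them on the channel axis, so that (C, H, W) becomes (4C, H/2, W/2) — can be sketched as follows; the follow-up convolution of the real Focus module is omitted:

```python
import numpy as np

def focus_slice(x):
    """Focus slicing: four interleaved sub-samples of a (C, H, W) array,
    concatenated on the channel axis -> (4C, H/2, W/2). No information
    is lost; spatial resolution is traded for channel depth."""
    return np.concatenate([
        x[:, 0::2, 0::2],   # even rows, even columns
        x[:, 1::2, 0::2],   # odd rows,  even columns
        x[:, 0::2, 1::2],   # even rows, odd columns
        x[:, 1::2, 1::2],   # odd rows,  odd columns
    ], axis=0)
```

For the 608 × 608 inputs used here, a 3-channel image becomes a 12 × 304 × 304 tensor before the backbone's first convolution.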
In some embodiments, there are three CBAM modules: a first CBAM module, a second CBAM module and a third CBAM module; the Neck network further comprises: a fourth CSP module, a fifth CSP module, a sixth CSP module, a first Concat module, a second Concat module, a third Concat module and a fourth Concat module;
the output end of the first Transformer module is connected with the input end of the fourth CSP module through a first Concat module after being subjected to upsampling, and the output end of the fourth CSP module is connected with a fifth CSP module through a second Concat module after being subjected to upsampling;
the input end of the first CBAM module is connected with the output end of the second CSP module and the output end of the third CSP module, and the output end of the first CBAM module is connected with the fifth CSP module through the second Concat module;
the input end of the second CBAM module is connected with the output end of the fourth CSP module, and the output end of the second CBAM module is connected with the output end of the fifth CSP module and then is sent to the sixth CSP module through the third Concat module;
the input end of the third CBAM module is connected with the output end of the first Transformer module, and the output end of the third CBAM module is connected with the output end of the sixth CSP module and then is sent to the second Transformer module through the fourth Concat module;
the output ends of the fifth CSP module, the sixth CSP module and the second Transformer module are all connected with the input end of the prediction layer.
The Neck network comprises a Feature Pyramid Network (FPN) and a Path Aggregation Network (PAN): the FPN transmits semantic information from top to bottom, and the PAN transmits positioning information from bottom to top. The extracted semantic information and positioning information are then fused, together with the features of the trunk layer and the detection layer, so that the model obtains richer feature information.
In some embodiments, the first or second Transformer module comprises:
the multi-head attention layer is connected with the full connection layer in a residual mode.
It can be understood that the Transformer has a very significant ability to extract features while also operating efficiently. Meanwhile, the attention mechanism performs attention reconstruction on the feature map extracted by the YOLOv5 network, highlighting important information in the feature map and suppressing unimportant information.
Each Transformer block contains two sub-layers: the first is a multi-head attention layer, and the second is a fully connected layer (MLP). Residual connections and layer normalization are used around each sub-layer. The Transformer not only improves the ability to capture different local information, but can also exploit the feature-representation potential through its self-attention mechanism.
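The two-sub-layer structure described above can be sketched in NumPy as follows — a minimal single-block illustration; the weight shapes, the ReLU MLP and the pre-norm arrangement are assumptions, since the patent does not specify these details:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    sd = x.std(-1, keepdims=True)
    return (x - mu) / (sd + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def multi_head_attention(x, wq, wk, wv, wo, heads):
    """Self-attention over x of shape (n_tokens, d), split into heads."""
    n, d = x.shape
    hd = d // heads
    q, k, v = x @ wq, x @ wk, x @ wv
    out = np.zeros_like(x)
    for h in range(heads):
        s = slice(h * hd, (h + 1) * hd)
        att = softmax(q[:, s] @ k[:, s].T / np.sqrt(hd))  # (n, n)
        out[:, s] = att @ v[:, s]
    return out @ wo

def transformer_block(x, wq, wk, wv, wo, w1, w2, heads=2):
    """One encoder block: multi-head self-attention and an MLP sub-layer,
    each wrapped with layer normalization and a residual connection."""
    x = x + multi_head_attention(layer_norm(x), wq, wk, wv, wo, heads)
    x = x + np.maximum(layer_norm(x) @ w1, 0.0) @ w2   # ReLU MLP
    return x
```

For the feature maps in this network, the H × W spatial positions would be flattened into the token dimension before entering the block.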
In some embodiments, the method further comprises determining the detection accuracy of the improved YOLOv5 network model, which comprises:
obtaining evaluation indexes of the improved YOLOv5 network model, wherein the evaluation indexes comprise intersection over union, precision, recall rate, mean average precision and average frame rate;
and determining the detection precision of the improved YOLOv5 network model according to the evaluation index.
Evaluation indexes of the improved YOLOv5 network model mainly include intersection over union (IoU), precision (Precision), recall rate (Recall), mean average precision (mAP) and average frame rate (FPS).
Wherein, the intersection over union is calculated as follows:
IoU = area(B_p ∩ B_gt) / area(B_p ∪ B_gt)
where B_p is the predicted box and B_gt is the ground-truth box. The precision is calculated as follows:
Precision = TP / (TP + FP)
The recall rate is calculated as follows:
Recall = TP / (TP + FN)
wherein TP represents the number of detection boxes with IoU > 0.5, FP represents the number of detection boxes with IoU ≤ 0.5, and FN represents the number of ground truths that are not detected; the value of IoU is defined as the ratio of the intersection to the union of the areas of the two rectangular boxes.
The mean average precision is calculated as follows:
mAP = (1/m) · Σ_{i=1}^{m} AP_i, with AP_i = (1/n) · Σ_{j=1}^{n} P_j
where m is the number of categories of detection targets, n is the sample data size used for detection, and P_j is the probability of correct detection of an object of the i-th class in the j-th image; AP_i sums these probabilities from the first to the n-th sample.
The average frame rate (FPS) formula is as follows:
FPS = Frames / Time
where Frames is the number of image frames processed by the algorithm, and Time is the time consumed to process those frames.
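The evaluation indexes above can be sketched as plain Python functions — a minimal illustration of the formulas, with boxes given as (x1, y1, x2, y2) corner coordinates:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) rectangular boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def precision(tp, fp):
    """Precision = TP / (TP + FP)."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Recall = TP / (TP + FN)."""
    return tp / (tp + fn)

def fps(frames, seconds):
    """Average frame rate = frames processed / time consumed."""
    return frames / seconds
```

A detection box would count as a TP when its IoU with a ground-truth box exceeds the 0.5 threshold stated above.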
As the training period increases, the bounding boxes of the model with the added attention mechanism become more accurate, small target detection becomes more precise, and Precision and Recall increase. The mAP and FPS are calculated according to the above formulas, and the method is compared with the deep learning algorithms YOLOv4 and YOLOv5. Experiments show that YOLOv4 achieves an mAP (%) of 81.6 and an FPS of 57.8, and YOLOv5 achieves 88.3 and 46.8, while the improved YOLOv5 algorithm provided by the invention achieves 91.4 and 62.5 — the highest precision and the fastest detection speed — so small targets can be detected quickly and accurately.
In summary, in the improved YOLOv5 network model of this application, both the backbone network and the Neck network are improved. The most critical part for feature extraction in the improved YOLOv5 is the backbone network. With the YOLOv5 network as the basic framework, a Transformer attention mechanism is fused into the backbone network, replacing the original CSP2_1 module at the end of the backbone network. Compared with the original CSP2_1 module, the Transformer can better capture global information. In addition, the Transformer module is placed at the end of the backbone network because the resolution of the feature map there is low, which reduces the expensive calculation and storage cost. Moreover, this structure helps the network converge better and prevents over-fitting.
Because FPN + PAN is currently a structure with good feature-fusion properties, the Neck network keeps FPN + PAN as its basic structure when fusing features and combines it with the hybrid attention module CBAM, which performs attention reconstruction on the feature maps extracted by the convolutional neural network, so that important information in the feature maps is highlighted and unimportant information is suppressed. The CBAM modules are located after the backbone network and perform attention reconstruction before the Neck network, so they are effective at both ends.
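A minimal sketch of the CBAM hybrid attention described above — channel attention followed by spatial attention. The reduction ratio and 7 × 7 spatial kernel are common defaults assumed here, not values given in the text:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Hybrid attention: reweight channels first, then spatial locations."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.mlp = nn.Sequential(  # shared MLP for channel attention
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention: pool over space, weight each channel.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention: pool over channels, weight each location.
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

x = torch.randn(1, 256, 38, 38)
print(CBAM(256)(x).shape)  # torch.Size([1, 256, 38, 38])
```

The module preserves the feature-map shape, so it can be dropped in front of any fusion point without changing the surrounding wiring.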
In addition, the Neck network also replaces the original CSP2_1 module operating on the feature map of output size 19 × 19 with a Transformer module. Because the improved network mainly targets small objects, whose features are learned poorly at the deepest part of the network, attention reconstruction is applied to the Neck network structure and the original CSP2_1 module at the deepest position is replaced with a Transformer module, which better alleviates the problem that small-target features are not obvious enough in the deepest region of the network. Therefore, in the method, a CBAM attention mechanism module is added to the Neck network before each feature fusion: the input feature map is convolved along the channel direction to obtain a feature map with channel attention, the convolution result is then passed through the spatial attention module, and finally a more accurate feature map is obtained. Meanwhile, the Transformer module at the deepest part of the network is used to obtain richer feature information.
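One Neck fusion step of the kind described above can be sketched as follows; the 2× upsampling factor and channel counts are illustrative, and `cbam` stands in for any shape-preserving hybrid attention module:

```python
import torch
import torch.nn as nn

def fuse_with_attention(deep_feat, lateral_feat, cbam):
    """Sketch of one Neck fusion step: the lateral feature map passes
    through a CBAM attention module before being concatenated (Concat)
    with the upsampled deeper feature map."""
    up = nn.functional.interpolate(deep_feat, scale_factor=2, mode="nearest")
    return torch.cat([up, cbam(lateral_feat)], dim=1)

deep = torch.randn(1, 256, 19, 19)      # deeper, lower-resolution map
lateral = torch.randn(1, 256, 38, 38)   # lateral map from the backbone
fused = fuse_with_attention(deep, lateral, nn.Identity())
print(fused.shape)  # torch.Size([1, 512, 38, 38])
```

`nn.Identity()` is only a placeholder here so the sketch runs standalone; in the described network the real CBAM module would take its place.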
As shown in fig. 4, an embodiment of the present application provides a small target detection apparatus based on improved YOLOv5, including:
an improvement module 401, configured to improve the YOLOv5 network model to obtain an improved YOLOv5 network model;
an obtaining module 402, configured to obtain a small target image data set, and divide the image data set into a training sample set and a testing sample set;
a training module 403, configured to input the training sample set into an improved YOLOv5 network model for training, to obtain a pre-training weight, and adjust the improved YOLOv5 network model according to the pre-training weight;
the detection module 404 is configured to input the test sample set into the adjusted improved YOLOv5 network model to obtain a detection result.
The working principle of the improved YOLOv5-based small target detection apparatus provided by the application is as follows: the improvement module 401 improves the YOLOv5 network model to obtain an improved YOLOv5 network model; the obtaining module 402 obtains a small target image dataset and divides the image dataset into a training sample set and a test sample set; the training module 403 inputs the training sample set into the improved YOLOv5 network model for training to obtain pre-training weights, and adjusts the improved YOLOv5 network model according to the pre-training weights; and the detection module 404 inputs the test sample set into the adjusted improved YOLOv5 network model to obtain a detection result.
Further, the YOLOv5 network model includes: the input end, the backbone network, the Neck network, the prediction network and the output end are connected in sequence; the method for improving the YOLOv5 network model to obtain the improved YOLOv5 network model comprises the following steps:
adding a first Transformer module in the backbone network, and adding a plurality of CBAM modules and a second Transformer module in the Neck network.
The present application provides a computer device comprising a memory and a processor. The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM); the memory is an example of a computer-readable medium. The computer device stores an operating system and a computer program which, when executed by the processor, causes the processor to perform the improved YOLOv5-based small target detection method. The structure shown in fig. 5 is only a block diagram of part of the structure related to the present application and does not constitute a limitation on the computer device to which the present application is applied; a specific computer device may include more or fewer components than those shown in the figure, combine some components, or have a different arrangement of components.
In one embodiment, the improved YOLOv 5-based small target detection method provided by the present application may be implemented in the form of a computer program, and the computer program may be run on a computer device as shown in fig. 5.
In some embodiments, the computer program, when executed by the processor, causes the processor to perform the steps of: improving the YOLOv5 network model to obtain an improved YOLOv5 network model; acquiring a small target image data set, and dividing the image data set into a training sample set and a testing sample set; inputting the training sample set into an improved YOLOv5 network model for training to obtain a pre-training weight, and adjusting the improved YOLOv5 network model according to the pre-training weight; and inputting the test sample set into the adjusted improved YOLOv5 network model to obtain a detection result.
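The four steps executed by the processor can be sketched as a plain pipeline; every callable below is a placeholder standing in for the corresponding component (model construction, training, inference), not the patent's implementation, and the 80/20 split ratio is an assumption:

```python
def detect_small_targets(dataset, model_factory, train_fn, test_fn, split=0.8):
    """Sketch of the four processor steps: build the improved model,
    split the data, pre-train and adjust, then run detection on the
    test samples."""
    model = model_factory()                        # improved YOLOv5 model
    k = int(len(dataset) * split)
    train_set, test_set = dataset[:k], dataset[k:] # train/test division
    weights = train_fn(model, train_set)           # obtain pre-training weights
    model.load_weights(weights)                    # adjust the model with them
    return [test_fn(model, sample) for sample in test_set]  # detection results
```

A usage example with dummy components: `detect_small_targets(images, build_model, pretrain, run_inference)`, where `build_model`, `pretrain`, and `run_inference` are hypothetical names for the real training and inference routines.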
The present application also provides a computer storage medium, examples of which include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassette tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
In some embodiments, the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor: improves the YOLOv5 network model to obtain an improved YOLOv5 network model; acquires a small target image dataset and divides the image dataset into a training sample set and a test sample set; inputs the training sample set into the improved YOLOv5 network model for training to obtain pre-training weights, and adjusts the improved YOLOv5 network model according to the pre-training weights; and inputs the test sample set into the adjusted improved YOLOv5 network model to obtain a detection result.
In summary, the invention provides a method and an apparatus for small target detection based on improved YOLOv5, obtained by improving the YOLOv5 network model. Specifically, a Transformer module replaces the original CSP2_1 module at the end of the backbone network; since the network is deep there and the resolution of the feature map is low, the Transformer module improves the ability to capture feature information while reducing computation and storage cost. In addition, the Neck network is changed from its original structure: a CBAM attention mechanism module is added before each feature fusion, the input feature map is convolved along the channel direction to obtain a feature map with channel attention, the convolution result is then passed through the spatial attention module, and finally a more accurate feature map is obtained.
It is to be understood that the embodiments of the method provided above correspond to the embodiments of the apparatus described above, and the corresponding specific contents may be referred to each other, which is not described herein again.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description covers only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any changes or substitutions that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (10)

1. A small target detection method based on improved YOLOv5 is characterized by comprising the following steps:
improving the YOLOv5 network model to obtain an improved YOLOv5 network model;
acquiring a small target image data set, and dividing the image data set into a training sample set and a testing sample set;
inputting the training sample set into an improved YOLOv5 network model for training to obtain a pre-training weight, and adjusting the improved YOLOv5 network model according to the pre-training weight;
and inputting the test sample set into the adjusted improved YOLOv5 network model to obtain a detection result.
2. The method of claim 1, wherein the YOLOv5 network model comprises: the input end, the backbone network, the Neck network, the prediction network and the output end are connected in sequence; the method for improving the YOLOv5 network model to obtain the improved YOLOv5 network model comprises the following steps:
adding a first Transformer module in the backbone network, and adding a plurality of CBAM modules and a second Transformer module in the Neck network.
3. The method of claim 2, wherein the backbone network further comprises:
the system comprises a Focus module, a first CSP module, a second CSP module, a third CSP module and an SPP module;
wherein the input end of the Focus module is connected with the input end of the network model, the output end of the Focus module is connected with the input end of the first CSP module, the output end of the first CSP module is connected with the input end of the second CSP module, the output end of the second CSP module is connected with the input end of the third CSP module, the output end of the third CSP module is connected with the input end of the SPP module, the output end of the SPP module is connected with the input end of the first Transformer module, and the output end of the first Transformer module is connected with the Neck network.
4. The method of claim 3, wherein the CBAM modules comprise a first CBAM module, a second CBAM module, and a third CBAM module; the Neck network further comprises: a fourth CSP module, a fifth CSP module, a sixth CSP module, a first Concat module, a second Concat module, a third Concat module and a fourth Concat module;
the output end of the first Transformer module is connected with the input end of the fourth CSP module through a first Concat module after being subjected to upsampling, and the output end of the fourth CSP module is connected with a fifth CSP module through a second Concat module after being subjected to upsampling;
the input end of the first CBAM module is connected with the output end of the second CSP module and the output end of the third CSP module, and the output end of the first CBAM module is connected with the fifth CSP module through the second Concat module;
the input end of the second CBAM module is connected with the output end of the fourth CSP module, and the output end of the second CBAM module is connected with the output end of the fifth CSP module and then is sent to the sixth CSP module through the third Concat module;
the input end of the third CBAM module is connected with the output end of the first Transformer module, and the output end of the third CBAM module is connected with the output end of the sixth CSP module and then is sent to the second Transformer module through the fourth Concat module;
and the output ends of the fifth CSP module, the sixth CSP module and the second Transformer module are all connected with the input end of the prediction layer.
5. The method of claim 1, further comprising: determining the detection precision of the improved YOLOv5 network model; the method comprises the following steps:
obtaining evaluation indexes of an improved YOLOv5 network model, wherein the evaluation indexes comprise accuracy, recall rate, average mean accuracy and average frame rate;
and determining the detection precision of the improved YOLOv5 network model according to the evaluation index.
6. The method of claim 2, wherein the first or second Transformer module comprises:
the multi-head attention layer is connected with the full connection layer in a residual mode.
7. The method of claim 1, wherein before inputting the training sample set into the improved YOLOv5 network model for training, the method further comprises:
and performing data enhancement on the training sample set and the test sample set by adopting a Mosaic-9 data enhancement mode.
8. The method of claim 1,
the image sizes of the training sample set and the test sample set are 608 x 608.
9. A small target detection device based on improved YOLOv5 is characterized by comprising:
the improvement module is used for improving the YOLOv5 network model to obtain an improved YOLOv5 network model;
the acquisition module is used for acquiring a small target image data set and dividing the image data set into a training sample set and a test sample set;
the training module is used for inputting the training sample set into an improved YOLOv5 network model for training to obtain a pre-training weight, and adjusting the improved YOLOv5 network model according to the pre-training weight;
and the detection module is used for inputting the test sample set into the adjusted improved YOLOv5 network model to obtain a detection result.
10. The apparatus of claim 9, wherein the YOLOv5 network model comprises: the input end, the backbone network, the Neck network, the prediction network and the output end are connected in sequence; the method for improving the YOLOv5 network model to obtain the improved YOLOv5 network model comprises the following steps:
adding a first Transformer module in the backbone network, and adding a plurality of CBAM modules and a second Transformer module in the Neck network.
CN202210780605.1A 2022-07-07 2022-07-07 Small target detection method and device based on improved YOLOv5 Pending CN115223009A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210780605.1A CN115223009A (en) 2022-07-07 2022-07-07 Small target detection method and device based on improved YOLOv5

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210780605.1A CN115223009A (en) 2022-07-07 2022-07-07 Small target detection method and device based on improved YOLOv5

Publications (1)

Publication Number Publication Date
CN115223009A true CN115223009A (en) 2022-10-21

Family

ID=83609936

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210780605.1A Pending CN115223009A (en) 2022-07-07 2022-07-07 Small target detection method and device based on improved YOLOv5

Country Status (1)

Country Link
CN (1) CN115223009A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116994244A (en) * 2023-08-16 2023-11-03 临海市特产技术推广总站(临海市柑桔产业技术协同创新中心) Method for evaluating fruit yield of citrus tree based on Yolov8
CN117668669A (en) * 2024-02-01 2024-03-08 齐鲁工业大学(山东省科学院) Pipeline safety monitoring method and system based on improved YOLOv7
CN118038012A (en) * 2023-12-22 2024-05-14 广东工程职业技术学院 YOLOv 5-based fire image detection system


Similar Documents

Publication Publication Date Title
CN115223009A (en) Small target detection method and device based on improved YOLOv5
CN111460968B (en) Unmanned aerial vehicle identification and tracking method and device based on video
CN113392960B (en) Target detection network and method based on mixed hole convolution pyramid
CN111696110B (en) Scene segmentation method and system
CN114463759A (en) Lightweight character detection method and device based on anchor-frame-free algorithm
CN116229452B (en) Point cloud three-dimensional target detection method based on improved multi-scale feature fusion
CN116665095B (en) Method and system for detecting motion ship, storage medium and electronic equipment
CN113962246A (en) Target detection method, system, equipment and storage medium fusing bimodal features
Wang et al. TF-SOD: a novel transformer framework for salient object detection
CN113901928A (en) Target detection method based on dynamic super-resolution, and power transmission line component detection method and system
CN114005094A (en) Aerial photography vehicle target detection method, system and storage medium
CN110532959B (en) Real-time violent behavior detection system based on two-channel three-dimensional convolutional neural network
CN114677357A (en) Model, method and equipment for detecting self-explosion defect of aerial photographing insulator and storage medium
Li et al. Object detection for uav images based on improved yolov6
CN112101113B (en) Lightweight unmanned aerial vehicle image small target detection method
CN111292331B (en) Image processing method and device
CN116681885A (en) Infrared image target identification method and system for power transmission and transformation equipment
CN116206195A (en) Offshore culture object detection method, system, storage medium and computer equipment
CN114821224A (en) Method and system for amplifying railway image style conversion data
CN112446292B (en) 2D image salient object detection method and system
Liu et al. L2-LiteSeg: A Real-Time Semantic Segmentation Method for End-to-End Autonomous Driving
Zhang et al. Real-Time Detection of Small Targets for Video Surveillance Based on MS-YOLOv5
An et al. Research review of object detection algorithms in vehicle detection
Dang et al. Multi-scale spatial transform network for atmospheric polarization prediction
Cai et al. KBN-YOLOv5: Improved YOLOv5 for detecting bird’s nest in high-voltage tower

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination