CN115223009A - Small target detection method and device based on improved YOLOv5 - Google Patents


Info

Publication number
CN115223009A
Authority
CN
China
Prior art keywords
module
network model
csp
improved
yolov5
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210780605.1A
Other languages
Chinese (zh)
Inventor
马显龙 (Ma Xianlong)
郭晨鋆 (Guo Chenyun)
周帅 (Zhou Shuai)
曹占国 (Cao Zhanguo)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Electric Power Research Institute of Yunnan Power Grid Co Ltd
Original Assignee
Electric Power Research Institute of Yunnan Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Electric Power Research Institute of Yunnan Power Grid Co Ltd filed Critical Electric Power Research Institute of Yunnan Power Grid Co Ltd
Priority to CN202210780605.1A
Publication of CN115223009A
Legal status: Pending

Classifications

    • G06V 10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N 3/08 — Computing arrangements based on biological models; neural networks; learning methods
    • G06V 10/806 — Fusion of extracted features at the sensor, preprocessing, feature-extraction or classification level
    • G06V 10/82 — Image or video recognition or understanding using neural networks
    • G06V 2201/07 — Target detection


Abstract

The invention relates to a small target detection method and device based on improved YOLOv5. The method comprises: improving the YOLOv5 network model to obtain an improved YOLOv5 network model; acquiring a small target image data set and dividing it into a training sample set and a test sample set; inputting the training sample set into the improved YOLOv5 network model for training to obtain pre-training weights, and adjusting the improved YOLOv5 network model according to the pre-training weights; and inputting the test sample set into the adjusted improved YOLOv5 network model to obtain a detection result. By improving the existing YOLOv5 network model, the invention improves the accuracy and real-time performance of detection, reduces the consumption of human resources, and effectively improves monitoring efficiency.

Description

Small target detection method and device based on improved YOLOv5
Technical Field
The invention belongs to the technical field of detection, and particularly relates to a small target detection method and device based on improved YOLOv5.
Background
At present, economic development and social production depend on a safe and stable supply of electric power, and the safe operation of the power grid is the basic guarantee of power supply in the power system. Because cities develop rapidly, many small substations sit in city centers, where airborne foreign objects are inevitable — most commonly small objects such as unmanned aerial vehicles and kites. The power system is key infrastructure supporting economic and social development and guaranteeing people's basic livelihood, and the normal operation of modern society depends highly on the reliable supply of electric power. Therefore, to guarantee the safety of the substation and a reliable, efficient and stable operating environment, foreign-object intrusion must be effectively monitored and detected, so that problems at the substation can be discovered in time and solved as early as possible.
In the related art, existing small target detection methods generally suffer from low detection accuracy and low speed.
Disclosure of Invention
In view of the above, the present invention aims to provide a small target detection method and device based on improved YOLOv5, so as to solve the problems of low detection accuracy and low speed of small target detection methods in the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme: a small target detection method based on improved YOLOv5 comprises the following steps:
improving the YOLOv5 network model to obtain an improved YOLOv5 network model;
acquiring a small target image data set, and dividing the image data set into a training sample set and a testing sample set;
inputting the training sample set into an improved YOLOv5 network model for training to obtain a pre-training weight, and adjusting the improved YOLOv5 network model according to the pre-training weight;
and inputting the test sample set into the adjusted improved YOLOv5 network model to obtain a detection result.
Further, the YOLOv5 network model includes: the input end, the backbone network, the Neck network, the prediction network and the output end are connected in sequence; the method for improving the YOLOv5 network model to obtain the improved YOLOv5 network model comprises the following steps:
adding a first Transformer module in the backbone network, and adding a plurality of CBAM modules and a second Transformer module in the Neck network.
Further, the backbone network further includes:
the system comprises a Focus module, a first CSP module, a second CSP module, a third CSP module and an SPP module;
specifically, the input end of the Focus module is connected with the input end of the network model, the output end of the Focus module is connected with the input end of the first CSP module, the output end of the first CSP module is connected with the input end of the second CSP module, the output end of the second CSP module is connected with the input end of the third CSP module, the output end of the third CSP module is connected with the input end of the SPP module, the output end of the SPP module is connected with the input end of the first Transformer module, and the output end of the first Transformer module is connected with the Neck network.
Furthermore, there are three CBAM modules: a first CBAM module, a second CBAM module and a third CBAM module; the Neck network further comprises: a fourth CSP module, a fifth CSP module, a sixth CSP module, a first Concat module, a second Concat module, a third Concat module and a fourth Concat module;
the output end of the first Transformer module is connected with the input end of the fourth CSP module through a first Concat module after being subjected to upsampling, and the output end of the fourth CSP module is connected with a fifth CSP module through a second Concat module after being subjected to upsampling;
the input end of the first CBAM module is connected with the output end of the second CSP module and the output end of the third CSP module, and the output end of the first CBAM module is connected with the fifth CSP module through the second Concat module;
the input end of the second CBAM module is connected with the output end of the fourth CSP module, and the output end of the second CBAM module is connected with the output end of the fifth CSP module and then is sent to the sixth CSP module through the third Concat module;
the input end of the third CBAM module is connected with the output end of the first Transformer module, and the output end of the third CBAM module is connected with the output end of the sixth CSP module and then is sent to the second Transformer module through the fourth Concat module;
and the output ends of the fifth CSP module, the sixth CSP module and the second Transformer module are all connected with the input end of the prediction layer.
Further, the method also comprises the following steps: determining the detection precision of the improved YOLOv5 network model; the method comprises the following steps:
obtaining evaluation indexes of the improved YOLOv5 network model, wherein the evaluation indexes comprise precision, recall rate, mean average precision and average frame rate;
and determining the detection precision of the improved YOLOv5 network model according to the evaluation index.
Further, the first or second Transformer module comprises:
a multi-head attention layer and a fully connected layer, connected in a residual manner.
Further, before inputting the training sample set into the improved YOLOv5 network model for training, the method further includes:
and performing data enhancement on the training sample set and the testing sample set by adopting a Mosaic-9 data enhancement mode.
Further, the image size of the training sample set and the test sample set is 608 × 608.
The embodiment of the present application provides a small target detection device based on improved YOLOv5, including:
the improvement module is used for improving the YOLOv5 network model to obtain an improved YOLOv5 network model;
the acquisition module is used for acquiring a small target image data set and dividing the image data set into a training sample set and a test sample set;
the training module is used for inputting the training sample set into an improved YOLOv5 network model for training to obtain a pre-training weight, and adjusting the improved YOLOv5 network model according to the pre-training weight;
and the detection module is used for inputting the test sample set into the adjusted improved YOLOv5 network model to obtain a detection result.
Further, the YOLOv5 network model includes: the input end, the backbone network, the Neck network, the prediction network and the output end are connected in sequence; the method for improving the YOLOv5 network model to obtain the improved YOLOv5 network model comprises the following steps:
adding a first Transformer module in the backbone network, and adding a plurality of CBAM modules and a second Transformer module in the Neck network.
By adopting the technical scheme, the invention can achieve the following beneficial effects:
the invention provides a small target detection method and a device based on improved YOLOv5, which comprises the steps of improving a YOLOv5 network model to obtain an improved YOLOv5 network model; acquiring a small target image data set, and dividing the image data set into a training sample set and a testing sample set; inputting the training sample set into an improved YOLOv5 network model for training to obtain a pre-training weight, and adjusting the improved YOLOv5 network model according to the pre-training weight; and inputting the test sample set into the adjusted improved YOLOv5 network model to obtain a detection result. According to the method and the device, the existing YOLOv5 network model is improved, so that the accuracy and the instantaneity of detection work are improved, the human resource loss is reduced, the accuracy of detection of small targets is effectively improved, and the monitoring efficiency is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram of the steps of the improved YOLOv5-based small target detection method of the present invention;
FIG. 2 is a schematic diagram of the Mosaic-9 data enhancement provided by the present invention;
FIG. 3 is a schematic structural diagram of the improved YOLOv5 network model provided by the present invention;
FIG. 4 is a schematic structural diagram of the improved YOLOv5-based small target detection device of the present invention;
FIG. 5 is a schematic diagram of the hardware structure of an environment implementing the improved YOLOv5-based small target detection method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described in detail below. It should be apparent that the described embodiments are only some embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the examples given herein without making any creative effort, shall fall within the protection scope of the present invention.
A specific method and apparatus for detecting a small target based on improved YOLOv5 provided in the embodiments of the present application will be described with reference to the accompanying drawings.
As shown in fig. 1, the method for detecting a small target based on improved YOLOv5 provided in the embodiment of the present application includes:
s101, improving the YOLOv5 network model to obtain an improved YOLOv5 network model;
it can be understood that the application is an improvement on the basis of the existing YOLOv5 network model, and an improved YOLOv5 network model with higher performance is obtained.
S102, acquiring a small target image data set, and dividing the image data set into a training sample set and a testing sample set;
small targets in this application refer to small volumes of target objects, e.g. birds, kites, drones, etc., whose images are determined as image data sets with image sizes 608 x 608 for the training sample set and the test sample set. And then dividing the image data set into a training sample set and a testing sample set, wherein the training sample set is used for training the improved YOLOv5 network model, and the testing sample set is used for testing the trained improved YOLOv5 network model to obtain a testing result.
S103, inputting the training sample set into an improved YOLOv5 network model for training to obtain a pre-training weight, and adjusting the improved YOLOv5 network model according to the pre-training weight;
the improved YOLOv5 network model is trained by adopting a training sample set to adjust the pre-training weight of the improved YOLOv5 network model, so that the parameters in the improved YOLOv5 network model achieve the optimal effect, and the improved YOLOv5 network model is more accurate in target detection.
S104, inputting the test sample set into the adjusted improved YOLOv5 network model to obtain a detection result.
And finally, inputting the test sample set into the adjusted improved YOLOv5 network model for testing to obtain a detection result.
The working principle of the improved YOLOv5-based small target detection method is as follows: first, the existing YOLOv5 network model is improved to obtain an improved YOLOv5 network model with higher performance; then the training sample set is input into the improved YOLOv5 network model for training to obtain pre-training weights; finally, the pre-training weights are applied to the improved YOLOv5 network model to detect the test sample set and obtain a detection result.
In some embodiments, before inputting the training sample set into the improved YOLOv5 network model for training, the method further includes:
and performing data enhancement on the training sample set and the test sample set by adopting a Mosaic-9 data enhancement mode.
Specifically, the YOLOv5 network model is a deep learning network. Compared with traditional manually designed features, features learned automatically through deep neural networks are superior in expressive ability, robustness and richness. However, a deep learning network needs several times more samples than a traditional target detection method, and when the number of samples is insufficient, the network's feature learning of the detected target is not concentrated enough. Data enhancement is a technique for improving performance and reducing generalization error when training neural network models for computer vision problems. When the deep learning network model is used for prediction, image data expansion of the test data set allows the model to predict multiple different versions of each image to obtain better prediction performance, so Mosaic data enhancement is adopted.
In this application, the Mosaic-9 mode is used to enhance the training sample set and the test sample set: nine pictures are randomly cropped, randomly scaled and randomly arranged, and then combined into one picture. As shown in fig. 2, the nine pictures are spliced after transformations such as flipping and zooming; each picture has its corresponding identification frame, and after the nine pictures are spliced, a new picture and the identification frames corresponding to it are obtained.
The specific process is as follows: first, a batch of data is taken from the total data set; nine pictures are randomly taken from it each time, randomly cropped and spliced at random positions, and synthesized into a new picture. This is repeated batch-size times (the batch size being the number of data samples captured in one training step), finally yielding a batch of new Mosaic-enhanced pictures that are passed to the subsequent network for training. Because nine pictures are learned at once, the background of the detected objects is greatly enriched; during data normalization, the data of nine pictures are calculated at once, which improves training speed and reduces memory requirements, making the Mosaic-9 enhancement effect more remarkable. This application applies Mosaic-9 data enhancement to the training sample set to expand the sample size and alleviate the reduced network generalization caused by an insufficient data set.
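The nine-picture stitching described above can be sketched as follows — a minimal NumPy illustration that only performs the 3 × 3 grid assembly with nearest-neighbour resizing; the random cropping, flipping and bounding-box relabeling of the real Mosaic-9 pipeline are omitted, and the function name `mosaic9` is an assumption:

```python
import numpy as np

def mosaic9(images, out_size=608):
    """Stitch nine images into one Mosaic-9 style training sample.

    Each image is nearest-neighbour resized into one cell of a 3x3 grid.
    (With out_size=608 the grid covers 606 pixels, leaving a thin zero
    border; the real augmentation handles sizing differently.)"""
    cell = out_size // 3
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    for k, img in enumerate(images[:9]):
        h, w = img.shape[:2]
        ys = np.arange(cell) * h // cell   # nearest-neighbour row indices
        xs = np.arange(cell) * w // cell   # nearest-neighbour column indices
        patch = img[ys][:, xs]             # resized cell-sized patch
        r, c = divmod(k, 3)
        canvas[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell] = patch
    return canvas
```

In a full pipeline the identification frames (bounding boxes) of each source picture would be rescaled and offset into the coordinates of the stitched canvas.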
In some embodiments, as shown in fig. 3, the YOLOv5 network model includes: the input end, the backbone network, the Neck network, the prediction network and the output end are connected in sequence; the method for improving the YOLOv5 network model to obtain the improved YOLOv5 network model comprises the following steps:
adding a first Transformer module in the backbone network, and adding a plurality of CBAM modules and a second Transformer module in the hack network.
Specifically, in this application, the improved YOLOv5 incorporates a Transformer module in the backbone network and modifies the structure of the Neck network to merge a hybrid attention module (the CBAM module) and a Transformer module, which improves the generalization performance of the network.
It can be understood that the core idea of visual attention is to find the correlation between features in the original data and then highlight certain important features, e.g. channel attention, pixel attention, multi-order attention. CBAM (Convolutional Block Attention Module) is a module that combines the spatial attention and channel attention mechanisms. The channel attention is computed as follows: each channel of the input feature map undergoes maximum pooling and average pooling simultaneously; the resulting intermediate vectors pass through a Multi-Layer Perceptron (MLP) designed with only one hidden layer; finally, the feature vectors output by the MLP are added element-wise and passed through a Sigmoid activation to obtain the channel attention. The spatial attention is realized as follows: the feature map adjusted by channel attention undergoes maximum and average pooling, followed by a convolution operation, and the convolution result is activated by Sigmoid to obtain the spatial attention.
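The channel- and spatial-attention computation described above can be sketched in NumPy as follows — a minimal illustration, not the patent's implementation; the weight shapes, the ReLU hidden layer and the single 7 × 7 kernel are simplifying assumptions (the real CBAM uses learned convolution filters and a shared MLP trained end to end):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """Channel attention: shared two-layer MLP over max- and avg-pooled
    per-channel vectors of a (C, H, W) feature map, then Sigmoid."""
    avg = x.mean(axis=(1, 2))                      # (C,)
    mx = x.max(axis=(1, 2))                        # (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)   # ReLU hidden layer
    return sigmoid(mlp(avg) + mlp(mx))             # (C,)

def spatial_attention(x, kernel):
    """Spatial attention: convolve the stacked [max, mean] channel-pooled
    maps with a 7x7 kernel, then Sigmoid."""
    pooled = np.stack([x.max(axis=0), x.mean(axis=0)])   # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    p = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    h, w = x.shape[1:]
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(p[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)                                   # (H, W)

def cbam(x, w1, w2, kernel):
    """Apply channel attention, then spatial attention, to x of shape (C, H, W)."""
    x = x * channel_attention(x, w1, w2)[:, None, None]
    return x * spatial_attention(x, kernel)[None]
```

Both attention maps lie in (0, 1) after the Sigmoid, so they rescale rather than replace the input features.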
In some embodiments, the backbone network further includes:
the system comprises a Focus module, a first CSP module, a second CSP module, a third CSP module and an SPP module;
specifically, the input end of the Focus module is connected with the input end of the network model, the output end of the Focus module is connected with the input end of the first CSP module, the output end of the first CSP module is connected with the input end of the second CSP module, the output end of the second CSP module is connected with the input end of the third CSP module, the output end of the third CSP module is connected with the input end of the SPP module, the output end of the SPP module is connected with the input end of the first Transformer module, and the output end of the first Transformer module is connected with the Neck network.
Specifically, in the Backbone portion of the present application, the YOLOv5 network uses a Focus module and CSPNet (Cross Stage Partial Network) modules. The CSP module greatly reduces the amount of calculation while enhancing the learning performance of the whole convolutional neural network. The Focus module performs a slicing operation on the picture, expanding the input channels to 4 times the original, which achieves downsampling while reducing calculation and increasing speed. The SPP (Spatial Pyramid Pooling) module effectively increases the receptive range of the trunk features.
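The Focus slicing operation described above — sampling every second pixel into four slices and concatenating them on the channel axis, so that (C, H, W) becomes (4C, H/2, W/2) — can be sketched as follows; the follow-up convolution of the real Focus module is omitted:

```python
import numpy as np

def focus_slice(x):
    """Focus slicing: four interleaved sub-samples of a (C, H, W) array,
    concatenated on the channel axis -> (4C, H/2, W/2). No information
    is lost; spatial resolution is traded for channel depth."""
    return np.concatenate([
        x[:, 0::2, 0::2],   # even rows, even columns
        x[:, 1::2, 0::2],   # odd rows,  even columns
        x[:, 0::2, 1::2],   # even rows, odd columns
        x[:, 1::2, 1::2],   # odd rows,  odd columns
    ], axis=0)
```

For the 608 × 608 inputs used here, a 3-channel image becomes a 12 × 304 × 304 tensor before the backbone's first convolution.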
In some embodiments, there are three CBAM modules: a first CBAM module, a second CBAM module and a third CBAM module; the Neck network further comprises: a fourth CSP module, a fifth CSP module, a sixth CSP module, a first Concat module, a second Concat module, a third Concat module and a fourth Concat module;
the output end of the first Transformer module is connected with the input end of the fourth CSP module through a first Concat module after being subjected to upsampling, and the output end of the fourth CSP module is connected with a fifth CSP module through a second Concat module after being subjected to upsampling;
the input end of the first CBAM module is connected with the output end of the second CSP module and the output end of the third CSP module, and the output end of the first CBAM module is connected with the fifth CSP module through the second Concat module;
the input end of the second CBAM module is connected with the output end of the fourth CSP module, and the output end of the second CBAM module is connected with the output end of the fifth CSP module and then is sent to the sixth CSP module through the third Concat module;
the input end of the third CBAM module is connected with the output end of the first Transformer module, and the output end of the third CBAM module is connected with the output end of the sixth CSP module and then is sent to the second Transformer module through the fourth Concat module;
the output ends of the fifth CSP module, the sixth CSP module and the second Transformer module are all connected with the input end of the prediction layer.
The Neck network comprises a Feature Pyramid Network (FPN) and a Path Aggregation Network (PAN): the FPN transmits semantic information from top to bottom, and the PAN transmits positioning information from bottom to top. The extracted semantic information and positioning information are then fused, together with the features of the trunk layer and the detection layer, so that the model obtains richer feature information.
In some embodiments, the first or second Transformer module comprises:
the multi-head attention layer is connected with the full connection layer in a residual mode.
It can be understood that the Transformer has a very significant ability to extract features while also operating efficiently. Meanwhile, the attention mechanism performs attention reconstruction on the feature map extracted by the YOLOv5 network, highlighting important information in the feature map and suppressing unimportant information.
Each Transformer block contains two sub-layers: the first is a multi-head attention layer, and the second is a fully connected layer (MLP). Residual connections and layer normalization are used around each sub-layer. The Transformer not only improves the ability to capture different local information, but can also exploit the feature-representation potential through its self-attention mechanism.
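The two-sub-layer structure described above can be sketched in NumPy as follows — a minimal single-block illustration; the weight shapes, the ReLU MLP and the pre-norm arrangement are assumptions, since the patent does not specify these details:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    sd = x.std(-1, keepdims=True)
    return (x - mu) / (sd + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def multi_head_attention(x, wq, wk, wv, wo, heads):
    """Self-attention over x of shape (n_tokens, d), split into heads."""
    n, d = x.shape
    hd = d // heads
    q, k, v = x @ wq, x @ wk, x @ wv
    out = np.zeros_like(x)
    for h in range(heads):
        s = slice(h * hd, (h + 1) * hd)
        att = softmax(q[:, s] @ k[:, s].T / np.sqrt(hd))  # (n, n)
        out[:, s] = att @ v[:, s]
    return out @ wo

def transformer_block(x, wq, wk, wv, wo, w1, w2, heads=2):
    """One encoder block: multi-head self-attention and an MLP sub-layer,
    each wrapped with layer normalization and a residual connection."""
    x = x + multi_head_attention(layer_norm(x), wq, wk, wv, wo, heads)
    x = x + np.maximum(layer_norm(x) @ w1, 0.0) @ w2   # ReLU MLP
    return x
```

For the feature maps in this network, the H × W spatial positions would be flattened into the token dimension before entering the block.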
In some embodiments, the method further comprises determining the detection accuracy of the improved YOLOv5 network model, which comprises:
obtaining evaluation indexes of the improved YOLOv5 network model, wherein the evaluation indexes comprise intersection over union, precision, recall rate, mean average precision and average frame rate;
and determining the detection precision of the improved YOLOv5 network model according to the evaluation index.
Evaluation indexes of the improved YOLOv5 network model mainly include intersection over union (IoU), precision (Precision), recall rate (Recall), mean average precision (mAP) and average frame rate (FPS).
Wherein, the intersection over union is calculated as follows:
IoU = area(B_p ∩ B_gt) / area(B_p ∪ B_gt)
where B_p is the predicted box and B_gt is the ground-truth box. The precision is calculated as follows:
Precision = TP / (TP + FP)
The recall rate is calculated as follows:
Recall = TP / (TP + FN)
wherein TP represents the number of detection boxes with IoU > 0.5, FP represents the number of detection boxes with IoU ≤ 0.5, and FN represents the number of ground truths that are not detected; the value of IoU is defined as the ratio of the intersection to the union of the areas of the two rectangular boxes.
The mean average precision is calculated as follows:
mAP = (1/m) · Σ_{i=1}^{m} AP_i, with AP_i = (1/n) · Σ_{j=1}^{n} P_j
where m is the number of categories of detection targets, n is the sample data size used for detection, and P_j is the probability of correct detection of an object of the i-th class in the j-th image; AP_i sums these probabilities from the first to the n-th sample.
The average frame rate (FPS) formula is as follows:
FPS = Frames / Time
where Frames is the number of image frames processed by the algorithm, and Time is the time consumed to process those frames.
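The evaluation indexes above can be sketched as plain Python functions — a minimal illustration of the formulas, with boxes given as (x1, y1, x2, y2) corner coordinates:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) rectangular boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def precision(tp, fp):
    """Precision = TP / (TP + FP)."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Recall = TP / (TP + FN)."""
    return tp / (tp + fn)

def fps(frames, seconds):
    """Average frame rate = frames processed / time consumed."""
    return frames / seconds
```

A detection box would count as a TP when its IoU with a ground-truth box exceeds the 0.5 threshold stated above.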
As the training period increases, the bounding boxes of the model with the added attention mechanism become more accurate, small target detection becomes more precise, and Precision and Recall increase. The mAP and FPS are calculated according to the above formulas, and the method is compared with the deep learning algorithms YOLOv4 and YOLOv5. Experiments show that YOLOv4 achieves an mAP (%) of 81.6 and an FPS of 57.8, and YOLOv5 achieves 88.3 and 46.8, while the improved YOLOv5 algorithm provided by the invention achieves 91.4 and 62.5 — the highest precision and the fastest detection speed — so small targets can be detected quickly and accurately.
In summary, in the improved YOLOv5 network model of this application, both the backbone network and the Neck network are improved. The most critical part for feature extraction in the improved YOLOv5 is the backbone network. With the YOLOv5 network as the basic framework, a Transformer attention mechanism is fused into the backbone network, replacing the original CSP2_1 module at the end of the backbone network. Compared with the original CSP2_1 module, the Transformer can better capture global information. In addition, the Transformer module is placed at the end of the backbone network because the resolution of the feature map there is low, which reduces the expensive calculation and storage cost. Moreover, this structure helps the network converge better and prevents over-fitting.
Because FPN + PAN is currently a structure with good feature-fusion properties, the Neck network keeps FPN + PAN as its basic structure when fusing features and combines it with the hybrid attention module CBAM, which performs attention reconstruction on the feature maps extracted by the convolutional neural network, so that important information in the feature maps is highlighted and unimportant information is suppressed. The CBAM modules are located after the backbone network and perform attention reconstruction before the Neck network, so they are effective at both ends.
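A minimal sketch of the CBAM hybrid attention described above — channel attention followed by spatial attention. The reduction ratio and 7 × 7 spatial kernel are common defaults assumed here, not values given in the text:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Hybrid attention: reweight channels first, then spatial locations."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.mlp = nn.Sequential(  # shared MLP for channel attention
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention: pool over space, weight each channel.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention: pool over channels, weight each location.
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

x = torch.randn(1, 256, 38, 38)
print(CBAM(256)(x).shape)  # torch.Size([1, 256, 38, 38])
```

The module preserves the feature-map shape, so it can be dropped in front of any fusion point without changing the surrounding wiring.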
In addition, the Neck network also replaces the original CSP2_1 module operating on the feature map of output size 19 × 19 with a Transformer module. Because the improved network mainly targets small objects, whose features are learned poorly at the deepest part of the network, attention reconstruction is applied to the Neck network structure and the original CSP2_1 module at the deepest position is replaced with a Transformer module, which better alleviates the problem that small-target features are not obvious enough in the deepest region of the network. Therefore, in the method, a CBAM attention mechanism module is added to the Neck network before each feature fusion: the input feature map is convolved along the channel direction to obtain a feature map with channel attention, the convolution result is then passed through the spatial attention module, and finally a more accurate feature map is obtained. Meanwhile, the Transformer module at the deepest part of the network is used to obtain richer feature information.
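One Neck fusion step of the kind described above can be sketched as follows; the 2× upsampling factor and channel counts are illustrative, and `cbam` stands in for any shape-preserving hybrid attention module:

```python
import torch
import torch.nn as nn

def fuse_with_attention(deep_feat, lateral_feat, cbam):
    """Sketch of one Neck fusion step: the lateral feature map passes
    through a CBAM attention module before being concatenated (Concat)
    with the upsampled deeper feature map."""
    up = nn.functional.interpolate(deep_feat, scale_factor=2, mode="nearest")
    return torch.cat([up, cbam(lateral_feat)], dim=1)

deep = torch.randn(1, 256, 19, 19)      # deeper, lower-resolution map
lateral = torch.randn(1, 256, 38, 38)   # lateral map from the backbone
fused = fuse_with_attention(deep, lateral, nn.Identity())
print(fused.shape)  # torch.Size([1, 512, 38, 38])
```

`nn.Identity()` is only a placeholder here so the sketch runs standalone; in the described network the real CBAM module would take its place.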
As shown in fig. 4, an embodiment of the present application provides a small target detection apparatus based on improved YOLOv5, including:
an improvement module 401, configured to improve the YOLOv5 network model to obtain an improved YOLOv5 network model;
an obtaining module 402, configured to obtain a small target image data set, and divide the image data set into a training sample set and a testing sample set;
a training module 403, configured to input the training sample set into an improved YOLOv5 network model for training, to obtain a pre-training weight, and adjust the improved YOLOv5 network model according to the pre-training weight;
the detection module 404 is configured to input the test sample set into the adjusted improved YOLOv5 network model to obtain a detection result.
The working principle of the improved YOLOv5-based small target detection apparatus provided by the application is as follows: the improvement module 401 improves the YOLOv5 network model to obtain an improved YOLOv5 network model; the obtaining module 402 obtains a small target image dataset and divides the image dataset into a training sample set and a test sample set; the training module 403 inputs the training sample set into the improved YOLOv5 network model for training to obtain pre-training weights, and adjusts the improved YOLOv5 network model according to the pre-training weights; and the detection module 404 inputs the test sample set into the adjusted improved YOLOv5 network model to obtain a detection result.
Further, the YOLOv5 network model includes: the input end, the backbone network, the Neck network, the prediction network and the output end are connected in sequence; the method for improving the YOLOv5 network model to obtain the improved YOLOv5 network model comprises the following steps:
adding a first Transformer module in the backbone network, and adding a plurality of CBAM modules and a second Transformer module in the Neck network.
The present application provides a computer device comprising a memory and a processor. The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM); the memory is an example of a computer-readable medium. The computer device stores an operating system and a computer program which, when executed by the processor, causes the processor to perform the improved YOLOv5-based small target detection method. The structure shown in fig. 5 is only a block diagram of part of the structure related to the present application and does not constitute a limitation on the computer device to which the present application is applied; a specific computer device may include more or fewer components than those shown in the figure, combine some components, or have a different arrangement of components.
In one embodiment, the improved YOLOv 5-based small target detection method provided by the present application may be implemented in the form of a computer program, and the computer program may be run on a computer device as shown in fig. 5.
In some embodiments, the computer program, when executed by the processor, causes the processor to perform the steps of: improving the YOLOv5 network model to obtain an improved YOLOv5 network model; acquiring a small target image data set, and dividing the image data set into a training sample set and a testing sample set; inputting the training sample set into an improved YOLOv5 network model for training to obtain a pre-training weight, and adjusting the improved YOLOv5 network model according to the pre-training weight; and inputting the test sample set into the adjusted improved YOLOv5 network model to obtain a detection result.
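The four steps executed by the processor can be sketched as a plain pipeline; every callable below is a placeholder standing in for the corresponding component (model construction, training, inference), not the patent's implementation, and the 80/20 split ratio is an assumption:

```python
def detect_small_targets(dataset, model_factory, train_fn, test_fn, split=0.8):
    """Sketch of the four processor steps: build the improved model,
    split the data, pre-train and adjust, then run detection on the
    test samples."""
    model = model_factory()                        # improved YOLOv5 model
    k = int(len(dataset) * split)
    train_set, test_set = dataset[:k], dataset[k:] # train/test division
    weights = train_fn(model, train_set)           # obtain pre-training weights
    model.load_weights(weights)                    # adjust the model with them
    return [test_fn(model, sample) for sample in test_set]  # detection results
```

A usage example with dummy components: `detect_small_targets(images, build_model, pretrain, run_inference)`, where `build_model`, `pretrain`, and `run_inference` are hypothetical names for the real training and inference routines.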
The present application also provides a computer storage medium, examples of which include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassette tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
In some embodiments, the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor: improves the YOLOv5 network model to obtain an improved YOLOv5 network model; acquires a small target image dataset and divides the image dataset into a training sample set and a test sample set; inputs the training sample set into the improved YOLOv5 network model for training to obtain pre-training weights, and adjusts the improved YOLOv5 network model according to the pre-training weights; and inputs the test sample set into the adjusted improved YOLOv5 network model to obtain a detection result.
In summary, the invention provides a method and an apparatus for small target detection based on improved YOLOv5, obtained by improving the YOLOv5 network model. Specifically, a Transformer module replaces the original CSP2_1 module at the end of the backbone network; since the network is deep there and the resolution of the feature map is low, the Transformer module improves the ability to capture feature information while reducing computation and storage cost. In addition, the Neck network is changed from its original structure: a CBAM attention mechanism module is added before each feature fusion, the input feature map is convolved along the channel direction to obtain a feature map with channel attention, the convolution result is then passed through the spatial attention module, and finally a more accurate feature map is obtained.
It is to be understood that the embodiments of the method provided above correspond to the embodiments of the apparatus described above, and the corresponding specific contents may be referred to each other, which is not described herein again.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description covers only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any changes or substitutions that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (10)

1. A small target detection method based on improved YOLOv5 is characterized by comprising the following steps:
improving the YOLOv5 network model to obtain an improved YOLOv5 network model;
acquiring a small target image data set, and dividing the image data set into a training sample set and a testing sample set;
inputting the training sample set into an improved YOLOv5 network model for training to obtain a pre-training weight, and adjusting the improved YOLOv5 network model according to the pre-training weight;
and inputting the test sample set into the adjusted improved YOLOv5 network model to obtain a detection result.
2. The method of claim 1, wherein the YOLOv5 network model comprises: the input end, the backbone network, the Neck network, the prediction network and the output end are connected in sequence; the method for improving the YOLOv5 network model to obtain the improved YOLOv5 network model comprises the following steps:
adding a first Transformer module in the backbone network, and adding a plurality of CBAM modules and a second Transformer module in the Neck network.
3. The method of claim 2, wherein the backbone network further comprises:
the system comprises a Focus module, a first CSP module, a second CSP module, a third CSP module and an SPP module;
wherein the input end of the Focus module is connected with the input end of the network model, the output end of the Focus module is connected with the input end of the first CSP module, the output end of the first CSP module is connected with the input end of the second CSP module, the output end of the second CSP module is connected with the input end of the third CSP module, the output end of the third CSP module is connected with the input end of the SPP module, the output end of the SPP module is connected with the input end of the first Transformer module, and the output end of the first Transformer module is connected with the Neck network.
4. The method of claim 3, wherein the CBAM modules comprise a first CBAM module, a second CBAM module, and a third CBAM module; the Neck network further comprises: a fourth CSP module, a fifth CSP module, a sixth CSP module, a first Concat module, a second Concat module, a third Concat module and a fourth Concat module;
the output end of the first Transformer module is connected with the input end of the fourth CSP module through a first Concat module after being subjected to upsampling, and the output end of the fourth CSP module is connected with a fifth CSP module through a second Concat module after being subjected to upsampling;
the input end of the first CBAM module is connected with the output end of the second CSP module and the output end of the third CSP module, and the output end of the first CBAM module is connected with the fifth CSP module through the second Concat module;
the input end of the second CBAM module is connected with the output end of the fourth CSP module, and the output end of the second CBAM module is connected with the output end of the fifth CSP module and then is sent to the sixth CSP module through the third Concat module;
the input end of the third CBAM module is connected with the output end of the first Transformer module, and the output end of the third CBAM module is connected with the output end of the sixth CSP module and then is sent to the second Transformer module through the fourth Concat module;
and the output ends of the fifth CSP module, the sixth CSP module and the second Transformer module are all connected with the input end of the prediction layer.
5. The method of claim 1, further comprising: determining the detection precision of the improved YOLOv5 network model; the method comprises the following steps:
obtaining evaluation indexes of an improved YOLOv5 network model, wherein the evaluation indexes comprise accuracy, recall rate, average mean accuracy and average frame rate;
and determining the detection precision of the improved YOLOv5 network model according to the evaluation index.
6. The method of claim 2, wherein the first or second Transformer module comprises:
the multi-head attention layer is connected with the full connection layer in a residual mode.
7. The method of claim 1, wherein before inputting the training sample set into the improved YOLOv5 network model for training, the method further comprises:
and performing data enhancement on the training sample set and the test sample set by adopting a Mosaic-9 data enhancement mode.
8. The method of claim 1,
the image sizes of the training sample set and the test sample set are 608 x 608.
9. A small target detection device based on improved YOLOv5 is characterized by comprising:
the improvement module is used for improving the YOLOv5 network model to obtain an improved YOLOv5 network model;
the acquisition module is used for acquiring a small target image data set and dividing the image data set into a training sample set and a test sample set;
the training module is used for inputting the training sample set into an improved YOLOv5 network model for training to obtain a pre-training weight, and adjusting the improved YOLOv5 network model according to the pre-training weight;
and the detection module is used for inputting the test sample set into the adjusted improved YOLOv5 network model to obtain a detection result.
10. The apparatus of claim 9, wherein the YOLOv5 network model comprises: the input end, the backbone network, the Neck network, the prediction network and the output end are connected in sequence; the method for improving the YOLOv5 network model to obtain the improved YOLOv5 network model comprises the following steps:
adding a first Transformer module in the backbone network, and adding a plurality of CBAM modules and a second Transformer module in the Neck network.
CN202210780605.1A 2022-07-07 2022-07-07 Small target detection method and device based on improved YOLOv5 Pending CN115223009A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210780605.1A CN115223009A (en) 2022-07-07 2022-07-07 Small target detection method and device based on improved YOLOv5

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210780605.1A CN115223009A (en) 2022-07-07 2022-07-07 Small target detection method and device based on improved YOLOv5

Publications (1)

Publication Number Publication Date
CN115223009A true CN115223009A (en) 2022-10-21

Family

ID=83609936

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210780605.1A Pending CN115223009A (en) 2022-07-07 2022-07-07 Small target detection method and device based on improved YOLOv5

Country Status (1)

Country Link
CN (1) CN115223009A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116994244A (en) * 2023-08-16 2023-11-03 临海市特产技术推广总站(临海市柑桔产业技术协同创新中心) Method for evaluating fruit yield of citrus tree based on Yolov8
CN117668669A (en) * 2024-02-01 2024-03-08 齐鲁工业大学(山东省科学院) Pipeline safety monitoring method and system based on improved YOLOv7
CN118038012A (en) * 2023-12-22 2024-05-14 广东工程职业技术学院 YOLOv 5-based fire image detection system


Similar Documents

Publication Publication Date Title
CN115223009A (en) Small target detection method and device based on improved YOLOv5
CN111460968B (en) Unmanned aerial vehicle identification and tracking method and device based on video
CN113392960B (en) Target detection network and method based on mixed hole convolution pyramid
CN111696110B (en) Scene segmentation method and system
CN114463759A (en) Lightweight character detection method and device based on anchor-frame-free algorithm
CN116229452B (en) Point cloud three-dimensional target detection method based on improved multi-scale feature fusion
CN116665095B (en) Method and system for detecting motion ship, storage medium and electronic equipment
CN113962246A (en) Target detection method, system, equipment and storage medium fusing bimodal features
Wang et al. TF-SOD: a novel transformer framework for salient object detection
CN113901928A (en) Target detection method based on dynamic super-resolution, and power transmission line component detection method and system
CN114005094A (en) Aerial photography vehicle target detection method, system and storage medium
CN110532959B (en) Real-time violent behavior detection system based on two-channel three-dimensional convolutional neural network
CN114677357A (en) Model, method and equipment for detecting self-explosion defect of aerial photographing insulator and storage medium
Li et al. Object detection for uav images based on improved yolov6
CN112101113B (en) Lightweight unmanned aerial vehicle image small target detection method
CN111292331B (en) Image processing method and device
CN116681885A (en) Infrared image target identification method and system for power transmission and transformation equipment
CN116206195A (en) Offshore culture object detection method, system, storage medium and computer equipment
CN114821224A (en) Method and system for amplifying railway image style conversion data
CN112446292B (en) 2D image salient object detection method and system
Liu et al. L2-LiteSeg: A Real-Time Semantic Segmentation Method for End-to-End Autonomous Driving
Zhang et al. Real-Time Detection of Small Targets for Video Surveillance Based on MS-YOLOv5
An et al. Research review of object detection algorithms in vehicle detection
Dang et al. Multi-scale spatial transform network for atmospheric polarization prediction
Cai et al. KBN-YOLOv5: Improved YOLOv5 for detecting bird’s nest in high-voltage tower

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination