CN114627502A - Improved YOLOv5-based target recognition detection method - Google Patents

Improved YOLOv5-based target recognition detection method

Info

Publication number
CN114627502A
CN114627502A
Authority
CN
China
Prior art keywords
target
improved
yolov5
detection algorithm
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210240265.3A
Other languages
Chinese (zh)
Inventor
李广博
查文文
焦俊
陈成鹏
辜丽川
时国龙
马慧敏
陶亮
彭硕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Agricultural University AHAU
Original Assignee
Anhui Agricultural University AHAU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Agricultural University AHAU filed Critical Anhui Agricultural University AHAU
Priority to CN202210240265.3A priority Critical patent/CN114627502A/en
Publication of CN114627502A publication Critical patent/CN114627502A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00 Indexing scheme for image data processing or generation, in general
    • G06T2200/32 Indexing scheme for image data processing or generation, in general involving image mosaicing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a target identification and detection method based on an improved YOLOv5, belonging to the field of target detection and comprising the following steps: acquiring target image sample data and constructing a sample data set; performing capacity expansion processing on the sample data set to obtain a data set to be identified; improving the target detection algorithm YOLOv5 to obtain an improved target detection algorithm YOLOv5, specifically by optimizing the target anchor frames, introducing the coordinate attention mechanism CA, and adding BiFPN feature fusion; and identifying the image information in the data set to be identified with the improved target detection algorithm YOLOv5 to obtain the identification results. By adopting the improved target detection algorithm YOLOv5, the method improves the individual identification accuracy for live pigs under general conditions and also improves detection performance in scenes with dense pigs and distant small targets.

Description

Improved YOLOv5-based target recognition detection method
Technical Field
The invention belongs to the technical field of target detection, and particularly relates to a target identification and detection method based on an improved YOLOv5.
Background
With the development of the modern breeding industry, the production management of live pigs is becoming increasingly efficient, systematic, and intelligent, and accurate identification of individual pigs is an important part of that management. Traditional individual identification of pigs mainly uses pigment marking, branding, ear tags, radio frequency identification (RFID), and the like. Pigment marking, branding, and ear-tag wearing suffer from problems such as marks falling off and infection, which hinder the production management of live pigs. Radio frequency identification is costly, and its signal is susceptible to interference from metallic substances.
With the development of machine learning in recent years, the modern breeding industry has gradually adopted neural networks for non-invasive identification of individual pigs. Neural networks applied to pig-face recognition mainly include the following: Marsort et al. built an adaptive pig-face recognition method based on a convolutional neural network, reaching 83% accuracy; Tong et al. proposed pig-face recognition based on an improved YOLOv3, reaching 90.12% accuracy; Hansen et al. proposed CNN models built from convolution, max pooling, dense connections, and similar structures, improving the pig-face recognition effect; Yanhong et al. proposed a live-pig face-posture recognition method using an improved Tiny-YOLO model, reaching 82.38% accuracy; and Eric T. Psota et al. built a fully convolutional neural network for instance segmentation of live pigs, reaching 91% accuracy. These non-invasive methods benefit the welfare of live-pig production, but their recognition accuracy still needs further improvement.
Therefore, the invention provides a target recognition and detection method based on an improved YOLOv5.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a target identification and detection method based on an improved YOLOv5.
In order to achieve the above purpose, the invention provides the following technical scheme:
a target recognition detection method based on improved YOLOv5 comprises the following steps:
acquiring target image sample data and constructing a sample data set;
carrying out capacity expansion processing on the sample data set to obtain a data set to be identified;
the method for improving the target detection algorithm YOLOv5 to obtain the improved target detection algorithm YOLOv5 specifically comprises the following steps:
replacing the Euclidean distance of the K-Means dimensional clustering algorithm with 1-IOU, determining the prior anchor frames with this K-Means algorithm, and optimizing the target anchor frames of the target detection algorithm YOLOv5;
introducing a coordinate attention mechanism CA into the backbone network of the target detection algorithm YOLOv5;
adopting the bidirectional cross-scale connections of BiFPN to improve the target detection algorithm YOLOv5 and performing weighted feature fusion, obtaining the improved target detection algorithm YOLOv5;
and identifying the image information in the data set to be identified by using an improved target detection algorithm YOLOv5 to obtain an identification result.
Preferably, the acquiring of the target image sample data specifically comprises: controlling the camera to rotate through a remote control system and collecting in a time-sharing manner, obtaining image sample data containing different characteristics.
Preferably, the expanding the capacity of the sample data set includes the following steps:
applying random cropping, random offset, and Mosaic data enhancement to the sample data set, manually marking image frames with the picture annotation tool labelImg, assigning label names to the image frames, and saving them, wherein the stored XML files contain the target-frame coordinates and category information of the target images;
dividing a data set after a target label into a training set and a test set sample;
the Mosaic data enhancement is to piece together a plurality of experimental picture images in a training set for training an improved target detection algorithm YOLOv 5.
Preferably, the sample data set is screened and integrated before the capacity expansion processing is performed on the sample data set.
Preferably, when the coordinate attention mechanism CA is introduced into the backbone network of the target detection algorithm YOLOv5, the global pooling is decomposed and converted into two one-dimensional feature encodings, which specifically comprises:

firstly, given an input image X, each channel is encoded along the horizontal and the vertical coordinate using average pooling kernels of size (H, 1) and (1, W) respectively, where H is the coordinate height and W the coordinate width;

the output of the c-th channel at height h and at width w is expressed as follows:

$$z_c^h(h) = \frac{1}{W}\sum_{0 \le i < W} x_c(h, i)$$

$$z_c^w(w) = \frac{1}{H}\sum_{0 \le j < H} x_c(j, w)$$

in the formulas, i and j index the width and the height respectively; these two transformations aggregate features along the two spatial directions to obtain a pair of direction-aware feature maps;

after the transformation in information embedding, the output height feature $z^h$ and width feature $z^w$ are concatenated and passed through a 1 × 1 convolution $F_1$, generating a feature map of the spatial information in the vertical and horizontal directions:

$$f = \delta(F_1([z^h, z^w]))$$

f is then decomposed along the spatial dimension into the tensors $f^h \in \mathbb{R}^{C/r \times H}$ and $f^w \in \mathbb{R}^{C/r \times W}$, where δ is a nonlinear activation and r is the reduction ratio used to control the channel size; 1 × 1 convolutions $F_h$ and $F_w$ then transform $f^h$ and $f^w$ into tensors with the same number of channels as the input:

$$g^h = \sigma(F_h(f^h))$$

$$g^w = \sigma(F_w(f^w))$$

where σ is the sigmoid activation function, and the number of channels of f is reduced by a suitable reduction ratio r;

finally, $g^h$ and $g^w$ are expanded and applied as attention weights respectively, with the following as output:

$$y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j)$$
Preferably, adopting the bidirectional cross-scale connections of BiFPN to improve the target detection algorithm YOLOv5 and performing weighted feature fusion specifically comprises:

deleting the low-contribution nodes in two non-adjacent fused feature networks, namely nodes that have only one input edge and perform no feature fusion;

adding an extra edge from the original input to the output node between the two non-adjacent fused feature networks;

treating each pair of paths as one feature layer and repeating it multiple times to obtain more high-level feature fusion.
Preferably, additional weights are added in the high-level feature fusion by fast normalized fusion.
Preferably, the identifying the image information in the data set to be identified by using the improved target detection algorithm YOLOv5 specifically includes:
inputting the test set sample into an improved target detection algorithm YOLOv5, and detecting a target image through a target anchor frame to obtain a target frame;
extracting feature information in the target frame through a backbone network;
and performing weighted feature fusion on the feature information to obtain an identification result.
Preferably, before the improved target detection algorithm YOLOv5 is used to identify the data to be identified, the improved algorithm is evaluated; the evaluation indexes include: Recall, abbreviated R; Precision, abbreviated P; average precision AP; and mean average precision mAP, the mean of the AP values over all categories;

wherein TP is the number of correctly detected targets, FN the number of missed targets, and FP the number of falsely detected targets; the specific formulas are as follows:

$$P = \frac{TP}{TP + FP}$$

$$R = \frac{TP}{TP + FN}$$

$$mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i$$
the target identification detection method based on the improved YOLOv5 provided by the invention has the following beneficial effects:
the method comprises the steps of firstly, changing the Euclidean distance of a K-Means cluster into 1-IOU, and improving the adaptability of a model target frame; then, a coordinate attention mechanism is introduced into the backbone network so as to more effectively learn the characteristics of the small target and the target position; and finally, BiFPN feature fusion is introduced in a neck improved feature fusion mode, so that the model receptive field is enlarged, and the multi-scale learning of multiple interference targets is enhanced. Therefore, the individual identification accuracy of the live pigs under the general condition is improved, and the detection performance under the situations of dense live pigs and long-distance small targets is also improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention and the design thereof, the drawings required for the embodiments will be briefly described below. The drawings in the following description are only some embodiments of the invention and it will be obvious to a person skilled in the art that other drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a flowchart of the improved YOLOv5-based target recognition detection method according to embodiment 1 of the present invention;
FIG. 2 is a diagram of a YOLOv5s network architecture;
FIG. 3 is a diagram of a coordinate attention mechanism;
FIG. 4 is a schematic diagram of the CA attention module;
FIG. 5 is a PANET and BiFPN feature fusion diagram;
FIG. 6 is a BiFPN feature fusion module flow;
FIG. 7 is an improved BiFPN feature fusion module;
FIG. 8 is a pig face data set annotation interface;
FIG. 9 is a comparison of the detection effects before and after introduction of the CA module;
FIG. 10 is a comparison of the before and after detection effects of the improved BiFPN feature fusion module;
FIG. 11 is a Loss plot;
fig. 12 is a comparison graph of the results.
Detailed Description
In order that those skilled in the art will better understand the technical solutions of the present invention and can practice the same, the present invention will be described in detail with reference to the accompanying drawings and specific examples. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
Example 1
The invention provides a target identification and detection method based on an improved YOLOv5, which takes live pigs at the Jing Ming pig farm in Meng City, Anhui as detection objects and identifies and detects their facial information; as shown in FIG. 1, the method comprises the following steps:
step 1, collecting target image sample data and constructing a sample data set;
the collection tool is a Rouzen C920Pro camera. In order to better collect the facial information of the live pig, a system for remotely controlling the collecting device is established in the experiment, the camera of the collecting device can be remotely controlled to rotate through the control system, and the acquisition can be carried out in a time-sharing mode under the condition of sufficient light, so that the facial information of the live pig with different characteristics can be obtained, and image sample data containing different characteristics can be obtained. A total of 2126 sample images were acquired with a resolution of 1920 pixels by 1080 pixels.
Step 2, screening and integrating the sample data to obtain a sample data set;
in order to ensure that a good data set is obtained, the invention firstly screens and integrates the acquired data to obtain a sample data set.
Step 3, carrying out capacity expansion processing on the sample data set to obtain data to be identified, which specifically comprises the following steps:
and (3) randomly cutting, randomly shifting, performing Mosaic and other data enhancement on the sample data set to expand the experimental sample to 6378 samples, wherein the Mosaic data enhancement is to splice four experimental pictures into one for training, so that the capability of the model for detecting the small target is improved to a certain extent. The live pig data set in the experiment is 5 live pig individuals, the labelImg manual frame is used, the tag names are assigned, the numbers are respectively pig1 and pig2 … … pig5, and the sample division ratio of the training set to the test set is about 9: 1. the stored XML file includes the target frame coordinates and the category information of the sample image, and the annotation interface is shown in fig. 8.
Step 4, improving a target detection algorithm YOLOv5 to obtain an improved target detection algorithm YOLOv 5;
the principle of the YOLOv5 algorithm is first described as follows:
the YOLOv5 target detection algorithm is a new generation algorithm which inherits essence of a YOLO series algorithm, and is improved to a different extent in weight files, reasoning time and training time compared with YOLOv3 and YOLOv 4. In the official code of Yolov5, a total of 4 versions of a given target detection network are four models of Yolov5s, Yolov5m, Yolov5l and Yolov5 x. The four models are deepened and widened on the basis of Yolov5 s. Considering that pig face identification is applied to projects, the invention selects a lightweight network Yolov5s, and the structure of the lightweight network Yolov5s is mainly divided into four parts, namely an Input end, a Backbone network of a Backbone, a neutral network and a Prediction output end.
The Input end standardizes the pictures fed to the model mainly through three means: Mosaic data enhancement, adaptive anchor-frame calculation, and adaptive picture scaling.
The Backbone network mainly extracts features and comprises three modules: Focus, BottleneckCSP, and spatial pyramid pooling (SPP). The Focus module periodically samples pixels from the input picture and reconstructs them into a lower-resolution image, i.e., it stacks the four neighboring positions of the image, which enlarges the receptive field of each point, reduces the loss of original information, cuts the computation, and increases speed. The BottleneckCSP module consists mainly of a Bottleneck part and a CSP part and effectively slows the growth of computation. The SPP module applies max pooling with kernel sizes 5, 9, and 13 and then performs Concat fusion, enlarging the receptive field.
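The slicing of the Focus module can be written compactly in PyTorch. The sketch below omits the BatchNorm and activation of the full YOLOv5 block, and the channel counts are illustrative assumptions:

```python
import torch
import torch.nn as nn

class Focus(nn.Module):
    """Sketch of the Focus slice: the four pixel-parity sub-images of the
    input are stacked on the channel axis (3xHxW -> 12x(H/2)x(W/2)), then
    convolved, so each output point covers a larger receptive field."""
    def __init__(self, c_in=3, c_out=32, k=3):
        super().__init__()
        self.conv = nn.Conv2d(4 * c_in, c_out, k, padding=k // 2)

    def forward(self, x):
        # gather every second pixel at the four phase offsets and concatenate
        return self.conv(torch.cat(
            [x[..., ::2, ::2], x[..., 1::2, ::2],
             x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1))
```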
PANet in the Neck network is based on the Mask R-CNN and FPN frameworks; it enhances information propagation and can accurately preserve spatial information, which helps to properly locate pixels and form masks.
The Prediction output end is mainly the detection part: it applies the anchor boxes to the feature maps and generates the final vectors of class probability, confidence, and target anchor frame. The complete YOLOv5 model is shown in FIG. 2.
The improvement of the target detection algorithm YOLOv5 comprises the following improvement steps:
step 4.1, optimizing the target anchor frame
The size of the prior anchor frames has a great effect on target detection; selecting suitable prior anchor frames lets the network learn an accurate detector. There are two main ways to determine prior anchor frames: empirically and by clustering. YOLOv5 determines them with K-Means dimensional clustering; here the Euclidean distance used by the K-Means clustering algorithm is changed to 1 - IOU(bboxes, anchors) as the determination criterion. To demonstrate the feasibility of the improved model, Avg IOU is introduced as an evaluation index, i.e., the mean of the maximum IOU between each prior frame and the actual target frames; the larger the Avg IOU, the better the obtained prior frames.
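A minimal sketch of this 1-IOU clustering and of the Avg IOU index follows; the random initialization, the mean-based centroid update, and the stopping rule are assumptions, not settings taken from the text:

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IOU between (w, h) pairs treated as boxes sharing one corner."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0]) *
             np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
            (anchors[:, 0] * anchors[:, 1])[None, :] - inter
    return inter / union

def kmeans_1_iou(boxes, k=9, iters=300, seed=0):
    """Cluster label-box sizes with distance d = 1 - IOU(box, anchor)."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)].astype(float)
    for _ in range(iters):
        assign = (1.0 - iou_wh(boxes, anchors)).argmin(axis=1)
        new = np.array([boxes[assign == i].mean(axis=0) if (assign == i).any()
                        else anchors[i] for i in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    avg_iou = iou_wh(boxes, anchors).max(axis=1).mean()  # Avg IOU index
    return anchors, avg_iou
```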
The experimental hardware is an NVIDIA GeForce RTX 2080Ti graphics card, an Intel i7-9700F 3.0 GHz processor, and 16 GB of memory; the software is PyTorch 1.7.1 with CUDA 11.1. The experimental samples all come from the self-made pig-face data set of 2126 images, and the number of prior frames is 9. The prior frames are designated Cluster SSE (hand-designed anchor boxes), Cluster (original K-Means clustering), and Cluster IOU (improved clustering).
TABLE 1 Avg IOU comparison
The experimental results in Table 1 show that, in determining the prior frames, the original clustering algorithm outperforms the manual design method, and the improved clustering algorithm outperforms both, raising the Avg IOU by 6.1% and 2.1% respectively. The improved K-Means algorithm therefore surpasses the previous prior-frame determination methods, fits the data better, and improves the model's multi-scale pig-face image detection.
Step 4.2, introducing a coordinate attention mechanism CA
To select the feature information most critical to the current task from the input picture information, the invention introduces an attention mechanism; attention mechanisms currently fall mainly into three types: spatial attention, channel attention, and self-attention. Most attention mechanisms are used in deep neural networks and can improve performance well, but for a mobile network with a small model the computational overhead is hard to bear. The experiments therefore mainly introduce the SE (Squeeze-and-Excitation), CBAM (Convolutional Block Attention Module), and CA (Coordinate Attention) mechanisms, and finally adopt the novel and efficient CA, which encodes channel relationships and long-range dependencies through precise position information while adding almost no extra computation. The specific flow is shown in FIG. 3.
To enable the attention module to capture feature information with precise positions, the traditional global pooling is decomposed and converted into two one-dimensional feature encodings. Specifically, given an input X, each channel is first encoded along the horizontal and the vertical coordinate using average pooling kernels of size (H, 1) and (1, W) respectively, H being the coordinate height and W the coordinate width. The output of the c-th channel at height h and at width w can thus be expressed as:

$$z_c^h(h) = \frac{1}{W}\sum_{0 \le i < W} x_c(h, i) \qquad (1)$$

$$z_c^w(w) = \frac{1}{H}\sum_{0 \le j < H} x_c(j, w) \qquad (2)$$

where i and j index the width and the height respectively. These two transformations aggregate features along the two spatial directions and yield a pair of direction-aware feature maps. At the same time they allow the attention module to capture long-range dependencies along one spatial direction while preserving precise position information along the other, which helps the network eliminate background interference and locate the target of interest more accurately.

After the transformation in information embedding, the output height feature $z^h$ and width feature $z^w$ are concatenated and passed through a 1 × 1 convolution $F_1$, generating a feature map of the spatial information in the vertical and horizontal directions:

$$f = \delta(F_1([z^h, z^w])) \qquad (3)$$

f is then decomposed along the spatial dimension into the tensors $f^h \in \mathbb{R}^{C/r \times H}$ and $f^w \in \mathbb{R}^{C/r \times W}$, where δ is a nonlinear activation and r is the reduction ratio used to control the channel size. Two 1 × 1 convolutions $F_h$ and $F_w$ then transform $f^h$ and $f^w$ into tensors with the same number of channels as the input:

$$g^h = \sigma(F_h(f^h)) \qquad (4)$$

$$g^w = \sigma(F_w(f^w)) \qquad (5)$$

where σ is the sigmoid activation function. Reducing the number of channels of f by a suitable reduction ratio r also cuts the computation and complexity of the model. Finally, $g^h$ and $g^w$ are expanded and applied as attention weights respectively, with the following as output:

$$y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j) \qquad (6)$$
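The flow of formulas (1) to (6) can be sketched as a PyTorch module as follows; the channel floor of 8, the choice of hard-swish for the nonlinearity δ, and the BatchNorm placement are assumptions of this sketch:

```python
import torch
import torch.nn as nn

class CoordAtt(nn.Module):
    """Sketch of the coordinate attention block of formulas (1)-(6)."""
    def __init__(self, channels, r=32):
        super().__init__()
        mid = max(8, channels // r)              # reduced channels C/r
        self.f1 = nn.Conv2d(channels, mid, 1)    # shared 1x1 convolution F1
        self.bn = nn.BatchNorm2d(mid)
        self.delta = nn.Hardswish()              # nonlinearity delta
        self.fh = nn.Conv2d(mid, channels, 1)    # F_h
        self.fw = nn.Conv2d(mid, channels, 1)    # F_w

    def forward(self, x):
        n, c, h, w = x.shape
        zh = x.mean(dim=3, keepdim=True)                   # (n,c,h,1): formula (1)
        zw = x.mean(dim=2, keepdim=True).transpose(2, 3)   # (n,c,w,1): formula (2)
        f = self.delta(self.bn(self.f1(torch.cat([zh, zw], dim=2))))  # formula (3)
        fh, fw = f.split([h, w], dim=2)                    # split back into f^h, f^w
        gh = torch.sigmoid(self.fh(fh))                    # formula (4): (n,c,h,1)
        gw = torch.sigmoid(self.fw(fw)).transpose(2, 3)    # formula (5): (n,c,1,w)
        return x * gh * gw                                 # formula (6)
```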
the network flow before and after the main network introduces the CA attention mechanism is shown in figure 4.
Step 4.3, feature fusion BiFPN
After the neural network extracts features through the backbone, how the high-level and low-level features are used is critical to improving the model for pig-face detection. The original YOLOv5 uses the bidirectional feature fusion of PANet (see FIG. 5(a)); although this improves feature use and fusion overall, it cannot bias fusion toward the features that contribute most to targeted learning, and it requires a large number of parameters and computations.
Therefore, the invention tests lightweight universal upsampling operator (CARAFE) feature fusion, adaptive spatial feature fusion (ASFF), and BiFPN feature fusion, and finally selects BiFPN (FIG. 5(b)), which performs best, to improve the bidirectional cross-scale connections and perform weighted feature fusion. The specific flow is shown in FIG. 6. First, the low-contribution nodes in P3 and P5 of the fused feature network are deleted, i.e., nodes that have only one input edge and no feature fusion; then an extra edge from the original input to the output node is added at P4 to fuse more features at little extra cost; finally, each pair of paths is treated as one feature layer and repeated several times to obtain more high-level feature fusion. Since input features of different resolutions carry different semantic information, i.e., contribute different amounts, the invention adds an extra weight through fast normalized fusion (formula (7) below) so that the network learns the importance of each feature layer through the weights. The model thus performs better in pig-face detection. The networks before and after the improved BiFPN feature fusion are shown in FIG. 7.
$$O = \sum_i \frac{w_i}{\epsilon + \sum_j w_j} \cdot I_i \qquad (7)$$
In the above formula, each weight satisfies $w_i \ge 0$, enforced by applying a ReLU to each $w_i$, and ε = 0.0001 is a small value that avoids numerical instability.
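Formula (7) amounts to a small learnable module; the sketch below is a minimal version in which the module name and the two-input usage shown in the comment are illustrative:

```python
import torch
import torch.nn as nn

class FastNormalizedFusion(nn.Module):
    """Sketch of fast normalized fusion (formula (7)): learnable weights
    are kept non-negative with ReLU and normalized by their sum plus eps."""
    def __init__(self, n_inputs, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))
        self.eps = eps

    def forward(self, inputs):
        w = torch.relu(self.w)            # enforce w_i >= 0
        w = w / (w.sum() + self.eps)      # fast normalization, no softmax
        return sum(wi * x for wi, x in zip(w, inputs))

# e.g. fusing a top-down path with the original P4 input:
# p4_out = FastNormalizedFusion(2)([p4_in, p4_td])
```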
The experiments of the invention are implemented with an NVIDIA GeForce RTX 2080Ti graphics card, an Intel i7-9700F 3.0 GHz processor, and 16 GB of memory, under PyTorch 1.7.1 and CUDA 11.1.
The invention obtains initialization weights from a model pre-trained on the large COCO data set, optimizes the overall objective with an SGD optimizer, uses a training batch of 16 and a learning rate of 0.01, and iterates the model for 100 epochs. The image size used by the model is 640 × 640.
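Under the stated settings, the training corresponds roughly to the loop below; build_improved_yolov5s and train_loader are hypothetical placeholders, and the momentum value is YOLOv5's usual default rather than a figure given in the text:

```python
import torch

model = build_improved_yolov5s(pretrained="coco")   # hypothetical constructor
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.937)

for epoch in range(100):                 # 100 training iterations (epochs)
    for imgs, targets in train_loader:   # images resized to 640 x 640
        loss = model(imgs, targets)      # detection loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```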
The improved target detection algorithm YOLOv5 is evaluated before it is used to identify the data to be identified. The invention uses evaluation indexes common in deep learning: Recall, abbreviated R; Precision, abbreviated P; average precision AP; and mean average precision mAP, the mean of the AP values over all classes. Here TP (true positives) is the number of correctly detected pig faces, FN (false negatives) the number of missed pig faces, and FP (false positives) the number of falsely detected pig faces. The specific formulas are as follows:
$$P = \frac{TP}{TP + FP}$$

$$R = \frac{TP}{TP + FN}$$

$$mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i$$
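In code, these indexes reduce to the counting sketch below; the trapezoidal integration used for AP is an assumed approximation of the area under the P-R curve:

```python
import numpy as np

def precision_recall(tp, fp, fn):
    """P = TP/(TP+FP) and R = TP/(TP+FN) from detection counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r

def average_precision(recalls, precisions):
    """AP as the area under the precision-recall curve."""
    order = np.argsort(recalls)
    return float(np.trapz(np.asarray(precisions)[order],
                          np.asarray(recalls)[order]))

def mean_average_precision(ap_per_class):
    """mAP: mean of the per-class AP values."""
    return float(np.mean(ap_per_class))
```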
step 5, identifying the data to be identified by using an improved target detection algorithm YOLOv5 to obtain an identification result, wherein the identification result comprises the following steps: inputting the test set sample into an improved target detection algorithm YOLOv5, and detecting a target image through a target anchor frame to obtain a target frame; extracting characteristic information in the target frame through a backbone network; and performing weighted feature fusion on the feature information to obtain an identification result.
Results and analysis of the experiments
First, experimental comparison of the improved target anchor frames
To further measure how the optimized target anchor frames improve pig-face identification detection, experiments were run on the self-made pig-face data set with the same experimental equipment and parameters, as shown in Table 2, where "+" indicates that the model adds the corresponding module (likewise in the tables below). The experimental results show that model 1, which determines the target anchor frames by the improved K-Means clustering, improves accuracy, recall, and average accuracy to different degrees over the original YOLOv5 model. This also supports the improved YOLOv5 model's gains in multi-scale pig-face detection.
TABLE 2 comparison of Performance before and after optimization of target Anchor Frames
Second, experimental comparison and detection results for the attention mechanisms
To better analyze the effect of the attention mechanism on pig-face detection, models introducing the CA, SE, and CBAM attention mechanisms into YOLOv5 were compared with the original YOLOv5 model on the self-made pig-face data set under the same experimental conditions, as shown in Table 3. The results show that the models with an attention mechanism outperform the original model in accuracy, recall, and average accuracy to different degrees, and the CA model beats the SE and CBAM models in average accuracy and recall. This further shows that CA is superior to SE, which considers only internal channel information and ignores the importance of position information, and to CBAM, which introduces position information through global pooling on channels but captures only local information and cannot obtain globally dependent information. Introducing the CA attention mechanism therefore improves the model's anti-interference ability and target-feature extraction, achieving a better pig-face detection effect.
Table 3 comparison of performance with different attention mechanisms
The detection results before and after adding the CA attention module are compared in FIG. 9: the original YOLOv5 (FIG. 9(a)) misses slightly blurred pig faces, while the model with the CA attention module (FIG. 9(b)) learns position information more effectively, resists interference better, lowers the miss rate relative to the original model, and raises the classification confidence to a certain extent.
Thirdly, experimental comparison and detection results for the improved feature fusion
To observe more deeply the effect of the improved feature-fusion model on pig-face detection, models using lightweight universal upsampling operator (CARAFE) feature fusion, adaptive spatial feature fusion (ASFF), and BiFPN feature fusion were compared with the original YOLOv5 model on the same pig-face data set under the same experimental conditions, as shown in Table 4 below. The results show that, compared with the original YOLOv5, all three improve accuracy, average accuracy, and recall, but the average accuracy and accuracy of the BiFPN model are clearly better than those of the other two improvements. The BiFPN feature-fusion model therefore makes more deliberate use of the high-level and low-level features extracted by the backbone, enlarges the receptive field, and provides a good data basis for improving the model's pig-face detection effect.
TABLE 4 Performance contrast for improved feature fusion algorithms
The detection results before and after adding the BiFPN feature fusion module are compared in FIG. 10: the original YOLOv5 effect is on the left (FIG. 10(a)) and the BiFPN model on the right (FIG. 10(b)). Both detect the pig-face targets correctly, but the improved model uses high-level and low-level information more specifically, enlarging the model's receptive field, so its prediction frames are more accurate and its classification confidence higher.
Fourth, ablation experiment
To verify the combined use of the improved modules of the invention, pig-face tests of the models were performed; the results are shown in Table 5. The experiments use the same pig-face data set under identical conditions. They show that the models with the improved K-Means clustering, the CA coordinate attention mechanism, and BiFPN feature fusion improve accuracy, recall, and average accuracy over the original model to different degrees. The prior frames determined by the improved K-Means clustering effectively raise the model's learning efficiency on target detection frames; CA's long-range dependence on position information and channel relationships effectively raises the efficiency of position-information learning and improves the prediction effect; and BiFPN feature fusion prunes the nodes that contribute little while strengthening high-contribution nodes through weighting and similar means, fusing the low-level and high-level feature maps more effectively, raising the model's feature-use efficiency, and achieving a better detection effect.
Furthermore, the longitudinal comparison in Table 5 shows that model 11 (the improved target detection algorithm YOLOv5 of the invention) leads in all respects except that its recall is slightly below model 4 (CA alone) and model 10 (CA plus BiFPN). On the mean average precision mAP, the single improvements (models 1, 4, 7), double improvements (models 8, 9, 10), and the triple improvement (model 11) raise performance over the original YOLOv5 by successively larger margins, showing that the three improvements are not only effective individually but also positively correlated with the detection effect when stacked. In summary, model 11 with all three improvements still performs best; although its recall trails a few single schemes, it reaches the best average accuracy of 95.5% and the best accuracy of 92.6% among all schemes, 2.2% and 13.2% above the original YOLOv5 respectively, further demonstrating the feasibility of the improved algorithm. (In the tables below, "√" and "×" indicate use and non-use of the improvement, respectively.)
TABLE 5 ablation experiment contrast table
Fifth, training results of model 11
The Loss curve of the finally adopted model 11 during training is shown in FIG. 11; the abscissa Epoch is the training iteration and the ordinate Loss the loss value. As FIG. 11 shows, the loss drops quickly for Epochs 0-15, more smoothly for Epochs 15-75, and is essentially stable for Epochs 75-99, converging to about 0.01, with no over-fitting or under-fitting during training.
Sixth, model 11 test results
FIG. 12 visualizes the detection results before and after the three improvements (K-Means, CA, and BiFPN) to the original YOLOv5 model: the original YOLOv5 effect is on the left (FIG. 12(a)) and model 11 with the three improvements introduced together on the right (FIG. 12(b)). The original YOLOv5 misses dense and occluded pig faces, whereas the improved model 11 detects pig4, occluded and small, in the dense live-pig scene, effectively lowering the model's miss rate and raising the classification confidence. The improved model therefore generalizes strongly in scenes with smaller, dense, and occluded targets.
To better analyze the strengths and weaknesses of the improved algorithm of the invention, the other improved algorithms introduced above were each tested on the same data set under the same test conditions, as shown in Table 6 below. The results show that model 11 leads in all respects except accuracy slightly below model 5 and recall slightly below model 3. Its average accuracy clearly beats the other improved models, 2% above the lowest-performing model 2 and 1.7% above the highest-performing model 6, further demonstrating the feasibility of the algorithm for pig-face identification detection.
TABLE 6 comparison with other pig face detection algorithm Performance
The invention improves on the YOLOv5 algorithm: a K-Means clustering algorithm with its distance changed to 1-IOU determines the prior frames, the attention mechanism CA is introduced into the model's Backbone module, and the new BiFPN feature fusion is added to the model's Neck module, applied to pig-face identification detection. In the experimental environment and pig-face data set of the invention, comparison tests were run both on the individual improvement points and on algorithms combining multiple improvement points. The improved model 11 is optimal, with mAP reaching 0.955, 2.2% above the original algorithm. The model improved by this experiment thus raises pig-face identification precision to a certain extent and provides a feasible technical scheme for individual management of live pigs.
The above-mentioned embodiments are only preferred embodiments of the present invention, and the scope of the present invention is not limited thereto, and any simple modifications or equivalent substitutions of the technical solutions that can be obviously obtained by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.

Claims (9)

1. A target recognition detection method based on improved YOLOv5 is characterized by comprising the following steps:
acquiring target image sample data and constructing a sample data set;
carrying out capacity expansion processing on the sample data set to obtain a data set to be identified;
the method for improving the target detection algorithm YOLOv5 to obtain the improved target detection algorithm YOLOv5 specifically comprises the following steps:
replacing the Euclidean distance of the K-Means dimensional clustering algorithm with 1-IOU, determining the prior anchor frames with this K-Means algorithm, and optimizing the target anchor frames of the target detection algorithm YOLOv5;
introducing a coordinate attention mechanism CA into the backbone network of the target detection algorithm YOLOv5;
adopting the bidirectional cross-scale connections of BiFPN to improve the target detection algorithm YOLOv5 and performing weighted feature fusion, obtaining the improved target detection algorithm YOLOv5;
and identifying the image information in the data set to be identified by using an improved target detection algorithm YOLOv5 to obtain an identification result.
2. The improved YOLOv5-based target recognition detection method of claim 1, wherein the acquiring of target image sample data specifically comprises: controlling the camera to rotate through a remote control system and collecting in a time-sharing manner, obtaining image sample data containing different characteristics.
3. The improved YOLOv 5-based target recognition detection method according to claim 1, wherein the sample data set is subjected to capacity expansion processing, and the method comprises the following steps:
applying random cropping, random offset, and Mosaic data enhancement to the sample data set, manually marking image frames with the picture annotation tool labelImg, assigning label names to the image frames, and saving them, wherein the stored XML files contain the target-frame coordinates and category information of the target images;
dividing a data set after a target label into a training set and a test set sample;
the Mosaic data enhancement pieces together several experimental pictures from the training set for training the improved target detection algorithm YOLOv5.
4. The improved YOLOv 5-based target recognition detection method of claim 3, wherein the sample data set is filtered and integrated before the capacity expansion process is performed on the sample data set.
5. The improved YOLOv5-based target recognition detection method of claim 1, wherein decomposing the global pooling into two one-dimensional feature encodings when the coordinate attention mechanism CA is introduced into the backbone network of the target detection algorithm YOLOv5 specifically comprises:

firstly, given an input image X, each channel is encoded along the horizontal and the vertical coordinate using average pooling kernels of size (H, 1) and (1, W) respectively, where H is the coordinate height and W the coordinate width;

the output of the c-th channel at height h and at width w is expressed as follows:

$$z_c^h(h) = \frac{1}{W}\sum_{0 \le i < W} x_c(h, i)$$

$$z_c^w(w) = \frac{1}{H}\sum_{0 \le j < H} x_c(j, w)$$

in the formulas, i and j index the width and the height respectively; these two transformations aggregate features along the two spatial directions to obtain a pair of direction-aware feature maps;

after the transformation in information embedding, the output height feature $z^h$ and width feature $z^w$ are concatenated and passed through a 1 × 1 convolution $F_1$, generating a feature map of the spatial information in the vertical and horizontal directions:

$$f = \delta(F_1([z^h, z^w]))$$

f is then decomposed along the spatial dimension into the tensors $f^h$ and $f^w$, where δ is a nonlinear activation; 1 × 1 convolutions $F_h$ and $F_w$ then transform $f^h$ and $f^w$ into tensors with the same number of channels:

$$g^h = \sigma(F_h(f^h))$$

$$g^w = \sigma(F_w(f^w))$$

where σ is the sigmoid activation function, and the number of channels of f is reduced by a suitable reduction ratio r;

finally, $g^h$ and $g^w$ are expanded and applied as attention weights respectively, with the following formula as output:

$$y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j)$$
6. The improved YOLOv5-based target recognition detection method of claim 1, wherein adopting the bidirectional cross-scale connections of BiFPN to improve the target detection algorithm YOLOv5 and performing weighted feature fusion specifically comprises:

deleting the low-contribution nodes in two non-adjacent fused feature networks, namely nodes that have only one input edge and perform no feature fusion;

adding an extra edge from the original input to the output node between the two non-adjacent fused feature networks;

treating each pair of paths as one feature layer and repeating it multiple times to obtain more high-level feature fusion.
7. The improved YOLOv5-based target recognition detection method according to claim 6, wherein additional weights are added in the high-level feature fusion through fast normalized fusion.
8. The method for detecting target recognition based on improved YOLOv5 of claim 1, wherein the recognizing the image information in the data set to be recognized by using an improved target detection algorithm YOLOv5 specifically comprises:
inputting the test set sample into an improved target detection algorithm YOLOv5, and detecting a target image through a target anchor frame to obtain a target frame;
extracting characteristic information in the target frame through a backbone network;
and performing weighted feature fusion on the feature information to obtain an identification result.
9. The improved YOLOv5-based target recognition detection method according to claim 8, wherein before the data to be identified are identified with the improved target detection algorithm YOLOv5, the improved target detection algorithm YOLOv5 is evaluated; the evaluation indexes include: Recall, abbreviated R; Precision, abbreviated P; average precision AP; and mean average precision mAP, the mean of the AP values over all categories;

wherein TP is the number of correctly detected targets, FN the number of missed targets, and FP the number of falsely detected targets; the specific formulas are as follows:

$$P = \frac{TP}{TP + FP}$$

$$R = \frac{TP}{TP + FN}$$

$$mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i$$
CN202210240265.3A 2022-03-10 2022-03-10 Improved YOLOv5-based target recognition detection method Pending CN114627502A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210240265.3A CN114627502A (en) 2022-03-10 2022-03-10 Improved YOLOv 5-based target recognition detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210240265.3A CN114627502A (en) 2022-03-10 2022-03-10 Improved YOLOv 5-based target recognition detection method

Publications (1)

Publication Number Publication Date
CN114627502A true CN114627502A (en) 2022-06-14

Family

ID=81902900

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210240265.3A Pending CN114627502A (en) 2022-03-10 2022-03-10 Improved YOLOv 5-based target recognition detection method

Country Status (1)

Country Link
CN (1) CN114627502A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115119766A (en) * 2022-06-16 2022-09-30 天津农学院 Sow oestrus detection method based on deep learning and infrared thermal imaging
CN115119766B (en) * 2022-06-16 2023-08-18 天津农学院 Sow oestrus detection method based on deep learning and infrared thermal imaging
CN115205568A (en) * 2022-07-13 2022-10-18 昆明理工大学 Road traffic multi-factor detection method with multi-scale feature fusion
CN115205568B (en) * 2022-07-13 2024-04-19 昆明理工大学 Road traffic multi-element detection method based on multi-scale feature fusion
CN115272828A (en) * 2022-08-11 2022-11-01 河南省农业科学院农业经济与信息研究所 Intensive target detection model training method based on attention mechanism
CN115909225A (en) * 2022-10-21 2023-04-04 武汉科技大学 OL-YoloV5 ship detection method based on online learning
CN115909225B (en) * 2022-10-21 2024-07-02 武汉科技大学 OL-YoloV ship detection method based on online learning
CN115471729A (en) * 2022-11-03 2022-12-13 青岛科技大学 Improved YOLOv 5-based ship target identification method and system
CN115471729B (en) * 2022-11-03 2023-08-04 青岛科技大学 Ship target identification method and system based on improved YOLOv5
CN116229376A (en) * 2023-05-06 2023-06-06 山东易视智能科技有限公司 Crowd early warning method, counting system, computing device and storage medium
CN116229376B (en) * 2023-05-06 2023-08-04 山东易视智能科技有限公司 Crowd early warning method, counting system, computing device and storage medium

Similar Documents

Publication Publication Date Title
CN114627502A (en) Improved YOLOv5-based target recognition detection method
CN112380952B (en) Power equipment infrared image real-time detection and identification method based on artificial intelligence
CN108108657B (en) Method for correcting locality sensitive Hash vehicle retrieval based on multitask deep learning
CN112633382B (en) Method and system for classifying few sample images based on mutual neighbor
CN110765865B (en) Underwater target detection method based on improved YOLO algorithm
CN110287798B (en) Vector network pedestrian detection method based on feature modularization and context fusion
CN113011288A (en) Mask RCNN algorithm-based remote sensing building detection method
CN110781882A (en) License plate positioning and identifying method based on YOLO model
CN114943893B (en) Feature enhancement method for land coverage classification
CN103353941B (en) Natural marker registration method based on viewpoint classification
CN112862849A (en) Image segmentation and full convolution neural network-based field rice ear counting method
CN111476314B (en) Fuzzy video detection method integrating optical flow algorithm and deep learning
CN114781514A (en) Floater target detection method and system integrating attention mechanism
CN111695373A (en) Zebra crossing positioning method, system, medium and device
CN114155556B (en) Human body posture estimation method and system based on stacked hourglass network added with channel shuffling module
CN116977937A (en) Pedestrian re-identification method and system
CN111709317A (en) Pedestrian re-identification method based on multi-scale features under saliency model
CN111767826A (en) Timing fixed-point scene abnormity detection method
CN112084913B (en) End-to-end human body detection and attribute identification method
CN117292324A (en) Crowd density estimation method and system
CN113065400A (en) Invoice seal detection method and device based on anchor-frame-free two-stage network
CN112183287A (en) People counting method of mobile robot under complex background
CN112418262A (en) Vehicle re-identification method, client and system
CN111832508A (en) DIE _ GA-based low-illumination target detection method
Feng et al. High-efficiency progressive transmission and automatic recognition of wildlife monitoring images with WISNs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination