CN114627502A - Improved YOLOv5-based target recognition detection method - Google Patents

Improved YOLOv5-based target recognition detection method

Info

Publication number
CN114627502A
CN114627502A
Authority
CN
China
Prior art keywords
target
improved
yolov5
detection algorithm
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210240265.3A
Other languages
Chinese (zh)
Inventor
李广博
查文文
焦俊
陈成鹏
辜丽川
时国龙
马慧敏
陶亮
彭硕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Agricultural University AHAU
Original Assignee
Anhui Agricultural University AHAU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Agricultural University AHAU filed Critical Anhui Agricultural University AHAU
Priority to CN202210240265.3A priority Critical patent/CN114627502A/en
Publication of CN114627502A publication Critical patent/CN114627502A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00 Indexing scheme for image data processing or generation, in general
    • G06T2200/32 Indexing scheme for image data processing or generation, in general involving image mosaicing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a target identification and detection method based on an improved YOLOv5, belonging to the field of target detection and comprising the following steps: acquiring target image sample data and constructing a sample data set; performing capacity expansion processing on the sample data set to obtain a data set to be identified; improving the target detection algorithm YOLOv5 to obtain an improved target detection algorithm YOLOv5, specifically by optimizing the target anchor frames, introducing the coordinate attention mechanism CA, and adding BiFPN feature fusion; and identifying the image information in the data set to be identified with the improved target detection algorithm YOLOv5 to obtain the identification results. By adopting the improved target detection algorithm YOLOv5, the method improves the individual identification accuracy for live pigs under general conditions and also improves detection performance in scenes with dense pigs and distant small targets.

Description

Improved YOLOv5-based target recognition detection method
Technical Field
The invention belongs to the technical field of target detection, and particularly relates to a target identification and detection method based on an improved YOLOv5.
Background
With the development of the modern breeding industry, the production management of live pigs is becoming increasingly efficient, systematic, and intelligent, and accurate identification of individual pigs is an important part of that management. Traditional individual identification of pigs mainly uses pigment marking, branding, ear tags, radio frequency identification (RFID), and the like. Pigment marking, branding, and ear-tag wearing suffer from problems such as marks falling off and infection, which hinder the production management of live pigs. Radio frequency identification is costly, and its signal is susceptible to interference from metallic substances.
With the development of machine learning in recent years, the modern breeding industry has gradually adopted neural networks for non-invasive identification of individual pigs. Neural networks applied to pig-face recognition mainly include the following: Marsort et al. built an adaptive pig-face recognition method based on a convolutional neural network, reaching 83% accuracy; Tong et al. proposed pig-face recognition based on an improved YOLOv3, reaching 90.12% accuracy; Hansen et al. proposed CNN models built from convolution, max pooling, dense connections, and similar structures, improving the pig-face recognition effect; Yanhong et al. proposed a live-pig face-posture recognition method using an improved Tiny-YOLO model, reaching 82.38% accuracy; and Eric T. Psota et al. built a fully convolutional neural network for instance segmentation of live pigs, reaching 91% accuracy. These non-invasive methods benefit the welfare of live-pig production, but their recognition accuracy still needs further improvement.
Therefore, the invention provides a target recognition and detection method based on an improved YOLOv5.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a target identification and detection method based on an improved YOLOv5.
In order to achieve the above purpose, the invention provides the following technical scheme:
a target recognition detection method based on improved YOLOv5 comprises the following steps:
acquiring target image sample data and constructing a sample data set;
carrying out capacity expansion processing on the sample data set to obtain a data set to be identified;
the method for improving the target detection algorithm YOLOv5 to obtain the improved target detection algorithm YOLOv5 specifically comprises the following steps:
replacing the Euclidean distance of the K-Means dimensional clustering algorithm with 1-IOU, determining the prior anchor frames with this K-Means algorithm, and optimizing the target anchor frames of the target detection algorithm YOLOv5;
introducing a coordinate attention mechanism CA into the backbone network of the target detection algorithm YOLOv5;
adopting the bidirectional cross-scale connections of BiFPN to improve the target detection algorithm YOLOv5 and performing weighted feature fusion, obtaining the improved target detection algorithm YOLOv5;
and identifying the image information in the data set to be identified by using an improved target detection algorithm YOLOv5 to obtain an identification result.
Preferably, the acquiring of the target image sample data specifically comprises: controlling the camera to rotate through a remote control system and collecting in a time-sharing manner, obtaining image sample data containing different characteristics.
Preferably, the expanding the capacity of the sample data set includes the following steps:
applying random cropping, random offset, and Mosaic data enhancement to the sample data set, manually marking image frames with the picture annotation tool labelImg, assigning label names to the image frames, and saving them, wherein the stored XML files contain the target-frame coordinates and category information of the target images;
dividing a data set after a target label into a training set and a test set sample;
the Mosaic data enhancement is to piece together a plurality of experimental picture images in a training set for training an improved target detection algorithm YOLOv 5.
Preferably, the sample data set is screened and integrated before the capacity expansion processing is performed on the sample data set.
Preferably, when the coordinate attention mechanism CA is introduced into the backbone network of the target detection algorithm YOLOv5, the global pooling is decomposed and converted into two one-dimensional feature encodings, which specifically comprises:

firstly, given an input image X, each channel is encoded along the horizontal and the vertical coordinate using average pooling kernels of size (H, 1) and (1, W) respectively, where H is the coordinate height and W the coordinate width;

the output of the c-th channel at height h and at width w is expressed as follows:

$$z_c^h(h) = \frac{1}{W}\sum_{0 \le i < W} x_c(h, i)$$

$$z_c^w(w) = \frac{1}{H}\sum_{0 \le j < H} x_c(j, w)$$

in the formulas, i and j index the width and the height respectively; these two transformations aggregate features along the two spatial directions to obtain a pair of direction-aware feature maps;

after the transformation in information embedding, the output height feature $z^h$ and width feature $z^w$ are concatenated and passed through a 1 × 1 convolution $F_1$, generating a feature map of the spatial information in the vertical and horizontal directions:

$$f = \delta(F_1([z^h, z^w]))$$

f is then decomposed along the spatial dimension into the tensors $f^h \in \mathbb{R}^{C/r \times H}$ and $f^w \in \mathbb{R}^{C/r \times W}$, where δ is a nonlinear activation and r is the reduction ratio used to control the channel size; 1 × 1 convolutions $F_h$ and $F_w$ then transform $f^h$ and $f^w$ into tensors with the same number of channels as the input:

$$g^h = \sigma(F_h(f^h))$$

$$g^w = \sigma(F_w(f^w))$$

where σ is the sigmoid activation function, and the number of channels of f is reduced by a suitable reduction ratio r;

finally, $g^h$ and $g^w$ are expanded and applied as attention weights respectively, with the following as output:

$$y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j)$$
Preferably, adopting the bidirectional cross-scale connections of BiFPN to improve the target detection algorithm YOLOv5 and performing weighted feature fusion specifically comprises:

deleting the low-contribution nodes in two non-adjacent fused feature networks, namely nodes that have only one input edge and perform no feature fusion;

adding an extra edge from the original input to the output node between the two non-adjacent fused feature networks;

treating each pair of paths as one feature layer and repeating it multiple times to obtain more high-level feature fusion.
Preferably, additional weights are added in the high-level feature fusion by fast normalized fusion.
Preferably, the identifying the image information in the data set to be identified by using the improved target detection algorithm YOLOv5 specifically includes:
inputting the test set sample into an improved target detection algorithm YOLOv5, and detecting a target image through a target anchor frame to obtain a target frame;
extracting feature information in the target frame through a backbone network;
and performing weighted feature fusion on the feature information to obtain an identification result.
Preferably, before the improved target detection algorithm YOLOv5 is used to identify the data to be identified, the improved algorithm is evaluated; the evaluation indexes include: Recall, abbreviated R; Precision, abbreviated P; average precision AP; and mean average precision mAP, the mean of the AP values over all categories;

wherein TP is the number of correctly detected targets, FN the number of missed targets, and FP the number of falsely detected targets; the specific formulas are as follows:

$$P = \frac{TP}{TP + FP}$$

$$R = \frac{TP}{TP + FN}$$

$$mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i$$
the target identification detection method based on the improved YOLOv5 provided by the invention has the following beneficial effects:
the method comprises the steps of firstly, changing the Euclidean distance of a K-Means cluster into 1-IOU, and improving the adaptability of a model target frame; then, a coordinate attention mechanism is introduced into the backbone network so as to more effectively learn the characteristics of the small target and the target position; and finally, BiFPN feature fusion is introduced in a neck improved feature fusion mode, so that the model receptive field is enlarged, and the multi-scale learning of multiple interference targets is enhanced. Therefore, the individual identification accuracy of the live pigs under the general condition is improved, and the detection performance under the situations of dense live pigs and long-distance small targets is also improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention and the design thereof, the drawings required for the embodiments will be briefly described below. The drawings in the following description are only some embodiments of the invention and it will be obvious to a person skilled in the art that other drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a flowchart of the improved YOLOv5-based target recognition detection method according to embodiment 1 of the present invention;
FIG. 2 is a diagram of a YOLOv5s network architecture;
FIG. 3 is a diagram of a coordinate attention mechanism;
FIG. 4 is a schematic diagram of the CA attention module;
FIG. 5 is a PANET and BiFPN feature fusion diagram;
FIG. 6 is a BiFPN feature fusion module flow;
FIG. 7 is an improved BiFPN feature fusion module;
FIG. 8 is a pig face data set annotation interface;
FIG. 9 is a comparison of the detection effects before and after introduction of the CA module;
FIG. 10 is a comparison of the before and after detection effects of the improved BiFPN feature fusion module;
FIG. 11 is a Loss plot;
fig. 12 is a comparison graph of the results.
Detailed Description
In order that those skilled in the art will better understand the technical solutions of the present invention and can practice the same, the present invention will be described in detail with reference to the accompanying drawings and specific examples. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
Example 1
The invention provides a target identification and detection method based on an improved YOLOv5, which takes live pigs at the Jing Ming pig farm in Meng City, Anhui as detection objects and identifies and detects their facial information; as shown in FIG. 1, the method comprises the following steps:
step 1, collecting target image sample data and constructing a sample data set;
the collection tool is a Rouzen C920Pro camera. In order to better collect the facial information of the live pig, a system for remotely controlling the collecting device is established in the experiment, the camera of the collecting device can be remotely controlled to rotate through the control system, and the acquisition can be carried out in a time-sharing mode under the condition of sufficient light, so that the facial information of the live pig with different characteristics can be obtained, and image sample data containing different characteristics can be obtained. A total of 2126 sample images were acquired with a resolution of 1920 pixels by 1080 pixels.
Step 2, screening and integrating the sample data to obtain a sample data set;
in order to ensure that a good data set is obtained, the invention firstly screens and integrates the acquired data to obtain a sample data set.
Step 3, carrying out capacity expansion processing on the sample data set to obtain data to be identified, which specifically comprises the following steps:
and (3) randomly cutting, randomly shifting, performing Mosaic and other data enhancement on the sample data set to expand the experimental sample to 6378 samples, wherein the Mosaic data enhancement is to splice four experimental pictures into one for training, so that the capability of the model for detecting the small target is improved to a certain extent. The live pig data set in the experiment is 5 live pig individuals, the labelImg manual frame is used, the tag names are assigned, the numbers are respectively pig1 and pig2 … … pig5, and the sample division ratio of the training set to the test set is about 9: 1. the stored XML file includes the target frame coordinates and the category information of the sample image, and the annotation interface is shown in fig. 8.
Step 4, improving a target detection algorithm YOLOv5 to obtain an improved target detection algorithm YOLOv 5;
the principle of the YOLOv5 algorithm is first described as follows:
the YOLOv5 target detection algorithm is a new generation algorithm which inherits essence of a YOLO series algorithm, and is improved to a different extent in weight files, reasoning time and training time compared with YOLOv3 and YOLOv 4. In the official code of Yolov5, a total of 4 versions of a given target detection network are four models of Yolov5s, Yolov5m, Yolov5l and Yolov5 x. The four models are deepened and widened on the basis of Yolov5 s. Considering that pig face identification is applied to projects, the invention selects a lightweight network Yolov5s, and the structure of the lightweight network Yolov5s is mainly divided into four parts, namely an Input end, a Backbone network of a Backbone, a neutral network and a Prediction output end.
The Input end standardizes the pictures fed to the model mainly through three means: Mosaic data enhancement, adaptive anchor-frame calculation, and adaptive picture scaling.
The Backbone network mainly extracts features and comprises three modules: Focus, BottleneckCSP, and spatial pyramid pooling (SPP). The Focus module periodically samples pixels from the input picture and reconstructs them into a lower-resolution image, i.e., it stacks the four neighboring positions of the image, which enlarges the receptive field of each point, reduces the loss of original information, cuts the computation, and increases speed. The BottleneckCSP module consists mainly of a Bottleneck part and a CSP part and effectively slows the growth of computation. The SPP module applies max pooling with kernel sizes 5, 9, and 13 and then performs Concat fusion, enlarging the receptive field.
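The slicing of the Focus module can be written compactly in PyTorch. The sketch below omits the BatchNorm and activation of the full YOLOv5 block, and the channel counts are illustrative assumptions:

```python
import torch
import torch.nn as nn

class Focus(nn.Module):
    """Sketch of the Focus slice: the four pixel-parity sub-images of the
    input are stacked on the channel axis (3xHxW -> 12x(H/2)x(W/2)), then
    convolved, so each output point covers a larger receptive field."""
    def __init__(self, c_in=3, c_out=32, k=3):
        super().__init__()
        self.conv = nn.Conv2d(4 * c_in, c_out, k, padding=k // 2)

    def forward(self, x):
        # gather every second pixel at the four phase offsets and concatenate
        return self.conv(torch.cat(
            [x[..., ::2, ::2], x[..., 1::2, ::2],
             x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1))
```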
PANet in the Neck network is based on the Mask R-CNN and FPN frameworks; it enhances information propagation and can accurately preserve spatial information, which helps to properly locate pixels and form masks.
The Prediction output end is mainly the detection part: it applies the anchor boxes to the feature maps and generates the final vectors of class probability, confidence, and target anchor frame. The complete YOLOv5 model is shown in FIG. 2.
The improvement of the target detection algorithm YOLOv5 comprises the following improvement steps:
step 4.1, optimizing the target anchor frame
The size of the prior anchor frames has a great effect on target detection; selecting suitable prior anchor frames lets the network learn an accurate detector. There are two main ways to determine prior anchor frames: empirically and by clustering. YOLOv5 determines them with K-Means dimensional clustering; here the Euclidean distance used by the K-Means clustering algorithm is changed to 1 - IOU(bboxes, anchors) as the determination criterion. To demonstrate the feasibility of the improved model, Avg IOU is introduced as an evaluation index, i.e., the mean of the maximum IOU between each prior frame and the actual target frames; the larger the Avg IOU, the better the obtained prior frames.
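A minimal sketch of this 1-IOU clustering and of the Avg IOU index follows; the random initialization, the mean-based centroid update, and the stopping rule are assumptions, not settings taken from the text:

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IOU between (w, h) pairs treated as boxes sharing one corner."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0]) *
             np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
            (anchors[:, 0] * anchors[:, 1])[None, :] - inter
    return inter / union

def kmeans_1_iou(boxes, k=9, iters=300, seed=0):
    """Cluster label-box sizes with distance d = 1 - IOU(box, anchor)."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)].astype(float)
    for _ in range(iters):
        assign = (1.0 - iou_wh(boxes, anchors)).argmin(axis=1)
        new = np.array([boxes[assign == i].mean(axis=0) if (assign == i).any()
                        else anchors[i] for i in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    avg_iou = iou_wh(boxes, anchors).max(axis=1).mean()  # Avg IOU index
    return anchors, avg_iou
```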
The experimental hardware is an NVIDIA GeForce RTX 2080Ti graphics card, an Intel i7-9700F 3.0 GHz processor, and 16 GB of memory; the software is PyTorch 1.7.1 with CUDA 11.1. The experimental samples all come from the self-made pig-face data set of 2126 images, and the number of prior frames is 9. The prior frames are designated Cluster SSE (hand-designed anchor boxes), Cluster (original K-Means clustering), and Cluster IOU (improved clustering).
TABLE 1 Avg IOU comparison
The experimental results in Table 1 show that, in determining the prior frames, the original clustering algorithm outperforms the manual design method, and the improved clustering algorithm outperforms both, raising the Avg IOU by 6.1% and 2.1% respectively. The improved K-Means algorithm therefore surpasses the previous prior-frame determination methods, fits the data better, and improves the model's multi-scale pig-face image detection.
Step 4.2, introducing a coordinate attention mechanism CA
To select the feature information most critical to the current task from the input picture information, the invention introduces an attention mechanism; attention mechanisms currently fall mainly into three types: spatial attention, channel attention, and self-attention. Most attention mechanisms are used in deep neural networks and can improve performance well, but for a mobile network with a small model the computational overhead is hard to bear. The experiments therefore mainly introduce the SE (Squeeze-and-Excitation), CBAM (Convolutional Block Attention Module), and CA (Coordinate Attention) mechanisms, and finally adopt the novel and efficient CA, which encodes channel relationships and long-range dependencies through precise position information while adding almost no extra computation. The specific flow is shown in FIG. 3.
To enable the attention module to capture feature information with precise positions, the traditional global pooling is decomposed and converted into two one-dimensional feature encodings. Specifically, given an input X, each channel is first encoded along the horizontal and the vertical coordinate using average pooling kernels of size (H, 1) and (1, W) respectively, H being the coordinate height and W the coordinate width. The output of the c-th channel at height h and at width w can thus be expressed as:

$$z_c^h(h) = \frac{1}{W}\sum_{0 \le i < W} x_c(h, i) \qquad (1)$$

$$z_c^w(w) = \frac{1}{H}\sum_{0 \le j < H} x_c(j, w) \qquad (2)$$

where i and j index the width and the height respectively. These two transformations aggregate features along the two spatial directions and yield a pair of direction-aware feature maps. At the same time they allow the attention module to capture long-range dependencies along one spatial direction while preserving precise position information along the other, which helps the network eliminate background interference and locate the target of interest more accurately.

After the transformation in information embedding, the output height feature $z^h$ and width feature $z^w$ are concatenated and passed through a 1 × 1 convolution $F_1$, generating a feature map of the spatial information in the vertical and horizontal directions:

$$f = \delta(F_1([z^h, z^w])) \qquad (3)$$

f is then decomposed along the spatial dimension into the tensors $f^h \in \mathbb{R}^{C/r \times H}$ and $f^w \in \mathbb{R}^{C/r \times W}$, where δ is a nonlinear activation and r is the reduction ratio used to control the channel size. Two 1 × 1 convolutions $F_h$ and $F_w$ then transform $f^h$ and $f^w$ into tensors with the same number of channels as the input:

$$g^h = \sigma(F_h(f^h)) \qquad (4)$$

$$g^w = \sigma(F_w(f^w)) \qquad (5)$$

where σ is the sigmoid activation function. Reducing the number of channels of f by a suitable reduction ratio r also cuts the computation and complexity of the model. Finally, $g^h$ and $g^w$ are expanded and applied as attention weights respectively, with the following as output:

$$y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j) \qquad (6)$$
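The flow of formulas (1) to (6) can be sketched as a PyTorch module as follows; the channel floor of 8, the choice of hard-swish for the nonlinearity δ, and the BatchNorm placement are assumptions of this sketch:

```python
import torch
import torch.nn as nn

class CoordAtt(nn.Module):
    """Sketch of the coordinate attention block of formulas (1)-(6)."""
    def __init__(self, channels, r=32):
        super().__init__()
        mid = max(8, channels // r)              # reduced channels C/r
        self.f1 = nn.Conv2d(channels, mid, 1)    # shared 1x1 convolution F1
        self.bn = nn.BatchNorm2d(mid)
        self.delta = nn.Hardswish()              # nonlinearity delta
        self.fh = nn.Conv2d(mid, channels, 1)    # F_h
        self.fw = nn.Conv2d(mid, channels, 1)    # F_w

    def forward(self, x):
        n, c, h, w = x.shape
        zh = x.mean(dim=3, keepdim=True)                   # (n,c,h,1): formula (1)
        zw = x.mean(dim=2, keepdim=True).transpose(2, 3)   # (n,c,w,1): formula (2)
        f = self.delta(self.bn(self.f1(torch.cat([zh, zw], dim=2))))  # formula (3)
        fh, fw = f.split([h, w], dim=2)                    # split back into f^h, f^w
        gh = torch.sigmoid(self.fh(fh))                    # formula (4): (n,c,h,1)
        gw = torch.sigmoid(self.fw(fw)).transpose(2, 3)    # formula (5): (n,c,1,w)
        return x * gh * gw                                 # formula (6)
```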
the network flow before and after the main network introduces the CA attention mechanism is shown in figure 4.
Step 4.3, feature fusion BiFPN
After the neural network extracts features through the backbone, how the high-level and low-level features are used is critical to improving the model for pig-face detection. The original YOLOv5 uses the bidirectional feature fusion of PANet (see FIG. 5(a)); although this improves feature use and fusion overall, it cannot bias fusion toward the features that contribute most to targeted learning, and it requires a large number of parameters and computations.
Therefore, the invention tests lightweight universal upsampling operator (CARAFE) feature fusion, adaptive spatial feature fusion (ASFF), and BiFPN feature fusion, and finally selects BiFPN (FIG. 5(b)), which performs best, to improve the bidirectional cross-scale connections and perform weighted feature fusion. The specific flow is shown in FIG. 6. First, the low-contribution nodes in P3 and P5 of the fused feature network are deleted, i.e., nodes that have only one input edge and no feature fusion; then an extra edge from the original input to the output node is added at P4 to fuse more features at little extra cost; finally, each pair of paths is treated as one feature layer and repeated several times to obtain more high-level feature fusion. Since input features of different resolutions carry different semantic information, i.e., contribute different amounts, the invention adds an extra weight through fast normalized fusion (formula (7) below) so that the network learns the importance of each feature layer through the weights. The model thus performs better in pig-face detection. The networks before and after the improved BiFPN feature fusion are shown in FIG. 7.
$$O = \sum_i \frac{w_i}{\epsilon + \sum_j w_j} \cdot I_i \qquad (7)$$
In the above formula, each weight satisfies $w_i \ge 0$, enforced by applying a ReLU to each $w_i$, and ε = 0.0001 is a small value that avoids numerical instability.
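Formula (7) amounts to a small learnable module; the sketch below is a minimal version in which the module name and the two-input usage shown in the comment are illustrative:

```python
import torch
import torch.nn as nn

class FastNormalizedFusion(nn.Module):
    """Sketch of fast normalized fusion (formula (7)): learnable weights
    are kept non-negative with ReLU and normalized by their sum plus eps."""
    def __init__(self, n_inputs, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))
        self.eps = eps

    def forward(self, inputs):
        w = torch.relu(self.w)            # enforce w_i >= 0
        w = w / (w.sum() + self.eps)      # fast normalization, no softmax
        return sum(wi * x for wi, x in zip(w, inputs))

# e.g. fusing a top-down path with the original P4 input:
# p4_out = FastNormalizedFusion(2)([p4_in, p4_td])
```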
The experiments of the invention are implemented with an NVIDIA GeForce RTX 2080Ti graphics card, an Intel i7-9700F 3.0 GHz processor, and 16 GB of memory, under PyTorch 1.7.1 and CUDA 11.1.
The invention obtains initialization weights from a model pre-trained on the large COCO data set, optimizes the overall objective with an SGD optimizer, uses a training batch of 16 and a learning rate of 0.01, and iterates the model for 100 epochs. The image size used by the model is 640 × 640.
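Under the stated settings, the training corresponds roughly to the loop below; build_improved_yolov5s and train_loader are hypothetical placeholders, and the momentum value is YOLOv5's usual default rather than a figure given in the text:

```python
import torch

model = build_improved_yolov5s(pretrained="coco")   # hypothetical constructor
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.937)

for epoch in range(100):                 # 100 training iterations (epochs)
    for imgs, targets in train_loader:   # images resized to 640 x 640
        loss = model(imgs, targets)      # detection loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```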
The improved target detection algorithm YOLOv5 is evaluated before it is used to identify the data to be identified. The invention uses evaluation indexes common in deep learning: Recall, abbreviated R; Precision, abbreviated P; average precision AP; and mean average precision mAP, the mean of the AP values over all classes. Here TP (true positives) is the number of correctly detected pig faces, FN (false negatives) the number of missed pig faces, and FP (false positives) the number of falsely detected pig faces. The specific formulas are as follows:
$$P = \frac{TP}{TP + FP}$$

$$R = \frac{TP}{TP + FN}$$

$$mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i$$
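In code, these indexes reduce to the counting sketch below; the trapezoidal integration used for AP is an assumed approximation of the area under the P-R curve:

```python
import numpy as np

def precision_recall(tp, fp, fn):
    """P = TP/(TP+FP) and R = TP/(TP+FN) from detection counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r

def average_precision(recalls, precisions):
    """AP as the area under the precision-recall curve."""
    order = np.argsort(recalls)
    return float(np.trapz(np.asarray(precisions)[order],
                          np.asarray(recalls)[order]))

def mean_average_precision(ap_per_class):
    """mAP: mean of the per-class AP values."""
    return float(np.mean(ap_per_class))
```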
step 5, identifying the data to be identified by using an improved target detection algorithm YOLOv5 to obtain an identification result, wherein the identification result comprises the following steps: inputting the test set sample into an improved target detection algorithm YOLOv5, and detecting a target image through a target anchor frame to obtain a target frame; extracting characteristic information in the target frame through a backbone network; and performing weighted feature fusion on the feature information to obtain an identification result.
Results and analysis of the experiments
First, experimental comparison of the improved target anchor frames
To further measure how the optimized target anchor frames improve pig-face identification detection, experiments were run on the self-made pig-face data set with the same experimental equipment and parameters, as shown in Table 2, where "+" indicates that the model adds the corresponding module (likewise in the tables below). The experimental results show that model 1, which determines the target anchor frames by the improved K-Means clustering, improves accuracy, recall, and average accuracy to different degrees over the original YOLOv5 model. This also supports the improved YOLOv5 model's gains in multi-scale pig-face detection.
TABLE 2 comparison of Performance before and after optimization of target Anchor Frames
Second, experimental comparison and detection results for the attention mechanisms
To better analyze the effect of the attention mechanism on pig-face detection, models introducing the CA, SE, and CBAM attention mechanisms into YOLOv5 were compared with the original YOLOv5 model on the self-made pig-face data set under the same experimental conditions, as shown in Table 3. The results show that the models with an attention mechanism outperform the original model in accuracy, recall, and average accuracy to different degrees, and the CA model beats the SE and CBAM models in average accuracy and recall. This further shows that CA is superior to SE, which considers only internal channel information and ignores the importance of position information, and to CBAM, which introduces position information through global pooling on channels but captures only local information and cannot obtain globally dependent information. Introducing the CA attention mechanism therefore improves the model's anti-interference ability and target-feature extraction, achieving a better pig-face detection effect.
Table 3 comparison of performance with different attention mechanisms
The detection results before and after adding the CA attention module are compared in FIG. 9: the original YOLOv5 (FIG. 9(a)) misses slightly blurred pig faces, while the model with the CA attention module (FIG. 9(b)) learns position information more effectively, resists interference better, lowers the miss rate relative to the original model, and raises the classification confidence to a certain extent.
Thirdly, experimental comparison and detection results for the improved feature fusion
To observe more deeply the effect of the improved feature-fusion model on pig-face detection, models using lightweight universal upsampling operator (CARAFE) feature fusion, adaptive spatial feature fusion (ASFF), and BiFPN feature fusion were compared with the original YOLOv5 model on the same pig-face data set under the same experimental conditions, as shown in Table 4 below. The results show that, compared with the original YOLOv5, all three improve accuracy, average accuracy, and recall, but the average accuracy and accuracy of the BiFPN model are clearly better than those of the other two improvements. The BiFPN feature-fusion model therefore makes more deliberate use of the high-level and low-level features extracted by the backbone, enlarges the receptive field, and provides a good data basis for improving the model's pig-face detection effect.
TABLE 4 Performance contrast for improved feature fusion algorithms
The detection results before and after adding the BiFPN feature fusion module are compared in FIG. 10: the original YOLOv5 effect is on the left (FIG. 10(a)) and the BiFPN model on the right (FIG. 10(b)). Both detect the pig-face targets correctly, but the improved model uses high-level and low-level information more specifically, enlarging the model's receptive field, so its prediction frames are more accurate and its classification confidence higher.
Fourth, ablation experiment
To verify the combined use of the improved modules of the invention, pig-face tests of the models were performed; the results are shown in Table 5. The experiments use the same pig-face data set under identical conditions. They show that the models with the improved K-Means clustering, the CA coordinate attention mechanism, and BiFPN feature fusion improve accuracy, recall, and average accuracy over the original model to different degrees. The prior frames determined by the improved K-Means clustering effectively raise the model's learning efficiency on target detection frames; CA's long-range dependence on position information and channel relationships effectively raises the efficiency of position-information learning and improves the prediction effect; and BiFPN feature fusion prunes the nodes that contribute little while strengthening high-contribution nodes through weighting and similar means, fusing the low-level and high-level feature maps more effectively, raising the model's feature-use efficiency, and achieving a better detection effect.
Furthermore, the longitudinal comparison in Table 5 shows that model 11 (the improved target detection algorithm YOLOv5 of the invention) leads in all respects except that its recall is slightly below model 4 (CA alone) and model 10 (CA plus BiFPN). On the mean average precision mAP, the single improvements (models 1, 4, 7), double improvements (models 8, 9, 10), and the triple improvement (model 11) raise performance over the original YOLOv5 by successively larger margins, showing that the three improvements are not only effective individually but also positively correlated with the detection effect when stacked. In summary, model 11 with all three improvements still performs best; although its recall trails a few single schemes, it reaches the best average accuracy of 95.5% and the best accuracy of 92.6% among all schemes, 2.2% and 13.2% above the original YOLOv5 respectively, further demonstrating the feasibility of the improved algorithm. (In the tables below, "√" and "×" indicate use and non-use of the improvement, respectively.)
TABLE 5 ablation experiment contrast table
Fifth, training results of model 11
The Loss curve of the finally adopted model 11 during training is shown in FIG. 11; the abscissa Epoch is the training iteration and the ordinate Loss the loss value. As FIG. 11 shows, the loss drops quickly for Epochs 0-15, more smoothly for Epochs 15-75, and is essentially stable for Epochs 75-99, converging to about 0.01, with no over-fitting or under-fitting during training.
Sixth, model 11 test results
FIG. 12 visualizes the detection results before and after the three improvements (K-Means, CA, and BiFPN) to the original YOLOv5 model: the original YOLOv5 effect is on the left (FIG. 12(a)) and model 11 with the three improvements introduced together on the right (FIG. 12(b)). The original YOLOv5 misses dense and occluded pig faces, whereas the improved model 11 detects pig4, occluded and small, in the dense live-pig scene, effectively lowering the model's miss rate and raising the classification confidence. The improved model therefore generalizes strongly in scenes with smaller, dense, and occluded targets.
To better analyze the strengths and weaknesses of the improved algorithm of the invention, the other improved algorithms introduced above were each tested on the same data set under the same test conditions, as shown in Table 6 below. The results show that model 11 leads in all respects except accuracy slightly below model 5 and recall slightly below model 3. Its average accuracy clearly beats the other improved models, 2% above the lowest-performing model 2 and 1.7% above the highest-performing model 6, further demonstrating the feasibility of the algorithm for pig-face identification detection.
TABLE 6 comparison with other pig face detection algorithm Performance
The invention improves on the YOLOv5 algorithm: a K-Means clustering algorithm with its distance changed to 1-IOU determines the prior frames, the attention mechanism CA is introduced into the model's Backbone module, and the new BiFPN feature fusion is added to the model's Neck module, applied to pig-face identification detection. In the experimental environment and pig-face data set of the invention, comparison tests were run both on the individual improvement points and on algorithms combining multiple improvement points. The improved model 11 is optimal, with mAP reaching 0.955, 2.2% above the original algorithm. The model improved by this experiment thus raises pig-face identification precision to a certain extent and provides a feasible technical scheme for individual management of live pigs.
The above-mentioned embodiments are only preferred embodiments of the present invention, and the scope of the present invention is not limited thereto, and any simple modifications or equivalent substitutions of the technical solutions that can be obviously obtained by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.

Claims (9)

1. A target recognition detection method based on improved YOLOv5 is characterized by comprising the following steps:
acquiring target image sample data and constructing a sample data set;
carrying out capacity expansion processing on the sample data set to obtain a data set to be identified;
the method for improving the target detection algorithm YOLOv5 to obtain the improved target detection algorithm YOLOv5 specifically comprises the following steps:
replacing the Euclidean distance of the K-Means dimensional clustering algorithm with 1-IOU, determining the prior anchor frames with this K-Means algorithm, and optimizing the target anchor frames of the target detection algorithm YOLOv5;
introducing a coordinate attention mechanism CA into the backbone network of the target detection algorithm YOLOv5;
adopting the bidirectional cross-scale connections of BiFPN to improve the target detection algorithm YOLOv5 and performing weighted feature fusion, obtaining the improved target detection algorithm YOLOv5;
and identifying the image information in the data set to be identified by using an improved target detection algorithm YOLOv5 to obtain an identification result.
2. The improved YOLOv5-based target recognition detection method of claim 1, wherein the acquiring of target image sample data specifically comprises: controlling the camera to rotate through a remote control system and collecting in a time-sharing manner, obtaining image sample data containing different characteristics.
3. The improved YOLOv 5-based target recognition detection method according to claim 1, wherein the sample data set is subjected to capacity expansion processing, and the method comprises the following steps:
applying random cropping, random offset, and Mosaic data enhancement to the sample data set, manually marking image frames with the picture annotation tool labelImg, assigning label names to the image frames, and saving them, wherein the stored XML files contain the target-frame coordinates and category information of the target images;
dividing a data set after a target label into a training set and a test set sample;
the Mosaic data enhancement pieces together several experimental pictures from the training set for training the improved target detection algorithm YOLOv5.
4. The improved YOLOv 5-based target recognition detection method of claim 3, wherein the sample data set is filtered and integrated before the capacity expansion process is performed on the sample data set.
5. The improved YOLOv5-based target recognition detection method of claim 1, wherein decomposing the global pooling into two one-dimensional feature encodings when the coordinate attention mechanism CA is introduced into the backbone network of the target detection algorithm YOLOv5 specifically comprises:

firstly, given an input image X, each channel is encoded along the horizontal and the vertical coordinate using average pooling kernels of size (H, 1) and (1, W) respectively, where H is the coordinate height and W the coordinate width;

the output of the c-th channel at height h and at width w is expressed as follows:

$$z_c^h(h) = \frac{1}{W}\sum_{0 \le i < W} x_c(h, i)$$

$$z_c^w(w) = \frac{1}{H}\sum_{0 \le j < H} x_c(j, w)$$

in the formulas, i and j index the width and the height respectively; these two transformations aggregate features along the two spatial directions to obtain a pair of direction-aware feature maps;

after the transformation in information embedding, the output height feature $z^h$ and width feature $z^w$ are concatenated and passed through a 1 × 1 convolution $F_1$, generating a feature map of the spatial information in the vertical and horizontal directions:

$$f = \delta(F_1([z^h, z^w]))$$

f is then decomposed along the spatial dimension into the tensors $f^h$ and $f^w$, where δ is a nonlinear activation; 1 × 1 convolutions $F_h$ and $F_w$ then transform $f^h$ and $f^w$ into tensors with the same number of channels:

$$g^h = \sigma(F_h(f^h))$$

$$g^w = \sigma(F_w(f^w))$$

where σ is the sigmoid activation function, and the number of channels of f is reduced by a suitable reduction ratio r;

finally, $g^h$ and $g^w$ are expanded and applied as attention weights respectively, with the following formula as output:

$$y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j)$$
6. The improved YOLOv5-based target recognition detection method of claim 1, wherein adopting the bidirectional cross-scale connections of BiFPN to improve the target detection algorithm YOLOv5 and performing weighted feature fusion specifically comprises:

deleting the low-contribution nodes in two non-adjacent fused feature networks, namely nodes that have only one input edge and perform no feature fusion;

adding an extra edge from the original input to the output node between the two non-adjacent fused feature networks;

treating each pair of paths as one feature layer and repeating it multiple times to obtain more high-level feature fusion.
7. The improved YOLOv5-based target recognition detection method according to claim 6, wherein additional weights are added in the high-level feature fusion through fast normalized fusion.
8. The method for detecting target recognition based on improved YOLOv5 of claim 1, wherein the recognizing the image information in the data set to be recognized by using an improved target detection algorithm YOLOv5 specifically comprises:
inputting the test set sample into an improved target detection algorithm YOLOv5, and detecting a target image through a target anchor frame to obtain a target frame;
extracting characteristic information in the target frame through a backbone network;
and performing weighted feature fusion on the feature information to obtain an identification result.
9. The improved YOLOv5-based target recognition detection method according to claim 8, wherein before the data to be identified are identified with the improved target detection algorithm YOLOv5, the improved target detection algorithm YOLOv5 is evaluated; the evaluation indexes include: Recall, abbreviated R; Precision, abbreviated P; average precision AP; and mean average precision mAP, the mean of the AP values over all categories;

wherein TP is the number of correctly detected targets, FN the number of missed targets, and FP the number of falsely detected targets; the specific formulas are as follows:

$$P = \frac{TP}{TP + FP}$$

$$R = \frac{TP}{TP + FN}$$

$$mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i$$
CN202210240265.3A 2022-03-10 2022-03-10 Improved YOLOv5-based target recognition detection method Pending CN114627502A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210240265.3A CN114627502A (en) 2022-03-10 2022-03-10 Improved YOLOv 5-based target recognition detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210240265.3A CN114627502A (en) 2022-03-10 2022-03-10 Improved YOLOv 5-based target recognition detection method

Publications (1)

Publication Number Publication Date
CN114627502A true CN114627502A (en) 2022-06-14

Family

ID=81902900

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210240265.3A Pending CN114627502A (en) 2022-03-10 2022-03-10 Improved YOLOv 5-based target recognition detection method

Country Status (1)

Country Link
CN (1) CN114627502A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115119766A (en) * 2022-06-16 2022-09-30 天津农学院 Sow oestrus detection method based on deep learning and infrared thermal imaging
CN115119766B (en) * 2022-06-16 2023-08-18 天津农学院 Sow oestrus detection method based on deep learning and infrared thermal imaging
CN115205568A (en) * 2022-07-13 2022-10-18 昆明理工大学 Road traffic multi-factor detection method with multi-scale feature fusion
CN115205568B (en) * 2022-07-13 2024-04-19 昆明理工大学 Road traffic multi-element detection method based on multi-scale feature fusion
CN115272828A (en) * 2022-08-11 2022-11-01 河南省农业科学院农业经济与信息研究所 Intensive target detection model training method based on attention mechanism
CN115909225A (en) * 2022-10-21 2023-04-04 武汉科技大学 OL-YoloV5 ship detection method based on online learning
CN115909225B (en) * 2022-10-21 2024-07-02 武汉科技大学 OL-YoloV ship detection method based on online learning
CN115471729A (en) * 2022-11-03 2022-12-13 青岛科技大学 Improved YOLOv 5-based ship target identification method and system
CN115471729B (en) * 2022-11-03 2023-08-04 青岛科技大学 Ship target identification method and system based on improved YOLOv5
CN116229376A (en) * 2023-05-06 2023-06-06 山东易视智能科技有限公司 Crowd early warning method, counting system, computing device and storage medium
CN116229376B (en) * 2023-05-06 2023-08-04 山东易视智能科技有限公司 Crowd early warning method, counting system, computing device and storage medium

Similar Documents

Publication Publication Date Title
CN114627502A (en) Improved YOLOv5-based target recognition detection method
CN112380952B (en) Power equipment infrared image real-time detection and identification method based on artificial intelligence
CN108108657B (en) Method for correcting locality sensitive Hash vehicle retrieval based on multitask deep learning
CN112633382B (en) Method and system for classifying few sample images based on mutual neighbor
CN110765865B (en) Underwater target detection method based on improved YOLO algorithm
CN110287798B (en) Vector network pedestrian detection method based on feature modularization and context fusion
CN113011288A (en) Mask RCNN algorithm-based remote sensing building detection method
CN110781882A (en) License plate positioning and identifying method based on YOLO model
CN114943893B (en) Feature enhancement method for land coverage classification
CN103353941B (en) Natural marker registration method based on viewpoint classification
CN112862849A (en) Image segmentation and full convolution neural network-based field rice ear counting method
CN111476314B (en) Fuzzy video detection method integrating optical flow algorithm and deep learning
CN114781514A (en) Floater target detection method and system integrating attention mechanism
CN111695373A (en) Zebra crossing positioning method, system, medium and device
CN114155556B (en) Human body posture estimation method and system based on stacked hourglass network added with channel shuffling module
CN116977937A (en) Pedestrian re-identification method and system
CN111709317A (en) Pedestrian re-identification method based on multi-scale features under saliency model
CN111767826A (en) Timing fixed-point scene abnormity detection method
CN112084913B (en) End-to-end human body detection and attribute identification method
CN117292324A (en) Crowd density estimation method and system
CN113065400A (en) Invoice seal detection method and device based on anchor-frame-free two-stage network
CN112183287A (en) People counting method of mobile robot under complex background
CN112418262A (en) Vehicle re-identification method, client and system
CN111832508A (en) DIE _ GA-based low-illumination target detection method
Feng et al. High-efficiency progressive transmission and automatic recognition of wildlife monitoring images with WISNs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination