CN115471871A - Sheldrake gender classification and identification method based on target detection and classification network - Google Patents


Info

Publication number
CN115471871A
Authority
CN
China
Prior art keywords
sheldrake
classification
samples
network
target detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211159949.7A
Other languages
Chinese (zh)
Inventor
谢天宇
蒋凯林
王建军
周蓓
汪灵悦
闫瑞
袁嘉男
李丹阳
郑兴泽
弓欣瑶
刘扬
刘芩利
李焦
冯凌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Agricultural University
Original Assignee
Sichuan Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Agricultural University filed Critical Sichuan Agricultural University
Priority to CN202211159949.7A priority Critical patent/CN115471871A/en
Publication of CN115471871A publication Critical patent/CN115471871A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764: using classification, e.g. of video objects
    • G06V 10/765: using rules for classification or partitioning the feature space
    • G06V 10/82: using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a sheldrake gender classification and identification method based on a target detection and classification network, which comprises the following steps: constructing a data set, and labeling the whole sheldrake body in each image of the data set; preprocessing the data, then carrying out adaptive picture scaling, and adjusting the proportion of positive and negative samples in target detection through Focal Loss; performing target detection of the sheldrake head on the images with a Yolov7 network structure to which a CBAM (Convolutional Block Attention Module) attention mechanism is added to obtain sheldrake head images, screening the head images and unifying their size, inputting them into a classification network, and performing gender classification on the head images to obtain the gender ratio of the sheldrake flock. The whole growth and sale process of each sheldrake can be recorded through a computer database, so that sheldrake growth information can be traced, duck meat sources can be traced and duck meat destinations can be recorded, which is favorable for constructing, in the future, an intelligent industrial traceability system in which product sources can be traced, destinations can be verified and responsibility can be assigned.

Description

Sheldrake gender classification and identification method based on target detection and classification network
Technical Field
The invention relates to the technical field of computer vision, in particular to a sheldrake gender classification and identification method based on a target detection and classification network.
Background
Moderate-scale breeding of duck flocks helps to improve the economic benefit and efficiency of breeding, and an appropriate sex ratio ensures a higher fertilization rate and fertilization quality of hatching eggs; meanwhile, the sex ratio also has an important influence on population dynamics and viability. For a duck farm, detecting the number of ducks in real time and counting the numbers of drakes and female ducks is therefore an important measure in the breeding process. The sheldrake is popular with the public because of its fast growth, high laying rate, large proportion of edible parts and delicious meat, but at present the numbers of drakes and female ducks in a duck farm are mainly estimated manually, and this approach has the following problems: 1. low working efficiency: manual counting by workers is time-consuming and labor-consuming; 2. low statistical accuracy: since the duck flock is dynamic, it is difficult for workers to avoid repeated counting or missed counting.
In recent years, many novel methods have emerged in bird recognition and detection based on sensor technology and image processing technology. For example, Walid Osamy et al. propose a chicken swarm optimization based clustering algorithm (CSOCA) to cluster sensor nodes, thereby improving the energy efficiency of a wireless sensor network (WSN). Bastiaan A. Vroegindeweij et al. propose a simple pixel-based classification method based on spectral reflectance characteristics, using a robot to identify and detect eggs, hens, housing elements and litter. Bruna Caroline Gerônimo et al. identified and classified chickens with wooden breast using spectral information from a computer vision system (CVS) and the near infrared (NIR) region. Geffen et al. propose a model for estimating the number of laying hens based on Faster R-CNN and a tracking algorithm. José Eduardo Del Valle et al. propose an unrest index model for estimating the thermal comfort of poultry based on computer vision and image analysis. Leroy et al. propose computer vision methods to quantify poultry behavior. Jeremy Lubich et al. propose methods for identifying, classifying and counting poultry and eggs based on SSD models. Because the above methods mainly focus on the behavior of individuals in a population and on the identification and detection of specific individuals, and are qualitative in nature, their identification and detection accuracy is not ideal.
Deep learning (DL) is a branch of machine learning research; it is an extremely efficient method with great potential and is widely used in various fields. Deep learning has also produced breakthroughs in the poultry farming industry. For example, Xiaolin Zhuang et al. used digital image processing techniques and deep learning to identify sick broilers in a flock, achieving an average accuracy of 99.7% and a running speed of 40 frames per second (fps). Cheng Fang et al. used deep neural network (DNN) based pose estimation for the first time to analyze the behavior of broiler chickens, filling the gap of using skeleton feature points to calculate chicken behavior. Haitao Pu et al. proposed a convolutional neural network (CNN) based method using a Kinect sensor to identify chicken behavior, with an accuracy of up to 99.17% in identifying lameness. These research results show that deep learning achieves high accuracy and effectiveness in poultry identification and detection, and therefore has high application value in poultry breeding. However, research on sheldrake remains scarce worldwide, and an efficient and accurate identification and detection method is urgently needed to promote the development of the sheldrake breeding industry.
It is noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure and therefore may include information that does not constitute prior art that is already known to a person of ordinary skill in the art.
Disclosure of Invention
The invention aims to overcome the defects of the prior art, provides a sheldrake gender classification and identification method based on a target detection and classification network, and remedies the shortcomings of existing methods for counting drakes and female ducks in duck flocks.
The purpose of the invention is realized by the following technical scheme: a sheldrake gender classification and identification method based on a target detection and classification network comprises the following steps:
constructing a data set, and labeling the whole sheldrake body in each image in the data set according to a bounding box labeling rule;
preprocessing the data by Mixup data enhancement, Mosaic data enhancement and HSV color space enhancement, then carrying out adaptive picture scaling, and adjusting the proportion of positive and negative samples in target detection by Focal Loss;
and improving the Yolov7 network structure with a CBAM (Convolutional Block Attention Module) attention mechanism, performing target detection of the sheldrake head on the processed images with the Yolov7 network structure to which the CBAM attention mechanism is added to obtain sheldrake head images, screening the head images and unifying their size, inputting them into a VovNet-27 slim classification network, and performing gender classification on the head images with the classification network to obtain the gender ratio of the sheldrake flock.
The Mixup data enhancement comprises: fusing images of different classes and constructing new training samples and labels by linear interpolation, thereby expanding the training data set; the data and labels are processed by the following formulas:

x̃ = λ·x_i + (1 - λ)·x_j
ỹ = λ·y_i + (1 - λ)·y_j

wherein (x_i, y_i) and (x_j, y_j) are two data pairs consisting of training samples in the original data set and their corresponding labels, x̃ denotes the training sample produced by the Mixup data enhancement operation, ỹ denotes the label corresponding to x̃, λ denotes a mixing coefficient drawn from a Beta distribution with parameters a and b, and a and b respectively control the fusion ratio of the two images of different classes.
The Mosaic data enhancement comprises:
randomly extracting a batch of image data from the sheldrake data set enhanced by the Mixup data enhancement, randomly selecting 4 images from the batch, and randomly scaling and randomly arranging them to splice a new image;
and reselecting a batch of image data of a different size, randomly selecting 4 images from it, and randomly scaling and randomly arranging them to splice another new image, so as to obtain the Mosaic-enhanced data.
The adaptive picture scaling comprises:
Calculating the scaling ratio: let the size of the original picture be x1 × y1 and the size of the initially scaled picture be x2 × y2; dividing the length and width of the initially scaled picture by the length and width of the original picture gives two scaling coefficients α and β, where x2/x1 = α and y2/y1 = β, and the smaller coefficient φ = min(α, β) is selected;
Calculating the scaled size: the length and width of the original picture are multiplied by the smaller scaling coefficient φ to obtain the size of the desired scaled picture x3 × y3, where x1·φ = x3 and y1·φ = y3;
Calculating the black-edge padding value: subtracting the width of the desired scaled picture from the width of the initially scaled picture gives the original black-edge width m1 to be filled; taking m1 modulo 128 gives the desired black-edge width m2; finally m2 is divided by 2, i.e. the filled black edge is distributed evenly to the two ends of the desired scaled picture, so that x1 × y1 is scaled to x3 × (y3 + m2).
The calculation formula of Focal Loss is:

FL(p_t) = -α_t · (1 - p_t)^γ · log(p_t)

The accuracy of the network structure is improved by adjusting this loss, where p_t is the classification probability of the different classes; γ is a value greater than 0, called the modulation coefficient, which reduces the weight of easily classified samples so that the model concentrates more on hard samples during training; α_t is a decimal in [0, 1], and setting its value controls the contribution of positive and negative samples to the total loss, a small value reducing the weight of negative samples; γ and α_t interact with each other, and through their combination Focal Loss can control the weights of hard and easy samples and adjust the weights of positive and negative samples.
The improvement of the Yolov7 network structure with the CBAM attention mechanism comprises the following steps: the Yolov7 network structure comprises an input end module, a Backbone module, a Head module and a Prediction module; the CBAM attention mechanism is added between the Backbone module and the Head module of the Yolov7 network structure to enhance feature extraction.
The CBAM comprises a channel attention module and a spatial attention module, and the flow of feature map extraction by the CBAM attention mechanism is as follows:
the channel attention module performs a global max pooling operation and a global average pooling operation on the input feature map of size H × W × C to obtain two feature maps of size 1 × 1 × C, which are respectively fed into a shared two-layer neural network; the outputs are then added element-wise, a sigmoid activation is applied to generate the final channel attention feature, and finally the channel attention feature is multiplied with the original input feature map to obtain the input feature of the spatial attention module;
the spatial attention module takes the feature map obtained in the previous step as its input feature map, performs a global max pooling operation and a global average pooling operation along the channel dimension to obtain two feature maps of size H × W × 1, concatenates them, reduces their dimension with a convolution, generates the spatial attention feature through a sigmoid activation, and finally multiplies the spatial attention feature with the input feature map to obtain the final feature map.
The gender classification recognition method further comprises an evaluation step, wherein the evaluation step comprises the following steps:
Precision = TP / (TP + FP) is used to indicate the proportion of true positive samples among the samples predicted as positive, and Recall = TP / (TP + FN) is used to indicate the proportion of correctly predicted positive samples among all actual positive samples, where TP denotes a positive sample predicted as positive, FP denotes a negative sample predicted as positive, and FN denotes a positive sample predicted as negative;
the harmonic mean of Precision and Recall, F1-score, is calculated from Precision and Recall, i.e.

F1-score = 2 · Precision · Recall / (Precision + Recall)

Precision reflects the ability of the model to distinguish negative samples: the higher the Precision, the stronger this ability; Recall reflects the ability of the model to recognize positive samples: the higher the Recall, the stronger this ability; F1-score combines the two, and the higher the F1-score, the more stable the model.
The invention has the following advantages: the sheldrake gender classification and identification method based on a target detection and classification network applies artificial intelligence technology to actual sheldrake breeding production through a series of modern hardware equipment, so that the three stages of sheldrake disease supervision and control, namely early discovery, early isolation and early treatment, can be carried out automatically by computer; this reduces the occurrence of large-scale diseases in sheldrake flocks, greatly reduces economic losses, and at the same time improves economic benefit. With an artificial intelligence equipment system participating in the whole process of the sheldrake breeding industry, the whole growth and sale process of each sheldrake can be recorded through a computer database, so that sheldrake growth information can be traced, duck meat sources can be traced and duck meat destinations can be recorded; in the future, an intelligent industrial traceability system can be constructed in which product sources can be traced, destinations can be verified and responsibility can be assigned, which is favorable for the supervision of relevant enterprises by government departments and helps consumers to understand product information comprehensively and purchase with more confidence; transferring such an artificial intelligence traceability system to the sheldrake breeding industry is beneficial to the healthy development of the industry.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a schematic diagram of a Yolov7 network structure;
fig. 3 is a schematic diagram of the Yolov7 network structure with the CBAM attention mechanism added.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the detailed description of the embodiments of the present application provided below in connection with the appended drawings is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application. The invention is further described below with reference to the accompanying drawings.
As shown in fig. 1, the present invention relates to a sheldrake gender classification and identification method based on a target detection and classification network, which preprocesses the original data set and then performs single-class or two-class target detection on the whole sheldrake body or the sheldrake head. After two-class target detection, the sex ratio of the sheldrake flock is obtained directly; after single-class target detection, the whole-body picture or head picture of each individual sheldrake is passed through a classification network, and the sex ratio of the sheldrake flock is finally obtained. The method specifically comprises the following steps:
S1, constructing a data set: image and video data were acquired with a highly adaptable and flexible micro pan-tilt camera (Osmo Pocket 2); during data set preparation, data of sheldrake in 10 different duck houses were collected by changing the shooting angle and shooting distance many times; secondly, data with an extremely high degree of repetition and redundant data in which the sheldrake were occluded and not effectively captured were manually screened out and discarded; finally, the resulting data set contains 1500 images in total, of which the training set contains 1300 images and the test set contains 200 images.
S2, labeling of a data set: marking the whole sheldrake body in each image in the data set according to a bounding box marking rule;
Further, the bounding box annotation rules include: (1) the class is unambiguous: for single-class target detection the class is set as duck, and for two-class target detection the classes are set as male and female; (2) when the target individual is occluded, truncated or blurred, the bounding box must remain unambiguous: when the target individual is occluded or truncated, the bounding box contains the key features of the target individual but does not contain other individuals, and when the target individual is blurred, the sample still participates in training to improve the robustness of the model; (3) after labeling, a boundary check is performed: no bounding box coordinate may lie on the image boundary, which prevents the data enhancement process from producing out-of-boundary errors.
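The boundary check in rule (3) can be illustrated with a short sketch. The snippet below is only an illustrative example, not the labeling tool used by the invention; the YOLO-style normalized label format (class, center x, center y, width, height in [0, 1]) and the margin value are assumptions.

```python
# Illustrative sketch: clamp a normalized bounding box slightly inside the image
# so that no box coordinate lies exactly on the image boundary (rule (3) above).

def clamp_box(cx, cy, w, h, margin=1e-3):
    """Shrink/shift a normalized (cx, cy, w, h) box so its edges stay off the border."""
    x1 = max(cx - w / 2, margin)
    y1 = max(cy - h / 2, margin)
    x2 = min(cx + w / 2, 1.0 - margin)
    y2 = min(cy + h / 2, 1.0 - margin)
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)

# Example: a duck box touching the right edge is pulled back inside the image.
print(clamp_box(0.95, 0.5, 0.2, 0.3))
```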
S3, data preprocessing: the data are preprocessed by Mixup data enhancement, Mosaic data enhancement and HSV color space enhancement, adaptive picture scaling is then carried out, and the proportion of positive and negative samples in target detection is adjusted by Focal Loss;
S4, gender classification and identification: the CBAM attention mechanism is used to improve the Yolov7 network structure, and the Yolov7 network structure with the CBAM attention mechanism added performs target detection of the sheldrake head on the processed images to obtain sheldrake head images. The Yolov7 model largely reuses the preprocessing of Yolov5, and the use of Mosaic data enhancement is suitable for small target detection. In its architecture an extended ELAN (E-ELAN) based on ELAN is proposed, which uses expand, shuffle and merge cardinality to continuously enhance the learning capability of the network without destroying the original gradient path, and in the architecture of the computational block uses group convolution to expand the channels and cardinality of the computational blocks, guiding different groups of computational blocks to learn more diversified features. The sheldrake head images are screened, unified in size and input into a VovNet-27 slim classification network, in which the output feature size becomes 1/4 of the original size after passing through one OSA module. The OSA module first performs 5 successive 3 × 3 convolutions with 64 channels each and then concatenates the 5 convolution outputs, so the number of channels becomes 64 × 5 = 320; a 1 × 1 convolution then reduces the number of output channels to 128. In this way the sheldrake head images are classified by gender, and the sheldrake sex ratio is finally obtained.
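The channel arithmetic of the OSA module described above (five successive 3 × 3 convolutions with 64 channels, concatenation to 320 channels, then a 1 × 1 convolution down to 128 channels) can be sketched as follows. This is a simplified PyTorch illustration under stated assumptions: the BatchNorm/ReLU placement follows common practice, and the stem and downsampling layers of the full VovNet-27 slim network are omitted.

```python
import torch
import torch.nn as nn

class OSABlock(nn.Module):
    """Minimal one-shot-aggregation block matching the channel arithmetic in the text:
    5 successive 3x3 convolutions with 64 channels, concatenation (64*5 = 320),
    then a 1x1 convolution down to 128 output channels."""
    def __init__(self, in_ch, mid_ch=64, n_layers=5, out_ch=128):
        super().__init__()
        layers, ch = [], in_ch
        for _ in range(n_layers):
            layers.append(nn.Sequential(
                nn.Conv2d(ch, mid_ch, 3, padding=1, bias=False),
                nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True)))
            ch = mid_ch
        self.layers = nn.ModuleList(layers)
        self.concat_conv = nn.Sequential(
            nn.Conv2d(mid_ch * n_layers, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        feats = []
        for layer in self.layers:
            x = layer(x)
            feats.append(x)                       # keep every intermediate 64-channel map
        return self.concat_conv(torch.cat(feats, dim=1))  # 320 -> 128 channels

print(OSABlock(in_ch=64)(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 128, 32, 32])
```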
S5, a network model evaluation step, which specifically comprises the following steps:
Precision = TP / (TP + FP) is used to indicate the proportion of true positive samples among the samples predicted as positive, and Recall = TP / (TP + FN) is used to indicate the proportion of correctly predicted positive samples among all actual positive samples, where TP denotes a positive sample predicted as positive, FP denotes a negative sample predicted as positive, and FN denotes a positive sample predicted as negative;
the harmonic mean of Precision and Recall, F1-score, is calculated from Precision and Recall, i.e.

F1-score = 2 · Precision · Recall / (Precision + Recall)

Precision reflects the ability of the model to distinguish negative samples: the higher the Precision, the stronger this ability; Recall reflects the ability of the model to recognize positive samples: the higher the Recall, the stronger this ability; F1-score combines the two, and the higher the F1-score, the more stable the model.
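A minimal computation of Precision, Recall and F1-score from TP/FP/FN counts, consistent with the definitions above; the counts in the example are invented for illustration only.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, Recall and F1-score from TP/FP/FN counts, as defined above."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

print(precision_recall_f1(tp=90, fp=10, fn=5))  # (0.9, ~0.947, ~0.923)
```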
The AP denotes the average of the maximum Precision at different Recall values (typically an AP is calculated separately for each class), and the calculation formula is as follows:

AP = (1/11) · Σ_{r ∈ {0, 0.1, ..., 1}} p_interp(r), with p_interp(r) = max_{r' ≥ r} p(r')

First, a set of thresholds [0, 0.1, 0.2, ..., 1] is set. Then, for recalls greater than each threshold (for example recall > 0.3), the corresponding maximum precision is obtained. In this way 11 precision values are calculated, and the AP is the average of these 11 values. This method is called 11-point interpolated average precision, where r is the recall greater than each threshold and p_interp(r) is the maximum precision corresponding to r. The specific process is as follows:
(1) For category C, the prediction boxes of category C output by the algorithm are first sorted by confidence;
(2) Different k values are set, the top-k prediction boxes are selected, and FP and TP are calculated so that recall is equal to 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 and 1.0 respectively; the corresponding Precision is then calculated;
(3) The 11 precision values are averaged to obtain the AP;
Another calculation method [27] has been used by the PASCAL VOC challenge since 2010; it improves the calculation precision and better distinguishes methods of low accuracy. Assuming that there are M positive examples among the N samples, M recall values are obtained; for each recall value r, the maximum precision corresponding to r' ≥ r is calculated, and these M precision values are then averaged to obtain the AP.
For the PR curve, integration can be used:

AP = ∫_0^1 p(r) dr
Generally, AP is the mean for a single class, and mean Average Precision (mAP) is the mean of the APs over all classes. The calculation formula is as follows:

mAP = (1/Q) · Σ_{q=1}^{Q} AP(q)

where Q is the number of classes and the sum in the numerator is the total of the APs of all classes.
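The 11-point interpolated AP and the mAP defined above can be sketched as follows; the precision-recall points in the example are toy values, not measured results.

```python
def ap_11_point(recalls, precisions):
    """11-point interpolated AP: for each threshold r in {0, 0.1, ..., 1} take the
    maximum precision among points whose recall is >= r, then average the 11 values."""
    ap = 0.0
    for t in [i / 10 for i in range(11)]:
        candidates = [p for r, p in zip(recalls, precisions) if r >= t]
        ap += max(candidates) if candidates else 0.0
    return ap / 11

def mean_ap(ap_per_class):
    """mAP is simply the mean of the per-class APs."""
    return sum(ap_per_class.values()) / len(ap_per_class)

# Toy PR points (illustrative numbers only).
recalls =    [0.1, 0.3, 0.5, 0.7, 0.9]
precisions = [1.0, 0.95, 0.9, 0.8, 0.6]
print(ap_11_point(recalls, precisions))
print(mean_ap({"drake": 0.95, "duck": 0.97}))
```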
Further, the Mixup data enhancement includes: fusing images of different classes and constructing new training samples and labels by linear interpolation, thereby expanding the training data set; the data and labels are processed by the following formulas:

x̃ = λ·x_i + (1 - λ)·x_j
ỹ = λ·y_i + (1 - λ)·y_j

wherein (x_i, y_i) and (x_j, y_j) are two data pairs consisting of training samples in the original data set and their corresponding labels, x̃ denotes the training sample produced by the Mixup data enhancement operation, ỹ denotes the label corresponding to x̃, λ denotes a mixing coefficient drawn from a Beta distribution with parameters a and b, and a and b respectively control the fusion ratio of the two images.
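A minimal sketch of the Mixup operation defined by the two formulas above; the Beta(1, 1) parameters and the one-hot labels in the example are illustrative assumptions.

```python
import numpy as np

def mixup(x_i, y_i, x_j, y_j, a=1.0, b=1.0):
    """Mixup data enhancement: a mixing coefficient lam ~ Beta(a, b) linearly
    interpolates both the images and the (one-hot) labels, as in the formulas above."""
    lam = np.random.beta(a, b)
    x_tilde = lam * x_i + (1.0 - lam) * x_j
    y_tilde = lam * y_i + (1.0 - lam) * y_j
    return x_tilde, y_tilde

# Example with two fake images and one-hot labels (drake vs duck).
img_a, img_b = np.random.rand(608, 608, 3), np.random.rand(608, 608, 3)
lab_a, lab_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
mixed_img, mixed_lab = mixup(img_a, lab_a, img_b, lab_b)
print(mixed_img.shape, mixed_lab)
```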
Further, the Mosaic data enhancement comprises:
randomly extracting a batch of image data from the sheldrake data set enhanced by the Mixup data enhancement, randomly selecting 4 images from the batch, and randomly scaling and randomly arranging them to splice a new image;
and reselecting a batch of image data of a different size, randomly selecting 4 images from it, and randomly scaling and randomly arranging them to splice another new image, so as to obtain the Mosaic-enhanced data.
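A simplified illustration of the Mosaic idea described above (four randomly selected images, randomly scaled and arranged into one new image). It omits label remapping and the exact placement rules of the Yolov7 implementation and is only a sketch.

```python
import numpy as np

def mosaic_4(images, out_size=608):
    """Simplified Mosaic enhancement: 4 images are pasted into the four quadrants
    around a random split point, each resized to fill its quadrant."""
    canvas = np.full((out_size, out_size, 3), 114, dtype=np.uint8)  # grey background
    cx = np.random.randint(out_size // 4, 3 * out_size // 4)        # random split point
    cy = np.random.randint(out_size // 4, 3 * out_size // 4)
    regions = [(0, 0, cx, cy), (cx, 0, out_size, cy),
               (0, cy, cx, out_size), (cx, cy, out_size, out_size)]
    for img, (x1, y1, x2, y2) in zip(images, regions):
        h, w = y2 - y1, x2 - x1
        # crude nearest-neighbour resize of the source image into its quadrant
        ys = np.linspace(0, img.shape[0] - 1, h).astype(int)
        xs = np.linspace(0, img.shape[1] - 1, w).astype(int)
        canvas[y1:y2, x1:x2] = img[ys][:, xs]
    return canvas

imgs = [np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8) for _ in range(4)]
print(mosaic_4(imgs).shape)  # (608, 608, 3)
```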
Further, the adaptive picture scaling comprises:
Calculating the scaling ratio: let the size of the original picture be x1 × y1 and the size of the initially scaled picture be x2 × y2; dividing the length and width of the initially scaled picture by the length and width of the original picture gives two scaling coefficients α and β, where x2/x1 = α and y2/y1 = β, and the smaller coefficient φ = min(α, β) is selected;
Calculating the scaled size: the length and width of the original picture are multiplied by the smaller scaling coefficient φ to obtain the size of the desired scaled picture (without the black border filled at this time) x3 × y3, where x1·φ = x3 and y1·φ = y3;
Calculating the black-edge padding value: subtracting the width of the desired scaled picture (without the black border filled at this time) from the width of the initially scaled picture gives the original black-edge width m1 to be filled; taking m1 modulo 128 gives the desired black-edge width m2; finally m2 is divided by 2, i.e. the filled black edge is distributed evenly to the two ends of the desired scaled picture, so that x1 × y1 is scaled to x3 × (y3 + m2) instead of x2 × y2.
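A minimal sketch of the adaptive scaling calculation, under the assumption that the width is the limiting dimension (as in the example below); the 128-pixel remainder follows the description above.

```python
def adaptive_scale(x1, y1, x2, y2, stride=128):
    """Adaptive picture scaling as described above: pick the smaller scaling
    coefficient, scale the original x1 x y1 picture, then pad the shorter side with a
    black border whose total width is the remainder modulo `stride` (128 here),
    split evenly between the two ends. Assumes the width is the limiting dimension."""
    alpha, beta = x2 / x1, y2 / y1
    phi = min(alpha, beta)                       # smaller scaling coefficient
    x3, y3 = round(x1 * phi), round(y1 * phi)    # desired scaled size (no border yet)
    m1 = y2 - y3                                 # raw border needed on the padded side
    m2 = m1 % stride                             # reduced border width
    pad_each_side = m2 / 2
    return (x3, y3), pad_each_side               # final canvas: x3 x (y3 + m2)

# Example: a 1920x1080 frame letterboxed toward a 608x608 network input.
print(adaptive_scale(1920, 1080, 608, 608))      # ((608, 342), 5.0)
```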
Further, Focal Loss is an improved cross-entropy loss function that can solve the problem of a severely unbalanced proportion of positive and negative samples in one-stage target detection, achieving high accuracy in a single-stage structure. In the flock images of the data set of the invention, the sex ratio of the sheldrake is concentrated around 1:7 (drake:duck), so the proportion of positive and negative samples is unbalanced. Therefore, the calculation formula of Focal Loss is:

FL(p_t) = -α_t · (1 - p_t)^γ · log(p_t)

The accuracy of the network structure is improved by adjusting this loss, where p_t is the classification probability of the different classes; γ is a value greater than 0, called the modulation coefficient, which reduces the weight of easily classified samples so that the model concentrates more on hard samples during training; α_t is a decimal in [0, 1], and setting its value controls the contribution of positive and negative samples to the total loss, a small value reducing the weight of negative samples; γ and α_t interact with each other, and through their combination Focal Loss can control the weights of hard and easy samples and adjust the weights of positive and negative samples.
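A hedged sketch of the Focal Loss formula above for a binary (drake/duck) case; α_t = 0.25 and γ = 2.0 are the commonly used defaults, not values stated in this description.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha_t=0.25, gamma=2.0):
    """Binary Focal Loss FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).
    alpha_t and gamma defaults are common choices, assumed for illustration."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")  # -log(p_t)
    p_t = p * targets + (1 - p) * (1 - targets)          # probability of the true class
    alpha = alpha_t * targets + (1 - alpha_t) * (1 - targets)
    return (alpha * (1 - p_t) ** gamma * ce).mean()

logits = torch.tensor([2.0, -1.0, 0.5])
targets = torch.tensor([1.0, 0.0, 1.0])   # 1 = positive (e.g. drake), 0 = negative
print(focal_loss(logits, targets))
```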
Further, improving the Yolov7 network structure with the CBAM attention mechanism includes the following. The Yolov7 network structure comprises an input end module, a backbone extraction (Backbone) module, a detection (Head) module and a Prediction module. The input end represents the input picture; the input image size of the network is 608 × 608, and this stage usually includes image preprocessing, i.e. scaling the input image to the input size of the network and performing normalization. In the network training stage, Yolov7 uses Mosaic data enhancement to improve the training speed of the model and the accuracy of the network, and provides adaptive anchor frame calculation and an adaptive picture scaling method. The Backbone module is usually a classification network with excellent performance, used to extract general feature representations; Yolov7 uses not only the CSPDarknet53 structure but also the Focus structure as the reference network. The function of the Head module is as follows: in convolutional neural network training, the deeper the network, the stronger the feature information extracted for the target and the better the prediction effect of the network on the target, but the location information of the target also becomes weaker and weaker, and with the continued application of convolution the network easily loses the feature information of small targets; therefore, the Yolov7 network adopts the FPN + PAN structure to perform multi-scale fusion of target features, which largely avoids the loss of feature information of small targets. The Prediction module differs in the number of output branches for different detection algorithms and usually comprises a classification branch and a regression branch; Yolov7 replaces the Smooth L1 Loss function with the GIoU_Loss function, which further improves the detection precision of the algorithm. The CBAM attention mechanism is added between the Backbone module and the Head module of the Yolov7 network structure to enhance feature extraction.
As shown in fig. 2, the Yolov7 model largely reuses the preprocessing of Yolov5, and the use of Mosaic data enhancement is suitable for small target detection. An extended ELAN (E-ELAN) based on ELAN is proposed in the architecture; it uses expand, shuffle and merge cardinality to continuously enhance the learning capability of the network without destroying the original gradient path, and in the architecture of the computational block uses group convolution to expand the channels and cardinality of the computational blocks, guiding different groups of computational blocks to learn more diversified features.
The CBAM includes a Channel Attention Module (CAM) that allows the network to focus on the foreground of the image, making the network focus more on the meaningful areas, and a Spatial Attention Module (SAM) that allows the network to focus on the locations in the entire picture that are rich in context information.
As shown in fig. 3, a CBAM attention mechanism is added to the YOLOV7 network structure, which functions to further improve the feature extraction capability of the enhanced feature extraction network portion. Because the features extracted from the backbone network part are the basis of subsequent network processing, if the attention mechanism is added to the backbone network part, the random initialization weight of the attention mechanism module will destroy the weight of the backbone network part, which may cause the network prediction effect to be poor, and thus the attention mechanism is added to the enhanced feature extraction network part, which will not destroy the initial features extracted from the network.
The flow of feature map extraction by the CBAM attention mechanism is as follows:
the channel attention module performs a global max pooling operation and a global average pooling operation on the input feature map of size H × W × C to obtain two feature maps of size 1 × 1 × C, which are respectively fed into a shared two-layer neural network; the outputs are then added element-wise, a sigmoid activation is applied to generate the final channel attention feature, and finally the channel attention feature is multiplied with the original input feature map to obtain the input feature of the spatial attention module;
the spatial attention module takes the feature map obtained in the previous step as its input feature map, performs a global max pooling operation and a global average pooling operation along the channel dimension to obtain two feature maps of size H × W × 1, concatenates them, reduces their dimension with a convolution, generates the spatial attention feature through a sigmoid activation, and finally multiplies the spatial attention feature with the input feature map to obtain the final feature map.
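The channel attention and spatial attention flow described above corresponds to the standard CBAM structure; a compact PyTorch sketch is given below. The reduction ratio of 16 and the 7 × 7 spatial convolution kernel are the usual defaults and are assumptions here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        # shared two-layer MLP applied to global max-pooled and average-pooled features
        self.mlp = nn.Sequential(nn.Conv2d(channels, channels // reduction, 1, bias=False),
                                 nn.ReLU(inplace=True),
                                 nn.Conv2d(channels // reduction, channels, 1, bias=False))

    def forward(self, x):
        avg = self.mlp(F.adaptive_avg_pool2d(x, 1))
        mx = self.mlp(F.adaptive_max_pool2d(x, 1))
        return torch.sigmoid(avg + mx) * x         # 1x1xC attention re-weights channels

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)          # H x W x 1 average over channels
        mx, _ = x.max(dim=1, keepdim=True)         # H x W x 1 max over channels
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return attn * x

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, as in the flow described above."""
    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        return self.sa(self.ca(x))

print(CBAM(256)(torch.randn(1, 256, 40, 40)).shape)  # torch.Size([1, 256, 40, 40])
```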
As can be seen from Table 1 below, YOLOv7 is superior to the other detection algorithms in terms of precision, recall, F1-score, mAP and detection speed. For example, its precision on the sheldrake data set is 3.64% higher than that of YOLOv4, its recall is 4.94% higher than that of YOLOv5, and its mAP is 7.75% and 2.67% higher than those of YOLOv4 and YOLOv5 respectively. The remaining indicators are also substantially better than those of the other target detection algorithms. YOLOv7 is therefore finally selected as the target detection algorithm used by the invention.
Table 1. Comparison of target detection algorithms on the sheldrake data set
Method | P | R | F1 | mAP@0.5 | mAP@0.5:0.95 | FPS
CenterNet | 92.16% | 95.12% | 0.94 | 95.41% | 62.80% | 33
SSD | 86.03% | 82.40% | 0.84 | 89.03% | 45.90% | 39
EfficientDet | 87.66% | 92.98% | 0.90 | 95.91% | 60.40% | 26
RetinaNet | 88.00% | 89.17% | 0.89 | 94.04% | 56.40% | 13
YOLOv4s | 92.26% | 78.04% | 0.85 | 89.82% | 44.10% | 22
YOLOv5s | 95.50% | 88.70% | 0.92 | 94.90% | 66.70% | 62
YOLOv7 | 95.80% | 93.64% | 0.95 | 97.57% | 65.50% | 60
As shown in Table 2 below, the Accuracy of the VovNet_27slim model is only at a mid-to-upper level among the many models, but its F1-score is the highest and it has certain advantages in time and FPS; after comparison with the actual verification results of the other models, VovNet_27slim is finally selected as the classification network.
Table 2. Comparison of classification network models
Model | Accuracy | Precision | Recall | F1-score | Time | FPS
ResNet_18 | 98.753 | 96.920 | 98.199 | 97.546 | 9 | 297.23
ResNet_34 | 98.981 | 97.893 | 98.047 | 97.970 | 9 | 167.63
ResNet_50 | 98.209 | 95.533 | 97.541 | 96.503 | 10 | 127.01
ResNext_50-32x4d | 98.646 | 96.645 | 98.064 | 97.338 | 12 | 96.10
RegNetx_200mf | 98.152 | 96.073 | 96.597 | 96.333 | 9 | 127.51
RegNety_200mf | 98.424 | 97.005 | 96.685 | 96.844 | 10 | 96.12
MobileNetv2_1.0 | 98.829 | 97.513 | 97.833 | 97.671 | 9 | 153.09
MobileNetv3_small | 98.829 | 97.232 | 98.154 | 97.686 | 8 | 173.50
MobileNetv3_large | 98.880 | 97.393 | 98.184 | 97.783 | 9 | 142.21
GhostNet_1.0 | 98.506 | 96.505 | 97.626 | 97.055 | 9 | 79.50
ShuffleNetv2_1.0 | 98.240 | 95.393 | 97.881 | 96.584 | 9 | 130.96
DenseNet_121 | 99.000 | 97.316 | 98.789 | 98.035 | 11 | 55.88
VovNet_27slim | 98.886 | 97.051 | 98.616 | 97.814 | 9 | 273.27
se-ResNet_18 | 99.019 | 97.920 | 98.176 | 98.048 | 9 | 232.09
se-ResNet_34 | 99.063 | 97.698 | 98.612 | 98.148 | 9 | 125.24
se-ResNet_50 | 98.430 | 96.506 | 97.296 | 96.895 | 11 | 95.80
Regarding the input size of the classification network, which is relatively important, the classification network was fixed as the VovNet_27slim selected in the experiment, and models with input sizes of 64 × 64 and 128 × 128 were compared and verified. Although the 128 × 128 model has certain disadvantages in time and FPS, it leads in the accuracy-related indicators, and since the Yolov5 used before the classification network already has extremely high speed, the classification network part should pursue accuracy. In summary, the invention finally selects the model with an input size of 128 × 128.
The improved model is deployed in an integrated computer environment. Data from the actual production environment are obtained through the monitoring equipment of the farm controlled by the computer (cameras or related video image acquisition equipment), standardized with the same data processing method as used for the model, and fed into the model backbone for prediction, including sheldrake gender classification and identification, body size estimation and flock counting; the predicted parameters are then sent back to the on-site breeding equipment of the farm to realize real-time supervision (supported by the classification algorithm of the model, covering disease, other disabilities and fighting), early warning, and differentiated precise feeding according to individual ID (supported by the target detection algorithm).
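As an illustration of how detection, head cropping, classification and sex-ratio statistics could be chained in such a deployment, a stand-alone sketch is given below; the detector and classifier callables are placeholders, not the trained Yolov7 and VovNet-27 slim models.

```python
import numpy as np

def sex_ratio_from_frame(frame, detector, classifier, head_size=128):
    """Illustrative end-to-end pipeline, not production code: the detector is assumed
    to return head bounding boxes (x1, y1, x2, y2) in pixels, and the classifier to
    return 'drake' or 'duck' for a cropped head image."""
    boxes = detector(frame)
    counts = {"drake": 0, "duck": 0}
    for (x1, y1, x2, y2) in boxes:
        head = frame[int(y1):int(y2), int(x1):int(x2)]
        if head.size == 0:
            continue                               # skip degenerate detections
        # unify crop size before classification (nearest-neighbour resize sketch)
        ys = np.linspace(0, head.shape[0] - 1, head_size).astype(int)
        xs = np.linspace(0, head.shape[1] - 1, head_size).astype(int)
        counts[classifier(head[ys][:, xs])] += 1
    total = counts["drake"] + counts["duck"]
    return counts, (counts["drake"] / total if total else 0.0)

# Dummy detector/classifier so the sketch runs stand-alone.
frame = np.random.randint(0, 255, (1080, 1920, 3), dtype=np.uint8)
fake_detector = lambda img: [(100, 100, 220, 220), (400, 300, 520, 420)]
fake_classifier = lambda head: "duck"
print(sex_ratio_from_frame(frame, fake_detector, fake_classifier))
```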
In addition, the model can also be used alone: after a picture or a number of video frames are input, the relevant information can be obtained and special frames can be identified, which helps to trace back the cause of an accident or to intelligently analyze special situations.
The foregoing is illustrative of the preferred embodiments of this invention, and it is to be understood that the invention is not limited to the precise form disclosed herein and that various other combinations, modifications, and environments may be resorted to, falling within the scope of the concept as disclosed herein, either as described above or as apparent to those skilled in the relevant art. And that modifications and variations may be effected by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. A sheldrake gender classification and identification method based on target detection and classification network is characterized by comprising the following steps: the gender classification and identification method comprises the following steps:
constructing a data set, and labeling the whole sheldrake body in each image in the data set according to a bounding box labeling rule;
preprocessing the data by Mixup data enhancement, Mosaic data enhancement and HSV color space enhancement, then carrying out adaptive picture scaling, and adjusting the proportion of positive and negative samples in target detection by Focal Loss;
and improving the Yolov7 network structure with a CBAM (Convolutional Block Attention Module) attention mechanism, performing target detection of the sheldrake head on the processed images with the Yolov7 network structure to which the CBAM attention mechanism is added to obtain sheldrake head images, screening the head images and unifying their size, inputting them into a VovNet-27 slim classification network, and performing gender classification on the head images with the classification network to obtain the gender ratio of the sheldrake flock.
2. The sheldrake gender classification and identification method based on the target detection and classification network as claimed in claim 1, wherein: the Mixup data enhancement comprises: fusing images of different classes and constructing new training samples and labels by linear interpolation, thereby expanding the training data set; the data and labels are processed by the following formulas:

x̃ = λ·x_i + (1 - λ)·x_j
ỹ = λ·y_i + (1 - λ)·y_j

wherein (x_i, y_i) and (x_j, y_j) are two data pairs consisting of training samples in the original data set and their corresponding labels, x̃ denotes the training sample produced by the Mixup data enhancement operation, ỹ denotes the label corresponding to x̃, λ denotes a mixing coefficient drawn from a Beta distribution with parameters a and b, and a and b respectively control the fusion ratio of the two images of different classes.
3. The sheldrake gender classification and identification method based on the target detection and classification network as claimed in claim 2, wherein: the Mosaic data enhancement comprises:
randomly extracting a batch of image data from the sheldrake data set enhanced by the Mixup data enhancement, randomly selecting 4 images from the batch, and randomly scaling and randomly arranging them to splice a new image;
and reselecting a batch of image data of a different size, randomly selecting 4 images from it, and randomly scaling and randomly arranging them to splice another new image, so as to obtain the Mosaic-enhanced data.
4. The sheldrake gender classification and identification method based on the target detection and classification network as claimed in claim 3, wherein: the adaptive picture scaling comprises:
Calculating the scaling ratio: let the size of the original picture be x1 × y1 and the size of the initially scaled picture be x2 × y2; dividing the length and width of the initially scaled picture by the length and width of the original picture gives two scaling coefficients α and β, where x2/x1 = α and y2/y1 = β, and the smaller coefficient φ = min(α, β) is selected;
Calculating the scaled size: the length and width of the original picture are multiplied by the smaller scaling coefficient φ to obtain the size of the desired scaled picture x3 × y3, where x1·φ = x3 and y1·φ = y3;
Calculating the black-edge padding value: subtracting the width of the desired scaled picture from the width of the initially scaled picture gives the original black-edge width m1 to be filled; taking m1 modulo 128 gives the desired black-edge width m2; finally m2 is divided by 2, i.e. the filled black edge is distributed evenly to the two ends of the desired scaled picture, so that x1 × y1 is scaled to x3 × (y3 + m2).
5. The sheldrake gender classification and identification method based on the target detection and classification network as claimed in claim 1, wherein: the calculation formula of Focal Loss is:

FL(p_t) = -α_t · (1 - p_t)^γ · log(p_t)

The accuracy of the network structure is improved by adjusting this loss, where p_t is the classification probability of the different classes; γ is a value greater than 0, called the modulation coefficient, which reduces the weight of easily classified samples so that the model concentrates more on hard samples during training; α_t is a decimal in [0, 1], and setting its value controls the contribution of positive and negative samples to the total loss, a small value reducing the weight of negative samples; γ and α_t interact with each other, and through their combination Focal Loss can control the weights of hard and easy samples and adjust the weights of positive and negative samples.
6. The sheldrake gender classification and identification method based on the target detection and classification network as claimed in claim 1, wherein: the improvement of the Yolov7 network structure with the CBAM attention mechanism comprises the following steps: the Yolov7 network structure comprises an input end module, a Backbone module, a Head module and a Prediction module; the CBAM attention mechanism is added between the Backbone module and the Head module of the Yolov7 network structure to enhance feature extraction.
7. The sheldrake gender classification and identification method based on the target detection and classification network as claimed in claim 1, wherein: the CBAM comprises a channel attention module and a spatial attention module, and the flow of feature map extraction by the CBAM attention mechanism is as follows:
the channel attention module performs a global max pooling operation and a global average pooling operation on the input feature map of size H × W × C to obtain two feature maps of size 1 × 1 × C, which are respectively fed into a shared two-layer neural network; the outputs are then added element-wise, a sigmoid activation is applied to generate the final channel attention feature, and finally the channel attention feature is multiplied with the original input feature map to obtain the input feature of the spatial attention module;
the spatial attention module takes the feature map obtained in the previous step as its input feature map, performs a global max pooling operation and a global average pooling operation along the channel dimension to obtain two feature maps of size H × W × 1, concatenates them, reduces their dimension with a convolution, generates the spatial attention feature through a sigmoid activation, and finally multiplies the spatial attention feature with the input feature map to obtain the final feature map.
8. The sheldrake gender classification and identification method based on the target detection and classification network as claimed in any one of claims 1 to 7, wherein: the gender classification recognition method further comprises an evaluation step, wherein the evaluation step comprises the following steps:
Precision = TP / (TP + FP) is used to indicate the proportion of true positive samples among the samples predicted as positive, and Recall = TP / (TP + FN) is used to indicate the proportion of correctly predicted positive samples among all actual positive samples, where TP denotes a positive sample predicted as positive, FP denotes a negative sample predicted as positive, and FN denotes a positive sample predicted as negative;
the harmonic mean of Precision and Recall, F1-score, is calculated from Precision and Recall, i.e.

F1-score = 2 · Precision · Recall / (Precision + Recall)

Precision reflects the ability of the model to distinguish negative samples: the higher the Precision, the stronger this ability; Recall reflects the ability of the model to recognize positive samples: the higher the Recall, the stronger this ability; F1-score combines the two, and the higher the F1-score, the more stable the model.
CN202211159949.7A 2022-09-22 2022-09-22 Sheldrake gender classification and identification method based on target detection and classification network Pending CN115471871A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211159949.7A CN115471871A (en) 2022-09-22 2022-09-22 Sheldrake gender classification and identification method based on target detection and classification network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211159949.7A CN115471871A (en) 2022-09-22 2022-09-22 Sheldrake gender classification and identification method based on target detection and classification network

Publications (1)

Publication Number Publication Date
CN115471871A true CN115471871A (en) 2022-12-13

Family

ID=84334820

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211159949.7A Pending CN115471871A (en) 2022-09-22 2022-09-22 Sheldrake gender classification and identification method based on target detection and classification network

Country Status (1)

Country Link
CN (1) CN115471871A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116021526A (en) * 2023-02-07 2023-04-28 台州勃美科技有限公司 Agricultural robot control method and device and agricultural robot
CN116993679A (en) * 2023-06-30 2023-11-03 芜湖合德传动科技有限公司 Method for detecting belt abrasion of telescopic machine based on target detection

Citations (18)


Patent Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108805064A (en) * 2018-05-31 2018-11-13 中国农业大学 A kind of fish detection and localization and recognition methods and system based on deep learning
CN109583289A (en) * 2018-09-27 2019-04-05 中国农业大学 The gender identification method and device of crab
CN111461252A (en) * 2020-04-13 2020-07-28 中国地质大学(武汉) Chick sex detector and detection method
CN111695415A (en) * 2020-04-28 2020-09-22 平安科技(深圳)有限公司 Construction method and identification method of image identification model and related equipment
CN111931729A (en) * 2020-09-23 2020-11-13 平安国际智慧城市科技股份有限公司 Pedestrian detection method, device, equipment and medium based on artificial intelligence
US20220108103A1 (en) * 2020-10-06 2022-04-07 Innopia Technologies, Inc. Data generation method and apparatus for providing real-viewing immersion determination service based on image recognition
CN113012220A (en) * 2021-02-02 2021-06-22 深圳市识农智能科技有限公司 Fruit counting method and device and electronic equipment
CN112926652A (en) * 2021-02-25 2021-06-08 青岛科技大学 Fish fine-grained image identification method based on deep learning
CN115116085A (en) * 2021-03-19 2022-09-27 腾讯科技(深圳)有限公司 Image identification method, device and equipment for target attribute and storage medium
CN113537106A (en) * 2021-07-23 2021-10-22 仲恺农业工程学院 Fish feeding behavior identification method based on YOLOv5
CN113743470A (en) * 2021-08-04 2021-12-03 浙江联运环境工程股份有限公司 AI algorithm-based garbage recognition precision improvement method for automatic bag breaking classification box
CN113792638A (en) * 2021-09-07 2021-12-14 上海电力大学 Thermal power plant rain drainage port pollutant identification method based on parallelgram-Yolov 4
CN114049550A (en) * 2021-10-13 2022-02-15 天津农学院 Cage-rearing dead broiler automatic identification method based on deep learning
CN114358178A (en) * 2021-12-31 2022-04-15 东北林业大学 Airborne thermal imaging wild animal species classification method based on YOLOv5 algorithm
CN114998210A (en) * 2022-04-29 2022-09-02 华南理工大学 Premature infant retinopathy detection system based on deep learning target detection
CN115063714A (en) * 2022-05-26 2022-09-16 东南大学成贤学院 Bird collision accident prevention target detection method based on improved YOLOv5s network
CN115082985A (en) * 2022-06-02 2022-09-20 广州大学 Automatic age identification method based on neural network
CN115082855A (en) * 2022-06-20 2022-09-20 安徽工程大学 Pedestrian occlusion detection method based on improved YOLOX algorithm

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ALEXEY BOCHKOVSKIY et al.: "YOLOv4: Optimal Speed and Accuracy of Object Detection" *
HONGYI ZHANG et al.: "mixup: BEYOND EMPIRICAL RISK MINIMIZATION" *
SANGHYUN WOO et al.: "CBAM: Convolutional Block Attention Module" *
杨陆野: "Intelligent sex identification of chicks based on the LeNet deep learning model" (基于 LeNet 深度学习模型的雏鸡性别智能识别) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116021526A (en) * 2023-02-07 2023-04-28 台州勃美科技有限公司 Agricultural robot control method and device and agricultural robot
CN116993679A (en) * 2023-06-30 2023-11-03 芜湖合德传动科技有限公司 Method for detecting belt abrasion of telescopic machine based on target detection
CN116993679B (en) * 2023-06-30 2024-04-30 芜湖合德传动科技有限公司 Method for detecting belt abrasion of telescopic machine based on target detection

Similar Documents

Publication Publication Date Title
CN111178197B (en) Mass R-CNN and Soft-NMS fusion based group-fed adherent pig example segmentation method
US10282589B2 (en) Method and system for detection and classification of cells using convolutional neural networks
Zhang et al. Animal detection from highly cluttered natural scenes using spatiotemporal object region proposals and patch verification
CN115471871A (en) Sheldrake gender classification and identification method based on target detection and classification network
Wang et al. Recognition and classification of broiler droppings based on deep convolutional neural network
Millan et al. On‐the‐Go Grapevine Yield Estimation Using Image Analysis and Boolean Model
CN112598713A (en) Offshore submarine fish detection and tracking statistical method based on deep learning
CN107918772B (en) Target tracking method based on compressed sensing theory and gcForest
CN113221864A (en) Method for constructing and applying diseased chicken visual recognition model with multi-region depth feature fusion
CN106650804A (en) Facial sample cleaning method and system based on deep learning features
CN113470076A (en) Multi-target tracking method for yellow-feather chickens in flat-breeding henhouse
Udawant et al. Cotton leaf disease detection using instance segmentation
Li et al. Y-BGD: Broiler counting based on multi-object tracking
Guo et al. Enhanced camera-based individual pig detection and tracking for smart pig farms
CN110163103B (en) Live pig behavior identification method and device based on video image
Huang et al. Efficient Detection Method of Pig‐Posture Behavior Based on Multiple Attention Mechanism
Başol et al. A deep learning-based seed classification with mobile application
Guo et al. Detecting broiler chickens on litter floor with the YOLOv5-CBAM deep learning model
Guo et al. Automatic detection of brown hens in cage-free houses with deep learning methods
Gonçalves et al. Using a convolutional neural network for fingerling counting: A multi-task learning approach
Li et al. Recognition of fine-grained sow nursing behavior based on the SlowFast and hidden Markov models
Sun et al. Event-based object detection using graph neural networks
Chacon-Murguia et al. Evaluation of the background modeling method auto-adaptive parallel neural network architecture in the sbmnet dataset
Yang et al. FARnet: Farming Action Recognition From Videos Based on Coordinate Attention and YOLOv7-tiny Network in Aquaculture
Qi et al. Deep Learning Based Image Recognition In Animal Husbandry

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20221213

RJ01 Rejection of invention patent application after publication