CN114529840A - YOLOv4-based method and system for identifying individual identities of flocks of sheep in sheepcote - Google Patents

YOLOv4-based method and system for identifying individual identities of flocks of sheep in sheepcote

Info

Publication number
CN114529840A
CN114529840A
Authority
CN
China
Prior art keywords
sheep
neural network
training
network model
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210109354.4A
Other languages
Chinese (zh)
Inventor
于文波
穆昕钰
张春慧
宣传忠
张永安
马彦华
姬振生
武佩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inner Mongolia Agricultural University
Original Assignee
Inner Mongolia Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inner Mongolia Agricultural University filed Critical Inner Mongolia Agricultural University
Priority to CN202210109354.4A priority Critical patent/CN114529840A/en
Publication of CN114529840A publication Critical patent/CN114529840A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a method and a system for identifying individual identities of flocks of sheep in a sheepcote based on a YOLOv4 neural network model, wherein the method comprises the following steps: S1: acquiring facial information of the sheep, preprocessing the facial pictures, labeling the faces with labeling frames to obtain a data set, and dividing the data set into a training set and a test set; S2: constructing a YOLOv4-based sheep face recognition neural network model comprising: Input, Backbone, Neck and Head; S3: constructing a loss function, pre-training with sheep face data, taking the model parameters obtained after pre-training as initial parameters of the sheep face recognition neural network model, and training with the training set; the test set is then input into the trained YOLOv4 neural network model and the model's performance is evaluated. The method provided by the invention adopts non-contact identification, has low cost and high precision, and is safe and effective, avoiding the tag-loss and stress problems of contact methods.

Description

YOLOv4-based method and system for identifying individual identities of flocks of sheep in sheepcote
Technical Field
The invention relates to the field of modern intelligent animal husbandry, in particular to a method and a system for identifying individual identities of sheep flocks in a sheepcote based on a YOLOv4 neural network model.
Background
In recent years, animal husbandry at home and abroad has been developing from the traditional mode toward intelligence, precision and scale. Sheep are an important livestock species and a major part of the livestock industry. Identification of individual sheep is widely regarded as essential to large-scale, precise sheep farming. The main identification method in the current sheep industry is contact identification, i.e. manual marking or ear tagging, and the most common method on large farms is the radio-frequency ear tag. As breeding scale grows, ear tags are increasingly lost because sheep rub against one another; tagging also causes stress reactions in the sheep, inflammation of the tag hole harms sheep health, and lost tags raise the breeder's cost per sheep. Non-contact individual identification has therefore received growing attention in the breeding industry in recent years. In particular, computer-vision-based non-contact methods can save labor and improve working efficiency, and in large-scale breeding environments can efficiently replace round-the-clock manual supervision of livestock growth, greatly reducing the labor costs of farmers and pastures and improving breeding efficiency. The patent "A dairy cow rumination behavior identification method based on SSD convolutional neural network" published by Zusau et al. provides an identification method for cattle, but data set acquisition for sheep flocks is harder than for cattle: cattle live at low density, move within a small range and rarely occlude one another, whereas sheep pens have high breeding density, large amounts of movement and severe mutual occlusion. The patent "YOLOv4-based chicken farm raised chicken identification algorithm" published by Liu Bing et al. proposes a detection method for farm-raised chickens but does not identify each individual chicken.
Therefore, how to identify individual sheep in a flock in a non-contact manner has become an urgent problem to solve.
Disclosure of Invention
In order to solve this technical problem, the invention provides a method and a system for identifying individual identities of sheep flocks in a sheep house based on a YOLOv4 neural network model.
The technical solution of the invention is as follows: a method for identifying individual identities of flocks of sheep in a sheepcote based on a YOLOv4 neural network model comprises the following steps:
step S1: acquiring facial information of each sheep in a sheep flock, preprocessing a facial picture, labeling the face with a labeling frame to obtain a data set, and dividing the data set into a training set and a test set;
step S2: constructing a sheep face recognition neural network model based on YOLOv4, wherein the sheep face recognition neural network model comprises the following steps: input, Backbone, Neck and Head: backbone is used as a main feature extraction network, CSPnet is added on the basis of Darknet53 to construct CSPDarknet53, and a Mish activation function is adopted; the Neck is used as a reinforced feature extraction network, and SSP and PANet are respectively used for extracting context features and parameter aggregation; according to the labeling box, the Head recalculates the size of the anchor box by using a Kmeans + + algorithm, carries out prediction based on the output characteristics of the Neck, and calculates a final target detection box after non-maximum inhibition is carried out on the generated prediction box;
step S3: constructing a loss function, pre-training with sheep face data, taking the model parameters obtained after pre-training as initial parameters of the sheep face recognition neural network model, and training the model with the training set to obtain the trained sheep face recognition neural network model; finally, inputting the test set into the trained sheep face recognition neural network model and evaluating the model's performance.
Compared with the prior art, the invention has the following advantages:
The invention discloses a method for identifying individual identities of flocks of sheep in a sheep house based on a YOLOv4 neural network model which uses a multi-source video stream fusion technology, effectively resolving the mutual occlusion of sheep in the pen during data collection. The method is non-contact, so the sheep suffer no stress reaction. Unlike traditional feature-point extraction and classification methods, the convolutional-neural-network-based target detection method needs no hand-crafted features: through repeated feature extraction and stacking it obtains richer, more essential features of the target, and it offers a smaller system scale, a simpler structure, higher stability, stronger anti-interference capability, good universality, and higher identification precision under deformation and occlusion of the detected target.
Drawings
Fig. 1 is a flowchart of the method for identifying individual identities of a flock of sheep in a sheep pen based on the YOLOv4 neural network model in an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a CSPDarknet53 module according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a CSP according to an embodiment of the present invention;
FIG. 4 is a Mish function image according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a neural network model for sheep face recognition based on YOLOv4 according to an embodiment of the present invention;
FIG. 6 is a schematic overall flowchart of a sheep face recognition neural network model based on YOLOv4 in an embodiment of the present invention;
FIG. 7 is a graph showing mAP indexes of test results according to an embodiment of the present invention;
fig. 8 is a block diagram of a system for identifying individual identities of flocks of sheep in a sheepcote based on a YOLOv4 neural network model in an embodiment of the present invention.
Detailed Description
The invention provides a method for identifying individual identities of flocks of sheep in a sheep pen based on a YOLOv4 neural network model. By adopting a non-contact identification method it is low-cost, high-precision, safe and effective: it avoids the tag-loss and stress problems of the existing contact techniques, and it also solves the occlusion and identification-accuracy problems of the existing non-contact techniques.
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings.
Example one
As shown in fig. 1, the method for identifying individual identities of a flock of sheep in a sheep house based on the YOLOv4 neural network model provided by an embodiment of the present invention includes the following steps:
step S1: acquiring facial information of each sheep in a sheep flock, preprocessing a facial picture, labeling the face with a labeling frame to obtain a data set, and dividing the data set into a training set and a test set;
step S2: constructing a YOLOv4-based sheep face recognition neural network model, wherein the sheep face recognition neural network model comprises: Input, Backbone, Neck and Head: the Backbone serves as the main feature extraction network, built as CSPDarknet53 by adding CSPNet to Darknet53, and adopts the Mish activation function; the Neck serves as the enhanced feature extraction network, using SPP to extract context features and PANet for parameter aggregation; according to the labeling frames, the Head recalculates the anchor frame sizes with the K-means++ algorithm, predicts based on the output features of the Neck, and calculates the final target detection frames after non-maximum suppression of the generated prediction frames;
step S3: constructing a loss function, pre-training with sheep face data, taking the model parameters obtained after pre-training as initial parameters of the sheep face recognition neural network model, and training the model with the training set to obtain the trained sheep face recognition neural network model; finally, inputting the test set into the trained sheep face recognition neural network model and evaluating the model's performance.
In one embodiment, the step S1: acquiring the facial information of each sheep in the flock, preprocessing the facial pictures, and manually labeling the sheep faces with labeling frames to obtain a data set divided into a training set and a test set, specifically includes:
step S11: collecting video information of sheep in a sheep flock at different time intervals;
First, the sheep are numbered on their bodies for subsequent classification of the sheep data set. Second, several cameras are mounted 10-20 cm above sheep height and 1-2 m outside the fence, with the number of sheep in each camera's field of view preferably kept within 20. Twenty-four-hour videos of the flock are shot under different conditions such as sunny and cloudy days, and the video files are transmitted to a server via wired network, WIFI or similar means.
Step S12: capturing frames from the video information at a preset frame-capture frequency to obtain sheep images, extracting and matching feature points, and fusing the facial data of each sheep to obtain images containing each sheep's complete facial data;
For the collected video files, invalid segments are cut out with Adobe Premiere Pro 2019 and the valid video is stored; video frames are then extracted from the stored valid video using the OpenCV open-source library in Python, with the frame-capture interval set to 30, and saved in JPG format.
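A minimal sketch of this frame-extraction step, using OpenCV in Python as the description above does; the frame interval of 30 follows the text, while the file paths and naming scheme are illustrative assumptions:

```python
import os
import cv2

def extract_frames(video_path: str, out_dir: str, frame_interval: int = 30) -> int:
    """Save every `frame_interval`-th frame of `video_path` to `out_dir` as JPG."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    index = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of video
            break
        if index % frame_interval == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{index:06d}.jpg"), frame)
            saved += 1
        index += 1
    cap.release()
    return saved

# e.g. extract_frames("sheep_pen_cam1.mp4", "frames/cam1")
```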
Feature points are extracted from the video-frame pictures using the multi-source video stream fusion technology. According to the camera placement, coordinates are transformed and unified across the camera groups by the principle of stereoscopic vision; combined with the temporal correlation of the video shooting, the feature points of the multi-source video data are matched and the pictures containing sheep face information are fused, realizing effective fusion of the multi-source video data and finally yielding pictures containing the complete facial information of each sheep.
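The patent does not name a specific feature detector for this matching step; purely as an illustration, the sketch below matches feature points between two camera views with ORB and a brute-force Hamming matcher from OpenCV (both are assumptions, not the claimed method):

```python
import cv2

def match_views(img_a, img_b, max_matches: int = 50):
    """Detect and match feature points between two overlapping camera views."""
    orb = cv2.ORB_create(nfeatures=1000)  # ORB chosen for illustration only
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)
    return kp_a, kp_b, matches[:max_matches]  # keep the strongest matches
```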
All pictures of the same sheep are stored in one folder, named after that sheep's identity tag. If a picture contains several sheep, it is stored in each corresponding folder. The data set is then cleaned: blurry pictures and highly similar pictures are removed from the folders, and 600 pictures are kept per sheep.
Step S13: performing translation and rotation augmentation on the pictures of each sheep's complete facial data in the folders, labeling the sheep faces to obtain a data set, and dividing the data set into a training set and a test set at a preset ratio.
The sheep faces in the folder photos are labeled with the LabelImg visual image calibration tool to obtain labeling frames, generating a VOC-format data set; if there are many classes, the VOC format is converted to the COCO data set format. The final data are divided into a training set and a test set at a ratio of 4:1.
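A minimal sketch of the 4:1 split, assuming the one-folder-per-sheep layout described in step S12; the directory names are illustrative:

```python
import random
from pathlib import Path

def split_dataset(root: str, train_ratio: float = 0.8, seed: int = 0):
    """Split each sheep's folder of images 4:1 into train and test lists."""
    rng = random.Random(seed)
    train, test = [], []
    for sheep_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        images = sorted(sheep_dir.glob("*.jpg"))
        rng.shuffle(images)
        cut = int(len(images) * train_ratio)  # 0.8 corresponds to the 4:1 ratio
        train += images[:cut]
        test += images[cut:]
    return train, test
```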
In one embodiment, the step S2: constructing a YOLOv4-based sheep face recognition neural network model, wherein the sheep face recognition neural network model comprises: Input, Backbone, Neck and Head: the Backbone serves as the main feature extraction network, built as CSPDarknet53 by adding CSPNet to Darknet53, and adopts the Mish activation function; the Neck serves as the enhanced feature extraction network, using SPP to extract context features and PANet for parameter aggregation; according to the labeling frames, the Head recalculates the anchor frame sizes with the K-means++ algorithm, predicts based on the output features of the Neck, and calculates the final target detection frames after non-maximum suppression of the generated prediction frames, specifically comprises the following steps:
step S21: building the Backbone as the main feature extraction network, adding CSPNet to Darknet53 to build CSPDarknet53, wherein CSPDarknet53 comprises: 1 convolutional layer and 5 Resblock_body modules, and the Resblock_body modules comprise: 2 convolution blocks, normalization and Mish activation functions;
The Backbone is constructed as the main feature extraction network: CSPNet is added to Darknet53 to build CSPDarknet53, whose structure is shown in fig. 2. The network has a 53-layer structure. An input picture of 416 × 416 × 3 becomes a 416 × 416 × 32 feature layer after the first convolution, followed by 5 large Resblock_body convolution blocks composed of a series of convolutions, activation functions and normalizations. Each Resblock_body downsamples the picture, continuously compressing its width and height while expanding the number of channels. The first Resblock_body contains one Resblock residual network, and the input becomes a 208 × 208 × 64 feature layer. The second contains two Resblock residual networks, and the input becomes a 104 × 104 × 128 feature layer. The third contains eight Resblock residual networks, yielding a 52 × 52 × 256 feature layer. The fourth contains eight Resblock residual networks, yielding a 26 × 26 × 512 feature layer. The fifth contains four Resblock residual networks, yielding a 13 × 13 × 1024 feature layer.
The Resblock_body is the core for constructing the CSPDarknet53 network. The CSP structure is shown in fig. 3: part2 is the main branch, which extracts features of the input through a stack of residual networks, while part1 on the side acts like a large residual edge that bypasses the residual stack of part2, reducing repeated gradient computation; after light processing, the input is stacked directly with the output. Resblock_body is essentially a large convolution block composed of one downsampling step and several residual networks. First, zero padding and a convolution block with stride 2 compress the height and width; then a large residual edge is established that bypasses the many residual structures, while the trunk part loops through the residual structures inside. Seen as a whole, each CSPDarknet structure block is one large residual block containing many small residual blocks. The residual network inside Resblock_body consists of Resblocks. A Resblock is built from convolution, normalization and activation functions: it has two convolution blocks, the first with a 1 × 1 kernel and the second with a 3 × 3 kernel, and in the forward pass a residual block is formed by adding the input to the output of the two convolutions. The activation function of the Resblock is the Mish activation function, given by:
Mish(x) = x × tanh(ln(1 + e^x))
where x is the layer input;
the image of the Mish function is shown in FIG. 4. The embodiment of the invention adopts the Mish activation function, compared with the ReLU, the Mish function is unbounded, and the saturation caused by capping is avoided. And the gradient of the Mish function is smoother, and the smooth activation function allows information to go deep into the neural network better, so that better accuracy and generalization capability are obtained.
As shown in fig. 5, after the input picture passes through the Backbone feature extraction network, three effective feature layers are obtained: C3 (52 × 52 × 256), C4 (26 × 26 × 512) and C5 (13 × 13 × 1024); these three effective feature layers are input into the Neck enhanced feature extraction network for subsequent operations.
Step S22: constructing the Neck as the enhanced feature extraction network, using SPP to enlarge the receptive field and separate out the most significant context features, and PANet to aggregate the parameters of the SPP output and different Backbone feature layers. Adjusted on the basis of the existing YOLOv4 structure, the output of SPP is convolved and stacked to obtain S1; after convolution and up-sampling, it is convolved 5 times with one PANet output path P3, stacked, and output as P5. P5 and the other two PANet output paths P1 and P3 together serve as the inputs of the subsequent Head;
First, the effective feature layer C5 (13 × 13 × 1024) output by the fifth Resblock_body of the trunk feature extraction network is taken; it is max-pooled by the SPP pooling layers with kernel sizes 5 × 5, 9 × 9 and 13 × 13, the results are stacked to obtain the pooled output, and after 3 convolutions it is input into the PANet structure.
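A minimal PyTorch sketch of this SPP step: max pooling with kernels 5, 9 and 13 at stride 1 (padding chosen to preserve the 13 × 13 size), stacked with the unpooled input along the channel axis:

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    def __init__(self, kernels=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in kernels
        )

    def forward(self, x):  # x: (N, C, 13, 13), e.g. C5 after its first convolutions
        return torch.cat([x] + [p(x) for p in self.pools], dim=1)  # (N, 4C, 13, 13)
```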
PANet is an instance segmentation network that can extract features repeatedly. In the YOLOv4 network, the PANet structure is applied to the final output S1 of the SPP structure, the third effective feature layer C3 (52 × 52 × 256) and the fourth effective feature layer C4 (26 × 26 × 512) output by the trunk feature extraction network.
The final output S1 of the SPP structure enters PANet for convolution and up-sampling, and the fourth effective feature layer C4 (26 × 26 × 512) of the trunk feature extraction network is convolved once and input into PANet; in PANet the two effective feature layers S1 and C4 are stacked and then convolved five times (1 × 1, 3 × 3, 1 × 1, 3 × 3, 1 × 1) to obtain the effective feature layer P2. The third effective feature layer C3 (52 × 52 × 256) of the trunk feature extraction network is convolved once and input into PANet, while P2 is convolved and up-sampled; the two effective feature layers are stacked and convolved five times to obtain the effective feature layer P1. P1 undergoes two operations: it is output to the YOLO Head, and it is downsampled, stacked with P2 and convolved five times to obtain P3, which is output to the Head. P3 is likewise downsampled, stacked with S1 and convolved five times to obtain P5, which is output to the Head.
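A minimal PyTorch sketch of the repeated five-convolution block (1 × 1, 3 × 3, 1 × 1, 3 × 3, 1 × 1) used throughout the PANet part; batch normalization plus LeakyReLU after each convolution is an assumption based on common YOLOv4 implementations:

```python
import torch.nn as nn

def conv_bn_act(c_in: int, c_out: int, k: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1),  # Mish is used in the Backbone; LeakyReLU assumed elsewhere
    )

def make_five_conv(c_in: int, c_mid: int) -> nn.Sequential:
    """The 1x1, 3x3, 1x1, 3x3, 1x1 stack applied after each stacking step."""
    return nn.Sequential(
        conv_bn_act(c_in, c_mid, 1),
        conv_bn_act(c_mid, c_mid * 2, 3),
        conv_bn_act(c_mid * 2, c_mid, 1),
        conv_bn_act(c_mid, c_mid * 2, 3),
        conv_bn_act(c_mid * 2, c_mid, 1),
    )
```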
Step S23: building the Head, which consists of the integration of several 1 × 1 and 3 × 3 convolutions. According to the manually labeled frames, the sizes of several anchor frames of different scales are recalculated with the K-means++ algorithm; each feature layer is allocated 3 anchor frames, assigned from small to large to the feature layers corresponding to the Head inputs P1, P3 and P5. Prediction frames are output, and the final target detection frames are calculated after non-maximum suppression of the prediction frames.
YOLOv4 uses the same Head as YOLOv3: an integration of a series of 1 × 1 and 3 × 3 convolutions, where the 3 × 3 convolutions integrate features and the 1 × 1 convolutions adjust the number of channels.
As shown in the YOLO Head part of fig. 5, the three feature layers output by the Backbone, i.e. the C3, C4 and C5 layers with shapes (52, 52, 256), (26, 26, 512) and (13, 13, 1024), are each connected to a YOLO Head through the enhanced feature extraction network Neck; the resulting feature maps have sizes 52 × 52, 26 × 26 and 13 × 13, respectively. With a 20-class VOC-format data set, the output dimension is (5 + 20) × 3 = 75; with more classes, e.g. the 80 classes of the COCO data set, the dimension is (5 + 80) × 3 = 255. The 5 represents 5 parameters: position x, y, width and height w, h, and confidence.
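A minimal PyTorch sketch of one Head branch as just described: a 3 × 3 convolution for feature integration followed by a 1 × 1 convolution that sets the channel count to 3 × (5 + number of classes), e.g. 75 for 20 classes:

```python
import torch.nn as nn

def yolo_head(c_in: int, num_classes: int) -> nn.Sequential:
    out_channels = 3 * (5 + num_classes)  # 3 anchors x (x, y, w, h, confidence, classes)
    return nn.Sequential(
        nn.Conv2d(c_in, c_in * 2, kernel_size=3, padding=1, bias=False),  # feature integration
        nn.BatchNorm2d(c_in * 2),
        nn.LeakyReLU(0.1),
        nn.Conv2d(c_in * 2, out_channels, kernel_size=1),  # adjust channel number
    )
```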
When the detection frames obtained in the Head are matched to anchor frames, the anchor sizes chosen by the usual K-means algorithm are a compromise: for objects of particular sizes in specific problems the detection effect may be unsatisfactory, leading to false detection, missed detection or repeated detection. The embodiment of the invention therefore retrains the anchor frame sizes with the K-means++ algorithm.
In an embodiment, in step S23, recalculating the sizes of several anchor frames of different scales with the K-means++ algorithm, allocating 3 anchor frames to each feature layer (assigned from small to large to the feature layers corresponding to the Head inputs P1, P3 and P5), outputting prediction frames, and calculating the final target detection frames after non-maximum suppression of the prediction frames, specifically includes:
step S231: randomly selecting one labeling frame as the initial clustering center c_1;
The length and width of all labeling frames are calculated, where length = lower-right x-coordinate minus upper-left x-coordinate and width = lower-right y-coordinate minus upper-left y-coordinate, and one labeling frame is randomly selected as the initial clustering center c_1.
Step S232: for each sample point xiRespectively find xiThe shortest distance to all cluster centers that are present is denoted by D (x, followed by
Figure BDA0003494625120000071
Obtaining the probability of each sample becoming the next clustering center, and finally selecting the next clustering center according to a wheel disc method;
step S233: repeating step S232, stopping when K cluster centers have been obtained; then finding for each x_i the cluster center with minimum distance and assigning x_i to the category corresponding to that center;
step S234: for each category c_i, recalculating the center of the category as

c_i = (1/|c_i|) Σ_{x∈c_i} x;
step S235: looping steps S233 to S234, stopping when the obtained cluster centers no longer change;
Through the above steps the anchor frames are retrained, and using the retrained anchor frames as new prior frames improves detection precision.
Step S236: predicting on the output features of the Neck according to the anchor frames, and obtaining the final target detection frames after non-maximum suppression of the generated prediction frames.
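A minimal NumPy sketch of steps S231 to S235, assuming the samples are the (width, height) pairs of the labeling frames and plain Euclidean distance for D(x); production YOLO implementations often use a 1 - IoU distance instead:

```python
import numpy as np

def kmeans_pp_anchors(boxes: np.ndarray, k: int = 9, seed: int = 0) -> np.ndarray:
    """boxes: (N, 2) array of (width, height); returns (k, 2) anchor sizes."""
    rng = np.random.default_rng(seed)
    centers = [boxes[rng.integers(len(boxes))]]  # S231: random first center
    while len(centers) < k:                      # S232/S233: roulette-wheel selection
        d2 = np.min(((boxes[:, None] - np.array(centers)) ** 2).sum(-1), axis=1)
        probs = d2 / d2.sum()                    # P(x) = D(x)^2 / sum of D(x)^2
        centers.append(boxes[rng.choice(len(boxes), p=probs)])
    centers = np.array(centers, dtype=float)
    while True:                                  # S233-S235: assign, re-center, loop
        labels = np.argmin(((boxes[:, None] - centers) ** 2).sum(-1), axis=1)
        new = np.array([boxes[labels == i].mean(axis=0) if np.any(labels == i)
                        else centers[i] for i in range(k)])
        if np.allclose(new, centers):            # centers no longer change
            return new[np.argsort(new.prod(axis=1))]  # sort anchors by area
        centers = new
```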
In one embodiment, the step S3: constructing a loss function, pre-training with sheep face data, and taking the model parameters obtained after pre-training as initial parameters of the sheep face recognition neural network model, specifically includes:
step S31: constructing the mean square error loss MSE shown in formula (1); during model training, the constructed MSE replaces the cross-entropy loss BCE of the original YOLOv4 model in the confidence loss function and classification loss function parts, improving the ability of the sheep face recognition neural network model to detect and recognize small targets in the complex environment during training:

MSE = (1/M) Σ_{m=1}^{M} (y_m - ŷ_m)^2    (1)

where y_m is the real data, ŷ_m is the fitted data, and M is the number of samples;
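A minimal PyTorch sketch of the MSE of formula (1), as substituted for BCE in the confidence and classification terms; `torch.nn.MSELoss()` is the equivalent built-in:

```python
import torch

def mse_loss(y_true: torch.Tensor, y_pred: torch.Tensor) -> torch.Tensor:
    """Mean square error of formula (1): (1/M) * sum of squared residuals."""
    return ((y_true - y_pred) ** 2).mean()
```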
step S32: pre-training the sheep face recognition neural network model with sheep face data so that the model better extracts the essential features of the sheep face; following the transfer learning idea used when training YOLO models, the pre-training result is used as the initial parameters for training the sheep face recognition model, and the model is trained with the training set to accelerate convergence and obtain the trained sheep face recognition neural network model.
To accelerate model convergence, the embodiment of the invention first pre-trains on sheep face data: 29,000 face images of 100 sheep of different sizes, sexes and coat colors were collected and trained for 100 generations. After pre-training on a large number of sheep face pictures, the model extracts the essential features of the sheep face better. Using the transfer learning idea, the pre-training result serves as the initial parameters when training the sheep face recognition model, which effectively accelerates convergence and improves model performance when training the sheep face recognition neural network model.
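A minimal PyTorch sketch of this transfer-learning step; the weight file path is a placeholder, and `strict=False` (which lets layers whose shapes differ, such as a Head for a different class count, keep their fresh initialization) is a common transfer-learning detail assumed here:

```python
import torch
import torch.nn as nn

def load_pretrained(model: nn.Module, weights_path: str) -> nn.Module:
    """Initialize a sheep-face model with parameters from the sheep-face pre-training."""
    state = torch.load(weights_path, map_location="cpu")
    model.load_state_dict(state, strict=False)  # reuse all layers whose shapes match
    return model

# e.g. model = load_pretrained(model, "pretrain_sheep_face.pth"), then train on the training set
```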
Fig. 6 is a schematic overall flow chart of the YOLOv 4-based neural network model for sheep face recognition in the embodiment of the present invention, from the construction of a data set, the construction of a model, and training to the final model verification.
In the embodiment of the invention, 15,000 photos of 16 sheep were collected to form the test set for testing the sheep face recognition neural network model. The results are shown in fig. 7, where the ordinate represents the different sheep and the abscissa represents the mAP index; the final average mAP value is 92.01%.
Example two
As shown in fig. 8, an embodiment of the present invention provides a system for identifying individual identities of flocks of sheep in a sheepcote based on the YOLOv4 neural network model, which includes the following modules:
The data set acquisition module 41 is used for acquiring the facial information of each sheep in the flock, preprocessing the facial pictures, labeling the sheep faces with labeling frames to obtain a data set, and dividing the data set into a training set and a test set;
The model building module 42 is configured to build a YOLOv4-based sheep face recognition neural network model, wherein the sheep face recognition neural network model comprises: Input, Backbone, Neck and Head: the Backbone serves as the main feature extraction network, built as CSPDarknet53 by adding CSPNet to Darknet53, and adopts the Mish activation function; the Neck serves as the enhanced feature extraction network, using SPP to extract context features and PANet for parameter aggregation; according to the labeling frames, the Head recalculates the anchor frame sizes with the K-means++ algorithm, predicts based on the output features of the Neck, and calculates the final target detection frames after non-maximum suppression of the generated prediction frames;
The model training module 43 is used for constructing a loss function, pre-training with sheep face data, using the model parameters obtained after pre-training as initial parameters of the sheep face recognition neural network model, and training the model with the training set to obtain the trained sheep face recognition neural network model; finally, the test set is input into the trained sheep face recognition neural network model and the model's performance is evaluated.
The above examples are provided only for the purpose of describing the present invention, and are not intended to limit the scope of the present invention. The scope of the invention is defined by the appended claims. Various equivalent substitutions and modifications can be made without departing from the spirit and principles of the invention, and are intended to be within the scope of the invention.

Claims (6)

1. A method for identifying individual identities of a flock of sheep in a sheepcote based on a YOLOv4 neural network model, characterized by comprising the following steps:
step S1: acquiring face information of each sheep in a flock, preprocessing a face picture, labeling the face with a labeling frame to obtain a data set, and dividing the data set into a training set and a test set;
step S2: constructing a YOLOv4-based sheep face recognition neural network model, wherein the sheep face recognition neural network model comprises: Input, Backbone, Neck and Head: the Backbone serves as the main feature extraction network, built as CSPDarknet53 by adding CSPNet to Darknet53, and adopts the Mish activation function; the Neck serves as the enhanced feature extraction network, using SPP to extract context features and PANet for parameter aggregation; according to the labeling frames, the Head recalculates the anchor frame sizes with the K-means++ algorithm, predicts based on the output features of the Neck, and calculates the final target detection frames after non-maximum suppression of the generated prediction frames;
step S3: constructing a loss function, pre-training with sheep face data, taking the model parameters obtained after pre-training as initial parameters of the sheep face recognition neural network model, and training the model with the training set to obtain the trained sheep face recognition neural network model; finally, inputting the test set into the trained sheep face recognition neural network model and evaluating the model's performance.
2. The method for identifying individual identities of flocks of sheep in a sheepcote based on the YOLOv4 neural network model as claimed in claim 1, wherein the step S1: acquiring the facial information of each sheep in the flock, preprocessing the facial pictures, manually labeling the sheep faces with labeling frames to obtain a data set, and dividing the data set into a training set and a test set, specifically includes:
step S11: collecting video information of sheep in a sheep flock at different time intervals;
step S12: capturing frames from the video information at a preset frame-capture frequency to obtain sheep images, extracting and matching feature points, and fusing the facial data of each sheep to obtain images containing each sheep's complete facial data;
step S13: performing translation and rotation augmentation on the pictures of each sheep's complete facial data, manually labeling the sheep faces to obtain a data set, and dividing the data set into a training set and a test set at a preset ratio.
3. The method for identifying individual identities of flocks of sheep in a sheepcote based on the YOLOv4 neural network model as claimed in claim 1, wherein the step S2: constructing a YOLOv4-based sheep face recognition neural network model, wherein the sheep face recognition neural network model comprises: Input, Backbone, Neck and Head: the Backbone serves as the main feature extraction network, built as CSPDarknet53 by adding CSPNet to Darknet53, and adopts the Mish activation function; the Neck serves as the enhanced feature extraction network, using SPP to extract context features and PANet for parameter aggregation; according to the labeling frames, the Head recalculates the anchor frame sizes with the K-means++ algorithm, predicts based on the output features of the Neck, and calculates the final target detection frames after non-maximum suppression of the generated prediction frames, specifically includes:
step S21: building the Backbone as the main feature extraction network, adding CSPNet to Darknet53 to build CSPDarknet53, wherein CSPDarknet53 comprises: 1 convolutional layer and 5 Resblock_body modules, and the Resblock_body modules comprise: 2 convolution blocks, normalization and Mish activation functions;
step S22: constructing the Neck as the enhanced feature extraction network, using SPP to enlarge the receptive field and separate out the most significant context features, and PANet to aggregate the parameters of the SPP output and different Backbone feature layers; adjusted on the basis of the existing YOLOv4 structure, the output of SPP is convolved and stacked to obtain S1; after convolution and up-sampling, it is convolved 5 times with one PANet output path P3, stacked, and output as P5; P5 and the other two PANet output paths P1 and P3 together serve as the inputs of the subsequent Head;
step S23: building the Head, which consists of the integration of several 1 × 1 and 3 × 3 convolutions; according to the manually labeled frames, the sizes of several anchor frames of different scales are recalculated with the K-means++ algorithm; each feature layer is allocated 3 anchor frames, assigned from small to large to the feature layers corresponding to the Head inputs P1, P3 and P5; prediction frames are output, and the final target detection frames are calculated after non-maximum suppression of the prediction frames.
4. The method for identifying individual identities of flocks of sheep in a sheepcote based on the YOLOv4 neural network model as claimed in claim 3, wherein in the step S23, recalculating the sizes of several anchor frames of different scales with the K-means++ algorithm, allocating 3 anchor frames to each feature layer (assigned from small to large to the feature layers corresponding to the Head inputs P1, P3 and P5), outputting prediction frames, and calculating the final target detection frames after non-maximum suppression of the prediction frames, specifically includes:
step S231: randomly selecting one labeling frame as the initial clustering center c_1;
step S232: for each sample point x_i, find the shortest distance from x_i to all currently existing cluster centers, denoted D(x_i); then

P(x_i) = D(x_i)^2 / Σ_{x∈X} D(x)^2

gives the probability of each sample becoming the next cluster center, and the next cluster center is finally selected by the roulette-wheel method;
step S233: repeating step S232, stopping when K cluster centers have been obtained; then finding for each x_i the cluster center with minimum distance and assigning x_i to the category corresponding to that center;
step S234: for each category c_i, recalculating the center of the category as

c_i = (1/|c_i|) Σ_{x∈c_i} x;
step S235: looping steps S233 to S234, stopping when the obtained cluster centers no longer change;
step S236: predicting on the output features of the Neck according to the anchor frames, and calculating the final target detection frames after non-maximum suppression of the generated prediction frames.
5. The method for identifying individual identities of flocks of sheep in a sheepcote based on the YOLOv4 neural network model as claimed in claim 1, wherein the step S3: constructing a loss function, pre-training with sheep face data, and using the model parameters obtained after pre-training as initial parameters of the sheep face recognition neural network model, specifically includes:
step S31: constructing the mean square error loss MSE shown in formula (1); during model training, the constructed MSE replaces the cross-entropy loss BCE of the original YOLOv4 model in the confidence loss function and classification loss function parts, improving the ability of the sheep face recognition neural network model to detect and recognize small targets in the complex environment during training:

MSE = (1/M) Σ_{m=1}^{M} (y_m - ŷ_m)^2    (1)

where y_m is the real data, ŷ_m is the fitted data, and M is the number of samples;
step S32: pre-training the sheep face recognition neural network model with sheep face data so that the model better extracts the essential features of the sheep face; following the transfer learning idea used when training YOLO models, the pre-training result is used as the initial parameters for training the sheep face recognition model, and the model is trained with the training set to accelerate convergence and obtain the trained sheep face recognition neural network model.
6. A system for identifying individual identities of flocks of sheep in a sheep house based on a YOLOv4 neural network model, characterized by comprising the following modules:
the data set acquisition module, used for acquiring the facial information of each sheep in the flock, preprocessing the facial pictures, labeling the sheep faces with labeling frames to obtain a data set, and dividing the data set into a training set and a test set;
the model building module, used for building a YOLOv4-based sheep face recognition neural network model, wherein the sheep face recognition neural network model comprises: Input, Backbone, Neck and Head: the Backbone serves as the main feature extraction network, built as CSPDarknet53 by adding CSPNet to Darknet53, and adopts the Mish activation function; the Neck serves as the enhanced feature extraction network, using SPP to extract context features and PANet for parameter aggregation; according to the labeling frames, the Head recalculates the anchor frame sizes with the K-means++ algorithm, predicts based on the output features of the Neck, and calculates the final target detection frames after non-maximum suppression of the generated prediction frames;
the model training module, used for constructing a loss function, pre-training with sheep face data, using the model parameters obtained after pre-training as initial parameters of the sheep face recognition neural network model, and training the model with the training set to obtain the trained sheep face recognition neural network model; finally, the test set is input into the trained sheep face recognition neural network model and the model's performance is evaluated.
CN202210109354.4A 2022-01-28 2022-01-28 YOLOv4-based method and system for identifying individual identities of flocks of sheep in sheepcote Pending CN114529840A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210109354.4A CN114529840A (en) 2022-01-28 2022-01-28 YOLOv 4-based method and system for identifying individual identities of flocks of sheep in sheepcote

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210109354.4A CN114529840A (en) 2022-01-28 2022-01-28 YOLOv 4-based method and system for identifying individual identities of flocks of sheep in sheepcote

Publications (1)

Publication Number Publication Date
CN114529840A true CN114529840A (en) 2022-05-24

Family

ID=81621951

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210109354.4A Pending CN114529840A (en) 2022-01-28 2022-01-28 YOLOv 4-based method and system for identifying individual identities of flocks of sheep in sheepcote

Country Status (1)

Country Link
CN (1) CN114529840A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115719445A (en) * 2022-12-20 2023-02-28 齐鲁工业大学 Seafood identification method based on deep learning and raspberry type 4B module
CN116978099A (en) * 2023-07-25 2023-10-31 湖北工业大学 Lightweight sheep identity recognition model construction method and recognition model based on sheep face
CN116978099B (en) * 2023-07-25 2024-03-12 湖北工业大学 Lightweight sheep identity recognition model construction method and recognition model based on sheep face

Similar Documents

Publication Publication Date Title
CN107563381B (en) Multi-feature fusion target detection method based on full convolution network
CN114529840A (en) YOLOv4-based method and system for identifying individual identities of flocks of sheep in sheepcote
CN108921057B (en) Convolutional neural network-based prawn form measuring method, medium, terminal equipment and device
CN110570432A (en) CT image liver tumor segmentation method based on deep learning
CN113361373A (en) Real-time semantic segmentation method for aerial image in agricultural scene
CN111259978A (en) Dairy cow individual identity recognition method integrating multi-region depth features
CN113221864A (en) Method for constructing and applying diseased chicken visual recognition model with multi-region depth feature fusion
Lu et al. Multi-object detection method based on YOLO and ResNet hybrid networks
CN115272828A (en) Intensive target detection model training method based on attention mechanism
CN111666897A (en) Oplegnathus punctatus individual identification method based on convolutional neural network
CN115410087A (en) Transmission line foreign matter detection method based on improved YOLOv4
Hammam et al. DeepPet: A pet animal tracking system in internet of things using deep neural networks
CN113505719A (en) Gait recognition model compression system and method based on local-integral joint knowledge distillation algorithm
CN116192917A (en) Comprehensive observation and forecast platform for ocean pasture
CN117409339A (en) Unmanned aerial vehicle crop state visual identification method for air-ground coordination
CN115223219A (en) Goat face identification method based on improved YOLOV4
CN112883915B (en) Automatic wheat head identification method and system based on transfer learning
Jin et al. An improved mask r-cnn method for weed segmentation
CN116310541A (en) Insect classification method and system based on convolutional network multidimensional learning
CN115918571A (en) Fence passageway type cattle body health data extraction device and intelligent extraction method thereof
CN113283428B (en) Image target detection method based on FCE-SSD method
CN115115847A (en) Three-dimensional sparse reconstruction method and device and electronic device
Meyer et al. For5g: Systematic approach for creating digital twins of cherry orchards
CN114612898A (en) YOLOv5 network-based litchi fruit borer emergence rate detection method
Wang et al. OP Mask R-CNN: An Advanced Mask R-CNN Network for Cattle Individual Recognition on Large Farms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination