CN115239766A - Pig detection and tracking method and device with rapid adaptability (Google Patents)
- Publication number: CN115239766A (application CN202210960003.4A)
- Authority: CN (China)
- Prior art keywords: pig, information, model, frame, frames
- Prior art date: 2022-05-27
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/248 — Analysis of motion using feature-based methods, e.g. the tracking of corners or segments, involving reference images or patches
- G06N3/088 — Non-supervised learning, e.g. competitive learning
- G06T7/292 — Multi-camera tracking
- G06V10/62 — Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; pattern tracking
- G06V10/761 — Proximity, similarity or dissimilarity measures
- G06V10/7753 — Incorporation of unlabelled data, e.g. multiple instance learning [MIL]
- G06V10/82 — Image or video recognition or understanding using neural networks
- G06V20/46 — Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
- G06V40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
- G06T2207/10016 — Video; image sequence
- G06T2207/20081 — Training; learning
- G06T2207/20084 — Artificial neural networks [ANN]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Human Computer Interaction (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a pig detection and tracking method with rapid adaptability, comprising the following steps: inputting a large amount of collected unlabeled video data of pigs into a Vision Transformer model; pre-training the Vision Transformer model to obtain a pre-trained Vision Transformer model; outputting feature information from the pre-trained Vision Transformer model; inputting a very small amount of picture data labeled with pig positions into a pig detection model to obtain a trained pig detection model, and outputting pig box information from the trained pig detection model; inputting the obtained feature information and pig box information into a pig tracking module; and evaluating pairs of pig boxes in adjacent frames that belong to the same pig, and outputting pig tracking and labeling information. The invention detects and tracks pigs with only a small amount of labeled data and achieves a good tracking effect.
Description
Technical Field
The invention belongs to the technical field of deep learning, and particularly relates to a pig detection and tracking method and device with rapid adaptability.
Background
Cameras are installed in a farm to film the pigs, and the positions of the pigs are detected and continuously tracked from the resulting video data; this is an important application in intelligent livestock farming. By continuously detecting the positions of the pigs, the identity of each pig can be recognized, and early warnings can be given on in-pen activity, diet status and disease.
However, conventional machine learning requires a large amount of manually labeled data. Most pig farms in China are run by small and medium-sized farmers, farm layouts follow local conditions, and environmental factors (such as illumination) and camera positions differ greatly, so labeled data from one farm is hard to reuse on another. A model trained on one farm rarely achieves high performance on another farm, and labeling each farm individually is costly. There is therefore a pressing need for machine learning techniques that learn efficiently from very little manually labeled data.
Disclosure of Invention
The invention aims to overcome the above defects in the prior art and provides a pig detection and tracking method and device with rapid adaptability. The method adapts quickly to different farm scenes, detects and tracks pigs, and achieves a good technical effect with only a small amount of labeled data.
To achieve this aim, the invention provides a pig detection and tracking method with rapid adaptability, comprising the following steps:
step S1: collecting a large amount of unlabeled video data of pigs in real time through a plurality of cameras installed on a farm;
step S2: inputting the unlabeled video data into a Vision Transformer model;
step S3: pre-training the Vision Transformer model with the unlabeled video data to obtain a pre-trained Vision Transformer model, and outputting feature information from the pre-trained Vision Transformer model;
step S4: inputting a very small amount of picture data labeled with pig positions into a pig detection model, training the pig detection model with this data to obtain a trained pig detection model, and outputting pig box information from the trained pig detection model;
step S5: inputting the feature information obtained in step S3 and the pig box information obtained in step S4 into a pig tracking module;
step S6: evaluating, by the pig tracking module, pairs of pig boxes in adjacent frames that belong to the same pig, and outputting pig tracking and labeling information.
Preferably, pre-training the Vision Transformer model with the unlabeled video data in step S3 comprises the following steps (a code sketch of the division and masking follows the list):
step S31: extracting a T-frame video clip from the input unlabeled video data, where each frame is W pixels wide and H pixels high, so that one clip can be expressed as a third-order tensor of size T × W × H;
step S32: dividing the third-order tensor into small cubes of K frames × N pixels × M pixels, giving (T/K) × (W/N) × (H/M) cubes in total;
step S33: randomly deleting 50%-60% of the K frame × N pixel × M pixel cubes, arranging the pixels of each remaining cube in order, and converting each cube into a vector;
step S34: passing all the resulting vectors through a fully connected layer whose output is D-dimensional, and embedding a position code into each output D-dimensional vector to obtain a D-dimensional vector with a position code, where the position code uniquely identifies the cube within the T × W × H third-order tensor; the D-dimensional vectors with position codes serve as the input vectors of a Transformer network;
step S35: computing, with a squared-difference loss function, the difference between the pixel values of the original cubes and the vectors output by the Transformer network, and feeding this difference back to the model for gradient descent to optimize the model parameters; after many gradient-descent iterations, the pixel information of all (T/K) × (W/N) × (H/M) cubes can be predicted from the information contained in the remaining 40%-50% of cubes, yielding a pre-trained Vision Transformer model.
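The cube division and random masking of steps S31-S33 can be pictured with a short sketch. The snippet below is a minimal illustration assuming PyTorch and single-channel clips; the function name, the K = 2 and N = M = 16 sizes, and the 0.5 mask ratio are assumptions chosen for illustration, not values fixed by the invention.

```python
# Minimal sketch of steps S31-S33 (cube division and random masking),
# assuming a grayscale clip stored as a (T, W, H) tensor. All names and
# default sizes here are illustrative assumptions.
import torch

def patchify_and_mask(video, K=2, N=16, M=16, mask_ratio=0.5):
    """Split a (T, W, H) clip into K x N x M cubes and randomly drop a share."""
    T, W, H = video.shape
    # Step S32: divide the T x W x H tensor into (T/K) x (W/N) x (H/M) cubes.
    cubes = video.reshape(T // K, K, W // N, N, H // M, M)
    cubes = cubes.permute(0, 2, 4, 1, 3, 5).reshape(-1, K * N * M)
    # Step S33: randomly delete a share of the cubes; flatten the survivors
    # in pixel order so each remaining cube becomes one vector.
    keep = torch.randperm(cubes.shape[0])[: int(cubes.shape[0] * (1 - mask_ratio))]
    return cubes[keep], keep
```

The returned indices double as the position codes of step S34, since each index uniquely identifies one cube within the T × W × H tensor.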
Preferably, training the pig detection model with a very small amount of picture data labeled with pig positions in step S4 comprises the following steps:
step S41: dividing each picture labeled with pig positions to obtain (W/N) × (H/M) small squares;
step S42: defining P anchor boxes of different sizes on each square; using a single-layer fully connected network, taking anchor boxes containing pig features as positive samples and anchor boxes without pig features as negative samples, inputting both into the network, and training to obtain a network that classifies anchor boxes into pig anchor boxes and non-pig anchor boxes;
step S43: using another single-layer fully connected network to adjust the positions of the anchor boxes so that they move closer to the labeled pig boxes, where a pig box visualizes the pig position label, and the pig position label comprises the pig center coordinates and the height and width from the center to the box boundary; taking the actual pig position information as positive samples and the position information of anchor boxes judged to contain pigs as negative samples, and training to obtain a network that outputs pig box information; this step trains only the two single-layer fully connected networks, while the Transformer network remains unchanged.
Preferably, the pig tracking module in step S5 performs the following steps:
step S51: detecting all pig boxes in two adjacent frames with the pig detection model;
step S52: arranging the pixels within each pig box in order, converting them into a vector, inputting the vector into the pre-trained Vision Transformer network, and extracting the feature information of each pig box;
step S53: pairing the pig boxes of adjacent frames and computing the cosine similarity of each pair from the feature information;
step S54: finding the best match between all pig boxes of the previous frame and all pig boxes of the next frame with the Hungarian algorithm; this step requires no network training, greatly reducing the demand for training data.
The invention also provides a device using the above pig detection and tracking method with rapid adaptability. The device comprises a plurality of cameras installed in a farm; the cameras film the pigs and collect a large amount of unlabeled video data of the pigs in real time.
The unlabeled video data is uploaded and stored on a computer server, on which a Vision Transformer model, a pig detection model and a pig tracking model are also loaded.
The Vision Transformer model is pre-trained with the unlabeled video data as input, yielding a pre-trained Vision Transformer model that extracts feature information.
The pig detection model is trained with a small amount of data actually labeled on the farm, and the trained pig detection model extracts pig box information.
The obtained feature information and pig box information are input together into the pig tracking module, so as to track pigs through the whole video and output video data with pig boxes.
Compared with the prior art, the invention has the following beneficial effects:
first, the unlabeled video data is input into a Vision Transformer model, which is pre-trained to output feature information; second, a very small amount of picture data labeled with pig positions is input into a pig detection model, and the trained pig detection model outputs pig box information; third, the obtained feature information and pig box information are input into a pig tracking module; finally, the pig tracking module evaluates pairs of pig boxes in adjacent frames that belong to the same pig and outputs pig tracking and labeling information. The method adapts quickly to different farm scenes, detects and tracks pigs, and achieves a good technical effect with only a small amount of labeled data.
Drawings
To illustrate the embodiments of the invention or the prior-art solutions more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below show some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of the steps of the rapidly adaptive pig detection and tracking method provided by the invention;
FIG. 2 is a flow diagram of Vision Transformer pre-training provided by the present invention;
FIG. 3 is a flow chart of a pig detection model provided by the present invention;
FIG. 4 is a flow chart of data testing provided by the present invention;
FIG. 5 is a flow chart of the rapidly adaptive pig detection and tracking device provided by the invention.
Detailed Description
The technical solutions in the embodiments of the invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art from these embodiments without creative effort fall within the protection scope of the invention.
Example one
Referring to figs. 1 to 3, an embodiment of the invention provides a pig detection and tracking method with rapid adaptability.
First, the farm to which the invention applies is explained: a number of pigs are raised inside the farm, and a plurality of cameras installed inside the farm film the pigs and collect a large amount of unlabeled video data of the pigs in real time.
Next, the pig detection and tracking method with rapid adaptability is described in detail; fig. 1 shows the flowchart of the invention.
Referring to fig. 1, the invention mainly comprises the following steps. Step S1: collecting unlabeled video data of a large number of pigs in real time through a plurality of cameras installed on a farm.
Step S2: inputting the unlabeled video data into a Vision Transformer model.
Step S3: pre-training the Vision Transformer model with the unlabeled video data to obtain a pre-trained Vision Transformer model, and outputting feature information from the pre-trained Vision Transformer model.
Further, as shown in fig. 2, pre-training the Vision Transformer model with the unlabeled video data in step S3 comprises the following steps (a code sketch of steps S34-S35 follows the list):
step S31: extracting a T-frame video clip from the input unlabeled video data, where each frame is W pixels wide and H pixels high, so that one clip can be expressed as a third-order tensor of size T × W × H;
step S32: dividing the third-order tensor into small cubes of K frames × N pixels × M pixels, giving (T/K) × (W/N) × (H/M) cubes in total;
step S33: randomly deleting 50%-60% of the K frame × N pixel × M pixel cubes, arranging the pixels of each remaining cube in order, and converting each cube into a vector;
step S34: passing all the resulting vectors through a fully connected layer whose output is D-dimensional, and embedding a position code into each output D-dimensional vector to obtain a D-dimensional vector with a position code, where the position code uniquely identifies the cube within the T × W × H third-order tensor; the D-dimensional vectors with position codes serve as the input vectors of a Transformer network;
step S35: computing, with a squared-difference loss function, the difference between the pixel values of the original cubes and the vectors output by the Transformer network, and feeding this difference back to the model for gradient descent to optimize the model parameters; after many gradient-descent iterations, the pixel information of all (T/K) × (W/N) × (H/M) cubes can be predicted from the information contained in the remaining 40%-50% of cubes, yielding a pre-trained Vision Transformer model.
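As one concrete reading of steps S34-S35, a toy masked-reconstruction model might look like the sketch below. It builds on the `patchify_and_mask` sketch given earlier; the layer sizes, the AdamW optimizer, and the simplification of reconstructing only the visible cubes (the invention predicts all (T/K) × (W/N) × (H/M) cubes) are assumptions for illustration, not the patented implementation.

```python
# Hedged sketch of steps S34-S35: FC embedding to D dims, a unique position
# code per cube, a Transformer encoder, and a squared-difference loss driving
# gradient descent. Sizes and the model itself are illustrative assumptions.
import torch
import torch.nn as nn

class ToyVideoMAE(nn.Module):
    def __init__(self, patch_dim, D=256, num_cubes=512):
        super().__init__()
        self.embed = nn.Linear(patch_dim, D)        # step S34: FC layer, D-dim output
        self.pos = nn.Embedding(num_cubes, D)       # position code unique to each cube
        layer = nn.TransformerEncoderLayer(d_model=D, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.decode = nn.Linear(D, patch_dim)       # map features back to pixel values

    def forward(self, cubes, indices):
        x = self.embed(cubes) + self.pos(indices)   # D-dim vectors with position code
        return self.decode(self.encoder(x))

model = ToyVideoMAE(patch_dim=2 * 16 * 16)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
cubes, idx = patchify_and_mask(torch.rand(16, 128, 128))   # sketch from step S33
pred = model(cubes.unsqueeze(0), idx.unsqueeze(0))
loss = nn.functional.mse_loss(pred, cubes.unsqueeze(0))    # step S35: squared difference
loss.backward()
opt.step()                                                  # one gradient-descent iteration
```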
Step S4: inputting a very small amount of picture data labeled with pig positions into a pig detection model, training the pig detection model with this data to obtain a trained pig detection model, and outputting pig box information from the trained pig detection model.
Further, as shown in fig. 3, training the pig detection model with a very small amount of picture data labeled with pig positions in step S4 comprises the following steps (see the sketch after the list):
step S41: dividing each picture labeled with pig positions to obtain (W/N) × (H/M) small squares;
step S42: defining P anchor boxes of different sizes on each square; using a single-layer fully connected network, taking anchor boxes containing pig features as positive samples and anchor boxes without pig features as negative samples, inputting both into the network, and training to obtain a network that classifies anchor boxes into pig anchor boxes and non-pig anchor boxes;
step S43: using another single-layer fully connected network to adjust the positions of the anchor boxes so that they move closer to the labeled pig boxes, where a pig box visualizes the pig position label, and the pig position label comprises the pig center coordinates and the height and width from the center to the box boundary; taking the actual pig position information as positive samples and the position information of anchor boxes judged to contain pigs as negative samples, and training to obtain a network that outputs pig box information; this step trains only the two single-layer fully connected networks, while the Transformer network remains unchanged.
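A minimal sketch of the two single-layer fully connected heads follows; the feature dimension D = 256, the P = 9 anchors per square, and all names are assumptions for illustration, with the frozen pre-trained Transformer assumed to supply one D-dimensional feature per grid square.

```python
# Illustrative sketch of steps S42-S43: one single-layer FC network scores
# each anchor box as pig / non-pig, a second regresses the anchor toward the
# labeled pig box. Dimensions and names are assumptions, not the patent's.
import torch
import torch.nn as nn

D, P = 256, 9                      # feature dim and anchors per square (assumed)
cls_head = nn.Linear(D, P * 2)     # step S42: pig vs non-pig score per anchor
reg_head = nn.Linear(D, P * 4)     # step S43: offsets to center, height, width

feats = torch.rand(16 * 16, D)     # one (W/N) x (H/M) grid of square features
cls_logits = cls_head(feats).view(-1, P, 2)   # classify every anchor box
box_deltas = reg_head(feats).view(-1, P, 4)   # nudge anchors toward labeled boxes
# Training feeds anchors overlapping a labeled pig as positives and the rest
# as negatives; only these two heads learn, the Transformer stays frozen.
```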
Step S5: inputting the feature information obtained in step S3 and the pig box information obtained in step S4 into a pig tracking module.
Specifically, the pig tracking module in step S5 performs the following steps (a matching sketch follows the list):
step S51: detecting all pig boxes in two adjacent frames with the pig detection model;
step S52: arranging the pixels within each pig box in order, converting them into a vector, inputting the vector into the pre-trained Vision Transformer network, and extracting the feature information of each pig box;
step S53: pairing the pig boxes of adjacent frames and computing the cosine similarity of each pair from the feature information;
step S54: finding the best match between all pig boxes of the previous frame and all pig boxes of the next frame with the Hungarian algorithm; this step requires no network training, greatly reducing the demand for training data.
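Steps S53-S54 reduce to a cosine-similarity matrix plus one assignment solve, as in the sketch below; SciPy's `linear_sum_assignment` is used here as a stand-in Hungarian solver, and the function name and feature shapes are assumptions.

```python
# Sketch of steps S53-S54: cosine similarity between the ViT features of pig
# boxes in adjacent frames, then a Hungarian assignment. Names are assumed.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_boxes(feats_prev, feats_next):
    """feats_*: (num_boxes, D) feature rows for one frame's pig boxes."""
    a = feats_prev / np.linalg.norm(feats_prev, axis=1, keepdims=True)
    b = feats_next / np.linalg.norm(feats_next, axis=1, keepdims=True)
    sim = a @ b.T                             # step S53: pairwise cosine similarity
    rows, cols = linear_sum_assignment(-sim)  # step S54: maximize total similarity
    return list(zip(rows.tolist(), cols.tolist()))

# Example: match 5 pig boxes in the previous frame to 6 in the next frame.
pairs = match_boxes(np.random.rand(5, 256), np.random.rand(6, 256))
```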
Step S6: evaluating, by the pig tracking module, pairs of pig boxes in adjacent frames that belong to the same pig, and outputting pig tracking and labeling information.
Example two
Referring to fig. 4, a second embodiment of the invention tests data with the pig detection and tracking method with rapid adaptability described in the first embodiment, comprising the following steps (an end-to-end sketch follows the list):
step S91: inputting video data into the pre-trained Vision Transformer model to obtain the feature information of the video;
step S92: inputting the video data into the pig detection model to box the pig information in each frame of the original video data;
step S93: inputting the feature information and the pig box information into the pig tracking module, computing the cosine similarity of each cross-frame pair of pig boxes, and obtaining the probability that the two boxes of a pair are the same pig;
step S94: using these probabilities as input, matching the optimal combinations of pigs in adjacent frames as the same pig with the Hungarian algorithm;
step S95: repeating steps S91 to S94 and iterating over each frame, so as to track pigs through the whole video.
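Put together, the test loop of steps S91-S95 can be sketched as below; `detect_pigs` and `extract_features` are hypothetical stand-ins for the trained detection model and the pre-trained Vision Transformer, and `match_boxes` is the assignment sketch above, so this is an illustration under those assumptions rather than the patented implementation.

```python
# Hedged end-to-end sketch of steps S91-S95. The three callables passed in
# are hypothetical stand-ins, not the patent's actual models.
def track_video(frames, detect_pigs, extract_features, match_boxes):
    tracks, prev_feats, prev_ids, next_id = [], None, [], 0
    for frame in frames:
        boxes = detect_pigs(frame)               # step S92: pig boxes in this frame
        feats = extract_features(frame, boxes)   # step S91: one ViT feature per box
        if prev_feats is None:                   # first frame: assign fresh identities
            ids = list(range(len(boxes)))
            next_id = len(boxes)
        else:
            ids = [-1] * len(boxes)
            for i, j in match_boxes(prev_feats, feats):  # steps S93-S94
                ids[j] = prev_ids[i]             # carry the matched identity forward
            for j, pid in enumerate(ids):        # unmatched box: a newly seen pig
                if pid == -1:
                    ids[j] = next_id
                    next_id += 1
        tracks.append(list(zip(ids, boxes)))
        prev_feats, prev_ids = feats, ids        # step S95: iterate frame by frame
    return tracks
```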
Example three
Referring to fig. 5, a third embodiment of the invention provides a device using the pig detection and tracking method with rapid adaptability of the first embodiment.
Referring to fig. 5, the pig detection and tracking device with rapid adaptability comprises a plurality of cameras installed in a farm; the cameras film the pigs and collect a large amount of unlabeled video data of the pigs in real time.
The unlabeled video data is uploaded and stored on a computer server, on which a Vision Transformer model, a pig detection model and a pig tracking model are also loaded.
The Vision Transformer model is pre-trained with the unlabeled video data as input, yielding a pre-trained Vision Transformer model that extracts feature information.
The pig detection model is trained with a small amount of data actually labeled on the farm, and the trained pig detection model extracts pig box information.
The obtained feature information and pig box information are input together into the pig tracking module, so as to track pigs through the whole video and output video data with pig boxes.
In conclusion, the method adapts quickly to different farm scenes, detects and tracks pigs, and achieves a good technical effect with only a small amount of labeled data.
The above embodiments are preferred embodiments of the invention, but the invention is not limited to them. Any change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the invention should be regarded as an equivalent replacement and is included within the protection scope of the invention.
Claims (5)
1. A pig detection and tracking method with rapid adaptability, characterized by comprising the following steps:
step S1: collecting a large amount of unlabeled video data of pigs in real time through a plurality of cameras installed on a farm;
step S2: inputting the unlabeled video data into a Vision Transformer model;
step S3: pre-training the Vision Transformer model with the unlabeled video data to obtain a pre-trained Vision Transformer model, and outputting feature information from the pre-trained Vision Transformer model;
step S4: inputting a very small amount of picture data labeled with pig positions into a pig detection model, training the pig detection model with this data to obtain a trained pig detection model, and outputting pig box information from the trained pig detection model;
step S5: inputting the feature information obtained in step S3 and the pig box information obtained in step S4 into a pig tracking module;
step S6: evaluating, by the pig tracking module, pairs of pig boxes in adjacent frames that belong to the same pig, and outputting pig tracking and labeling information.
2. The pig detection and tracking method with rapid adaptability according to claim 1, characterized in that pre-training the Vision Transformer model with the unlabeled video data in step S3 comprises the following steps:
step S31: extracting a T-frame video clip from the input unlabeled video data, where each frame is W pixels wide and H pixels high, so that one clip can be expressed as a third-order tensor of size T × W × H;
step S32: dividing the third-order tensor into small cubes of K frames × N pixels × M pixels, giving (T/K) × (W/N) × (H/M) cubes in total;
step S33: randomly deleting 50%-60% of the K frame × N pixel × M pixel cubes, arranging the pixels of each remaining cube in order, and converting each cube into a vector;
step S34: passing all the resulting vectors through a fully connected layer whose output is D-dimensional, and embedding a position code into each output D-dimensional vector to obtain a D-dimensional vector with a position code, where the position code uniquely identifies the cube within the T × W × H third-order tensor; the D-dimensional vectors with position codes serve as the input vectors of a Transformer network;
step S35: computing, with a squared-difference loss function, the difference between the pixel values of the original cubes and the vectors output by the Transformer network, and feeding this difference back to the model for gradient descent to optimize the model parameters; after many gradient-descent iterations, the pixel information of all (T/K) × (W/N) × (H/M) cubes can be predicted from the information contained in the remaining 40%-50% of cubes, yielding a pre-trained Vision Transformer model.
3. The pig detection and tracking method with rapid adaptability according to claim 2, characterized in that training the pig detection model with a very small amount of picture data labeled with pig positions in step S4 comprises the following steps:
step S41: dividing each picture labeled with pig positions to obtain (W/N) × (H/M) small squares;
step S42: defining P anchor boxes of different sizes on each square; using a single-layer fully connected network, taking anchor boxes containing pig features as positive samples and anchor boxes without pig features as negative samples, inputting both into the network, and training to obtain a network that classifies anchor boxes into pig anchor boxes and non-pig anchor boxes;
step S43: using another single-layer fully connected network to adjust the positions of the anchor boxes so that they move closer to the labeled pig boxes, where a pig box visualizes the pig position label; taking the actual pig position information as positive samples and the position information of anchor boxes judged to contain pigs as negative samples, and training to obtain a network that outputs pig box information; this step trains only the two single-layer fully connected networks, while the Transformer network remains unchanged.
4. The pig detection and tracking method with rapid adaptability according to claim 3, characterized in that the pig tracking module in step S5 performs the following steps:
step S51: detecting all pig boxes in two adjacent frames with the pig detection model;
step S52: arranging the pixels within each pig box in order, converting them into a vector, inputting the vector into the pre-trained Vision Transformer network, and extracting the feature information of each pig box;
step S53: pairing the pig boxes of adjacent frames and computing the cosine similarity of each pair from the feature information;
step S54: finding the best match between all pig boxes of the previous frame and all pig boxes of the next frame with the Hungarian algorithm; this step requires no network training, greatly reducing the demand for training data.
5. A device using the pig detection and tracking method with rapid adaptability of any one of claims 1 to 4, characterized by comprising a plurality of cameras installed in a farm, where the cameras film the pigs and collect a large amount of unlabeled video data of the pigs in real time;
the unlabeled video data is uploaded and stored on a computer server, on which a Vision Transformer model, a pig detection model and a pig tracking model are also loaded;
the Vision Transformer model is pre-trained with the unlabeled video data as input, yielding a pre-trained Vision Transformer model that extracts feature information;
the pig detection model is trained with a small amount of data actually labeled on the farm, and the trained pig detection model extracts pig box information;
the obtained feature information and pig box information are input together into the pig tracking module, so as to track pigs through the whole video and output video data with pig boxes.
Applications Claiming Priority (2)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210584958.4A (CN114926502A) | 2022-05-27 | 2022-05-27 | Pig detection and tracking method and device with rapid adaptability |
| CN2022105849584 | 2022-05-27 | | |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN115239766A (en) | 2022-10-25 |
Family ID: 82810206
Family Applications (2)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202210584958.4A (CN114926502A, pending) | Pig detection and tracking method and device with rapid adaptability | 2022-05-27 | 2022-05-27 |
| CN202210960003.4A (CN115239766A, pending) | Pig detection and tracking method and device with rapid adaptability | 2022-05-27 | 2022-08-11 |

Country of publication: CN (both applications)
Families Citing this family (1)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116091786B | 2023-04-11 | 2023-06-20 | | Holographic body ruler self-coding method, system, equipment and storage medium for pig weight estimation |

Application timeline:
- 2022-05-27: application CN202210584958.4A filed (published as CN114926502A, pending)
- 2022-08-11: application CN202210960003.4A filed (published as CN115239766A, pending)
Also Published As

| Publication number | Publication date |
|---|---|
| CN114926502A (en) | 2022-08-19 |
Legal Events

| Date | Code | Title |
|---|---|---|
| | PB01 | Publication |
| | SE01 | Entry into force of request for substantive examination |