CN115239766A - Pig detection and tracking method and device with rapid adaptability - Google Patents

Pig detection and tracking method and device with rapid adaptability

Info

Publication number
CN115239766A
CN115239766A (application number CN202210960003.4A)
Authority
CN
China
Prior art keywords
pig
information
model
frame
frames
Prior art date
Legal status
Pending
Application number
CN202210960003.4A
Other languages
Chinese (zh)
Inventor
李搏扬
李伟铭
张绘国
林军
徐庸辉
张亮
希瑞·梁
苗春燕
Current Assignee
Sino Singapore International Joint Research Institute
Original Assignee
Sino Singapore International Joint Research Institute
Priority date
Filing date
Publication date
Application filed by Sino Singapore International Joint Research Institute
Publication of CN115239766A


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/292Multi-camera tracking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/62Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/7753Incorporation of unlabelled data, e.g. multiple instance learning [MIL]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a pig detection and tracking method with rapid adaptability, which comprises the following steps: inputting a large amount of collected unlabeled pig video data into a Vision Transformer model; pre-training the Vision Transformer model to obtain a pre-trained Vision Transformer model and outputting feature information from it; inputting a very small amount of picture data labeled with pig positions into a pig detection model to obtain a trained pig detection model, and outputting pig frame information from the trained pig detection model; inputting the obtained feature information and pig frame information into a pig tracking module; and evaluating pairs of pig frames in adjacent frames that belong to the same pig, and outputting pig tracking and labeling information. The invention can detect and track pigs with only a small amount of labeled data and achieves a good tracking effect.

Description

Pig detection and tracking method and device with rapid adaptability
Technical Field
The invention belongs to the technical field of deep learning, and particularly relates to a pig detection and tracking method and device with rapid adaptability.
Background
Installing cameras in a farm, recording the pigs, and detecting and continuously tracking the positions of the pigs from the resulting video data is an important application in the intelligent breeding industry. By continuously detecting the position of each pig, its identity can be recognized, and early warnings can be given on pen activity, diet status and disease.
However, conventional machine learning requires a large amount of manually labeled data. Most pig farms in China are operated by small and medium-sized farmers, and their layouts are adapted to local conditions, so environmental factors (such as illumination) and camera positions differ greatly; labeled data from one farm is therefore difficult to reuse on another. A model trained on one farm has difficulty achieving high performance on another, and labeling data for each farm individually is costly. There is thus a pressing need for machine learning techniques that learn efficiently from very little manually labeled data.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a pig detection and tracking method and device with rapid adaptability. The method adapts quickly to different farm scenes, realizes detection and tracking of pigs, and achieves a good technical effect with only a small amount of labeled data.
To achieve this purpose, the invention provides a pig detection and tracking method with rapid adaptability, which comprises the following steps:
Step S1: collecting a large amount of unlabeled pig video data in real time through a plurality of cameras installed on a farm;
Step S2: inputting the unlabeled video data into a Vision Transformer model;
Step S3: pre-training the Vision Transformer model with the unlabeled video data to obtain a pre-trained Vision Transformer model, and outputting feature information from the pre-trained Vision Transformer model;
Step S4: inputting a very small amount of picture data labeled with pig positions into a pig detection model, training the pig detection model with this data to obtain a trained pig detection model, and outputting pig frame information from the trained pig detection model;
Step S5: inputting the feature information obtained in step S3 and the pig frame information obtained in step S4 into a pig tracking module;
Step S6: evaluating, with the pig tracking module, pairs of pig frames in adjacent frames that belong to the same pig, and outputting pig tracking and labeling information.
Preferably, pre-training the Vision Transformer model with unlabeled video data in step S3 comprises the following steps:
Step S31: extracting T frames of video from the input unlabeled video data, each frame having width W and height H, so that one video can be expressed as a third-order tensor of T × W × H;
Step S32: dividing the third-order tensor into small squares of K frames × N pixels × M pixels, giving (T/K) × (W/N) × (H/M) small squares in total;
Step S33: randomly deleting 50%-60% of the K frames × N pixels × M pixels small squares, arranging the pixel points of each remaining small square in order, and converting each into a vector;
Step S34: passing all the resulting vectors through a fully connected layer whose output dimension is D; embedding a position code into each output D-dimensional vector to obtain a D-dimensional vector with a position code, where the position code uniquely identifies the small square within the T × W × H third-order tensor; and using the D-dimensional vectors with position codes as the input vectors of a Transformer network;
Step S35: calculating difference information between the pixel values of the original small squares and the vectors output by the Transformer network using a squared-error loss function, and feeding the difference information back to the model to perform gradient descent and optimize the model parameters; after multiple gradient-descent iterations, the pixel information of all (T/K) × (W/N) × (H/M) small squares can be predicted from the information contained in the remaining 40%-50% of the small squares, yielding a pre-trained Vision Transformer model.
Preferably, training the pig detection model with a very small amount of picture data labeled with pig positions in step S4 comprises the following steps:
Step S41: dividing each picture labeled with pig positions to obtain (W/N) × (H/M) small squares;
Step S42: defining P anchor frames of different sizes on each square; using a single-layer fully connected network, taking anchor frames containing pig features as positive samples and anchor frames containing no pig features as negative samples, inputting them into the network, and finally obtaining, through training, a network that can classify anchor frames into pig anchor frames and non-pig anchor frames;
Step S43: using another single-layer fully connected network to adjust the positions of the anchor frames so that they come closer to the labeled pig frames, where a pig frame is the visualization of pig position labeling information and the pig position labeling information comprises the pig center coordinates and the height and width from the center coordinates to the boundary; taking the actual pig position information as positive samples and the position information of anchor frames judged to be pig anchor frames as negative samples, and finally obtaining, through training, a network that can output pig frame information; this step trains only the two single-layer fully connected networks, and the Transformer network remains unchanged.
Preferably, the pig tracking module in step S5 performs the following steps:
Step S51: detecting all pig frames in two adjacent frames with the pig detection model;
Step S52: arranging the pixel points in each pig frame in order, converting them into vectors, inputting the vectors into the pre-trained Vision Transformer network, and extracting the feature information of each pig frame;
Step S53: pairing the pig frames of adjacent frames and calculating their cosine similarity from the feature information;
Step S54: finding the best match between all pig frames of the previous frame and all pig frames of the next frame using the Hungarian algorithm; no network training is needed, which greatly reduces the requirement for training data.
The invention also provides a device adopting the pig detection and tracking method with rapid adaptability, which comprises a plurality of cameras installed in a farm; the cameras photograph the pigs and acquire a large amount of unlabeled pig video data in real time;
the unlabeled video data is uploaded and stored on a computer server, on which a Vision Transformer model, a pig detection model and a pig tracking model are also loaded;
the Vision Transformer model is pre-trained with the unlabeled video data as input to obtain a pre-trained Vision Transformer model, which realizes the extraction of feature information;
the pig detection model is trained with a small amount of ground-truth labeled data from the farm, and the trained pig detection model extracts pig frame information;
the obtained feature information and pig frame information are input together into a pig tracking module so as to track the pigs throughout the whole video, and video data annotated with pig frames is output.
Compared with the prior art, the invention has the beneficial effects that:
First, unlabeled video data is input into a Vision Transformer model to obtain a pre-trained Vision Transformer model, and feature information is output from the pre-trained Vision Transformer model. Second, a very small amount of picture data labeled with pig positions is input into a pig detection model to obtain a trained pig detection model, and pig frame information is output from the trained pig detection model. Third, the obtained feature information and pig frame information are input into a pig tracking module. Finally, pairs of pig frames in adjacent frames that belong to the same pig are evaluated by the pig tracking module, and pig tracking and labeling information is output. The method adapts quickly to different farm scenes, realizes detection and tracking of pigs, and achieves a good technical effect with only a small amount of labeled data.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram of the steps of the pig detection and tracking method with rapid adaptability provided by the present invention;
FIG. 2 is a flow chart of Vision Transformer pre-training provided by the present invention;
FIG. 3 is a flow chart of the pig detection model provided by the present invention;
FIG. 4 is a flow chart of data testing provided by the present invention;
FIG. 5 is a flow chart of the pig detection and tracking device with rapid adaptability provided by the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. It is obvious that the described embodiments are only some embodiments of the present invention, not all of them. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without creative work belong to the protection scope of the present invention.
Example one
Referring to FIG. 1 to FIG. 3, an embodiment of the present invention provides a pig detection and tracking method with rapid adaptability.
First, a farm to which the present invention is applied will be explained: a plurality of pigs are raised inside the farm, a plurality of cameras are installed inside the farm, the cameras photograph the pigs, and a large amount of unlabeled pig video data is collected in real time.
Next, the pig detection and tracking method with rapid adaptability will be described in detail; FIG. 1 shows a flowchart of the present invention.
Referring to FIG. 1, the present invention mainly comprises the following steps. Step S1: a plurality of cameras installed on the farm acquire a large amount of unlabeled pig video data in real time.
Step S2: inputting the unlabeled video data into a Vision Transformer model.
Step S3: pre-training the Vision Transformer model with the unlabeled video data to obtain a pre-trained Vision Transformer model, and outputting feature information from the pre-trained Vision Transformer model.
Further, as shown in FIG. 2, pre-training the Vision Transformer model with unlabeled video data in step S3 comprises the following steps (a code sketch follows after step S35):
Step S31: extracting T frames of video from the input unlabeled video data, each frame having width W and height H, so that one video can be expressed as a third-order tensor of T × W × H;
Step S32: dividing the third-order tensor into small squares of K frames × N pixels × M pixels, giving (T/K) × (W/N) × (H/M) small squares in total;
Step S33: randomly deleting 50%-60% of the K frames × N pixels × M pixels small squares, arranging the pixel points of each remaining small square in order, and converting each into a vector;
Step S34: passing all the resulting vectors through a fully connected layer whose output dimension is D; embedding a position code into each output D-dimensional vector to obtain a D-dimensional vector with a position code, where the position code uniquely identifies the small square within the T × W × H third-order tensor; and using the D-dimensional vectors with position codes as the input vectors of a Transformer network;
Step S35: calculating difference information between the pixel values of the original small squares and the vectors output by the Transformer network using a squared-error loss function, and feeding the difference information back to the model to perform gradient descent and optimize the model parameters; after multiple gradient-descent iterations, the pixel information of all (T/K) × (W/N) × (H/M) small squares can be predicted from the information contained in the remaining 40%-50% of the small squares, yielding a pre-trained Vision Transformer model.
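The sketch below is a minimal, non-authoritative rendering of steps S31 to S35, assuming a PyTorch implementation with single-channel frames. The clip size (T, W, H), patch size (K, N, M), embedding dimension D, mask ratio, encoder depth, and the choice to compute the squared-error loss only on the deleted squares are illustrative assumptions, not values fixed by the patent.

```python
import torch
import torch.nn as nn

# Illustrative hyper-parameters (assumed, not specified by the patent)
T, W, H = 16, 224, 224        # frames, frame width, frame height of one clip
K, N, M = 2, 16, 16           # patch size: K frames x N pixels x M pixels
D = 256                       # output dimension of the fully connected layer
MASK_RATIO = 0.55             # step S33: 50%-60% of the small squares are deleted

NUM_PATCHES = (T // K) * (W // N) * (H // M)
PATCH_DIM = K * N * M         # pixels per small square (assumes 1 channel)

def patchify(video):
    # Steps S31-S32: T x W x H tensor -> (NUM_PATCHES, PATCH_DIM) small squares
    v = video.reshape(T // K, K, W // N, N, H // M, M)
    return v.permute(0, 2, 4, 1, 3, 5).reshape(NUM_PATCHES, PATCH_DIM)

class MaskedVideoPretrainer(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(PATCH_DIM, D)                  # step S34: fully connected layer
        self.pos = nn.Parameter(torch.randn(NUM_PATCHES, D))  # unique position code per square
        self.mask_token = nn.Parameter(torch.zeros(1, 1, D))  # stands in for deleted squares
        layer = nn.TransformerEncoderLayer(d_model=D, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=6)
        self.to_pixels = nn.Linear(D, PATCH_DIM)              # predict pixel values per square

    def forward(self, patches, keep_mask):
        # patches: (B, NUM_PATCHES, PATCH_DIM); keep_mask: bool (NUM_PATCHES,), True = kept
        tok = self.embed(patches)
        tok = torch.where(keep_mask[None, :, None], tok, self.mask_token)
        x = self.encoder(tok + self.pos)          # position code identifies each square
        return self.to_pixels(x)                  # step S35: predict pixels of all squares

model = MaskedVideoPretrainer()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

video = torch.rand(T, W, H)                       # one unlabeled clip (dummy data)
patches = patchify(video).unsqueeze(0)            # add a batch dimension

keep_mask = torch.rand(NUM_PATCHES) > MASK_RATIO  # step S33: randomly delete 50%-60%
pred = model(patches, keep_mask)

# Step S35: squared error between original pixel values and the network output,
# computed here on the deleted squares only (one reasonable reading of the patent).
loss = nn.functional.mse_loss(pred[:, ~keep_mask], patches[:, ~keep_mask])
loss.backward()
opt.step()
```

In this sketch, the roughly 40%-50% of squares that survive the random deletion carry the information from which the pixel values of the deleted squares are reconstructed, mirroring step S35.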
Step S4: inputting a very small amount of picture data labeled with pig positions into a pig detection model, training the pig detection model with this data to obtain a trained pig detection model, and outputting pig frame information from the trained pig detection model.
Further, as shown in FIG. 3, training the pig detection model with a very small amount of picture data labeled with pig positions in step S4 comprises the following steps (a code sketch follows after step S43):
Step S41: dividing each picture labeled with pig positions to obtain (W/N) × (H/M) small squares;
Step S42: defining P anchor frames of different sizes on each square; using a single-layer fully connected network, taking anchor frames containing pig features as positive samples and anchor frames containing no pig features as negative samples, inputting them into the network, and finally obtaining, through training, a network that can classify anchor frames into pig anchor frames and non-pig anchor frames;
Step S43: using another single-layer fully connected network to adjust the positions of the anchor frames so that they come closer to the labeled pig frames, where a pig frame is the visualization of pig position labeling information and the pig position labeling information comprises the pig center coordinates and the height and width from the center coordinates to the boundary; taking the actual pig position information as positive samples and the position information of anchor frames judged to be pig anchor frames as negative samples, and finally obtaining, through training, a network that can output pig frame information; this step trains only the two single-layer fully connected networks, and the Transformer network remains unchanged.
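The sketch below illustrates, under stated assumptions, the two single-layer fully connected heads of steps S42 and S43: one classifies each anchor frame as pig or non-pig, the other regresses offsets from the anchor frame toward the labeled pig frame. The `frozen_features` placeholder, the grid size, the anchor count P, the feature dimension D, and the loss functions are hypothetical choices not specified by the patent.

```python
import torch
import torch.nn as nn

P = 3                                  # anchor frames of different sizes per square (assumed)
D = 256                                # Transformer feature dimension per square (assumed)
GRID = (224 // 16) * (224 // 16)       # (W/N) x (H/M) small squares, per step S41 (assumed sizes)

cls_head = nn.Linear(D, P)             # step S42: pig / non-pig score for each anchor frame
reg_head = nn.Linear(D, P * 4)         # step S43: offsets (cx, cy, h, w) toward the labeled pig frame

opt = torch.optim.Adam(list(cls_head.parameters()) + list(reg_head.parameters()), lr=1e-3)
bce, smooth_l1 = nn.BCEWithLogitsLoss(), nn.SmoothL1Loss()

def frozen_features(image):
    # Hypothetical placeholder: the pre-trained Vision Transformer maps one picture to one
    # D-dimensional feature per small square and is NOT updated in this step.
    return torch.rand(GRID, D)

# One hypothetical labeled picture: `is_pig` marks anchor frames containing pig features,
# `gt_offsets` holds the offsets from each such anchor frame to the labeled pig frame.
image = torch.rand(224, 224)
is_pig = (torch.rand(GRID, P) > 0.9).float()
gt_offsets = torch.rand(GRID, P, 4)

feats = frozen_features(image).detach()         # the Transformer network remains unchanged
cls_logits = cls_head(feats)                    # (GRID, P)
offsets = reg_head(feats).reshape(GRID, P, 4)   # (GRID, P, 4)

loss = bce(cls_logits, is_pig)                  # classify pig vs non-pig anchor frames
pos = is_pig.bool()
if pos.any():
    loss = loss + smooth_l1(offsets[pos], gt_offsets[pos])   # adjust only pig anchor frames
loss.backward()
opt.step()
```

Only the two linear heads receive gradients here; detaching the Transformer features keeps the backbone fixed, consistent with the statement that only the two single-layer fully connected networks are trained while the Transformer network remains unchanged.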
Step S5: and (4) inputting the characteristic information obtained in the step (S3) and the pig frame information obtained in the step (S4) into a pig tracking module.
Specifically, the pig tracking module in step S5 performs the following steps (a code sketch follows after step S54):
Step S51: identifying all pig frames in two adjacent frames with the pig detection model;
Step S52: arranging the pixel points in each pig frame in order, converting them into vectors, inputting the vectors into the pre-trained Vision Transformer network, and extracting the feature information of each pig frame;
Step S53: pairing the pig frames of adjacent frames and calculating their cosine similarity from the feature information;
Step S54: finding the best match between all pig frames of the previous frame and all pig frames of the next frame using the Hungarian algorithm; no network training is needed, which greatly reduces the requirement for training data.
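Steps S52 to S54 can be sketched as follows, assuming NumPy and SciPy; the feature arrays stand in for the D-dimensional vectors that the pre-trained Vision Transformer extracts for each detected pig frame, and the array sizes are dummy values.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cosine_similarity_matrix(a, b):
    # a: (Na, D) features of pig frames in the previous frame
    # b: (Nb, D) features of pig frames in the next frame
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T                                  # step S53: pairwise cosine similarity

def match_adjacent_frames(feats_prev, feats_next):
    sim = cosine_similarity_matrix(feats_prev, feats_next)
    # Step S54: Hungarian algorithm; maximizing similarity = minimizing its negation
    rows, cols = linear_sum_assignment(-sim)
    return list(zip(rows.tolist(), cols.tolist())), sim

# Dummy features for 5 pig frames in the previous frame and 6 in the next frame
feats_prev = np.random.rand(5, 256)
feats_next = np.random.rand(6, 256)
pairs, sim = match_adjacent_frames(feats_prev, feats_next)
for i, j in pairs:
    print(f"pig frame {i} (frame t) -> pig frame {j} (frame t+1), similarity {sim[i, j]:.2f}")
```

Because the matching step has no trainable parameters, it adds nothing to the labeled-data requirement, as noted in step S54.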
Step S6: and evaluating a pair of pig frames with adjacent frames being the same pig by the pig tracking module, and outputting pig tracking and labeling information.
Example two
Referring to FIG. 4, the second embodiment of the present invention provides an example of testing data based on the pig detection and tracking method with rapid adaptability described in the first embodiment, comprising the following steps (a code sketch follows after step S95):
Step S91: inputting video data into the pre-trained Vision Transformer model to obtain the feature information of the video;
Step S92: inputting the video data into the pig detection model to extract the pig frame information of each frame from the original video data;
Step S93: inputting the feature information and the pig frame information into the pig tracking module, pairing pig frames across adjacent frames, and calculating the cosine similarity of each pair to obtain the probability that a pair of pig frames across frames belongs to the same pig;
Step S94: using these probabilities as input, matching the optimal pairs of pig frames in adjacent frames as the same pig with the Hungarian algorithm;
Step S95: repeating steps S91 to S94 and iterating over each frame to realize pig tracking over the whole video.
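The per-frame loop below is a minimal sketch of steps S91 to S95, again assuming NumPy and SciPy; `detect_pigs` and `extract_features` are hypothetical placeholders standing in for the trained pig detection model and the pre-trained Vision Transformer.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def detect_pigs(frame):
    # Placeholder for the trained pig detection model: pig frames as (x, y, w, h).
    return np.random.rand(5, 4)

def extract_features(frame, boxes):
    # Placeholder for the pre-trained Vision Transformer: one feature per pig frame.
    return np.random.rand(len(boxes), 256)

def match(prev_feats, feats):
    a = prev_feats / np.linalg.norm(prev_feats, axis=1, keepdims=True)
    b = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    rows, cols = linear_sum_assignment(-(a @ b.T))   # step S94: Hungarian matching
    return list(zip(rows.tolist(), cols.tolist()))

def track_video(frames):
    tracks, prev_feats = [], None
    for t, frame in enumerate(frames):
        boxes = detect_pigs(frame)                       # step S92: pig frame detection
        feats = extract_features(frame, boxes)           # step S91: feature extraction
        if prev_feats is not None:
            tracks.append((t, match(prev_feats, feats))) # steps S93-S94: same-pig pairs
        prev_feats = feats                               # step S95: repeat for each frame
    return tracks

print(track_video([np.zeros((224, 224)) for _ in range(8)]))
```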
Example three
Referring to FIG. 5, the third embodiment of the present invention provides a pig detection and tracking device with rapid adaptability based on the method of the first embodiment.
Referring to FIG. 5, the pig detection and tracking device with rapid adaptability comprises a plurality of cameras installed in a farm; the cameras photograph the pigs and acquire a large amount of unlabeled pig video data in real time;
the unlabeled video data is uploaded and stored on a computer server, on which a Vision Transformer model, a pig detection model and a pig tracking model are also loaded;
the Vision Transformer model is pre-trained with the unlabeled video data as input to obtain a pre-trained Vision Transformer model, and the pre-trained Vision Transformer model extracts feature information;
the pig detection model is trained with a small amount of ground-truth labeled data from the farm, and the trained pig detection model extracts pig frame information;
the obtained feature information and pig frame information are input together into a pig tracking module so as to track the pigs throughout the whole video, and video data annotated with pig frames is output.
In conclusion, the method adapts quickly to different farm scenes, realizes detection and tracking of pigs, and achieves a good technical effect with only a small amount of labeled data.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (5)

1. A pig detection and tracking method with rapid adaptability, characterized by comprising the following steps:
step S1: acquiring a large amount of unlabeled pig video data in real time through a plurality of cameras installed on a farm;
step S2: inputting the unlabeled video data into a Vision Transformer model;
step S3: pre-training the Vision Transformer model with the unlabeled video data to obtain a pre-trained Vision Transformer model, and outputting feature information from the pre-trained Vision Transformer model;
step S4: inputting a very small amount of picture data labeled with pig positions into a pig detection model, training the pig detection model with this data to obtain a trained pig detection model, and outputting pig frame information from the trained pig detection model;
step S5: inputting the feature information obtained in step S3 and the pig frame information obtained in step S4 into a pig tracking module;
step S6: evaluating, with the pig tracking module, pairs of pig frames in adjacent frames that belong to the same pig, and outputting pig tracking and labeling information.
2. The pig detection and tracking method with rapid adaptability according to claim 1, characterized in that pre-training the Vision Transformer model with the unlabeled video data in step S3 comprises the following steps:
step S31: extracting T frames of video from the input unlabeled video data, each frame having width W and height H, so that one video can be expressed as a third-order tensor of T × W × H;
step S32: dividing the third-order tensor into small squares of K frames × N pixels × M pixels, giving (T/K) × (W/N) × (H/M) small squares in total;
step S33: randomly deleting 50%-60% of the K frames × N pixels × M pixels small squares, arranging the pixel points of each remaining small square in order, and converting each into a vector;
step S34: passing all the resulting vectors through a fully connected layer whose output dimension is D; embedding a position code into each output D-dimensional vector to obtain a D-dimensional vector with a position code, where the position code uniquely identifies the small square within the T × W × H third-order tensor; and using the D-dimensional vectors with position codes as the input vectors of a Transformer network;
step S35: calculating difference information between the pixel values of the original small squares and the vectors output by the Transformer network using a squared-error loss function, and feeding the difference information back to the model to perform gradient descent and optimize the model parameters; after multiple gradient-descent iterations, the pixel information of all (T/K) × (W/N) × (H/M) small squares can be predicted from the information contained in the remaining 40%-50% of the small squares, yielding a pre-trained Vision Transformer model.
3. The pig detection and tracking method with rapid adaptability according to claim 2, characterized in that training the pig detection model with a very small amount of picture data labeled with pig positions in step S4 comprises the following steps:
step S41: dividing each picture labeled with pig positions to obtain (W/N) × (H/M) small squares;
step S42: defining P anchor frames of different sizes on each square; using a single-layer fully connected network, taking anchor frames containing pig features as positive samples and anchor frames containing no pig features as negative samples, inputting them into the network, and finally obtaining, through training, a network that can classify anchor frames into pig anchor frames and non-pig anchor frames;
step S43: using another single-layer fully connected network to adjust the positions of the anchor frames so that they come closer to the labeled pig frames, where a pig frame is the visualization of pig position labeling information; taking the actual pig position information as positive samples and the position information of anchor frames judged to be pig anchor frames as negative samples, and finally obtaining, through training, a network that can output pig frame information; this step trains only the two single-layer fully connected networks, and the Transformer network remains unchanged.
4. The pig detection and tracking method with rapid adaptability according to claim 3, characterized in that the pig tracking module in step S5 performs the following steps:
step S51: detecting all pig frames in two adjacent frames with the pig detection model;
step S52: arranging the pixel points in each pig frame in order, converting them into vectors, inputting the vectors into the pre-trained Vision Transformer network, and extracting the feature information of each pig frame;
step S53: pairing the pig frames of adjacent frames and calculating their cosine similarity from the feature information;
step S54: finding the best match between all pig frames of the previous frame and all pig frames of the next frame using the Hungarian algorithm; no network training is needed, which greatly reduces the requirement for training data.
5. A device adopting the pig detection and tracking method with rapid adaptability according to any one of claims 1 to 4, characterized by comprising a plurality of cameras installed in a farm, wherein the cameras photograph the pigs and acquire a large amount of unlabeled pig video data in real time;
the unlabeled video data is uploaded and stored on a computer server, on which a Vision Transformer model, a pig detection model and a pig tracking model are also loaded;
the Vision Transformer model is pre-trained with the unlabeled video data as input to obtain a pre-trained Vision Transformer model, which realizes the extraction of feature information;
the pig detection model is trained with a small amount of ground-truth labeled data from the farm, and the trained pig detection model extracts pig frame information;
the obtained feature information and pig frame information are input together into a pig tracking module so as to track the pigs throughout the whole video, and video data annotated with pig frames is output.
CN202210960003.4A 2022-05-27 2022-08-11 Pig detection and tracking method and device with rapid adaptability Pending CN115239766A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2022105849584 2022-05-27
CN202210584958.4A CN114926502A (en) 2022-05-27 2022-05-27 Pig detection and tracking method and device with rapid adaptability

Publications (1)

Publication Number Publication Date
CN115239766A (en) 2022-10-25

Family

ID=82810206

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202210584958.4A Pending CN114926502A (en) 2022-05-27 2022-05-27 Pig detection and tracking method and device with rapid adaptability
CN202210960003.4A Pending CN115239766A (en) 2022-05-27 2022-08-11 Pig detection and tracking method and device with rapid adaptability

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202210584958.4A Pending CN114926502A (en) 2022-05-27 2022-05-27 Pig detection and tracking method and device with rapid adaptability

Country Status (1)

Country Link
CN (2) CN114926502A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116091786B (en) * 2023-04-11 2023-06-20 厦门农芯数字科技有限公司 Holographic body ruler self-coding method, system, equipment and storage medium for pig weight estimation

Also Published As

Publication number Publication date
CN114926502A (en) 2022-08-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination