CN115239766A - Pig detection and tracking method and device with rapid adaptability (Google Patents)
- Publication number: CN115239766A (application CN202210960003.4A)
- Authority: CN (China)
- Prior art keywords: pig, information, model, frame, frames
- Prior art date: 2022-05-27
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/248 — Analysis of motion using feature-based methods, e.g. the tracking of corners or segments, involving reference images or patches
- G06N3/088 — Non-supervised learning, e.g. competitive learning
- G06T7/292 — Multi-camera tracking
- G06V10/62 — Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; pattern tracking
- G06V10/761 — Proximity, similarity or dissimilarity measures
- G06V10/7753 — Incorporation of unlabelled data, e.g. multiple instance learning [MIL]
- G06V10/82 — Image or video recognition or understanding using neural networks
- G06V20/46 — Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
- G06V40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
- G06T2207/10016 — Video; image sequence
- G06T2207/20081 — Training; learning
- G06T2207/20084 — Artificial neural networks [ANN]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Human Computer Interaction (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a pig detection and tracking method with rapid adaptability, comprising the following steps: inputting a large amount of collected unlabeled video data of pigs into a Vision Transformer model; pre-training the Vision Transformer model to obtain a pre-trained Vision Transformer model; outputting feature information from the pre-trained Vision Transformer model; inputting a very small amount of picture data labeled with pig positions into a pig detection model to obtain a trained pig detection model, and outputting pig box information from the trained pig detection model; inputting the obtained feature information and pig box information into a pig tracking module; and evaluating pairs of pig boxes in adjacent frames that belong to the same pig, and outputting pig tracking and labeling information. The invention detects and tracks pigs with only a small amount of labeled data and achieves a good tracking effect.
Description
Technical Field
The invention belongs to the technical field of deep learning, and particularly relates to a pig detection and tracking method and device with rapid adaptability.
Background
Cameras are installed in a farm to film the pigs, and the positions of the pigs are detected and continuously tracked from the resulting video data; this is an important application in intelligent livestock farming. By continuously detecting the positions of the pigs, the identity of each pig can be recognized, and early warnings can be given on in-pen activity, diet status and disease.
However, conventional machine learning requires a large amount of manually labeled data. Most pig farms in China are run by small and medium-sized farmers, farm layouts follow local conditions, and environmental factors (such as illumination) and camera positions differ greatly, so labeled data from one farm is hard to reuse on another. A model trained on one farm rarely achieves high performance on another farm, and labeling each farm individually is costly. There is therefore a pressing need for machine learning techniques that learn efficiently from very little manually labeled data.
Disclosure of Invention
The invention aims to overcome the above defects in the prior art and provides a pig detection and tracking method and device with rapid adaptability. The method adapts quickly to different farm scenes, detects and tracks pigs, and achieves a good technical effect with only a small amount of labeled data.
To achieve this aim, the invention provides a pig detection and tracking method with rapid adaptability, comprising the following steps:
step S1: collecting a large amount of unlabeled video data of pigs in real time through a plurality of cameras installed on a farm;
step S2: inputting the unlabeled video data into a Vision Transformer model;
step S3: pre-training the Vision Transformer model with the unlabeled video data to obtain a pre-trained Vision Transformer model, and outputting feature information from the pre-trained Vision Transformer model;
step S4: inputting a very small amount of picture data labeled with pig positions into a pig detection model, training the pig detection model with this data to obtain a trained pig detection model, and outputting pig box information from the trained pig detection model;
step S5: inputting the feature information obtained in step S3 and the pig box information obtained in step S4 into a pig tracking module;
step S6: evaluating, by the pig tracking module, pairs of pig boxes in adjacent frames that belong to the same pig, and outputting pig tracking and labeling information.
Preferably, pre-training the Vision Transformer model with the unlabeled video data in step S3 comprises the following steps (a code sketch of the division and masking follows the list):
step S31: extracting a T-frame video clip from the input unlabeled video data, where each frame is W pixels wide and H pixels high, so that one clip can be expressed as a third-order tensor of size T × W × H;
step S32: dividing the third-order tensor into small cubes of K frames × N pixels × M pixels, giving (T/K) × (W/N) × (H/M) cubes in total;
step S33: randomly deleting 50%-60% of the K frame × N pixel × M pixel cubes, arranging the pixels of each remaining cube in order, and converting each cube into a vector;
step S34: passing all the resulting vectors through a fully connected layer whose output is D-dimensional, and embedding a position code into each output D-dimensional vector to obtain a D-dimensional vector with a position code, where the position code uniquely identifies the cube within the T × W × H third-order tensor; the D-dimensional vectors with position codes serve as the input vectors of a Transformer network;
step S35: computing, with a squared-difference loss function, the difference between the pixel values of the original cubes and the vectors output by the Transformer network, and feeding this difference back to the model for gradient descent to optimize the model parameters; after many gradient-descent iterations, the pixel information of all (T/K) × (W/N) × (H/M) cubes can be predicted from the information contained in the remaining 40%-50% of cubes, yielding a pre-trained Vision Transformer model.
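The cube division and random masking of steps S31-S33 can be pictured with a short sketch. The snippet below is a minimal illustration assuming PyTorch and single-channel clips; the function name, the K = 2 and N = M = 16 sizes, and the 0.5 mask ratio are assumptions chosen for illustration, not values fixed by the invention.

```python
# Minimal sketch of steps S31-S33 (cube division and random masking),
# assuming a grayscale clip stored as a (T, W, H) tensor. All names and
# default sizes here are illustrative assumptions.
import torch

def patchify_and_mask(video, K=2, N=16, M=16, mask_ratio=0.5):
    """Split a (T, W, H) clip into K x N x M cubes and randomly drop a share."""
    T, W, H = video.shape
    # Step S32: divide the T x W x H tensor into (T/K) x (W/N) x (H/M) cubes.
    cubes = video.reshape(T // K, K, W // N, N, H // M, M)
    cubes = cubes.permute(0, 2, 4, 1, 3, 5).reshape(-1, K * N * M)
    # Step S33: randomly delete a share of the cubes; flatten the survivors
    # in pixel order so each remaining cube becomes one vector.
    keep = torch.randperm(cubes.shape[0])[: int(cubes.shape[0] * (1 - mask_ratio))]
    return cubes[keep], keep
```

The returned indices double as the position codes of step S34, since each index uniquely identifies one cube within the T × W × H tensor.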
Preferably, training the pig detection model with a very small amount of picture data labeled with pig positions in step S4 comprises the following steps:
step S41: dividing each picture labeled with pig positions to obtain (W/N) × (H/M) small squares;
step S42: defining P anchor boxes of different sizes on each square; using a single-layer fully connected network, taking anchor boxes containing pig features as positive samples and anchor boxes without pig features as negative samples, inputting both into the network, and training to obtain a network that classifies anchor boxes into pig anchor boxes and non-pig anchor boxes;
step S43: using another single-layer fully connected network to adjust the positions of the anchor boxes so that they move closer to the labeled pig boxes, where a pig box visualizes the pig position label, and the pig position label comprises the pig center coordinates and the height and width from the center to the box boundary; taking the actual pig position information as positive samples and the position information of anchor boxes judged to contain pigs as negative samples, and training to obtain a network that outputs pig box information; this step trains only the two single-layer fully connected networks, while the Transformer network remains unchanged.
Preferably, the pig tracking module in step S5 performs the following steps:
step S51: detecting all pig boxes in two adjacent frames with the pig detection model;
step S52: arranging the pixels within each pig box in order, converting them into a vector, inputting the vector into the pre-trained Vision Transformer network, and extracting the feature information of each pig box;
step S53: pairing the pig boxes of adjacent frames and computing the cosine similarity of each pair from the feature information;
step S54: finding the best match between all pig boxes of the previous frame and all pig boxes of the next frame with the Hungarian algorithm; this step requires no network training, greatly reducing the demand for training data.
The invention also provides a device using the above pig detection and tracking method with rapid adaptability. The device comprises a plurality of cameras installed in a farm; the cameras film the pigs and collect a large amount of unlabeled video data of the pigs in real time.
The unlabeled video data is uploaded and stored on a computer server, on which a Vision Transformer model, a pig detection model and a pig tracking model are also loaded.
The Vision Transformer model is pre-trained with the unlabeled video data as input, yielding a pre-trained Vision Transformer model that extracts feature information.
The pig detection model is trained with a small amount of data actually labeled on the farm, and the trained pig detection model extracts pig box information.
The obtained feature information and pig box information are input together into the pig tracking module, so as to track pigs through the whole video and output video data with pig boxes.
Compared with the prior art, the invention has the following beneficial effects:
first, the unlabeled video data is input into a Vision Transformer model, which is pre-trained to output feature information; second, a very small amount of picture data labeled with pig positions is input into a pig detection model, and the trained pig detection model outputs pig box information; third, the obtained feature information and pig box information are input into a pig tracking module; finally, the pig tracking module evaluates pairs of pig boxes in adjacent frames that belong to the same pig and outputs pig tracking and labeling information. The method adapts quickly to different farm scenes, detects and tracks pigs, and achieves a good technical effect with only a small amount of labeled data.
Drawings
To illustrate the embodiments of the invention or the prior-art solutions more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below show some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of the steps of the rapidly adaptive pig detection and tracking method provided by the invention;
FIG. 2 is a flow diagram of Vision Transformer pre-training provided by the present invention;
FIG. 3 is a flow chart of a pig detection model provided by the present invention;
FIG. 4 is a flow chart of data testing provided by the present invention;
FIG. 5 is a flow chart of the rapidly adaptive pig detection and tracking device provided by the invention.
Detailed Description
The technical solutions in the embodiments of the invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art from these embodiments without creative effort fall within the protection scope of the invention.
Example one
Referring to figs. 1 to 3, an embodiment of the invention provides a pig detection and tracking method with rapid adaptability.
First, the farm to which the invention applies is explained: a number of pigs are raised inside the farm, and a plurality of cameras installed inside the farm film the pigs and collect a large amount of unlabeled video data of the pigs in real time.
Next, the pig detection and tracking method with rapid adaptability is described in detail; fig. 1 shows the flowchart of the invention.
Referring to fig. 1, the invention mainly comprises the following steps. Step S1: collecting unlabeled video data of a large number of pigs in real time through a plurality of cameras installed on a farm.
Step S2: inputting the unlabeled video data into a Vision Transformer model.
Step S3: pre-training the Vision Transformer model with the unlabeled video data to obtain a pre-trained Vision Transformer model, and outputting feature information from the pre-trained Vision Transformer model.
Further, as shown in fig. 2, pre-training the Vision Transformer model with the unlabeled video data in step S3 comprises the following steps (a code sketch of steps S34-S35 follows the list):
step S31: extracting a T-frame video clip from the input unlabeled video data, where each frame is W pixels wide and H pixels high, so that one clip can be expressed as a third-order tensor of size T × W × H;
step S32: dividing the third-order tensor into small cubes of K frames × N pixels × M pixels, giving (T/K) × (W/N) × (H/M) cubes in total;
step S33: randomly deleting 50%-60% of the K frame × N pixel × M pixel cubes, arranging the pixels of each remaining cube in order, and converting each cube into a vector;
step S34: passing all the resulting vectors through a fully connected layer whose output is D-dimensional, and embedding a position code into each output D-dimensional vector to obtain a D-dimensional vector with a position code, where the position code uniquely identifies the cube within the T × W × H third-order tensor; the D-dimensional vectors with position codes serve as the input vectors of a Transformer network;
step S35: computing, with a squared-difference loss function, the difference between the pixel values of the original cubes and the vectors output by the Transformer network, and feeding this difference back to the model for gradient descent to optimize the model parameters; after many gradient-descent iterations, the pixel information of all (T/K) × (W/N) × (H/M) cubes can be predicted from the information contained in the remaining 40%-50% of cubes, yielding a pre-trained Vision Transformer model.
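As one concrete reading of steps S34-S35, a toy masked-reconstruction model might look like the sketch below. It builds on the `patchify_and_mask` sketch given earlier; the layer sizes, the AdamW optimizer, and the simplification of reconstructing only the visible cubes (the invention predicts all (T/K) × (W/N) × (H/M) cubes) are assumptions for illustration, not the patented implementation.

```python
# Hedged sketch of steps S34-S35: FC embedding to D dims, a unique position
# code per cube, a Transformer encoder, and a squared-difference loss driving
# gradient descent. Sizes and the model itself are illustrative assumptions.
import torch
import torch.nn as nn

class ToyVideoMAE(nn.Module):
    def __init__(self, patch_dim, D=256, num_cubes=512):
        super().__init__()
        self.embed = nn.Linear(patch_dim, D)        # step S34: FC layer, D-dim output
        self.pos = nn.Embedding(num_cubes, D)       # position code unique to each cube
        layer = nn.TransformerEncoderLayer(d_model=D, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.decode = nn.Linear(D, patch_dim)       # map features back to pixel values

    def forward(self, cubes, indices):
        x = self.embed(cubes) + self.pos(indices)   # D-dim vectors with position code
        return self.decode(self.encoder(x))

model = ToyVideoMAE(patch_dim=2 * 16 * 16)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
cubes, idx = patchify_and_mask(torch.rand(16, 128, 128))   # sketch from step S33
pred = model(cubes.unsqueeze(0), idx.unsqueeze(0))
loss = nn.functional.mse_loss(pred, cubes.unsqueeze(0))    # step S35: squared difference
loss.backward()
opt.step()                                                  # one gradient-descent iteration
```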
Step S4: inputting a very small amount of picture data labeled with pig positions into a pig detection model, training the pig detection model with this data to obtain a trained pig detection model, and outputting pig box information from the trained pig detection model.
Further, as shown in fig. 3, training the pig detection model with a very small amount of picture data labeled with pig positions in step S4 comprises the following steps (see the sketch after the list):
step S41: dividing each picture labeled with pig positions to obtain (W/N) × (H/M) small squares;
step S42: defining P anchor boxes of different sizes on each square; using a single-layer fully connected network, taking anchor boxes containing pig features as positive samples and anchor boxes without pig features as negative samples, inputting both into the network, and training to obtain a network that classifies anchor boxes into pig anchor boxes and non-pig anchor boxes;
step S43: using another single-layer fully connected network to adjust the positions of the anchor boxes so that they move closer to the labeled pig boxes, where a pig box visualizes the pig position label, and the pig position label comprises the pig center coordinates and the height and width from the center to the box boundary; taking the actual pig position information as positive samples and the position information of anchor boxes judged to contain pigs as negative samples, and training to obtain a network that outputs pig box information; this step trains only the two single-layer fully connected networks, while the Transformer network remains unchanged.
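A minimal sketch of the two single-layer fully connected heads follows; the feature dimension D = 256, the P = 9 anchors per square, and all names are assumptions for illustration, with the frozen pre-trained Transformer assumed to supply one D-dimensional feature per grid square.

```python
# Illustrative sketch of steps S42-S43: one single-layer FC network scores
# each anchor box as pig / non-pig, a second regresses the anchor toward the
# labeled pig box. Dimensions and names are assumptions, not the patent's.
import torch
import torch.nn as nn

D, P = 256, 9                      # feature dim and anchors per square (assumed)
cls_head = nn.Linear(D, P * 2)     # step S42: pig vs non-pig score per anchor
reg_head = nn.Linear(D, P * 4)     # step S43: offsets to center, height, width

feats = torch.rand(16 * 16, D)     # one (W/N) x (H/M) grid of square features
cls_logits = cls_head(feats).view(-1, P, 2)   # classify every anchor box
box_deltas = reg_head(feats).view(-1, P, 4)   # nudge anchors toward labeled boxes
# Training feeds anchors overlapping a labeled pig as positives and the rest
# as negatives; only these two heads learn, the Transformer stays frozen.
```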
Step S5: inputting the feature information obtained in step S3 and the pig box information obtained in step S4 into a pig tracking module.
Specifically, the pig tracking module in step S5 performs the following steps (a matching sketch follows the list):
step S51: detecting all pig boxes in two adjacent frames with the pig detection model;
step S52: arranging the pixels within each pig box in order, converting them into a vector, inputting the vector into the pre-trained Vision Transformer network, and extracting the feature information of each pig box;
step S53: pairing the pig boxes of adjacent frames and computing the cosine similarity of each pair from the feature information;
step S54: finding the best match between all pig boxes of the previous frame and all pig boxes of the next frame with the Hungarian algorithm; this step requires no network training, greatly reducing the demand for training data.
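Steps S53-S54 reduce to a cosine-similarity matrix plus one assignment solve, as in the sketch below; SciPy's `linear_sum_assignment` is used here as a stand-in Hungarian solver, and the function name and feature shapes are assumptions.

```python
# Sketch of steps S53-S54: cosine similarity between the ViT features of pig
# boxes in adjacent frames, then a Hungarian assignment. Names are assumed.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_boxes(feats_prev, feats_next):
    """feats_*: (num_boxes, D) feature rows for one frame's pig boxes."""
    a = feats_prev / np.linalg.norm(feats_prev, axis=1, keepdims=True)
    b = feats_next / np.linalg.norm(feats_next, axis=1, keepdims=True)
    sim = a @ b.T                             # step S53: pairwise cosine similarity
    rows, cols = linear_sum_assignment(-sim)  # step S54: maximize total similarity
    return list(zip(rows.tolist(), cols.tolist()))

# Example: match 5 pig boxes in the previous frame to 6 in the next frame.
pairs = match_boxes(np.random.rand(5, 256), np.random.rand(6, 256))
```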
Step S6: evaluating, by the pig tracking module, pairs of pig boxes in adjacent frames that belong to the same pig, and outputting pig tracking and labeling information.
Example two
Referring to fig. 4, a second embodiment of the invention tests data with the pig detection and tracking method with rapid adaptability described in the first embodiment, comprising the following steps (an end-to-end sketch follows the list):
step S91: inputting video data into the pre-trained Vision Transformer model to obtain the feature information of the video;
step S92: inputting the video data into the pig detection model to box the pig information in each frame of the original video data;
step S93: inputting the feature information and the pig box information into the pig tracking module, computing the cosine similarity of each cross-frame pair of pig boxes, and obtaining the probability that the two boxes of a pair are the same pig;
step S94: using these probabilities as input, matching the optimal combinations of pigs in adjacent frames as the same pig with the Hungarian algorithm;
step S95: repeating steps S91 to S94 and iterating over each frame, so as to track pigs through the whole video.
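Put together, the test loop of steps S91-S95 can be sketched as below; `detect_pigs` and `extract_features` are hypothetical stand-ins for the trained detection model and the pre-trained Vision Transformer, and `match_boxes` is the assignment sketch above, so this is an illustration under those assumptions rather than the patented implementation.

```python
# Hedged end-to-end sketch of steps S91-S95. The three callables passed in
# are hypothetical stand-ins, not the patent's actual models.
def track_video(frames, detect_pigs, extract_features, match_boxes):
    tracks, prev_feats, prev_ids, next_id = [], None, [], 0
    for frame in frames:
        boxes = detect_pigs(frame)               # step S92: pig boxes in this frame
        feats = extract_features(frame, boxes)   # step S91: one ViT feature per box
        if prev_feats is None:                   # first frame: assign fresh identities
            ids = list(range(len(boxes)))
            next_id = len(boxes)
        else:
            ids = [-1] * len(boxes)
            for i, j in match_boxes(prev_feats, feats):  # steps S93-S94
                ids[j] = prev_ids[i]             # carry the matched identity forward
            for j, pid in enumerate(ids):        # unmatched box: a newly seen pig
                if pid == -1:
                    ids[j] = next_id
                    next_id += 1
        tracks.append(list(zip(ids, boxes)))
        prev_feats, prev_ids = feats, ids        # step S95: iterate frame by frame
    return tracks
```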
Example three
Referring to fig. 5, a third embodiment of the invention provides a device using the pig detection and tracking method with rapid adaptability of the first embodiment.
Referring to fig. 5, the pig detection and tracking device with rapid adaptability comprises a plurality of cameras installed in a farm; the cameras film the pigs and collect a large amount of unlabeled video data of the pigs in real time.
The unlabeled video data is uploaded and stored on a computer server, on which a Vision Transformer model, a pig detection model and a pig tracking model are also loaded.
The Vision Transformer model is pre-trained with the unlabeled video data as input, yielding a pre-trained Vision Transformer model that extracts feature information.
The pig detection model is trained with a small amount of data actually labeled on the farm, and the trained pig detection model extracts pig box information.
The obtained feature information and pig box information are input together into the pig tracking module, so as to track pigs through the whole video and output video data with pig boxes.
In conclusion, the method adapts quickly to different farm scenes, detects and tracks pigs, and achieves a good technical effect with only a small amount of labeled data.
The above embodiments are preferred embodiments of the invention, but the invention is not limited to them. Any change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the invention should be regarded as an equivalent replacement and is included within the protection scope of the invention.
Claims (5)
1. A pig detection and tracking method with rapid adaptability, characterized by comprising the following steps:
step S1: collecting a large amount of unlabeled video data of pigs in real time through a plurality of cameras installed on a farm;
step S2: inputting the unlabeled video data into a Vision Transformer model;
step S3: pre-training the Vision Transformer model with the unlabeled video data to obtain a pre-trained Vision Transformer model, and outputting feature information from the pre-trained Vision Transformer model;
step S4: inputting a very small amount of picture data labeled with pig positions into a pig detection model, training the pig detection model with this data to obtain a trained pig detection model, and outputting pig box information from the trained pig detection model;
step S5: inputting the feature information obtained in step S3 and the pig box information obtained in step S4 into a pig tracking module;
step S6: evaluating, by the pig tracking module, pairs of pig boxes in adjacent frames that belong to the same pig, and outputting pig tracking and labeling information.
2. The pig detection and tracking method with rapid adaptability according to claim 1, characterized in that pre-training the Vision Transformer model with the unlabeled video data in step S3 comprises the following steps:
step S31: extracting a T-frame video clip from the input unlabeled video data, where each frame is W pixels wide and H pixels high, so that one clip can be expressed as a third-order tensor of size T × W × H;
step S32: dividing the third-order tensor into small cubes of K frames × N pixels × M pixels, giving (T/K) × (W/N) × (H/M) cubes in total;
step S33: randomly deleting 50%-60% of the K frame × N pixel × M pixel cubes, arranging the pixels of each remaining cube in order, and converting each cube into a vector;
step S34: passing all the resulting vectors through a fully connected layer whose output is D-dimensional, and embedding a position code into each output D-dimensional vector to obtain a D-dimensional vector with a position code, where the position code uniquely identifies the cube within the T × W × H third-order tensor; the D-dimensional vectors with position codes serve as the input vectors of a Transformer network;
step S35: computing, with a squared-difference loss function, the difference between the pixel values of the original cubes and the vectors output by the Transformer network, and feeding this difference back to the model for gradient descent to optimize the model parameters; after many gradient-descent iterations, the pixel information of all (T/K) × (W/N) × (H/M) cubes can be predicted from the information contained in the remaining 40%-50% of cubes, yielding a pre-trained Vision Transformer model.
3. The pig detection and tracking method with rapid adaptability according to claim 2, characterized in that training the pig detection model with a very small amount of picture data labeled with pig positions in step S4 comprises the following steps:
step S41: dividing each picture labeled with pig positions to obtain (W/N) × (H/M) small squares;
step S42: defining P anchor boxes of different sizes on each square; using a single-layer fully connected network, taking anchor boxes containing pig features as positive samples and anchor boxes without pig features as negative samples, inputting both into the network, and training to obtain a network that classifies anchor boxes into pig anchor boxes and non-pig anchor boxes;
step S43: using another single-layer fully connected network to adjust the positions of the anchor boxes so that they move closer to the labeled pig boxes, where a pig box visualizes the pig position label; taking the actual pig position information as positive samples and the position information of anchor boxes judged to contain pigs as negative samples, and training to obtain a network that outputs pig box information; this step trains only the two single-layer fully connected networks, while the Transformer network remains unchanged.
4. The pig detection and tracking method with rapid adaptability according to claim 3, characterized in that the pig tracking module in step S5 performs the following steps:
step S51: detecting all pig boxes in two adjacent frames with the pig detection model;
step S52: arranging the pixels within each pig box in order, converting them into a vector, inputting the vector into the pre-trained Vision Transformer network, and extracting the feature information of each pig box;
step S53: pairing the pig boxes of adjacent frames and computing the cosine similarity of each pair from the feature information;
step S54: finding the best match between all pig boxes of the previous frame and all pig boxes of the next frame with the Hungarian algorithm; this step requires no network training, greatly reducing the demand for training data.
5. A device using the pig detection and tracking method with rapid adaptability of any one of claims 1 to 4, characterized by comprising a plurality of cameras installed in a farm, where the cameras film the pigs and collect a large amount of unlabeled video data of the pigs in real time;
the unlabeled video data is uploaded and stored on a computer server, on which a Vision Transformer model, a pig detection model and a pig tracking model are also loaded;
the Vision Transformer model is pre-trained with the unlabeled video data as input, yielding a pre-trained Vision Transformer model that extracts feature information;
the pig detection model is trained with a small amount of data actually labeled on the farm, and the trained pig detection model extracts pig box information;
the obtained feature information and pig box information are input together into the pig tracking module, so as to track pigs through the whole video and output video data with pig boxes.
Applications Claiming Priority (2)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210584958.4A (CN114926502A) | 2022-05-27 | 2022-05-27 | Pig detection and tracking method and device with rapid adaptability |
| CN2022105849584 | 2022-05-27 | | |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN115239766A (en) | 2022-10-25 |
Family ID: 82810206
Family Applications (2)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202210584958.4A (CN114926502A, pending) | Pig detection and tracking method and device with rapid adaptability | 2022-05-27 | 2022-05-27 |
| CN202210960003.4A (CN115239766A, pending) | Pig detection and tracking method and device with rapid adaptability | 2022-05-27 | 2022-08-11 |

Country of publication: CN (both applications)
Families Citing this family (1)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116091786B | 2023-04-11 | 2023-06-20 | | Holographic body ruler self-coding method, system, equipment and storage medium for pig weight estimation |

Application timeline:
- 2022-05-27: application CN202210584958.4A filed (published as CN114926502A, pending)
- 2022-08-11: application CN202210960003.4A filed (published as CN115239766A, pending)
Also Published As

| Publication number | Publication date |
|---|---|
| CN114926502A (en) | 2022-08-19 |
Legal Events

| Date | Code | Title |
|---|---|---|
| | PB01 | Publication |
| | SE01 | Entry into force of request for substantive examination |