CN109711389B - Lactating sow posture conversion recognition method based on Faster R-CNN and HMM

Lactating sow posture conversion recognition method based on Faster R-CNN and HMM

Info

Publication number
CN109711389B
CN109711389B CN201910041539.4A
Authority
CN
China
Prior art keywords
conversion
sow
gesture
posture
fragments
Prior art date
Legal status
Active
Application number
CN201910041539.4A
Other languages
Chinese (zh)
Other versions
CN109711389A (en
Inventor
薛月菊
杨晓帆
郑婵
陈畅新
王卫星
甘海明
Current Assignee
South China Agricultural University
Original Assignee
South China Agricultural University
Priority date
Filing date
Publication date
Application filed by South China Agricultural University
Priority to CN201910041539.4A
Publication of CN109711389A
Application granted
Publication of CN109711389B
Legal status: Active

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 60/00: Technologies relating to agriculture, livestock or agroalimentary industries
    • Y02P 60/80: Food processing, e.g. use of renewable energies or variable speed drives in handling, conveying or stacking
    • Y02P 60/87: Re-use of by-products of food processing for fodder production

Abstract

The invention discloses a lactating sow posture conversion recognition method based on Faster R-CNN and HMM, which comprises the following steps: 1. enhance the quality of the depth images; 2. recognize the sow posture with an improved Faster R-CNN, taking the posture with the highest probability in each frame to form a posture sequence and keeping the five detection boxes with the highest probabilities as candidate regions; 3. correct classification errors in the posture sequence with a median filter of length 5, detect suspected conversion segments from the number of posture changes within each video segment, and, within each suspected conversion segment, build a sow positioning pipeline from the candidate regions using the Viterbi algorithm; 4. segment the sow in every frame of the positioning pipeline with the maximum inter-class variance method and compute the height of each part of the sow body to form a height sequence; 5. input the height sequence into an HMM model to divide the suspected conversion segments into posture conversion segments and unconverted segments, and classify the single-posture segments and posture conversion segments to obtain the recognition result. With this method, sow posture conversion can be recognized automatically under changing illumination and at night, laying a foundation for high-risk behavior recognition.

Description

Lactating sow posture conversion recognition method based on Faster R-CNN and HMM
Technical Field
The invention relates to the technical field of video recognition, in particular to a lactating sow gesture conversion recognition method based on Faster R-CNN and HMM.
Background
In the intensive feeding environment of a pig farm, the maternal behavior of the sow is closely related to the survival rate of the piglets, and its quality is mainly reflected in posture conversion. Monitoring sow posture conversion by direct human observation or by watching surveillance video is highly subjective, time-consuming and labor-intensive. Automatic recognition of sow posture conversion behavior can provide basic research data on the characteristics and rules of maternal behavior, help prevent piglets from being crushed to death, improve piglet survival, and reduce the labor cost of pig farm management, and is therefore of great significance for raising the level of pig husbandry.
Sensor technology has been used to monitor sow posture or posture conversion. The patent of publication No. CN105850773A discloses a live pig posture monitoring device and method based on micro inertial sensors, which acquires the posture and behavior information of live pigs with a triaxial magnetometer (HMC5883) and a 6-axis micro inertial sensor (MPU-6050) that integrates a triaxial accelerometer and a triaxial gyroscope. The patent of publication No. CN106326919A discloses a live pig behavior classification method based on a BP neural network, which uses an MPU-6050 sensor to collect acceleration, angular velocity and attitude-angle information in real time as input and recognizes four behavior modes of live pigs (standing, walking and two forms of lying) with a pre-established BP neural network model. To overcome problems such as sow stress and sensors falling off or being damaged, researchers have begun to acquire sow posture information by computer vision. The patent of publication No. CN107844797A discloses a method for automatically recognizing the posture of lactating sows from depth images, in which a DPM algorithm obtains the sow target region and the result is input into a deep convolutional neural network for sow posture recognition. The patent of publication No. CN108830144A discloses a lactating sow posture recognition method based on an improved Faster R-CNN, which introduces a residual structure and a Center Loss supervision signal to improve model accuracy and training speed, and finally achieves effective recognition of five postures of lactating sows, including standing, sitting, sternal lying and lateral lying. However, body deformation during posture conversion, adhesion between the sow and her piglets, and changes in scene illumination pose great challenges to all-weather, computer-vision-based recognition of sow posture conversion, so related research results at home and abroad have rarely been reported.
Aiming at the difficulties that day-and-night illumination changes, adhesion between sows and piglets, and pig body deformation in a loose-pen pig house scene bring to the posture conversion recognition of lactating sows, the invention uses depth video images as the data source and proposes a lactating sow posture conversion recognition algorithm based on Faster R-CNN and HMM. Experiments prove the effectiveness of the method, which provides a reference technology for the automatic recognition of high-risk sow behaviors.
Therefore, providing a method capable of accurately identifying the posture conversion of the lactating sow in a complex environment is a technical problem to be solved urgently by those skilled in the art.
Disclosure of Invention
In view of the above, the invention aims to provide a lactating sow posture conversion recognition method based on Faster R-CNN and HMM which can accurately recognize sow posture conversion under complex conditions such as sow body deformation, adhesion with piglets or walls, varying body sizes and night-time scenes. The specific scheme for achieving this purpose is as follows:
the invention discloses a lactating sow posture conversion and identification method based on a fast R-CNN and an HMM, which comprises the following steps:
s1, acquiring depth video images of sows, and establishing a sow gesture conversion identification video image library;
s2, establishing a Faster R-CNN sow posture detection model and an HMM high-order sequence classification model;
s3, detecting the depth video image frame by using a fast R-CNN, obtaining a sow gesture sequence, marking suspected conversion fragments and single gesture fragments in the sow gesture sequence, selecting candidate regions in the suspected conversion fragments, and constructing a sow positioning pipeline according to the candidate regions;
s4, dividing the image in the sow positioning pipeline frame by frame, calculating the average height of each part of the sow body to form a height sequence of a suspected conversion segment, and dividing the suspected conversion segment into unconverted segments or attitude conversion segments by adopting an HMM height sequence classification model according to the height sequence;
s5, combining the unconverted fragments and the single-posture fragments to form combined single-posture fragments, and finally classifying the combined single-posture fragments and the combined posture conversion fragments to obtain a sow posture conversion recognition result.
Faster R-CNN uses convolution and pooling to extract high-quality image features and copes well with illumination changes, pig body deformation and varying body sizes in the pig house scene. In the invention, candidate regions are generated by Faster R-CNN and a Viterbi algorithm is used to construct the sow positioning pipeline; on this basis, the height sequences of the sow trunk, the tail and the upper and lower sides of the sow body within a suspected conversion segment are extracted by Otsu segmentation and morphological processing, and an HMM is used to recognize the posture conversion.
Preferably, the specific process of step S1 is as follows:
s11, data acquisition: acquiring a overlooking sow depth video image in real time;
s12, constructing a database: removing video segments with sow body loss and camera shake, and constructing a training set, a verification set and a test set;
s13, preprocessing the depth video image frame by adopting median filtering, and improving the image contrast by a method of limiting contrast self-adaptive histogram equalization;
s14, randomly extracting at most 5 depth video images from each video segment with unchanged sow posture to respectively obtain m images of standing, sitting, lying and lying on side, 4m images in total, performing clockwise rotation by 90 degrees, 180 degrees and 270 degrees and horizontal and vertical mirror image amplification respectively, and finally forming 24m images serving as a training set of Faster R-CNN; randomly extracting at most 5 depth images from each video segment with unchanged sow posture, and respectively obtaining n standing, sitting, lying and lateral lying images as a Faster R-CNN verification set; randomly extracting at most 5 depth images from each video segment with unchanged sow posture to respectively obtain t standing, sitting, lying and lateral lying images as a Faster R-CNN test set; manually labeling the training set, the verification set and the test set data, namely labeling the boundary boxes and the gesture categories of the sows in the images;
s15, randomly selecting a plurality of segments of an image sequence with and without gesture conversion in a training set to serve as a training set of an HMM model; randomly selecting a plurality of segments of the image sequence with and without gesture conversion in the verification set to serve as a verification set of the HMM model; taking all suspected conversion fragments in the test set as a test set of the HMM model; and (5) manually labeling the data of the training set, the verification set and the test set, namely labeling whether the sow has gesture conversion in the video segment.
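To make step S14 concrete, the six-fold augmentation (three clockwise rotations plus horizontal and vertical mirroring) can be sketched as below. This is a minimal illustration in Python/NumPy rather than the patent's own code, and the helper name augment_frame is hypothetical; the bounding-box annotations would of course have to be transformed with the same geometry.

    import numpy as np

    def augment_frame(img):
        """Return the original depth frame plus its five augmented variants (step S14)."""
        return {
            "original": img,
            "rot90_cw": np.rot90(img, k=-1),   # 90 degrees clockwise
            "rot180": np.rot90(img, k=2),      # 180 degrees
            "rot270_cw": np.rot90(img, k=1),   # 270 degrees clockwise
            "mirror_h": np.fliplr(img),        # horizontal mirror
            "mirror_v": np.flipud(img),        # vertical mirror
        }

    # Each of the 4m source images yields 6 images, i.e. 24m training images in total.
    frame = np.zeros((424, 512), dtype=np.uint16)   # one 512x424 depth frame
    assert len(augment_frame(frame)) == 6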
Preferably, the specific process of step S2 is as follows:
s21, training a Faster R-CNN model, wherein the specific process is as follows,
s211, selecting ZFNet as a network structure, and performing model training through a Caffe frame;
s212, fine-tuning parameters by adopting a random gradient descent method and a back propagation algorithm, and initializing a network layer by adopting Gaussian distribution;
wherein, during fine-tuning, the maximum number of iterations is set to 9×10⁴, the learning rate is 10⁻³ for the first 6×10⁴ iterations and 10⁻⁴ for the last 3×10⁴ iterations, the impulse (momentum) is 0.9, the weight decay coefficient is 5×10⁻⁴, the mini-batch size is 256, and the network layers are initialized with a Gaussian distribution of mean 0 and standard deviation 0.1.
S213, according to the size of the sow in the depth image, setting the anchor areas to 96², 128² and 160² pixels and the aspect ratios to 1:1, 1:3 and 3:1;
s22, training an HMM height sequence classification model; extracting the height sequences of the HMM model training set and verification set, each height sequence having 4 rows corresponding respectively to the trunk, the tail and the upper and lower sides of the pig body; setting the number of kernel functions, the number of hidden states and the maximum number of iterations, and subtracting the mean of each height sequence as preprocessing; the experiment is repeated 10 times and the model with the highest accuracy is retained.
The number of kernel functions and the number of hidden states are both set to 2, the maximum number of iterations is 500, and the algorithm terminates early when the error falls below 10⁻⁶.
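As a rough illustration of this HMM configuration, the sketch below uses the hmmlearn library (the patent does not name its HMM toolbox): 2 hidden states, 2 Gaussian mixture components per state, Baum-Welch training capped at 500 iterations with tolerance 10⁻⁶, and per-segment mean subtraction. Training one model per class and comparing log-likelihoods is an assumption about the decision rule; the patent only states the model settings.

    import numpy as np
    from hmmlearn.hmm import GMMHMM

    def preprocess(segment):
        """segment: (T, 4) array of trunk, tail, upper-side and lower-side heights."""
        return segment - segment.mean(axis=0, keepdims=True)

    def fit_class_model(segments):
        """Fit one GMM-HMM on all height sequences of a single class."""
        X = np.vstack([preprocess(s) for s in segments])
        lengths = [len(s) for s in segments]
        model = GMMHMM(n_components=2, n_mix=2, covariance_type="diag",
                       n_iter=500, tol=1e-6, random_state=0)
        model.fit(X, lengths)
        return model

    def classify(segment, model_conv, model_unconv):
        """Assign a suspected segment to the class whose model gives the higher log-likelihood."""
        x = preprocess(segment)
        return "conversion" if model_conv.score(x) > model_unconv.score(x) else "unconverted"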
Preferably, the specific process of step S3 is as follows:
s31, acquiring a sow posture sequence and a candidate region, wherein the specific process is as follows,
s311, inputting the depth video image into a fast R-CNN model frame by frame for detection;
s312, selecting the gesture with the highest probability in each frame to form a gesture sequence, wherein the gesture sequence is used for detecting the suspected conversion fragments in the step S32 and classifying the single gesture fragments in the step S5, and simultaneously, the first five detection frames with the highest probability are reserved as candidate areas and used for connecting the candidate areas in the step S33;
s32: according to the statistical result of the posture conversion duration, a sliding window is used to count the number of posture changes within each window; segments in which the count is greater than 3 are selected as suspected conversion segments, and the remaining segments are single-posture segments; the sliding window has a length of 20 and a step of 1;
s33: for the candidate regions in each frame of the suspected conversion segment, first calculating the connection score between candidate regions of adjacent frames according to formula (1),

s(R_t, R_{t+1}) = φ(R_t) + φ(R_{t+1}) + λ·ov(R_t, R_{t+1})    (1)

where φ(R_t) and φ(R_{t+1}) respectively denote the maximum probability over the four posture classes of the candidate region R_t of frame t and the candidate region R_{t+1} of frame t+1, ov(R_t, R_{t+1}) denotes the overlap (intersection over union) between R_t and R_{t+1}, and λ is a coefficient taken as 2;

s34: then calculating the optimal connection sequence of the per-frame candidate regions of the suspected conversion segment according to formula (2), namely constructing the sow positioning pipeline,

R̂ = argmax_{R̄} Σ_{t=1}^{T-1} s(R_t, R_{t+1})    (2)

where R̄ = (R_1, …, R_T) ranges over the set of connection sequences of candidate regions and T is the number of frames in the segment.
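Steps S33 and S34 amount to a Viterbi-style dynamic programme over the per-frame top-5 boxes. The sketch below is illustrative only (names such as link_tube are not from the patent) and assumes ov(·,·) is the intersection over union; it returns, for each frame, the index of the box chosen for the positioning pipeline.

    import numpy as np

    def iou(a, b):
        """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / (area(a) + area(b) - inter + 1e-9)

    def link_tube(boxes, probs, lam=2.0):
        """boxes: per frame, the 5 candidate boxes; probs: their max class probabilities.
        Maximizes the sum of s(R_t, R_{t+1}) over adjacent frames (formulas (1)-(2))."""
        T = len(boxes)
        best = np.zeros(len(probs[0]))                 # best cumulative score ending at each box
        back = [np.zeros(len(p), dtype=int) for p in probs]
        for t in range(1, T):
            cur = np.full(len(probs[t]), -np.inf)
            for j in range(len(probs[t])):
                for i in range(len(probs[t - 1])):
                    s = best[i] + probs[t - 1][i] + probs[t][j] \
                        + lam * iou(boxes[t - 1][i], boxes[t][j])
                    if s > cur[j]:
                        cur[j], back[t][j] = s, i
            best = cur
        path = [int(np.argmax(best))]                  # trace the best connection sequence back
        for t in range(T - 1, 0, -1):
            path.append(int(back[t][path[-1]]))
        return path[::-1]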
Preferably, the specific process of step S4 is as follows:
s41: applying the maximum inter-class variance method (Otsu) frame by frame inside the detection boxes of the positioning pipeline to segment the sow from the background, detecting walls in the Otsu segmentation result by Hough transform and removing them, applying a morphological closing operation to break the adhesion between the sow and the piglets, and finally mapping the segmentation result back onto the original depth video image;
s42: rotating the segmentation results to a uniform orientation by bicubic interpolation; connecting the two end points at the head and the tail into a straight line and drawing perpendiculars at one quarter and three quarters of this line, thereby dividing the sow body into three parts, namely the head, the trunk and the tail; distinguishing the head from the tail of the sow according to the contour curvature, then calculating the average heights of the trunk, the tail and the upper and lower sides of the sow body, and finally combining the results of all frames to form the height sequence of the suspected conversion segment;
s43: inputting the calculated height sequence into an HMM model, estimating posterior probability through a forward-backward algorithm, and dividing the height sequence into a posture conversion segment or an unconverted segment.
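A hedged sketch of the per-frame segmentation in S41 is given below (OpenCV, not the patent's code): Otsu thresholding inside the tracked detection box, ignoring near-saturated pixels when the threshold is computed, followed by a morphological closing with a disc of radius 10 pixels. The wall-removal step via Hough line detection is omitted, and whether the sow falls above or below the Otsu threshold after preprocessing is an assumption that may need inverting.

    import cv2
    import numpy as np

    def segment_sow(gray_frame, box):
        """gray_frame: 8-bit preprocessed depth image; box: (x1, y1, x2, y2) from the pipeline."""
        x1, y1, x2, y2 = box
        roi = gray_frame[y1:y2, x1:x2]
        valid = roi[roi <= 250].reshape(1, -1)   # drop impulse noise before Otsu (assumes some valid pixels remain)
        thr, _ = cv2.threshold(valid, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        mask = ((roi > thr) & (roi <= 250)).astype(np.uint8) * 255   # assumed polarity: sow above threshold
        disc = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (21, 21))  # disc of radius ~10 px
        mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, disc)
        full = np.zeros_like(gray_frame)
        full[y1:y2, x1:x2] = mask                # map the result back onto the frame
        return full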
Preferably, the specific process of step S5 is as follows:
s51: combining the unconverted fragments with the single-gesture fragments to obtain combined single-gesture fragments;
s52: classifying the combined single-gesture fragment categories according to the gesture with the largest proportion in the gesture sequence;
s53: and classifying the gesture conversion fragments according to the categories of the single gesture fragments to form a final prediction result.
Preferably, the final prediction result is a start frame and an end frame of various poses and pose transitions.
Preferably, the training set refers to a data set used for training the Faster R-CNN and HMM models; the verification set refers to a data set used for optimizing network structure parameters and model parameters in the training process, and an optimal model is selected; the test set is used for testing the performance of the model and evaluating the performance.
Preferably, the single posture segment refers to a segment of a sow which keeps the same posture, the suspected conversion segment refers to a segment of the sow which can possibly undergo posture conversion, and the suspected conversion segment is further divided into a posture conversion segment and an unconverted segment.
Preferably, the single gesture includes: standing, sitting, lying and lying on side, the posture transformation comprising: down conversion, up conversion and turn-over.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
(1) The invention establishes a lactating sow gesture conversion and identification video image library, wherein the database comprises various behavior gesture overlooking images in the daily life of a sow in a pig house scene, the shooting heights, the backgrounds, the scales and the like of all the images are different, and the database provides data support for later sow behavior analysis, algorithm design and the like;
(2) According to the invention, based on the Faster R-CNN and HMM models, a partial sow video image is adopted to train the Faster R-CNN sow gesture detection model and the HMM suspected conversion segment classification model, and the rest sow image is adopted as a verification set, so that the generalization performance of the model is improved, and the problem that the sow gesture conversion and identification are difficult under complex environments such as light, piglet shielding, different pig body sizes and the like is solved;
(3) On the basis of Faster R-CNN, the invention realizes automatic identification of the posture conversion of the lactating sow in the depth video image by fusing methods such as an HMM model, a Viterbi algorithm, otsu threshold segmentation and the like, and lays a foundation for subsequent research of identifying high-risk actions and welfare state evaluation of the sow;
(4) The sow behavior automatic detection and identification device is suitable for monitoring continuous and long-time sows, and is favorable for further performing sow behavior automatic detection and identification.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a lactating sow posture conversion recognition method based on fast R-CNN and HMM;
FIG. 2 is a schematic diagram of the invention for generating candidate regions and establishing positioning pipelines using the Faster R-CNN model;
FIG. 3 is a schematic view of the invention for dividing the area of the class 4 gesture of the sow;
FIG. 4 is a graph showing successful positioning rates at different cross-ratios of the present invention;
FIG. 5 is a diagram showing the comparison of the segmentation results according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
According to the lactating sow gesture conversion and identification method based on the fast R-CNN and the HMM, which is provided by the invention, the automatic identification of sow gesture conversion in a pig house scene is realized, and a basic guarantee is provided for further processing and intelligent analysis of maternal behaviors.
Referring to fig. 1, a flowchart of the present invention is shown, and step 1 is to collect video images and build a database. And 2, establishing a Faster R-CNN model and an HMM model. And 3, detecting the depth video image frame by using a fast R-CNN model, marking a suspected conversion fragment and a single gesture fragment, and constructing a sow positioning pipeline according to a candidate region in the suspected conversion fragment. And 4, extracting the heights of all parts of the sow body in the positioning pipeline, and classifying the height sequences of the suspected conversion fragments by adopting the HMM. And 5, merging the unconverted fragments and the single-gesture fragments, and classifying the merged single-gesture fragments and gesture conversion fragments. According to the method, under a Ubuntu14.04 operating system, a Caffe deep learning framework is built on a GPU hardware platform based on Nvidia GTX 980, fast R-CNN and HMM model training and testing are conducted, and automatic recognition of sow gesture conversion is completed under Matlab.
The method is concretely realized as follows:
step one, acquiring depth video images of sows, and establishing a sow gesture conversion identification video image library;
establishing a Faster R-CNN sow posture detection model and an HMM suspected conversion fragment classification model;
thirdly, detecting the depth video image frame by using a fast R-CNN, obtaining a sow gesture sequence and a candidate region, marking a suspected conversion segment and a single gesture segment in the gesture sequence, and constructing a sow positioning pipeline according to the candidate region by using a Viterbi algorithm;
dividing the image in the sow positioning pipeline frame by frame, calculating the average height of each part of the sow body to form a height sequence of a suspected conversion segment, and dividing the suspected conversion segment into unconverted segments or attitude conversion segments by adopting an HMM model according to the height sequence;
and fifthly, merging the unconverted fragments with the single posture fragments, classifying the merged single posture fragments and posture conversion fragments, and obtaining a sow posture conversion recognition result.
The first step is to collect depth video images of sows and establish a sow gesture conversion identification video image library, and specifically comprises the following steps:
1) The camera is fixed directly above the pig pen and adjusted to a suitable height so that the complete pen area is captured. It is connected to a computer image-acquisition system via USB, the overhead depth video images of the sows are acquired in real time and stored on a local hard disk, and 123 segments of overhead sow depth video are collected, each frame being 512×424 pixels; the collected sow video images are illustrated in fig. 2. 35 video segments are randomly selected as the training set, 30 as the verification set, and the remaining 58 as the test set.
2) Establish the Faster R-CNN training, verification and test sets. At most 5 depth images are randomly extracted from each video segment of the training set in which the sow posture does not change, yielding 1500 frames each of standing, sitting, lying and lying on side (6000 frames in total); each frame is rotated clockwise by 90, 180 and 270 degrees and mirrored horizontally and vertically, finally forming 36000 images as the Faster R-CNN training set. At most 5 depth images are randomly extracted from each such video segment of the verification set, yielding 1441 standing, 1416 sitting, 1344 lying and 1585 lying-on-side images as the Faster R-CNN verification set. At most 5 depth images are randomly extracted from each such video segment of the test set, yielding 967 standing, 940 sitting, 972 lying and 984 lying-on-side images as the Faster R-CNN test set. The training, verification and test data are manually annotated, i.e. the bounding box and posture category of the sow in each image are labeled.
3) And establishing an HMM training set, a verification set and a test set. Randomly selecting 60 sections of each image sequence with and without gesture conversion in a training set, and taking 120 sections as a training set of the HMM; and randomly selecting 60 sections of each image sequence with and without gesture conversion in the verification set, and taking 120 sections as the verification set of the HMM. All suspected transition fragments in the test set were used as test set for HMM. And (5) manually labeling the data of the training set, the verification set and the test set, namely labeling whether the sow has gesture conversion in the video segment.
4) Each image is preprocessed with median filtering using a 5×5 window, and the contrast is then improved with contrast-limited adaptive histogram equalization, with the tile size set to 16×16 and the clip coefficient to 0.01.
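For illustration, the preprocessing of step 4) could look roughly like this in OpenCV; the function name preprocess_depth is hypothetical, and OpenCV's clipLimit is not on the same scale as the MATLAB-style clip coefficient 0.01, so the value used below is only indicative.

    import cv2
    import numpy as np

    def preprocess_depth(frame16):
        """frame16: 512x424 16-bit depth frame; returns an 8-bit contrast-enhanced image."""
        smoothed = cv2.medianBlur(frame16, 5)                     # 5x5 median filter
        frame8 = cv2.convertScaleAbs(smoothed, alpha=255.0 / max(1, int(smoothed.max())))
        clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(16, 16))  # 16x16 tiles
        return clahe.apply(frame8)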
Establishing a fast R-CNN sow posture detection model and an HMM high-order classification model, wherein the method specifically comprises the following steps of:
1) Training of the Faster R-CNN model is based on the Caffe framework, with ZFNet as the network structure; the parameters are fine-tuned with stochastic gradient descent and back propagation, the maximum number of iterations is 9×10⁴, the learning rate is 10⁻³ for the first 6×10⁴ iterations and 10⁻⁴ for the last 3×10⁴ iterations, the impulse (momentum) is 0.9, the weight decay coefficient is 5×10⁻⁴, the mini-batch size is 256, and the network layers are initialized with a Gaussian distribution of mean 0 and standard deviation 0.1. According to the size of the sow in the depth image, the anchor areas are set to 96², 128² and 160² pixels and the aspect ratios to 1:1, 1:3 and 3:1. A detection result is considered correct when the intersection over union between the detection box and the manual annotation box exceeds the threshold of 0.7 and the categories agree.
2) The height sequences of the HMM training set and verification set are extracted with the method described in step four; each height sequence has 4 rows, corresponding respectively to the trunk, the tail and the upper and lower sides of the pig body; according to the statistical result of the posture conversion duration, each image sequence is 60 to 120 frames long, so the number of columns is 60 to 120, one column per frame.
3) Training of the HMM model is based on the Baum-Welch algorithm; the number of kernel functions and the number of hidden states are both set to 2, the maximum number of iterations is 500, the algorithm terminates early when the error falls below 10⁻⁶, and the mean of each height sequence is subtracted as preprocessing. The experiment is repeated 10 times and the model with the highest accuracy is retained.
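For illustration only, the nine anchor shapes implied by areas of 96², 128² and 160² pixels combined with aspect ratios 1:1, 1:3 and 3:1 can be derived as follows; this is a sketch of the geometry, not the patent's RPN code.

    import itertools, math

    def anchor_shapes(scales=(96, 128, 160), ratios=(1.0, 1/3, 3.0)):
        """Return the (width, height) of each anchor; ratio is taken as height / width."""
        shapes = []
        for s, r in itertools.product(scales, ratios):
            w = math.sqrt(s * s / r)
            h = w * r
            shapes.append((round(w), round(h)))
        return shapes

    print(anchor_shapes())   # nine pairs, e.g. (96, 96), (166, 55) and (55, 166)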
Step three, detecting depth video images frame by using a fast R-CNN, obtaining a sow gesture sequence and a candidate region, marking a suspected conversion segment and a single gesture segment in the gesture sequence, and constructing a sow positioning pipeline according to the candidate region by using a Viterbi algorithm, wherein the method specifically comprises the following steps:
1) The depth video images are input frame by frame into the trained Faster R-CNN model for detection; the posture with the highest probability in each frame is taken to form the posture sequence used later for detecting suspected conversion segments and classifying single-posture segments, and the five detection boxes with the highest probabilities are kept as candidate regions for building the positioning pipeline. During frame-by-frame detection, the Faster R-CNN model produces 200 detection boxes per image; once a suspected conversion segment has been obtained, the five boxes with the highest probabilities are retained as candidate regions.
2) Correcting the gesture sequence by median filtering with the length of 5 and the step length of 1, calculating the change times of the gesture sequence in each window by adopting a sliding window with the length of 20 and the step length of 1, marking fragments with the change times of more than 3 as suspected conversion fragments, and marking the rest fragments as single gesture fragments.
3) For the five highest-probability candidate regions in each frame of the suspected conversion segment, the connection score between candidate regions of adjacent frames is calculated according to formula (1), where φ(R_t) and φ(R_{t+1}) respectively denote the maximum probability over the four posture classes of the candidate region R_t of frame t and the candidate region R_{t+1} of frame t+1, ov(R_t, R_{t+1}) denotes the overlap (intersection over union) between R_t and R_{t+1}, and λ is a coefficient taken as 2;

s(R_t, R_{t+1}) = φ(R_t) + φ(R_{t+1}) + λ·ov(R_t, R_{t+1})    (1)

4) The optimal connection sequence of the per-frame candidate regions of the suspected conversion segment is then calculated according to formula (2), i.e. the sow positioning pipeline is constructed,

R̂ = argmax_{R̄} Σ_{t=1}^{T-1} s(R_t, R_{t+1})    (2)

where R̄ = (R_1, …, R_T) ranges over the set of connection sequences of candidate regions and T is the number of frames in the segment.
Referring to FIG. 2, a schematic diagram of generating candidate regions and building the positioning pipeline with the Faster R-CNN model is shown. Faster R-CNN generates the five highest-probability candidate regions in each of three consecutive frames (T = 3); the candidate boxes are ordered from the highest to the lowest probability as red, green, yellow, blue and purple. The red candidate box of the second frame is located incorrectly, but by optimizing the connection sequence in step S34 the positioning pipeline with the largest average connection score can be found.
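Returning to step 2) above, the detection of suspected conversion segments can be sketched as a median filter over the frame-wise posture labels followed by a 20-frame sliding window with step 1; windows containing more than 3 label changes are flagged. Merging overlapping flagged windows into segments, as done below, is an assumption, since the patent only gives the window length, step and change-count threshold.

    import numpy as np
    from scipy.signal import medfilt

    def suspected_segments(postures, win=20, max_changes=3):
        """postures: integer posture label per frame (e.g. 0 standing, 1 sitting, 2 lying, 3 lying on side)."""
        smoothed = medfilt(postures, kernel_size=5).astype(int)   # length-5 median filter
        flagged = np.zeros(len(smoothed), dtype=bool)
        for start in range(len(smoothed) - win + 1):
            window = smoothed[start:start + win]
            if np.count_nonzero(np.diff(window)) > max_changes:
                flagged[start:start + win] = True
        segments, t = [], 0                                       # merge flagged frames into (start, end) pairs
        while t < len(flagged):
            if flagged[t]:
                s = t
                while t < len(flagged) and flagged[t]:
                    t += 1
                segments.append((s, t))
            else:
                t += 1
        return smoothed, segments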
Dividing the image in the positioning pipeline frame by frame, calculating the average height of each part of the sow body to form a height sequence of suspected conversion fragments, and dividing the suspected conversion fragments into unconverted fragments or attitude conversion fragments by adopting HMM according to the height sequence, wherein the method specifically comprises the following steps:
1) And adopting a maximum inter-class variance method (Otsu) in the detection frame by frame, and not counting pixels with gray values larger than 250 when calculating the segmentation threshold value in order to reduce the influence of impulse noise on the segmentation effect.
2) The contour of the Otsu segmentation result is extracted and straight lines are detected by Hough transform. If a line lies on an edge of the detection box and has the same length as that edge, the edge is regarded as a detected wall; that edge is then repeatedly dilated by one pixel, after each dilation a pixel-wise AND with the Otsu result is computed and the difference in pixel count from before the dilation is calculated; the loop stops when this difference is smaller than 80% of the line length, and the pixels covered so far are regarded as the wall and removed.
3) The sow and the piglet are disconnected by closing operation, the selected structural element is round, and the radius is 10 pixels.
4) And uniformly rotating the segmentation result by using a bicubic interpolation method. And connecting the two points at the head and the tail to form a straight line, and respectively making vertical lines at one quarter and three quarters of the straight line to divide the sow body into three parts, namely a head part, a body part and a tail part. And distinguishing the head and the tail of the sow according to the arc degree, and then respectively calculating the average heights of the body part, the tail and the upper side and the lower side of the sow body. And finally combining the results of each frame to form the height sequences of the trunk part, the tail part and the upper and lower sides of the sow body of the suspected conversion fragment.
5) And inputting the height sequence into a trained HMM model, estimating posterior probability through a forward-backward algorithm, and classifying suspected conversion fragments.
See fig. 3 for the region division of the four classes of sow posture: the green region is the head, the purple region is the trunk, the blue region is the tail, the red region is the upper side of the pig body and the gray region is the lower side of the pig body; lines a and b divide the head, the trunk and the tail, and lines c and d divide the pig body into its upper and lower sides.
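Step 4) of this part can be sketched as below, assuming the segmentation mask has already been rotated so that the body axis is horizontal. The head/tail disambiguation by contour curvature is omitted (the head is simply assumed to lie on the left), and the exact definition of the upper and lower sides follows Fig. 3, which is only available as an image, so the region split here is an interpretation.

    import numpy as np

    def region_heights(depth, mask):
        """depth: rotated depth frame; mask: rotated binary sow mask of the same shape.
        Returns the four per-frame features: trunk, tail, upper-side and lower-side heights."""
        ys, xs = np.nonzero(mask)
        x1, x2 = xs.min(), xs.max()
        q1, q3 = x1 + (x2 - x1) // 4, x1 + 3 * (x2 - x1) // 4   # quarter points of the head-tail line
        y_mid = (ys.min() + ys.max()) // 2

        cols = np.arange(depth.shape[1])[None, :]
        rows = np.arange(depth.shape[0])[:, None]
        trunk = (cols >= q1) & (cols <= q3)
        tail = cols > q3                                         # head assumed on the left
        upper = trunk & (rows < y_mid)
        lower = trunk & (rows >= y_mid)

        def mean_h(region):
            vals = depth[region & (mask > 0)]
            return float(vals.mean()) if vals.size else 0.0

        return [mean_h(trunk), mean_h(tail), mean_h(upper), mean_h(lower)]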
Combining the unconverted fragments with the single posture fragments, and finally classifying the combined single posture fragments and posture conversion fragments to obtain a sow posture conversion recognition result, wherein the method specifically comprises the following steps of:
1) The unconverted segments are marked as single-pose segments and merged with other single-pose segments.
2) And classifying the combined single-gesture fragment categories according to the gesture with the largest proportion in the gesture sequence, namely counting the occurrence times of each gesture in the gesture sequence, and selecting the gesture with the largest occurrence times as the category of the single-gesture fragment.
3) The posture conversion segments are classified according to the categories of the neighboring single-posture segments. For example, in the interval 5:28:00 to 5:43:00 in fig. 5, the sub-interval 5:32:00 to 5:32:20 is a posture conversion; since the single-posture segment before the conversion is standing and the one after it is lying prone, the segment is classified as down conversion 1, forming part of the final result.
The single gesture includes: standing, sitting, lying down and lying on side; the gesture conversion includes: down conversion 1, down conversion 2, up conversion and turn-over, the definition is shown in Table 1
TABLE 1 introduction of the attitude and attitude transition of lactating sows
(Table 1 is provided only as an image in the original document.)
The experimental results of this experiment are described in detail below:
The Faster R-CNN model is mainly evaluated with a confusion matrix; the accuracy of the HMM model is evaluated with formula (3):

accuracy = (TP_SCS + TN_SCS) / Sum_SCS    (3)

where TP_SCS and TN_SCS are the numbers of true positives and true negatives among the suspected conversion segments, and Sum_SCS is the total number of suspected conversion segments.
The detection performance of the Faster R-CNN model and the classification performance of the HMM model were computed; the statistics are shown in tables 2 and 3. STD denotes standing, SIT sitting, VTL ventral (sternal) lying, LTL lateral lying, PCS posture conversion segment and UCS unconverted segment.
TABLE 2 Classification confusion matrix for Faster R-CNN model on test set (IoU > 0.7)
(Table 2 is provided only as an image in the original document.)
For the evaluation of the final result of the algorithm, the invention adopts the successful localization rate and the recall rate. The indices are defined in formula (4), where n_{IoU>α} denotes the number of video frames in which the intersection over union between the detection box and the manual annotation box is greater than α, and s_{R@1,IoU>β} denotes the posture conversion segments whose category is consistent between the algorithmic and manual segmentations and whose intersection over union is greater than β. α and β take values in [0, 1]; here α = 0.7 and β = 0.5 are chosen. The successful localization rate reflects the accuracy of target tracking and is used as the evaluation index for the pipeline extraction result; the recall rate jointly reflects the accuracy of segmentation-point localization and segment classification and is used as the evaluation index for the segmentation result. Fig. 4 shows the successful localization rate at different intersection-over-union thresholds: 97.40% at α = 0.7 and 94.83% at α = 0.8.
(Formula (4) is provided only as an image in the original document.)
The segmentation performance on the test set (58 sow depth video segments) was computed with the above formulas; the statistics are shown in table 4 and fig. 5. In table 4, DM1 denotes down conversion 1, DM2 down conversion 2, AM up conversion and RL turn-over. In fig. 5, Truth denotes the manually annotated segmentation, FRCNN the single-frame detection result based on Faster R-CNN, and Ours the method of the invention.
TABLE 3 HMM model classification confusion matrix
(Table 3 is provided only as an image in the original document.)
The average precision and average recall rate of the Faster R-CNN model on four types of gestures are 98.51% and 92.39%, respectively, and the detection speed is 0.058 s/frame. The accuracy of the HMM model was 94.91%. The successful localization rate of the final result of the algorithm is 97.40% and the recall rate is 87.84%.
Table 4 gesture transformation classification confusion matrix
(Table 4 is provided only as an image in the original document.)
The lactating sow posture conversion recognition method based on Faster R-CNN and HMM provided by the invention has been described in detail above. Specific examples are used herein to illustrate the principle and implementation of the invention, and the description of the embodiments is only intended to help understand the method and its core idea. At the same time, those skilled in the art may make changes to the specific implementation and scope of application in accordance with the idea of the invention; in summary, the contents of this description should not be construed as limiting the invention.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

Claims (9)

1. A lactating sow posture conversion recognition method based on FasterR-CNN and HMM is characterized by comprising the following steps:
s1, acquiring depth video images of sows, and establishing a sow gesture conversion identification video image library;
s2, establishing a FasterR-CNN sow posture detection model and an HMM high-order sequence classification model;
s3, detecting the depth video image frame by using FasterR-CNN, obtaining a sow gesture sequence, marking suspected conversion fragments and single gesture fragments in the sow gesture sequence, selecting candidate regions in the suspected conversion fragments, and constructing a sow positioning pipeline according to the candidate regions;
s4, dividing the image in the sow positioning pipeline frame by frame, calculating the average height of each part of the sow body to form a height sequence of a suspected conversion segment, and dividing the suspected conversion segment into unconverted segments or attitude conversion segments by adopting an HMM height sequence classification model according to the height sequence; the specific process of the S4 is as follows:
s41: dividing sows from the background by applying a maximum inter-class variance method Otsu in a detection frame by frame, detecting walls in the Otsu division result through Hough transformation, removing the walls, closing the positioning pipeline, disconnecting the adhesion between the sows and piglets, and finally mapping the division result to an original depth video image;
s42: uniformly rotating the segmentation result by a bicubic interpolation method; connecting the two points at the head and the tail to form a straight line, respectively making vertical lines at one quarter and three quarters of the straight line, and dividing the sow body into three parts, namely a head part, a body part and a tail part; distinguishing the head and the tail of the sow according to the arc degree, then respectively calculating the average heights of the body part, the tail and the upper side and the lower side of the sow body, and finally combining the results of all frames to form a height sequence of a suspected conversion fragment;
s43: inputting the calculated height sequence into an HMM model, estimating posterior probability through a forward-backward algorithm, and dividing the height sequence into a posture conversion segment or an unconverted segment;
s5, combining the unconverted fragments and the single-posture fragments to form combined single-posture fragments, and finally classifying the combined single-posture fragments and the combined posture conversion fragments to obtain a sow posture conversion recognition result.
2. The lactating sow posture conversion recognition method based on FasterR-CNN and HMM according to claim 1, wherein the specific process of the step S1 is as follows:
s11, data acquisition: acquiring a overlooking sow depth video image in real time;
s12, constructing a database: removing video segments with sow body loss and camera shake, and constructing a training set, a verification set and a test set;
s13, preprocessing the depth video image frame by adopting median filtering, and improving the image contrast by a method of limiting contrast self-adaptive histogram equalization;
s14, randomly extracting at most 5 depth video images from each video segment with unchanged sow posture to respectively obtain m images of standing, sitting, lying and lying on side, 4m images in total, performing clockwise rotation by 90 degrees, 180 degrees and 270 degrees and horizontal and vertical mirror image amplification respectively, and finally forming 24m images serving as a training set of FasterR-CNN; randomly extracting at most 5 depth images from each video segment with unchanged sow posture, and respectively obtaining n standing, sitting, lying and lateral lying images as a FasterR-CNN verification set; randomly extracting at most 5 depth images from each video segment with unchanged sow posture to respectively obtain t standing, sitting, lying and lateral lying images as a FasterR-CNN test set; manually labeling the training set, the verification set and the test set data, namely labeling the boundary boxes and the gesture categories of the sows in the images;
s15, randomly selecting a plurality of segments of an image sequence with and without gesture conversion in a training set to serve as a training set of an HMM model; randomly selecting a plurality of segments of the image sequence with and without gesture conversion in the verification set to serve as a verification set of the HMM model; taking all suspected conversion fragments in the test set as a test set of the HMM model; and (5) manually labeling the data of the training set, the verification set and the test set, namely labeling whether the sow has gesture conversion in the video segment.
3. The lactating sow posture conversion recognition method based on FasterR-CNN and HMM according to claim 1, wherein the specific process of the step S2 is as follows:
s21, training a FasterR-CNN model, wherein the specific process is as follows,
s211, selecting ZFNet as a network structure, and performing model training through a Caffe frame;
s212, fine-tuning parameters by adopting a random gradient descent method and a back propagation algorithm, and initializing a network layer by adopting Gaussian distribution;
s213, setting the anchor areas to 96², 128² and 160² pixels and the aspect ratios to 1:1, 1:3 and 3:1 according to the size of the sow in the depth image;
s22, training an HMM height sequence classification model; extracting height sequences of an HMM model training set and a verification set, wherein the number of lines of each section of height sequence is 4, and the height sequences respectively correspond to the upper side and the lower side of the trunk part, the tail part and the pig body; setting the number of kernel functions, the number of hidden states and the maximum iteration number, and subtracting the average value of each section of height sequence as pretreatment; the test was repeated 10 times, and the model with the highest accuracy was retained.
4. The lactating sow posture conversion recognition method based on FasterR-CNN and HMM according to claim 1, wherein the specific process of the step S3 is as follows:
s31, acquiring a sow posture sequence and a candidate region, wherein the specific process is as follows,
s311, inputting the depth video image into a FasterR-CNN model frame by frame for detection;
s312, selecting the gesture with the highest probability in each frame to form a gesture sequence, wherein the gesture sequence is used for detecting the suspected conversion fragments in the step S32 and classifying the single gesture fragments in the step S5, and simultaneously, the first five detection frames with the highest probability are reserved as candidate areas and used for connecting the candidate areas in the step S33;
s32: calculating the change times of the gesture sequences in each window by adopting a sliding window according to the statistical result of the gesture conversion duration, selecting fragments with the change times more than 3 as suspected conversion fragments, and using the rest fragments as single gesture fragments;
s33: for the candidate regions in each frame of the suspected conversion segment, first calculating the connection score between candidate regions of adjacent frames according to formula (1),

s(R_t, R_{t+1}) = φ(R_t) + φ(R_{t+1}) + λ·ov(R_t, R_{t+1})    (1)

where φ(R_t) and φ(R_{t+1}) respectively denote the maximum probability over the four posture classes of the candidate region R_t of frame t and the candidate region R_{t+1} of frame t+1, ov(R_t, R_{t+1}) denotes the overlap between R_t and R_{t+1}, and λ is a coefficient taken as 2;

s34: then calculating the optimal connection sequence of the per-frame candidate regions of the suspected conversion segment according to formula (2), namely constructing the sow positioning pipeline,

R̂ = argmax_{R̄} Σ_{t=1}^{T-1} s(R_t, R_{t+1})    (2)

where R̄ = (R_1, …, R_T) ranges over the set of connection sequences of candidate regions.
5. The lactating sow posture conversion recognition method based on FasterR-CNN and HMM according to claim 1, wherein the specific process of the step S5 is as follows:
s51: combining the unconverted fragments with the single-gesture fragments to obtain combined single-gesture fragments;
s52: classifying the combined single-gesture fragment categories according to the gesture with the largest proportion in the gesture sequence;
s53: and classifying the gesture conversion fragments according to the categories of the single gesture fragments to form a final prediction result.
6. The method for recognizing the posture switching of the lactating sow based on FasterR-CNN and HMM according to claim 5, wherein the final prediction result is a start frame and an end frame of various postures and posture switching.
7. The method for identifying the posture transition of the lactating sow based on FasterR-CNN and HMM according to claim 2, wherein the training set is a data set for training the FasterR-CNN and HMM models; the verification set refers to a data set used in the training process for optimizing network structure parameters and model parameters and selecting the optimal model; and the test set is used for testing and evaluating the performance of the model.
8. The method for recognizing the posture conversion of the lactating sow based on FasterR-CNN and HMM according to claim 1, wherein the single posture segment refers to a segment of the sow which keeps the same posture, the suspected conversion segment refers to a segment of the sow which can possibly undergo posture conversion, and the suspected conversion segment is further divided into a posture conversion segment and an unconverted segment.
9. The method for identifying the posture conversion of the lactating sow based on FasterR-CNN and HMM according to claim 1, wherein the single posture comprises: standing, sitting, lying and lying on side, the posture transformation comprising: down conversion, up conversion and turn-over.
CN201910041539.4A 2019-01-16 2019-01-16 Lactating sow posture conversion recognition method based on Faster R-CNN and HMM Active CN109711389B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910041539.4A CN109711389B (en) 2019-01-16 2019-01-16 Lactating sow posture conversion recognition method based on Faster R-CNN and HMM

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910041539.4A CN109711389B (en) 2019-01-16 2019-01-16 Lactating sow posture conversion recognition method based on Faster R-CNN and HMM

Publications (2)

Publication Number Publication Date
CN109711389A CN109711389A (en) 2019-05-03
CN109711389B (en) 2023-06-23

Family

ID=66262112

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910041539.4A Active CN109711389B (en) 2019-01-16 2019-01-16 Lactating sow posture conversion recognition method based on Faster R-CNN and HMM

Country Status (1)

Country Link
CN (1) CN109711389B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110309786B (en) * 2019-07-03 2023-04-07 华南农业大学 Lactating sow posture conversion identification method based on depth video
CN110598658B (en) * 2019-09-18 2022-03-01 华南农业大学 Convolutional network identification method for sow lactation behaviors
CN111242021B (en) * 2020-01-10 2022-07-29 电子科技大学 Distributed optical fiber vibration signal feature extraction and identification method
CN111353416B (en) * 2020-02-26 2023-07-07 广东温氏种猪科技有限公司 Gesture detection method, system and storage medium based on livestock three-dimensional measurement
CN111967399A (en) * 2020-08-19 2020-11-20 辽宁科技大学 Improved fast RCNN behavior identification method
CN112163510B (en) * 2020-09-25 2022-04-22 电子科技大学 Human body action classification recognition method based on multi-observation variable HMM model
CN116070152B (en) * 2023-03-30 2023-06-09 北京徐工汉云技术有限公司 Excavator workload identification method and device based on multidimensional operation characteristics

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6751354B2 (en) * 1999-03-11 2004-06-15 Fuji Xerox Co., Ltd Methods and apparatuses for video segmentation, classification, and retrieval using image class statistical models
US9164589B2 (en) * 2011-11-01 2015-10-20 Intel Corporation Dynamic gesture based short-range human-machine interaction
CN108830144B (en) * 2018-05-03 2022-02-22 华南农业大学 Lactating sow posture identification method based on improved Faster-R-CNN

Also Published As

Publication number Publication date
CN109711389A (en) 2019-05-03

Similar Documents

Publication Publication Date Title
CN109711389B (en) Lactating sow posture conversion recognition method based on Faster R-CNN and HMM
CN108830144B (en) Lactating sow posture identification method based on improved Faster-R-CNN
CN110059558B (en) Orchard obstacle real-time detection method based on improved SSD network
CN107895367B (en) Bone age identification method and system and electronic equipment
CN111178197B (en) Mass R-CNN and Soft-NMS fusion based group-fed adherent pig example segmentation method
CN107610087B (en) Tongue coating automatic segmentation method based on deep learning
Lines et al. An automatic image-based system for estimating the mass of free-swimming fish
CN107133601A A pedestrian re-identification method based on generative adversarial network image super-resolution technology
CN109544592B (en) Moving object detection algorithm for camera movement
CN105160310A (en) 3D (three-dimensional) convolutional neural network based human body behavior recognition method
CN105022982A (en) Hand motion identifying method and apparatus
CN114241548A (en) Small target detection algorithm based on improved YOLOv5
US20210383149A1 (en) Method for identifying individuals of oplegnathus punctatus based on convolutional neural network
CN110532850B (en) Fall detection method based on video joint points and hybrid classifier
TW202004657A (en) Colonoscopy image computer-aided recognition system and method
CN109063643B (en) Facial expression pain degree identification method under condition of partial hiding of facial information
CN110363218B (en) Noninvasive embryo assessment method and device
CN108629370A (en) A kind of classification and identification algorithm and device based on depth confidence network
CN110245587B (en) Optical remote sensing image target detection method based on Bayesian transfer learning
CN109003275A (en) The dividing method of weld defect image
CN113435355A (en) Multi-target cow identity identification method and system
CN116740539A (en) Visual SLAM method and system based on lightweight target detection network
CN116109812A (en) Target detection method based on non-maximum suppression threshold optimization
CN110309786B (en) Lactating sow posture conversion identification method based on depth video
CN111178405A (en) Similar object identification method fusing multiple neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant