
Feeding behavior identification method based on posture estimation of group-housed nursery pigs

Info

Publication number
CN117496591A
Authority
CN
China
Prior art keywords
pigs
health
stage
pig
pictures
Prior art date
Legal status
Pending
Application number
CN202311355490.2A
Other languages
Chinese (zh)
Inventor
胡宇航 (Hu Yuhang)
房俊龙 (Fang Junlong)
戴百生 (Dai Baisheng)
沈维政 (Shen Weizheng)
李然 (Li Ran)
Current Assignee
Northeast Agricultural University
Original Assignee
Northeast Agricultural University
Priority date
Filing date
Publication date
Application filed by Northeast Agricultural University
Priority to CN202311355490.2A
Publication of CN117496591A
Status: Pending


Abstract

The invention relates to a feeding behavior identification method based on posture estimation of group-housed nursery pigs. YOLOv8 is used to detect top-view video frames of the group-housed nursery pigs and obtain the positions of the individual pigs, the feeding troughs and the drinking port in the image. Pig posture estimation is then carried out with an improved HRNet algorithm; the network uses parallel multi-resolution branches with high- and low-resolution fusion and realizes posture estimation through keypoint heatmaps. Finally, a ray method is used to compute the positional relation between the individual pigs' keypoints and the feeding trough and drinking port regions, so that individual feeding and drinking behavior is identified. The method can rapidly and accurately identify the feeding and drinking behavior of individual pigs and provides a reference for further improving the welfare of group-housed nursery pig breeding.

Description

Feeding behavior identification method based on posture estimation of group-housed nursery pigs
Technical Field
The invention relates to a method for identifying the behavior of group-housed nursery pigs, and in particular to a feeding behavior identification method based on posture estimation of group-housed nursery pigs.
Background
To balance production cost and profit in pig farms, intelligent breeding has become a new direction of market development. In a modern intensive breeding environment, the production performance and health condition of pigs are often reflected in their behavior with a certain regularity. Because computer vision is non-contact and stress-free, behavior analysis of group-housed nursery pigs based on computer vision is feasible. The invention provides a feeding behavior identification method based on posture estimation of group-housed nursery pigs, which offers a useful exploration for further improving the breeding welfare of group-housed nursery pigs.
Disclosure of Invention
The invention aims to identify individual pigs in top-view monitoring video of a pig pen and provides a feeding behavior identification method based on posture estimation of group-housed nursery pigs.
In order to solve the above technical problem, the invention provides the following technical scheme: a feeding behavior identification method based on posture estimation of group-housed nursery pigs, comprising the following steps:
step 1: recording video of the group-housed nursery pigs under actual production conditions with a camera and extracting frames; using the COCO Annotator labeling tool, drawing rectangular boxes around the 2 feeding troughs, the 1 drinking port and each individual pig in the pictures, and constructing a first data set from the extracted pictures and the corresponding annotation files; using the COCO Annotator tool, labeling 6 keypoints on each individual pig in the pictures, where the visibility of a keypoint that is clearly visible and unoccluded is set to 2 (clearly visible) and the visibility of a keypoint that is present but occluded is set to 1 (occluded, not visible); the keypoint coordinates are defined in the image coordinate system, with the top-left corner of the image as the origin, the x-axis pointing right and the y-axis pointing down; the extracted pictures and the corresponding annotation files form a second data set;
step 2: training YOLOv8 on the first training set to obtain a target detection model; running the target detection model on the first test set to obtain the position information of the feeding troughs, the drinking port and the individual pigs;
step 3: adding a coordinate attention (CA) mechanism to the existing HRNet to obtain a CA-attention-based HRNet network; the CA module is added to the backbone feature-extraction part of the HRNet network so that the module further captures direction-sensitive and position-sensitive information and localizes the spatial directions;
step 4: training the HRNet network.
Based on this scheme, the feature-extraction part of the HRNet network is divided into 4 stages, namely a first stage (stem), a second stage (Stage 2), a third stage (Stage 3) and a fourth stage (Stage 4). The first stage consists of a regular 3×3 convolution, a depthwise convolution and a 1×1 convolution; it receives a picture of size 256×192×3 and performs coarse feature extraction on the input picture to obtain a 64×48×256 feature map; a 3×3 convolution then adjusts the number of channels of the 64×48×256 feature map to 32, and a 3×3 convolution with stride 2 downsamples the 64×48×256 feature map by a factor of 2 to generate a 32×24×64 subnetwork.
The second stage receives the feature maps at these two resolutions; CA attention is added to both branches at the beginning of the second stage, and after feature fusion two resolution-subnetwork feature maps of 64×48×32 and 32×24×64 are produced; the 32×24×64 map is further processed by a 3×3 convolution with stride 2 to generate a 16×12×128 feature map.
The third stage receives the feature maps at these three resolutions; CA attention is added to the three branches at the beginning of the third stage, and after feature fusion three resolution-subnetwork feature maps of 64×48×32, 32×24×64 and 16×12×128 are produced; the 16×12×128 map is further processed by a 3×3 convolution with stride 2 to generate an 8×6×256 feature map.
The fourth stage receives the four high- and low-resolution feature maps; CA attention is added to the four branches at the beginning of the fourth stage, and after feature fusion four resolution-subnetwork feature maps of 64×48×32, 32×24×64, 16×12×128 and 8×6×256 are produced; the low-resolution subnetworks are upsampled by factors of 8, 4 and 2 respectively so that they reach a resolution of 64×48, and feature fusion then generates a predicted keypoint heatmap output of size 64×48×17.
Based on the scheme, the process for training the HRNet network comprises the following steps:
(1) Initializing the HRNet network by adopting a He initialization method;
(2) Randomly dividing the group-housed nursery pig pictures of the second training set into K batches, each batch containing N pictures; randomly selecting 1 batch and applying rotation-based data enhancement to the N pictures in the batch, with rotation angles between 0 and 90 degrees;
the N generally takes numbers which can be divided by 8, such as 16, 32, 64, 128 and the like, and K is the number of training set pictures/N;
(3) Taking the N pictures in the randomly selected batch as input to the HRNet network, which outputs 6 predicted keypoint heatmaps for each group-housed nursery pig picture in the batch;
(4) For each picture, using the 6 predicted keypoint heatmaps and the 6 ground-truth labels, computing the posture-estimation loss of each group-housed nursery pig picture with the mean squared error, and taking the mean of the posture-estimation losses of the N pictures as the final loss value; the posture-estimation loss of each picture is:
Loss = (1/M) Σ_{j=1..6} ||H_j - GH_j||²
where M = 256×192×17, GH_j denotes the j-th ground-truth keypoint heatmap of a group-housed nursery pig picture in the randomly selected batch, H_j denotes the j-th predicted keypoint heatmap of that picture, and j = 1, 2, ..., 6;
(5) Repeating steps (3) and (4) until all K batches have trained the HRNet network once, then evaluating the second validation set with the HRNet network and computing the mean of the posture-estimation losses over the whole validation set using the method of step (4);
(6) Repeating steps (2)-(5) at least 500 times until the mean posture-estimation loss on the second validation set converges, obtaining the optimal posture-estimation model of the CA-HRNet network;
(7) Keypoint heatmap prediction extracts, from the 64×48×17 heatmap, the prediction information corresponding to the 6 keypoints; for each keypoint, the position of the maximum of its prediction information, i.e. the position with the highest prediction score, is taken as the predicted keypoint position, and mapping it back to the original image yields the keypoint coordinates in the original image;
(8) Step 5: inputting the pictures of the second test set into the HRNet network loaded with the optimal posture-estimation model; the network outputs 6 predicted keypoint heatmaps, which constitute the posture-estimation result for the group-housed nursery pigs;
(9) Extracting the feeding trough and drinking port position information predicted in step 2 and the predicted keypoint coordinates of the individual group-housed nursery pigs from step 5; using the ray method to judge whether the 3 keypoints of each pig's nose, left ear and right ear are inside the feeding trough region, and thereby estimating the feeding behavior of the pigs in the image; when all 3 keypoints of an individual pig are inside the trough region, the pig is considered to be feeding; when any one of the 3 keypoints is not inside the trough region, the pig is considered not to be feeding;
(10) Using the ray method to judge whether the 1 keypoint of each pig's nose is inside the drinking port region, and thereby estimating the drinking behavior of the pigs in the image; when this keypoint of an individual pig is inside the drinking port region, the pig is considered to be drinking; when this keypoint is not inside the drinking port region, the pig is considered not to be drinking;
The ray method casts a horizontal ray from any predicted keypoint of a pig: if the ray crosses the boundary of the predicted rectangular box of the trough or drinking port region an even number of times, the keypoint is outside the region; if the number of crossings is odd, the keypoint is inside the region.
Compared with the prior art, the invention has the following beneficial effects: the method uses YOLOv8 to detect top-view video frames of the group-housed nursery pigs and obtain the positions of the individual pigs, the feeding troughs and the drinking port in the image; pig posture estimation is carried out with an improved HRNet algorithm, in which the network uses parallel multi-resolution branches with high- and low-resolution fusion and realizes posture estimation through keypoint heatmaps; the ray method is used to compute the positional relation between the individual pigs' keypoints and the feeding trough and drinking port regions, so that individual feeding and drinking behavior is identified. The method can rapidly and accurately identify the feeding and drinking behavior of individual pigs and provides a reference for further improving the welfare of group-housed nursery pig breeding.
Drawings
FIG. 1 is a schematic illustration of the experimental procedure of the present invention;
FIG. 2 is a schematic diagram of a high resolution feature pyramid based architecture of the present invention;
FIG. 3 is a schematic diagram of the CA attention mechanism of the present invention;
FIG. 4 is a schematic representation of the ray method of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The feeding behavior identification method based on posture estimation of group-housed nursery pigs provided by this embodiment comprises the following steps:
step 1: recording video of the group-housed nursery pigs under actual production conditions with a camera; to avoid redundant, repeated data, extracting one frame from the recorded video every 300 seconds; during extraction, manually deleting pictures in which the targets are unclear because of ghosting, blurring or similar effects caused by pig movement; randomly shuffling the pictures and collecting 3500 RGB color images of group-housed nursery pigs in total. The 3500 pictures are labeled with the COCO Annotator tool: the 2 feeding troughs, the 1 drinking port and each individual pig in the pictures are marked with rectangular boxes, preferably using the smallest rectangular box enclosing each target; the annotations are saved as one json file per picture, recording information such as the name of each target in the picture, the position of its rectangular box and the number of channels, and the extracted images and the corresponding annotation files form the first data set. The 3500 pictures are also labeled with the COCO Annotator tool by marking the water-outlet end point of the drinking port and the 6 keypoints of each individual pig, namely nose, left ear, right ear, shoulder, hip and tail; during labeling, the visibility of a keypoint that is clearly visible and unoccluded is set to 2 (clearly visible) and the visibility of a keypoint that exists in the picture but is occluded is set to 1 (occluded, not visible); the keypoint coordinates are defined in the image coordinate system, with the top-left corner of the image as the origin, the positive x-axis pointing horizontally right from the origin and the positive y-axis pointing vertically down; a total of 2500 keypoint-coordinate json files corresponding to the pictures are obtained, and the extracted images and the corresponding annotation files form the second data set. The 3500 group-housed nursery pig pictures are randomly divided into a training set, a validation set and a test set of 2500, 500 and 500 pictures respectively;
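For illustration only, one keypoint annotation entry of the second data set could take the following COCO-style form (written here as a Python dict); the field names follow the standard COCO keypoint convention, and all numeric values are invented examples rather than data from the disclosure:

    # One illustrative COCO-style keypoint annotation for a single pig.
    # Keypoints are flat (x, y, v) triples in the order:
    # nose, left ear, right ear, shoulder, hip, tail; v = 2 visible, v = 1 occluded.
    pig_annotation = {
        "image_id": 1024,
        "category_id": 1,                       # "pig"
        "bbox": [412.0, 105.0, 230.0, 118.0],   # x, y, width, height of the rectangular box
        "num_keypoints": 6,
        "keypoints": [
            430.0, 120.0, 2,   # nose, clearly visible
            455.0, 110.0, 2,   # left ear, clearly visible
            460.0, 135.0, 2,   # right ear, clearly visible
            520.0, 125.0, 2,   # shoulder, clearly visible
            600.0, 130.0, 1,   # hip, occluded
            635.0, 128.0, 1,   # tail, occluded
        ],
    }
    # Coordinates are in the image coordinate system: origin at the top-left corner,
    # x increasing to the right, y increasing downward.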
step 2: training YOLOv8 on the first training set to obtain a target detection model; running the target detection model on the first test set to obtain the position information (rectangular-box vertex and center coordinates) of the feeding troughs, the drinking port and the individual pigs;
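As a minimal, non-limiting sketch of this detection step, the Ultralytics YOLOv8 API could be used roughly as follows; the dataset YAML name, weight file, class names and hyperparameters are assumptions for illustration, not part of the disclosure:

    from ultralytics import YOLO

    # Train a YOLOv8 detector on the first data set (feeding troughs, drinking port, pigs).
    model = YOLO("yolov8n.pt")                                   # pretrained weights (assumed)
    model.train(data="pig_trough_drinker.yaml", epochs=100, imgsz=640)

    # Predict on one test frame and read out the rectangular-box positions (vertices and centre).
    result = model("test_frame.jpg")[0]
    for box in result.boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()                    # box vertices
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0                # box centre point
        label = result.names[int(box.cls)]                       # e.g. "pig", "trough", "drinking_port"
        print(label, (x1, y1, x2, y2), (cx, cy))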
step 3: adding a Coordinate Attention (CA) mechanism to the existing HRNet to obtain a CA-attention-based HRNet network (CA-HRNet); a CA module is added to the backbone feature-extraction part of the HRNet network so as to further capture direction-aware and position-sensitive information and localize the spatial directions (coordinates) more precisely. The feature-extraction part of the HRNet network is divided into 4 stages, namely a first stage (stem), a second stage (Stage 2), a third stage (Stage 3) and a fourth stage (Stage 4). The stem consists of a regular 3×3 convolution, a depthwise convolution and a 1×1 convolution; it receives a picture of size 256×192×3 and performs coarse feature extraction on the input to obtain a 64×48×256 feature map; a 3×3 convolution then only adjusts the number of channels of the 64×48×256 map to 32, preserving the high-resolution representation, while a 3×3 convolution with stride 2 downsamples the 64×48×256 map by a factor of 2 to generate a 32×24×64 subnetwork. Stage 2 receives the feature maps at these two resolutions; CA attention is added to both branches at the beginning of Stage 2, and after feature fusion two resolution-subnetwork feature maps of 64×48×32 and 32×24×64 are produced; the 32×24×64 map is processed by a 3×3 convolution with stride 2 to generate a 16×12×128 feature map. Stage 3 receives the feature maps at these three resolutions; CA attention is added to the three branches at the beginning of Stage 3, and after feature fusion three resolution-subnetwork feature maps of 64×48×32, 32×24×64 and 16×12×128 are produced; the 16×12×128 map is processed by a 3×3 convolution with stride 2 to generate an 8×6×256 feature map. Stage 4 receives the four high- and low-resolution feature maps; CA attention is added to the four branches at the beginning of Stage 4, and after feature fusion four resolution-subnetwork feature maps of 64×48×32, 32×24×64, 16×12×128 and 8×6×256 are produced; the low-resolution subnetworks are upsampled by factors of 8, 4 and 2 respectively so that they reach a resolution of 64×48, and feature fusion then generates a predicted keypoint heatmap output of size 64×48×17. The network structure is shown in FIG. 2.
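A minimal PyTorch sketch of the stem and the first branching step described above is given below; the exact strides, paddings and normalization layers are assumptions chosen only so that the tensor sizes match those stated in the description:

    import torch
    import torch.nn as nn

    # Stem: regular 3x3 conv, depthwise 3x3 conv, 1x1 conv; 3x256x192 input -> 256x64x48 output.
    stem = nn.Sequential(
        nn.Conv2d(3, 64, 3, stride=2, padding=1),               # -> 64 x 128 x 96
        nn.BatchNorm2d(64), nn.ReLU(inplace=True),
        nn.Conv2d(64, 64, 3, stride=2, padding=1, groups=64),   # depthwise, -> 64 x 64 x 48
        nn.BatchNorm2d(64), nn.ReLU(inplace=True),
        nn.Conv2d(64, 256, 1),                                  # 1x1 conv, -> 256 x 64 x 48
        nn.BatchNorm2d(256), nn.ReLU(inplace=True),
    )
    to_branch_high = nn.Conv2d(256, 32, 3, padding=1)            # keep 64x48, adjust channels to 32
    to_branch_low = nn.Conv2d(256, 64, 3, stride=2, padding=1)   # 2x downsample -> 64 x 32 x 24

    x = torch.randn(1, 3, 256, 192)
    f = stem(x)
    print(f.shape, to_branch_high(f).shape, to_branch_low(f).shape)
    # torch.Size([1, 256, 64, 48]) torch.Size([1, 32, 64, 48]) torch.Size([1, 64, 32, 24])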
The CA attention mechanism is a novel and efficient attention mechanism: by embedding position information into channel attention, the network can capture information over a larger region during learning while avoiding heavy computation. To avoid the loss of position information introduced by 2D global pooling, CA attention decomposes channel attention into two parallel 1D feature encodings so that spatial coordinate information is integrated efficiently into the generated attention maps. Two 1D global pooling operations first aggregate the input features along the vertical and horizontal directions into two separate direction-aware feature maps; the two feature maps, each embedding information about one specific direction, are then encoded into two attention maps, each of which captures the long-range dependencies of the input feature map along one spatial direction. The position information is thus preserved in the generated attention maps, and both attention maps are then applied to the input feature map by multiplication to emphasize the representation of the regions of interest.
As shown in FIG. 3, the CA attention network structure first applies average pooling in the horizontal and vertical directions to obtain two 1D vectors, compresses the channels with a 1×1 convolution in the spatial dimension, encodes the spatial information in the vertical and horizontal directions with batch normalization and a non-linearity, splits the result, restores the number of channels to that of the input feature map with 1×1 convolutions, and finally applies normalized weighting. CA thus encodes channel relationships and long-range dependencies with precise position information; the operation comprises 2 steps, coordinate information embedding and coordinate attention generation:
(1) To encourage the attention module to capture long-range spatial interactions with precise position information, CA decomposes global pooling into a pair of 1D feature-encoding operations: the input feature map is pooled by global average pooling separately along the width and along the height, yielding a feature map for each of the two directions:
z_c^h(h) = (1/W) Σ_{0≤i<W} x_c(h, i)
z_c^w(w) = (1/H) Σ_{0≤j<H} x_c(j, w)
where z_c is the output of the c-th channel, x_c is the given input of the c-th channel, H is the feature-map height and W is the feature-map width;
(2) Coordinate attention generation: after the transformation in the information-embedding step, this part applies a convolutional transformation to the result. The feature maps with global receptive fields in the width and height directions are concatenated and fed to a shared 1×1 convolution module that reduces their dimension to C/r; the batch-normalized feature map F1 is then passed through a non-linear activation function to obtain a feature map f of shape 1×(W+H)×C/r:
f = δ(F_1([z^h, z^w]))
where [·,·] denotes concatenation along the spatial dimension, δ is the non-linear activation function and F_1 is a 1×1 convolutional transformation;
The feature map f is split back along the original height and width and transformed by 1×1 convolutions into feature maps F_h and F_w with the same number of channels as the input; after a Sigmoid activation function, the attention weight g^h of the feature map along the height and the attention weight g^w along the width are obtained:
g^h = σ(F_h(f^h))
g^w = σ(F_w(f^w))
The weights g^h and g^w are applied to the original feature map by multiplicative weighting, finally giving a feature map carrying attention weights in both the width and height directions:
y_c(i, j) = x_c(i, j) × g_c^h(i) × g_c^w(j)
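A compact PyTorch sketch of such a CA module is shown below; the reduction ratio r and the use of ReLU in place of the non-linearity of the original coordinate attention paper are simplifying assumptions. One such module would sit at the head of each resolution branch at the start of Stages 2-4, as described above.

    import torch
    import torch.nn as nn

    class CoordinateAttention(nn.Module):
        """Coordinate attention block: 1D pooling along H and W, shared 1x1 conv,
        split, per-direction 1x1 convs with sigmoid, then multiplicative reweighting."""
        def __init__(self, channels, reduction=32):
            super().__init__()
            mid = max(8, channels // reduction)
            self.pool_h = nn.AdaptiveAvgPool2d((None, 1))   # average over width  -> (B, C, H, 1)
            self.pool_w = nn.AdaptiveAvgPool2d((1, None))   # average over height -> (B, C, 1, W)
            self.conv1 = nn.Conv2d(channels, mid, 1)        # shared 1x1 conv, channels -> C/r
            self.bn1 = nn.BatchNorm2d(mid)
            self.act = nn.ReLU(inplace=True)
            self.conv_h = nn.Conv2d(mid, channels, 1)
            self.conv_w = nn.Conv2d(mid, channels, 1)

        def forward(self, x):
            b, c, h, w = x.size()
            x_h = self.pool_h(x)                            # (B, C, H, 1)
            x_w = self.pool_w(x).permute(0, 1, 3, 2)        # (B, C, W, 1)
            y = torch.cat([x_h, x_w], dim=2)                # concatenate along the spatial dimension
            y = self.act(self.bn1(self.conv1(y)))
            y_h, y_w = torch.split(y, [h, w], dim=2)        # split back into the two directions
            y_w = y_w.permute(0, 1, 3, 2)                   # (B, C/r, 1, W)
            a_h = torch.sigmoid(self.conv_h(y_h))           # attention weights along the height
            a_w = torch.sigmoid(self.conv_w(y_w))           # attention weights along the width
            return x * a_h * a_w                            # reweight the input feature map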
step 4: training the CA-HRNet network, wherein the specific process is as follows:
(1) Initializing the CA-HRNet network by adopting a He initialization method;
(2) Randomly dividing the group-housed nursery pig pictures of the second training set into 78 batches, each batch containing 32 pictures;
(3) Randomly selecting 1 Batch, and carrying out data enhancement on 32 pictures in the Batch by adopting a rotation method, wherein the rotation angle is between 0 and 90 degrees;
(4) Taking the 32 pictures of the randomly selected batch as input to the CA-HRNet network, which outputs 6 predicted keypoint heatmaps for each group-housed nursery pig picture in the batch;
(5) For each group-housed nursery pig picture in the randomly selected batch, using the corresponding 6 predicted keypoint heatmaps and 6 ground-truth labels, computing the posture-estimation loss of each picture with the mean squared error, and taking the mean of the posture-estimation losses of the 32 pictures as the final loss value; the posture-estimation loss of each picture is:
Loss = (1/M) Σ_{j=1..6} ||H_j - GH_j||²
where M = 256×192×17, GH_j denotes the j-th ground-truth keypoint heatmap of a group-housed nursery pig picture in the randomly selected batch, H_j denotes the j-th predicted keypoint heatmap of that picture, and j = 1, 2, ..., 6;
(6) According to the batch posture-estimation loss value computed in step (5), setting the initial learning rate to 1e-3 and updating the parameters of the CA-HRNet network, so that the randomly selected batch completes one training pass of the CA-HRNet network;
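An illustrative training step under these settings might look as follows; the CAHRNet class is hypothetical, the choice of Adam as the optimizer is an assumption (the description only specifies He initialization and an initial learning rate of 1e-3), and rotating the ground-truth heatmaps together with the images is a simplification of the data enhancement:

    import random
    import torch
    import torchvision.transforms.functional as TF

    def he_init(m):
        # He (Kaiming) initialisation of the convolutional layers
        if isinstance(m, torch.nn.Conv2d):
            torch.nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
            if m.bias is not None:
                torch.nn.init.zeros_(m.bias)

    model = CAHRNet()                     # hypothetical CA-HRNet model class
    model.apply(he_init)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # initial learning rate 1e-3

    def train_one_batch(images, gt_heatmaps):
        """images: (32, 3, 256, 192); gt_heatmaps: (32, 6, 64, 48)."""
        angle = random.uniform(0.0, 90.0)                       # rotation between 0 and 90 degrees
        images = torch.stack([TF.rotate(img, angle) for img in images])
        gt_heatmaps = torch.stack([TF.rotate(hm, angle) for hm in gt_heatmaps])
        preds = model(images)                                   # (32, 6, 64, 48) predicted heatmaps
        loss = torch.nn.functional.mse_loss(preds, gt_heatmaps) # mean-squared-error posture loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()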
(7) Repeating steps (3)-(6) until all 78 batches have trained the CA-HRNet network once, then evaluating the 500 validation pictures with the CA-HRNet network and computing the mean of the posture-estimation losses over the whole validation set using the method of step (5);
(8) Repeating steps (2)-(7) at least 500 times until the mean posture-estimation loss on the validation set converges, obtaining the optimal posture-estimation model of the CA-HRNet network;
(9) Keypoint heatmap prediction extracts, from the 64×48×17 heatmap, the prediction information corresponding to the 6 keypoints; for each keypoint, the position of the maximum of its prediction information, i.e. the position with the highest prediction score, is taken as the predicted keypoint position, and mapping it back to the original image yields the keypoint coordinates in the original image;
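A small sketch of this heatmap-to-coordinate step is given below; mapping back to the original image by simple scaling of the argmax position is an assumption:

    import numpy as np

    def heatmaps_to_keypoints(heatmaps, orig_w, orig_h):
        """heatmaps: array of shape (6, 64, 48) holding the predicted keypoint heatmaps.
        Returns an array of shape (6, 3) with (x, y, score) in original-image coordinates."""
        hm_h, hm_w = heatmaps.shape[1:]
        keypoints = []
        for hm in heatmaps:
            y, x = np.unravel_index(np.argmax(hm), hm.shape)   # position of the maximum score
            score = hm[y, x]
            keypoints.append((x * orig_w / hm_w, y * orig_h / hm_h, score))  # map back to the original image
        return np.array(keypoints)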
(10) Step 5: inputting the pictures of the test set into the CA-HRNet network loaded with the optimal posture-estimation model; the network outputs 6 predicted keypoint heatmaps, which constitute the posture-estimation result for the group-housed nursery pigs.
(11) Extracting the feeding trough and drinking port position information predicted in step 2 and the predicted keypoint coordinates of the individual group-housed nursery pigs from step 5; using the ray method to judge whether the 3 keypoints (nose, left ear, right ear) of each pig are inside the feeding trough region, and thereby estimating the feeding behavior of the pigs in the image; when all 3 keypoints of an individual pig are inside the trough region, the pig is considered to be feeding; when any one of the 3 keypoints is not inside the trough region, the pig is considered not to be feeding.
(12) Using the ray method to judge whether the 1 keypoint (nose) of each pig is inside the drinking port region, and thereby estimating the drinking behavior of the pigs in the image; when this keypoint of an individual pig is inside the drinking port region, the pig is considered to be drinking; when this keypoint is not inside the drinking port region, the pig is considered not to be drinking; the ray method is shown in FIG. 4.
(13) The ray method casts a horizontal ray from any predicted keypoint of a pig: if the ray crosses the boundary of the trough or drinking port region (the predicted rectangular box) an even number of times, the keypoint is outside the region; if the number of crossings is odd, the keypoint is inside the region.
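For an axis-aligned rectangular region, the ray method reduces to the even/odd crossing test sketched below; the function and variable names are illustrative:

    def ray_crossings(px, py, x1, y1, x2, y2):
        """Number of times a horizontal ray cast to the right from keypoint (px, py)
        crosses the boundary of the axis-aligned rectangle (x1, y1)-(x2, y2)."""
        crossings = 0
        if y1 <= py <= y2:            # the ray can only hit the two vertical edges
            if px <= x1:
                crossings += 1        # crosses the left edge
            if px <= x2:
                crossings += 1        # crosses the right edge
        return crossings

    def inside_region(px, py, rect):
        """Odd number of crossings -> the keypoint lies inside the region."""
        return ray_crossings(px, py, *rect) % 2 == 1

    def is_feeding(nose, left_ear, right_ear, trough_rect):
        """Feeding only if the nose and both ears are all inside the trough rectangle."""
        return all(inside_region(x, y, trough_rect) for (x, y) in (nose, left_ear, right_ear))

    def is_drinking(nose, drinking_port_rect):
        """Drinking if the nose keypoint lies inside the drinking port rectangle."""
        return inside_region(*nose, drinking_port_rect)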

Claims (7)

1. A feeding behavior identification method based on posture estimation of group-housed nursery pigs, comprising the following steps:
step 1: recording video of the group-housed nursery pigs under actual production conditions with a camera and extracting frames; using the COCO Annotator labeling tool, drawing rectangular boxes around the 2 feeding troughs, the 1 drinking port and each individual pig in the pictures, and constructing a first data set from the extracted pictures and the corresponding annotation files; using the COCO Annotator tool, labeling 6 keypoints on each individual pig in the pictures, where the visibility of a keypoint that is clearly visible and unoccluded is set to 2 (clearly visible) and the visibility of a keypoint that is present but occluded is set to 1 (occluded, not visible); the keypoint coordinates are defined in the image coordinate system, with the top-left corner of the image as the origin, the x-axis pointing right and the y-axis pointing down; the extracted pictures and the corresponding annotation files form a second data set;
step 2: training YOLOv8 on the first training set to obtain a target detection model; running the target detection model on the first test set to obtain the position information of the feeding troughs, the drinking port and the individual pigs;
step 3: adding a coordinate attention (CA) mechanism to the existing HRNet to obtain a CA-attention-based HRNet network; the CA module is added to the backbone feature-extraction part of the HRNet network so that the module further captures direction-sensitive and position-sensitive information and localizes the spatial directions;
step 4: training the HRNet network.
2. The feeding behavior identification method based on posture estimation of group-housed nursery pigs according to claim 1, characterized in that: the feature-extraction part of the HRNet network is divided into 4 stages, namely a first stage (stem), a second stage (Stage 2), a third stage (Stage 3) and a fourth stage (Stage 4); the first stage consists of a regular 3×3 convolution, a depthwise convolution and a 1×1 convolution; it receives a picture of size 256×192×3 and performs coarse feature extraction on the input picture to obtain a 64×48×256 feature map; a 3×3 convolution then adjusts the number of channels of the 64×48×256 feature map to 32, and a 3×3 convolution with stride 2 downsamples the 64×48×256 feature map by a factor of 2 to generate a 32×24×64 subnetwork.
3. The feeding behavior identification method based on posture estimation of group-housed nursery pigs according to claim 2, characterized in that: the second stage receives the feature maps at these two resolutions; CA attention is added to both branches at the beginning of the second stage, and after feature fusion two resolution-subnetwork feature maps of 64×48×32 and 32×24×64 are produced; the 32×24×64 map is further processed by a 3×3 convolution with stride 2 to generate a 16×12×128 feature map.
4. The feeding behavior identification method based on posture estimation of group-housed nursery pigs according to claim 3, characterized in that: the third stage receives the feature maps at these three resolutions; CA attention is added to the three branches at the beginning of the third stage, and after feature fusion three resolution-subnetwork feature maps of 64×48×32, 32×24×64 and 16×12×128 are produced; the 16×12×128 map is further processed by a 3×3 convolution with stride 2 to generate an 8×6×256 feature map.
5. The feeding behavior identification method based on posture estimation of group-housed nursery pigs according to claim 4, characterized in that: the fourth stage receives the four high- and low-resolution feature maps; CA attention is added to the four branches at the beginning of the fourth stage, and after feature fusion four resolution-subnetwork feature maps of 64×48×32, 32×24×64, 16×12×128 and 8×6×256 are produced; the low-resolution subnetworks are upsampled by factors of 8, 4 and 2 respectively so that they reach a resolution of 64×48, and feature fusion then generates a predicted keypoint heatmap output of size 64×48×17.
6. The feeding behavior identification method based on posture estimation of group-housed nursery pigs according to claim 1, characterized in that the process of training the HRNet network comprises the following steps:
(1) Initializing the HRNet network by adopting a He initialization method;
(2) Randomly dividing the group-housed nursery pig pictures of the second training set into K batches, each batch containing N pictures; randomly selecting 1 batch and applying rotation-based data enhancement to the N pictures in the batch, with rotation angles between 0 and 90 degrees;
the N generally takes numbers which can be divided by 8, such as 16, 32, 64, 128 and the like, and K is the number of training set pictures/N;
(3) Taking the N pictures in the randomly selected batch as input to the HRNet network, which outputs 6 predicted keypoint heatmaps for each group-housed nursery pig picture in the batch;
(4) For each picture, using the 6 predicted keypoint heatmaps and the 6 ground-truth labels, computing the posture-estimation loss of each group-housed nursery pig picture with the mean squared error, and taking the mean of the posture-estimation losses of the N pictures as the final loss value; the posture-estimation loss of each picture is:
Loss = (1/M) Σ_{j=1..6} ||H_j - GH_j||²
where M = 256×192×17, GH_j denotes the j-th ground-truth keypoint heatmap of a group-housed nursery pig picture in the randomly selected batch, H_j denotes the j-th predicted keypoint heatmap of that picture, and j = 1, 2, ..., 6;
(5) Repeating steps (3) and (4) until all K batches have trained the HRNet network once, then evaluating the second validation set with the HRNet network and computing the mean of the posture-estimation losses over the whole validation set using the method of step (4);
(6) Repeating steps (2)-(5) at least 500 times until the mean posture-estimation loss on the second validation set converges, obtaining the optimal posture-estimation model of the CA-HRNet network;
(7) Keypoint heatmap prediction extracts, from the 64×48×17 heatmap, the prediction information corresponding to the 6 keypoints; for each keypoint, the position of the maximum of its prediction information, i.e. the position with the highest prediction score, is taken as the predicted keypoint position, and mapping it back to the original image yields the keypoint coordinates in the original image;
(8) Step 5: inputting the pictures of the second test set into the HRNet network loaded with the optimal posture-estimation model; the network outputs 6 predicted keypoint heatmaps, which constitute the posture-estimation result for the group-housed nursery pigs;
(9) Extracting the feeding trough and drinking port position information predicted in step 2 and the predicted keypoint coordinates of the individual group-housed nursery pigs from step 5; using the ray method to judge whether the 3 keypoints of each pig's nose, left ear and right ear are inside the feeding trough region, and thereby estimating the feeding behavior of the pigs in the image; when all 3 keypoints of an individual pig are inside the trough region, the pig is considered to be feeding; when any one of the 3 keypoints is not inside the trough region, the pig is considered not to be feeding;
(10) Using the ray method to judge whether the 1 keypoint of each pig's nose is inside the drinking port region, and thereby estimating the drinking behavior of the pigs in the image; when this keypoint of an individual pig is inside the drinking port region, the pig is considered to be drinking; when this keypoint is not inside the drinking port region, the pig is considered not to be drinking.
7. The feeding behavior identification method based on posture estimation of group-housed nursery pigs according to claim 6, characterized in that: the ray method casts a horizontal ray from any predicted keypoint of a pig; if the ray crosses the boundary of the predicted rectangular box of the trough or drinking port region an even number of times, the keypoint is outside the region, and if the number of crossings is odd, the keypoint is inside the region.
CN202311355490.2A (filed 2023-10-19, priority date 2023-10-19) Feeding behavior identification method based on posture estimation of group-housed nursery pigs. Status: Pending. Published as CN117496591A.

Publications (1)

CN117496591A, published 2024-02-02

Family

ID=89683867



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination