CN116168332B - Lameness detection method, apparatus and storage medium

Info

Publication number
CN116168332B
CN116168332B (application CN202310419552.5A)
Authority
CN
China
Prior art keywords
gesture
animal
lameness
detection
target
Prior art date
Legal status
Active
Application number
CN202310419552.5A
Other languages
Chinese (zh)
Other versions
CN116168332A (en)
Inventor
王子磊
左宇晨
张燚鑫
Current Assignee
Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Original Assignee
Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Priority date
Filing date
Publication date
Application filed by Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Priority to CN202310419552.5A
Publication of CN116168332A
Application granted
Publication of CN116168332B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/23 Recognition of whole body movements, e.g. for sport training
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/70 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in livestock or poultry

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a lameness detection method, a device and a storage medium, relating to the technical field of safe livestock farming informatization. The method comprises the following steps: acquiring a video of an animal to be detected, wherein the video comprises multiple frames of animal images; extracting skeleton key points of a target animal to be detected in each animal image to obtain key point frames; drawing a pose heat map of the target animal to be detected based on all the key point frames; and performing lameness detection based on the pose heat map to obtain a lameness detection result. By detecting the skeleton key points of the target animal to be detected, the invention draws a pose heat map of the animal's movement process. Because the pose heat map reflects the changing states of the skeleton key points throughout the movement, the pose feature data available for lameness detection is richer. This solves the technical problem that existing lameness detection methods rely on local features and have low accuracy, and improves the accuracy of lameness detection.

Description

Lameness detection method, apparatus and storage medium
Technical Field
The invention relates to the technical field of safe livestock farming informatization, and in particular to a lameness detection method, a device and a storage medium.
Background
Livestock lameness caused by hoof or leg diseases is a serious problem for breeders. In the related art, whether livestock in a livestock exercise video are lame is judged based on manually annotated local features of the livestock, combined with deep learning or computer vision. However, because existing lameness detection methods rely on local features, lameness detection is often inaccurate.
Disclosure of Invention
The main purpose of the invention is to provide a lameness detection method, a device and a storage medium, so as to solve the technical problem that existing lameness detection methods rely on local features and have low accuracy.
In order to achieve the above purpose, the invention adopts the following technical solution:
In a first aspect, the present invention provides a lameness detection method, comprising:
acquiring a video of an animal to be detected, wherein the video comprises multiple frames of animal images;
extracting skeleton key points of a target animal to be detected in each animal image to obtain key point frames;
drawing a pose heat map of the target animal to be detected based on all the key point frames;
and performing lameness detection based on the pose heat map to obtain a lameness detection result.
Optionally, each key point frame is a global pose map or a local pose map, and the global pose map comprises a plurality of pose partitions corresponding to the local pose maps;
drawing the pose heat map of the target animal to be detected based on all the key point frames comprises the following steps:
determining the pose partition corresponding to each local pose map to obtain a local pose partition;
and stacking all the local pose maps and/or all the global pose maps according to the local pose partitions to obtain the pose heat map.
Optionally, stacking all the local pose maps and/or all the global pose maps according to the local pose partitions to obtain the pose heat map comprises:
splicing the local pose maps into stitched pose maps with the same size as the global pose maps according to the local pose partitions;
and stacking all the stitched pose maps and/or all the global pose maps to obtain the pose heat map.
Optionally, before performing lameness detection based on the pose heat map to obtain the lameness detection result, the method further comprises:
detecting the target animal to be detected in each animal image to obtain multi-frame RGB feature maps;
performing lameness detection based on the pose heat map to obtain the lameness detection result comprises:
extracting pose spatial features from the RGB feature maps;
extracting pose temporal features from the pose heat map;
and performing lameness detection based on the pose spatial features and the pose temporal features to obtain the lameness detection result.
Optionally, performing lameness detection based on the pose spatial features and the pose temporal features to obtain the lameness detection result comprises:
fusing the pose spatial features and the pose temporal features to obtain fused pose features;
and performing lameness detection based on the fused pose features to obtain the lameness detection result.
Optionally, fusing the pose spatial features and the pose temporal features to obtain the fused pose features comprises:
taking the pose spatial features and the pose temporal features as positive samples;
taking the pose spatial features and pose temporal features of animals to be detected other than the target animal in the animal images as negative samples;
and fusing the pose spatial features and the pose temporal features according to the positive samples and the negative samples to obtain the fused pose features.
Optionally, before the step of acquiring the video of the animal to be detected, the method further comprises:
constructing a training sample set, wherein the training sample set comprises a first target animal data set and a first other-animal data set, the first target animal data set carries lameness feature labels of the target animal, and the first other-animal data set carries lameness feature labels of various other animals;
performing small-sample training on a spatiotemporal action detection network based on the training sample set;
adjusting the spatiotemporal action detection network according to the cosine similarity between the support features and the query features output by the network, to obtain a lameness detection model;
performing lameness detection based on the pose heat map to obtain the lameness detection result comprises:
performing lameness detection on the pose heat map by using the lameness detection model to obtain the lameness detection result.
Optionally, before the step of constructing the training sample set, the method further comprises:
constructing a first pre-training sample set and a second pre-training sample set, wherein the first pre-training sample set comprises a second other-animal data set containing a plurality of other-animal images, and the second pre-training sample set comprises a second target animal data set carrying pose feature labels of the target animal;
training an optical flow branch of an initial spatiotemporal action detection network by using the first pre-training sample set to obtain optical flow features and optical flow frames;
training a pose flow branch of the initial spatiotemporal action detection network by using the second pre-training sample set to obtain pose features;
detecting the optical flow frames by using the pose flow branch to obtain pose classification features;
and adjusting the initial spatiotemporal action detection network according to the cosine distance between the optical flow features and the pose classification features to obtain the spatiotemporal action detection network.
In a second aspect, the present invention also provides a lameness detection device, the device comprising: a memory, a processor, and a lameness detection program stored on the memory and executable on the processor, wherein the lameness detection program is configured to implement the steps of any of the lameness detection methods described above.
In a third aspect, the present invention also provides a computer-readable storage medium, on which a lameness detection program is stored, the lameness detection program, when executed by a processor, implementing the steps of any of the lameness detection methods described above.
The present invention provides a lameness detection method comprising: acquiring a video of an animal to be detected, wherein the video comprises multiple frames of animal images; extracting skeleton key points of a target animal to be detected in each animal image to obtain key point frames; drawing a pose heat map of the target animal to be detected based on all the key point frames; and performing lameness detection based on the pose heat map to obtain a lameness detection result.
In this way, by detecting the skeleton key points of the target animal to be detected, the invention draws a pose heat map of the animal's movement process. Because the pose heat map reflects the changing states of the skeleton key points throughout the movement, the pose feature data available for lameness detection is richer. This solves the technical problem that existing lameness detection methods rely on local features and have low accuracy, and improves the accuracy of lameness detection.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and a person skilled in the art may obtain other drawings from them without inventive effort.
Fig. 1 is a schematic structural view of a lameness detecting apparatus of the present invention;
fig. 2 is a schematic flow chart of a first embodiment of the lameness detection method of the present invention;
FIG. 3 is a schematic diagram of a refinement flow chart of step S300 in FIG. 2;
FIG. 4 is a flow chart of training a lameness detection model in an embodiment of the present invention;
fig. 5 is a flow chart of a second embodiment of the lameness detection method of the present invention;
fig. 6 is a schematic diagram of a refinement flow of step S430 in fig. 5.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
In the present disclosure, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a device or system that comprises a list of elements includes not only those elements but possibly also other elements not expressly listed or inherent to such a device or system. Without further limitation, an element preceded by the phrase "comprising a..." does not exclude the presence of additional identical elements in the device or system that comprises it.
The problem of livestock lameness caused by hoof or leg diseases is a very serious cost problem for breeders and an important animal welfare issue. Traditional lameness judgment based on manual observation suffers from high labor intensity, low efficiency and a high professional threshold. To overcome the shortcomings of current livestock lameness detection schemes, an intelligent lameness detection scheme is urgently needed.
A common current livestock lameness detection scheme adopts a sensor-based lameness detection algorithm. For example, whether livestock are lame is judged from signals detected by a sensor such as a pressure detector, or lameness is detected and the spatial position of the frame containing the lame cow is determined by analyzing local features of the livestock in images or videos captured by a high-definition camera. However, sensor devices are easily damaged and prone to malfunction in the farm environment, whereas cameras are better suited to farms because they require no direct contact with the animals, and many sensor devices cost far more than cameras.
However, the camera-based approach presents several challenges. First, because the local characteristics of lameness differ between animal species, local lameness features learned for one animal may not transfer directly to another, which limits generalization across species. Second, local features are not only difficult to annotate but also prone to misjudgment: they describe only part of the animal's state, so indirectly judging lameness by detecting local features alone may be inaccurate. Third, the effectiveness of a computer vision algorithm depends on data quality, but since lameness is a low-probability event on a farm, the algorithm faces a severe imbalance between lame and healthy samples; moreover, lameness data has small inter-class gaps and large intra-class variance, so judging lameness from a single modality is unreliable. Finally, in practical applications there is extensive occlusion between animals, which seriously affects the results of common lameness detection methods.
In view of the technical problem that existing lameness detection methods rely on local features and have low accuracy, the present invention provides a lameness detection method whose general idea is as follows:
acquire a video of an animal to be detected, wherein the video comprises multiple frames of animal images; extract skeleton key points of a target animal to be detected in each animal image to obtain key point frames; draw a pose heat map of the target animal to be detected based on all the key point frames; and perform lameness detection based on the pose heat map to obtain a lameness detection result.
In this method, a pose heat map of the target animal's movement process is drawn by detecting its skeleton key points. Because the pose heat map reflects the changing states of the skeleton key points throughout the movement, the pose feature data available for lameness detection is richer; this solves the technical problem that existing methods rely on local features and have low accuracy, and improves detection accuracy.
The lameness detection method, device and storage medium used in implementing the present invention are described in detail below.
Referring to fig. 1, fig. 1 is a schematic structural diagram of the lameness detection device of the present invention.
as shown in fig. 1, the apparatus may include: a processor 1001, such as a central processing unit (Central Processing Unit, CPU), a communication bus 1002, a user interface 1003, a network interface 1004, a memory 1005. Wherein the communication bus 1002 is used to enable connected communication between these components. The user interface 1003 may include user equipment such as personal computers and notebook computers, and the optional user interface 1003 may also include standard wired, wireless interfaces. The network interface 1004 may optionally include a standard wired interface, a Wireless interface (e.g., a Wireless-Fidelity (Wi-Fi) interface). The Memory 1005 may be a high-speed random access Memory (Random Access Memory, RAM) Memory or a stable nonvolatile Memory (NVM), such as a disk Memory. The memory 1005 may also optionally be a storage device separate from the processor 1001 described above.
It will be appreciated by those skilled in the art that the structure shown in fig. 1 is not limiting of the apparatus and may include more or fewer components than shown, or certain components may be combined, or a different arrangement of components.
As shown in fig. 1, as a storage medium, the memory 1005 may include an operating system, a data storage module, a network communication module, a user interface module, and a lameness detection program.
In the device shown in fig. 1, the network interface 1004 is mainly used for data communication with other network devices, and the user interface 1003 is mainly used for data interaction with user equipment. The processor 1001 and the memory 1005 are provided in the lameness detection device of the present invention, which calls the lameness detection program stored in the memory 1005 through the processor 1001 and executes the lameness detection method provided by the embodiments of the present invention.
The lameness detecting method, apparatus and storage medium of the present invention are described in detail below with reference to the accompanying drawings and detailed description.
Based on, but not limited to, the above hardware structure, and referring to figs. 2 to 4: fig. 2 is a schematic flow chart of the first embodiment of the lameness detection method of the present invention, fig. 3 is a detailed flow chart of step S300 in fig. 2, and fig. 4 is a flow chart of training the lameness detection model in an embodiment of the present invention. This embodiment provides a lameness detection method comprising the following steps:
Step S100: acquiring a video of the animal to be detected, wherein the video comprises multiple frames of animal images.
In this embodiment, the execution body is the lameness detection device shown in fig. 1, which may be a physical server comprising an independent host, or a virtual server carried by a host cluster.
In this embodiment, the video of the animal to be detected is captured by a monitoring camera, usually arranged over a livestock passage of the farm. The animal to be detected may be any livestock such as a cow, pig, sheep or horse. For example, when the animal to be detected is a dairy cow, the monitoring camera may be arranged over the passage between the cowshed and the milking parlor to film the animals on site. The lameness detection device can be connected to the monitoring camera and acquire the video from it.
Step S200: extracting the skeleton key points of the target animal to be detected in each animal image to obtain key point frames.
In this embodiment, an animal image may contain several animals to be detected, and the target animal to be detected is one of them. The skeleton key points may be at least one of the whole-body skeleton key points of the target animal. The positions of all skeleton key points can be generated by a pre-trained pose extraction model that uses a 2D key point detection algorithm, so that the key point frames of the target animal are drawn frame by frame from the video.
It should be noted that the pose extraction model is obtained by pre-training a pose detection network (for example, an HRNet network) on a public animal data set (for example, APT-36K or AP-10K) together with a target animal data set. A concrete training process may be: acquire a public animal data set and a target animal data set; annotate the skeleton key points of all animals in the public data set and of the target animal in the target data set to obtain a public animal training sample set and a target animal training sample set; first train the pose detection network with the public animal training sample set to obtain an initial pose extraction model; then train the initial model with the target animal training sample set to obtain the pose extraction model. The public animal data set contains images of various species, all of whose skeleton key points are annotated to form the public training sample set; the target animal data set contains multiple images of the target animal, annotated likewise. The species included in the public data set may be chosen according to the characteristics of the target animal, and it should be understood that the target animal and the animal to be detected are of the same species. For example, when the target animal is a dairy cow, the other animals may be horses, pigs or other quadrupeds.
In a specific implementation, the video of the animal to be detected is input into the pose extraction model, which detects all skeleton key points of the target animal in the animal images frame by frame to obtain the key point frames.
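For illustration, the following Python sketch shows what this frame-by-frame inference could look like. The `pose_model` callable and its output format, one (K, 3) array of (x, y, confidence) per frame, are assumptions made for the example, not an interface defined by this disclosure.

```python
# Sketch of frame-by-frame keypoint extraction (step S200).
# `pose_model` is a hypothetical stand-in for the trained HRNet-style
# pose extraction model described above.
import cv2
import numpy as np

def extract_keypoint_frames(video_path, pose_model):
    """Return one (K, 3) keypoint array per frame of the animal video."""
    cap = cv2.VideoCapture(video_path)
    keypoint_frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        keypoints = pose_model(rgb)  # assumed output: (K, 3) = x, y, confidence
        keypoint_frames.append(np.asarray(keypoints))
    cap.release()
    return keypoint_frames
```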
Step S300: drawing the pose heat map of the target animal to be detected based on all the key point frames.
In this embodiment, a key point frame is drawn for each animal image in the video, and the multiple key point frames corresponding to the multiple animal images are stacked to obtain the pose heat map of the target animal to be detected.
Specifically, as an embodiment, as shown in fig. 3, step S300 includes:
step S310: and determining the gesture partition corresponding to the local gesture graph to obtain the local gesture partition.
The key point frame is a global posture image or a local posture image, and the global posture image comprises a plurality of posture partitions corresponding to the local posture image.
In this embodiment, because there is a shielding between the animals to be tested, the target animal to be tested part in the animal image may be shielded, and thus, the keypoint frame may only include the local posture diagram of the target animal to be tested. Correspondingly, if the target animal to be detected in the animal image is not shielded, the key point frame comprises a global posture diagram of the target animal to be detected. Thus, the multi-frame keypoint frames of the target test animal include global and/or local pose maps. In order to stack the multi-frame key point frames, the body area of the target animal to be tested corresponding to the local posture diagram needs to be confirmed.
Specifically, the global posture diagram can be divided into a plurality of posture partitions according to the physical characteristics of the target animal to be detected, which posture partition in the global posture diagram corresponds to the local posture diagram is determined, and then multiple frames of key point frames are stacked according to the posture partition corresponding to the local posture diagram. For example, the global pose graph may be equally divided into a left pose partition, a middle pose partition, and a right pose partition, where typically the left pose partition includes the head features of the target animal under test, the middle pose partition includes the body features and leg features of the target animal under test, and the right pose partition includes the tail features of the target animal under test.
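The disclosure does not fix how a local pose map is matched to a partition; comparing the horizontal extent of the visible key points against the thirds of the global map is one plausible reading, sketched here with hypothetical inputs.

```python
# Hypothetical partition assignment for a local pose map (step S310),
# assuming three equal-width partitions: left (head), middle (torso
# and legs), right (tail).
def assign_partition(visible_keypoints_x, global_width):
    """Map a local pose map to 'left', 'middle' or 'right'."""
    center = sum(visible_keypoints_x) / len(visible_keypoints_x)
    third = global_width / 3.0
    if center < third:
        return "left"    # head region
    if center < 2 * third:
        return "middle"  # torso and leg region
    return "right"       # tail region
```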
Step S320: stacking all the local pose maps and/or all the global pose maps according to the local pose partitions to obtain the pose heat map.
In this embodiment, the global and local pose maps are obtained by extracting the global and local pose features of the target animal from the key point frames with a feature extraction algorithm, which may be the ROIAlign algorithm. It should be understood that after ROIAlign extracts the pose features from a key point frame, the features are pooled and mapped to a pose feature map of a specific size, yielding a global or local pose map; the specific size is set according to the pose feature size. After the partition corresponding to a local pose map is determined, the local pose maps can be stitched together and then stacked.
Specifically, step S320 comprises: splicing the local pose maps into stitched pose maps with the same size as the global pose maps according to the local pose partitions; and stacking all the stitched pose maps and/or all the global pose maps to obtain the pose heat map.
In this embodiment, after the local pose partition corresponding to a local pose map is determined, the local pose map may be cropped or scaled, and the cropped or scaled local pose map is then pasted into a stitched pose map of the specific size at its corresponding partition. The stitched pose map thus keeps the same size as the global pose maps, and the local pose features of the target animal are placed in the correct partition before stacking.
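A minimal sketch of the stitching and stacking, assuming the three-partition layout above and a zero-valued background for unoccupied partitions; the exact canvas layout is an assumption.

```python
# Sketch of stitching a local pose map into a full-size canvas at its
# partition, then stacking all frames into a pose heat map volume
# (step S320).
import cv2
import numpy as np

PARTITION_COLUMN = {"left": 0, "middle": 1, "right": 2}

def stitch_local_map(local_map, partition, full_h, full_w):
    canvas = np.zeros((full_h, full_w), dtype=np.float32)
    w3 = full_w // 3
    col = PARTITION_COLUMN[partition]
    # Scale the local pose map to its partition's extent before pasting.
    resized = cv2.resize(local_map.astype(np.float32), (w3, full_h))
    canvas[:, col * w3:(col + 1) * w3] = resized
    return canvas

def build_pose_heatmap(frames):
    """frames: list of full-size pose maps (global, or stitched local)."""
    return np.stack(frames, axis=0)  # (T, H, W) spatiotemporal volume
```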
Step S400: performing lameness detection based on the pose heat map to obtain the lameness detection result.
In this embodiment, lameness detection may be performed on the pose heat map by a lameness detection model established and trained in advance: the model judges whether the target animal in the pose heat map is lame, annotates the lameness condition in the pose heat map, obtains the lameness detection result and outputs it.
Specifically, as an embodiment, as shown in fig. 4, before step S100, the method further includes:
step S600: a training sample set is constructed, the training sample set comprising a first target animal data set having a lameness signature of the target animal and a first other animal data set having lameness signatures of a plurality of other animals.
In this embodiment, the target animal may be any kind of animal, and the other animals include various other animals other than the target animal. The first other animal data set may be a public animal data set labeled with lameness characteristics and the first target animal data set may be a target animal data set labeled with lameness characteristics.
Step S700: based on the training sample set, the time-space motion detection network performs small sample training.
In this embodiment, the spatiotemporal motion detection network comprises a poseC3D network. The spatiotemporal motion detection network may be trained by a training mechanism of small sample learning (Few Shot Learning) to obtain a lameness detection model.
Specifically, a plurality of small sample training tasks are constructed in combination with the training sample set, and the task number of the small sample training tasks is determined according to the training sample number in the training sample set. According to the task number, a preset number of training samples of other animals of preset types in the training sample set and a preset number of training samples of target animals are selected to be used as support sets, and remaining training samples of other animals of preset types in the training sample set and remaining training samples of target animals are used as query sets to obtain a plurality of support sets and a plurality of query sets. The number of the support sets and the number of the query sets are the same as the number of the tasks, and the preset types and the preset numbers can be set according to actual training requirements. And training the space-time action detection network by sequentially utilizing the support set and the query set according to the small sample training task, and outputting support characteristics and query characteristics.
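As a sketch of how such support/query episodes might be assembled; the n_way/k_shot values and the random split are illustrative, since the patent only fixes that a preset number of samples forms the support set and the remainder the query set.

```python
# Hypothetical construction of one few-shot training task (step S700):
# k_shot samples per selected class form the support set, the rest
# form the query set.
import random

def build_episode(samples_by_class, n_way=5, k_shot=5):
    classes = random.sample(list(samples_by_class), n_way)
    support, query = [], []
    for cls in classes:
        # Shuffle a copy of this class's samples before splitting.
        pool = random.sample(samples_by_class[cls], len(samples_by_class[cls]))
        support += [(x, cls) for x in pool[:k_shot]]
        query += [(x, cls) for x in pool[k_shot:]]
    return support, query
```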
Step S800: adjusting the spatiotemporal action detection network according to the cosine similarity between the support features and the query features it outputs, to obtain the lameness detection model.
Step S400 then comprises: performing lameness detection on the pose heat map with the lameness detection model to obtain the lameness detection result.
In this embodiment, the cosine similarity is the cosine of the angle between a support feature and a query feature and can be used to evaluate their similarity. Cosine values lie in [-1, 1], and the closer the value is to 1, the closer the directions of the two vectors. Therefore, in each small-sample training task the model parameters of the spatiotemporal action detection network can be adjusted according to the cosine similarity between the support and query features it outputs, until all small-sample training tasks are completed, yielding the lameness detection model.
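One common way to turn this cosine comparison into a training signal is a prototype-based loss, sketched below in PyTorch; treating the per-class mean of the support features as a prototype is an assumption, since the patent only states that the network is adjusted according to the cosine similarity.

```python
# Sketch of cosine-similarity logits between query features and
# per-class support prototypes (step S800).
import torch
import torch.nn.functional as F

def cosine_logits(support_feats, support_labels, query_feats, n_classes):
    protos = torch.stack([
        support_feats[support_labels == c].mean(dim=0)
        for c in range(n_classes)
    ])                                    # (C, D) class prototypes
    q = F.normalize(query_feats, dim=-1)  # (Q, D)
    p = F.normalize(protos, dim=-1)       # (C, D)
    return q @ p.t()                      # cosine similarities in [-1, 1]

# The network can then be adjusted by minimizing, for example:
# loss = F.cross_entropy(cosine_logits(...) / 0.1, query_labels)
```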
This embodiment provides a lameness detection method that draws a pose heat map of the target animal's movement process by detecting its skeleton key points. Because the pose heat map reflects the changing states of the skeleton key points throughout the movement, the pose feature data available for lameness detection is richer; this solves the technical problem that existing methods rely on local features and have low accuracy, and improves detection accuracy. In addition, by drawing the pose heat map from at least one skeleton key point, the invention can detect lameness without annotating local features, avoiding the false or missed detections that occur when local feature annotation is degraded by the environment or other factors.
This embodiment also partitions the pose features of the animal to be detected and stacks the local and/or global pose features according to the partitions to obtain the pose heat map. Lameness detection based on the pose heat map considers both the whole-body global pose features and the local pose features; compared with existing methods that target only local features, detection based on global pose features is more accurate. Moreover, when the target animal is occluded in an animal image, detection can still be performed on the unoccluded local features, avoiding missed detections.
In addition, in this embodiment the pose extraction model is trained jointly on the public animal data set and the target animal data set to extract the target animal's key point frames. This allows animal pose features to be extracted effectively even with few annotations: with only a small target animal training sample set, the pose extraction model is optimized with the training samples of several different other animals, improving its accuracy.
Furthermore, in this embodiment a training sample set is constructed from the public animal data set and the target animal data set for few-shot training of the lameness detection model. The model can thereby ignore lameness characteristics that differ between species and focus on those shared across species, so that lameness features of the same kind score high similarity and those of different kinds score low similarity. With only a small target animal data set annotated with lameness features, the training samples are enriched by the lameness features of other animals, addressing the problem that existing methods depend heavily on training samples that are difficult to obtain in sufficient quantity.
Further, referring to figs. 5 and 6, fig. 5 is a flow chart of the second embodiment of the lameness detection method of the present invention, and fig. 6 is a detailed flow chart of step S430 in fig. 5. This embodiment provides a lameness detection method in which, before step S400, the method further comprises:
Step S500: detecting the target animal to be detected in each animal image to obtain multi-frame RGB feature maps.
In this embodiment, an RGB feature map contains the RGB information of the target animal in an animal image; the RGB information can be extracted once the position of the target animal has been detected in the image, typically with a pre-trained target detection model.
It should be noted that the target detection model is obtained by pre-training a target detection network (e.g., a YOLOX network) on a public animal data set (e.g., the AP-10K data set) together with a target animal data set. A concrete training process may be: acquire a public animal data set and a target animal data set; annotate the various animals in the public data set and the target animal in the target data set to obtain a public animal detection sample set and a target animal detection sample set; first train the target detection network with the public animal detection sample set to obtain an initial target detection model; then train the initial model with the target animal detection sample set to obtain the target detection model.
In addition, the position of the target animal can be fed to the pose extraction model, so that the pose extraction model detects all skeleton key points of the target animal frame by frame in the video according to the position output by the target detection model.
In a specific implementation, the video is input into the target detection model, which detects the target animal frame by frame, obtains its position in each animal image, and extracts the target animal's RGB information from the image according to that position, yielding the RGB feature maps.
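A minimal sketch of this cropping step follows; the `detector` callable stands in for the trained YOLOX-style model, and returning a single integer (x1, y1, x2, y2) box per frame is an assumption made for illustration.

```python
# Sketch of extracting the target animal's RGB feature map from each
# frame using its detected position (step S500).
def extract_rgb_maps(frames, detector):
    rgb_maps = []
    for frame in frames:                  # frame: (H, W, 3) array
        x1, y1, x2, y2 = detector(frame)  # target animal's bounding box
        rgb_maps.append(frame[y1:y2, x1:x2])
    return rgb_maps
```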
Specifically, step S400 includes:
step S410: and extracting the attitude space features from the RGB feature map.
Step S420: and extracting the gesture time characteristics from the gesture heat map.
Step S430: and performing lameness detection based on the gesture space features and the gesture time features to obtain a lameness detection result.
In this embodiment, the lameness detection model includes two types of spatial flow branches and two types of temporal flow branches, and may combine the gesture spatial features and gesture temporal features of the target animal to be detected in the video of the animal to be detected in the motion process. The gesture space features are feature vectors corresponding to the RGB feature map, and the gesture time features are feature vectors corresponding to the gesture heat map.
Specifically, before step S600, the method further comprises: constructing a first pre-training sample set and a second pre-training sample set, wherein the first pre-training sample set comprises a second other-animal data set containing a plurality of other-animal images, and the second pre-training sample set comprises a second target animal data set carrying pose feature labels of the target animal; training an optical flow branch of an initial spatiotemporal action detection network with the first pre-training sample set to obtain optical flow features and optical flow frames; training a pose flow branch of the initial network with the second pre-training sample set to obtain pose features; detecting the optical flow frames with the pose flow branch to obtain pose classification features; and adjusting the initial network according to the cosine distance between the optical flow features and the pose classification features to obtain the spatiotemporal action detection network.
In this embodiment, the second other-animal data set may be a public animal data set, and the second target animal data set may be a target animal data set annotated with the target animal's pose features. The initial spatiotemporal action detection network may be based on a SlowFast network into which two temporal streams are introduced, an optical flow branch and a pose flow branch, which process optical flow information and pose flow information respectively. The SlowFast network is pre-trained in a teacher-student distillation fashion: the optical flow branch, trained on the public animal data set, serves as the teacher network, and the pose flow branch, trained on the pose-annotated target animal data set, serves as the student network. The cosine distance between the optical flow features output by the teacher and the pose flow features output by the student is used as a hidden constraint; the optical flow frames are also fed into the student network to obtain pose classification features, and the difference between the pose flow features and these pose classification features is compared as an additional constraint to adjust the model parameters of the pose flow branch, with several weakly supervised losses introduced to learn the mapping between pose flow information and optical flow information. After pre-training, the optical flow branch is removed from the two temporal streams, and the pose flow branch together with the RGB stream based on the SlowFast network's spatial information is retained, giving the spatial stream branch and temporal stream branch of the lameness detection model.
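The teacher-student constraints can be sketched as a single pre-training loss; using 1 minus cosine similarity as the "cosine distance" and a mean-squared gap for the additional classification constraint are assumptions about the exact loss form, which the disclosure does not fix.

```python
# Sketch of the distillation constraints between the optical flow
# branch (teacher) and the pose flow branch (student).
import torch.nn.functional as F

def distillation_loss(optical_feat, pose_feat,
                      pose_cls, pose_cls_from_flow,
                      alpha=1.0, beta=1.0):
    # Hidden constraint: cosine distance between teacher and student features.
    cos_dist = 1.0 - F.cosine_similarity(optical_feat, pose_feat, dim=-1).mean()
    # Additional constraint: the pose classification obtained from optical
    # flow frames should match the pose flow branch's own output.
    cls_gap = F.mse_loss(pose_cls_from_flow, pose_cls)
    return alpha * cos_dist + beta * cls_gap
```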
In addition, because the video of the animal to be detected is long, frames can be sampled randomly from it at a certain frequency when drawing the pose heat map and extracting the multi-frame RGB feature maps, which reduces redundancy in the heat map stacking.
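One segment-based sampling scheme consistent with this description is sketched below; the equal-segment layout is an assumption, since the patent only specifies random sampling at a certain frequency.

```python
# Sketch of sparse frame sampling: one random frame from each of
# n_samples equal segments of the video.
import random

def sample_frame_indices(n_frames, n_samples):
    seg = n_frames / n_samples
    return [int(i * seg + random.random() * seg) for i in range(n_samples)]
```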
Specifically, as an embodiment, as shown in fig. 6, step S430 includes:
step S431: and fusing the gesture space features and the gesture time features to obtain fused gesture features.
In this embodiment, before the lameness detection, features of two different modes, that is, a gesture space feature and a gesture time feature, are required to be performed, so that the lameness detection model performs the lameness detection. The fusion posture features comprise features of different modes of the same animal to be tested.
Specifically, step S431 includes: taking the gesture space features and the gesture time features as positive samples; taking other gesture space features and other gesture time features of other animals to be detected except the target animal to be detected in the animal image as negative samples; and according to the positive sample and the negative sample, fusing the gesture space characteristics and the gesture time characteristics to obtain fused gesture characteristics.
In this embodiment, the information of different modes of the target to-be-detected animal can be fused by combining contrast learning and using different mode characteristics of the same target to-be-detected animal as positive samples and different mode characteristics of different to-be-detected animals as negative samples. Namely, the gesture space feature and the gesture time feature are taken as positive samples; taking other gesture space features and other gesture time features of other animals to be detected except the target animal to be detected in the animal image as negative samples; and according to the positive sample and the negative sample, fusing the attitude space characteristics and the attitude time characteristics of the target to-be-tested animal in different modal characteristics of the plurality of to-be-tested animals to obtain the fused attitude characteristics of the target to-be-tested animal.
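An InfoNCE-style loss is one natural reading of this contrastive pairing, sketched below; the exact loss form and temperature are assumptions, while the choice of positives (same animal, two modalities) and negatives (other animals in the frame) follows the description.

```python
# Sketch of contrastive alignment of pose spatial and pose temporal
# features before fusion (step S431).
import torch
import torch.nn.functional as F

def contrastive_fusion_loss(spatial, temporal, temperature=0.07):
    """spatial, temporal: (N, D) features for N animals in the images;
    row i of each tensor belongs to the same animal, so the (i, i)
    pairs are the positives and all other pairs are negatives."""
    s = F.normalize(spatial, dim=-1)
    t = F.normalize(temporal, dim=-1)
    logits = s @ t.t() / temperature                    # (N, N) similarities
    targets = torch.arange(s.size(0), device=s.device)  # diagonal = positives
    return F.cross_entropy(logits, targets)

# After alignment, a simple fusion is, for example, concatenation:
# fused = torch.cat([spatial, temporal], dim=-1)
```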
Step S432: performing lameness detection based on the fused pose features to obtain the lameness detection result.
In this embodiment, the fused pose features of the target animal are input into the lameness detection model, which performs lameness detection and outputs the target animal's lameness detection result.
This embodiment provides a lameness detection method that combines pose spatial features and pose temporal features, considering the pose features of the animal's movement in both the spatial and temporal dimensions. It addresses the inaccuracy of detecting lameness from a single modality, where overfitting occurs easily because lameness samples have small inter-class gaps, large intra-class variance and small numbers.
Meanwhile, in this embodiment the target detection model, the pose extraction model and the lameness detection model are all trained jointly on a public animal data set and a target animal data set. Therefore, only a small number of target animal data sets of different species need to be annotated to fine-tune the three models, and lameness detection for different animals becomes possible. This avoids the situation where the local lameness features of one animal cannot be used directly on another, strengthens the generalization of the method across species, and improves its extensibility.
In addition, an embodiment of the present invention further provides a computer storage medium on which a lameness detection program is stored; when executed by a processor, the program implements the steps of the lameness detection method described above, so they are not repeated here, and the description of the corresponding beneficial effects is likewise omitted. For technical details not disclosed in this computer-readable storage medium embodiment, please refer to the description of the method embodiments of the present application. As an example, the program instructions may be deployed to be executed on one computing device, or on multiple computing devices at one site, or distributed across multiple sites interconnected by a communication network.
Those skilled in the art will appreciate that all or part of the above methods may be implemented by a computer program instructing the relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, may comprise the steps of the method embodiments above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
The foregoing description covers only preferred embodiments of the present invention and does not thereby limit its patent scope; any equivalent structural or process transformation made using the contents of this specification and the accompanying drawings, whether applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of the present invention.

Claims (8)

1. A lameness detection method, characterized in that the method comprises:
acquiring a video of an animal to be detected, wherein the video comprises multiple frames of animal images;
extracting skeleton key points of a target animal to be detected in each animal image to obtain key point frames;
drawing a pose heat map of the target animal to be detected based on all the key point frames;
performing lameness detection based on the pose heat map to obtain a lameness detection result;
before the step of acquiring the video of the animal to be detected, the method further comprises:
constructing a training sample set, wherein the training sample set comprises a first target animal data set and a first other-animal data set, the first target animal data set carries lameness feature labels of the target animal, and the first other-animal data set carries lameness feature labels of various other animals;
performing small-sample training on a spatiotemporal action detection network based on the training sample set;
adjusting the spatiotemporal action detection network according to the cosine similarity between the support features and the query features output by the spatiotemporal action detection network, to obtain a lameness detection model;
the performing lameness detection based on the pose heat map to obtain a lameness detection result comprises:
performing lameness detection on the pose heat map by using the lameness detection model to obtain the lameness detection result;
before the step of constructing the training sample set, the method further comprises:
constructing a first pre-training sample set and a second pre-training sample set, wherein the first pre-training sample set comprises a second other-animal data set, the second other-animal data set comprises a plurality of other-animal images, the second pre-training sample set comprises a second target animal data set, and the second target animal data set carries pose feature labels of the target animal;
training an optical flow branch of an initial spatiotemporal action detection network by using the first pre-training sample set to obtain optical flow features and optical flow frames;
training a pose flow branch of the initial spatiotemporal action detection network by using the second pre-training sample set to obtain pose features;
detecting the optical flow frames by using the pose flow branch to obtain pose classification features;
and adjusting the initial spatiotemporal action detection network according to the cosine distance between the optical flow features and the pose classification features to obtain the spatiotemporal action detection network.
2. The method of claim 1, wherein each key point frame is a global pose map or a local pose map, the global pose map comprising a plurality of pose partitions corresponding to the local pose maps;
drawing the pose heat map of the target animal to be detected based on all the key point frames comprises the following steps:
determining the pose partition corresponding to each local pose map to obtain a local pose partition;
and stacking all the local pose maps and/or all the global pose maps according to the local pose partitions to obtain the pose heat map.
3. The method of claim 2, wherein stacking all the local pose maps and/or all the global pose maps according to the local pose partitions to obtain the pose heat map comprises:
splicing the local pose maps, according to the local pose partitions, into spliced pose maps of the same size as the global pose maps; and
stacking all the spliced pose maps and/or all the global pose maps to obtain the pose heat map.
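As a hedged illustration of the splice-and-stack of claims 2 and 3, assuming a regular 2×2 grid of equally sized pose partitions (the claims leave the partition scheme open):

```python
import numpy as np

def splice_local_maps(local_maps, grid=(2, 2)):
    """Tile local pose maps into a mosaic of the global map's size.

    local_maps: list of (h, w) arrays ordered by their pose partition
    (e.g. head, torso, front legs, hind legs -- an assumed scheme).
    grid: rows x cols layout of the pose partitions.
    """
    rows, cols = grid
    h, w = local_maps[0].shape
    mosaic = np.zeros((rows * h, cols * w), dtype=local_maps[0].dtype)
    for i, m in enumerate(local_maps):
        r, c = divmod(i, cols)
        mosaic[r * h:(r + 1) * h, c * w:(c + 1) * w] = m
    return mosaic

def stack_pose_maps(spliced_maps, global_maps):
    """Stack spliced local maps and/or global maps along a new axis."""
    return np.stack(list(spliced_maps) + list(global_maps))
```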
4. The method of claim 1, wherein, before performing lameness detection based on the pose heat map, the method further comprises:
detecting the target animal in each animal image to obtain multiple frames of RGB feature maps;
and wherein performing lameness detection based on the pose heat map to obtain a lameness detection result comprises:
extracting pose spatial features from the RGB feature maps;
extracting pose temporal features from the pose heat map; and
performing lameness detection based on the pose spatial features and the pose temporal features to obtain the lameness detection result.
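For orientation on the two branches of claim 4, a toy PyTorch sketch follows: a spatial branch pooling the per-frame RGB feature map, and a temporal branch convolving the stacked pose heat maps over time. The channel counts and layer choices are illustrative assumptions, not the patent's network.

```python
import torch
import torch.nn as nn

class TwoStreamPoseFeatures(nn.Module):
    """Toy two-branch extractor: pose spatial features from RGB feature
    maps, pose temporal features from stacked pose heat maps."""

    def __init__(self, rgb_ch=64, pose_ch=17, dim=128):
        super().__init__()
        self.spatial = nn.Sequential(
            nn.Conv2d(rgb_ch, dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.temporal = nn.Sequential(
            nn.Conv3d(pose_ch, dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )

    def forward(self, rgb_feat, pose_maps):
        # rgb_feat:  (B, rgb_ch, H, W)     feature map of the detected animal
        # pose_maps: (B, pose_ch, T, H, W) pose heat maps over T frames
        s = self.spatial(rgb_feat).flatten(1)    # (B, dim) spatial features
        t = self.temporal(pose_maps).flatten(1)  # (B, dim) temporal features
        return s, t
```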
5. The method of claim 4, wherein performing lameness detection based on the pose spatial features and the pose temporal features to obtain the lameness detection result comprises:
fusing the pose spatial features and the pose temporal features to obtain fused pose features; and
performing lameness detection based on the fused pose features to obtain the lameness detection result.
6. The method of claim 5, wherein fusing the pose spatial features and the pose temporal features to obtain the fused pose features comprises:
taking the pose spatial features and the pose temporal features of the target animal as positive samples;
taking the pose spatial features and the pose temporal features of the other animals in the animal images, other than the target animal, as negative samples; and
fusing the pose spatial features and the pose temporal features according to the positive samples and the negative samples to obtain the fused pose features.
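Claim 6 fixes only the roles of the samples: the target animal's spatial/temporal feature pair is the positive, the other animals' features are the negatives. One concrete, assumed instantiation is an InfoNCE-style objective that aligns the pair before a simple fusion such as concatenation; nothing below is mandated by the claim.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(s_pos, t_pos, s_neg, t_neg, tau=0.07):
    """InfoNCE-style loss: pull the target animal's spatial/temporal
    feature pair together, push the other animals' features away.

    s_pos, t_pos: (B, D) spatial / temporal features of the target animal.
    s_neg, t_neg: (N, D) features of the other animals in the same images.
    """
    q = F.normalize(s_pos, dim=1)                        # queries
    pos = F.normalize(t_pos, dim=1)                      # positive keys
    neg = F.normalize(torch.cat([s_neg, t_neg]), dim=1)  # negative keys
    l_pos = (q * pos).sum(dim=1, keepdim=True) / tau     # (B, 1)
    l_neg = q @ neg.t() / tau                            # (B, 2N)
    logits = torch.cat([l_pos, l_neg], dim=1)
    labels = torch.zeros(len(q), dtype=torch.long)       # positive at index 0
    return F.cross_entropy(logits, labels)

def fuse(s, t):
    """Fuse the aligned features, e.g. by concatenation."""
    return torch.cat([s, t], dim=1)
```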
7. A lameness detection device, characterized in that the device comprises: a memory, a processor, and a lameness detection program stored in the memory and executable on the processor, wherein the lameness detection program, when executed by the processor, implements the steps of the lameness detection method according to any one of claims 1 to 6.
8. A computer-readable storage medium, characterized in that the storage medium stores a lameness detection program which, when executed by a processor, implements the steps of the lameness detection method according to any one of claims 1 to 6.
CN202310419552.5A 2023-04-19 2023-04-19 Lameness detection method, apparatus and storage medium Active CN116168332B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310419552.5A CN116168332B (en) 2023-04-19 2023-04-19 Lameness detection method, apparatus and storage medium

Publications (2)

Publication Number Publication Date
CN116168332A (en) 2023-05-26
CN116168332B (en) 2023-07-14

Family

ID=86422154

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310419552.5A Active CN116168332B (en) 2023-04-19 2023-04-19 Lameness detection method, apparatus and storage medium

Country Status (1)

Country Link
CN (1) CN116168332B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11429842B2 (en) * 2019-02-24 2022-08-30 Microsoft Technology Licensing, Llc Neural network for skeletons from input images
CN111126272B (en) * 2019-12-24 2020-11-10 腾讯科技(深圳)有限公司 Posture acquisition method, and training method and device of key point coordinate positioning model
CN112418018A (en) * 2020-11-09 2021-02-26 中国农业大学 Method and device for detecting abnormal walking of dairy cow
CN113657200A (en) * 2021-07-28 2021-11-16 上海影谱科技有限公司 Video behavior action identification method and system based on mask R-CNN
CN114821637A (en) * 2022-04-07 2022-07-29 银川奥特信息技术股份公司 Animal gait scoring system and method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11631193B1 (en) * 2018-06-15 2023-04-18 Bertec Corporation System for estimating a pose of one or more persons in a scene
CN111680562A (en) * 2020-05-09 2020-09-18 北京中广上洋科技股份有限公司 Human body posture identification method and device based on skeleton key points, storage medium and terminal
WO2022142854A1 (en) * 2020-12-29 2022-07-07 深圳市优必选科技股份有限公司 Optimization method and apparatus for human pose recognition module, and terminal device

Similar Documents

Publication Publication Date Title
CN111709409B (en) Face living body detection method, device, equipment and medium
Guzhva et al. Feasibility study for the implementation of an automatic system for the detection of social interactions in the waiting area of automatic milking stations by using a video surveillance system
Mei et al. Closing loops without places
US20170352162A1 (en) Region-of-interest extraction device and region-of-interest extraction method
Bjerge et al. Accurate detection and identification of insects from camera trap images with deep learning
CN115830490A (en) Multi-target tracking and behavior statistical method for herd health pigs
CN114463701B (en) Monitoring and early warning system based on multisource big data animal breeding data mining
Noe et al. Automatic detection and tracking of mounting behavior in cattle using a deep learning-based instance segmentation model
CN115223191A (en) Method for identifying and tracking group health pig behaviors
US11100642B2 (en) Computer system, and method and program for diagnosing animals
KR102349851B1 (en) System and method for providing multi-object recognition service using camera for pet
CN114299551A (en) Model training method, animal behavior identification method, device and equipment
KR102341715B1 (en) Apparatus and method for livestock monitoring
Chowdhury et al. Deep learning based computer vision technique for automatic heat detection in cows
CN112364912B (en) Information classification method, device, equipment and storage medium
CN116168332B (en) Lameness detection method, apparatus and storage medium
Ueno et al. Automatically detecting and tracking free‐ranging Japanese macaques in video recordings with deep learning and particle filters
CN113762051A (en) Model training method, image detection method, device, storage medium and equipment
KR101575100B1 (en) User group activity sensing in service area and behavior semantic analysis system
CN115659221A (en) Teaching quality assessment method and device and computer readable storage medium
CN115830078A (en) Live pig multi-target tracking and behavior recognition method, computer equipment and storage medium
CN116029968A (en) Monkey pox infection skin image detection method and device, electronic equipment and storage medium
Milotta et al. Visitors Localization in Natural Sites Exploiting EgoVision and GPS.
CN115393673A (en) Training method of object recognition model, object recognition method and electronic equipment
CN112232225A (en) Method, device and storage medium for optimizing training data set of pedestrian detection model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant