WO2021243561A1 - Behavior recognition device and method - Google Patents

Behavior recognition device and method (行为识别装置及方法)

Publication number
WO2021243561A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2020/093926
Inventor
黄康
韩亚宁
蔚鹏飞
王立平
Original Assignee
中国科学院深圳先进技术研究院
Application filed by 中国科学院深圳先进技术研究院 (Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences)
Priority to PCT/CN2020/093926
Publication of WO2021243561A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition

Definitions

  • This application relates to the field of computer science and technology, in particular to a behavior recognition device and method.
  • Animal behavior research is one of the most fundamental research methods in neuroscience, cognitive psychology, and pharmacology. By observing the behavioral responses of animals, the effects of neural circuit manipulation, cognitive or psychological intervention, and drug administration can be verified. Animal behavior research has a long history: at first, researchers studied animal behavior by direct manual observation. The advent of cameras greatly facilitated animal behavior analysis, since video recording can capture animal activities as completely as possible for later review and analysis. To better quantify the animal's behavior in a video, digital image processing and related techniques are used to extract the animal's contour, from which the animal's movement trajectory can be obtained by centroid computation and similar methods, so as to evaluate the animal's amount of activity and the time it stays at specific locations. However, such trajectory-tracking methods largely ignore the rich movements animals perform with their limbs and organs, which greatly limits behavioral evaluation.
  • Machine learning has been widely used in application fields such as image recognition and video content recognition.
  • Machine learning algorithms have also begun to be applied to animal behavior recognition.
  • However, existing animal behavior recognition methods usually do not consider the time scale of animal behavior.
  • The embodiments of the present application therefore provide a behavior recognition device and method, which can realize unsupervised animal behavior decomposition, reduce data redundancy, and realize supervised behavior recognition.
  • An embodiment of the present application provides a behavior recognition device, which is applied to animal behavior recognition. The device includes a feature extraction module, an information decomposition module, and a behavior recognition module, wherein:
  • the feature extraction module is configured to extract, from a first video, a first group of body feature information corresponding to the time sequence of a target object;
  • the information decomposition module is configured to perform posture decomposition on the first group of body feature information to obtain a first set of posture information, perform temporal dynamic clustering on the first set of posture information to obtain a first set of action information, calculate a first set of speed information of the target object based on the first group of body feature information, and cluster the first set of speed information together with the first set of action information to obtain a first set of action sequence information;
  • the behavior recognition module is configured to perform behavior recognition on the first video based on the first set of action sequence information and output a behavior recognition result of the target object.
  • The device further includes an action recognition module;
  • the action recognition module is configured to perform action recognition on the first video based on the first set of posture information and output an action recognition result of the target object.
  • The information decomposition module is further specifically configured to:
  • according to a first time range, cluster the first set of posture information into a first set of action information including H action results, and if L action results in the first set of action information are similar, keep only one of the L action results, where L is a positive integer greater than or equal to 2 and H is a positive integer greater than or equal to L.
  • The behavior recognition module is further configured to: use a first training set to train a behavior recognition model to be trained to obtain the behavior recognition model;
  • the first training set includes a second set of action sequence information carrying a first label, and the second set of action sequence information is obtained based on a second video;
  • the behavior recognition module is specifically configured to:
  • input the first set of action sequence information into the behavior recognition model and output the behavior recognition result of the target object in the first video.
  • The action recognition module is further configured to: use a second training set to train an action recognition model to be trained to obtain the action recognition model;
  • the action recognition module is specifically configured to:
  • input the first set of posture information into the action recognition model and output the action recognition result of the target object in the first video.
  • An embodiment of the present application provides a behavior recognition method, which is applied to animal behavior recognition, and the method includes:
  • The method further includes:
  • Embodiments of the present application provide a computer device that includes a processor, a memory, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for executing the method according to any one of the second aspect.
  • An embodiment of the present application provides a computer-readable storage medium that stores a computer program for electronic data exchange, and the computer program is executed by a processor to implement part or all of the steps described in the second aspect.
  • The embodiments of the present application provide a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute part or all of the steps described in the second aspect of the embodiments of the present application.
  • The computer program product may be a software installation package.
  • The behavior recognition device and method described in the embodiments of the present application include a feature extraction module, an information decomposition module, and a behavior recognition module.
  • The feature extraction module is used to extract, from the first video, the first group of body feature information corresponding to the time sequence of the target object.
  • The information decomposition module is used to decompose the first group of body feature information to obtain the first set of posture information, and to perform temporal dynamic clustering on the first set of posture information to obtain the first set of action information.
  • The information decomposition module can decompose an animal's behavior into different time scales, namely the posture layer, the action layer, and the behavior layer; without manual labeling, unsupervised animal behavior decomposition can be achieved.
  • The information decomposition module simplifies complex raw body feature data, segments and clusters the animal's movements, reduces data redundancy, and improves computational performance; the behavior recognition module then automatically recognizes the decomposed action sequence information to realize supervised behavior recognition.
  • FIG. 1 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a behavior recognition device provided by an embodiment of the present application.
  • FIG. 3A is a schematic diagram of a process for extracting body feature points according to an embodiment of the present application.
  • FIG. 3B is a schematic diagram of a physical feature point mark provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of another behavior recognition device provided by an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of a behavior recognition method provided by an embodiment of the present application.
  • FIG. 1 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • The computer device may include a processor, a memory, and one or more programs.
  • The programs are stored in the memory and are configured to be executed by the processor.
  • The computer device may also include a communication bus, an input device, and an output device, and the processor, memory, input device, and output device may be connected to one another through the bus.
  • The above-mentioned processor is configured to implement the following steps when executing the program stored in the memory:
  • The foregoing processor may be a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), or an image processing unit; this application does not limit this.
  • The behavior recognition method proposed in the embodiments of the present application can be used for behavior analysis of animals such as mice, monkeys, and rabbits.
  • FIG. 2 is a schematic structural diagram of a behavior recognition device 200 provided by an embodiment of the present application.
  • the behavior recognition device 200 includes: a feature extraction module 210, an information decomposition module 220, and a behavior recognition module 230, in which,
  • The feature extraction module 210 is configured to extract, from the first video, the first group of body feature information corresponding to the time sequence of the target object;
  • the information decomposition module 220 is configured to perform posture decomposition on the first group of body feature information to obtain the first set of posture information, perform temporal dynamic clustering on the first set of posture information to obtain the first set of action information, calculate the first set of speed information of the target object based on the first group of body feature information, and cluster the first set of speed information together with the first set of action information to obtain the first set of action sequence information;
  • the behavior recognition module 230 is configured to perform behavior recognition on the first video based on the first set of action sequence information and output the behavior recognition result of the target object.
  • The function of the feature extraction module 210 is to extract, from the input original video (that is, the first video), the first group of body feature information characterizing the movement of the animal.
  • The movement of animals in a video is usually represented by pixel values, but directly using pixel values to characterize animal movement introduces data redundancy, and pixel values are also susceptible to noise. Therefore, in the embodiments of the present application, the feature extraction module 210 recognizes animal body parts such as the limbs, head, nose, and tail in each frame of the video, so as to obtain the trajectories of these body parts over time. Furthermore, the feature extraction module 210 may also include preprocessing operations, for example, abnormal-point filtering and missing-value estimation.
  • The feature extraction module 210 extracts the original body feature points of the target object in each frame of the first video, and then performs preprocessing operations such as alignment and correction on the original body feature points to obtain the first group of body feature information.
  • A body feature extraction model may be used to extract the original body feature points from the first video, as shown in FIG. 3A. First, more than 300 frames are randomly extracted from the animal behavior video used for training, and the animal's body feature points are manually marked in each of these frames.
  • The body feature extraction model to be trained is then trained with the images marked with body feature points to obtain the body feature extraction model.
  • The feature extraction module 210 uses the body feature extraction model to identify the body parts contained in each frame of the first video and obtain the original body feature points of each frame, that is, the first group of body feature information.
  • The body feature extraction model can use the toolkit DeepLabCut for animal feature extraction: the marked images are used to train DeepLabCut, and the trained DeepLabCut recognizes the body parts of the target object in each frame of the first video to obtain the original body feature points of each frame.
  • The alignment process adjusts the body orientation of the target object in each frame so that, after alignment, the body of the target object faces the same direction in every frame; for example, regardless of the orientation of the target object's body at any moment, each frame is rotated so that the head of the target object points west, thereby eliminating the influence of head orientation on body posture.
  • The correction process corrects the abnormal points in each frame.
  • The correction process may use median filtering to correct the abnormal points in the image.
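  • The alignment and correction steps above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the feature-point indices, the westward target direction, and the median-filter kernel size are all illustrative assumptions.

```python
import numpy as np
from scipy.signal import medfilt

def align_frame(points, head_idx=0, body_center_idx=1):
    """Rotate one frame's feature points so the head points west (-x).

    points: (d, 2) array of (x, y) feature-point coordinates.
    The head/body-centre indices are illustrative assumptions.
    """
    centered = points - points[body_center_idx]       # body centre at origin
    hx, hy = centered[head_idx]
    angle = np.arctan2(hy, hx)                        # current head direction
    theta = np.pi - angle                             # rotation to "west" (180 deg)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    return centered @ rot.T

def correct_outliers(track, kernel=5):
    """Median-filter one coordinate trace over time to suppress abnormal points.

    track: (n_frames,) 1-D trace of one coordinate of one feature point.
    """
    return medfilt(track, kernel_size=kernel)
```

After alignment the body centre sits at the origin and the head lies on the negative x axis in every frame, so posture clustering is not confounded by heading direction.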
  • The feature extraction module 210 may send the first group of body feature information to the information decomposition module 220 through two branches connected to the information decomposition module 220.
  • The information decomposition module 220 may use the first group of body feature information as the input for obtaining the first set of posture information; the information decomposition module 220 may also use the first group of body feature information as the input for calculating the first set of speed information of the target object.
  • The information decomposition module 220 is specifically configured to: use an unsupervised clustering algorithm to cluster the first group of body feature information to obtain the first set of posture information including K posture results; if the first set of posture information includes M consecutive adjacent posture results belonging to the same category, keep one of the M posture results, where M is a positive integer greater than or equal to 2 and K is a positive integer greater than or equal to M.
  • The information decomposition module is further specifically configured to: according to a first time range, cluster the first set of posture information into the first set of action information including H action results; if L action results in the first set of action information are similar, keep one of the L action results, where L is a positive integer greater than or equal to 2 and H is a positive integer greater than or equal to L.
  • Animal behavior is like human language, which is composed of modular elements at different levels.
  • Language generally consists of characters, words, and sentences; correspondingly, animal behavior is composed of postures, actions, and behaviors.
  • A posture refers to the form presented by an animal's organs and limbs at a given moment; a posture result is the characterization of that form in one frame of the image, and the first set of posture information is the collection of posture results over the frames of the video.
  • An action refers to a motion unit composed of several consecutive postures (for example, walking or sniffing); an action result is the collection of posture results within a specific time period, and the first set of action information comprises all action results in the first video. A behavior refers to a physiologically meaningful activity composed of several actions (for example, predation behavior).
  • The function of the information decomposition module 220 is to hierarchically decompose the actions of the target object according to the characteristics of animal behavior. From bottom to top, an animal's behavior can be divided into three levels, namely the posture layer, the action layer, and the behavior layer.
  • The information decomposition module 220 performs unsupervised clustering on the first group of body feature information extracted by the feature extraction module 210, thereby dividing it into a limited number of postures. Since adjacent postures are highly similar, they often belong to the same category, so multiple consecutive postures of the same category can be represented by a single posture, that is, by one frame out of the adjacent frames.
  • The embodiments of the present application can thus effectively reduce the time complexity of behavior recognition by reducing the dimensionality along the time axis.
  • The first group of body feature information can be represented by a matrix X ∈ R^(d×n), corresponding to n d-dimensional vectors, where d is the number of body feature points of the target object and n is the total number of frames in the first video.
  • The information decomposition module 220 can reduce the n d-dimensional vectors to m d-dimensional vectors through a clustering algorithm. Specifically, the information decomposition module 220 performs unsupervised clustering on the first group of body feature information containing n d-dimensional vectors, clustering the body feature information representing the animal's posture into K posture results.
  • If the first set of posture information includes M consecutive adjacent posture results belonging to the same category, that is, the posture of the target object is the same or similar within that time period, one of the M posture results is used to represent the posture in that period and the other posture results are discarded. For example, assuming the first time range is 0.1 s to 0.3 s and the first set of posture information includes posture results 1, 2, 3, and 4, where posture results 1, 2, and 3 belong to the same category, posture result 2 is taken as the posture result for that period and posture results 1 and 3 are deleted; the first set of posture information then includes posture results 2 and 4.
  • The middle one of the M posture results may be selected to represent the posture within the time period, or the last posture result may be selected; the embodiments of the present application do not limit the selection method.
  • The unsupervised clustering algorithm may adopt the K-means algorithm.
  • The posture decomposition process reduces data redundancy, improves computational performance, simplifies the behavior of the target object, and converts a theoretically infinite number of postures into a finite number of posture results.
  • The first set of posture information obtained by decomposing the first group of body feature information can be represented by X_d ∈ R^(d×m), denoting the m d-dimensional posture vectors after temporal dimensionality reduction, where m is less than n.
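  • The posture decomposition above can be sketched as follows, using scikit-learn's K-means and keeping the middle frame of each run of consecutive identical labels; the value of K and the choice of the middle frame as the representative are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def decompose_postures(X, k, random_state=0):
    """Cluster per-frame feature vectors into K postures, then keep one
    representative frame from each run of consecutive identical labels.

    X: (n_frames, d) body-feature matrix.
    Returns (kept_frame_indices, per_frame_labels).
    """
    labels = KMeans(n_clusters=k, n_init=10,
                    random_state=random_state).fit_predict(X)
    kept = []
    start = 0
    for i in range(1, len(labels) + 1):
        # close the current run at the end of the array or on a label change
        if i == len(labels) or labels[i] != labels[start]:
            kept.append((start + i - 1) // 2)   # middle frame of the run
            start = i
    return np.array(kept), labels
```

The kept indices are the m ≤ n frames forming X_d, the temporally reduced posture sequence.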
  • The embodiments of the present application adopt a clustering algorithm to cluster similar actions in the first set of posture information and further decompose it into the first set of action information.
  • The information decomposition module 220 takes the temporally reduced first set of posture information X_d as input; in the time dimension, it takes the first time range as the sampling window and uses a dynamic time alignment clustering algorithm to cluster the actions in the first set of posture information, so that the first set of posture information is clustered into the first set of action information including H action results. The similarity between the H action results is then calculated; if L action results in the first set of action information are similar, that is, multiple action results represent the same or similar actions, one of the L action results is kept and the other action results are discarded.
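  • The application does not spell out the dynamic time alignment clustering algorithm. As one hedged illustration of comparing action results of different lengths and keeping one of L similar results, the following uses a plain dynamic-time-warping (DTW) distance with a greedy deduplication; the similarity threshold is an assumed parameter.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic-time-warping distance between two posture sequences.

    a, b: (len, d) arrays of posture vectors; lengths may differ.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible warping moves
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def deduplicate_actions(segments, threshold):
    """Keep one representative of each group of mutually similar segments."""
    kept = []
    for seg in segments:
        if all(dtw_distance(seg, k) > threshold for k in kept):
            kept.append(seg)
    return kept
```

DTW tolerates the speed variation between repetitions of the same action, which a frame-by-frame distance would not.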
  • The information decomposition module 220 calculates the first set of speed information of the target object based on the first group of body feature information, uses the first set of speed information as a new dimension, and clusters it together with the first set of action information again to obtain the first set of action sequence information, that is, the action segments of the first video after action decomposition.
  • The clustering algorithm may adopt a hierarchical clustering algorithm.
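  • A hedged sketch of the speed computation and the subsequent hierarchical clustering: the centroid as the speed proxy, the Ward linkage, and the cluster count are illustrative assumptions, not choices fixed by this application.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def speed_features(centroids, fps):
    """Per-frame speed of the target object from its centroid trajectory.

    centroids: (n_frames, 2) positions; returns (n_frames,) speeds.
    """
    disp = np.diff(centroids, axis=0)                 # frame-to-frame displacement
    speed = np.linalg.norm(disp, axis=1) * fps
    return np.concatenate([[speed[0]], speed])        # pad the first frame

def cluster_actions_with_speed(action_feats, speeds, n_clusters):
    """Append speed as an extra dimension, then apply hierarchical
    (Ward-linkage) clustering to obtain action sequence labels.
    """
    feats = np.column_stack([action_feats, speeds])
    Z = linkage(feats, method="ward")
    return fcluster(Z, t=n_clusters, criterion="maxclust")
```

Adding speed separates actions with similar postures but different locomotion, e.g. standing still versus walking.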
  • The information decomposition module 220 can decompose an animal's behavior into different time scales, namely the posture layer, the action layer, and the behavior layer, so that researchers can quantify and analyze the animal's behavior at the required time scale; moreover, the information decomposition module 220 reduces complex body features along the time dimension, reducing data redundancy and improving recognition performance.
  • The first set of action sequence information output by the information decomposition module 220 can be used as the input of the behavior recognition module 230.
  • The behavior recognition module 230 may perform behavior recognition on the first set of action sequence information and obtain the behavior recognition result of the target object in the first video.
  • The behavior recognition result may be a physiologically meaningful behavior composed of several actions (for example, predation or fighting).
  • The behavior recognition module 230 is further configured to: use the first training set to train the behavior recognition model to be trained to obtain the behavior recognition model, where the first training set includes the second set of action sequence information carrying the first label, and the second set of action sequence information is obtained based on the second video;
  • the behavior recognition module 230 is specifically configured to: input the first set of action sequence information into the behavior recognition model and output the behavior recognition result of the target object in the first video.
  • The behavior of an animal is composed of multiple continuous pieces of action sequence information.
  • The behavior recognition module in the embodiments of the present application may use a semantic segmentation model from machine learning. Before use, the semantic segmentation model must be trained with the first training set, and the behaviors of interest in the first training set must be manually labeled.
  • The first training set is labeled as follows: the second video is input into the feature extraction module 210, which outputs the second group of body feature information; the second group of body feature information is input into the information decomposition module 220, which outputs the second set of action sequence information.
  • The decomposed second set of action sequence information, corresponding to the second video, is used as the labeling target: if a behavior of interest appears in the second video, all the action results corresponding to that behavior are labeled as that behavior.
  • More than half an hour of labeled second-set action sequence information is required as the first training set.
  • The semantic segmentation model can then be trained.
  • A supervised method can then be used to perform behavior recognition on the first video.
  • The first set of action sequence information obtained from the first video through the feature extraction module 210 and the information decomposition module 220 is input into the trained semantic segmentation model, which outputs the behaviors performed by the target object in the first video.
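  • The application specifies a semantic segmentation model but does not fix an architecture. As a hedged stand-in that only shows the shape of this supervised pipeline, the sketch below trains a per-segment classifier on action-sequence features; the feature layout and the random-forest choice are assumptions, not the patent's model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_behavior_recognizer(segment_feats, behavior_labels):
    """Fit a per-segment behavior classifier (illustrative stand-in for
    the semantic segmentation model described in the text).

    segment_feats: (n_segments, f) features of decomposed action segments.
    behavior_labels: (n_segments,) manually labeled behaviors of interest.
    """
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(segment_feats, behavior_labels)
    return clf

def recognize_behaviors(clf, segment_feats):
    """Predict a behavior label for every action segment of the first video."""
    return clf.predict(segment_feats)
```

Because the classifier consumes decomposed action segments rather than raw frames, only segment-level labels are needed, which is what keeps the manual labeling effort to roughly half an hour of data.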
  • The behavior recognition module 230 takes the action sequence information decomposed by the information decomposition module 220 as input, and the user only needs to label the behavior data of interest as a training set for training the behavior recognition model, so that the animal's behavior can be automatically recognized from the video in a supervised manner.
  • The device further includes an action recognition module 240;
  • the action recognition module 240 is configured to perform action recognition on the first video based on the first set of posture information and output the action recognition result of the target object.
  • The first set of posture information output by the information decomposition module 220 can be used as the input of the action recognition module 240.
  • The action recognition module 240 can perform action recognition on the first set of posture information to obtain the action recognition result of the target object in the first video.
  • The action recognition result may be a motion unit composed of several consecutive posture results (for example, walking or sniffing).
  • The action recognition module 240 is further configured to: use a second training set to train an action recognition model to be trained to obtain the action recognition model, where the second training set includes the second set of action sequence information carrying a second label, and the second set of action sequence information is obtained based on the second video;
  • the action recognition module 240 is specifically configured to: input the first set of posture information into the action recognition model and output the action recognition result of the target object in the first video.
  • The aforementioned action recognition model may use the Seq2Seq model used in natural language processing.
  • The second training set is needed to train the Seq2Seq model, and the second training set must be manually labeled.
  • The second training set is labeled as follows: the second video is input into the feature extraction module 210, which outputs the second group of body feature information; the second group of body feature information is input into the information decomposition module 220, which outputs the second set of action sequence information; the decomposed second set of action sequence information is then observed manually, and the actions performed in it are identified and labeled, thereby giving each type of action an actual meaning.
  • More than 2 hours of labeled second-set action sequence information is required as the second training set.
  • The Seq2Seq model can then be trained.
  • A supervised method can then be used to perform action recognition on the first video.
  • The first set of posture information obtained from the first video through the feature extraction module 210 and the information decomposition module 220 is input into the trained Seq2Seq model, which outputs the actions performed by the target object in the first video.
  • The action recognition module 240 can label and annotate each type of action in the decomposed action sequence information to generate the second training set for training the action recognition model, thereby speeding up the labeling of animal actions; the supervised recognition method improves the accuracy of action recognition.
  • The behavior recognition device 200 described in the embodiments of the present application includes a feature extraction module, an information decomposition module, and a behavior recognition module.
  • The feature extraction module is used to extract, from the first video, the first group of body feature information corresponding to the time sequence of the target object; the information decomposition module is used to decompose the first group of body feature information to obtain the first set of posture information, perform temporal dynamic clustering on the first set of posture information to obtain the first set of action information, calculate the first set of speed information of the target object based on the first group of body feature information, and cluster the first set of speed information together with the first set of action information to obtain the first set of action sequence information;
  • the behavior recognition module performs behavior recognition on the first video based on the first set of action sequence information and outputs the behavior recognition result of the target object.
  • The information decomposition module can decompose an animal's behavior into different time scales, namely the posture layer, the action layer, and the behavior layer; without manual labeling, unsupervised animal behavior decomposition is achieved. The information decomposition module simplifies complex raw body feature data, segments and clusters the animal's movements, reduces data redundancy, and improves computational performance; the behavior recognition module then automatically recognizes the decomposed action sequence information to realize supervised behavior recognition.
  • FIG. 5 is a schematic flowchart of a behavior recognition method provided by an embodiment of the present application, which is applied to animal behavior recognition. As shown in FIG. 5, the method includes the following steps:
  • S510: Extract, from the first video, the first group of body feature information corresponding to the time sequence of the target object.
  • S520: Perform posture decomposition on the first group of body feature information to obtain the first set of posture information, perform temporal dynamic clustering on the first set of posture information to obtain the first set of action information, calculate the first set of speed information of the target object based on the first group of body feature information, and cluster the first set of speed information together with the first set of action information to obtain the first set of action sequence information.
  • S530: Perform behavior recognition on the first video based on the first set of action sequence information, and output the behavior recognition result of the target object.
  • The method further includes:
  • The posture decomposition of the first group of body feature information to obtain the first set of posture information includes:
  • The temporal dynamic clustering of the first set of posture information to obtain the first set of action information includes: according to a first time range, clustering the first set of posture information into the first set of action information including H action results, and if L action results in the first set of action information are similar, keeping one of the L action results, where L is a positive integer greater than or equal to 2 and H is a positive integer greater than or equal to L.
  • the method further includes: using a first training set to train a behavior recognition model to be trained to obtain a behavior recognition model, where the first training set includes a second set of action sequence information carrying first labels, and the second set of action sequence information is obtained based on a second video;
  • the performing behavior recognition on the first video based on the first set of action sequence information and outputting the behavior recognition result of the target object includes: inputting the first set of action sequence information into the behavior recognition model, and outputting The behavior recognition result of the target object in the first video.
  • the method further includes: using a second training set to train an action recognition model to be trained to obtain an action recognition model, where the second training set includes a second set of action sequence information carrying second labels, and the second set of action sequence information is obtained based on the second video;
  • the performing action recognition on the first video based on the first set of posture information and outputting the action recognition result of the target object includes:
  • the first set of posture information is input to the action recognition model, and the action recognition result of the target object in the first video is output.
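Taken together, the steps above form a three-stage pipeline: feature extraction, posture/action decomposition, and behavior recognition. The skeleton below is a minimal illustration of how the stages compose; every function name and the toy feature values are hypothetical placeholders, not the patent's implementation.

```python
import numpy as np

def decompose(features, k=8):
    # Placeholder posture decomposition: quantize each frame's 1-D feature
    # value to the nearest of k evenly spaced prototype postures.
    protos = np.linspace(features.min(), features.max(), k)
    return np.abs(features[:, None] - protos[None, :]).argmin(1)

def recognize(action_seq):
    # Placeholder behavior recognizer: majority vote over the sequence.
    vals, counts = np.unique(action_seq, return_counts=True)
    return vals[counts.argmax()]

feats = np.array([0.0, 0.1, 0.9, 1.0, 0.95])  # stand-in for S510 output
postures = decompose(feats, k=2)              # S520: posture layer
behavior = recognize(postures)                # S530: behavior result
```

In the real device, `decompose` corresponds to the information decomposition module's clustering stages and `recognize` to the trained behavior recognition model.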
  • An embodiment of the present application also provides a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute some or all of the steps of any method described in the foregoing method embodiments.
  • An embodiment of the present application also provides a computer program product. The computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute some or all of the steps of any method described in the foregoing method embodiments. The computer program product may be a software installation package.
  • the disclosed device may be implemented in other ways.
  • the device embodiments described above are merely illustrative; for example, the division into units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical or other forms.
  • the units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • if the above integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a memory and includes several instructions to cause a computer device (which may be a personal computer, a terminal device, or a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present application.
  • the aforementioned memory includes media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), removable hard disk, magnetic disk, or optical disc.
  • all or part of the steps in the various methods of the above embodiments can be completed by a program instructing related hardware; the program can be stored in a computer-readable memory, and the memory can include a flash disk, ROM, RAM, magnetic disk, optical disc, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

A behavior recognition device and method. The device includes a feature extraction module (210), an information decomposition module (220), and a behavior recognition module (230). The feature extraction module extracts, from a first video, a first set of body feature information corresponding to a time sequence of a target object. The information decomposition module performs posture decomposition on the first set of body feature information to obtain a first set of posture information, performs temporal dynamic clustering on the first set of posture information to obtain a first set of action information, computes a first set of velocity information of the target object based on the first set of body feature information, and clusters the first set of velocity information with the first set of action information to obtain a first set of action sequence information. The behavior recognition module performs behavior recognition on the first video based on the first set of action sequence information and outputs a behavior recognition result of the target object. The device achieves unsupervised decomposition of animal behavior, reduces data redundancy, and realizes supervised behavior recognition.

Description

Behavior Recognition Device and Method
Technical Field
This application relates to the field of computer science and technology, and in particular to a behavior recognition device and method.
Background
Animal behavior research is one of the most fundamental research methods in neuroscience, cognitive psychology, pharmacology, and related fields. By observing an animal's behavioral responses, the effects of neural-circuit manipulation, cognitive and psychological intervention, and drug action can be verified. Animal behavior research has a long history: at first, researchers studied animal behavior by manual observation, and the advent of the camera greatly facilitated behavior analysis, since video recording captures an animal's activity as completely as possible for later review and analysis. To better quantify the behavior of an animal in video, digital image processing and related techniques are used to extract the animal's contour from the video, and the animal's trajectory is then obtained by centroid computation and similar methods, so that its activity level and its dwell time at specific locations can be evaluated. However, this trajectory-tracking approach largely ignores the rich movements that animals express through their limbs and organs, which greatly limits the evaluation of behavior.
At present, with the development of machine learning, machine learning has been widely applied to image recognition, video content recognition, and other fields. In animal behavior research, machine-learning algorithms have also begun to be used to recognize animal behavior. However, existing animal behavior recognition methods usually do not take the time scales of animal behavior into account.
Summary
Embodiments of the present application provide a behavior recognition device and method that achieve unsupervised decomposition of animal behavior, reduce data redundancy, and realize supervised behavior recognition.
In a first aspect, an embodiment of the present application provides a behavior recognition device applied to animal behavior recognition. The device includes a feature extraction module, an information decomposition module, and a behavior recognition module, wherein:
the feature extraction module is configured to extract, from a first video, a first set of body feature information corresponding to a time sequence of a target object;
the information decomposition module is configured to perform posture decomposition on the first set of body feature information to obtain a first set of posture information, perform temporal dynamic clustering on the first set of posture information to obtain a first set of action information, compute a first set of velocity information of the target object based on the first set of body feature information, and cluster the first set of velocity information with the first set of action information to obtain a first set of action sequence information;
the behavior recognition module is configured to perform behavior recognition on the first video based on the first set of action sequence information and output a behavior recognition result of the target object.
Optionally, the device further includes an action recognition module;
the action recognition module is configured to perform action recognition on the first video based on the first set of posture information and output an action recognition result of the target object.
Optionally, the information decomposition module is further specifically configured to:
cluster the first set of body feature information using an unsupervised clustering algorithm to obtain the first set of posture information including K posture results; if the first set of posture information includes M consecutive adjacent posture results belonging to the same class, retain one of the M posture results, where M is a positive integer greater than or equal to 2 and K is a positive integer greater than or equal to M.
Optionally, the information decomposition module is further specifically configured to:
according to a first time range, cluster the first set of posture information into the first set of action information including H action results; if L action results in the first set of action information are similar, retain one of the L action results, where L is a positive integer greater than or equal to 2 and H is a positive integer greater than or equal to L.
Optionally, the behavior recognition module is further configured to:
train a behavior recognition model to be trained using a first training set to obtain a behavior recognition model, where the first training set includes a second set of action sequence information carrying first labels, and the second set of action sequence information is obtained based on a second video;
the behavior recognition module is specifically configured to:
input the first set of action sequence information into the behavior recognition model and output the behavior recognition result of the target object in the first video.
Optionally, the action recognition module is further configured to:
train an action recognition model to be trained using a second training set to obtain an action recognition model, where the second training set includes a second set of action sequence information carrying second labels, and the second set of action sequence information is obtained based on the second video;
the action recognition module is specifically configured to:
input the first set of posture information into the action recognition model and output the action recognition result of the target object in the first video.
In a second aspect, an embodiment of the present application provides a behavior recognition method applied to animal behavior recognition. The method includes:
extracting, from a first video, a first set of body feature information corresponding to a time sequence of a target object;
performing posture decomposition on the first set of body feature information to obtain a first set of posture information, performing temporal dynamic clustering on the first set of posture information to obtain a first set of action information, computing a first set of velocity information of the target object based on the first set of body feature information, and clustering the first set of velocity information with the first set of action information to obtain a first set of action sequence information;
performing behavior recognition on the first video based on the first set of action sequence information, and outputting a behavior recognition result of the target object.
Optionally, the method further includes:
performing action recognition on the first video based on the first set of posture information, and outputting an action recognition result of the target object.
In a third aspect, an embodiment of the present application provides a computer device. The computer device includes a processor, a memory, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for performing the method of any item of the second aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium. The computer-readable storage medium stores a computer program for data exchange, and when the computer program is executed by a processor, some or all of the steps described in the second aspect of the embodiments of the present application are implemented.
In a fifth aspect, an embodiment of the present application provides a computer program product. The computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute some or all of the steps described in the second aspect of the embodiments of the present application. The computer program product may be a software installation package.
It can be seen that the behavior recognition device and method described in the embodiments of the present application include a feature extraction module, an information decomposition module, and a behavior recognition module. The feature extraction module extracts, from a first video, a first set of body feature information corresponding to a time sequence of a target object; the information decomposition module performs posture decomposition on the first set of body feature information to obtain a first set of posture information, performs temporal dynamic clustering on the first set of posture information to obtain a first set of action information, computes a first set of velocity information of the target object based on the first set of body feature information, and clusters the first set of velocity information with the first set of action information to obtain a first set of action sequence information; the behavior recognition module performs behavior recognition on the first video based on the first set of action sequence information and outputs a behavior recognition result of the target object. In the present application, the information decomposition module can decompose an animal's behavior into different time scales, namely a posture layer, an action layer, and a behavior layer, achieving unsupervised decomposition of animal behavior without manual labeling; it also simplifies the complex raw body feature data by segmenting and clustering the animal's actions, which reduces data redundancy and improves computational performance. The behavior recognition module automatically recognizes the decomposed action sequence information, realizing supervised behavior recognition.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of a computer device provided by an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a behavior recognition device provided by an embodiment of the present application;
FIG. 3A is a schematic flowchart of extracting body feature points provided by an embodiment of the present application;
FIG. 3B is a schematic diagram of body feature point labeling provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of another behavior recognition device provided by an embodiment of the present application;
FIG. 5 is a schematic flowchart of a behavior recognition method provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The terms "first", "second", "third", "fourth", and the like in the specification, the claims, and the drawings of the present invention are used to distinguish different objects, not to describe a particular order. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to the process, method, product, or device.
Reference to an "embodiment" herein means that a particular feature, result, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present invention. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
The present application is described in detail below through specific embodiments.
Referring to FIG. 1, FIG. 1 is a schematic structural diagram of a computer device provided by an embodiment of the present application. As shown in FIG. 1, the computer device may include a processor, a memory, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor. The computer device may further include a communication bus, an input device, and an output device, and the processor, the memory, the input device, and the output device may be connected to one another through the bus.
The processor is configured to implement the following steps when executing the programs stored in the memory:
extracting, from a first video, a first set of body feature information corresponding to a time sequence of a target object;
performing posture decomposition on the first set of body feature information to obtain a first set of posture information, performing temporal dynamic clustering on the first set of posture information to obtain a first set of action information, computing a first set of velocity information of the target object based on the first set of body feature information, and clustering the first set of velocity information with the first set of action information to obtain a first set of action sequence information;
performing behavior recognition on the first video based on the first set of action sequence information, and outputting a behavior recognition result of the target object.
Further, the processor may be a central processing unit (CPU), a neural processing unit (NPU), a graphics processing unit (GPU), or an image processing unit, which is not limited in the present application. With this processor, the behavior recognition method proposed in the embodiments of the present application can be used for behavior analysis of animals such as mice, monkeys, and rabbits.
Referring to FIG. 2, FIG. 2 is a schematic structural diagram of a behavior recognition device 200 provided by an embodiment of the present application. The behavior recognition device 200 includes a feature extraction module 210, an information decomposition module 220, and a behavior recognition module 230, wherein:
the feature extraction module 210 is configured to extract, from a first video, a first set of body feature information corresponding to a time sequence of a target object;
the information decomposition module 220 is configured to perform posture decomposition on the first set of body feature information to obtain a first set of posture information, perform temporal dynamic clustering on the first set of posture information to obtain a first set of action information, compute a first set of velocity information of the target object based on the first set of body feature information, and cluster the first set of velocity information with the first set of action information to obtain a first set of action sequence information;
the behavior recognition module 230 is configured to perform behavior recognition on the first video based on the first set of action sequence information and output a behavior recognition result of the target object.
The role of the feature extraction module 210 is to extract, from the input raw video (the first video), the first set of body feature information that characterizes the animal's motion. The motion of an animal in video is usually represented by pixel values, but using pixel values directly as the representation of animal motion introduces data redundancy, and pixel values are easily affected by noise. Therefore, in the embodiments of the present application, the feature extraction module 210 identifies body parts such as the animal's limbs, head, nose, and tail in each frame of the video, so as to obtain the trajectories of these body parts over time. Further, the feature extraction module 210 may also include preprocessing operations such as outlier filtering and missing-value estimation.
Specifically, the feature extraction module 210 extracts, from the first video, the raw body feature points corresponding to the target object in each frame, and then performs preprocessing operations such as alignment and correction on the raw body feature points to obtain the first set of body feature information. In some examples, a body feature extraction model may be used to extract the raw body feature points in the first video, as shown in FIG. 3A. First, more than 300 frames are randomly sampled from animal behavior videos used for training, and the body feature points of the animal in each frame are labeled manually; for example, as shown in FIG. 3B, the body parts of a mouse are marked, i.e., the key points of the animal's body are defined manually. The body feature extraction model to be trained is then trained with the images whose body feature points have been labeled, yielding the body feature extraction model. The feature extraction module 210 uses this model to identify the body parts contained in each frame of the first video, obtaining the raw body feature points of each frame, namely the first set of body feature information. In some examples, the body feature extraction model may use DeepLabCut, a toolkit for animal feature extraction: DeepLabCut is trained with the labeled images, and the trained DeepLabCut identifies the body parts of the target object in each frame of the first video to obtain the raw body feature points of each frame.
Further, the alignment process adjusts the body orientation of the target object in each frame so that, after alignment, the target object's body faces the same direction in every frame. For example, whatever the orientation of the target object's body at any moment, the image is rotated so that the target object's head faces west, thereby removing the influence of head orientation on body posture. The correction process corrects abnormal points in each frame, and may use median filtering to do so.
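The alignment and median-filter correction just described can be sketched as follows. This is a minimal illustration under the assumption of 2-D keypoints with known head and tail indices; it is not the patent's exact procedure.

```python
import numpy as np

def align_frame(points, head_idx=0, tail_idx=-1):
    # Rotate one frame's (d, 2) keypoints about their centroid so the
    # tail-to-head axis points in a fixed direction (west, i.e. -x),
    # removing heading from the posture representation.
    pts = points - points.mean(0)
    v = pts[head_idx] - pts[tail_idx]
    theta = np.arctan2(v[1], v[0]) - np.pi  # rotation needed to reach -x
    c, s = np.cos(-theta), np.sin(-theta)
    R = np.array([[c, -s], [s, c]])
    return pts @ R.T

def median_filter_1d(x, w=3):
    # Median-filter one coordinate trace to suppress tracking outliers
    # (the "correction" step); edges are padded by repetition.
    pad = w // 2
    xp = np.pad(x, pad, mode="edge")
    return np.array([np.median(xp[i:i + w]) for i in range(len(x))])
```

In practice each frame's keypoints would be aligned, and each keypoint's x- and y-trace filtered, before clustering.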
The feature extraction module 210 may send the first set of body feature information to the information decomposition module 220 through two branches connected to the information decomposition module 220. The information decomposition module 220 may use the first set of body feature information as the input for obtaining the first set of posture information, and may also use it as the input for computing the first set of velocity information of the target object.
Optionally, the information decomposition module 220 is specifically configured to: cluster the first set of body feature information using an unsupervised clustering algorithm to obtain the first set of posture information including K posture results; if the first set of posture information includes M consecutive adjacent posture results belonging to the same class, retain one of the M posture results, where M is a positive integer greater than or equal to 2 and K is a positive integer greater than or equal to M.
Optionally, the information decomposition module is further specifically configured to: according to a first time range, cluster the first set of posture information into the first set of action information including H action results; if L action results in the first set of action information are similar, retain one of the L action results, where L is a positive integer greater than or equal to 2 and H is a positive integer greater than or equal to L.
Animal behavior, much like human language, is composed of modular elements at different levels. Language mainly consists of characters, words, and sentences; correspondingly, animal behavior consists of postures, actions, and behaviors. A posture is the form an animal expresses through its organs and limbs at any given moment, so a posture result is the feature of that form in one frame, and the first set of posture information consists of the posture results of the frames. An action is a motion unit composed of several consecutive postures (for example, walking or sniffing); an action result is a collection of the first set of posture information within a particular time period, and the first set of action information consists of all action results in the first video. A behavior is a sequence of actions with a certain physiological meaning (for example, predation). Different scientific questions usually concern different behavioral scales, yet most existing methods confuse the time scales of behavior, treating actions as behaviors or statistically comparing actions and behaviors at the same level. Therefore, in the embodiments of the present application, the role of the information decomposition module 220 is to hierarchically decompose the actions of the target object according to the characteristics of animal behavior; from the bottom up, animal behavior is divided into three levels, namely the posture layer, the action layer, and the behavior layer.
For the posture layer, the information decomposition module 220 performs unsupervised clustering on the first set of body feature information extracted by the feature extraction module 210, dividing it into a finite number of postures. Since adjacent postures are highly similar, neighboring postures are likely to belong to the same class; therefore, multiple consecutive adjacent postures of the same class can be represented by a single posture, i.e., multiple adjacent frames are represented by one of those frames. By reducing dimensionality along the time axis, the embodiments of the present application effectively reduce the time complexity of behavior recognition.
The present application characterizes an animal's motion by body feature points. The first set of body feature information can be represented by a matrix X ∈ R^(d×n), corresponding to n d-dimensional vectors, where d is the number of body feature points used for the target object and n is the total number of frames in the first video. The information decomposition module 220 can reduce the n d-dimensional vectors to m d-dimensional vectors through a clustering algorithm. Specifically, the information decomposition module 220 performs unsupervised clustering on the first set of body feature information containing the n d-dimensional vectors, clustering the body feature information representing the animal's postures into K posture results. If the first set of posture information includes M consecutive adjacent posture results of the same class, i.e., the target object's posture is the same or similar within that time period, one of the M posture results represents the posture of that period and the other posture results are discarded. For example, suppose the first time range is 0.1 s to 0.3 s and the first set of posture information includes posture result 1, posture result 2, posture result 3, and posture result 4; if posture results 1, 2, and 3 belong to the same class, posture result 2 is taken as the posture result of that period and posture results 1 and 3 are deleted, so the first set of posture information then includes posture result 2 and posture result 4. The middle one of the M posture results may be chosen to represent the period, or the last one; the embodiments of the present application are not limited to other selection methods either.
In some examples, the unsupervised clustering algorithm may be the K-means algorithm. In the embodiments of the present application, posture decomposition reduces data redundancy, improves computational performance, and simplifies the behavior of the target object by converting a theoretically unlimited number of postures into a finite set of posture results. The first set of posture information obtained by posture decomposition of the first set of body feature information can be denoted X_d ∈ R^(d×m), indicating the m d-dimensional posture vectors after temporal dimensionality reduction, with m smaller than n.
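As a sketch of the posture layer, the following minimal K-means plus run-collapsing illustrates the clustering-then-temporal-reduction idea. Deterministic initialization is used here only for reproducibility (the patent does not prescribe one), and the first frame of each run is kept, whereas the patent also allows the middle or last frame.

```python
import numpy as np

def kmeans(X, k, iters=50):
    # Minimal K-means over frame feature vectors X of shape (n, d).
    # Deterministic init from the first k rows (illustrative only;
    # k-means++ would be a better choice in practice).
    centers = X[:k].astype(float)
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

def collapse_runs(labels):
    # Temporal reduction: keep one representative frame index per run
    # of consecutive frames assigned to the same posture class
    # (here, the first frame of each run).
    keep = [0]
    for i in range(1, len(labels)):
        if labels[i] != labels[keep[-1]]:
            keep.append(i)
    return np.array(keep)
```

Applied to X ∈ R^(n×d), `kmeans` produces the K posture classes and `collapse_runs` yields the m ≤ n retained frames, i.e. X_d.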
For the action layer, the embodiments of the present application use a clustering algorithm to cluster similar actions in the first set of posture information and further decompose out the first set of action information. Specifically, the information decomposition module 220 takes the temporally reduced first set of posture information X_d as input, defines the first time range as the sampling window along the time dimension, and uses a dynamic time alignment clustering algorithm to cluster the actions in the first set of posture information, thereby clustering the first set of posture information into the first set of action information including H action results. The similarity between the H action results is then computed; if L action results in the first set of action information are similar, i.e., multiple action results represent the same or similar actions, one of the L action results is retained and the others are discarded.
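The patent names a "dynamic time alignment clustering algorithm" without specifying it. One common building block for comparing action segments performed at different speeds is dynamic time warping (DTW), sketched here over 1-D posture-label sequences as an assumed similarity measure, not the patent's exact algorithm.

```python
import numpy as np

def dtw(a, b):
    # Classic dynamic-time-warping distance between two 1-D sequences.
    # A small distance means the two action segments trace the same
    # posture sequence, possibly at different speeds.
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Pairs of action results whose DTW distance falls below a threshold would be treated as "similar", and only one representative kept, as described above.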
Further, the information decomposition module 220 computes the first set of velocity information of the target object based on the first set of body feature information, takes the first set of velocity information as a new dimension, and re-clusters it with the first set of action information to obtain the first set of action sequence information, i.e., the action segments of the first video after behavior decomposition. In some examples, this clustering algorithm may be a hierarchical clustering algorithm.
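The velocity step can be sketched as follows: per-frame speed from centroid displacement, z-scored and appended as an extra clustering dimension. The frame rate and weighting are illustrative assumptions; the patent does not specify them.

```python
import numpy as np

def speeds(centroids, fps=30.0):
    # Per-frame speed (units/s) from centroid displacement between
    # consecutive frames; the first value is repeated to keep length n.
    d = np.diff(centroids, axis=0)
    v = np.hypot(d[:, 0], d[:, 1]) * fps
    return np.concatenate([[v[0]], v])

def with_speed(action_feats, v, weight=1.0):
    # Append speed as a new dimension, z-scored so it is commensurate
    # with the action features before hierarchical clustering.
    z = (v - v.mean()) / (v.std() + 1e-8)
    return np.column_stack([action_feats, weight * z])
```

The augmented matrix from `with_speed` would then be passed to a hierarchical clustering routine to produce the action sequence information.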
In the embodiments of the present application, the information decomposition module 220 can decompose an animal's behavior into different time scales, namely the posture layer, the action layer, and the behavior layer, so that researchers can obtain the time scale they need to quantify and statistically analyze the animal's behavior. Moreover, the information decomposition module 220 reduces the complex body features along the time dimension, reducing data redundancy and improving recognition performance.
The first set of action sequence information output by the information decomposition module 220 can serve as input to the behavior recognition module 230. The behavior recognition module 230 can perform behavior recognition on the first set of action sequence information to obtain the behavior recognition result of the target object in the first video; the behavior recognition result may be a behavior composed of several actions with a certain physiological meaning (for example, predation or fighting).
Optionally, the behavior recognition module 230 is further configured to: train a behavior recognition model to be trained using a first training set to obtain a behavior recognition model, where the first training set includes a second set of action sequence information carrying first labels, and the second set of action sequence information is obtained by labeling based on the second video;
the behavior recognition module 230 is specifically configured to: input the first set of action sequence information into the behavior recognition model and output the behavior recognition result of the target object in the first video.
An animal's behavior is built from multiple consecutive pieces of action sequence information. The behavior recognition module in the embodiments of the present application may adopt a semantic segmentation model from machine learning. Before the model is used, it must be trained with the first training set, in which the behavior segments of interest are labeled manually. The first training set is labeled as follows: the second video is input into the feature extraction module 210, which outputs the second set of body feature information; the second set of body feature information is input into the information decomposition module 220, which outputs the second set of action sequence information. The decomposed second set of action sequence information is then taken as the labeling target: whenever a behavior of interest appears in the second video, all action results corresponding to that behavior are labeled as that behavior. In a specific embodiment, more than half an hour of labeled second action sequence information is needed as the first training set. Once the first training set is produced, the semantic segmentation model can be trained; after training, behavior recognition can be performed on the first video in a supervised manner. The first set of action sequence information obtained from the first video through the feature extraction module 210 and the information decomposition module 220 is input into the trained semantic segmentation model, which outputs the behaviors performed by the target object in the first video.
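The construction of the first training set described above, propagating interval labels of interesting behaviors onto decomposed action segments, can be sketched like this; the segment and interval formats are assumptions for illustration only.

```python
def label_actions(segments, behavior_intervals):
    # Assign a behavior label to every decomposed action segment whose
    # time span falls inside a manually labeled behavior interval;
    # all other segments get "background".
    # segments: list of (start, end) times of action results.
    # behavior_intervals: list of (start, end, label) annotations.
    labels = []
    for s, e in segments:
        tag = "background"
        for bs, be, name in behavior_intervals:
            if s >= bs and e <= be:
                tag = name
                break
        labels.append(tag)
    return labels
```

The resulting per-segment labels, paired with the segments' features, would form the supervised training set for the behavior recognition model.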
In the embodiments of the present application, the behavior recognition module 230 takes the action sequence information decomposed by the information decomposition module 220 as input. The user only needs to label the behavior data of interest as a training set for model training, so the animal's behavior can be recognized from video automatically in a supervised manner.
Optionally, the device further includes an action recognition module 240;
the action recognition module 240 is configured to perform action recognition on the first video based on the first set of posture information and output an action recognition result of the target object.
The first set of action sequence information output by the information decomposition module 220 can serve as input to the action recognition module 240. As shown in FIG. 4, the action recognition module 240 can perform action recognition on the first set of action sequence information to obtain the action recognition result of the target object in the first video; an action recognition result may be a motion unit composed of several consecutive posture results (for example, walking or sniffing).
Optionally, the action recognition module 240 is further configured to: train an action recognition model to be trained using a second training set to obtain an action recognition model, where the second training set includes a second set of action sequence information carrying second labels, and the second set of action sequence information is obtained based on the second video;
the action recognition module 240 is specifically configured to: input the first set of posture information into the action recognition model and output the action recognition result of the target object in the first video.
The action recognition model may adopt the Seq2Seq model used in natural language processing. Before the Seq2Seq model is used, it must be trained with the second training set, which is labeled manually. The second training set is labeled as follows: the second video is input into the feature extraction module 210, which outputs the second set of body feature information; the second set of body feature information is input into the information decomposition module 220, which outputs the second set of action sequence information. The decomposed second set of action sequence information is then inspected manually, and the actions performed in it are identified and labeled so that each class of action is given an actual action meaning. In a specific embodiment, more than two hours of labeled second action sequence information is needed as the second training set. Once the second training set is produced, the Seq2Seq model can be trained; after training, action recognition can be performed on the first video in a supervised manner. The first set of posture information obtained from the first video through the feature extraction module 210 and the information decomposition module 220 is input into the trained Seq2Seq model, which outputs the actions performed by the target object in the first video.
In the embodiments of the present application, the action recognition module 240 can annotate each class of action in the decomposed action sequence information to generate the second training set for training the action recognition network model, which speeds up the labeling of animal actions; by recognizing the target object's actions with a supervised method, it also improves the accuracy of action recognition.
It can be seen that the behavior recognition device 200 described in the embodiments of the present application includes a feature extraction module, an information decomposition module, and a behavior recognition module. The feature extraction module extracts, from a first video, a first set of body feature information corresponding to a time sequence of a target object; the information decomposition module performs posture decomposition on the first set of body feature information to obtain a first set of posture information, performs temporal dynamic clustering on the first set of posture information to obtain a first set of action information, computes a first set of velocity information of the target object based on the first set of body feature information, and clusters the first set of velocity information with the first set of action information to obtain a first set of action sequence information; the behavior recognition module performs behavior recognition on the first video based on the first set of action sequence information and outputs the behavior recognition result of the target object. In the present application, the information decomposition module can decompose an animal's behavior into different time scales, namely the posture layer, the action layer, and the behavior layer, achieving unsupervised decomposition of animal behavior without manual labeling; it also simplifies the complex raw body feature data by segmenting and clustering the animal's actions, which reduces data redundancy and improves computational performance. The behavior recognition module automatically recognizes the decomposed action sequence information, realizing supervised behavior recognition.
Referring to FIG. 5, FIG. 5 is a schematic flowchart of a behavior recognition method provided by an embodiment of the present application, applied to animal behavior recognition. As shown in FIG. 5, the method includes the following steps:
S510: extracting, from a first video, a first set of body feature information corresponding to a time sequence of a target object.
S520: performing posture decomposition on the first set of body feature information to obtain a first set of posture information, performing temporal dynamic clustering on the first set of posture information to obtain a first set of action information, computing a first set of velocity information of the target object based on the first set of body feature information, and clustering the first set of velocity information with the first set of action information to obtain a first set of action sequence information.
S530: performing behavior recognition on the first video based on the first set of action sequence information, and outputting a behavior recognition result of the target object.
Optionally, the method further includes:
performing action recognition on the first video based on the first set of posture information, and outputting an action recognition result of the target object.
Optionally, performing posture decomposition on the first set of body feature information to obtain the first set of posture information includes:
clustering the first set of body feature information using an unsupervised clustering algorithm to obtain the first set of posture information including K posture results; if the first set of posture information includes M consecutive adjacent posture results belonging to the same class, retaining one of the M posture results, where M is a positive integer greater than or equal to 2 and K is a positive integer greater than or equal to M.
Optionally, performing temporal dynamic clustering on the first set of posture information to obtain the first set of action information includes:
according to a first time range, clustering the first set of posture information into the first set of action information including H action results; if L action results in the first set of action information are similar, retaining one of the L action results, where L is a positive integer greater than or equal to 2 and H is a positive integer greater than or equal to L.
Optionally, the method further includes: training a behavior recognition model to be trained using a first training set to obtain a behavior recognition model, where the first training set includes a second set of action sequence information carrying first labels, and the second set of action sequence information is obtained based on a second video;
performing behavior recognition on the first video based on the first set of action sequence information and outputting the behavior recognition result of the target object includes: inputting the first set of action sequence information into the behavior recognition model, and outputting the behavior recognition result of the target object in the first video.
Optionally, the method further includes: training an action recognition model to be trained using a second training set to obtain an action recognition model, where the second training set includes a second set of action sequence information carrying second labels, and the second set of action sequence information is obtained based on the second video;
performing action recognition on the first video based on the first set of posture information and outputting the action recognition result of the target object includes:
inputting the first set of posture information into the action recognition model, and outputting the action recognition result of the target object in the first video.
It can be understood that the specific implementation of the processing method in the embodiments of the present application may follow the specific implementation in the foregoing device embodiments; for the specific process, reference may be made to the relevant description of the device embodiments, which is not repeated here.
An embodiment of the present application further provides a computer storage medium, where the computer storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute some or all of the steps of any method described in the foregoing method embodiments.
An embodiment of the present application further provides a computer program product. The computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute some or all of the steps of any method described in the foregoing method embodiments. The computer program product may be a software installation package.
It should be noted that, for brevity, the foregoing method embodiments are all described as a series of action combinations, but those skilled in the art should know that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present application.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, reference may be made to the relevant descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed device may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purposes of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions to cause a computer device (which may be a personal computer, a terminal device, a network device, or the like) to execute all or some of the steps of the methods of the embodiments of the present application. The aforementioned memory includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Those of ordinary skill in the art can understand that all or some of the steps in the various methods of the above embodiments can be completed by a program instructing related hardware; the program can be stored in a computer-readable memory, and the memory may include a flash disk, a ROM, a RAM, a magnetic disk, an optical disc, or the like.
The embodiments of the present application have been introduced in detail above, and specific examples are used herein to explain the principles and implementations of the present application. The description of the above embodiments is only intended to help understand the method of the present application and its core idea. Meanwhile, those of ordinary skill in the art will make changes to the specific implementations and the scope of application according to the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

  1. A behavior recognition device, applied to animal behavior recognition, the device comprising: a feature extraction module, an information decomposition module, and a behavior recognition module, wherein,
    the feature extraction module is configured to extract, from a first video, a first set of body feature information corresponding to a time sequence of a target object;
    the information decomposition module is configured to perform posture decomposition on the first set of body feature information to obtain a first set of posture information, perform temporal dynamic clustering on the first set of posture information to obtain a first set of action information, compute a first set of velocity information of the target object based on the first set of body feature information, and cluster the first set of velocity information with the first set of action information to obtain a first set of action sequence information;
    the behavior recognition module is configured to perform behavior recognition on the first video based on the first set of action sequence information and output a behavior recognition result of the target object.
  2. The device according to claim 1, wherein the device further comprises an action recognition module;
    the action recognition module is configured to perform action recognition on the first video based on the first set of posture information and output an action recognition result of the target object.
  3. The device according to claim 1, wherein the information decomposition module is further specifically configured to:
    cluster the first set of body feature information using an unsupervised clustering algorithm to obtain the first set of posture information including K posture results; if the first set of posture information includes M consecutive adjacent posture results belonging to the same class, retain one of the M posture results, wherein M is a positive integer greater than or equal to 2 and K is a positive integer greater than or equal to M.
  4. The device according to claim 3, wherein the information decomposition module is further specifically configured to:
    according to a first time range, cluster the first set of posture information into the first set of action information including H action results; if L action results in the first set of action information are similar, retain one of the L action results, wherein L is a positive integer greater than or equal to 2 and H is a positive integer greater than or equal to L.
  5. The device according to any one of claims 1-4, wherein the behavior recognition module is further configured to:
    train a behavior recognition model to be trained using a first training set to obtain a behavior recognition model, wherein the first training set includes a second set of action sequence information carrying first labels, and the second set of action sequence information is obtained based on a second video;
    the behavior recognition module is specifically configured to:
    input the first set of action sequence information into the behavior recognition model and output the behavior recognition result of the target object in the first video.
  6. The device according to claim 2, wherein the action recognition module is further configured to:
    train an action recognition model to be trained using a second training set to obtain an action recognition model, wherein the second training set includes a second set of action sequence information carrying second labels, and the second set of action sequence information is obtained based on the second video;
    the action recognition module is specifically configured to:
    input the first set of posture information into the action recognition model and output the action recognition result of the target object in the first video.
  7. A behavior recognition method, applied to animal behavior recognition, the method comprising:
    extracting, from a first video, a first set of body feature information corresponding to a time sequence of a target object;
    performing posture decomposition on the first set of body feature information to obtain a first set of posture information, performing temporal dynamic clustering on the first set of posture information to obtain a first set of action information, computing a first set of velocity information of the target object based on the first set of body feature information, and clustering the first set of velocity information with the first set of action information to obtain a first set of action sequence information;
    performing behavior recognition on the first video based on the first set of action sequence information, and outputting a behavior recognition result of the target object.
  8. The method according to claim 7, wherein the method further comprises:
    performing action recognition on the first video based on the first set of posture information, and outputting an action recognition result of the target object.
  9. A computer device, comprising a processor, a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for performing the steps of the method of claim 7 or 8.
  10. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program for data exchange, and when the computer program is executed by a processor, the method according to claim 7 or 8 is implemented.
PCT/CN2020/093926 2020-06-02 2020-06-02 行为识别装置及方法 WO2021243561A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/093926 WO2021243561A1 (zh) 2020-06-02 2020-06-02 行为识别装置及方法

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/093926 WO2021243561A1 (zh) 2020-06-02 2020-06-02 行为识别装置及方法

Publications (1)

Publication Number Publication Date
WO2021243561A1 true WO2021243561A1 (zh) 2021-12-09

Family

ID=78831652

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/093926 WO2021243561A1 (zh) 2020-06-02 2020-06-02 行为识别装置及方法

Country Status (1)

Country Link
WO (1) WO2021243561A1 (zh)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106528586A (zh) * 2016-05-13 2017-03-22 上海理工大学 一种人体行为视频识别方法
CN108305283A (zh) * 2018-01-22 2018-07-20 清华大学 基于深度相机和基本姿势的人体行为识别方法及装置
CN108596068A (zh) * 2018-04-17 2018-09-28 广东工业大学 一种动作识别的方法和装置
JP2019053647A (ja) * 2017-09-19 2019-04-04 富士ゼロックス株式会社 行動推定装置及び行動推定プログラム
CN110298332A (zh) * 2019-07-05 2019-10-01 海南大学 行为识别的方法、系统、计算机设备和存储介质


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DENG, TIANTIAN: "Human Activity Recognition Research Based on Hierarchical Model", CHINESE MASTER’S THESES FULL-TEXT DATABASE, INFORMATION SCIENCE AND TECHNOLOGY, no. 1, 15 January 2011 (2011-01-15), pages 1 - 70, XP055876543, ISSN: 1674-0246 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114596530A (zh) * 2022-03-23 2022-06-07 中国航空油料有限责任公司浙江分公司 一种基于非接触式光学ai的飞机加油智能管理方法及装置
CN115100745A (zh) * 2022-07-05 2022-09-23 北京甲板智慧科技有限公司 基于Swin Transformer模型的运动实时计数方法和系统
CN116912947A (zh) * 2023-08-25 2023-10-20 东莞市触美电子科技有限公司 智能屏幕、屏幕控制方法、装置、设备及其存储介质
CN116912947B (zh) * 2023-08-25 2024-03-12 东莞市触美电子科技有限公司 智能屏幕、屏幕控制方法、装置、设备及其存储介质

Similar Documents

Publication Publication Date Title
WO2021143353A1 (zh) 一种手势信息处理方法、装置、电子设备及存储介质
Zhang et al. Facial expression analysis under partial occlusion: A survey
Uddin et al. Depression level prediction using deep spatiotemporal features and multilayer bi-ltsm
WO2021243561A1 (zh) 行为识别装置及方法
Neverova et al. Moddrop: adaptive multi-modal gesture recognition
Jiang et al. A survey on artificial intelligence in Chinese sign language recognition
Li et al. Data-free prior model for facial action unit recognition
Xu et al. A hierarchical spatio-temporal model for human activity recognition
CN109815826A (zh) 人脸属性模型的生成方法及装置
Onal Ertugrul et al. D-pattnet: Dynamic patch-attentive deep network for action unit detection
Tariq et al. Recognizing emotions from an ensemble of features
Oveisi et al. Tree-structured feature extraction using mutual information
Chen et al. Automated pain detection from facial expressions using facs: A review
Prasetio et al. The facial stress recognition based on multi-histogram features and convolutional neural network
CN110909680A (zh) 人脸图像的表情识别方法、装置、电子设备及存储介质
CN110705490B (zh) 视觉情感识别方法
Yan et al. RAF-AU database: in-the-wild facial expressions with subjective emotion judgement and objective au annotations
Yan et al. A lightweight weakly supervised learning segmentation algorithm for imbalanced image based on rotation density peaks
CN112418166A (zh) 一种基于多模态信息的情感分布学习方法
Dantcheva et al. Expression recognition for severely demented patients in music reminiscence-therapy
CN110083724B (zh) 一种相似图像检索方法、装置及系统
CN114781441A (zh) Eeg运动想象分类方法及多空间卷积神经网络模型
Huang et al. Identifying user-specific facial affects from spontaneous expressions with minimal annotation
Sun et al. General-to-specific learning for facial attribute classification in the wild
Jia et al. An action unit co-occurrence constraint 3DCNN based action unit recognition approach

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20939330

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 24.04.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20939330

Country of ref document: EP

Kind code of ref document: A1