CN109800659B - Action recognition method and device - Google Patents

Action recognition method and device

Info

Publication number
CN109800659B
CN109800659B
Authority
CN
China
Prior art keywords
bone
sequence
image
channel
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811604771.6A
Other languages
Chinese (zh)
Other versions
CN109800659A (en)
Inventor
张一帆 (Zhang Yifan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Nanjing Artificial Intelligence Innovation Research Institute
Institute of Automation of Chinese Academy of Science
Original Assignee
Nanjing Artificial Intelligence Chip Innovation Institute, Institute of Automation, Chinese Academy of Sciences
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Artificial Intelligence Chip Innovation Institute, Institute of Automation, Chinese Academy of Sciences and Institute of Automation of Chinese Academy of Science
Priority to CN201811604771.6A
Publication of CN109800659A
Application granted
Publication of CN109800659B


Landscapes

  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

The invention provides an action recognition method and device, comprising: acquiring skeleton data of an object to be identified in a video during motion; generating a bone sequence of the object to be identified according to the bone data; generating a bone feature image corresponding to the bone sequence, wherein the bone feature image comprises a plurality of bone points; and inputting the bone feature image into a preset convolutional neural network model for classification to obtain the action category corresponding to the bone feature image. The invention converts the action recognition problem into a bone-sequence image classification problem: the bone sequence is converted into a bone feature image, which is then classified, making the recognition more accurate and more efficient.

Description

Action recognition method and device
Technical Field
The invention relates to the field of recognition, and in particular to an action recognition method and device.
Background
Human action recognition draws on a variety of modalities, such as appearance, depth, optical flow, and the body skeleton. Among these modalities, the dynamic human skeleton often complements the others and conveys important information, so human action recognition can be performed on bone sequences.
However, existing skeleton-based action recognition methods simply concatenate the coordinates of the bone points into a one-dimensional long vector and perform time-series analysis on that vector, which yields low recognition accuracy.
Therefore, the present invention provides an action recognition method and apparatus to overcome these disadvantages of the prior art.
Disclosure of Invention
The invention aims to provide an action recognition method and device that solve the problem that action-category recognition from bone sequences is currently inaccurate.
According to one aspect of the present invention, there is provided an action recognition method, comprising:
acquiring skeleton data of an object to be identified in a video during motion;
generating a bone sequence of the object to be identified according to the bone data;
generating a bone feature image corresponding to the bone sequence, wherein the bone feature image comprises a plurality of bone points;
and inputting the bone feature image into a preset convolutional neural network model for classification to obtain the action category corresponding to the bone feature image.
Further, generating a bone feature image corresponding to the bone sequence, comprising:
arranging three-dimensional point coordinates of a skeleton sequence in each frame of image in a video into three-channel data according to a preset sequence;
arranging the three-channel data into a three-channel matrix according to a time sequence;
and normalizing the three-channel matrix to obtain the bone feature image.
Further, normalizing the three-channel matrix to obtain a bone feature image, including:
the normalization is shown as follows:

$$I_{i,j}^{c} = \operatorname{round}\left(255 \times \frac{M_{i,j}^{c} - c_{\min}}{\max_{c}\left(c_{\max} - c_{\min}\right)}\right)$$

wherein $I_{i,j}^{c}$ is the pixel value at position coordinate (i, j) on the c-th channel of the bone feature image; $c_{\min}$ and $c_{\max}$ are respectively the minimum and maximum pixel values on the c-th channel of the bone feature image; and round(·) is a rounding function.
Further, inputting the bone feature image into a preset convolutional neural network model for classification to obtain an action category corresponding to the bone feature image, including:
extracting features of the bone feature image using the preset convolutional neural network model;
converting the features into a feature vector using a fully connected layer;
and determining the type of the bone feature image according to the feature vector, wherein the type is the action category of the object to be identified.
According to another aspect of the present invention, there is disclosed an action recognition apparatus, comprising:
the acquisition module is used for acquiring bone data of an object to be identified in the video during motion;
the generating module is used for generating a bone sequence of the object to be identified according to the bone data;
the feature image module is used for generating a bone feature image corresponding to the bone sequence, the bone feature image comprising a plurality of bone points;
and the determining module is used for inputting the bone feature image into a preset convolutional neural network model for classification to obtain the action category corresponding to the bone feature image.
Further, the feature image module includes:
the first sequencing submodule is used for arranging three-dimensional point coordinates of a skeleton sequence in each frame of image in a video into three-channel data according to a preset sequence;
the second sequencing submodule is used for arranging the three-channel data into a three-channel matrix according to a time sequence;
and the normalization submodule is used for normalizing the three-channel matrix to obtain the bone feature image.
Further, the normalization sub-module is configured to perform the normalization shown as follows:

$$I_{i,j}^{c} = \operatorname{round}\left(255 \times \frac{M_{i,j}^{c} - c_{\min}}{\max_{c}\left(c_{\max} - c_{\min}\right)}\right)$$

wherein $I_{i,j}^{c}$ is the pixel value at position coordinate (i, j) on the c-th channel of the bone feature image; $c_{\min}$ and $c_{\max}$ are respectively the minimum and maximum pixel values on the c-th channel of the bone feature image; and round(·) is a rounding function.
Further, the determining module comprises:
the extraction submodule is used for extracting features of the bone feature image using the preset convolutional neural network model;
a conversion submodule for converting the features into a feature vector using a fully connected layer;
and the determining submodule is used for determining the type of the bone feature image according to the feature vector, wherein the type is the action category of the object to be identified.
Compared with the closest prior art, the technical scheme has the following beneficial effects:
In the technical scheme provided by the invention, bone data of an object to be identified in a video during motion are acquired; the bone data are converted into a bone sequence of the object to be identified; the bone sequence is converted into a bone feature image; all bone points in the bone feature image are ordered using a preset permutation network; and finally the ordered bone feature image is classified using a convolutional neural network to obtain the action category corresponding to the bone feature image. The invention converts the action recognition problem into a bone-sequence image classification problem: the bone sequence is converted into a bone feature image, which is then classified, making the recognition more accurate and more efficient.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
As shown in fig. 1, the present invention provides an action recognition method, which comprises the following steps:
s101, obtaining skeleton data of an object to be identified in a video during motion;
s102, generating a bone sequence of the object to be identified according to the bone data;
s103, generating a bone characteristic image corresponding to the bone sequence, wherein the bone characteristic image comprises a plurality of bone points;
and S104, inputting the bone feature image into a preset convolutional neural network model for classification to obtain an action category corresponding to the bone feature image.
In the embodiment of the application, bone data of an object to be identified in a video during motion are acquired; the bone data are converted into a bone sequence of the object to be identified; the bone sequence is converted into a bone feature image; and finally the bone feature image is classified using a convolutional neural network to obtain the action category corresponding to the bone feature image. The invention thus converts the action recognition problem into a bone-sequence image classification problem, making the recognition more accurate and more efficient.
In some embodiments of the present application, given a bone sequence v of T frames, the coordinates of the k-th bone point in the t-th frame are expressed as

$$J_{k}^{t} = \left(x_{k}^{t},\, y_{k}^{t},\, z_{k}^{t}\right)$$

wherein $t \in \{1, 2, \dots, T\}$, $k \in \{1, 2, \dots, N\}$, and N represents the number of skeletal points in a frame. The skeletal data of the t-th frame is denoted $S_t = \{J_1, J_2, \dots, J_N\}$. Generating a bone feature image corresponding to the bone sequence mainly comprises three steps:
Step one: arranging the three-dimensional point coordinates of the skeleton sequence in each frame of the video into three-channel data according to a preset order;
Step two: arranging the three-channel data into a three-channel matrix in time order;
Step three: normalizing the three-channel matrix to obtain the bone feature image.
In step one, the coordinates in the three dimensions (x, y, z) are treated as three channels. Taking the x dimension as an example, the x-dimension values in $S_t$ are taken in a predefined order $O = (o_1, o_2, \dots, o_k, \dots, o_K)$ and arranged into a feature vector, giving the x-channel feature vector of $f_t$:

$$f_{t}^{x} = \left(x_{o_1}^{t}, x_{o_2}^{t}, \dots, x_{o_K}^{t}\right)$$

The arrangement order O determines the proximity of the skeletal points in the image. In step two, the three-channel bone features of all frames, $f_t = (f_t^{x}, f_t^{y}, f_t^{z})$, are arranged into a three-channel matrix M in time order. Taking the x channel as an example,

$$M^{x} = \left[f_{1}^{x};\ f_{2}^{x};\ \dots;\ f_{T}^{x}\right]$$

The size of M is 3 × T × K, where T is the length of the video sequence and K is the length of the arrangement order O.
In step three, M is normalized and quantized to an RGB image I as follows:

$$I_{i,j}^{c} = \operatorname{round}\left(255 \times \frac{M_{i,j}^{c} - c_{\min}}{\max_{c}\left(c_{\max} - c_{\min}\right)}\right)$$

wherein $I_{i,j}^{c}$ is the pixel value at position coordinate (i, j) on the c-th channel of image I; $c_{\min}$ and $c_{\max}$ are respectively the minimum and maximum pixel values on the c-th channel of the bone feature image; and round(·) is a rounding function. The normalization subtracts the channel minimum from each element of the matrix and divides by the largest per-channel value range over all channels; the values are thus quantized to the interval [0, 255].
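To make the three steps concrete, the following is a minimal NumPy sketch of the sequence-to-image conversion, under the assumption that the skeleton sequence is given as a T × N × 3 array; the function name skeleton_to_feature_image and its arguments are illustrative, not part of the patent.

```python
import numpy as np

def skeleton_to_feature_image(skeleton, order):
    """Convert a skeleton sequence into a three-channel bone feature image.

    skeleton: array of shape (T, N, 3) -- T frames, N bone points per frame,
              with (x, y, z) coordinates for each point.
    order:    sequence of K point indices giving the arrangement order O,
              which determines the proximity of bone points in the image.
    Returns a (T, K, 3) array of uint8 pixel values in [0, 255].
    """
    # Steps one and two: arrange each frame's x, y, z values in the order O,
    # then stack the frames in time order, forming the 3 x T x K matrix M
    # (stored here channel-last as T x K x 3).
    M = skeleton[:, order, :].astype(np.float64)

    # Step three: subtract each channel's minimum and divide by the largest
    # per-channel value range over all channels, then quantize to [0, 255].
    c_min = M.min(axis=(0, 1), keepdims=True)
    c_max = M.max(axis=(0, 1), keepdims=True)
    scale = (c_max - c_min).max()
    return np.round(255.0 * (M - c_min) / scale).astype(np.uint8)

# Example usage with random data: 40 frames, 25 bone points, identity order.
image = skeleton_to_feature_image(np.random.rand(40, 25, 3), list(range(25)))
```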
In some embodiments of the present application, inputting a bone feature image into a preset convolutional neural network model for classification, and obtaining an action category corresponding to the bone feature image, includes:
extracting features of the bone feature image using the preset convolutional neural network model;
converting the features into a feature vector using a fully connected layer;
and determining the type of the bone feature image according to the feature vector, wherein the type is the action category of the object to be identified.
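As a hedged illustration of this classification step, the sketch below pairs a small convolutional feature extractor with a fully connected layer in PyTorch; the specific architecture (layer widths, pooling sizes, number of classes) is an assumption, since the patent does not fix a particular network.

```python
import torch
import torch.nn as nn

class BoneFeatureClassifier(nn.Module):
    """Illustrative CNN for classifying bone feature images (not the
    patent's exact architecture, which is left unspecified)."""

    def __init__(self, num_classes: int):
        super().__init__()
        # Convolutional layers extract features from the 3-channel image.
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        # A fully connected layer converts the features into a vector of
        # scores over the candidate action categories.
        self.fc = nn.Linear(64 * 4 * 4, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 3, T, K) batch of bone feature images.
        feats = self.features(x).flatten(1)
        return self.fc(feats)

# The predicted action category is the index of the largest score:
model = BoneFeatureClassifier(num_classes=60)
scores = model(torch.rand(1, 3, 40, 25))
action = scores.argmax(dim=1)
```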
Based on the same inventive concept, the invention also provides an action recognition device, which comprises:
the acquisition module is used for acquiring bone data of an object to be identified in the video during motion;
the generating module is used for generating a bone sequence of the object to be identified according to the bone data;
the feature image module is used for generating a bone feature image corresponding to the bone sequence, the bone feature image comprising a plurality of bone points;
and the determining module is used for inputting the bone feature image into a preset convolutional neural network model for classification to obtain the action category corresponding to the bone feature image.
Optionally, the feature image module includes:
the first sequencing submodule is used for arranging three-dimensional point coordinates of a skeleton sequence in each frame of image in a video into three-channel data according to a preset sequence;
the second sequencing submodule is used for arranging the three-channel data into a three-channel matrix according to a time sequence;
and the normalization submodule is used for normalizing the three-channel matrix to obtain the bone feature image.
Optionally, the normalization sub-module is configured to perform the normalization shown as follows:

$$I_{i,j}^{c} = \operatorname{round}\left(255 \times \frac{M_{i,j}^{c} - c_{\min}}{\max_{c}\left(c_{\max} - c_{\min}\right)}\right)$$

wherein $I_{i,j}^{c}$ is the pixel value at position coordinate (i, j) on the c-th channel of the bone feature image; $c_{\min}$ and $c_{\max}$ are respectively the minimum and maximum pixel values on the c-th channel of the bone feature image; and round(·) is a rounding function.
Optionally, the determining module includes:
the extraction submodule is used for extracting features of the bone feature image using the preset convolutional neural network model;
a conversion submodule for converting the features into a feature vector using a fully connected layer;
and the determining submodule is used for determining the type of the bone feature image according to the feature vector, wherein the type is the action category of the object to be identified.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention and are not intended to limit the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. An action recognition method, comprising:
acquiring skeleton data of an object to be identified in a video during motion;
generating a bone sequence of the object to be identified according to the bone data;
generating a bone feature image corresponding to the bone sequence, wherein the bone feature image comprises a plurality of bone points, and all the bone points in the bone feature image are ordered using a preset permutation network;
inputting the ordered bone feature image into a preset convolutional neural network model for classification to obtain the action category corresponding to the bone feature image;
wherein generating a bone feature image corresponding to the bone sequence comprises:
arranging three-dimensional point coordinates of a skeleton sequence in each frame of image in a video into three-channel data according to a preset sequence;
arranging the three-channel data into a three-channel matrix according to a time sequence;
normalizing the three-channel matrix to obtain the bone feature image;
the method comprises the following steps of arranging three-dimensional point coordinates of a skeleton sequence in each frame of image in a video into three-channel data according to a preset sequence:
will be provided with
Figure 719259DEST_PATH_IMAGE001
The x dimension value in (1) is according to a predefined sequence
Figure 251872DEST_PATH_IMAGE002
Arranging into a feature vector to obtain a feature vector ftX channel feature vector of
Figure 27936DEST_PATH_IMAGE003
Wherein, in the step (A),
Figure 103339DEST_PATH_IMAGE004
the arrangement order O determines the proximity of the bone points in the image,
Figure 362282DEST_PATH_IMAGE001
representing the t frame bone data, and N representing the number of bone points in one frame;
the process of arranging the three-channel data into a three-channel matrix according to the time sequence is as follows:
three-channel bone characteristics of all frames
Figure 382191DEST_PATH_IMAGE005
A three-channel matrix M is arranged according to the time sequence, wherein in the x channel, the matrix
Figure 774995DEST_PATH_IMAGE006
M has a size of 3 × T × K, T being the length of the video sequence, and K being the length of the arrangement order O.
2. The method of claim 1, wherein normalizing the three-channel matrix to obtain a bone feature image comprises:
the normalization is shown as follows:

$$I_{i,j}^{c} = \operatorname{round}\left(255 \times \frac{M_{i,j}^{c} - c_{\min}}{\max_{c}\left(c_{\max} - c_{\min}\right)}\right)$$

wherein $I_{i,j}^{c}$ is the pixel value at position coordinate (i, j) on the c-th channel of the bone feature image; $c_{\min}$ and $c_{\max}$ are respectively the minimum and maximum pixel values on the c-th channel of the bone feature image; and round(·) is a rounding function.
3. The method of claim 1, wherein inputting the ordered bone feature image into a preset convolutional neural network model for classification to obtain the action category corresponding to the bone feature image comprises:
extracting features of the bone feature image using the preset convolutional neural network model;
converting the features into a feature vector using a fully connected layer;
and determining the type of the bone feature image according to the feature vector, wherein the type is the action category of the object to be identified.
4. An action recognition device, comprising:
the acquisition module is used for acquiring bone data of an object to be identified in the video during motion;
the generating module is used for generating a bone sequence of the object to be identified according to the bone data;
the feature image module is used for generating a bone feature image corresponding to the bone sequence, the bone feature image comprising a plurality of bone points, wherein all the bone points in the bone feature image are ordered using a preset permutation network;
the determining module is used for inputting the ordered bone feature image into a preset convolutional neural network model for classification to obtain the action category corresponding to the bone feature image;
wherein the feature image module comprises:
the first sequencing submodule is used for arranging three-dimensional point coordinates of a skeleton sequence in each frame of image in a video into three-channel data according to a preset sequence;
the second sequencing submodule is used for arranging the three-channel data into a three-channel matrix according to a time sequence;
the normalization submodule is used for normalizing the three-channel matrix to obtain the bone feature image;
wherein the first ordering submodule is configured to:

arrange the x-dimension values in the bone data $S_t$ of the t-th frame into a feature vector according to a predefined order $O = (o_1, o_2, \dots, o_K)$, obtaining the x-channel feature vector of $f_t$:

$$f_{t}^{x} = \left(x_{o_1}^{t}, x_{o_2}^{t}, \dots, x_{o_K}^{t}\right)$$

wherein the arrangement order O determines the proximity of the bone points in the image, $S_t$ represents the bone data of the t-th frame, and N represents the number of bone points in one frame;

and the second ordering submodule is configured to:

arrange the three-channel bone features $f_t = (f_t^{x}, f_t^{y}, f_t^{z})$ of all frames into a three-channel matrix M in time order, wherein for the x channel the matrix is

$$M^{x} = \left[f_{1}^{x};\ f_{2}^{x};\ \dots;\ f_{T}^{x}\right]$$

and M has a size of 3 × T × K, T being the length of the video sequence and K being the length of the arrangement order O.
5. The apparatus of claim 4, wherein the normalization submodule is configured to perform the normalization shown as follows:

$$I_{i,j}^{c} = \operatorname{round}\left(255 \times \frac{M_{i,j}^{c} - c_{\min}}{\max_{c}\left(c_{\max} - c_{\min}\right)}\right)$$

wherein $I_{i,j}^{c}$ is the pixel value at position coordinate (i, j) on the c-th channel of the bone feature image; $c_{\min}$ and $c_{\max}$ are respectively the minimum and maximum pixel values on the c-th channel of the bone feature image; and round(·) is a rounding function.
6. The apparatus of claim 4, wherein the determining module comprises:
the extraction submodule is used for extracting features of the bone feature image using the preset convolutional neural network model;
a conversion submodule for converting the features into a feature vector using a fully connected layer;
and the determining submodule is used for determining the type of the bone feature image according to the feature vector, wherein the type is the action category of the object to be identified.
CN201811604771.6A 2018-12-26 2018-12-26 Action recognition method and device Active CN109800659B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811604771.6A CN109800659B (en) 2018-12-26 2018-12-26 Action recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811604771.6A CN109800659B (en) 2018-12-26 2018-12-26 Action recognition method and device

Publications (2)

Publication Number Publication Date
CN109800659A CN109800659A (en) 2019-05-24
CN109800659B (en) 2021-05-25

Family

ID=66557740

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811604771.6A Active CN109800659B (en) 2018-12-26 2018-12-26 Action recognition method and device

Country Status (1)

Country Link
CN (1) CN109800659B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111476115B (en) * 2020-03-23 2023-08-29 深圳市联合视觉创新科技有限公司 Human behavior recognition method, device and equipment
CN112861808B (en) * 2021-03-19 2024-01-23 泰康保险集团股份有限公司 Dynamic gesture recognition method, device, computer equipment and readable storage medium
CN113229832A (en) * 2021-03-24 2021-08-10 清华大学 System and method for acquiring human motion information
CN113537121A (en) * 2021-07-28 2021-10-22 浙江大华技术股份有限公司 Identity recognition method and device, storage medium and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104318248A (en) * 2014-10-21 2015-01-28 北京智谷睿拓技术服务有限公司 Action recognition method and action recognition device
CN106203363A (en) * 2016-07-15 2016-12-07 中国科学院自动化研究所 Human skeleton motion sequence Activity recognition method
CN106203503A (en) * 2016-07-08 2016-12-07 天津大学 A kind of action identification method based on skeleton sequence

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170161555A1 (en) * 2015-12-04 2017-06-08 Pilot Ai Labs, Inc. System and method for improved virtual reality user interaction utilizing deep-learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104318248A (en) * 2014-10-21 2015-01-28 北京智谷睿拓技术服务有限公司 Action recognition method and action recognition device
CN106203503A (en) * 2016-07-08 2016-12-07 天津大学 A kind of action identification method based on skeleton sequence
CN106203363A (en) * 2016-07-15 2016-12-07 中国科学院自动化研究所 Human skeleton motion sequence Activity recognition method

Also Published As

Publication number Publication date
CN109800659A (en) 2019-05-24

Similar Documents

Publication Publication Date Title
CN109800659B (en) Action recognition method and device
CN110728200B (en) Real-time pedestrian detection method and system based on deep learning
CN109993160B (en) Image correction and text and position identification method and system
CN107256246B (en) printed fabric image retrieval method based on convolutional neural network
CN109299639B (en) Method and device for facial expression recognition
JP5997545B2 (en) Signal processing method and signal processing apparatus
JP6527421B2 (en) Person recognition apparatus and program thereof
CN109145964B (en) Method and system for realizing image color clustering
CN108876795A (en) A kind of dividing method and system of objects in images
CN109213886B (en) Image retrieval method and system based on image segmentation and fuzzy pattern recognition
CN111108508A (en) Facial emotion recognition method, intelligent device and computer-readable storage medium
CN112381895A (en) Method and device for calculating cardiac ejection fraction
Fernando et al. Low cost approach for real time sign language recognition
US20220164577A1 (en) Object detection method, object detection apparatus, and non-transitory computer-readable storage medium storing computer program
CN107729863B (en) Human finger vein recognition method
CN107368847B (en) Crop leaf disease identification method and system
CN113111797A (en) Cross-view gait recognition method combining self-encoder and view transformation model
Youlian et al. Face detection method using template feature and skin color feature in rgb color space
CN112598013A (en) Computer vision processing method based on neural network
CN112084840A (en) Finger vein identification method based on three-dimensional NMI
CN113723410A (en) Digital tube digital identification method and device
CN109359543B (en) Portrait retrieval method and device based on skeletonization
CN113362455B (en) Video conference background virtualization processing method and device
CN108256401B (en) Method and device for obtaining target attribute feature semantics
CN114387600A (en) Text feature recognition method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 211135 floor 3, building 3, Qilin artificial intelligence Industrial Park, 266 Chuangyan Road, Nanjing, Jiangsu

Patentee after: Zhongke Nanjing artificial intelligence Innovation Research Institute

Patentee after: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES

Address before: 211135 3rd floor, building 3, 266 Chuangyan Road, Jiangning District, Nanjing City, Jiangsu Province

Patentee before: NANJING ARTIFICIAL INTELLIGENCE CHIP INNOVATION INSTITUTE, INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES

Patentee before: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES