Disclosure of Invention
The invention aims to provide a motion recognition method and a motion recognition device, which solve the problem that current recognition of action categories from bone sequences is inaccurate.
According to an aspect of the present invention, there is provided a motion recognition method including:
acquiring skeleton data of an object to be identified in a video during motion;
generating a bone sequence of the object to be identified according to the bone data;
generating a bone feature image corresponding to the bone sequence, wherein the bone feature image comprises a plurality of bone points;
and inputting the bone characteristic image into a preset convolutional neural network model for classification to obtain an action category corresponding to the bone characteristic image.
Further, generating a bone feature image corresponding to the bone sequence, comprising:
arranging three-dimensional point coordinates of a skeleton sequence in each frame of image in a video into three-channel data according to a preset sequence;
arranging the three-channel data into a three-channel matrix according to a time sequence;
and carrying out normalization processing on the three-channel matrix to obtain a bone characteristic image.
Further, normalizing the three-channel matrix to obtain a bone feature image includes:
performing the normalization as follows:
I^c(i,j) = round(255 · (M^c(i,j) − min^c) / max_{c'}(max^{c'} − min^{c'}))
wherein I^c(i,j) is the pixel value at position coordinate (i, j) on the c-th channel of the bone feature image, M^c(i,j) is the corresponding element of the three-channel matrix, min^c and max^c are respectively the minimum value and the maximum value of the pixels on the c-th channel, and round(·) is a rounding function.
Further, inputting the bone feature image into a preset convolutional neural network model for classification to obtain an action category corresponding to the bone feature image, including:
extracting the characteristics of the bone characteristic image by using a preset convolutional neural network model;
converting the features into feature vectors using a full-connectivity layer;
and determining the type of the bone feature image according to the feature vector, wherein the type is the action category of the object to be identified.
According to another aspect of the present invention, there is disclosed a motion recognition apparatus comprising:
the acquisition module is used for acquiring bone data of an object to be identified in the video during motion;
the generating module is used for generating a bone sequence of the object to be identified according to the bone data;
the characteristic image module is used for generating a bone characteristic image corresponding to the bone sequence, and the bone characteristic image comprises a plurality of bone points;
and the determining module is used for inputting the bone characteristic image into a preset convolutional neural network model for classification to obtain an action category corresponding to the bone characteristic image.
Further, the feature image module includes:
the first sequencing submodule is used for arranging three-dimensional point coordinates of a skeleton sequence in each frame of image in a video into three-channel data according to a preset sequence;
the second sequencing submodule is used for arranging the three-channel data into a three-channel matrix according to a time sequence;
and the normalization submodule is used for performing normalization processing on the three-channel matrix to obtain a bone characteristic image.
Further, the normalization sub-module is configured to perform the normalization as follows:
I^c(i,j) = round(255 · (M^c(i,j) − min^c) / max_{c'}(max^{c'} − min^{c'}))
wherein I^c(i,j) is the pixel value at position coordinate (i, j) on the c-th channel of the bone feature image, M^c(i,j) is the corresponding element of the three-channel matrix, min^c and max^c are respectively the minimum value and the maximum value of the pixels on the c-th channel, and round(·) is a rounding function.
Further, the determining module comprises:
the extraction submodule is used for extracting the characteristics of the bone characteristic image by utilizing a preset convolutional neural network model;
a conversion submodule for converting the features into feature vectors using a full connection layer;
and the determining submodule is used for determining the type of the bone feature image according to the feature vector, wherein the type is the action category of the object to be identified.
Compared with the closest prior art, the technical scheme has the beneficial effects that:
the technical scheme provided by the invention is that the bone data of an object to be identified in a video during motion is obtained, then the bone data is converted into a bone sequence of the object to be identified, the bone sequence is converted into a bone characteristic image, all bone points in the bone characteristic image are sequenced by using a preset replacement network, and finally the sequenced bone characteristic image is classified by using a convolutional neural network to obtain the action characteristic corresponding to the bone characteristic image. The invention converts the problem of motion recognition into the problem of bone sequence image classification, converts the bone sequence into the bone characteristic image, and then classifies the bone characteristic image, so that the recognition is more accurate and the efficiency is higher.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
As shown in fig. 1, the present invention provides an action recognition method, which comprises the following steps:
s101, obtaining skeleton data of an object to be identified in a video during motion;
s102, generating a bone sequence of the object to be identified according to the bone data;
s103, generating a bone characteristic image corresponding to the bone sequence, wherein the bone characteristic image comprises a plurality of bone points;
and S104, inputting the bone feature image into a preset convolutional neural network model for classification to obtain an action category corresponding to the bone feature image.
In the embodiment of the application, bone data of an object to be identified in a video during motion is acquired, the bone data is converted into a bone sequence of the object to be identified, the bone sequence is converted into a bone feature image, and finally the bone feature image is classified by a convolutional neural network to obtain the action category corresponding to the bone feature image. The invention converts the problem of motion recognition into the problem of bone-sequence image classification: the bone sequence is converted into a bone feature image, and the bone feature image is then classified, so that the recognition is more accurate and the efficiency is higher.
In some embodiments of the present application, given a bone sequence v of T frames, the coordinates of the k-th bone point in the t-th frame are expressed as J_k = (x_k, y_k, z_k), where t ∈ {1, 2, …, T}, k ∈ {1, 2, …, N}, and N represents the number of skeletal points in a frame. The skeletal data of the t-th frame is denoted S_t = {J_1, J_2, …, J_N}. Generating a bone feature image corresponding to the bone sequence mainly comprises three steps:
firstly, arranging three-dimensional point coordinates of a skeleton sequence in each frame of image in a video into three-channel data according to a preset sequence;
step two, arranging the three-channel data into a three-channel matrix according to a time sequence;
and thirdly, carrying out normalization processing on the three-channel matrix to obtain a bone characteristic image.
In step one, the coordinates (x, y, z) in three dimensions are treated as three channels. Taking the x dimension as an example, the x coordinates in S_t are arranged in a predefined order O = (o_1, o_2, …, o_k, …, o_K) into an x-channel feature vector f_t^x = (x_{o_1}, x_{o_2}, …, x_{o_K}); the y- and z-channel feature vectors f_t^y and f_t^z are obtained in the same way, giving the three-channel feature f_t of frame t. The arrangement order O determines the proximity of skeletal points in the image. In step two, the three-channel features f_t of all frames are arranged in time order into a three-channel matrix M; taking the x channel as an example, M^x = [f_1^x, f_2^x, …, f_T^x]. The size of M is 3 × T × K, where T is the length of the video sequence and K is the length of the arrangement order O.
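Steps one and two above can be sketched in Python as follows. This is a minimal illustration only; the array layout (a NumPy array of shape (T, N, 3)) and the function name `build_matrix` are assumptions, since the patent does not prescribe an implementation:

```python
import numpy as np

def build_matrix(skeleton, order):
    """Arrange a skeleton sequence into a three-channel matrix M.

    skeleton: array of shape (T, N, 3); skeleton[t, k] holds the
              (x, y, z) coordinates of bone point k in frame t.
    order:    the predefined arrangement order O, a sequence of K joint indices.
    Returns M of shape (3, T, K): channel c, frame t, ordered joint position.
    """
    T = skeleton.shape[0]
    K = len(order)
    M = np.empty((3, T, K), dtype=np.float64)
    for c in range(3):                     # x, y, z -> three channels
        for t in range(T):
            # feature vector f_t^c: coordinates of dimension c in order O
            M[c, t, :] = skeleton[t, order, c]
    return M

# tiny example: 2 frames, 3 bone points
skel = np.arange(2 * 3 * 3, dtype=float).reshape(2, 3, 3)
M = build_matrix(skel, order=[2, 0, 1])
print(M.shape)  # (3, 2, 3)
```

Each row of a channel of M is one frame's feature vector, so time runs along one image axis and the joint order O along the other, as described above.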
In step three, M is normalized and quantized to an RGB image I as follows:
I^c(i,j) = round(255 · (M^c(i,j) − min^c) / max_{c'}(max^{c'} − min^{c'}))
wherein I^c(i,j) is the pixel value at position coordinate (i, j) on the c-th channel of image I, M^c(i,j) is the corresponding element of M, min^c and max^c are respectively the minimum value and the maximum value of the pixels on the c-th channel, and round(·) is a rounding function. The normalization subtracts the per-channel minimum from each matrix element and divides by the largest value range among the three channels; the values are then quantized to the interval [0, 255].
In some embodiments of the present application, inputting a bone feature image into a preset convolutional neural network model for classification, and obtaining an action category corresponding to the bone feature image, includes:
extracting the characteristics of the bone characteristic image by using a preset convolutional neural network model;
converting the features into feature vectors using a full-connectivity layer;
and determining the type of the bone feature image according to the feature vector, wherein the type is the action category of the object to be identified.
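The classification stage above can be sketched as follows. The convolutional feature extractor is stubbed out with a placeholder feature vector, and the weights and category names are illustrative assumptions, since the patent does not fix a particular network architecture:

```python
import numpy as np

def classify(features, W, b, labels):
    """Map extracted features to an action category.

    features: flattened feature vector produced by the CNN backbone (stubbed here).
    W, b:     weights and bias of the fully connected layer.
    labels:   one action-category name per output unit.
    """
    logits = W @ features + b              # fully connected layer
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()                # softmax over action categories
    return labels[int(np.argmax(probs))]

# illustrative placeholder values
rng = np.random.default_rng(0)
features = rng.standard_normal(8)          # stand-in for CNN-extracted features
W = rng.standard_normal((3, 8))
b = np.zeros(3)
action = classify(features, W, b, ["wave", "walk", "jump"])
```

The returned label is the type of the bone feature image, i.e. the action category of the object to be identified.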
The invention also provides a motion recognition device based on the same inventive concept, which comprises:
the acquisition module is used for acquiring bone data of an object to be identified in the video during motion;
the generating module is used for generating a bone sequence of the object to be identified according to the bone data;
the characteristic image module is used for generating a bone characteristic image corresponding to the bone sequence, and the bone characteristic image comprises a plurality of bone points;
and the determining module is used for inputting the bone characteristic image into a preset convolutional neural network model for classification to obtain an action category corresponding to the bone characteristic image.
Optionally, the feature image module includes:
the first sequencing submodule is used for arranging three-dimensional point coordinates of a skeleton sequence in each frame of image in a video into three-channel data according to a preset sequence;
the second sequencing submodule is used for arranging the three-channel data into a three-channel matrix according to a time sequence;
and the normalization submodule is used for performing normalization processing on the three-channel matrix to obtain a bone characteristic image.
Optionally, the normalization sub-module is configured to perform the normalization as follows:
I^c(i,j) = round(255 · (M^c(i,j) − min^c) / max_{c'}(max^{c'} − min^{c'}))
wherein I^c(i,j) is the pixel value at position coordinate (i, j) on the c-th channel of the bone feature image, M^c(i,j) is the corresponding element of the three-channel matrix, min^c and max^c are respectively the minimum value and the maximum value of the pixels on the c-th channel, and round(·) is a rounding function.
Optionally, the determining module includes:
the extraction submodule is used for extracting the characteristics of the bone characteristic image by utilizing a preset convolutional neural network model;
a conversion submodule for converting the features into feature vectors using a full connection layer;
and the determining submodule is used for determining the type of the bone feature image according to the feature vector, wherein the type is the action category of the object to be identified.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention and are not intended to limit the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.