CN108764107B - Behavior and identity combined identification method and device based on human body skeleton sequence - Google Patents
- Publication number: CN108764107B
- Application number: CN201810499463.5A
- Authority
- CN
- China
- Prior art keywords
- sequence
- human body
- behavior
- identity
- skeleton sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
Abstract
The invention relates to the field of visual recognition and provides a behavior and identity joint recognition method based on a human body skeleton sequence, aiming at solving the problem that identity information and behavior actions cannot be recognized simultaneously from human body data. The method comprises the following steps: acquiring a human body skeleton sequence of the human body to be identified; and recognizing the identity information and behavior actions of the human body from the skeleton sequence with a pre-constructed recognition model. The recognition model is trained as follows: the coordinates of the training skeleton sequences are converted into a reference coordinate system to obtain a reference skeleton sequence; the coordinates of each joint node of each reference skeleton are expressed relative to the coordinates of a pre-specified central point to obtain the relative coordinates of each joint node; three-dimensional coordinate transformation is then applied to the reference skeleton sequence, and the initial recognition model is trained on the transformed data to obtain the optimized recognition model. The invention can quickly and accurately identify the identity information and the behavior actions of a human body from a human body skeleton sequence.
Description
Technical Field
The invention relates to the technical field of computer vision, in particular to deep-learning-based vision, and specifically to a behavior and identity joint recognition method and device based on a human body skeleton sequence.
Background
With the development of computer graphics, computer vision and human-computer interaction technology, it is becoming increasingly important to recognize the behavior, actions and identity of a detected or monitored person in a timely and accurate manner. Behavior recognition and identity recognition are applied in fields such as autonomous driving, human-computer interaction, smart cities, intelligent transportation and intelligent surveillance.
With the recent development of depth cameras (e.g., Kinect) and of accurate, efficient human pose estimation algorithms, behavior recognition based on human skeleton sequences is becoming increasingly popular. A skeleton sequence directly reflects the motion of the human body and has the advantages of compact input data, insensitivity to background interference, and so on. Methods based on deep neural networks can automatically learn features and recognize behaviors from a raw skeleton sequence; identity recognition based on human skeleton sequences, however, has largely been ignored.
A person's sequence of actions over time reflects not only the person's behavior but also the person's identity; gait recognition research, for example, infers a person's identity from the way the person walks. Existing methods, however, recognize the behavior and the identity of an individual separately, and cannot recognize the action of a pedestrian and the identity of that pedestrian simultaneously from the same motion sequence.
Disclosure of Invention
The present invention solves the technical problem that identity information and behavior actions cannot be recognized simultaneously from human body skeleton data. To this end, the invention provides a behavior and identity joint recognition method and device based on a human skeleton sequence.
In a first aspect, the behavior and identity joint identification method based on the human skeleton sequence provided by the invention comprises the following steps: acquiring a human body skeleton sequence of a human body to be identified; predicting the probability of each preset identity category and the probability of each preset behavior category according to the human body skeleton sequence by using a pre-constructed recognition model; judging the identity type of the human body to be identified according to the predicted probability of the identity type; judging the behavior category of the human body to be recognized according to the predicted probability of the behavior category; the identification model is an identity class and behavior class probability prediction model constructed based on a deep recurrent neural network.
Further, in a preferred technical solution provided by the present invention, before the step of "predicting the probability of each preset identity category and the probability of each preset behavior category according to the human skeleton sequence based on a pre-constructed recognition model", the method further includes: performing coordinate conversion on a preset human body skeleton sequence training sample based on a preset reference coordinate system to obtain a first reference skeleton sequence; acquiring the position coordinates of a preset human body central point at each moment corresponding to the first reference skeleton sequence; subtracting the corresponding human body skeleton coordinate mean value from the position coordinate of each joint point at each moment in the first reference skeleton sequence to obtain a second reference skeleton sequence; performing three-dimensional coordinate transformation on the second reference skeleton sequence according to a preset rotation angle to obtain a third reference skeleton sequence; acquiring the coordinate change characteristic of each joint point according to the third reference skeleton sequence; fusing the obtained coordinate change characteristics to obtain a feature sequence; and performing model training on the recognition model according to the feature sequence based on a preset model loss function.
Further, in a preferred embodiment of the present invention, before the step of subtracting the corresponding human body skeleton coordinate mean value from the position coordinate of each joint point at each moment in the first reference skeleton sequence to obtain the second reference skeleton sequence, the method includes: acquiring the coordinates of a plurality of preset central points of the human skeleton; and calculating the coordinate mean value of the plurality of central points according to the acquired coordinates. In this case, the second reference skeleton sequence is obtained by subtracting the corresponding center-point coordinate mean value from the position coordinate of each joint point at each moment in the first reference skeleton sequence.
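As an illustration of the centering step in this solution, the following is a minimal Python sketch; the joint indices chosen for the center points are hypothetical, since joint numbering differs between skeleton formats (Kinect, pose-estimation outputs, etc.):

```python
# A minimal sketch of the centering step: subtract, frame by frame, the mean
# coordinate of a few pre-specified center points (e.g. hip center, left hip,
# right hip) from every joint. The indices below are hypothetical.

CENTER_JOINTS = [0, 12, 16]  # hypothetical indices of the chosen center points

def center_frame(frame, center_joints=CENTER_JOINTS):
    """Subtract the mean coordinate of the center points from every joint.

    frame: list of (x, y, z) tuples, one per joint.
    """
    k = len(center_joints)
    cx = sum(frame[j][0] for j in center_joints) / k
    cy = sum(frame[j][1] for j in center_joints) / k
    cz = sum(frame[j][2] for j in center_joints) / k
    return [(x - cx, y - cy, z - cz) for (x, y, z) in frame]

def center_sequence(skeleton_seq):
    """Apply the per-frame centering to a whole skeleton sequence."""
    return [center_frame(f) for f in skeleton_seq]
```

This makes the joint coordinates relative to the body center at every moment, so the model sees posture rather than absolute position.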
Further, in a preferred technical solution provided by the present invention, the step of performing three-dimensional coordinate transformation on the second reference skeleton sequence according to a preset rotation angle to obtain a third reference skeleton sequence includes performing three-dimensional coordinate transformation on each joint node by using the following transformation formula:
R = R_z(γ) R_y(β) R_x(α)
where R is the three-dimensional rotation transformation matrix, and R_x(α), R_y(β) and R_z(γ) are the rotation matrices about the x, y and z coordinate axes, of the form
R_x(α) = [[1, 0, 0], [0, cos α, -sin α], [0, sin α, cos α]]
R_y(β) = [[cos β, 0, sin β], [0, 1, 0], [-sin β, 0, cos β]]
R_z(γ) = [[cos γ, -sin γ, 0], [sin γ, cos γ, 0], [0, 0, 1]]
and α, β and γ are the rotation angles about the x, y and z coordinate axes, respectively.
Further, in a preferred technical solution provided by the present invention, the step of "fusing the obtained coordinate change features to obtain a feature sequence" includes: and connecting the coordinates of the joint points at each moment after the coordinate transformation into a feature vector to obtain a feature sequence.
Further, in a preferred embodiment of the present invention, the model loss function is represented by the following formula:
L = λL^(1) + (1-λ)L^(2)
where λ is a preset weighting coefficient with 0 ≤ λ ≤ 1, and L^(1) and L^(2) are the loss functions corresponding to behavior recognition and identity recognition, respectively, expressible as the cross-entropy losses
L^(1) = -Σ_{n=1..N} log p_{n, c_n},  L^(2) = -Σ_{n=1..N} log q_{n, d_n}
where c_n and d_n are the behavior and identity category labels of the n-th sample, p and q are the predicted class probabilities, and N is the total number of samples.
the step of performing model training on the recognition model according to the characteristic sequence based on a preset model loss function comprises the following steps: and performing model training on the recognition model by using a BPTT algorithm according to the third reference skeleton sequence.
Further, in a preferred embodiment of the present invention, the center point includes a center point of a left hip joint, a center point of a right hip joint, and a center point of a hip, or the center point includes a center point of a left shoulder joint, a center point of a right shoulder joint, and a center point of a chest.
Further, in a preferred embodiment provided by the present invention, the deep recurrent neural network is a multi-layer bidirectional recurrent neural network or a unidirectional recurrent neural network; the multi-layer bidirectional recurrent neural network comprises a plurality of long short-term memory (LSTM) networks.
Further, in a preferred technical solution provided by the present invention, the fully connected layer in the network structure of the recognition model includes a first fully connected layer and a second fully connected layer; the first full-link layer is used for predicting the probability of each preset behavior category according to the human body skeleton sequence; the second fully-connected layer is used for predicting the probability of each preset identity type according to the human body skeleton sequence.
In a second aspect, the present invention provides a storage device storing one or more programs, the programs being adapted to be loaded and executed by a processor to implement the behavior and identity joint recognition method based on the human skeleton sequence according to the above technical solution.
In a third aspect, the present invention provides a processing apparatus comprising a processor adapted to execute programs; and a storage device adapted to store a plurality of programs; the program is suitable for being loaded and executed by a processor to realize the behavior and identity joint identification method based on the human body skeleton sequence.
Compared with the closest prior art, the technical scheme has at least the following beneficial effects:
the behavior and identity joint recognition method based on the human body skeleton sequence, provided by the invention, predicts the probability of the identity category and the probability of the behavior category through a pre-constructed recognition model for the human body skeleton sequence to be recognized, judges each behavior action of the identity information of a human body corresponding to the human body skeleton sequence according to the summary of the predicted identity category and the probability of the behavior category, and realizes the joint recognition of the identity and the behavior of the human body skeleton sequence; the use of the multi-layer bidirectional recurrent neural network improves the prediction precision of the probability of the identity class and the probability of the behavior class.
Drawings
FIG. 1 is a schematic diagram illustrating the main steps of behavior and identity joint identification based on human skeleton sequence in an embodiment of the present invention;
FIG. 2 is a schematic diagram of a network structure for identifying model neurons according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the main structure of a bidirectional recurrent neural network of the recognition model in the embodiment of the present invention;
fig. 4 is a schematic diagram of recognizing behavior and identity information of a human body corresponding to a human body skeleton sequence by using a recognition model in the embodiment of the present invention.
Detailed Description
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.
It should be noted that the embodiments and features of the embodiments of the present invention may be combined with each other without conflict. The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
Referring to fig. 1, fig. 1 illustrates the main steps of behavior and identity joint identification based on human skeleton sequence in this embodiment. The behavior and identity combined identification method based on the human body skeleton sequence comprises the following steps:
step 1, obtaining a human body skeleton sequence of a human body to be identified.
In this embodiment, an electronic device or application platform implementing the behavior and identity joint recognition method based on human skeleton sequences obtains the human skeleton sequence to be subjected to behavior recognition and identity verification, for example from a terminal device connected with the electronic device or application platform. Specifically, the terminal device can obtain skeleton data of the human skeletons of persons in the recognition area through a Kinect sensor connected with it. The human body skeleton sequence is the time-ordered sequence of skeleton data of the human skeleton of one and the same person.
The skeleton data may be image data of a human body detected by a Kinect sensor, and each frame of image data detected by the Kinect sensor may be data representing a trunk and each joint of the human body; the skeleton data includes the joint point coordinates of the human skeleton.
And 2, predicting the probability of each preset identity category and the probability of each preset behavior category based on a pre-constructed recognition model and according to the human body skeleton sequence.
In this embodiment, based on the human skeleton sequence obtained in step 1, the electronic device or application platform identifies the human skeleton sequence with a pre-established recognition model and predicts the probability of each preset identity category and of each preset behavior category. The recognition model may be an identity-category and behavior-category probability prediction model constructed based on a deep recurrent neural network, for example a Siamese network model, which completes the identity verification and behavior recognition of the human skeleton sequence to be detected. The input of the recognition model is a sequence of human skeleton data, and the output is the probability of each identity category and of each behavior category for the human body corresponding to the input skeleton sequence. The identity information and the behavior information of human bodies are stored in advance in a storage unit or database of the electronic device or application platform. Specifically, the recognition model predicts the probability that the human skeleton sequence corresponds to each identity category in the pre-stored identity information, and the probability that it corresponds to each behavior category of the pre-stored behavior actions.
Step 3, judging the identity type of the human body to be detected according to the predicted probability of the identity type; and judging the behavior category of the human body to be recognized according to the predicted probability of the behavior category.
In this embodiment, according to the probabilities of the identity categories and the probabilities of the behavior categories predicted in the step 2, the identity category of the human body corresponding to the human body skeleton sequence and the behavior category of the human body corresponding to the human body skeleton sequence can be determined according to the magnitude of the probabilities. The identity category may be information for distinguishing human identity, and the behavior category may be information for distinguishing human behavior.
Further, in a preferred technical solution provided in this embodiment, before the step of "predicting the probability of each preset identity class and the probability of each preset behavior class according to the human skeleton sequence based on a pre-constructed recognition model", the method further includes: performing coordinate conversion on a preset human body skeleton sequence training sample based on a preset reference coordinate system to obtain a first reference skeleton sequence; acquiring the position coordinates of a preset human body central point at each moment corresponding to the first reference skeleton sequence; subtracting the corresponding human body skeleton coordinate mean value from the position coordinate of each joint point at each moment in the first reference skeleton sequence to obtain a second reference skeleton sequence; performing three-dimensional coordinate transformation on the second reference skeleton sequence according to a preset rotation angle to obtain a third reference skeleton sequence; acquiring the coordinate change characteristic of each joint point according to the third reference skeleton sequence; fusing the obtained coordinate change characteristics to obtain a feature sequence; and performing model training on the recognition model according to the feature sequence based on a preset model loss function.
The training method of the pre-constructed recognition model comprises the following steps: converting the coordinates of the human skeleton sequence for training into a reference coordinate system to obtain a reference skeleton sequence; comparing the coordinates of each joint node of each reference framework of the reference framework sequence with the coordinates of a pre-specified central point to obtain the relative coordinates of each joint node of each reference framework; and carrying out three-dimensional coordinate transformation on the relative coordinates of each joint node, taking the reference framework sequence subjected to three-dimensional coordinate transformation as training data, and training the initial recognition model to obtain the optimized recognition model.
The preprocessing of the sample data also includes removing the absolute position from each skeleton in the human skeleton sequence, that is, the coordinate mean value at the corresponding time is subtracted from the coordinates of all key points of a skeleton sequence at each time, yielding the relative coordinates of each joint node.
Specifically, in the data preprocessing, if the human skeleton sequence is given in an image-plane coordinate system and the camera parameters are known, the coordinate system conversion can be performed by computing the camera transformation matrix; if the camera parameters are unknown, a dimension with value 1 is appended to the two-dimensional plane coordinates, and the resulting three-dimensional coordinates are scaled so that the x, y and z coordinate values fall within a preset range, preferably [-3, 3].
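A hedged sketch of the fallback for unknown camera parameters described above — appending a third coordinate of value 1 and rescaling into [-3, 3]; the uniform scaling used here is one plausible realization, not necessarily the exact scheme of the patent:

```python
# Lift each 2-D image-plane joint to 3-D by appending a coordinate of 1,
# then rescale so every coordinate value falls in the preset range [-3, 3].
# The uniform peak-based scaling below is an assumption for illustration.

def lift_and_scale(frames_2d, target=3.0):
    """frames_2d: list of frames, each a list of (x, y) joint tuples."""
    lifted = [[(x, y, 1.0) for (x, y) in frame] for frame in frames_2d]
    peak = max(abs(v) for frame in lifted for joint in frame for v in joint)
    s = target / peak if peak > 0 else 1.0  # map the largest magnitude to 3
    return [[(x * s, y * s, z * s) for (x, y, z) in frame] for frame in lifted]
```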
The three-dimensional coordinate transformation may be performed on the second reference skeleton sequence by using a preset rotation transformation matrix to obtain a third reference skeleton sequence.
Further, in a preferred technical solution provided in this embodiment, before the step of subtracting the corresponding human skeleton coordinate mean value from the position coordinate of each joint point at each moment in the first reference skeleton sequence to obtain the second reference skeleton sequence, the method includes: acquiring the coordinates of a plurality of preset central points of the human skeleton; and calculating the coordinate mean value of the plurality of central points according to the acquired coordinates. In this case, the second reference skeleton sequence is obtained by subtracting the corresponding center-point coordinate mean value from the position coordinate of each joint point at each moment in the first reference skeleton sequence.
Specifically, the center point includes a center point of a left hip, a center point of a right hip, and a center point of a hip, or the center point includes a center point of a left shoulder, a center point of a right shoulder, and a center point of a chest.
Further, in a preferred technical solution provided in this embodiment, the step of performing three-dimensional coordinate transformation on the second reference skeleton sequence according to a preset rotation angle to obtain a third reference skeleton sequence includes performing three-dimensional coordinate transformation on the relative coordinates of each joint node by using the following transformation formula:
R = R_z(γ) R_y(β) R_x(α)    (1)
where R_x(α), R_y(β) and R_z(γ) are the rotation matrices about the x, y and z coordinate axes, of the form
R_x(α) = [[1, 0, 0], [0, cos α, -sin α], [0, sin α, cos α]]
R_y(β) = [[cos β, 0, sin β], [0, 1, 0], [-sin β, 0, cos β]]
R_z(γ) = [[cos γ, -sin γ, 0], [sin γ, cos γ, 0], [0, 0, 1]]
In the above formulas, R is the three-dimensional rotation transformation matrix and α, β, γ are the rotation angles about the x, y and z coordinate axes. The rotation matrix R depends only on the three parameters α, β and γ; when α, β and γ are all 0, R is the identity matrix and no coordinate transformation is performed. During training of the recognition model, the values of α, β and γ are generated randomly, with ranges that depend on the task; for example, for cross-view recognition one may set α ∈ [-π/2, π/2], β ∈ [-π/2, π/2] and γ = 0.
Further, in a preferred embodiment, the step of "fusing the obtained coordinate change features to obtain a feature sequence" includes: and connecting the coordinate change characteristics of the different joint points to obtain a characteristic sequence.
The features that the model learns from the coordinate-transformed feature sequence to describe the motion are fused along the time dimension to obtain a vector describing the motion, which serves as the input of the two fully connected layers in the network. The fusion of the coordinate change features can be realized by max pooling (Max Pooling) or mean pooling (Mean Pooling).
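The connection of the joint coordinates at each moment into a single feature vector — the feature sequence fed to the recurrent network — can be sketched as:

```python
# Flatten each frame's (x, y, z) joint coordinates into one vector, giving
# one feature vector per time step. For a 25-joint skeleton this yields a
# 75-dimensional vector per frame (the joint count is format-dependent).

def frames_to_feature_sequence(skeleton_seq):
    """skeleton_seq: list of frames, each a list of (x, y, z) joint tuples."""
    return [[v for joint in frame for v in joint] for frame in skeleton_seq]
```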
Further, in a preferred embodiment, the preset model loss function is shown as the following formula:
L = λL^(1) + (1-λ)L^(2)    (5)
where λ is a preset weighting coefficient with 0 ≤ λ ≤ 1, and L^(1) and L^(2) are the loss functions corresponding to behavior recognition and identity recognition, respectively; they can be expressed as the cross-entropy losses
L^(1) = -Σ_{n=1..N} log p_{n, c_n},  L^(2) = -Σ_{n=1..N} log q_{n, d_n}
where c_n and d_n are the behavior and identity category labels of the n-th sample, p and q are the predicted class probabilities, and N is the total number of samples.
the step of performing model training on the recognition model according to the characteristic sequence based on a preset model loss function comprises the following steps: and performing model training on the recognition model by using a BPTT algorithm according to the third reference skeleton sequence. The BPTT algorithm is a Time sequence-based Back Propagation algorithm and is an abbreviation of Back-Propagation Through Time.
Further, in a preferred technical solution of this embodiment, the deep recurrent neural network is a multi-layer bidirectional recurrent neural network or a unidirectional recurrent neural network; the multi-layer bidirectional recurrent neural network comprises a plurality of long short-term memory (LSTM) networks.
In some optional implementations of this embodiment, the recognition model is constructed based on a deep recurrent neural network. The recognition model may employ a multi-layer bidirectional recurrent neural network, in which the recurrent units may be Long Short-Term Memory (LSTM) networks.
Referring to fig. 2, fig. 2 illustrates the network structure of a recognition-model neuron in this embodiment. As shown in fig. 2, given an input sequence {x_t}, the output sequence of the long short-term memory network is {h_t}, and the long short-term memory network iterates as follows:
i_t = σ(W_xi x_t + W_hi h_{t-1} + W_ci c_{t-1} + b_i)    (7)
f_t = σ(W_xf x_t + W_hf h_{t-1} + W_cf c_{t-1} + b_f)    (8)
c_t = f_t c_{t-1} + i_t tanh(W_xc x_t + W_hc h_{t-1} + b_c)    (9)
o_t = σ(W_xo x_t + W_ho h_{t-1} + W_co c_t + b_o)    (10)
h_t = o_t tanh(c_t)    (11)
where i_t, f_t, o_t and c_t respectively represent the states of the input gate, the forget gate, the output gate and the memory cell at time t, and the W and b terms respectively represent the connection weights and bias vectors.
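A scalar (single-unit) sketch of the iteration in equations (7)-(11); a real layer uses weight matrices over vector-valued x_t, h_t and c_t, and the weight names in the dictionary below are hypothetical:

```python
import math

# One step of the peephole LSTM of Eqs. (7)-(11), reduced to a single unit
# so every weight is a scalar. w is a dict of hypothetical weight names.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x_t, h_prev, c_prev, w):
    i = sigmoid(w['xi'] * x_t + w['hi'] * h_prev + w['ci'] * c_prev + w['bi'])
    f = sigmoid(w['xf'] * x_t + w['hf'] * h_prev + w['cf'] * c_prev + w['bf'])
    c = f * c_prev + i * math.tanh(w['xc'] * x_t + w['hc'] * h_prev + w['bc'])
    o = sigmoid(w['xo'] * x_t + w['ho'] * h_prev + w['co'] * c + w['bo'])
    h = o * math.tanh(c)
    return h, c
```

With all weights zero, each gate evaluates to σ(0) = 0.5, so the cell halves its previous state at every step — a handy sanity check on the equations.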
Referring to fig. 3, fig. 3 illustrates the main structure of the bidirectional recurrent neural network of the recognition model in this embodiment. As shown in fig. 3, for an input human skeleton sequence the network has two hidden layers, a forward layer and a backward layer, which learn the change characteristics of the input sequence in the two opposite directions of time. The output of the bidirectional recurrent neural network at each time step is the concatenation of the outputs of the forward layer and the backward layer, forming a new time sequence.
In some optional implementation manners of this embodiment, the full connection layer in the network structure of the recognition model includes a first full connection layer and a second full connection layer, where the first full connection layer is configured to predict, according to the human skeleton sequence, a probability of each preset behavior category, so as to recognize a human action behavior, and the second full connection layer is configured to predict, according to the human skeleton sequence, a probability of each preset identity category, so as to recognize a human identity.
Here, the fully connected layers used for classification comprise two fully connected layers, and the features learned by the deep recurrent neural network need to be fused in the time dimension to obtain a representation of the sequence. The fusion method employs either max pooling (Max Pooling) or mean pooling (Mean Pooling). Let {o_t}, t ∈ {1, 2, ..., T}, denote the outputs of the recurrent network, where T is the sequence length; the max-pooled output is max_t{o_t}, and the mean-pooled output is (1/T)·Σ_t o_t.
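The two fusion operations can be sketched directly; `fuse_over_time` is a hypothetical helper, and `outputs` is a (T, feature) array of the per-timestep outputs o_t.

```python
import numpy as np

def fuse_over_time(outputs, method="max"):
    """Fuse per-timestep features o_t, t = 1..T, into one sequence representation."""
    if method == "max":
        return outputs.max(axis=0)   # max pooling: max_t o_t (elementwise over features)
    return outputs.mean(axis=0)      # mean pooling: (1/T) * sum_t o_t
```

Either variant collapses the time dimension, so the resulting vector can be fed to the fully connected classification layers regardless of sequence length.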
The number of nodes of the hidden layer of the first fully connected layer is the number of behaviors to be recognized, and the behavior category to which the input sequence belongs is determined by the maximum of the class probabilities given by the following activation function:

p_i = exp(a_i) / Σ_{k=1}^{m} exp(a_k)

wherein a_i is the output of the fully connected layer, m is the number of behavior categories, and p_i is the predicted probability of the i-th behavior category.
The number of nodes of the hidden layer of the second fully connected layer is the number of identities to be recognized, and the identity category to which the input sequence belongs is determined by the maximum of the class probabilities given by the following activation function:

q_j = exp(b_j) / Σ_{k=1}^{n} exp(b_k)

wherein b_j is the output of the fully connected layer, n is the number of identity categories, and q_j is the predicted probability of the j-th identity category.
It can be understood that the behaviors to be recognized may be preset, and the number of behavior categories may be determined by the actual task, wherein each action corresponds to one behavior category. The identity information may likewise be preset, and the number of identity categories may be determined by the number of human bodies to be recognized in the actual task, wherein each person corresponds to one identity category.
By way of example, referring to fig. 4, fig. 4 is a schematic diagram illustrating the behavior and identity information of a human body corresponding to a human skeleton sequence recognized by the recognition model in this embodiment. As shown in fig. 4, after the human skeleton sequence is input into the recognition model, the behavior and identity of the human body are recognized. The recognition model jointly recognizes the behavior and identity information of the human body through data preprocessing, three-dimensional coordinate transformation, a deep recurrent neural network, and classification prediction. Here, 60 behavior categories and 40 identity categories are preset; according to the human skeleton sequence, the recognition model can recognize 60 different behavior actions and 40 persons with different identities.
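The joint training objective L = λL(1) + (1−λ)L(2) defined in claim 4 can be sketched with standard cross-entropy terms; the exact loss form is an assumption, and the function and argument names are illustrative.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean negative log-likelihood over N samples: -(1/N) * sum_n log probs[n, y_n]."""
    n = len(labels)
    return -np.log(probs[np.arange(n), labels]).mean()

def joint_loss(p_beh, y_beh, p_id, y_id, lam=0.5):
    """Weighted joint objective L = lam * L(1) + (1 - lam) * L(2), with 0 <= lam <= 1.

    p_beh / p_id are (N, classes) predicted probability matrices for behavior
    and identity; y_beh / y_id are the corresponding category labels.
    """
    return lam * cross_entropy(p_beh, y_beh) + (1 - lam) * cross_entropy(p_id, y_id)
```

Setting lam = 1 trains behavior recognition only and lam = 0 trains identity recognition only; intermediate values train both heads jointly.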
The present invention also provides a storage device carrying one or more programs adapted to be loaded and executed by a processor; when executed, the programs implement any of the methods of the embodiments described above.
The invention also provides a processing device comprising a processor adapted to execute various programs; and a storage device adapted to store a plurality of programs; wherein the program is adapted to be loaded and executed by a processor to implement any of the methods in the above embodiments.
The method provided by the embodiment of the invention recognizes the human skeleton sequence through the pre-established recognition model and identifies the behavior and identity information of the human body. In the invention, the fully connected layers of the recognition model comprise a fully connected layer for identity recognition and a fully connected layer for behavior recognition, and the recurrent neural network of the recognition model fuses the learned features in the time dimension; the recognition model can thus simultaneously predict the probability of the behavior category of the human skeleton sequence and the probability of the identity category of the person, and the identity category and behavior category of the human body are determined according to the predicted probabilities. Therefore, the method provided by the invention can quickly and accurately identify the identity information and behavior actions of the human body corresponding to the human skeleton sequence.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Without departing from the principle of the invention, one skilled in the art can make equivalent changes or substitutions on the related technical features, and the technical solutions after the changes or substitutions will fall within the protection scope of the invention.
Claims (9)
1. A behavior and identity joint identification method based on a human skeleton sequence is characterized by comprising the following steps:
performing coordinate conversion on a preset human body skeleton sequence training sample based on a preset reference coordinate system to obtain a first reference skeleton sequence;
acquiring position coordinates of a preset human body central point at each moment corresponding to the first reference skeleton sequence;
acquiring coordinates of a plurality of preset central points of the human skeleton;
calculating a coordinate mean value of a plurality of central points according to the acquired coordinates;
subtracting the position coordinate of the joint point corresponding to each moment in the first reference skeleton sequence from the coordinate mean value of the corresponding central point to obtain a second reference skeleton sequence;
performing three-dimensional coordinate transformation on the second reference framework sequence according to a preset rotation angle to obtain a third reference framework sequence;
acquiring the coordinate change characteristic of each joint point according to the third reference skeleton sequence;
fusing the obtained coordinate change characteristics to obtain a characteristic sequence;
performing model training on the recognition model according to the characteristic sequence based on a preset model loss function;
acquiring a human body skeleton sequence of a human body to be identified;
predicting the probability of each preset identity category and the probability of each preset behavior category according to the human body skeleton sequence and the recognition model;
judging the identity type of the human body to be recognized according to the predicted probability of the identity type; judging the behavior category of the human body to be recognized according to the predicted probability of the behavior category;
the identification model is an identity class and behavior class probability prediction model constructed based on a deep recurrent neural network.
2. The behavior and identity joint identification method based on the human body skeleton sequence according to claim 1, wherein the step of performing three-dimensional coordinate transformation on the second reference skeleton sequence according to a preset rotation angle to obtain a third reference skeleton sequence comprises:
and (3) carrying out three-dimensional coordinate transformation on each joint node by using the following transformation formula:
R=Rz(γ)Ry(β)Rx(α)
wherein R is a three-dimensional rotation transformation matrix, and R_x(α), R_y(β), R_z(γ) are the rotation matrices about the x, y, and z coordinate axes, of the form:

R_x(α) = [[1, 0, 0], [0, cos α, −sin α], [0, sin α, cos α]]
R_y(β) = [[cos β, 0, sin β], [0, 1, 0], [−sin β, 0, cos β]]
R_z(γ) = [[cos γ, −sin γ, 0], [sin γ, cos γ, 0], [0, 0, 1]]
and alpha, beta and gamma are rotation angles in the directions of three coordinate axes of x, y and z.
3. The behavior and identity joint recognition method based on the human body skeleton sequence according to claim 1, wherein the step of fusing the obtained coordinate change features to obtain the feature sequence comprises: and connecting the coordinates of the joint points at each moment after the coordinate transformation into a feature vector to obtain a feature sequence.
4. The human skeleton sequence-based behavior and identity joint recognition method according to any one of claims 1-3, wherein the model loss function is represented by the following formula:
L=λL(1)+(1-λ)L(2)
wherein λ is a preset weighting coefficient, 0 ≤ λ ≤ 1, and L^(1) and L^(2) are the loss functions corresponding to behavior recognition and identity recognition, respectively:

L^(1) = −(1/N)·Σ_{n=1}^{N} log p_{y_n^(1)},  L^(2) = −(1/N)·Σ_{n=1}^{N} log q_{y_n^(2)}

wherein y_n^(1) and y_n^(2) are the behavior and identity category labels of the n-th sample, and N is the total number of samples; p_{y_n^(1)} is the prediction probability corresponding to behavior category y_n^(1), and q_{y_n^(2)} is the prediction probability corresponding to identity category y_n^(2);
the step of performing model training on the recognition model according to the feature sequence based on a preset model loss function comprises the following steps: and performing model training on the recognition model by utilizing a time sequence-based back propagation algorithm according to the third reference skeleton sequence.
5. The human skeletal sequence-based behavior and identity joint recognition method according to any one of claims 1 to 3, wherein the central point comprises a central point of a left hip, a central point of a right hip and a central point of a hip, or the central points comprise a central point of a left shoulder, a central point of a right shoulder and a central point of a chest.
6. The human skeleton sequence-based behavior and identity joint recognition method according to any one of claims 1-3, wherein the deep recurrent neural network is a multi-layer bidirectional recurrent neural network or a unidirectional recurrent neural network; the multi-layer bidirectional recurrent neural network comprises a plurality of long short-term memory networks.
7. The human skeleton sequence-based behavior and identity joint recognition method according to any one of claims 1 to 3, wherein fully connected layers in a network structure of the recognition model comprise a first fully connected layer and a second fully connected layer;
the first full-connection layer is used for predicting the probability of each preset behavior category according to the human body skeleton sequence;
the second full-link layer is used for predicting the probability of each preset identity category according to the human body skeleton sequence.
8. A storage device having a plurality of programs stored therein, wherein the programs are adapted to be loaded and executed by a processor to implement the method for joint human skeletal sequence based behavior and identity recognition according to any of claims 1 to 7.
9. A processing apparatus, comprising:
a processor adapted to execute various programs; and
a storage device adapted to store a plurality of programs;
wherein the program is adapted to be loaded and executed by a processor to perform:
a method of joint behavioral and identity recognition based on human body framework sequences as claimed in any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810499463.5A CN108764107B (en) | 2018-05-23 | 2018-05-23 | Behavior and identity combined identification method and device based on human body skeleton sequence |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108764107A CN108764107A (en) | 2018-11-06 |
CN108764107B true CN108764107B (en) | 2020-09-11 |
Family
ID=64005031
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810499463.5A Active CN108764107B (en) | 2018-05-23 | 2018-05-23 | Behavior and identity combined identification method and device based on human body skeleton sequence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108764107B (en) |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111382306B (en) * | 2018-12-28 | 2023-12-01 | 杭州海康威视数字技术股份有限公司 | Method and device for inquiring video frame |
CN109902729B (en) * | 2019-02-18 | 2020-10-16 | 清华大学 | Behavior prediction method and device based on sequence state evolution |
CN110197116B (en) * | 2019-04-15 | 2023-05-23 | 深圳大学 | Human behavior recognition method, device and computer readable storage medium |
CN110070029B (en) * | 2019-04-17 | 2021-07-16 | 北京易达图灵科技有限公司 | Gait recognition method and device |
CN110363131B (en) * | 2019-07-08 | 2021-10-15 | 上海交通大学 | Abnormal behavior detection method, system and medium based on human skeleton |
CN110717381A (en) * | 2019-08-28 | 2020-01-21 | 北京航空航天大学 | Human intention understanding method facing human-computer cooperation and based on deep stacking Bi-LSTM |
CN111079535B (en) * | 2019-11-18 | 2022-09-16 | 华中科技大学 | Human skeleton action recognition method and device and terminal |
CN111274937B (en) * | 2020-01-19 | 2023-04-28 | 中移(杭州)信息技术有限公司 | Tumble detection method, tumble detection device, electronic equipment and computer-readable storage medium |
CN113269008B (en) * | 2020-02-14 | 2023-06-30 | 宁波吉利汽车研究开发有限公司 | Pedestrian track prediction method and device, electronic equipment and storage medium |
CN111353447B (en) * | 2020-03-05 | 2023-07-04 | 辽宁石油化工大学 | Human skeleton behavior recognition method based on graph convolution network |
CN111783711B (en) * | 2020-07-09 | 2022-11-08 | 中国科学院自动化研究所 | Skeleton behavior identification method and device based on body component layer |
CN112966628A (en) * | 2021-03-17 | 2021-06-15 | 广东工业大学 | Visual angle self-adaptive multi-target tumble detection method based on graph convolution neural network |
US11854305B2 (en) | 2021-05-09 | 2023-12-26 | International Business Machines Corporation | Skeleton-based action recognition using bi-directional spatial-temporal transformer |
CN113239819B (en) * | 2021-05-18 | 2022-05-03 | 西安电子科技大学广州研究院 | Visual angle normalization-based skeleton behavior identification method, device and equipment |
CN113688790A (en) * | 2021-09-22 | 2021-11-23 | 武汉工程大学 | Human body action early warning method and system based on image recognition |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103729614A (en) * | 2012-10-16 | 2014-04-16 | 上海唐里信息技术有限公司 | People recognition method and device based on video images |
US8929600B2 (en) * | 2012-12-19 | 2015-01-06 | Microsoft Corporation | Action recognition based on depth maps |
US20160042227A1 (en) * | 2014-08-06 | 2016-02-11 | BAE Systems Information and Electronic Systems Integraton Inc. | System and method for determining view invariant spatial-temporal descriptors for motion detection and analysis |
CN107301370B (en) * | 2017-05-08 | 2020-10-16 | 上海大学 | Kinect three-dimensional skeleton model-based limb action identification method |
2018-05-23 CN CN201810499463.5A patent/CN108764107B/en active Active
Non-Patent Citations (1)
Title |
---|
A new identity recognition method based on human infrared radiation; Xue Zhaojun et al.; Journal of Tianjin University of Technology and Education; 2012-03-31; Vol. 22, No. 1; pp. 1-5 *
Also Published As
Publication number | Publication date |
---|---|
CN108764107A (en) | 2018-11-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108764107B (en) | Behavior and identity combined identification method and device based on human body skeleton sequence | |
CN107273782B (en) | Online motion detection using recurrent neural networks | |
US5845048A (en) | Applicable recognition system for estimating object conditions | |
US10019629B2 (en) | Skeleton-based action detection using recurrent neural network | |
CN106951923B (en) | Robot three-dimensional shape recognition method based on multi-view information fusion | |
EP0737938B1 (en) | Method and apparatus for processing visual information | |
CN109919245B (en) | Deep learning model training method and device, training equipment and storage medium | |
CN107516127B (en) | Method and system for service robot to autonomously acquire attribution semantics of human-worn carried articles | |
KR20180057096A (en) | Device and method to perform recognizing and training face expression | |
CN112990211A (en) | Neural network training method, image processing method and device | |
CN110555481A (en) | Portrait style identification method and device and computer readable storage medium | |
CN111368656A (en) | Video content description method and video content description device | |
CN111062263A (en) | Method, device, computer device and storage medium for hand pose estimation | |
CN113569598A (en) | Image processing method and image processing apparatus | |
CN111738074B (en) | Pedestrian attribute identification method, system and device based on weak supervision learning | |
CN111428854A (en) | Structure searching method and structure searching device | |
CN114387513A (en) | Robot grabbing method and device, electronic equipment and storage medium | |
CN114708435A (en) | Obstacle size prediction and uncertainty analysis method based on semantic segmentation | |
CN113516227A (en) | Neural network training method and device based on federal learning | |
CN113838135A (en) | Pose estimation method, system and medium based on LSTM double-current convolution neural network | |
CN114140841A (en) | Point cloud data processing method, neural network training method and related equipment | |
Su et al. | Incremental learning with balanced update on receptive fields for multi-sensor data fusion | |
CN111104911A (en) | Pedestrian re-identification method and device based on big data training | |
TWI812053B (en) | Positioning method, electronic equipment and computer-readable storage medium | |
CN112818887B (en) | Human skeleton sequence behavior identification method based on unsupervised learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||