CN108764107B - Behavior and identity combined identification method and device based on human body skeleton sequence - Google Patents
- Publication number: CN108764107B
- Application number: CN201810499463.5A
- Authority
- CN
- China
- Prior art keywords
- sequence
- human body
- behavior
- identity
- skeleton sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
Abstract
The invention relates to the field of visual recognition and provides a behavior and identity joint recognition method based on a human body skeleton sequence, aiming at solving the problem that identity information and behavior actions cannot be recognized simultaneously from human body data. The method comprises the following steps: acquiring a human body skeleton sequence of the human body to be identified; and recognizing the identity information and behavior actions of the human body from the skeleton sequence with a pre-constructed recognition model. The recognition model is trained as follows: the coordinates of the training skeleton sequences are converted into a reference coordinate system to obtain a reference skeleton sequence; the coordinates of each joint node of each reference skeleton are expressed relative to the coordinates of a pre-specified central point to obtain the relative coordinates of each joint node; three-dimensional coordinate transformation is then applied to the reference skeleton sequence, and the initial recognition model is trained on the transformed data to obtain the optimized recognition model. The invention can quickly and accurately identify the identity information and the behavior actions of a human body from a human body skeleton sequence.
Description
Technical Field
The invention relates to the technical field of computer vision, in particular to deep-learning-based vision, and specifically to a behavior and identity joint recognition method and device based on a human body skeleton sequence.
Background
With the development of computer graphics, computer vision and human-computer interaction technology, it is becoming increasingly important to recognize the behavior, actions and identity of a detected or monitored person in a timely and accurate manner. Behavior recognition and identity recognition are applied in fields such as autonomous driving, human-computer interaction, smart cities, intelligent transportation and intelligent surveillance.
With the recent development of depth cameras (e.g., Kinect) and of accurate, efficient human pose estimation algorithms, behavior recognition based on human skeleton sequences is becoming increasingly popular. A skeleton sequence directly reflects the motion of the human body and has the advantages of compact input data, insensitivity to background interference, and so on. Methods based on deep neural networks can automatically learn features and recognize behaviors from a raw skeleton sequence; identity recognition based on human skeleton sequences, however, has largely been ignored.
A person's sequence of actions over time reflects not only the person's behavior but also the person's identity; gait recognition research, for example, infers a person's identity from the way the person walks. Existing methods, however, recognize the behavior and the identity of an individual separately, and cannot recognize the action of a pedestrian and the identity of that pedestrian simultaneously from the same motion sequence.
Disclosure of Invention
The present invention solves the technical problem that identity information and behavior actions cannot be recognized simultaneously from human body skeleton data. To this end, the invention provides a behavior and identity joint recognition method and device based on a human skeleton sequence.
In a first aspect, the behavior and identity joint identification method based on the human skeleton sequence provided by the invention comprises the following steps: acquiring a human body skeleton sequence of a human body to be identified; predicting the probability of each preset identity category and the probability of each preset behavior category according to the human body skeleton sequence by using a pre-constructed recognition model; judging the identity type of the human body to be identified according to the predicted probability of the identity type; judging the behavior category of the human body to be recognized according to the predicted probability of the behavior category; the identification model is an identity class and behavior class probability prediction model constructed based on a deep recurrent neural network.
Further, in a preferred technical solution provided by the present invention, before the step of "predicting the probability of each preset identity category and the probability of each preset behavior category according to the human skeleton sequence based on a pre-constructed recognition model", the method further includes: performing coordinate conversion on a preset human body skeleton sequence training sample based on a preset reference coordinate system to obtain a first reference skeleton sequence; acquiring the position coordinates of a preset human body central point at each moment corresponding to the first reference skeleton sequence; subtracting the corresponding human body skeleton coordinate mean value from the position coordinate of each joint point at each moment in the first reference skeleton sequence to obtain a second reference skeleton sequence; performing three-dimensional coordinate transformation on the second reference skeleton sequence according to a preset rotation angle to obtain a third reference skeleton sequence; acquiring the coordinate change characteristic of each joint point according to the third reference skeleton sequence; fusing the obtained coordinate change characteristics to obtain a feature sequence; and performing model training on the recognition model according to the feature sequence based on a preset model loss function.
Further, in a preferred embodiment of the present invention, before the step of subtracting the corresponding human body skeleton coordinate mean value from the position coordinate of each joint point at each moment in the first reference skeleton sequence to obtain the second reference skeleton sequence, the method includes: acquiring the coordinates of a plurality of preset central points of the human skeleton; and calculating the coordinate mean value of the plurality of central points according to the acquired coordinates. In this case, the second reference skeleton sequence is obtained by subtracting the corresponding center-point coordinate mean value from the position coordinate of each joint point at each moment in the first reference skeleton sequence.
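As an illustration of the centering step in this solution, the following is a minimal Python sketch; the joint indices chosen for the center points are hypothetical, since joint numbering differs between skeleton formats (Kinect, pose-estimation outputs, etc.):

```python
# A minimal sketch of the centering step: subtract, frame by frame, the mean
# coordinate of a few pre-specified center points (e.g. hip center, left hip,
# right hip) from every joint. The indices below are hypothetical.

CENTER_JOINTS = [0, 12, 16]  # hypothetical indices of the chosen center points

def center_frame(frame, center_joints=CENTER_JOINTS):
    """Subtract the mean coordinate of the center points from every joint.

    frame: list of (x, y, z) tuples, one per joint.
    """
    k = len(center_joints)
    cx = sum(frame[j][0] for j in center_joints) / k
    cy = sum(frame[j][1] for j in center_joints) / k
    cz = sum(frame[j][2] for j in center_joints) / k
    return [(x - cx, y - cy, z - cz) for (x, y, z) in frame]

def center_sequence(skeleton_seq):
    """Apply the per-frame centering to a whole skeleton sequence."""
    return [center_frame(f) for f in skeleton_seq]
```

This makes the joint coordinates relative to the body center at every moment, so the model sees posture rather than absolute position.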
Further, in a preferred technical solution provided by the present invention, the step of performing three-dimensional coordinate transformation on the second reference skeleton sequence according to a preset rotation angle to obtain a third reference skeleton sequence includes performing three-dimensional coordinate transformation on each joint node by using the following transformation formula:
R = R_z(γ) R_y(β) R_x(α)
where R is the three-dimensional rotation transformation matrix, and R_x(α), R_y(β) and R_z(γ) are the rotation matrices about the x, y and z coordinate axes, of the form
R_x(α) = [[1, 0, 0], [0, cos α, -sin α], [0, sin α, cos α]]
R_y(β) = [[cos β, 0, sin β], [0, 1, 0], [-sin β, 0, cos β]]
R_z(γ) = [[cos γ, -sin γ, 0], [sin γ, cos γ, 0], [0, 0, 1]]
and α, β and γ are the rotation angles about the x, y and z coordinate axes, respectively.
Further, in a preferred technical solution provided by the present invention, the step of "fusing the obtained coordinate change features to obtain a feature sequence" includes: and connecting the coordinates of the joint points at each moment after the coordinate transformation into a feature vector to obtain a feature sequence.
Further, in a preferred embodiment of the present invention, the model loss function is represented by the following formula:
L = λL^(1) + (1-λ)L^(2)
where λ is a preset weighting coefficient with 0 ≤ λ ≤ 1, and L^(1) and L^(2) are the loss functions corresponding to behavior recognition and identity recognition, respectively, expressible as the cross-entropy losses
L^(1) = -Σ_{n=1..N} log p_{n, c_n},  L^(2) = -Σ_{n=1..N} log q_{n, d_n}
where c_n and d_n are the behavior and identity category labels of the n-th sample, p and q are the predicted class probabilities, and N is the total number of samples.
the step of performing model training on the recognition model according to the characteristic sequence based on a preset model loss function comprises the following steps: and performing model training on the recognition model by using a BPTT algorithm according to the third reference skeleton sequence.
Further, in a preferred embodiment of the present invention, the center point includes a center point of a left hip joint, a center point of a right hip joint, and a center point of a hip, or the center point includes a center point of a left shoulder joint, a center point of a right shoulder joint, and a center point of a chest.
Further, in a preferred embodiment provided by the present invention, the deep recurrent neural network is a multi-layer bidirectional recurrent neural network or a unidirectional recurrent neural network; the multi-layer bidirectional recurrent neural network comprises a plurality of long short-term memory (LSTM) networks.
Further, in a preferred technical solution provided by the present invention, the fully connected layer in the network structure of the recognition model includes a first fully connected layer and a second fully connected layer; the first full-link layer is used for predicting the probability of each preset behavior category according to the human body skeleton sequence; the second fully-connected layer is used for predicting the probability of each preset identity type according to the human body skeleton sequence.
In a second aspect, the present invention provides a storage device storing one or more programs, the programs being adapted to be loaded and executed by a processor to implement the behavior and identity joint recognition method based on the human skeleton sequence according to the above technical solution.
In a third aspect, the present invention provides a processing apparatus comprising a processor adapted to execute programs; and a storage device adapted to store a plurality of programs; the program is suitable for being loaded and executed by a processor to realize the behavior and identity joint identification method based on the human body skeleton sequence.
Compared with the closest prior art, the technical scheme has at least the following beneficial effects:
the behavior and identity joint recognition method based on the human body skeleton sequence, provided by the invention, predicts the probability of the identity category and the probability of the behavior category through a pre-constructed recognition model for the human body skeleton sequence to be recognized, judges each behavior action of the identity information of a human body corresponding to the human body skeleton sequence according to the summary of the predicted identity category and the probability of the behavior category, and realizes the joint recognition of the identity and the behavior of the human body skeleton sequence; the use of the multi-layer bidirectional recurrent neural network improves the prediction precision of the probability of the identity class and the probability of the behavior class.
Drawings
FIG. 1 is a schematic diagram illustrating the main steps of behavior and identity joint identification based on human skeleton sequence in an embodiment of the present invention;
FIG. 2 is a schematic diagram of a network structure for identifying model neurons according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the main structure of a bidirectional recurrent neural network of the recognition model in the embodiment of the present invention;
fig. 4 is a schematic diagram of recognizing behavior and identity information of a human body corresponding to a human body skeleton sequence by using a recognition model in the embodiment of the present invention.
Detailed Description
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.
It should be noted that the embodiments and features of the embodiments of the present invention may be combined with each other without conflict. The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
Referring to fig. 1, fig. 1 illustrates the main steps of behavior and identity joint identification based on human skeleton sequence in this embodiment. The behavior and identity combined identification method based on the human body skeleton sequence comprises the following steps:
step 1, obtaining a human body skeleton sequence of a human body to be identified.
In this embodiment, an electronic device or application platform implementing the behavior and identity joint recognition method based on human skeleton sequences obtains the human skeleton sequence to be subjected to behavior recognition and identity verification, for example from a terminal device connected with the electronic device or application platform. Specifically, the terminal device can obtain skeleton data of the human skeletons of persons in the recognition area through a Kinect sensor connected with it. The human body skeleton sequence is the time-ordered sequence of skeleton data of the human skeleton of one and the same person.
The skeleton data may be image data of a human body detected by a Kinect sensor, and each frame of image data detected by the Kinect sensor may be data representing a trunk and each joint of the human body; the skeleton data includes the joint point coordinates of the human skeleton.
And 2, predicting the probability of each preset identity category and the probability of each preset behavior category based on a pre-constructed recognition model and according to the human body skeleton sequence.
In this embodiment, based on the human skeleton sequence obtained in step 1, the electronic device or application platform identifies the human skeleton sequence with a pre-established recognition model and predicts the probability of each preset identity category and of each preset behavior category. The recognition model may be an identity-category and behavior-category probability prediction model constructed based on a deep recurrent neural network, for example a Siamese network model, which completes the identity verification and behavior recognition of the human skeleton sequence to be detected. The input of the recognition model is a sequence of human skeleton data, and the output is the probability of each identity category and of each behavior category for the human body corresponding to the input skeleton sequence. The identity information and the behavior information of human bodies are stored in advance in a storage unit or database of the electronic device or application platform. Specifically, the recognition model predicts the probability that the human skeleton sequence corresponds to each identity category in the pre-stored identity information, and the probability that it corresponds to each behavior category of the pre-stored behavior actions.
Step 3, judging the identity type of the human body to be detected according to the predicted probability of the identity type; and judging the behavior category of the human body to be recognized according to the predicted probability of the behavior category.
In this embodiment, according to the probabilities of the identity categories and the probabilities of the behavior categories predicted in the step 2, the identity category of the human body corresponding to the human body skeleton sequence and the behavior category of the human body corresponding to the human body skeleton sequence can be determined according to the magnitude of the probabilities. The identity category may be information for distinguishing human identity, and the behavior category may be information for distinguishing human behavior.
Further, in a preferred technical solution provided in this embodiment, before the step of "predicting the probability of each preset identity class and the probability of each preset behavior class according to the human skeleton sequence based on a pre-constructed recognition model", the method further includes: performing coordinate conversion on a preset human body skeleton sequence training sample based on a preset reference coordinate system to obtain a first reference skeleton sequence; acquiring the position coordinates of a preset human body central point at each moment corresponding to the first reference skeleton sequence; subtracting the corresponding human body skeleton coordinate mean value from the position coordinate of each joint point at each moment in the first reference skeleton sequence to obtain a second reference skeleton sequence; performing three-dimensional coordinate transformation on the second reference skeleton sequence according to a preset rotation angle to obtain a third reference skeleton sequence; acquiring the coordinate change characteristic of each joint point according to the third reference skeleton sequence; fusing the obtained coordinate change characteristics to obtain a feature sequence; and performing model training on the recognition model according to the feature sequence based on a preset model loss function.
The training method of the pre-constructed recognition model comprises the following steps: converting the coordinates of the human skeleton sequence for training into a reference coordinate system to obtain a reference skeleton sequence; comparing the coordinates of each joint node of each reference framework of the reference framework sequence with the coordinates of a pre-specified central point to obtain the relative coordinates of each joint node of each reference framework; and carrying out three-dimensional coordinate transformation on the relative coordinates of each joint node, taking the reference framework sequence subjected to three-dimensional coordinate transformation as training data, and training the initial recognition model to obtain the optimized recognition model.
The preprocessing of the sample data also includes removing the absolute position from each skeleton in the human skeleton sequence, that is, the coordinate mean value at the corresponding time is subtracted from the coordinates of all key points of a skeleton sequence at each time, yielding the relative coordinates of each joint node.
Specifically, in the data preprocessing, if the human skeleton sequence is given in an image-plane coordinate system and the camera parameters are known, the coordinate system conversion can be performed by computing the camera transformation matrix; if the camera parameters are unknown, a dimension with value 1 is appended to the two-dimensional plane coordinates, and the resulting three-dimensional coordinates are scaled so that the x, y and z coordinate values fall within a preset range, preferably [-3, 3].
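A hedged sketch of the fallback for unknown camera parameters described above — appending a third coordinate of value 1 and rescaling into [-3, 3]; the uniform scaling used here is one plausible realization, not necessarily the exact scheme of the patent:

```python
# Lift each 2-D image-plane joint to 3-D by appending a coordinate of 1,
# then rescale so every coordinate value falls in the preset range [-3, 3].
# The uniform peak-based scaling below is an assumption for illustration.

def lift_and_scale(frames_2d, target=3.0):
    """frames_2d: list of frames, each a list of (x, y) joint tuples."""
    lifted = [[(x, y, 1.0) for (x, y) in frame] for frame in frames_2d]
    peak = max(abs(v) for frame in lifted for joint in frame for v in joint)
    s = target / peak if peak > 0 else 1.0  # map the largest magnitude to 3
    return [[(x * s, y * s, z * s) for (x, y, z) in frame] for frame in lifted]
```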
The three-dimensional coordinate transformation may be performed on the second reference skeleton sequence by using a preset rotation transformation matrix to obtain a third reference skeleton sequence.
Further, in a preferred technical solution provided in this embodiment, before the step of subtracting the corresponding human skeleton coordinate mean value from the position coordinate of each joint point at each moment in the first reference skeleton sequence to obtain the second reference skeleton sequence, the method includes: acquiring the coordinates of a plurality of preset central points of the human skeleton; and calculating the coordinate mean value of the plurality of central points according to the acquired coordinates. In this case, the second reference skeleton sequence is obtained by subtracting the corresponding center-point coordinate mean value from the position coordinate of each joint point at each moment in the first reference skeleton sequence.
Specifically, the center point includes a center point of a left hip, a center point of a right hip, and a center point of a hip, or the center point includes a center point of a left shoulder, a center point of a right shoulder, and a center point of a chest.
Further, in a preferred technical solution provided in this embodiment, the step of performing three-dimensional coordinate transformation on the second reference skeleton sequence according to a preset rotation angle to obtain a third reference skeleton sequence includes performing three-dimensional coordinate transformation on the relative coordinates of each joint node by using the following transformation formula:
R = R_z(γ) R_y(β) R_x(α)    (1)
where R_x(α), R_y(β) and R_z(γ) are the rotation matrices about the x, y and z coordinate axes, of the form
R_x(α) = [[1, 0, 0], [0, cos α, -sin α], [0, sin α, cos α]]
R_y(β) = [[cos β, 0, sin β], [0, 1, 0], [-sin β, 0, cos β]]
R_z(γ) = [[cos γ, -sin γ, 0], [sin γ, cos γ, 0], [0, 0, 1]]
In the above formulas, R is the three-dimensional rotation transformation matrix and α, β, γ are the rotation angles about the x, y and z coordinate axes. The rotation matrix R depends only on the three parameters α, β and γ; when α, β and γ are all 0, R is the identity matrix and no coordinate transformation is performed. During training of the recognition model, the values of α, β and γ are generated randomly, with ranges that depend on the task; for example, for cross-view recognition one may set α ∈ [-π/2, π/2], β ∈ [-π/2, π/2] and γ = 0.
Further, in a preferred embodiment, the step of "fusing the obtained coordinate change features to obtain a feature sequence" includes: and connecting the coordinate change characteristics of the different joint points to obtain a characteristic sequence.
The features that the model learns from the coordinate-transformed feature sequence to describe the motion are fused along the time dimension to obtain a vector describing the motion, which serves as the input of the two fully connected layers in the network. The fusion of the coordinate change features can be realized by max pooling (Max Pooling) or mean pooling (Mean Pooling).
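The connection of the joint coordinates at each moment into a single feature vector — the feature sequence fed to the recurrent network — can be sketched as:

```python
# Flatten each frame's (x, y, z) joint coordinates into one vector, giving
# one feature vector per time step. For a 25-joint skeleton this yields a
# 75-dimensional vector per frame (the joint count is format-dependent).

def frames_to_feature_sequence(skeleton_seq):
    """skeleton_seq: list of frames, each a list of (x, y, z) joint tuples."""
    return [[v for joint in frame for v in joint] for frame in skeleton_seq]
```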
Further, in a preferred embodiment, the preset model loss function is shown as the following formula:
L = λL^(1) + (1-λ)L^(2)    (5)
where λ is a preset weighting coefficient with 0 ≤ λ ≤ 1, and L^(1) and L^(2) are the loss functions corresponding to behavior recognition and identity recognition, respectively; they can be expressed as the cross-entropy losses
L^(1) = -Σ_{n=1..N} log p_{n, c_n},  L^(2) = -Σ_{n=1..N} log q_{n, d_n}
where c_n and d_n are the behavior and identity category labels of the n-th sample, p and q are the predicted class probabilities, and N is the total number of samples.
the step of performing model training on the recognition model according to the characteristic sequence based on a preset model loss function comprises the following steps: and performing model training on the recognition model by using a BPTT algorithm according to the third reference skeleton sequence. The BPTT algorithm is a Time sequence-based Back Propagation algorithm and is an abbreviation of Back-Propagation Through Time.
Further, in a preferred technical solution of this embodiment, the deep recurrent neural network is a multi-layer bidirectional recurrent neural network or a unidirectional recurrent neural network; the multi-layer bidirectional recurrent neural network comprises a plurality of long short-term memory (LSTM) networks.
In some optional implementations of this embodiment, the recognition model is constructed based on a deep recurrent neural network. The recognition model may employ a multi-layer bidirectional recurrent neural network, in which the recurrent units may be Long Short-Term Memory (LSTM) networks.
Referring to fig. 2, fig. 2 illustrates the network structure of a recognition-model neuron in this embodiment. As shown in fig. 2, given an input sequence {x_t}, the output sequence of the long short-term memory network is {h_t}, and the long short-term memory network iterates as follows:
i_t = σ(W_xi x_t + W_hi h_{t-1} + W_ci c_{t-1} + b_i)    (7)
f_t = σ(W_xf x_t + W_hf h_{t-1} + W_cf c_{t-1} + b_f)    (8)
c_t = f_t c_{t-1} + i_t tanh(W_xc x_t + W_hc h_{t-1} + b_c)    (9)
o_t = σ(W_xo x_t + W_ho h_{t-1} + W_co c_t + b_o)    (10)
h_t = o_t tanh(c_t)    (11)
where i_t, f_t, o_t and c_t respectively represent the states of the input gate, the forget gate, the output gate and the memory cell at time t, and the W and b terms respectively represent the connection weights and bias vectors.
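A scalar (single-unit) sketch of the iteration in equations (7)-(11); a real layer uses weight matrices over vector-valued x_t, h_t and c_t, and the weight names in the dictionary below are hypothetical:

```python
import math

# One step of the peephole LSTM of Eqs. (7)-(11), reduced to a single unit
# so every weight is a scalar. w is a dict of hypothetical weight names.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x_t, h_prev, c_prev, w):
    i = sigmoid(w['xi'] * x_t + w['hi'] * h_prev + w['ci'] * c_prev + w['bi'])
    f = sigmoid(w['xf'] * x_t + w['hf'] * h_prev + w['cf'] * c_prev + w['bf'])
    c = f * c_prev + i * math.tanh(w['xc'] * x_t + w['hc'] * h_prev + w['bc'])
    o = sigmoid(w['xo'] * x_t + w['ho'] * h_prev + w['co'] * c + w['bo'])
    h = o * math.tanh(c)
    return h, c
```

With all weights zero, each gate evaluates to σ(0) = 0.5, so the cell halves its previous state at every step — a handy sanity check on the equations.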
Referring to fig. 3, fig. 3 illustrates the main structure of the bidirectional recurrent neural network of the recognition model in this embodiment. As shown in fig. 3, for an input human skeleton sequence the network has two hidden layers, a forward layer and a backward layer, which learn the change characteristics of the input sequence in the two opposite directions of time. The output of the bidirectional recurrent neural network at each time step is the concatenation of the outputs of the forward layer and the backward layer, forming a new time sequence.
In some optional implementation manners of this embodiment, the full connection layer in the network structure of the recognition model includes a first full connection layer and a second full connection layer, where the first full connection layer is configured to predict, according to the human skeleton sequence, a probability of each preset behavior category, so as to recognize a human action behavior, and the second full connection layer is configured to predict, according to the human skeleton sequence, a probability of each preset identity category, so as to recognize a human identity.
Here, the fully connected layers used for classification comprise two fully connected layers, and the features learned by the deep recurrent neural network need to be fused in the time dimension to obtain a representation of the sequence. The fusion method employs either max pooling (Max Pooling) or mean pooling (Mean Pooling). Let {o_t}, t ∈ {1, 2, ..., T}, denote the outputs of the recurrent network, where T is the sequence length; the max-pooled output is max_t{o_t}, and the mean-pooled output is (1/T)·Σ_t o_t.
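The two fusion operations can be sketched directly; `fuse_over_time` is a hypothetical helper, and `outputs` is a (T, feature) array of the per-timestep outputs o_t.

```python
import numpy as np

def fuse_over_time(outputs, method="max"):
    """Fuse per-timestep features o_t, t = 1..T, into one sequence representation."""
    if method == "max":
        return outputs.max(axis=0)   # max pooling: max_t o_t (elementwise over features)
    return outputs.mean(axis=0)      # mean pooling: (1/T) * sum_t o_t
```

Either variant collapses the time dimension, so the resulting vector can be fed to the fully connected classification layers regardless of sequence length.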
The number of nodes of the hidden layer of the first fully connected layer is the number of behaviors to be recognized, and the behavior category to which the input sequence belongs is determined by the maximum of the class probabilities given by the following activation function:

p_i = exp(a_i) / Σ_{k=1}^{m} exp(a_k)

wherein a_i is the output of the fully connected layer, m is the number of behavior categories, and p_i is the predicted probability of the i-th behavior category.
The number of nodes of the hidden layer of the second fully connected layer is the number of identities to be recognized, and the identity category to which the input sequence belongs is determined by the maximum of the class probabilities given by the following activation function:

q_j = exp(b_j) / Σ_{k=1}^{n} exp(b_k)

wherein b_j is the output of the fully connected layer, n is the number of identity categories, and q_j is the predicted probability of the j-th identity category.
It can be understood that the behaviors to be recognized may be preset, and the number of behavior categories may be determined by the actual task, wherein each action corresponds to one behavior category. The identity information may likewise be preset, and the number of identity categories may be determined by the number of human bodies to be recognized in the actual task, wherein each person corresponds to one identity category.
By way of example, referring to fig. 4, fig. 4 is a schematic diagram illustrating the behavior and identity information of a human body corresponding to a human skeleton sequence recognized by the recognition model in this embodiment. As shown in fig. 4, after the human skeleton sequence is input into the recognition model, the behavior and identity of the human body are recognized. The recognition model jointly recognizes the behavior and identity information of the human body through data preprocessing, three-dimensional coordinate transformation, a deep recurrent neural network, and classification prediction. Here, 60 behavior categories and 40 identity categories are preset; according to the human skeleton sequence, the recognition model can recognize 60 different behavior actions and 40 persons with different identities.
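The joint training objective L = λL(1) + (1−λ)L(2) defined in claim 4 can be sketched with standard cross-entropy terms; the exact loss form is an assumption, and the function and argument names are illustrative.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean negative log-likelihood over N samples: -(1/N) * sum_n log probs[n, y_n]."""
    n = len(labels)
    return -np.log(probs[np.arange(n), labels]).mean()

def joint_loss(p_beh, y_beh, p_id, y_id, lam=0.5):
    """Weighted joint objective L = lam * L(1) + (1 - lam) * L(2), with 0 <= lam <= 1.

    p_beh / p_id are (N, classes) predicted probability matrices for behavior
    and identity; y_beh / y_id are the corresponding category labels.
    """
    return lam * cross_entropy(p_beh, y_beh) + (1 - lam) * cross_entropy(p_id, y_id)
```

Setting lam = 1 trains behavior recognition only and lam = 0 trains identity recognition only; intermediate values train both heads jointly.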
The present invention also provides a storage device carrying one or more programs adapted to be loaded and executed by a processor; when executed, the programs implement any of the methods of the embodiments described above.
The invention also provides a processing device comprising a processor adapted to execute various programs; and a storage device adapted to store a plurality of programs; wherein the program is adapted to be loaded and executed by a processor to implement any of the methods in the above embodiments.
The method provided by the embodiment of the invention recognizes the human skeleton sequence through the pre-established recognition model and identifies the behavior and identity information of the human body. In the invention, the fully connected layers of the recognition model comprise a fully connected layer for identity recognition and a fully connected layer for behavior recognition, and the recurrent neural network of the recognition model fuses the learned features in the time dimension; the recognition model can thus simultaneously predict the probability of the behavior category of the human skeleton sequence and the probability of the identity category of the person, and the identity category and behavior category of the human body are determined according to the predicted probabilities. Therefore, the method provided by the invention can quickly and accurately identify the identity information and behavior actions of the human body corresponding to the human skeleton sequence.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Without departing from the principle of the invention, one skilled in the art can make equivalent changes or substitutions on the related technical features, and the technical solutions after the changes or substitutions will fall within the protection scope of the invention.
Claims (9)
1. A behavior and identity joint identification method based on a human skeleton sequence is characterized by comprising the following steps:
performing coordinate conversion on a preset human body skeleton sequence training sample based on a preset reference coordinate system to obtain a first reference skeleton sequence;
acquiring position coordinates of a preset human body central point at each moment corresponding to the first reference skeleton sequence;
acquiring coordinates of a plurality of preset central points of the human skeleton;
calculating a coordinate mean value of a plurality of central points according to the acquired coordinates;
subtracting the position coordinate of the joint point corresponding to each moment in the first reference skeleton sequence from the coordinate mean value of the corresponding central point to obtain a second reference skeleton sequence;
performing three-dimensional coordinate transformation on the second reference framework sequence according to a preset rotation angle to obtain a third reference framework sequence;
acquiring the coordinate change characteristic of each joint point according to the third reference skeleton sequence;
fusing the obtained coordinate change characteristics to obtain a characteristic sequence;
performing model training on the recognition model according to the characteristic sequence based on a preset model loss function;
acquiring a human body skeleton sequence of a human body to be identified;
predicting the probability of each preset identity category and the probability of each preset behavior category according to the human body skeleton sequence and the recognition model;
judging the identity type of the human body to be recognized according to the predicted probability of the identity type; judging the behavior category of the human body to be recognized according to the predicted probability of the behavior category;
the identification model is an identity class and behavior class probability prediction model constructed based on a deep recurrent neural network.
2. The behavior and identity joint identification method based on the human body skeleton sequence according to claim 1, wherein the step of performing three-dimensional coordinate transformation on the second reference skeleton sequence according to a preset rotation angle to obtain a third reference skeleton sequence comprises:
and (3) carrying out three-dimensional coordinate transformation on each joint node by using the following transformation formula:
R=Rz(γ)Ry(β)Rx(α)
wherein R is a three-dimensional rotation transformation matrix, and R_x(α), R_y(β), R_z(γ) are the rotation matrices about the x, y, and z coordinate axes, of the form:

R_x(α) = [[1, 0, 0], [0, cos α, −sin α], [0, sin α, cos α]]
R_y(β) = [[cos β, 0, sin β], [0, 1, 0], [−sin β, 0, cos β]]
R_z(γ) = [[cos γ, −sin γ, 0], [sin γ, cos γ, 0], [0, 0, 1]]
and alpha, beta and gamma are rotation angles in the directions of three coordinate axes of x, y and z.
3. The behavior and identity joint recognition method based on the human body skeleton sequence according to claim 1, wherein the step of fusing the obtained coordinate change features to obtain the feature sequence comprises: and connecting the coordinates of the joint points at each moment after the coordinate transformation into a feature vector to obtain a feature sequence.
4. The human skeleton sequence-based behavior and identity joint recognition method according to any one of claims 1-3, wherein the model loss function is represented by the following formula:
L=λL(1)+(1-λ)L(2)
wherein λ is a preset weighting coefficient, 0 ≤ λ ≤ 1, and L^(1) and L^(2) are the loss functions corresponding to behavior recognition and identity recognition, respectively:

L^(1) = −(1/N)·Σ_{n=1}^{N} log p_{y_n^(1)},  L^(2) = −(1/N)·Σ_{n=1}^{N} log q_{y_n^(2)}

wherein y_n^(1) and y_n^(2) are the behavior and identity category labels of the n-th sample, and N is the total number of samples; p_{y_n^(1)} is the prediction probability corresponding to behavior category y_n^(1), and q_{y_n^(2)} is the prediction probability corresponding to identity category y_n^(2);
the step of performing model training on the recognition model according to the feature sequence based on a preset model loss function comprises the following steps: and performing model training on the recognition model by utilizing a time sequence-based back propagation algorithm according to the third reference skeleton sequence.
5. The human skeletal sequence-based behavior and identity joint recognition method according to any one of claims 1 to 3, wherein the central point comprises a central point of a left hip, a central point of a right hip and a central point of a hip, or the central points comprise a central point of a left shoulder, a central point of a right shoulder and a central point of a chest.
6. The human skeleton sequence-based behavior and identity joint recognition method according to any one of claims 1-3, wherein the deep recurrent neural network is a multi-layer bidirectional recurrent neural network or a unidirectional recurrent neural network; the multi-layer bidirectional recurrent neural network comprises a plurality of long short-term memory networks.
7. The human skeleton sequence-based behavior and identity joint recognition method according to any one of claims 1 to 3, wherein fully connected layers in a network structure of the recognition model comprise a first fully connected layer and a second fully connected layer;
the first full-connection layer is used for predicting the probability of each preset behavior category according to the human body skeleton sequence;
the second full-link layer is used for predicting the probability of each preset identity category according to the human body skeleton sequence.
8. A storage device having a plurality of programs stored therein, wherein the programs are adapted to be loaded and executed by a processor to implement the method for joint human skeletal sequence based behavior and identity recognition according to any of claims 1 to 7.
9. A processing apparatus, comprising:
a processor adapted to execute various programs; and
a storage device adapted to store a plurality of programs;
wherein the program is adapted to be loaded and executed by a processor to perform:
a method of joint behavioral and identity recognition based on human body framework sequences as claimed in any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810499463.5A CN108764107B (en) | 2018-05-23 | 2018-05-23 | Behavior and identity combined identification method and device based on human body skeleton sequence |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108764107A CN108764107A (en) | 2018-11-06 |
CN108764107B true CN108764107B (en) | 2020-09-11 |
Family
ID=64005031
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810499463.5A Active CN108764107B (en) | 2018-05-23 | 2018-05-23 | Behavior and identity combined identification method and device based on human body skeleton sequence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108764107B (en) |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111382306B (en) * | 2018-12-28 | 2023-12-01 | 杭州海康威视数字技术股份有限公司 | Method and device for inquiring video frame |
CN109902729B (en) * | 2019-02-18 | 2020-10-16 | 清华大学 | Behavior prediction method and device based on sequence state evolution |
CN110197116B (en) * | 2019-04-15 | 2023-05-23 | 深圳大学 | Human behavior recognition method, device and computer readable storage medium |
CN110070029B (en) * | 2019-04-17 | 2021-07-16 | 北京易达图灵科技有限公司 | Gait recognition method and device |
CN110363131B (en) * | 2019-07-08 | 2021-10-15 | 上海交通大学 | Abnormal behavior detection method, system and medium based on human skeleton |
CN110717381A (en) * | 2019-08-28 | 2020-01-21 | 北京航空航天大学 | Human intention understanding method facing human-computer cooperation and based on deep stacking Bi-LSTM |
CN111079535B (en) * | 2019-11-18 | 2022-09-16 | 华中科技大学 | Human skeleton action recognition method and device and terminal |
CN111274937B (en) * | 2020-01-19 | 2023-04-28 | 中移(杭州)信息技术有限公司 | Tumble detection method, tumble detection device, electronic equipment and computer-readable storage medium |
CN113269008B (en) * | 2020-02-14 | 2023-06-30 | 宁波吉利汽车研究开发有限公司 | Pedestrian track prediction method and device, electronic equipment and storage medium |
CN111353447B (en) * | 2020-03-05 | 2023-07-04 | 辽宁石油化工大学 | Human skeleton behavior recognition method based on graph convolution network |
CN111783711B (en) * | 2020-07-09 | 2022-11-08 | 中国科学院自动化研究所 | Skeleton behavior identification method and device based on body component layer |
CN112966628A (en) * | 2021-03-17 | 2021-06-15 | 广东工业大学 | Visual angle self-adaptive multi-target tumble detection method based on graph convolution neural network |
US11854305B2 (en) | 2021-05-09 | 2023-12-26 | International Business Machines Corporation | Skeleton-based action recognition using bi-directional spatial-temporal transformer |
CN113239819B (en) * | 2021-05-18 | 2022-05-03 | 西安电子科技大学广州研究院 | Visual angle normalization-based skeleton behavior identification method, device and equipment |
CN113688790A (en) * | 2021-09-22 | 2021-11-23 | 武汉工程大学 | Human body action early warning method and system based on image recognition |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103729614A (en) * | 2012-10-16 | 2014-04-16 | 上海唐里信息技术有限公司 | People recognition method and device based on video images |
US8929600B2 (en) * | 2012-12-19 | 2015-01-06 | Microsoft Corporation | Action recognition based on depth maps |
US20160042227A1 (en) * | 2014-08-06 | 2016-02-11 | BAE Systems Information and Electronic Systems Integraton Inc. | System and method for determining view invariant spatial-temporal descriptors for motion detection and analysis |
CN107301370B (en) * | 2017-05-08 | 2020-10-16 | 上海大学 | Kinect three-dimensional skeleton model-based limb action identification method |
2018-05-23 CN CN201810499463.5A patent/CN108764107B/en active Active
Non-Patent Citations (1)
Title |
---|
A new identity recognition method based on human infrared radiation; Xue Zhaojun et al.; Journal of Tianjin University of Technology and Education; 2012-03-31; Vol. 22, No. 1; pp. 1-5 *
Also Published As
Publication number | Publication date |
---|---|
CN108764107A (en) | 2018-11-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108764107B (en) | Behavior and identity combined identification method and device based on human body skeleton sequence | |
CN107273782B (en) | Online motion detection using recurrent neural networks | |
US5845048A (en) | Applicable recognition system for estimating object conditions | |
US10019629B2 (en) | Skeleton-based action detection using recurrent neural network | |
CN106951923B (en) | Robot three-dimensional shape recognition method based on multi-view information fusion | |
EP0737938B1 (en) | Method and apparatus for processing visual information | |
CN109919245B (en) | Deep learning model training method and device, training equipment and storage medium | |
CN107516127B (en) | Method and system for service robot to autonomously acquire attribution semantics of human-worn carried articles | |
KR20180057096A (en) | Device and method to perform recognizing and training face expression | |
CN112990211A (en) | Neural network training method, image processing method and device | |
CN110555481A (en) | Portrait style identification method and device and computer readable storage medium | |
CN111368656A (en) | Video content description method and video content description device | |
CN111062263A (en) | Method, device, computer device and storage medium for hand pose estimation | |
CN113569598A (en) | Image processing method and image processing apparatus | |
CN111738074B (en) | Pedestrian attribute identification method, system and device based on weak supervision learning | |
CN111428854A (en) | Structure searching method and structure searching device | |
CN114387513A (en) | Robot grabbing method and device, electronic equipment and storage medium | |
CN114708435A (en) | Obstacle size prediction and uncertainty analysis method based on semantic segmentation | |
CN113516227A (en) | Neural network training method and device based on federal learning | |
CN113838135A (en) | Pose estimation method, system and medium based on LSTM double-current convolution neural network | |
CN114140841A (en) | Point cloud data processing method, neural network training method and related equipment | |
Su et al. | Incremental learning with balanced update on receptive fields for multi-sensor data fusion | |
CN111104911A (en) | Pedestrian re-identification method and device based on big data training | |
TWI812053B (en) | Positioning method, electronic equipment and computer-readable storage medium | |
CN112818887B (en) | Human skeleton sequence behavior identification method based on unsupervised learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||