CN114373146A - Participant action identification method based on skeleton information and space-time characteristics - Google Patents


Info

Publication number
CN114373146A
CN114373146A
Authority
CN
China
Prior art keywords
frame image
atomic
sequence
action
actions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111568652.1A
Other languages
Chinese (zh)
Inventor
马丕明
陈思颖
栾春芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University
Priority to CN202111568652.1A
Publication of CN114373146A
Legal status: Pending (current)

Landscapes

  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A participant action identification method based on skeleton information and space-time characteristics belongs to the field of computer vision and comprises the following steps: acquiring the coordinate sequence of human skeleton joint points in a video-conference monitoring picture; obtaining the spatial feature sequence of the human body action by calculating joint angle features and joint point distance features; classifying the atomic action of each single-frame image according to the spatial features, and thereby determining the atomic action number sequence of the multi-frame video; learning the temporal variation characteristics of the atomic actions by constructing hidden Markov models (HMMs) corresponding to the different participant actions; and identifying the participant action by computing the log-likelihood of an unclassified atomic action number sequence under the HMM of each participant action and selecting the maximum. The invention can accurately and efficiently identify the actions of human participants in a video conference.

Description

Participant action identification method based on skeleton information and space-time characteristics
Technical Field
The invention relates to a participant action identification method based on skeletal information and space-time characteristics, and belongs to the field of computer vision.
Background
With the development of image processing technology, research on video conference systems has changed significantly as new techniques are introduced. To meet the diversified demands of users, image processing technology can be used to identify the actions of participants in a video conference; this reflects the meeting state of the participants in a timely and effective manner and helps the relevant management departments accurately grasp the effect of the meeting, thereby helping video conferencing become automated and intelligent. The invention automatically identifies human actions in the meeting state from the acquired human skeleton data of the participants, so that the management department can arrange and schedule meetings more effectively, which has practical significance and application value.
When deep learning is used for human action recognition, the data are analyzed by constructing a hierarchical neural network with learning capability; the drawback is that a huge amount of data is needed, otherwise the model may overfit during training. Patent CN113255616A, entitled "A method for identifying video behaviors based on deep learning", discloses a method for identifying human behaviors in video. The method comprises the following steps: constructing a video behavior recognition network; using a two-dimensional convolutional neural network, ResNet, as the backbone of the video behavior recognition network, and inserting a convolutional inter-frame temporal information extraction module into the backbone; the two-dimensional convolutional ResNet extracts static features of targets in the video; the inter-frame temporal information extraction module optimizes the backbone, extracts inter-frame features with a bilinear operation, and fuses the intra-frame and inter-frame information to obtain highly discriminative spatio-temporal features for behavior classification. The method trains and optimizes the parameters of the neural network model with human behavior training samples and thereby recognizes human behaviors in video. However, the neural network model constructed by this method is deep, so a large number of training samples are required. Participant actions in a video conference are characterized by few action types and few data samples, and using a deep neural network model may therefore lead to overfitting during training and poor action recognition.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a participant action recognition method based on skeleton information and space-time characteristics, aiming to solve the problem of poor action recognition caused by model overfitting when the video-conference participant-action dataset is small.
The technical scheme adopted by the invention is as follows:
a conference participation action identification method based on skeleton information and space-time characteristics determines a human conference participation action by processing a coordinate sequence of human skeleton joint points in a video conference monitoring picture, and comprises the following steps:
1) skeletal joint point coordinate sequence acquisition
Acquire the coordinate information of 8 skeletal joint points of the upper half of the human body in the video picture, namely, in order: the nose tip, the neck center, the right shoulder end, the right upper arm center, the right wrist center, the left shoulder end, the left upper arm center and the left wrist center. A video comprises $T$ frame images, and the joint point coordinate sequence of the video is represented as $[X_0, X_1, \dots, X_t, \dots, X_{T-1}]$; the coordinate sequence contained in the $t$-th frame image is $X_t = (x_{t,0}, x_{t,1}, \dots, x_{t,l}, \dots, x_{t,15})$, $l = 0, 1, \dots, 15$, where $(x_{t,0}, x_{t,1})$ are the coordinates of the first joint point (the nose tip) in the $t$-th frame image, $(x_{t,2}, x_{t,3})$ are the coordinates of the second joint point (the neck center), and so on, with $(x_{t,14}, x_{t,15})$ the coordinates of the eighth joint point (the left wrist center);
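As an illustration only (not part of the patent), the following Python sketch shows one way to pack the eight upper-body joints returned by any 2D pose estimator into the 16-dimensional frame vector X_t defined above; the joint-name list and the function pack_frame are hypothetical helpers, and the pose estimator itself is outside the scope of the sketch.

import numpy as np

# Assumed joint order, matching the patent's enumeration of the 8 upper-body joints.
JOINT_NAMES = [
    "nose_tip", "neck_center", "right_shoulder_end", "right_upper_arm_center",
    "right_wrist_center", "left_shoulder_end", "left_upper_arm_center", "left_wrist_center",
]

def pack_frame(joints_xy):
    """joints_xy maps joint name -> (x, y) pixel coordinates; returns the 16-dim vector X_t."""
    vec = np.empty(16, dtype=float)
    for l, name in enumerate(JOINT_NAMES):
        x, y = joints_xy[name]
        vec[2 * l] = x        # even index: horizontal coordinate
        vec[2 * l + 1] = y    # odd index: vertical coordinate
    return vec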
2) spatial feature sequence computation
a. For the $t$-th frame image, calculate the horizontal distance $F_{1,t,c}$ and the vertical distance $F_{2,t,c}$ between every two joint points to extract the distance features of atomic actions. Two joint points $l_1$ and $l_2$, $l_1 \neq l_2$, are selected arbitrarily from the eight joint points, where $l_1, l_2 \in \{0, 2, \dots, 14\}$ index coordinates in the horizontal direction and $l_1, l_2 \in \{1, 3, \dots, 15\}$ index coordinates in the vertical direction. The ordinal number of each combination of 2 different joint points is denoted $c$; there are 28 combinations in total, so $c = 0, 1, \dots, 27$. In the $t$-th frame image the horizontal coordinates of joint points $l_1$ and $l_2$ are $x_{t,l_1}$ and $x_{t,l_2}$, and the corresponding vertical coordinates are $x_{t,l_1+1}$ and $x_{t,l_2+1}$. The horizontal distance $F_{1,t,c}$ and the vertical distance $F_{2,t,c}$ between $l_1$ and $l_2$ are expressed as:

$$F_{1,t,c} = \left| x_{t,l_1} - x_{t,l_2} \right|, \qquad F_{2,t,c} = \left| x_{t,l_1+1} - x_{t,l_2+1} \right|;$$
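A minimal sketch of the distance-feature computation, assuming the frame-vector layout above (joints indexed 0 to 7, with horizontal and vertical coordinates at the even and odd positions); the 28 joint pairs are enumerated in a fixed order to give the combination ordinal c:

from itertools import combinations
import numpy as np

def distance_features(frame_vec):
    """Return the 28 horizontal distances F_{1,t,c} and 28 vertical distances F_{2,t,c}."""
    f1, f2 = [], []
    for l1, l2 in combinations(range(8), 2):   # 28 pairs of joints, ordinal c = 0..27
        f1.append(abs(frame_vec[2 * l1] - frame_vec[2 * l2]))          # horizontal
        f2.append(abs(frame_vec[2 * l1 + 1] - frame_vec[2 * l2 + 1]))  # vertical
    return np.array(f1), np.array(f2)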
b. For the $t$-th frame image, calculate the joint angles $F_{3,t,d}$, $d = 0, 1, \dots, 4$, related to the participant action to extract the angle features of atomic actions. The joint angles related to the participant action are, respectively: the angle between the lines connecting the nose tip to the left shoulder end and to the right shoulder end; the angle between the line from the neck center to the nose tip and the line from the neck center to the left shoulder end; the angle between the line from the neck center to the nose tip and the line from the neck center to the right shoulder end; the angle between the upper arm and the lower arm of the left hand; and the angle between the upper arm and the lower arm of the right hand. Taking the angle between the lines connecting the nose tip to the left and right shoulder ends as an example, it is calculated as:

$$F_{3,t,0} = \arccos \frac{(x_{t,4} - x_{t,0})(x_{t,10} - x_{t,0}) + (x_{t,5} - x_{t,1})(x_{t,11} - x_{t,1})}{\sqrt{(x_{t,4} - x_{t,0})^2 + (x_{t,5} - x_{t,1})^2}\,\sqrt{(x_{t,10} - x_{t,0})^2 + (x_{t,11} - x_{t,1})^2}}$$

where $(x_{t,0}, x_{t,1})$, $(x_{t,4}, x_{t,5})$ and $(x_{t,10}, x_{t,11})$ are the coordinates of the nose tip, the right shoulder end and the left shoulder end, respectively;
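The angle features can be computed with a generic helper based on the arccosine of the normalized dot product; the sketch below is illustrative, and the commented example applies it to F_{3,t,0}, the angle at the nose tip between the rays to the two shoulder ends, using the coordinate layout assumed above:

import numpy as np

def joint_angle(p_vertex, p_a, p_b):
    """Angle (in radians) at p_vertex between the rays p_vertex->p_a and p_vertex->p_b."""
    v1 = np.asarray(p_a, dtype=float) - np.asarray(p_vertex, dtype=float)
    v2 = np.asarray(p_b, dtype=float) - np.asarray(p_vertex, dtype=float)
    cos_val = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-12)
    return float(np.arccos(np.clip(cos_val, -1.0, 1.0)))

# Example for F_{3,t,0}:
# nose, r_shoulder, l_shoulder = frame_vec[0:2], frame_vec[4:6], frame_vec[10:12]
# f3_0 = joint_angle(nose, r_shoulder, l_shoulder)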
c. Extract the distance features and angle features of each frame image in the video according to steps a and b to obtain the spatial feature sequence of the participant action, expressed as $[F_0, F_1, \dots, F_t, \dots, F_{T-1}]$, where $F_t = (F_{1,t,0}, \dots, F_{1,t,27}, F_{2,t,0}, \dots, F_{2,t,27}, F_{3,t,0}, \dots, F_{3,t,4})$;
3) Atomic action number sequence acquisition
a. According to the specific application scenario, set $V$ classes of atomic actions and rank them in priority order according to the degree of attention paid to each action, obtaining the set of atomic action numbers $\{0, 1, \dots, V-1\}$;
b. Since the spatial features differ markedly between different atomic actions, the judgment criterion of the $v$-th class of atomic action, $v = 0, 1, \dots, V-1$, is formulated by finding the ranges of the joint angles and joint point distances that are most representative of that atomic action: define $U_{1,v,c}$, $L_{1,v,c}$, $U_{2,v,c}$, $L_{2,v,c}$ as the upper and lower limits of the value ranges of the $c$-th horizontal and vertical joint distance features of the $v$-th class of atomic action, and $U_{3,v,d}$, $L_{3,v,d}$ as the upper and lower limits of the value range of the $d$-th joint angle feature of the $v$-th class of atomic action;
c. Classify the human body action in the $t$-th frame image according to the atomic action priority: if the spatial features of the $t$-th frame satisfy $L_{1,v,c} < F_{1,t,c} < U_{1,v,c}$, $L_{2,v,c} < F_{2,t,c} < U_{2,v,c}$ and $L_{3,v,d} < F_{3,t,d} < U_{3,v,d}$, the action in the $t$-th frame image belongs to the $v$-th class of atomic action, and the observed value at the moment corresponding to the $t$-th frame image is $o_t = v$;
d. Classify every frame image in the video by atomic action according to step c to obtain the observation sequence of the video, i.e. the atomic action number sequence $O = (o_0, o_1, \dots, o_{T-1})$;
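A sketch of the rule-based atomic-action classifier of steps b and c, under the assumption that each class's bounds are stored as 61-dimensional lower/upper arrays (with -inf/+inf for features that class does not constrain) and that the list is already sorted by the priority of step a; the fallback to the last class for frames matching no rule is an assumption, not something stated in the patent:

import numpy as np

def classify_atomic_action(feature_vec, bounds):
    """
    feature_vec: 61-dim spatial feature F_t = (F_1, F_2, F_3) of one frame.
    bounds: list over atomic-action classes v (sorted by priority) of pairs
            (lower, upper) of 61-dim arrays holding L_{.,v,.} and U_{.,v,.};
            unconstrained entries are set to -inf / +inf.
    Returns the number v of the first (highest-priority) class whose bounds all hold.
    """
    for v, (lower, upper) in enumerate(bounds):
        if np.all((feature_vec > lower) & (feature_vec < upper)):
            return v
    return len(bounds) - 1   # assumed fallback: lowest-priority (e.g. "other") class

Applying classify_atomic_action to every frame's feature vector yields the observation sequence O = (o_0, o_1, ..., o_{T-1}).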
4) Construction of the hidden Markov models (HMMs) corresponding to the different participant actions
a. According to the specific application scenario, set $K$ classes of participant actions; the set of participant action numbers is $\{0, 1, \dots, K-1\}$. The training data of the $k$-th class of participant action, i.e. its atomic action number sequence, is $O_k = (o_{0,k}, o_{1,k}, \dots, o_{T-1,k})$, $o_{t,k} \in \{0, 1, \dots, V-1\}$, where $o_{0,k}, o_{1,k}, \dots, o_{T-1,k}$ is the observation sequence; the hidden sequence corresponding to the observation sequence is $I_k = (i_{0,k}, i_{1,k}, \dots, i_{T-1,k})$, $i_{t,k} \in Q$, where $Q = (q_0, q_1, \dots, q_{N-1})$ is the hidden state set and $N$ is the number of hidden states;
b. Perform HMM modeling of the participant action corresponding to the $k$-th class of training data $O_k$. The HMM parameters of the $k$-th class of participant action are defined as $\lambda_k^{(r)} = (A_k^{(r)}, B_k^{(r)}, \pi_k^{(r)})$, where $r$ is the iteration number. The state transition matrix is $A_k^{(r)} = \big[a_{nm,k}^{(r)}\big]_{N \times N}$, where $a_{nm,k}^{(r)}$ denotes the probability of being in state $q_n$ at the moment corresponding to the $t$-th frame image and transferring to state $q_m$ at the moment corresponding to the $(t+1)$-th frame image; the observation probability matrix is $B_k^{(r)} = \big[b_{n,k}^{(r)}(v)\big]_{N \times V}$, where $b_{n,k}^{(r)}(v)$ denotes the probability that the observed value is the atomic action number $v$ given that the state at the moment corresponding to the $t$-th frame image is $q_n$; and the initial state probability vector is $\pi_k^{(r)} = \big[\pi_{n,k}^{(r)}\big]_{1 \times N}$, where $\pi_{n,k}^{(r)}$ denotes the probability that the state at the moment corresponding to the 0-th frame image is $q_n$. The symbol $[\cdot]$ with subscripts $N \times N$, $N \times V$ and $1 \times N$ denotes the dimensions of the matrices;
c. Initialize the HMM parameters to $\lambda_k^{(0)} = (A_k^{(0)}, B_k^{(0)}, \pi_k^{(0)})$, and define the maximum number of iterations $R$ and the maximum log-likelihood error $\Delta$;
d. Under the parameters $\lambda_k^{(r)}$, define $\alpha_{t,k}(n)$ as the forward probability of being in state $q_n$ at the moment corresponding to the $t$-th frame image with observation sequence $o_{0,k}, o_{1,k}, \dots, o_{t,k}$. The forward probability at the moment corresponding to the 0-th frame image is expressed as

$$\alpha_{0,k}(n) = \pi_{n,k}^{(r)}\, b_{n,k}^{(r)}(o_{0,k}), \qquad n = 0, 1, \dots, N-1.$$

Setting $t = 0, 1, \dots, T-2$, for all hidden states $q_m$ the forward probability is calculated as follows:

$$\alpha_{t+1,k}(m) = \Big[\sum_{n=0}^{N-1} \alpha_{t,k}(n)\, a_{nm,k}^{(r)}\Big]\, b_{m,k}^{(r)}(o_{t+1,k}).$$

Under the parameters $\lambda_k^{(r)}$, the probability log-likelihood of the observation sequence $O_k$ is expressed as

$$\log P\{O_k \mid \lambda_k^{(r)}\} = \log \sum_{n=0}^{N-1} \alpha_{T-1,k}(n);$$
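A direct (unscaled) implementation of the forward recursion and log-likelihood of step d is sketched below; for long sequences a scaled or log-space version would be needed to avoid underflow, but the unscaled form mirrors the formulas above:

import numpy as np

def forward_log_likelihood(obs, A, B, pi):
    """
    obs: atomic-action numbers o_0 .. o_{T-1}; A: N x N transition matrix;
    B: N x V observation matrix; pi: length-N initial state vector.
    Returns (alpha, log P(O | lambda)).
    """
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                         # alpha_0(n) = pi_n * b_n(o_0)
    for t in range(T - 1):
        alpha[t + 1] = (alpha[t] @ A) * B[:, obs[t + 1]] # recursion of step d
    return alpha, float(np.log(alpha[-1].sum()))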
e. For $r \geq 1$, use the probability log-likelihoods $\log P\{O_k \mid \lambda_k^{(r)}\}$ and $\log P\{O_k \mid \lambda_k^{(r-1)}\}$ obtained at the $r$-th and $(r-1)$-th iterations to calculate the log-likelihood error $\delta_r$:

$$\delta_r = \left|\log P\{O_k \mid \lambda_k^{(r)}\} - \log P\{O_k \mid \lambda_k^{(r-1)}\}\right|,$$

where $P\{O_k \mid \lambda_k^{(r-1)}\}$ denotes the probability of the observation sequence $O_k$ under the parameters $\lambda_k^{(r-1)}$, i.e. the HMM parameters obtained at the previous iteration;
f. Define $\beta_{t,k}(n)$ as the backward probability that, given state $q_n$ at the moment corresponding to the $t$-th frame image, the observation sequence is $o_{t+1,k}, o_{t+2,k}, \dots, o_{T-1,k}$. At the moment corresponding to the $(T-1)$-th frame image, $\beta_{T-1,k}(n) = 1$. Setting $t = T-2, T-3, \dots, 0$, for all hidden states $q_n$ the backward probability is expressed as:

$$\beta_{t,k}(n) = \sum_{m=0}^{N-1} a_{nm,k}^{(r)}\, b_{m,k}^{(r)}(o_{t+1,k})\, \beta_{t+1,k}(m);$$
g. Define the probability $\xi_{t,k}(m,n)$ that, given the observation sequence $O_k$, the state at the moment corresponding to the $t$-th frame image is $q_n$ and the state at the moment corresponding to the $(t+1)$-th frame image is $q_m$; its formula is:

$$\xi_{t,k}(m,n) = \frac{\alpha_{t,k}(n)\, a_{nm,k}^{(r)}\, b_{m,k}^{(r)}(o_{t+1,k})\, \beta_{t+1,k}(m)}{\sum_{n'=0}^{N-1}\sum_{m'=0}^{N-1} \alpha_{t,k}(n')\, a_{n'm',k}^{(r)}\, b_{m',k}^{(r)}(o_{t+1,k})\, \beta_{t+1,k}(m')}.$$

Further, $\gamma_{t,k}(n)$ denotes the probability that the state at the moment corresponding to the $t$-th frame image is $q_n$, expressed as:

$$\gamma_{t,k}(n) = \frac{\alpha_{t,k}(n)\, \beta_{t,k}(n)}{\sum_{n'=0}^{N-1} \alpha_{t,k}(n')\, \beta_{t,k}(n')};$$
h. Calculate the HMM parameters $\lambda_k^{(r+1)} = (A_k^{(r+1)}, B_k^{(r+1)}, \pi_k^{(r+1)})$ using the re-estimation formulas, expressed as:

$$a_{nm,k}^{(r+1)} = \frac{\sum_{t=0}^{T-2} \xi_{t,k}(m,n)}{\sum_{t=0}^{T-2} \gamma_{t,k}(n)}, \qquad b_{n,k}^{(r+1)}(v) = \frac{\sum_{t=0,\; o_{t,k}=v}^{T-1} \gamma_{t,k}(n)}{\sum_{t=0}^{T-1} \gamma_{t,k}(n)}, \qquad \pi_{n,k}^{(r+1)} = \gamma_{0,k}(n);$$
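Steps g and h can be combined into one re-estimation routine; the sketch below reuses forward_log_likelihood and backward_probabilities from the sketches above and is written for a single training sequence, which is an assumption (in practice several sequences per action class would typically be pooled):

import numpy as np

def baum_welch_update(obs, A, B, pi):
    """One re-estimation step: computes xi and gamma, then returns (A', B', pi')."""
    T, N = len(obs), len(pi)
    V = B.shape[1]
    alpha, _ = forward_log_likelihood(obs, A, B, pi)
    beta = backward_probabilities(obs, A, B)

    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)              # gamma_t(n)

    xi = np.zeros((T - 1, N, N))                           # xi_t[n, m]
    for t in range(T - 1):
        num = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
        xi[t] = num / num.sum()

    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    obs_arr = np.asarray(obs)
    for v in range(V):
        B_new[:, v] = gamma[obs_arr == v].sum(axis=0)
    B_new /= gamma.sum(axis=0)[:, None]
    pi_new = gamma[0]
    return A_new, B_new, pi_new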
i. Execute steps d to h until the iteration number $r = R-1$ or the log-likelihood error $\delta_r < \Delta$, obtaining the trained HMM parameters $\lambda_k = (A_k, B_k, \pi_k)$;
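The full training loop of steps c to i, with random initialization and the two stopping criteria (maximum iteration count R, log-likelihood error below Delta), might look as follows; the random initialization and the default hyperparameter values are assumptions, since the patent does not fix them:

import numpy as np

def train_hmm(obs, N, V, R=100, delta=1e-4, seed=0):
    """Train one action-class HMM on the atomic-action number sequence obs."""
    rng = np.random.default_rng(seed)
    A = rng.random((N, N)); A /= A.sum(axis=1, keepdims=True)
    B = rng.random((N, V)); B /= B.sum(axis=1, keepdims=True)
    pi = rng.random(N);     pi /= pi.sum()
    _, prev_ll = forward_log_likelihood(obs, A, B, pi)
    for r in range(1, R):
        A, B, pi = baum_welch_update(obs, A, B, pi)
        _, ll = forward_log_likelihood(obs, A, B, pi)
        if abs(ll - prev_ll) < delta:          # delta_r < Delta
            break
        prev_ll = ll
    return A, B, pi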
5) Human body participant action recognition
a. For an atomic action number sequence of unknown class, $O = (o_0, o_1, \dots, o_{T-1})$, calculate the log-likelihood $\log P\{O \mid \lambda_k\}$ of $O$ under the HMM parameters $\lambda_k$ of each participant action according to the method of step d in step 4);
b. Perform participant action identification by calculating the maximum log-likelihood; the identified participant action number is expressed as:

$$k^{\ast} = \arg\max_{k \in \{0, 1, \dots, K-1\}} \log P\{O \mid \lambda_k\},$$

where the right-hand side takes the value of the parameter $k$ that maximizes $\log P\{O \mid \lambda_k\}$; the participant action identification is then complete, and the identification result is the action type corresponding to that participant action number.
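Recognition then reduces to scoring the unknown sequence under each trained model and taking the maximum, as in this sketch (the dictionary of per-class models is an assumed data structure):

def recognize(obs, models):
    """models: {participant-action number k: (A_k, B_k, pi_k)}. Returns argmax_k log P(O | lambda_k)."""
    scores = {k: forward_log_likelihood(obs, A, B, pi)[1]
              for k, (A, B, pi) in models.items()}
    return max(scores, key=scores.get)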
The invention has the following beneficial effects: the method identifies participant actions in a video conference by combining human skeleton information with the spatio-temporal information of the video data. By calculating the spatial feature sequence of the human action and classifying the atomic action of each single-frame image, the atomic action number sequence of the multi-frame video is determined, so that the spatial features in the video images are fully exploited and the recognition effect is improved. Furthermore, exploiting the fact that an HMM requires only a small amount of training data, an HMM is constructed for each participant action; because the HMM models a temporal process, the temporal features of the video are better utilized and a better participant action recognition effect is achieved.
Drawings
FIG. 1 is a schematic flow chart of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawing and an example, but is not limited thereto.
Example:
A participant action identification method based on skeleton information and space-time characteristics determines the participant action of a human body by processing the coordinate sequence of human skeleton joint points in a video-conference monitoring picture, as shown in FIG. 1, and comprises the following steps:
Steps 1) to 5) of this embodiment are carried out exactly as described in the disclosure above: acquiring the skeletal joint point coordinate sequence, computing the spatial feature sequence, obtaining the atomic action number sequence, constructing and training an HMM for each participant action, and recognizing the human participant action by the maximum log-likelihood.

Claims (1)

1. A conference participation action identification method based on skeleton information and space-time characteristics determines a human conference participation action by processing a coordinate sequence of human skeleton joint points in a video conference monitoring picture, and comprises the following steps:
1) skeletal joint point coordinate sequence acquisition
Acquire the coordinate information of 8 skeletal joint points of the upper half of the human body in the video picture, namely, in order: the nose tip, the neck center, the right shoulder end, the right upper arm center, the right wrist center, the left shoulder end, the left upper arm center and the left wrist center. A video comprises $T$ frame images, and the joint point coordinate sequence of the video is represented as $[X_0, X_1, \dots, X_t, \dots, X_{T-1}]$; the coordinate sequence contained in the $t$-th frame image is $X_t = (x_{t,0}, x_{t,1}, \dots, x_{t,l}, \dots, x_{t,15})$, $l = 0, 1, \dots, 15$, where $(x_{t,0}, x_{t,1})$ are the coordinates of the first joint point (the nose tip) in the $t$-th frame image, $(x_{t,2}, x_{t,3})$ are the coordinates of the second joint point (the neck center), and so on, with $(x_{t,14}, x_{t,15})$ the coordinates of the eighth joint point (the left wrist center);
2) spatial feature sequence computation
a. For the $t$-th frame image, calculate the horizontal distance $F_{1,t,c}$ and the vertical distance $F_{2,t,c}$ between every two joint points to extract the distance features of atomic actions. Two joint points $l_1$ and $l_2$, $l_1 \neq l_2$, are selected arbitrarily from the eight joint points, where $l_1, l_2 \in \{0, 2, \dots, 14\}$ index coordinates in the horizontal direction and $l_1, l_2 \in \{1, 3, \dots, 15\}$ index coordinates in the vertical direction. The ordinal number of each combination of 2 different joint points is denoted $c$; there are 28 combinations in total, so $c = 0, 1, \dots, 27$. In the $t$-th frame image the horizontal coordinates of joint points $l_1$ and $l_2$ are $x_{t,l_1}$ and $x_{t,l_2}$, and the corresponding vertical coordinates are $x_{t,l_1+1}$ and $x_{t,l_2+1}$. The horizontal distance $F_{1,t,c}$ and the vertical distance $F_{2,t,c}$ between $l_1$ and $l_2$ are expressed as:

$$F_{1,t,c} = \left| x_{t,l_1} - x_{t,l_2} \right|, \qquad F_{2,t,c} = \left| x_{t,l_1+1} - x_{t,l_2+1} \right|;$$
b. For the $t$-th frame image, calculate the joint angles $F_{3,t,d}$, $d = 0, 1, \dots, 4$, related to the participant action to extract the angle features of atomic actions; the joint angles related to the participant action are, respectively: the angle between the lines connecting the nose tip to the left shoulder end and to the right shoulder end; the angle between the line from the neck center to the nose tip and the line from the neck center to the left shoulder end; the angle between the line from the neck center to the nose tip and the line from the neck center to the right shoulder end; the angle between the upper arm and the lower arm of the left hand; and the angle between the upper arm and the lower arm of the right hand;
c. Extract the distance features and angle features of each frame image in the video according to steps a and b to obtain the spatial feature sequence of the participant action, expressed as $[F_0, F_1, \dots, F_t, \dots, F_{T-1}]$, where $F_t = (F_{1,t,0}, \dots, F_{1,t,27}, F_{2,t,0}, \dots, F_{2,t,27}, F_{3,t,0}, \dots, F_{3,t,4})$;
3) Atomic action number sequence acquisition
a. According to the specific application scenario, set $V$ classes of atomic actions and rank them in priority order according to the degree of attention paid to each action, obtaining the set of atomic action numbers $\{0, 1, \dots, V-1\}$;
b. Since the spatial features differ markedly between different atomic actions, the judgment criterion of the $v$-th class of atomic action, $v = 0, 1, \dots, V-1$, is formulated by finding the ranges of the joint angles and joint point distances that are most representative of that atomic action: define $U_{1,v,c}$, $L_{1,v,c}$, $U_{2,v,c}$, $L_{2,v,c}$ as the upper and lower limits of the value ranges of the $c$-th horizontal and vertical joint distance features of the $v$-th class of atomic action, and $U_{3,v,d}$, $L_{3,v,d}$ as the upper and lower limits of the value range of the $d$-th joint angle feature of the $v$-th class of atomic action;
c. Classify the human body action in the $t$-th frame image according to the atomic action priority: if the spatial features of the $t$-th frame satisfy $L_{1,v,c} < F_{1,t,c} < U_{1,v,c}$, $L_{2,v,c} < F_{2,t,c} < U_{2,v,c}$ and $L_{3,v,d} < F_{3,t,d} < U_{3,v,d}$, the action in the $t$-th frame image belongs to the $v$-th class of atomic action, and the observed value at the moment corresponding to the $t$-th frame image is $o_t = v$;
d. Classify every frame image in the video by atomic action according to step c to obtain the observation sequence of the video, i.e. the atomic action number sequence $O = (o_0, o_1, \dots, o_{T-1})$;
4) Construction of the hidden Markov models (HMMs) corresponding to the different participant actions
a. According to the specific application scenario, set $K$ classes of participant actions; the set of participant action numbers is $\{0, 1, \dots, K-1\}$. The training data of the $k$-th class of participant action, i.e. its atomic action number sequence, is $O_k = (o_{0,k}, o_{1,k}, \dots, o_{T-1,k})$, $o_{t,k} \in \{0, 1, \dots, V-1\}$, where $o_{0,k}, o_{1,k}, \dots, o_{T-1,k}$ is the observation sequence; the hidden sequence corresponding to the observation sequence is $I_k = (i_{0,k}, i_{1,k}, \dots, i_{T-1,k})$, $i_{t,k} \in Q$, where $Q = (q_0, q_1, \dots, q_{N-1})$ is the hidden state set and $N$ is the number of hidden states;
b. Perform HMM modeling of the participant action corresponding to the $k$-th class of training data $O_k$. The HMM parameters of the $k$-th class of participant action are defined as $\lambda_k^{(r)} = (A_k^{(r)}, B_k^{(r)}, \pi_k^{(r)})$, where $r$ is the iteration number. The state transition matrix is $A_k^{(r)} = \big[a_{nm,k}^{(r)}\big]_{N \times N}$, where $a_{nm,k}^{(r)}$ denotes the probability of being in state $q_n$ at the moment corresponding to the $t$-th frame image and transferring to state $q_m$ at the moment corresponding to the $(t+1)$-th frame image; the observation probability matrix is $B_k^{(r)} = \big[b_{n,k}^{(r)}(v)\big]_{N \times V}$, where $b_{n,k}^{(r)}(v)$ denotes the probability that the observed value is the atomic action number $v$ given that the state at the moment corresponding to the $t$-th frame image is $q_n$; and the initial state probability vector is $\pi_k^{(r)} = \big[\pi_{n,k}^{(r)}\big]_{1 \times N}$, where $\pi_{n,k}^{(r)}$ denotes the probability that the state at the moment corresponding to the 0-th frame image is $q_n$. The symbol $[\cdot]$ with subscripts $N \times N$, $N \times V$ and $1 \times N$ denotes the dimensions of the matrices;
c. Initialize the HMM parameters to $\lambda_k^{(0)} = (A_k^{(0)}, B_k^{(0)}, \pi_k^{(0)})$, and define the maximum number of iterations $R$ and the maximum log-likelihood error $\Delta$;
d. Under the parameters $\lambda_k^{(r)}$, define $\alpha_{t,k}(n)$ as the forward probability of being in state $q_n$ at the moment corresponding to the $t$-th frame image with observation sequence $o_{0,k}, o_{1,k}, \dots, o_{t,k}$. The forward probability at the moment corresponding to the 0-th frame image is expressed as

$$\alpha_{0,k}(n) = \pi_{n,k}^{(r)}\, b_{n,k}^{(r)}(o_{0,k}), \qquad n = 0, 1, \dots, N-1.$$

Setting $t = 0, 1, \dots, T-2$, for all hidden states $q_m$ the forward probability is calculated as follows:

$$\alpha_{t+1,k}(m) = \Big[\sum_{n=0}^{N-1} \alpha_{t,k}(n)\, a_{nm,k}^{(r)}\Big]\, b_{m,k}^{(r)}(o_{t+1,k}).$$

Under the parameters $\lambda_k^{(r)}$, the probability log-likelihood of the observation sequence $O_k$ is expressed as

$$\log P\{O_k \mid \lambda_k^{(r)}\} = \log \sum_{n=0}^{N-1} \alpha_{T-1,k}(n);$$
e. For $r \geq 1$, use the probability log-likelihoods $\log P\{O_k \mid \lambda_k^{(r)}\}$ and $\log P\{O_k \mid \lambda_k^{(r-1)}\}$ obtained at the $r$-th and $(r-1)$-th iterations to calculate the log-likelihood error $\delta_r$:

$$\delta_r = \left|\log P\{O_k \mid \lambda_k^{(r)}\} - \log P\{O_k \mid \lambda_k^{(r-1)}\}\right|,$$

where $P\{O_k \mid \lambda_k^{(r-1)}\}$ denotes the probability of the observation sequence $O_k$ under the parameters $\lambda_k^{(r-1)}$, i.e. the HMM parameters obtained at the previous iteration;
f. definition of betat,k(n) indicates that the t-th frame image is in the state q at the corresponding timenUnder the condition that the observed sequence is ot+1,k,ot+2,k,...,oT-1,kThe posterior probability of (1), the corresponding time of the T-1 frame image, betaT-11, T-2, T-3, 0, for all hidden states
Figure FDA00034226500000000224
The backward probability is expressed as:
Figure FDA00034226500000000225
g. Define the probability $\xi_{t,k}(m,n)$ that, given the observation sequence $O_k$, the state at the moment corresponding to the $t$-th frame image is $q_n$ and the state at the moment corresponding to the $(t+1)$-th frame image is $q_m$; its formula is:

$$\xi_{t,k}(m,n) = \frac{\alpha_{t,k}(n)\, a_{nm,k}^{(r)}\, b_{m,k}^{(r)}(o_{t+1,k})\, \beta_{t+1,k}(m)}{\sum_{n'=0}^{N-1}\sum_{m'=0}^{N-1} \alpha_{t,k}(n')\, a_{n'm',k}^{(r)}\, b_{m',k}^{(r)}(o_{t+1,k})\, \beta_{t+1,k}(m')}.$$

Further, $\gamma_{t,k}(n)$ denotes the probability that the state at the moment corresponding to the $t$-th frame image is $q_n$, expressed as:

$$\gamma_{t,k}(n) = \frac{\alpha_{t,k}(n)\, \beta_{t,k}(n)}{\sum_{n'=0}^{N-1} \alpha_{t,k}(n')\, \beta_{t,k}(n')};$$
h. Calculate the HMM parameters $\lambda_k^{(r+1)} = (A_k^{(r+1)}, B_k^{(r+1)}, \pi_k^{(r+1)})$ using the re-estimation formulas, expressed as:

$$a_{nm,k}^{(r+1)} = \frac{\sum_{t=0}^{T-2} \xi_{t,k}(m,n)}{\sum_{t=0}^{T-2} \gamma_{t,k}(n)}, \qquad b_{n,k}^{(r+1)}(v) = \frac{\sum_{t=0,\; o_{t,k}=v}^{T-1} \gamma_{t,k}(n)}{\sum_{t=0}^{T-1} \gamma_{t,k}(n)}, \qquad \pi_{n,k}^{(r+1)} = \gamma_{0,k}(n);$$

i. Execute steps d to h until the iteration number $r = R-1$ or the log-likelihood error $\delta_r < \Delta$, obtaining the trained HMM parameters $\lambda_k = (A_k, B_k, \pi_k)$;
5) Human body participant action recognition
a. For an atomic action number sequence of unknown class, $O = (o_0, o_1, \dots, o_{T-1})$, calculate the log-likelihood $\log P\{O \mid \lambda_k\}$ of $O$ under the HMM parameters $\lambda_k$ of each participant action according to the method of step d in step 4);
b. Perform participant action identification by calculating the maximum log-likelihood; the identified participant action number is expressed as:

$$k^{\ast} = \arg\max_{k \in \{0, 1, \dots, K-1\}} \log P\{O \mid \lambda_k\},$$

where the right-hand side takes the value of the parameter $k$ that maximizes $\log P\{O \mid \lambda_k\}$; the participant action identification is then complete, and the identification result is the action type corresponding to that participant action number.
CN202111568652.1A 2021-12-21 2021-12-21 Participant action identification method based on skeleton information and space-time characteristics Pending CN114373146A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111568652.1A CN114373146A (en) 2021-12-21 2021-12-21 Participant action identification method based on skeleton information and space-time characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111568652.1A CN114373146A (en) 2021-12-21 2021-12-21 Participant action identification method based on skeleton information and space-time characteristics

Publications (1)

Publication Number Publication Date
CN114373146A true CN114373146A (en) 2022-04-19

Family

ID=81140973

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111568652.1A Pending CN114373146A (en) 2021-12-21 2021-12-21 Participant action identification method based on skeleton information and space-time characteristics

Country Status (1)

Country Link
CN (1) CN114373146A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117747055A (en) * 2024-02-21 2024-03-22 北京万物成理科技有限公司 Training task difficulty determining method and device, electronic equipment and storage medium
CN117747055B (en) * 2024-02-21 2024-05-28 北京万物成理科技有限公司 Training task difficulty determining method and device, electronic equipment and storage medium


Similar Documents

Publication Publication Date Title
CN106897670B (en) Express violence sorting identification method based on computer vision
CN109815826B (en) Method and device for generating face attribute model
CN109410168B (en) Modeling method of convolutional neural network for determining sub-tile classes in an image
CN112784736B (en) Character interaction behavior recognition method based on multi-modal feature fusion
CN108171133B (en) Dynamic gesture recognition method based on characteristic covariance matrix
CN108121950B (en) Large-pose face alignment method and system based on 3D model
CN107169117B (en) Hand-drawn human motion retrieval method based on automatic encoder and DTW
CN113205595B (en) Construction method and application of 3D human body posture estimation model
CN113255457A (en) Animation character facial expression generation method and system based on facial expression recognition
CN112836597A (en) Multi-hand posture key point estimation method based on cascade parallel convolution neural network
CN112329525A (en) Gesture recognition method and device based on space-time diagram convolutional neural network
CN114360067A (en) Dynamic gesture recognition method based on deep learning
CN110827304A (en) Traditional Chinese medicine tongue image positioning method and system based on deep convolutional network and level set method
CN113808047A (en) Human motion capture data denoising method
CN111339888B (en) Double interaction behavior recognition method based on joint point motion diagram
CN115205737B (en) Motion real-time counting method and system based on transducer model
CN114373146A (en) Participant action identification method based on skeleton information and space-time characteristics
CN111178141B (en) LSTM human body behavior identification method based on attention mechanism
CN115187633A (en) Six-degree-of-freedom visual feedback real-time motion tracking method
CN113192186B (en) 3D human body posture estimation model establishing method based on single-frame image and application thereof
Li et al. Few-shot meta-learning on point cloud for semantic segmentation
CN113205545A (en) Behavior recognition analysis method and system under regional environment
Hao et al. Evaluation System of Foreign Language Teaching Quality Based on Spatiotemporal Feature Fusion
Feng et al. An Analysis System of Students' Attendence State in Classroom Based on Human Posture Recognition
CN116469175B (en) Visual interaction method and system for infant education

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination