CN114373146A - Participant action identification method based on skeleton information and space-time characteristics - Google Patents
- Publication number
- CN114373146A (application CN202111568652.1A)
- Authority
- CN
- China
- Prior art keywords
- frame image
- atomic
- sequence
- action
- actions
- Prior art date
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Classifications
- Image Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A participant action identification method based on skeleton information and spatio-temporal characteristics belongs to the field of computer vision and comprises the following steps: acquiring the coordinate sequence of human skeletal joint points in a video-conference monitoring picture; obtaining the spatial feature sequence of human body actions by calculating joint-angle features and joint-point distance features; classifying the atomic actions of single-frame images according to the spatial features, thereby determining the atomic-action number sequence of the multi-frame images in the video; learning the temporal variation characteristics of the atomic actions by constructing hidden Markov models (HMMs) corresponding to different participation actions; and identifying the participation action by calculating the maximum log-likelihood of an unclassified atomic-action number sequence under the HMMs corresponding to the different participation actions. The invention can accurately and efficiently identify the actions of human participants in a video conference.
Description
Technical Field
The invention relates to a participant action identification method based on skeletal information and space-time characteristics, and belongs to the field of computer vision.
Background
With the development of image processing technology, research on video conference systems has changed significantly as new technologies are introduced. To meet the diversified demands of users, image processing technology is adopted to identify the actions of participants in a video conference; this can timely and effectively reflect the conference state of the participants and help the relevant management departments accurately grasp the effect of the meeting, thereby assisting the video conference in achieving automation and intelligence. The invention automatically identifies human body actions in the meeting state from the obtained human skeleton data of the participants, so that management departments can arrange and schedule meetings more effectively, which has practical significance and application value.
When deep learning is used for human action recognition, the data are analyzed by constructing a hierarchical neural network with learning capability; the drawback is that a huge amount of data is needed, otherwise over-fitting may occur during model training. The patent CN 113255616 A, entitled "A video behavior identification method based on deep learning", discloses a method for identifying human behaviors in video. The method comprises the following steps: constructing a video behavior recognition network; a two-dimensional convolutional neural network (ResNet) is used as the backbone of the video behavior recognition network, and a convolutional inter-frame temporal information extraction module is inserted into the backbone; the two-dimensional ResNet extracts the static features of targets in the video; the inter-frame temporal information extraction module optimizes the backbone, extracts inter-frame features using bilinear operations, and fuses the intra-frame and inter-frame information to obtain highly discriminative spatio-temporal features for behavior classification. The method trains and optimizes the parameters of the neural network model with human behavior training samples, thereby realizing human behavior recognition in video. However, the neural network model constructed by this method is very deep, so a large number of training samples is required. Conference actions in a video conference are characterized by few action types and few data samples, so using a deep neural network model may lead to over-fitting during training and therefore a poor action recognition effect.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a participation-action recognition method based on skeleton information and spatio-temporal characteristics, aiming to solve the problem of poor action recognition results caused by over-fitting during model training when the video-conference participation-action data set is small.
The technical scheme adopted by the invention is as follows:
a conference participation action identification method based on skeleton information and space-time characteristics determines a human conference participation action by processing a coordinate sequence of human skeleton joint points in a video conference monitoring picture, and comprises the following steps:
1) skeletal joint point coordinate sequence acquisition
Acquiring the coordinate information of 8 skeletal joint points of the upper half of the human body in a video picture, namely, in order: the nose tip, the neck center, the right shoulder end, the right upper-arm center, the right wrist center, the left shoulder end, the left upper-arm center and the left wrist center. A video comprises T frames of images, and the joint-point coordinate sequence of the video is expressed as [X_0, X_1, ..., X_t, ..., X_{T-1}]; the coordinate vector of the t-th frame image is X_t = (x_{t,0}, x_{t,1}, ..., x_{t,l}, ..., x_{t,15}), l = 0, 1, ..., 15, where (x_{t,0}, x_{t,1}) are the coordinates of the first joint point (nose tip) in the t-th frame image, (x_{t,2}, x_{t,3}) are the coordinates of the second joint point (neck center), and so on, up to (x_{t,14}, x_{t,15}), the coordinates of the eighth joint point (left wrist center);
2) spatial feature sequence computation
a. For the t-th frame image, calculate the horizontal distance F_{1,t,c} and the vertical distance F_{2,t,c} between every two joint points to extract the distance features of atomic actions. Two joint points are arbitrarily selected from the eight joint points; let l_1 and l_2 (l_1 ≠ l_2, l_1, l_2 ∈ {0, 2, ..., 14}) index their horizontal coordinates, so that l_1 + 1 and l_2 + 1 index the corresponding vertical coordinates. The combination ordinal of each unordered pair of joint points is denoted c, giving 28 combinations in total, c = 0, 1, ..., 27. The horizontal and vertical distances between the two joint points in the t-th frame image are then expressed as:

F_{1,t,c} = |x_{t,l_1} - x_{t,l_2}|,  F_{2,t,c} = |x_{t,l_1+1} - x_{t,l_2+1}|;
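As an illustrative sketch (not the patent's own code), step 2a can be implemented as follows. The flat 16-element layout of X_t and the 28 unordered joint pairs follow the definitions above; the function name `distance_features` is my own.

```python
import itertools


def distance_features(X_t):
    """Return the 28 horizontal and 28 vertical joint-pair distances.

    X_t is the flat coordinate vector (x_{t,0}, ..., x_{t,15}); joint j
    has horizontal coordinate X_t[2*j] and vertical coordinate X_t[2*j+1].
    """
    F1, F2 = [], []
    for j1, j2 in itertools.combinations(range(8), 2):  # 28 pairs, c = 0..27
        F1.append(abs(X_t[2 * j1] - X_t[2 * j2]))          # horizontal distance
        F2.append(abs(X_t[2 * j1 + 1] - X_t[2 * j2 + 1]))  # vertical distance
    return F1, F2
```

The pair ordinal c is simply the position of (j1, j2) in lexicographic order, matching the 28 combinations counted in the text.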
b. For the t-th frame image, calculate the joint angles F_{3,t,d} (d = 0, 1, ..., 4) related to the participation action to extract the angle features of atomic actions. The five joint angles are: the angle between the lines from the nose tip to the left shoulder end and to the right shoulder end; the angle between the line from the neck center to the nose tip and the line from the neck center to the left shoulder end; the angle between the line from the neck center to the nose tip and the line from the neck center to the right shoulder end; the angle between the left upper arm and the left forearm; and the angle between the right upper arm and the right forearm. Taking the angle between the lines from the nose tip to the two shoulder ends as an example, it is calculated as:

F_{3,t,0} = arccos( (u · v) / (|u| |v|) ), with u = (x_{t,4} - x_{t,0}, x_{t,5} - x_{t,1}) and v = (x_{t,10} - x_{t,0}, x_{t,11} - x_{t,1});
where x_{t,0}, x_{t,1}, x_{t,4}, x_{t,5} and x_{t,10}, x_{t,11} in the above formula are the coordinate values of the nose tip, the right shoulder end and the left shoulder end, respectively;
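The angle features of step 2b reduce to a standard vector-angle computation. A minimal sketch, assuming the joint indexing given in step 1 (nose at indices 0 and 1, right shoulder at 4 and 5, left shoulder at 10 and 11); the helper name `joint_angle` is hypothetical.

```python
import math


def joint_angle(ax, ay, bx, by, cx, cy):
    """Angle (radians) at vertex A = (ax, ay) between the rays A->B and A->C."""
    ux, uy = bx - ax, by - ay
    vx, vy = cx - ax, cy - ay
    cos_t = (ux * vx + uy * vy) / (math.hypot(ux, uy) * math.hypot(vx, vy))
    return math.acos(max(-1.0, min(1.0, cos_t)))  # clamp against rounding error
```

For example, F_{3,t,0} would be `joint_angle(X_t[0], X_t[1], X_t[4], X_t[5], X_t[10], X_t[11])`; the other four angles use the corresponding joint triples.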
c. Extract the distance features and angle features of each frame image in the video according to steps a and b to obtain the spatial feature sequence of the participation action, expressed as F = [F_0, F_1, ..., F_{T-1}], where F_t = (F_{1,t,0}, ..., F_{1,t,27}, F_{2,t,0}, ..., F_{2,t,27}, F_{3,t,0}, ..., F_{3,t,4});
3) Atomic action number sequence acquisition
a. According to the specific application scenario, V types of atomic actions are set and sorted in priority according to their degree of attention, giving the atomic-action number set {0, 1, ..., V-1};
b. Since the spatial features differ significantly between different atomic actions, the judgment criterion of the v-th type of atomic action is formulated by finding the most representative ranges of joint angles and joint-point distances for each atomic action: define U_{1,v,c}, L_{1,v,c}, U_{2,v,c}, L_{2,v,c} as the upper and lower limits of the value range of the c-th horizontal and vertical joint-distance features of the v-th type of atomic action, and U_{3,v,d}, L_{3,v,d} as the upper and lower limits of the value range of the d-th joint-angle feature of the v-th type of atomic action;
c. Classify the human body action in the t-th frame image according to the atomic-action priority: if the spatial features of the t-th frame satisfy L_{1,v,c} < F_{1,t,c} < U_{1,v,c}, L_{2,v,c} < F_{2,t,c} < U_{2,v,c} and L_{3,v,d} < F_{3,t,d} < U_{3,v,d}, the action type in the t-th frame image belongs to the v-th type of atomic action, and the observed value at the moment corresponding to the t-th frame image is o_t = v;
d. Classify every frame image in the video by atomic action according to step c to obtain the observation sequence of the video, i.e. the atomic-action number sequence O = (o_0, o_1, ..., o_{T-1});
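Steps a to d of section 3 amount to a prioritized interval check per frame. The following sketch illustrates this; the `rules` structure and its thresholds are illustrative only, since the patent's actual U/L bounds are application-specific and not given here.

```python
def classify_frame(features, rules):
    """Return the atomic-action number for one frame's feature vector.

    rules: list of (action_id, {feature_index: (lower, upper)}) entries,
    sorted by action priority; the first action whose every selected
    feature lies strictly inside its (lower, upper) interval wins.
    """
    for action_id, bounds in rules:
        if all(lo < features[i] < up for i, (lo, up) in bounds.items()):
            return action_id
    return len(rules)  # fallback "other" class (an assumption, not in the patent)


def observation_sequence(feature_seq, rules):
    """Map the spatial feature sequence to the observation sequence O."""
    return [classify_frame(f, rules) for f in feature_seq]
```

Checking actions in priority order resolves overlapping intervals deterministically, which is what the priority sorting in step a is for.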
4) HMM construction of hidden Markov models corresponding to different participant actions
a. According to the specific application scenario, K types of participation actions are set, and the participation-action number set is expressed as {0, 1, ..., K-1}. The training data of the k-th type of participation action, i.e. its atomic-action number sequence, is O_k = (o_{0,k}, o_{1,k}, ..., o_{T-1,k}), which is the observation sequence; the hidden-state sequence corresponding to the observation sequence is I_k = (i_{0,k}, i_{1,k}, ..., i_{T-1,k}), i_{t,k} ∈ Q, where Q = (q_0, q_1, ..., q_{N-1}) is the hidden-state set and N is the number of hidden states;
b. Perform HMM modeling of the participation action corresponding to the training data O_k of the k-th class. The HMM parameters of the k-th class are defined as λ_k^(r) = (A_k^(r), B_k^(r), π_k^(r)), where r is the iteration number. The state-transition matrix is A_k^(r) = [a_{n,m,k}^(r)]_{N×N}, where a_{n,m,k}^(r) is the probability that the state is q_n at the moment of the t-th frame image and transfers to q_m at the moment of the (t+1)-th frame image. The observation probability matrix is B_k^(r) = [b_{n,k}^(r)(v)]_{N×V}, where b_{n,k}^(r)(v) is the probability that the observed value is the atomic-action number v given state q_n at the moment of the t-th frame image. The initial state probability vector is π_k^(r) = [π_{n,k}^(r)]_{1×N}, where π_{n,k}^(r) is the probability that the state at the moment of the 0-th frame image is q_n. The subscripts N×N, N×V and 1×N of the symbol [·] denote the dimensions of the matrices;
c. Initialize the HMM parameters to λ_k^(0) = (A_k^(0), B_k^(0), π_k^(0)), and define the maximum number of iterations R and the maximum log-likelihood error δ;
d. Under the parameters λ_k^(r), define α_{t,k}(n) as the forward probability that, at the moment of the t-th frame image, the state is q_n and the observation sequence is o_{0,k}, o_{1,k}, ..., o_{t,k}. The forward probability at the moment of the 0-th frame image is α_{0,k}(n) = π_{n,k}^(r) b_{n,k}^(r)(o_{0,k}). For t = 0, 1, ..., T-2 and all hidden states q_m ∈ Q, the forward probability is calculated recursively as:

α_{t+1,k}(m) = [ Σ_{n=0}^{N-1} α_{t,k}(n) a_{n,m,k}^(r) ] b_{m,k}^(r)(o_{t+1,k});
Under the parameters λ_k^(r), the probability log-likelihood of the observation sequence O_k is expressed as log P{O_k | λ_k^(r)} = log Σ_{n=0}^{N-1} α_{T-1,k}(n);
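The forward recursion and log-likelihood of step d can be sketched with NumPy as below; the matrix shapes follow the definitions in step b (A: N×N, B: N×V, π: length N). As an assumption of this sketch, no scaling is applied, so it is only suitable for short sequences.

```python
import numpy as np


def forward_log_likelihood(O, A, B, pi):
    """Forward algorithm for a discrete HMM: returns log P(O | lambda)."""
    A, B, pi = map(np.asarray, (A, B, pi))
    alpha = pi * B[:, O[0]]              # alpha_0(n) = pi_n * b_n(o_0)
    for o in O[1:]:
        # alpha_{t+1}(m) = [sum_n alpha_t(n) * a_{n,m}] * b_m(o_{t+1})
        alpha = (alpha @ A) * B[:, o]
    return float(np.log(alpha.sum()))    # log P(O | lambda)
```

For long T, the same recursion is normally run with per-step normalization (or in log space) to avoid underflow.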
e. For r ≥ 1, use the probability log-likelihoods log P{O_k | λ_k^(r)} and log P{O_k | λ_k^(r-1)} obtained in the r-th and (r-1)-th iterations to calculate the log-likelihood error Δ_r:

Δ_r = | log P{O_k | λ_k^(r)} - log P{O_k | λ_k^(r-1)} |;
The symbol P{O_k | λ_k^(r)} denotes the probability of the observation sequence O_k under the parameters λ_k^(r), and λ_k^(r-1) denotes the HMM parameters obtained in the previous iteration;
f. Define β_{t,k}(n) as the backward probability that, given state q_n at the moment of the t-th frame image, the subsequent observation sequence is o_{t+1,k}, o_{t+2,k}, ..., o_{T-1,k}. At the moment of the (T-1)-th frame image, β_{T-1,k}(n) = 1. For t = T-2, T-3, ..., 0 and all hidden states q_n ∈ Q, the backward probability is expressed as:

β_{t,k}(n) = Σ_{m=0}^{N-1} a_{n,m,k}^(r) b_{m,k}^(r)(o_{t+1,k}) β_{t+1,k}(m);
g. Define the probability ξ_{t,k}(n, m) that, given the observation sequence O_k, the state is q_n at the moment of the t-th frame image and transfers to q_m at the moment of the (t+1)-th frame image; the formula is:

ξ_{t,k}(n, m) = α_{t,k}(n) a_{n,m,k}^(r) b_{m,k}^(r)(o_{t+1,k}) β_{t+1,k}(m) / P{O_k | λ_k^(r)};
And γ_{t,k}(n) denotes the probability that the state at the moment of the t-th frame image is q_n, expressed as γ_{t,k}(n) = α_{t,k}(n) β_{t,k}(n) / P{O_k | λ_k^(r)};

h. Update the HMM parameters with the Baum-Welch re-estimation formulas:

a_{n,m,k}^(r+1) = Σ_{t=0}^{T-2} ξ_{t,k}(n, m) / Σ_{t=0}^{T-2} γ_{t,k}(n),
b_{n,k}^(r+1)(v) = Σ_{t: o_{t,k}=v} γ_{t,k}(n) / Σ_{t=0}^{T-1} γ_{t,k}(n),
π_{n,k}^(r+1) = γ_{0,k}(n);
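Steps f and g above (the backward pass and the ξ and γ probabilities), together with the standard Baum-Welch re-estimation, can be sketched as one unscaled training iteration. The function name and the absence of scaling are assumptions of this sketch; shapes are as in step b (A: N×N, B: N×V, π: length N).

```python
import numpy as np


def baum_welch_step(O, A, B, pi):
    """One unscaled Baum-Welch iteration; returns (A_new, B_new, pi_new)."""
    A, B, pi = map(np.asarray, (A, B, pi))
    T, N = len(O), A.shape[0]
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))                            # beta_{T-1}(n) = 1
    alpha[0] = pi * B[:, O[0]]
    for t in range(1, T):                             # forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    for t in range(T - 2, -1, -1):                    # backward pass
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    P = alpha[-1].sum()                               # P(O | lambda)
    gamma = alpha * beta / P                          # gamma_t(n)
    # xi_t(n, m) = alpha_t(n) * a_{n,m} * b_m(o_{t+1}) * beta_{t+1}(m) / P
    xi = np.array([np.outer(alpha[t], B[:, O[t + 1]] * beta[t + 1]) * A / P
                   for t in range(T - 1)])
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B, dtype=float)
    for v in range(B.shape[1]):
        B_new[:, v] = gamma[np.array(O) == v].sum(axis=0) / gamma.sum(axis=0)
    return A_new, B_new, gamma[0]
```

Iterating this step until the log-likelihood change falls below δ (or the iteration cap R is reached) yields the trained parameters used in step i.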
i. Execute steps d to h until the iteration number r = R - 1 or the log-likelihood error Δ_r < δ, obtaining the trained HMM parameters λ_k = (A_k, B_k, π_k);
5) Human body participant action recognition
a. For an atomic-action number sequence O = (o_0, o_1, ..., o_{T-1}) of unknown class, calculate the log-likelihood log P{O | λ_k} of O under the HMM parameters λ_k of each participation action according to the method of step d in step 4);
b. Perform participation-action identification by calculating the maximum log-likelihood; the participation-action number is expressed as:

k* = argmax_{k ∈ {0, 1, ..., K-1}} log P{O | λ_k},

where the right-hand side denotes the value of the parameter k at which log P{O | λ_k} is maximized. The identification result is the action type corresponding to the participation-action number k*.
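The recognition rule of step 5) is an argmax over per-class log-likelihoods. A minimal sketch, where `models` is a hypothetical mapping from participation-action number to trained (A, B, π) parameters and no scaling is applied in the forward pass.

```python
import numpy as np


def log_likelihood(O, A, B, pi):
    """Unscaled forward algorithm: log P(O | lambda)."""
    A, B, pi = map(np.asarray, (A, B, pi))
    alpha = pi * B[:, O[0]]
    for o in O[1:]:
        alpha = (alpha @ A) * B[:, o]
    return float(np.log(alpha.sum()))


def recognize(O, models):
    """Return the class k maximizing log P(O | lambda_k)."""
    return max(models, key=lambda k: log_likelihood(O, *models[k]))
```

Because the argmax compares log-likelihoods of the same sequence, any monotone rescaling common to all classes would not change the result.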
The beneficial effects of the invention are as follows: the method identifies participation actions in a video conference by combining human skeleton information with the spatio-temporal information of the video data. By calculating the spatial feature sequence of the human body action and classifying the atomic action of each single-frame image, the atomic-action number sequence of the multi-frame images in the video is determined; the spatial features in the video images are thus fully utilized and the identification effect is improved. Furthermore, because an HMM can be trained on a small amount of data, HMMs are adopted to model the different participation actions; modeling the temporal process with an HMM makes better use of the temporal features in the video and achieves a better participation-action recognition effect.
Drawings
FIG. 1 is a schematic flow chart of the present invention.
Detailed Description
The invention is further described below with reference to the figures and an example, without being limited thereto.
Example:
A conference-participation action identification method based on skeleton information and spatio-temporal characteristics determines a participant's action by processing the coordinate sequence of human skeletal joint points in a video-conference monitoring picture, as shown in FIG. 1. The embodiment performs steps 1) to 5) exactly as set out in the Disclosure above; the procedure is not repeated here.
Claims (1)
1. A conference participation action identification method based on skeleton information and space-time characteristics determines a human conference participation action by processing a coordinate sequence of human skeleton joint points in a video conference monitoring picture, and comprises the following steps:
1) skeletal joint point coordinate sequence acquisition
Acquiring the coordinate information of 8 skeletal joint points of the upper half of the human body in the video picture, namely, in order: the nose tip, the neck center, the right shoulder end, the right upper-arm center, the right wrist center, the left shoulder end, the left upper-arm center and the left wrist center; a video comprises T frames of images, and the joint-point coordinate sequence of the video is expressed as [X_0, X_1, ..., X_t, ..., X_{T-1}]; the coordinate vector of the t-th frame image is X_t = (x_{t,0}, x_{t,1}, ..., x_{t,l}, ..., x_{t,15}), l = 0, 1, ..., 15, where (x_{t,0}, x_{t,1}) are the coordinates of the first joint point (nose tip) in the t-th frame image, (x_{t,2}, x_{t,3}) are the coordinates of the second joint point (neck center), and so on, up to (x_{t,14}, x_{t,15}), the coordinates of the eighth joint point (left wrist center);
2) spatial feature sequence computation
a. For the t-th frame image, the horizontal distance F_{1,t,c} and the vertical distance F_{2,t,c} between every two joint points are calculated to extract the distance features of atomic actions; two joint points are arbitrarily selected from the eight joint points, and l_1 and l_2 (l_1 ≠ l_2, l_1, l_2 ∈ {0, 2, ..., 14}) index their horizontal coordinates, so that l_1 + 1 and l_2 + 1 index the corresponding vertical coordinates; the combination ordinal of each unordered pair of joint points is denoted c, giving 28 combinations in total, c = 0, 1, ..., 27; the horizontal and vertical distances between the two joint points in the t-th frame image are then expressed as F_{1,t,c} = |x_{t,l_1} - x_{t,l_2}| and F_{2,t,c} = |x_{t,l_1+1} - x_{t,l_2+1}|;
b. For the t-th frame image, the joint angles F_{3,t,d} (d = 0, 1, ..., 4) related to the participation action are calculated to extract the angle features of atomic actions; the five joint angles are: the angle between the lines from the nose tip to the left shoulder end and to the right shoulder end; the angle between the line from the neck center to the nose tip and the line from the neck center to the left shoulder end; the angle between the line from the neck center to the nose tip and the line from the neck center to the right shoulder end; the angle between the left upper arm and the left forearm; and the angle between the right upper arm and the right forearm;
c. The distance features and angle features of each frame image in the video are extracted according to steps a and b to obtain the spatial feature sequence of the participation action, expressed as F = [F_0, F_1, ..., F_{T-1}], where F_t = (F_{1,t,0}, ..., F_{1,t,27}, F_{2,t,0}, ..., F_{2,t,27}, F_{3,t,0}, ..., F_{3,t,4});
3) Atomic action number sequence acquisition
a. According to the specific application scenario, V types of atomic actions are set and sorted in priority according to their degree of attention, giving the atomic-action number set {0, 1, ..., V-1};
b. Since the spatial features differ significantly between different atomic actions, the judgment criterion of the v-th type of atomic action is formulated by finding the most representative ranges of joint angles and joint-point distances for each atomic action: U_{1,v,c}, L_{1,v,c}, U_{2,v,c}, L_{2,v,c} are defined as the upper and lower limits of the value range of the c-th horizontal and vertical joint-distance features of the v-th type of atomic action, and U_{3,v,d}, L_{3,v,d} as the upper and lower limits of the value range of the d-th joint-angle feature of the v-th type of atomic action;
c. The human body action in the t-th frame image is classified according to the atomic-action priority: if the spatial features of the t-th frame satisfy L_{1,v,c} < F_{1,t,c} < U_{1,v,c}, L_{2,v,c} < F_{2,t,c} < U_{2,v,c} and L_{3,v,d} < F_{3,t,d} < U_{3,v,d}, the action type in the t-th frame image belongs to the v-th type of atomic action, and the observed value at the moment corresponding to the t-th frame image is o_t = v;
d. Every frame image in the video is classified by atomic action according to step c to obtain the observation sequence of the video, i.e. the atomic-action number sequence O = (o_0, o_1, ..., o_{T-1});
4) HMM construction of hidden Markov models corresponding to different participant actions
a. According to the specific application scene, if K-type participation actions are set, the participation action number set is expressed asThe training data of the kth-class participant action, i.e. the atomic action number sequence, is Wherein o is0,k,o1,k,...,oT-1,kIs an observation sequence; the hidden sequence corresponding to the observed sequence is Ik=(i0,k,i1,k,...,iT-1,k),it,k∈Q,Q=(q0,q1,...,qN-1) Is a hidden state set, and N is the number of hidden states;
b. training data O according to kth-class participationkPerforming HMM modeling on the corresponding participant action; the HMM parameters defining the kth-class conferencing actions arer is the number of iterations in which the state transition matrix Indicating that the image is in the state q at the corresponding time of the t-th frame imagenThe t +1 th frame image is shifted to the state q corresponding to the time of the frame imagemAnd the observed probability matrix is represented as Indicating that the image is in the state q at the corresponding time of the t-th frame imagenUnder the conditions of (1), seeThe measured value is the probability of the atomic motion number v, and the initial state probability vector is The state representing the corresponding time of the 0 th frame image is qnThe probability of (d); symbol [ 2 ]]The subscripts N × N, N × V and 1 × N represent the dimensions of the matrix;
c. initializing HMM parameters toDefining maximum iteration times R and maximum log-likelihood value error delta;
d. in the parameterUnder the conditions of (1), define αt,k(n) represents that the corresponding moment of the t frame image is in the state qnThe observation sequence is o0,k,o1,k,...,ot,kForward probability of (d); the forward probability at the corresponding time of the 0 th frame image is expressed asSet T0, 1.. times, T-2 for all hidden statesThe forward probability is calculated as follows:
and when the parameter isUnder the condition that the observed sequence is OkTable of probability log-likelihood valuesShown as
e. For r ≥ 1, using the probability log-likelihood values log P(O_k | λ_k^(r)) and log P(O_k | λ_k^(r-1)) obtained at the r-th and (r-1)-th iterations, calculate the log-likelihood error δ_r as: δ_r = |log P(O_k | λ_k^(r)) − log P(O_k | λ_k^(r-1))|. The symbol P(O_k | λ_k^(r)) denotes the probability that the observation sequence is O_k under the parameter λ_k^(r), and λ_k^(r-1) denotes the HMM parameters obtained by the previous iteration;
f. Define β_{t,k}(n) as the backward probability that, under the condition that the state at the time of the t-th frame image is q_n, the subsequent observation sequence is o_{t+1,k}, o_{t+2,k}, ..., o_{T-1,k}. At the time of the (T-1)-th frame image, β_{T-1,k}(n) = 1. For t = T-2, T-3, ..., 0 and all hidden states q_n ∈ Q, the backward probability is expressed as: β_{t,k}(n) = Σ_{m=0}^{N-1} a_{nm,k}^(r) · b_{m,k}^(r)(o_{t+1,k}) · β_{t+1,k}(m);
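The backward recursion of step f mirrors the forward one but runs from the last frame toward the first. A minimal NumPy sketch (the forward pass is repeated here only so the consistency check below is self-contained):

```python
import numpy as np

def forward(obs, A, B, pi):
    """Forward probabilities alpha[t, n], as in step d."""
    alpha = np.zeros((len(obs), len(pi)))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(len(obs) - 1):
        alpha[t + 1] = (alpha[t] @ A) * B[:, obs[t + 1]]
    return alpha

def backward(obs, A, B):
    """beta[t, n] = P(o_{t+1}, ..., o_{T-1} | state at frame t is q_n, lambda)."""
    T, N = len(obs), A.shape[0]
    beta = np.ones((T, N))                  # beta[T-1, n] = 1
    for t in range(T - 2, -1, -1):          # t = T-2, T-3, ..., 0
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta
```

A standard consistency check is that Σ_n π_n · b_n(o_0) · β_0(n) equals Σ_n α_{T-1}(n) = P(O | λ), i.e. the forward and backward recursions yield the same total likelihood.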
g. Define the probability ξ_{t,k}(m, n) as the probability that, given the observation sequence O_k and the parameter λ_k^(r), the state at the time of the t-th frame image is q_n and transitions to q_m at the time of the (t+1)-th frame image. The formula is: ξ_{t,k}(m, n) = α_{t,k}(n) · a_{nm,k}^(r) · b_{m,k}^(r)(o_{t+1,k}) · β_{t+1,k}(m) / [Σ_{n=0}^{N-1} Σ_{m=0}^{N-1} α_{t,k}(n) · a_{nm,k}^(r) · b_{m,k}^(r)(o_{t+1,k}) · β_{t+1,k}(m)];
h. Define γ_{t,k}(n) as the probability that, given the observation sequence O_k and the parameter λ_k^(r), the state at the time of the t-th frame image is q_n; it is expressed as: γ_{t,k}(n) = α_{t,k}(n) · β_{t,k}(n) / [Σ_{n=0}^{N-1} α_{t,k}(n) · β_{t,k}(n)];
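Steps g and h can be computed directly from the forward and backward probabilities; note that the common denominator in both formulas equals P(O | λ). A self-contained NumPy sketch (repeating the forward and backward helpers from the earlier steps):

```python
import numpy as np

def forward(obs, A, B, pi):
    alpha = np.zeros((len(obs), len(pi)))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(len(obs) - 1):
        alpha[t + 1] = (alpha[t] @ A) * B[:, obs[t + 1]]
    return alpha

def backward(obs, A, B):
    T, N = len(obs), A.shape[0]
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

def xi_gamma(obs, A, B, pi):
    """xi[t, n, m]: state q_n at frame t and q_m at frame t+1; gamma[t, n]: q_n at frame t."""
    alpha, beta = forward(obs, A, B, pi), backward(obs, A, B)
    likelihood = alpha[-1].sum()            # P(O | lambda), the common denominator
    T, N = alpha.shape
    xi = np.empty((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
    xi /= likelihood
    gamma = alpha * beta / likelihood
    return xi, gamma
```

As a check, γ_t is a probability distribution over states at each frame, ξ_t is a joint distribution over state pairs, and marginalising ξ_t over the next state recovers γ_t.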
i. Re-estimate the parameters from ξ_{t,k} and γ_{t,k} (the standard Baum-Welch update): a_{nm,k}^(r+1) = Σ_{t=0}^{T-2} ξ_{t,k}(m, n) / Σ_{t=0}^{T-2} γ_{t,k}(n); b_{n,k}^(r+1)(v) = Σ_{t: o_{t,k}=v} γ_{t,k}(n) / Σ_{t=0}^{T-1} γ_{t,k}(n); π_{n,k}^(r+1) = γ_{0,k}(n). Execute steps d to h and this re-estimation until the iteration number reaches r = R-1 or the log-likelihood error satisfies δ_r < δ, obtaining the trained HMM parameters λ_k = (A_k, B_k, π_k).
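Putting steps d through i together, a compact Baum-Welch training loop might look like the following. This is an unscaled sketch suitable only for short sequences; a production version would normalise α and β (or work in log space) to avoid numerical underflow for long videos:

```python
import numpy as np

def baum_welch(obs, A, B, pi, R=100, delta=1e-6):
    """Train (A, B, pi) on one observation sequence; returns parameters and final log-likelihood."""
    obs = np.asarray(obs)
    T, (N, V) = len(obs), B.shape
    prev_ll = -np.inf
    for r in range(R):
        # Steps d and f: forward and backward probabilities.
        alpha = np.zeros((T, N))
        alpha[0] = pi * B[:, obs[0]]
        for t in range(T - 1):
            alpha[t + 1] = (alpha[t] @ A) * B[:, obs[t + 1]]
        beta = np.ones((T, N))
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        L = alpha[-1].sum()                               # P(O | lambda^(r))
        # Steps g and h: xi and gamma.
        gamma = alpha * beta / L
        xi = np.array([alpha[t][:, None] * A
                       * (B[:, obs[t + 1]] * beta[t + 1])[None, :] / L
                       for t in range(T - 1)])
        # Step i: re-estimation (standard Baum-Welch update).
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        B = np.stack([gamma[obs == v].sum(axis=0) for v in range(V)], axis=1)
        B /= gamma.sum(axis=0)[:, None]
        pi = gamma[0]
        # Step e: convergence test on the log-likelihood error delta_r.
        ll = np.log(L)
        if abs(ll - prev_ll) < delta:
            break
        prev_ll = ll
    return A, B, pi, ll
```

The re-estimated A, B and π remain row-stochastic by construction, and the log-likelihood is non-decreasing over iterations, which is what makes the δ_r stopping criterion well-defined.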
5) Human body participant action recognition
a. For an atomic-action number sequence of unknown class, O = (o_0, o_1, ..., o_{T-1}), calculate the log-likelihood log P(O | λ_k) of O under the trained HMM parameters λ_k of each participant-action class, according to the method of step d in step 4);
b. Perform participant-action recognition by taking the maximum log-likelihood; the recognized participant-action number is expressed as: k* = argmax_{k ∈ {0, 1, ..., K-1}} log P(O | λ_k), where the right-hand side denotes the value of the parameter k at which log P(O | λ_k) attains its maximum. The participant-action recognition is then complete, and the recognition result is the action class corresponding to the participant-action number k*.
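Step 5) thus reduces to scoring the unknown sequence under each trained model and taking the argmax. A minimal sketch (the two toy models below are illustrative placeholders, not trained parameters):

```python
import numpy as np

def forward_log_likelihood(obs, A, B, pi):
    """log P(O | lambda) via the forward recursion of step d."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return float(np.log(alpha.sum()))

def recognise(obs, models):
    """models: {action_number: (A, B, pi)}; returns (argmax action number, all scores)."""
    scores = {k: forward_log_likelihood(obs, *params) for k, params in models.items()}
    return max(scores, key=scores.get), scores
```

For example, given one model whose emission matrix favours atomic action 0 and another favouring atomic action 1, a sequence consisting only of atomic action 0 is assigned to the first model.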
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111568652.1A CN114373146A (en) | 2021-12-21 | 2021-12-21 | Participant action identification method based on skeleton information and space-time characteristics |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114373146A true CN114373146A (en) | 2022-04-19 |
Family
ID=81140973
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111568652.1A Pending CN114373146A (en) | 2021-12-21 | 2021-12-21 | Participant action identification method based on skeleton information and space-time characteristics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114373146A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117747055A (en) * | 2024-02-21 | 2024-03-22 | 北京万物成理科技有限公司 | Training task difficulty determining method and device, electronic equipment and storage medium |
CN117747055B (en) * | 2024-02-21 | 2024-05-28 | 北京万物成理科技有限公司 | Training task difficulty determining method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106897670B (en) | Express violence sorting identification method based on computer vision | |
CN109815826B (en) | Method and device for generating face attribute model | |
CN109410168B (en) | Modeling method of convolutional neural network for determining sub-tile classes in an image | |
CN112784736B (en) | Character interaction behavior recognition method based on multi-modal feature fusion | |
CN108171133B (en) | Dynamic gesture recognition method based on characteristic covariance matrix | |
CN108121950B (en) | Large-pose face alignment method and system based on 3D model | |
CN107169117B (en) | Hand-drawn human motion retrieval method based on automatic encoder and DTW | |
CN113205595B (en) | Construction method and application of 3D human body posture estimation model | |
CN113255457A (en) | Animation character facial expression generation method and system based on facial expression recognition | |
CN112836597A (en) | Multi-hand posture key point estimation method based on cascade parallel convolution neural network | |
CN112329525A (en) | Gesture recognition method and device based on space-time diagram convolutional neural network | |
CN114360067A (en) | Dynamic gesture recognition method based on deep learning | |
CN110827304A (en) | Traditional Chinese medicine tongue image positioning method and system based on deep convolutional network and level set method | |
CN113808047A (en) | Human motion capture data denoising method | |
CN111339888B (en) | Double interaction behavior recognition method based on joint point motion diagram | |
CN115205737B (en) | Motion real-time counting method and system based on transducer model | |
CN114373146A (en) | Participant action identification method based on skeleton information and space-time characteristics | |
CN111178141B (en) | LSTM human body behavior identification method based on attention mechanism | |
CN115187633A (en) | Six-degree-of-freedom visual feedback real-time motion tracking method | |
CN113192186B (en) | 3D human body posture estimation model establishing method based on single-frame image and application thereof | |
Li et al. | Few-shot meta-learning on point cloud for semantic segmentation | |
CN113205545A (en) | Behavior recognition analysis method and system under regional environment | |
Hao et al. | Evaluation System of Foreign Language Teaching Quality Based on Spatiotemporal Feature Fusion | |
Feng et al. | An Analysis System of Students' Attendence State in Classroom Based on Human Posture Recognition | |
CN116469175B (en) | Visual interaction method and system for infant education |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||