CN112329562A - Human body interaction action recognition method based on skeleton features and slice recurrent neural network - Google Patents

Human body interaction action recognition method based on skeleton features and slice recurrent neural network

Info

Publication number
CN112329562A
Authority
CN
China
Prior art keywords
slice
layer
connection
neural network
recurrent neural
Prior art date
Legal status
Granted
Application number
CN202011146588.3A
Other languages
Chinese (zh)
Other versions
CN112329562B (en)
Inventor
成科扬
吴金霞
毛启容
Current Assignee
Jiangsu University
Original Assignee
Jiangsu University
Priority date
Filing date
Publication date
Application filed by Jiangsu University
Priority to CN202011146588.3A
Publication of CN112329562A
Application granted
Publication of CN112329562B
Status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20: Movements or behaviour, e.g. gesture recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a human body interaction action recognition method based on skeleton features and a slice recurrent neural network. For each action, OpenPose is used to acquire the skeleton sequence, action features are obtained from the skeletons, interaction connections between the skeletons are designed, and the additional interaction information thus captured increases the accuracy of action recognition. A new skeleton graph is constructed from these connections, approximated by a fast high-order Chebyshev polynomial expansion of the spectral graph convolution, and action features are extracted from it. To strengthen the extraction of temporal information, a slice recurrent neural network is innovatively applied to video action recognition to capture the dependency information of the whole action sequence, while the high-level feature map from the spatio-temporal modeling appropriately compensates for the long-term dependency loss caused by the slicing. The invention improves the accuracy of interaction recognition, has good applicability, and the slice recurrent neural network increases the speed of feature extraction over long sequences.

Description

Human body interaction action recognition method based on skeleton features and slice recurrent neural network
Technical Field
The invention relates to the technical fields of computer vision and pattern recognition, and in particular to a human body interaction action recognition method based on skeleton features and a slice recurrent neural network.
Background
Video-based interactive behavior recognition has high practical value and broad application prospects. The purpose of human action recognition is to analyze and understand the actions of, and interactions between, people in a video. Although action recognition based on RGB video or optical flow achieves high performance, it is susceptible to changes in background, illumination and appearance, and extracting optical flow information is computationally expensive. As a result, more and more research now targets skeleton data: the human skeleton expresses the motion of the body well and facilitates its analysis.
At present, the relatively mature research addresses single-person skeleton action recognition, and interactions remain little discussed. Compared with single-person actions, interactions are more complex: more types of limb movements occur in the course of completing an interaction, and the variations between the limbs are more diverse. How to effectively characterize an interaction and how to model and analyze interaction events is a very challenging problem.
When processing a video sequence, a recurrent neural network model is generally used to completely capture the temporal information of the whole action sequence and its dependency information. However, in a traditional recurrent neural network the information of the current node depends only on the previous node, so it can model only short-term dynamics and cannot retain long sequences; at the same time, the standard recurrent structure cannot be computed in parallel like a CNN model, so its computation is relatively slow. The invention therefore proposes a sliced recurrent neural network model to solve these problems.
Disclosure of Invention
In order to solve the problems of incomplete extraction of interaction information and missing inter-frame dependency information in action recognition, the invention provides a skeleton-based interaction spatio-temporal modeling method built on single-person skeleton graph convolution: it designs interaction connections between the skeletons and increases the accuracy of action recognition by capturing this additional interaction information. Meanwhile, by combining graph convolution with the slice recurrent neural network, the dependencies between nodes and between frames are better extracted, so that interactive behavior features are extracted accurately and the interactive behavior is recognized.
The technical scheme adopted by the invention is as follows. The invention provides an interaction recognition method based on skeleton features and a slice recurrent neural network, comprising the following steps:
(1) Based on the video frames, extract the skeleton of the action and design different connections between the joint points, so as to extract the interaction information between different nodes.
(2) Construct a new skeleton graph and apply spectral graph convolution to the spatio-temporal skeleton graph to obtain a high-level feature map.
(3) Adopt a slice recurrent neural network model to acquire time-sequence dependency information, its running speed improved through parallel computation, and classify the actions in the video according to the features extracted by the slice recurrent neural network.
Further, in step (1), the connections are divided into single-person connections, interactive connections and inter-frame connections, constructed as follows:
(1-1) Intra-frame connections comprise two parts: single-person self-connections and interactive connections. In the interactive part, connections between points that tend to undergo similar joint changes are called corresponding connections, denoted $\varepsilon_1$; for example, when the actions of the two participants are basically consistent, connections are established between their corresponding joint points, and these corresponding edges play an important role in such cases. Connections occurring between other joint points are called extrinsic connections, denoted $\varepsilon_2$. A weight $\theta$ is assigned to the edges in $\varepsilon_1$ and a weight $\delta$ to the edges in $\varepsilon_2$, i.e.:
$$w_{i,j} = \begin{cases} \theta, & (i,j) \in \varepsilon_1 \\ \delta, & (i,j) \in \varepsilon_2 \end{cases}$$
where $i$ and $j$ denote the joint points of different persons and $w_{i,j}$ denotes the weight of edge $(i,j)$. To determine the node connections in the intra-frame interactive modeling, the relevance between interactive nodes is measured by the Euclidean distance, and the values between all pairs of points are calculated, i.e.:
$$d(x_i, x_j) = \|x_i - x_j\|_2$$
where $x_i$ and $x_j$ are the feature representations of keypoint $i$ and keypoint $j$, respectively. The Euclidean distances $d(x_i, x_j)$ of the edges of the corresponding and extrinsic connections are calculated, the obtained distances are normalized, and the results are mapped into $[0, 1]$ using min-max normalization, i.e.:
$$\hat{d}(x_i, x_j) = \frac{d(x_i, x_j) - d_{\min}}{d_{\max} - d_{\min}}$$
where $d_{\max}$ represents the maximum joint distance and $d_{\min}$ the minimum. In this way, not only can some necessary new interaction connections be added, but the underlying graph also retains a certain sparsity, as sketched below.
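As an illustration, the following is a minimal numpy sketch of this intra-frame interaction weighting, assuming 2D joint coordinates; the function name, the weights theta and delta, and the pruning threshold are illustrative assumptions, since the patent does not fix their values.

```python
import numpy as np

def interaction_edge_weights(p1, p2, theta=1.0, delta=0.5, prune=0.1):
    """p1, p2: (J, 2) arrays of joint coordinates for the two participants.

    Same joint indices form corresponding connections (epsilon_1); all other
    cross-person pairs are extrinsic connections (epsilon_2). Returns a
    (J, J) cross-person weight matrix after min-max distance normalization.
    """
    J = p1.shape[0]
    # Euclidean distance d(x_i, x_j) = ||x_i - x_j||_2 for all joint pairs
    d = np.linalg.norm(p1[:, None, :] - p2[None, :, :], axis=-1)
    # Min-max normalization, mapping distances into [0, 1]
    d_hat = (d - d.min()) / (d.max() - d.min() + 1e-8)
    # Base weights: theta on corresponding edges, delta on extrinsic edges
    w = np.full((J, J), delta)
    np.fill_diagonal(w, theta)
    # Down-weight distant pairs and prune weak edges to keep the graph sparse
    w = w * (1.0 - d_hat)
    w[w < prune] = 0.0
    return w
```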
(1-2) In the time domain, the corresponding joint points of the individual video frames are independent of each other. Frame $x_t$ is connected to its previous frame $x_{t-1}$ and its next frame $x_{t+1}$, and there are two types of connections between the corresponding neighborhoods of adjacent frames: 1) connections between joints of the same type, denoted $\varepsilon_3$; 2) connections between arbitrary joint points in adjacent frames, denoted $\varepsilon_4$. The weights of these two edge types are expressed as:
[Formula image in the source: the per-type weights assigned to the $\varepsilon_3$ and $\varepsilon_4$ edges.]
further, the step (2) is realized by:
and constructing an undirected graph G which is { V, E, A }, and consists of a vertex set V, an edge set E connecting the vertices and a weighted adjacency matrix A. Constructing a multi-frame adjacency matrix:
$$A_{total} = \begin{bmatrix} A^{*(1)} & A_{1,2} & \cdots & 0 \\ A_{2,1} & A^{*(2)} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A^{*(T)} \end{bmatrix}$$
where $A_{total}$ is the new adjacency matrix, $A^{*(i)}$ is the intra-frame adjacency matrix of frame $i$, $A_{i,j}$ is the adjacency matrix between frame $i$ and frame $j$, and $0$ is the zero matrix. The graph Laplacian is then computed as $L = D - A_{total}$, where $D$ is the degree matrix, a diagonal matrix with entries

$$D_{ii} = \sum_j a_{i,j},$$

and $a_{i,j}$ is the weight assigned to the edge connecting vertex $i$ and vertex $j$.
The change of the bones is modeled through the graph Laplacian. The Laplacian matrix $L$ is essentially a high-pass operator that captures the changes of the underlying signal: for any signal $x \in \mathbb{R}^N$ it satisfies

$$(Lx)(i) = \sum_{j \in N_i} a_{i,j}\,\big(x(i) - x(j)\big),$$

where $(Lx)(i)$ is the $i$-th component of $Lx$ and $N_i$ is the set of vertices connected to $i$. A sketch of assembling these matrices follows.
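A short numpy sketch, under assumed shapes, of assembling $A_{total}$ and the Laplacian $L = D - A_{total}$; since only adjacent frames are connected, the result is block-tridiagonal.

```python
import numpy as np

def build_total_adjacency(intra, inter):
    """intra: list of T (V, V) intra-frame adjacency matrices A*(i).
    inter: list of T-1 (V, V) matrices A_{i,i+1} between adjacent frames.
    Returns the (T*V, T*V) multi-frame adjacency matrix A_total.
    """
    T, V = len(intra), intra[0].shape[0]
    A = np.zeros((T * V, T * V))
    for i in range(T):
        A[i*V:(i+1)*V, i*V:(i+1)*V] = intra[i]        # diagonal blocks A*(i)
        if i < T - 1:                                  # adjacent-frame blocks
            A[i*V:(i+1)*V, (i+1)*V:(i+2)*V] = inter[i]
            A[(i+1)*V:(i+2)*V, i*V:(i+1)*V] = inter[i].T
    return A

def graph_laplacian(A):
    D = np.diag(A.sum(axis=1))   # degree matrix, D_ii = sum_j a_ij
    return D - A                 # L = D - A_total
```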
Then, the spectral graph convolution is approximated with Chebyshev polynomials:

$$g_\theta \star x \approx \sum_{k=0}^{K-1} \theta_k\, T_k(\tilde{L})\, x,$$

where $\tilde{L} = 2L/\lambda_{\max} - I_N$ is the defined symmetric normalized (rescaled) graph Laplacian, $\theta_k$ denotes the $k$-th Chebyshev coefficient, $g_\theta$ the convolution kernel, and $K$ the Chebyshev order. $T_k(\tilde{L})$ is the Chebyshev polynomial of order $k$; it is computed repeatedly by the recurrence

$$T_k(\tilde{L}) = 2\tilde{L}\,T_{k-1}(\tilde{L}) - T_{k-2}(\tilde{L}),$$

where $T_0(\tilde{L}) = I_N$ and $T_1(\tilde{L}) = \tilde{L}$.
Thus, the spectral graph convolution layer is defined as:

$$Y = \mathrm{ReLU}\Big(\sum_{k=0}^{K-1} T_k(\tilde{L})\, X\, W_k + b\Big),$$

where $W_k \in \mathbb{R}^{F_1 \times F_2}$ is the matrix of weight parameters $\theta'_k$ to be learned by the network, whose dimension is determined by the dimensions of the two adjacent connection layers, $F_1$ and $F_2$ being the respective layer dimensions; $b$ is the bias and ReLU is the activation function. A sketch of this layer follows.
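For illustration, a numpy sketch of this $K$-order Chebyshev layer; the value $\lambda_{\max} = 2$ and the weight shapes are assumptions. Applying the recurrence to $T_k(\tilde{L})X$ rather than to the full $N \times N$ polynomial keeps the cost at $K$ matrix products on the feature matrix and avoids any eigendecomposition.

```python
import numpy as np

def chebyshev_graph_conv(X, L, weights, b=0.0, lam_max=2.0):
    """X: (N, F1) node features; L: (N, N) graph Laplacian;
    weights: list of K (F1, F2) matrices W_k. Returns (N, F2).
    """
    N = L.shape[0]
    L_tilde = 2.0 * L / lam_max - np.eye(N)    # rescaled Laplacian
    Tx_prev, Tx = X, L_tilde @ X               # T_0(L~)X = X, T_1(L~)X = L~X
    out = Tx_prev @ weights[0]
    if len(weights) > 1:
        out = out + Tx @ weights[1]
    for k in range(2, len(weights)):
        # Chebyshev recurrence: T_k X = 2 L~ T_{k-1} X - T_{k-2} X
        Tx_prev, Tx = Tx, 2.0 * L_tilde @ Tx - Tx_prev
        out = out + Tx @ weights[k]
    return np.maximum(out + b, 0.0)            # ReLU(sum_k T_k(L~) X W_k + b)
```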
Further, in step (3), the action time-sequence dependency information is acquired with the slice recurrent neural network, and the actions in the video are classified according to the extracted feature information, specifically as follows:
(3-1) After the high-level graph features are extracted by the graph convolution model, let $T$ be the sequence length. The input $X$ is divided into $n$ subsequences of equal length, so the length $t$ of each subsequence is

$$t = \frac{T}{n},$$

whereby the input is represented as $X = [N_1, N_2, \ldots, N_n]$, where $N_p = [x_{(p-1)t+1}, x_{(p-1)t+2}, \ldots, x_{pt}]$ and $X$ comprises the feature sequence of $T$ frames.
(3-2) Each subsequence $N_p$ is sliced again into $n$ subsequences of equal length, and the slicing operation is repeated $k$ times in total, until the minimum subsequences at the bottom layer have a suitable length; slicing $k$ times yields $k+1$ layers.
(3-3) At layer 0, the recurrent unit acts on each minimum subsequence through the connection structure to obtain the last hidden state of each minimum subsequence at layer 0, which is used as the input of its parent sequence at the first layer. Then, the last hidden state of each subsequence at layer $p-1$ is used as the input of the parent sequence at layer $p$, and the last hidden state of the subsequence at layer $p$ is computed, i.e.:

$$h_l^0 = \mathrm{GRU}^0\big(\mathrm{mss}_l\big),$$
$$h_l^p = \mathrm{GRU}^p\big(h_{(l-1)n+1}^{p-1}, \ldots, h_{ln}^{p-1}\big),$$

where $h_l^p$ is the hidden representation of the $l$-th subsequence at layer $p$, $\mathrm{mss}_l$ is the $l$-th minimum subsequence at layer 0, $l_0$ is the minimum subsequence length at layer 0, $l_p$ is the subsequence length at layer $p$, and $\mathrm{GRU}^p$ denotes the forward computation of the layer-$p$ GRU unit.
(3-4) Multiple gated recurrent unit (GRU) network models are applied to the subsequences of each layer until the final hidden state $F$ of the top layer (layer $k$) is obtained, i.e.:

$$F = h^k = \mathrm{GRU}^k\big(h_1^{k-1}, \ldots, h_n^{k-1}\big).$$
(3-5) Action classification is performed on the extracted video features:

$$p = \mathrm{softmax}(W_F F + b_F),$$

where softmax is the normalized exponential function, $W_F$ is the weight matrix and $b_F$ is the bias term. A sketch of the whole sliced structure follows.
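Below is a hedged PyTorch sketch of this sliced recurrent structure; the slice count n, the slice depth k, and all layer sizes are assumed values not fixed by the patent. Each level runs its GRU over all of its subsequences as one large batch, which is where the parallel speed-up over a single long recurrence comes from.

```python
import torch
import torch.nn as nn

class SlicedGRU(nn.Module):
    """k+1 stacked GRU levels; level 0 reads the minimum subsequences and
    each later level reads the last hidden states of its child slices."""

    def __init__(self, in_dim, hid_dim, num_classes, n=4, k=2):
        super().__init__()
        self.n, self.k = n, k
        self.grus = nn.ModuleList(
            [nn.GRU(in_dim, hid_dim, batch_first=True)] +
            [nn.GRU(hid_dim, hid_dim, batch_first=True) for _ in range(k)])
        self.fc = nn.Linear(hid_dim, num_classes)

    def forward(self, x):          # x: (B, T, in_dim), T divisible by n**k
        B = x.size(0)
        for level, gru in enumerate(self.grus):
            pieces = self.n ** (self.k - level)   # subsequence count here
            x = x.reshape(B * pieces, x.size(1) // pieces, x.size(-1))
            _, h = gru(x)          # last hidden state of every subsequence
            x = h[-1].reshape(B, pieces, -1)      # inputs for the parents
        F_top = x.squeeze(1)       # final top-layer hidden state F
        return torch.softmax(self.fc(F_top), dim=-1)  # p = softmax(W_F F + b_F)
```

For example, with n=4 and k=2 an input of T=32 frames is processed at level 0 as 16 minimum subsequences of length 2, then as 4 subsequences of length 4, and finally as one top-level sequence of length 4.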
The invention has the following beneficial effects:
1. The interaction recognition method based on skeleton features and the slice recurrent neural network fuses temporal and spatial interaction features, and to a great extent solves the problems of incomplete interaction-information extraction and lost inter-frame dependencies.
2. The slice recurrent neural network retains long-term dependency information to a large degree while computing in parallel; meanwhile, using the high-level features of the spatio-temporal modeling as its input appropriately compensates for the long-term dependency loss that slicing causes in a recurrent network, so interactions are recognized more completely and accurately.
3. The method can be applied in many fields, such as intelligent surveillance, human-computer interaction, video sequence understanding, and healthcare.
Drawings
FIG. 1 is a schematic flow chart of the practice of the present invention.
Fig. 2 is a diagram of a sliced recurrent neural network according to the present invention.
Detailed Description
The invention will be further explained with reference to the drawings.
As shown in fig. 1, the interaction recognition method based on skeleton features and the slice recurrent neural network mainly involves the connections between different joint points within and between frames, the extraction of joint features by spectral graph convolution, and the sliced recurrent neural network method. The implementation of the invention is explained in detail below with respect to these aspects.
Skeleton extraction is performed on each action sequence by OpenPose. For each frame of each video, the information $(x, y, z)$ of the 15 joint points of the human skeleton is extracted, where $x$ is the abscissa of the joint point in the image, $y$ is its ordinate, and $z$ is the confidence value of the joint point; a parsing sketch follows. The joint-point connections are divided into single-person connections, interactive connections and inter-frame connections, constructed as follows:
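For illustration, a sketch of parsing per-frame OpenPose JSON output into such $(x, y, z)$ joint arrays; the "pose_keypoints_2d" field name follows the OpenPose JSON output format, and the 15-joint layout assumes the MPI body model.

```python
import json
import numpy as np

def load_frame_skeletons(json_path, num_joints=15):
    """Returns a (num_people, num_joints, 3) array of (x, y, confidence)."""
    with open(json_path) as f:
        frame = json.load(f)
    people = []
    for person in frame.get("people", []):
        # Flat [x0, y0, c0, x1, y1, c1, ...] list -> rows of (x, y, z)
        kp = np.asarray(person["pose_keypoints_2d"], dtype=np.float32)
        people.append(kp.reshape(-1, 3)[:num_joints])
    return np.stack(people) if people else np.zeros((0, num_joints, 3))
```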
(1) Intra-frame connections comprise two parts: single-person self-connections and interactive connections. In the interactive part, connections between points that tend to undergo similar joint changes are called corresponding connections, denoted $\varepsilon_1$; for example, when the actions of the two participants are basically consistent, connections are established between their corresponding joint points, and these corresponding edges play an important role in such cases. Connections occurring between other joint points are called extrinsic connections, denoted $\varepsilon_2$. A weight $\theta$ is assigned to the edges in $\varepsilon_1$ and a weight $\delta$ to the edges in $\varepsilon_2$, i.e.:
$$w_{i,j} = \begin{cases} \theta, & (i,j) \in \varepsilon_1 \\ \delta, & (i,j) \in \varepsilon_2 \end{cases}$$
where $i$ and $j$ denote the joint points of different persons and $w_{i,j}$ denotes the weight of edge $(i,j)$. To determine the node connections in the intra-frame interactive modeling, the relevance between interactive nodes is measured by the Euclidean distance, and the values between all pairs of points are calculated, i.e.:
$$d(x_i, x_j) = \|x_i - x_j\|_2$$
where $x_i$ and $x_j$ are the feature representations of keypoint $i$ and keypoint $j$, respectively. The Euclidean distances $d(x_i, x_j)$ of the edges of the corresponding and extrinsic connections are calculated, the obtained distances are normalized, and the results are mapped into $[0, 1]$ using min-max normalization, i.e.:
$$\hat{d}(x_i, x_j) = \frac{d(x_i, x_j) - d_{\min}}{d_{\max} - d_{\min}}$$
where $d_{\max}$ represents the maximum joint distance and $d_{\min}$ the minimum. In this way, not only can some necessary new interaction connections be added, but the underlying graph also retains a certain sparsity.
(2) In the time domain, the corresponding joint points of the individual video frames are independent of each other. Frame $x_t$ is connected to its previous frame $x_{t-1}$ and its next frame $x_{t+1}$, and there are two types of connections between the corresponding neighborhoods of adjacent frames: 1) connections between joints of the same type, denoted $\varepsilon_3$; 2) connections between arbitrary joint points in adjacent frames, denoted $\varepsilon_4$. The weights of these two edge types are expressed as:
[Formula image in the source: the per-type weights assigned to the $\varepsilon_3$ and $\varepsilon_4$ edges.] A sketch with assumed weight values follows.
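Since the exact weight values survive only as an image, the following sketch uses assumed values w3 and w4 to illustrate the two inter-frame edge types.

```python
import numpy as np

def inter_frame_adjacency(num_joints, w3=1.0, w4=0.25):
    """Returns the (V, V) block A_{t,t+1} linking frames t and t+1."""
    A = np.full((num_joints, num_joints), w4)  # epsilon_4: any joint pair
    np.fill_diagonal(A, w3)                    # epsilon_3: same joint type
    return A
```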
(3) An undirected graph $G = \{V, E, A\}$ is constructed, consisting of a vertex set $V$, an edge set $E$ connecting the vertices, and a weighted adjacency matrix $A$. The multi-frame adjacency matrix is constructed as:
$$A_{total} = \begin{bmatrix} A^{*(1)} & A_{1,2} & \cdots & 0 \\ A_{2,1} & A^{*(2)} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A^{*(T)} \end{bmatrix}$$
where $A_{total}$ is the new adjacency matrix, $A^{*(i)}$ is the intra-frame adjacency matrix of frame $i$, $A_{i,j}$ is the adjacency matrix between frame $i$ and frame $j$, and $0$ is the zero matrix. The graph Laplacian is then computed as $L = D - A_{total}$, where $D$ is the degree matrix, a diagonal matrix with entries

$$D_{ii} = \sum_j a_{i,j},$$

and $a_{i,j}$ is the weight assigned to the edge connecting vertex $i$ and vertex $j$.
The change of the bones is modeled through the graph Laplacian. The Laplacian matrix $L$ is essentially a high-pass operator that captures the changes of the underlying signal: for any signal $x \in \mathbb{R}^N$ it satisfies

$$(Lx)(i) = \sum_{j \in N_i} a_{i,j}\,\big(x(i) - x(j)\big),$$

where $(Lx)(i)$ is the $i$-th component of $Lx$ and $N_i$ is the set of vertices connected to $i$. Then the approximation of the spectral graph convolution by Chebyshev polynomials is adopted:
$$g_\theta \star x \approx \sum_{k=0}^{K-1} \theta_k\, T_k(\tilde{L})\, x,$$

where $\tilde{L} = 2L/\lambda_{\max} - I_N$ is the defined symmetric normalized (rescaled) graph Laplacian, $\theta_k$ denotes the $k$-th Chebyshev coefficient, $g_\theta$ the convolution kernel, and $K$ the Chebyshev order. $T_k(\tilde{L})$ is the Chebyshev polynomial of order $k$, computed repeatedly by the recurrence

$$T_k(\tilde{L}) = 2\tilde{L}\,T_{k-1}(\tilde{L}) - T_{k-2}(\tilde{L}),$$

where $T_0(\tilde{L}) = I_N$ and $T_1(\tilde{L}) = \tilde{L}$.
Thus, the spectral graph convolution layer is defined as:

$$Y = \mathrm{ReLU}\Big(\sum_{k=0}^{K-1} T_k(\tilde{L})\, X\, W_k + b\Big),$$

where $W_k \in \mathbb{R}^{F_1 \times F_2}$ is the matrix of weight parameters $\theta'_k$ to be learned by the network, whose dimension is determined by the dimensions of the two adjacent connection layers, $F_1$ and $F_2$ being the respective layer dimensions; $b$ is the bias and ReLU is the activation function.
As shown in fig. 2, acquiring the action time-sequence dependency information with the slice recurrent neural network and classifying the actions according to this information specifically comprises the following steps:
(1) After the high-level graph features are extracted by the graph convolution model, let $T$ be the sequence length. The input $X$ is divided into $n$ subsequences of equal length, so the length $t$ of each subsequence is

$$t = \frac{T}{n},$$

whereby the input is represented as $X = [N_1, N_2, \ldots, N_n]$, where $N_p = [x_{(p-1)t+1}, x_{(p-1)t+2}, \ldots, x_{pt}]$ and $X$ comprises the feature sequence of $T$ frames.
(2) Each subsequence $N_p$ is sliced again into $n$ subsequences of equal length, and the slicing operation is repeated $k$ times in total, until the minimum subsequences at the bottom layer have a suitable length; slicing $k$ times yields $k+1$ layers.
(3) At layer 0, the recurrent unit acts on each minimum subsequence through the connection structure to obtain the last hidden state of each minimum subsequence at layer 0, which is used as the input of its parent sequence at the first layer. Then, the last hidden state of each subsequence at layer $p-1$ is used as the input of the parent sequence at layer $p$, and the last hidden state of the subsequence at layer $p$ is computed, i.e.:

$$h_l^0 = \mathrm{GRU}^0\big(\mathrm{mss}_l\big),$$
$$h_l^p = \mathrm{GRU}^p\big(h_{(l-1)n+1}^{p-1}, \ldots, h_{ln}^{p-1}\big),$$

where $h_l^p$ is the hidden representation of the $l$-th subsequence at layer $p$, $\mathrm{mss}_l$ is the $l$-th minimum subsequence at layer 0, $l_0$ is the minimum subsequence length at layer 0, $l_p$ is the subsequence length at layer $p$, and $\mathrm{GRU}^p$ denotes the forward computation of the layer-$p$ GRU unit.
(4) Multiple gated recurrent unit (GRU) network models are applied to the subsequences of each layer until the final hidden state $F$ of the top layer (layer $k$) is obtained, i.e.:

$$F = h^k = \mathrm{GRU}^k\big(h_1^{k-1}, \ldots, h_n^{k-1}\big).$$
(5) Action classification is performed on the extracted video features:

$$p = \mathrm{softmax}(W_F F + b_F),$$

where softmax is the normalized exponential function, $W_F$ is the weight matrix and $b_F$ is the bias term.
The series of detailed descriptions listed above are merely specific illustrations of feasible embodiments of the invention. They are not intended to limit the scope of protection of the invention, and all equivalent implementations or modifications that do not depart from the technical spirit of the invention shall be included within its scope.

Claims (8)

1. A human body interaction action recognition method based on skeleton features and a slice recurrent neural network, characterized by comprising the following steps:
S1: extracting the skeleton of the action based on the video frames, and designing different connections for the joint points so as to extract the interaction information between different nodes;
S2: constructing a new skeleton graph, and applying spectral graph convolution to the spatio-temporal skeleton graph to obtain a high-level feature map;
S3: acquiring time-sequence dependency information by adopting a slice recurrent neural network model, and classifying the actions in the video according to the features extracted by the slice recurrent neural network.
2. The human body interaction action recognition method based on the skeleton feature and the slice recurrent neural network as claimed in claim 1, wherein the different connections designed in S1 comprise intra-frame connections and inter-frame connections.
3. The human body interaction action recognition method based on the skeleton feature and the slice recurrent neural network as claimed in claim 2, wherein the intra-frame connections comprise single-person connections and interactive connections; in the interactive part, the connections between points susceptible to similar joint changes are called corresponding connections, denoted $\varepsilon_1$, and are established between the corresponding joint points when the actions of the two participants are basically consistent; the connections occurring between other joint points are called extrinsic connections, denoted $\varepsilon_2$; a weight $\theta$ is assigned to the edges in $\varepsilon_1$ and a weight $\delta$ to the edges in $\varepsilon_2$, i.e.:
$$w_{i,j} = \begin{cases} \theta, & (i,j) \in \varepsilon_1 \\ \delta, & (i,j) \in \varepsilon_2 \end{cases}$$
where $i$ and $j$ denote the joint points of different persons and $w_{i,j}$ denotes the weight of edge $(i,j)$;
to determine the joint-point connections in the intra-frame connection modeling, the relevance between interactive nodes is measured by the Euclidean distance, and the values between all pairs of points are calculated, i.e.:
$$d(x_i, x_j) = \|x_i - x_j\|_2$$
where $x_i$ and $x_j$ respectively represent the features of keypoint $i$ and keypoint $j$; the Euclidean distances $d(x_i, x_j)$ of the edges of the corresponding and extrinsic connections are calculated, the obtained distances are normalized, and the results are mapped into $[0, 1]$ using min-max normalization, i.e.:
$$\hat{d}(x_i, x_j) = \frac{d(x_i, x_j) - d_{\min}}{d_{\max} - d_{\min}}$$
where $d_{\max}$ represents the maximum joint distance and $d_{\min}$ represents the minimum joint distance.
4. The human body interaction action recognition method based on the skeleton feature and the slice recurrent neural network as claimed in claim 2, wherein the inter-frame connections comprise two connection types: 1) connections between joints of the same type, denoted $\varepsilon_3$; 2) connections between arbitrary joint points in adjacent frames, denoted $\varepsilon_4$;
the weights of the two edge types are expressed as:
[Formula image in the source: the per-type weights assigned to the $\varepsilon_3$ and $\varepsilon_4$ edges.]
5. The human body interaction action recognition method based on the skeleton feature and the slice recurrent neural network as claimed in claim 2, wherein the implementation of S2 comprises:
constructing an undirected graph $G = \{V, E, A\}$, consisting of a vertex set $V$, an edge set $E$ connecting the vertices and a weighted adjacency matrix $A$, and constructing the multi-frame adjacency matrix:
$$A_{total} = \begin{bmatrix} A^{*(1)} & A_{1,2} & \cdots & 0 \\ A_{2,1} & A^{*(2)} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A^{*(T)} \end{bmatrix}$$
where $A_{total}$ is the new adjacency matrix, $A^{*(i)}$ is the intra-frame adjacency matrix of frame $i$, $A_{i,j}$ is the adjacency matrix between frame $i$ and frame $j$, and $0$ is the zero matrix;
the graph Laplacian is computed as $L = D - A_{total}$, where $D$ is the degree matrix, a diagonal matrix with entries

$$D_{ii} = \sum_j a_{i,j},$$

and $a_{i,j}$ represents the weight assigned to the edge connecting vertex $i$ and vertex $j$.
The change of the bones is modeled by the graph Laplacian: the Laplacian matrix $L$ is a high-pass operator used to capture the changes of the underlying signal; for any signal $x \in \mathbb{R}^N$ it satisfies

$$(Lx)(i) = \sum_{j \in N_i} a_{i,j}\,\big(x(i) - x(j)\big),$$

where $(Lx)(i)$ represents the $i$-th component of $Lx$ and $N_i$ is the set of vertices connected to $i$.
6. The human body interaction action recognition method based on the skeleton feature and the slice recurrent neural network as claimed in claim 5, further comprising: realizing the spectral graph convolution with Chebyshev polynomials:

$$g_\theta \star x \approx \sum_{k=0}^{K-1} \theta_k\, T_k(\tilde{L})\, x,$$

where $\tilde{L} = 2L/\lambda_{\max} - I_N$ is the defined symmetric normalized graph Laplacian, $\theta_k$ denotes the $k$-th Chebyshev coefficient, $g_\theta$ the convolution kernel, $K$ the Chebyshev order, and $T_k(\tilde{L})$ the Chebyshev polynomial of order $k$, obtained by the repeated calculation

$$T_k(\tilde{L}) = 2\tilde{L}\,T_{k-1}(\tilde{L}) - T_{k-2}(\tilde{L}), \qquad T_0(\tilde{L}) = I_N,\; T_1(\tilde{L}) = \tilde{L};$$

thus, the spectral graph convolution layer is defined as:

$$Y = \mathrm{ReLU}\Big(\sum_{k=0}^{K-1} T_k(\tilde{L})\, X\, W_k + b\Big),$$

where $W_k \in \mathbb{R}^{F_1 \times F_2}$ is the matrix of weight parameters $\theta'_k$ to be learned by the network, its dimension determined by the dimensions $F_1$ and $F_2$ of the two adjacent connection layers; $b$ is the bias and ReLU is the activation function.
7. The human body interaction action recognition method based on the skeleton feature and the slice recurrent neural network as claimed in claim 1, wherein the implementation of S3 comprises:
S3.1: with $T$ the sequence length, dividing the input $X$ into $n$ subsequences of equal length, so that the length $t$ of each subsequence is

$$t = \frac{T}{n},$$

whereby the input is represented as $X = [N_1, N_2, \ldots, N_n]$, where $N_p = [x_{(p-1)t+1}, x_{(p-1)t+2}, \ldots, x_{pt}]$ and $X$ comprises the feature sequence of $T$ frames;
S3.2: slicing each subsequence again into $n$ subsequences of equal length, repeating the slicing operation $k$ times until the minimum subsequences at the bottom layer have a suitable length, slicing $k$ times yielding $k+1$ layers;
S3.3: at layer 0, applying the recurrent unit to each minimum subsequence through the connection structure to obtain the last hidden state of each minimum subsequence at layer 0, which is used as the input of its parent sequence at the first layer; then using the last hidden state of each subsequence at layer $p-1$ as the input of the parent sequence at layer $p$, and computing the last hidden state of the subsequence at layer $p$, i.e.:

$$h_l^0 = \mathrm{GRU}^0\big(\mathrm{mss}_l\big),$$
$$h_l^p = \mathrm{GRU}^p\big(h_{(l-1)n+1}^{p-1}, \ldots, h_{ln}^{p-1}\big),$$

where $h_l^p$ is the hidden representation of the $l$-th subsequence at layer $p$, $\mathrm{mss}_l$ is the $l$-th minimum subsequence at layer 0, $l_0$ is the minimum subsequence length at layer 0, $l_p$ is the subsequence length at layer $p$, and $\mathrm{GRU}^p$ denotes the forward computation of the layer-$p$ GRU unit;
S3.4: applying the gated recurrent unit (GRU) network models to the subsequences of each layer until the final hidden state $F$ of the top layer (layer $k$) is obtained, i.e.:

$$F = h^k = \mathrm{GRU}^k\big(h_1^{k-1}, \ldots, h_n^{k-1}\big);$$

S3.5: performing action classification on the extracted video features $F$.
8. The human body interaction action recognition method based on the skeleton feature and the slice recurrent neural network as claimed in claim 7, wherein the classification is implemented according to the following formula:

$$p = \mathrm{softmax}(W_F F + b_F),$$

where softmax is the normalized exponential function, $W_F$ is the weight matrix, and $b_F$ is the bias term.
CN202011146588.3A 2020-10-23 2020-10-23 Human interactive action recognition method based on skeleton characteristics and slicing recurrent neural network Active CN112329562B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011146588.3A CN112329562B (en) 2020-10-23 2020-10-23 Human interactive action recognition method based on skeleton characteristics and slicing recurrent neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011146588.3A CN112329562B (en) 2020-10-23 2020-10-23 Human interactive action recognition method based on skeleton characteristics and slicing recurrent neural network

Publications (2)

Publication Number Publication Date
CN112329562A true CN112329562A (en) 2021-02-05
CN112329562B CN112329562B (en) 2024-05-14

Family

ID=74310929

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011146588.3A Active CN112329562B (en) 2020-10-23 2020-10-23 Human interactive action recognition method based on skeleton characteristics and slicing recurrent neural network

Country Status (1)

Country Link
CN (1) CN112329562B (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110796110A (en) * 2019-11-05 2020-02-14 西安电子科技大学 Human behavior identification method and system based on graph convolution network
CN111353447A (en) * 2020-03-05 2020-06-30 辽宁石油化工大学 Human skeleton behavior identification method based on graph convolution network

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113792712A (en) * 2021-11-15 2021-12-14 长沙海信智能系统研究院有限公司 Action recognition method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN112329562B (en) 2024-05-14

Similar Documents

Publication Publication Date Title
CN107609460B (en) Human body behavior recognition method integrating space-time dual network flow and attention mechanism
CN106407889B (en) Method for recognizing human body interaction in video based on optical flow graph deep learning model
CN111339942B (en) Method and system for recognizing skeleton action of graph convolution circulation network based on viewpoint adjustment
CN108133188A (en) A kind of Activity recognition method based on motion history image and convolutional neural networks
CN105844627B (en) A kind of sea-surface target image background suppressing method based on convolutional neural networks
CN110472604B (en) Pedestrian and crowd behavior identification method based on video
CN110598598A (en) Double-current convolution neural network human behavior identification method based on finite sample set
CN110163127A (en) A kind of video object Activity recognition method from thick to thin
CN104615983A (en) Behavior identification method based on recurrent neural network and human skeleton movement sequences
CN111507182B (en) Skeleton point fusion cyclic cavity convolution-based littering behavior detection method
CN105160310A (en) 3D (three-dimensional) convolutional neural network based human body behavior recognition method
CN110135386B (en) Human body action recognition method and system based on deep learning
CN112508110A (en) Deep learning-based electrocardiosignal graph classification method
CN107169117B (en) Hand-drawn human motion retrieval method based on automatic encoder and DTW
CN110458235B (en) Motion posture similarity comparison method in video
CN111339847A (en) Face emotion recognition method based on graph convolution neural network
CN111931722B (en) Correlated filtering tracking method combining color ratio characteristics
CN110096976A (en) Human behavior micro-Doppler classification method based on sparse migration network
CN110348492A (en) A kind of correlation filtering method for tracking target based on contextual information and multiple features fusion
Zhang et al. A Gaussian mixture based hidden Markov model for motion recognition with 3D vision device
CN113505719A (en) Gait recognition model compression system and method based on local-integral joint knowledge distillation algorithm
CN112329562A (en) Human body interaction action recognition method based on skeleton features and slice recurrent neural network
Qi et al. Research on deep learning expression recognition algorithm based on multi-model fusion
CN112800908B (en) Method for establishing anxiety perception model based on individual gait analysis in video
CN112149613A (en) Motion estimation evaluation method based on improved LSTM model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant