CN111353447A - Human skeleton behavior identification method based on graph convolution network - Google Patents


Info

Publication number
CN111353447A
CN111353447A (application CN202010146319.0A); granted publication CN111353447B
Authority
CN
China
Prior art keywords
skeleton
graph
joint
sequence
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010146319.0A
Other languages
Chinese (zh)
Other versions
CN111353447B (en)
Inventor
曹江涛 (Cao Jiangtao)
赵挺 (Zhao Ting)
洪恺临 (Hong Kailin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Liaoning Shihua University
Original Assignee
Liaoning Shihua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Liaoning Shihua University filed Critical Liaoning Shihua University
Priority to CN202010146319.0A priority Critical patent/CN111353447B/en
Publication of CN111353447A publication Critical patent/CN111353447A/en
Application granted granted Critical
Publication of CN111353447B publication Critical patent/CN111353447B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

A human skeleton behavior recognition method based on a graph convolution network belongs to the fields of computer vision and deep learning, and comprises the steps of obtaining human skeleton video frames and carrying out normalization processing; constructing, for each frame, a human-joint intrinsic-dependency connection graph, an individual extrinsic-dependency connection graph and an interactive-dependency connection graph; obtaining the joint connection graph of the whole interaction; assigning weight values to the edges of each human-joint connection graph; performing a graph convolution operation to obtain the spatial features of the skeleton sequence; and performing time-series modeling with a long short-term memory network to obtain the category of the interactive behavior. The intrinsic-dependency connection edges allow the basic human behavior features to be learned, the extrinsic-dependency connection edges allow additional behavior features to be learned, and the interactive-dependency connection edges allow the interaction relationship between the two persons to be learned better, so that the motion relationship of two-person interactive behaviors is better represented and the recognition performance is improved.

Description

Human skeleton behavior identification method based on graph convolution network
Technical Field
The invention belongs to the technical field of computer vision and deep learning, and particularly relates to a human skeleton behavior identification method based on a graph convolution network.
Background
Video-based human behavior recognition and understanding is a frontier direction of wide interest in image processing and computer vision. With the fusion and development of deep learning and computer vision techniques, behavior recognition has been widely applied in video analysis, intelligent surveillance, human-computer interaction, augmented reality, video retrieval and other fields. Compared with single-person actions, two-person interactive behaviors are more common in daily life and more difficult to recognize. Research on two-person interaction is mainly divided into RGB-based studies and studies based on skeleton joint point data. Conventional RGB video has poor robustness owing to factors such as illumination change, occlusion and complex backgrounds. Skeleton joint point data, by contrast, contain the compact three-dimensional positions of the main body joints and are therefore robust to changes of viewpoint, body scale and movement speed. Behavior recognition based on skeleton joint point data has accordingly received increasing attention in recent years.
Methods for recognizing two-person interactive behavior from skeleton joint points fall mainly into two categories: methods based on hand-crafted features and methods based on deep learning. In the first category, Vemulapalli et al. [1] represent the human skeleton as points in a Lie group and perform temporal modeling and classification in the Lie algebra. Weng et al. [2] extend the Naive Bayes Nearest Neighbor (NBNN) method to space-time and classify behaviors using a phase-to-class distance. Such hand-crafted features are costly to design, and their recognition accuracy is difficult to improve further. Deep-learning-based methods can be further divided into CNN-based and RNN-based models. CNN-based methods convert the joint data into pictures which are then fed into the network for learning and classification; such methods ignore the timing information in the video. RNN-based methods can effectively model the timing information, but ignore the dependencies between joint points and the interaction relationship between the two persons. ([1] Raviteja Vemulapalli, Felipe Arrate, and Rama Chellappa. Human action recognition by representing 3D skeletons as points in a Lie group. In CVPR, pages 588-595, 2014. [2] Junwu Weng, Chaoqun Weng, and Junsong Yuan. Spatio-temporal naive-bayes nearest-neighbor for skeleton-based action recognition. In CVPR, pages 4171-4180, 2017.)
Recently, with the popularity of the graph convolutional network (GCN), many researchers have applied GCN methods in the field of behavior recognition. However, current research is mainly directed at single-person behaviors and mostly adopts the natural connection graph of the human body, ignoring the dependencies between joints that are not naturally connected. In existing two-person interactive applications, the two persons are split into two individuals and modeled separately, ignoring the interactive dependency relationship between them.
Disclosure of Invention
Aiming at the problems and the defects in the prior art, the invention provides a double interaction behavior recognition method based on a graph convolution network, wherein the recognition method comprises the steps of obtaining a double interaction framework video; carrying out normalization processing on the joint point coordinates of the obtained video; constructing a human body joint internal dependency graph, an individual external dependency graph and an interaction dependency graph; distributing different weights for the connecting edges of the three joint connecting graphs; sending the data into a graph convolution network for learning and extracting spatial features; sending the spatial characteristics obtained by each frame into a long-term and short-term memory network for time sequence modeling; and obtaining the identification result of the interactive behavior category.
The method specifically comprises the following steps:
Step S10, capturing a video: starting a camera, recording double-person interactive videos, collecting skeleton videos of various interactive actions performed by different action executors as training videos of the interactive actions, marking the interactive action meaning of each training video, and establishing a video training set.
Step S20, performing normalization processing on the preset video frames in the acquired skeleton video to obtain a skeleton sequence to be recognized.
Step S30, for each frame graph in the skeleton sequence to be recognized, constructing a corresponding human-joint intrinsic-dependency connection graph according to the joint point coordinates, wherein the joint points are the nodes of the graph and the natural connections among the joint points are the intrinsic-dependency connection edges of the graph; constructing the respective extrinsic-dependency connection edges of each single person and the interactive-dependency connection edges between the two persons, the three together forming the human-joint connection graph of each frame of the skeleton sequence to be recognized;
Step S40, respectively assigning weights to the edges of the three kinds of joint connection graphs corresponding to each frame graph of the skeleton sequence to be recognized, to obtain the corresponding human-joint connection graphs with different weight values;
Step S50, performing a graph convolution operation on the human-joint connection graph with different weight values corresponding to each frame of the skeleton sequence to be recognized, to obtain the spatial features of the skeleton sequence to be recognized;
Step S60, performing time-series modeling in the time dimension based on the spatial features of the skeleton sequence to be recognized, to obtain the behavior category of the skeleton sequence to be recognized.
Further, "normalize the preset video frame in the obtained skeleton video and then use it as the skeleton sequence to be identified", the method is as follows:
step S11, sampling the obtained original skeleton video at preset equal intervals as training and recognizing skeleton sequences;
step S12, performing rotation, translation and scale normalization processing on the coordinates of each frame of joint points in the obtained skeleton sequence to obtain a skeleton sequence to be identified, wherein the specific method comprises the following steps:
x̃_t^i = R^{-1} (x_t^i − o_R),  i ∈ J, t ∈ T
wherein x_t^i is the i-th joint coordinate value of the t-th acquired frame, J and T represent the sets of joint points and acquired frames, and x̃_t^i is the processed coordinate value;
the rotation matrix R and the rotation origin o_R are defined as:
R = [ v̂_1, (v_2 − proj_{v_1} v_2)^, (v_1 × v_2)^ ],  o_R = (x_1^{Lhip} + x_1^{Rhip}) / 2
wherein v_1 and v_2 are a vector perpendicular to the ground and the difference vector between the left and right hip joints of the initial skeleton in each sequence, v_2 − proj_{v_1} v_2 and v_1 × v_2 respectively represent the projection of v_2 onto the plane perpendicular to v_1 and the cross product of the two vectors (the symbol ^ denotes unit normalization), and x_1^{Lhip} and x_1^{Rhip} represent the coordinates of the left and right hip joints of the initial skeleton of each sequence.
Further, for each frame of image in the skeleton sequence to be identified, constructing a corresponding human body joint intrinsic dependence connection image according to joint point coordinates, wherein joint points are nodes of the image, and natural connection among the joint points is an intrinsic dependence connection edge of the image; the method comprises the following steps of constructing respective external dependence connecting edges of a single person and interactive dependence connecting edges of two persons, and forming a human body joint connection graph of each frame of a skeleton sequence to be recognized together by the three parts, wherein the method comprises the following steps:
Each frame of the two-person interaction is regarded as a whole and modeled as a graph G(x, W), where x ∈ R^{2N×3} contains the three-dimensional coordinates of the 2N joints of the two persons, and W is a 2N × 2N weighted adjacency matrix:
W = [ W_1    W_{1,2} ]
    [ W_{2,1}  W_2   ]
wherein W_1 and W_2 contain the intra-person edge weights of the first and second person, and (W_{1,2})_{mn} = γ when a first-person joint point m and a second-person joint point n are connected by an interactive-dependency edge (0 otherwise); α, β and γ represent the weights of the corresponding intrinsic-dependency, extrinsic-dependency and interactive-dependency edges, respectively.
Further, the method comprises the following steps of respectively distributing weights to the edges of three joint connection graphs corresponding to each frame graph of the skeleton sequence to be recognized to obtain corresponding human joint connection graphs with different weight values:
The weights are assigned as α = 3, β = 1 and γ = 5, emphasizing the intrinsic connections, treating the extrinsic connections as a supplement, and highlighting the interactive connections.
Further, "the human body joint connection diagram with different weight values corresponding to each frame diagram of the skeleton sequence to be identified is subjected to diagram convolution operation to obtain the spatial characteristics of the skeleton sequence to be identified", and the method comprises the following steps:
f_out = g_η * f_in
wherein * represents the graph convolution operation, g_η represents the graph convolution kernel, and W is the weighted adjacency matrix of the human-joint connection graph.
The graph convolution kernel is computed as follows. The normalized graph Laplacian over the spectral domain is L = I_n − D^{−1/2} W D^{−1/2}, where D is the diagonal degree matrix with D_ii = Σ_j w_ij. L is rescaled as L̃ = (2 / λ_max) L − I_n, where λ_max is the maximum eigenvalue of L, and T_k denotes the Chebyshev polynomial of order k. The convolution operation can then be expressed as:
f_out = Σ_{k=0}^{K−1} η_k T_k(L̃) f_in
where η = [η_0, η_1, …, η_{K−1}] are the training parameters and K is the size of the graph convolution kernel.
Further, "time-series modeling is performed in the time dimension based on the spatial features of the skeleton sequence to be recognized to obtain the behavior category of the skeleton sequence to be recognized", the method being as follows:
The spatial feature information of each frame obtained by the graph convolution operation is flattened through a fully connected layer and fed into a long short-term memory network for time-series modeling, and softmax classification is applied to obtain the final interactive behavior recognition result.
The invention has the advantages and effects that:
the double-person interactive behavior recognition method based on the graph convolution network comprises the steps of constructing a weighted joint connection graph added with a double-person interactive dependency relationship, obtaining a double-person interactive space characteristic with discriminability by adopting the graph convolution network, and sending the double-person interactive space characteristic into a long-period memory network to obtain a dynamic time relationship for modeling, so that the recognition precision is improved.
Drawings
FIG. 1 is a schematic flow chart diagram of a double-person interactive behavior recognition method based on graph convolution network according to the present invention;
FIG. 2 is a schematic diagram of the human-joint intrinsic-dependency connection graph, extrinsic-dependency connection graph and interactive-dependency connection graph constructed by the present invention;
FIG. 3 is an algorithmic flow chart of the present invention;
FIG. 4 is a diagram of an LSTM module cell;
FIG. 5 is a confusion matrix of results of the test of the NTU RGB + D data set according to the present invention.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
The invention discloses a double interaction behavior recognition method based on a graph convolution network, which comprises the following steps:
Step S10, capturing a video: starting a camera, recording double-person interactive videos, collecting skeleton videos of various interactive actions performed by different action executors as training videos of the interactive actions, marking the interactive action meaning of each training video, and establishing a video training set.
Step S20, performing normalization processing on the preset video frames in the acquired skeleton video to obtain a skeleton sequence to be recognized.
Step S30, for each frame graph in the skeleton sequence to be recognized, constructing a corresponding human-joint intrinsic-dependency connection graph according to the joint point coordinates, wherein the joint points are the nodes of the graph and the natural connections among the joint points are the intrinsic-dependency connection edges of the graph; constructing the respective extrinsic-dependency connection edges of each single person and the interactive-dependency connection edges between the two persons, the three together forming the human-joint connection graph of each frame of the skeleton sequence to be recognized;
Step S40, respectively assigning weights to the edges of the three kinds of joint connection graphs corresponding to each frame graph of the skeleton sequence to be recognized, to obtain the corresponding human-joint connection graphs with different weight values;
Step S50, performing a graph convolution operation on the human-joint connection graph with different weight values corresponding to each frame of the skeleton sequence to be recognized, to obtain the spatial features of the skeleton sequence to be recognized;
Step S60, performing time-series modeling in the time dimension based on the spatial features of the skeleton sequence to be recognized, to obtain the behavior category of the skeleton sequence to be recognized.
In order to more clearly describe the double-person interactive behavior recognition method based on the graph convolution network, the following will expand the detailed description of the steps in the embodiment of the method of the present invention with reference to fig. 1.
Step S10, capturing a video: starting a camera, recording double-person interactive videos, collecting skeleton videos of various interactive actions performed by different action executors as training videos of the interactive actions, marking the interactive action meaning of each training video, and establishing a video training set.
With the development of image processing technology, a Microsoft Kinect camera can be used directly to obtain skeleton videos of two persons performing interactive behaviors, and the corresponding joint point data are stored.
Step S20, performing normalization processing on the preset video frames in the acquired skeleton video to obtain a skeleton sequence to be recognized.
Due to the change of people and the change of visual angles in shooting, normalization processing is carried out on the shot people and the shot visual angles in a data processing stage, and the method specifically comprises the following steps:
x̃_t^i = R^{-1} (x_t^i − o_R),  i ∈ J, t ∈ T
wherein x_t^i is the i-th joint coordinate value of the t-th acquired frame, J and T represent the sets of joint points and acquired frames, and x̃_t^i is the processed coordinate value;
the rotation matrix R and the rotation origin o_R are defined as:
R = [ v̂_1, (v_2 − proj_{v_1} v_2)^, (v_1 × v_2)^ ],  o_R = (x_1^{Lhip} + x_1^{Rhip}) / 2
wherein v_1 and v_2 are a vector perpendicular to the ground and the difference vector between the left and right hip joints of the initial skeleton in each sequence, v_2 − proj_{v_1} v_2 and v_1 × v_2 respectively represent the projection of v_2 onto the plane perpendicular to v_1 and the cross product of the two vectors (the symbol ^ denotes unit normalization), and x_1^{Lhip} and x_1^{Rhip} represent the coordinates of the left and right hip joints of the initial skeleton of each sequence.
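As an illustrative sketch (not part of the patent), the normalization of step S20 can be implemented as below; the joint indices `l_hip`/`r_hip`, the ground-perpendicular vector (0, 1, 0) and the choice of the hip midpoint as the rotation origin o_R are assumptions:

```python
import numpy as np

def normalize_skeleton(seq, l_hip=12, r_hip=16):
    """Rotate and translate a skeleton sequence seq of shape (T, J, 3) so that
    the initial hip centre becomes the origin and the hip axis aligns with x.
    Implements x~ = R^-1 (x - o_R), with R built from v1 (ground-perpendicular)
    and v2 (left/right hip difference of the first frame)."""
    v1 = np.array([0.0, 1.0, 0.0])              # assumed ground-perpendicular direction
    v2 = seq[0, r_hip] - seq[0, l_hip]          # hip difference vector of the initial skeleton
    v2 = v2 - v1 * np.dot(v2, v1)               # project v2 onto the plane perpendicular to v1
    v3 = np.cross(v1, v2)                       # third, mutually orthogonal axis
    R = np.stack([v2 / np.linalg.norm(v2),
                  v1 / np.linalg.norm(v1),
                  v3 / np.linalg.norm(v3)], axis=1)  # orthonormal columns, so R^-1 = R^T
    o_R = 0.5 * (seq[0, l_hip] + seq[0, r_hip])      # rotation origin: initial hip centre
    return (seq - o_R) @ R                      # row-vector form of R^T (x - o_R)
```

Applied to a (T, J, 3) sequence, this places the initial hip centre at the origin and aligns the initial hip axis with the x-axis, which is one common way to realize the rotation, translation and scale-independent normalization described above.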
Step S30, for each frame graph in the skeleton sequence to be recognized, constructing a corresponding human-joint intrinsic-dependency connection graph according to the joint point coordinates, wherein the joint points are the nodes of the graph and the natural connections among the joint points are the intrinsic-dependency connection edges of the graph; constructing the respective extrinsic-dependency connection edges of each single person and the interactive-dependency connection edges between the two persons, the three together forming the human-joint connection graph of each frame of the skeleton sequence to be recognized; the method comprises the following steps:
Each frame of the two-person interaction is regarded as a whole and modeled as a graph G(x, W), where x ∈ R^{2N×3} contains the three-dimensional coordinates of the 2N joints of the two persons, and W is a 2N × 2N weighted adjacency matrix:
W = [ W_1    W_{1,2} ]
    [ W_{2,1}  W_2   ]
wherein W_1 and W_2 contain the intra-person edge weights of the first and second person, and (W_{1,2})_{mn} = γ when a first-person joint point m and a second-person joint point n are connected by an interactive-dependency edge (0 otherwise); α, β and γ represent the weights of the corresponding intrinsic-dependency, extrinsic-dependency and interactive-dependency edges, respectively.
Step S40, respectively distributing weights for the edges of three kinds of joint connection graphs corresponding to each frame graph of the skeleton sequence to be recognized, and obtaining corresponding human body joint connection graphs with different weight values:
the weights are assigned, α -3, β -1, γ -5 to emphasize the internal connection, and to attach external connections, highlighting the inter-connection.
Step S50, carrying out graph convolution operation on the human body joint connection graph with different weight values corresponding to each frame graph of the skeleton sequence to be recognized, and obtaining the spatial characteristics of the skeleton sequence to be recognized:
given a video of T frames, constructing a graph [ G ] according to the method of claim 31,G2,...,GT]For each graph G constructed at t framesTIt is input into the graph volume layer:
f_out = g_η * f_in
wherein * represents the graph convolution operation, g_η represents the graph convolution kernel, and W is the weighted adjacency matrix of the human-joint connection graph.
The graph convolution kernel is computed as follows:
The normalized graph Laplacian over the spectral domain is L = I_n − D^{−1/2} W D^{−1/2}, where D is the diagonal degree matrix with D_ii = Σ_j w_ij. L is rescaled as L̃ = (2 / λ_max) L − I_n, where λ_max is the maximum eigenvalue of L, and T_k denotes the Chebyshev polynomial of order k. The convolution operation can then be expressed as:
f_out = Σ_{k=0}^{K−1} η_k T_k(L̃) f_in
where η = [η_0, η_1, …, η_{K−1}] are the training parameters and K is the size of the graph convolution kernel.
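The Chebyshev graph convolution of step S50 can be sketched as below; this is a plain NumPy illustration in which a fixed coefficient array η stands in for the trainable parameters (a real layer would learn η, typically per input/output channel):

```python
import numpy as np

def chebyshev_gconv(x, W, eta):
    """Spectral graph convolution with a Chebyshev-polynomial kernel of size K.
    x: (num_nodes, channels) node features; W: symmetric weighted adjacency;
    eta: (K,) coefficients (fixed here, trainable in a real layer)."""
    n = W.shape[0]
    d = W.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(np.maximum(d, 1e-12)), 0.0)
    # normalised Laplacian  L = I - D^{-1/2} W D^{-1/2}
    L = np.eye(n) - (W * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]
    lam_max = np.linalg.eigvalsh(L).max()
    L_tilde = 2.0 * L / lam_max - np.eye(n)      # rescale the spectrum into [-1, 1]
    Tk_prev, Tk = np.eye(n), L_tilde             # T_0 = I, T_1 = L~
    out = eta[0] * (Tk_prev @ x)
    if len(eta) > 1:
        out += eta[1] * (Tk @ x)
    for k in range(2, len(eta)):                 # recurrence T_k = 2 L~ T_{k-1} - T_{k-2}
        Tk_prev, Tk = Tk, 2.0 * L_tilde @ Tk - Tk_prev
        out += eta[k] * (Tk @ x)
    return out
```

With η = [1, 0, …, 0] the operation reduces to the identity (T_0 = I), which is a convenient sanity check; larger K mixes information from K-hop neighbourhoods of the weighted joint graph.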
Step S60, performing time-series modeling in the time dimension based on the spatial features of the skeleton sequence to be recognized, to obtain the behavior category of the skeleton sequence to be recognized:
The spatial feature information f_t of each frame obtained by the graph convolution operation is flattened through a fully connected layer and fed into a long short-term memory network for time-series modeling, and softmax classification is applied to obtain the final interactive behavior recognition result.
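A toy sketch of the temporal model of step S60: one hand-written LSTM cell unrolled over the frame features, followed by a softmax head. The weights are random and untrained, and all dimensions are illustrative assumptions, not values from the patent:

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TinyLSTM:
    """Single LSTM cell unrolled over time, plus a softmax classification head."""

    def __init__(self, in_dim, hid_dim, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.H = hid_dim
        self.Wx = rng.normal(scale=0.1, size=(in_dim, 4 * hid_dim))   # input weights (i, f, g, o gates)
        self.Wh = rng.normal(scale=0.1, size=(hid_dim, 4 * hid_dim))  # recurrent weights
        self.b = np.zeros(4 * hid_dim)
        self.Wo = rng.normal(scale=0.1, size=(hid_dim, n_classes))    # softmax head

    def forward(self, xs):
        """xs: (T, in_dim) per-frame spatial features; returns class probabilities."""
        H = self.H
        h = np.zeros(H)
        c = np.zeros(H)
        for x in xs:
            z = x @ self.Wx + h @ self.Wh + self.b
            i, f = _sigmoid(z[:H]), _sigmoid(z[H:2 * H])
            g, o = np.tanh(z[2 * H:3 * H]), _sigmoid(z[3 * H:])
            c = f * c + i * g             # cell-state update
            h = o * np.tanh(c)            # hidden state carried to the next frame
        logits = h @ self.Wo
        p = np.exp(logits - logits.max())
        return p / p.sum()                # softmax over behavior classes
```

In a real pipeline the per-frame graph-convolution outputs would be flattened by a fully connected layer before entering the LSTM, and the whole stack would be trained end to end with a cross-entropy loss.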
The dataset used to validate the algorithm is as follows. The NTU RGB+D dataset is currently the largest skeleton-based behavior recognition dataset, with 56,000 sequences and 4 million frames covering 60 action classes, each skeleton having 25 joint points; it includes both single-person and two-person actions. This embodiment uses the 11 two-person interaction classes in NTU RGB+D as the dataset.
The evaluation protocols for this dataset are of two types: cross-subject (CS) and cross-view (CV). The proposed method is evaluated here using the CV criterion.
According to the CV evaluation criterion, data captured by cameras No. 2 and No. 3 are used for training, and data captured by camera No. 1 are used for testing. The final recognition rate is 88%, a remarkable recognition effect. The confusion matrix is shown in FIG. 5.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (7)

1. A human skeleton behavior identification method based on a graph convolution network is characterized in that: the identification method comprises the steps of obtaining a double-person interactive skeleton video; carrying out normalization processing on the joint point coordinates of the obtained video; constructing a human body joint internal dependency graph, an individual external dependency graph and an interaction dependency graph; distributing different weights for the connecting edges of the three joint connecting graphs; sending the data into a graph convolution network for learning and extracting spatial features; sending the spatial characteristics obtained by each frame into a long-term and short-term memory network for time sequence modeling; and obtaining the identification result of the interactive behavior category.
2. The method for recognizing human skeleton behaviors based on the graph convolution network according to claim 1, wherein the method comprises the following steps: the identification method specifically comprises the following steps:
step S10, capturing a video: starting a camera, recording double interactive videos, collecting skeleton videos of various interactive actions of different action executors as training videos of the interactive actions, carrying out interactive action meaning marking on various training videos, and establishing a video training set;
step S20, normalizing the preset video frames in the obtained skeleton video to be used as a skeleton sequence to be identified;
step S30, for each frame of image in the skeleton sequence to be recognized, constructing a corresponding human body joint intrinsic dependence connection image according to joint point coordinates, wherein the joint points are nodes of the image, and natural connection among the joint points is an intrinsic dependence connection edge of the image; constructing respective external dependence connecting edges of a single person and interactive dependence connecting edges of two persons, and forming a human body joint connection graph of each frame of the skeleton sequence to be recognized together by the three;
step S40, respectively distributing weights for the edges of three joint connection graphs corresponding to each frame graph of the skeleton sequence to be recognized to obtain corresponding human joint connection graphs with different weight values;
step S50, carrying out image convolution operation on the human body joint connection image with different weight values corresponding to each frame of the skeleton sequence to be recognized to obtain the spatial characteristics of the skeleton sequence to be recognized;
and step S60, performing time sequence modeling on the time dimension based on the spatial characteristics of the skeleton sequence to be recognized, and obtaining the behavior category of the skeleton sequence to be recognized.
3. The method for recognizing human skeleton behaviors based on the graph convolution network as claimed in claim 2, wherein the method comprises the following steps: in the step S20, "normalize the preset video frames in the obtained skeleton video to obtain a skeleton sequence to be recognized", the method includes:
step S11, sampling the obtained original skeleton video at preset equal intervals to serve as the skeleton sequences for training and recognition;
step S12, performing rotation, translation and scale normalization on the joint point coordinates of each frame in the obtained skeleton sequence to obtain the skeleton sequence to be recognized, specifically:

x̂_t^i = R^(-1)(x_t^i − o_R), i ∈ J, t ∈ T

wherein x_t^i is the coordinate of the i-th joint point of the originally acquired frame t, J and T represent the sets of joint points and acquired frames, and x̂_t^i is the processed coordinate value;

the rotation matrix R and the rotation origin o_R are defined from two vectors: v_1, a vector perpendicular to the ground, and v_2, the difference vector between the left and right hip joints of the initial skeleton of each sequence; R is constructed from the projection of v_2 with respect to v_1 and the outer product v_1 × v_2 of the two vectors, and o_R is defined from x_1^lhip and x_1^rhip, the coordinates of the left and right hip joints of the initial skeleton of each sequence.
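The rotation/translation normalization of step S12 can be sketched in code. This is an illustrative reading only: it assumes the transform x̂ = R⁻¹(x − o_R), takes z as the vertical axis, places o_R at the midpoint of the initial skeleton's hips, and uses joint indices 0 and 1 for the hips — none of these specifics are fixed by the claim text.

```python
import math

def normalize_skeleton(frames, lhip=0, rhip=1):
    """Sketch of step S12: x_hat = R^-1 (x - o_R) for every joint of every frame.

    frames: list of frames, each a list of (x, y, z) joints; z is taken as the
    vertical axis and joints lhip/rhip as the hips (assumptions of this sketch).
    o_R is the midpoint of the first frame's hips; R^-1 rotates the initial hip
    difference vector v2 onto the x-axis within the ground plane.
    """
    lx, ly, lz = frames[0][lhip]
    rx, ry, rz = frames[0][rhip]
    ox, oy, oz = (lx + rx) / 2.0, (ly + ry) / 2.0, (lz + rz) / 2.0
    theta = math.atan2(ly - ry, lx - rx)        # heading of v2 = lhip - rhip
    c, s = math.cos(-theta), math.sin(-theta)   # inverse rotation about the vertical
    result = []
    for frame in frames:
        joints = []
        for x, y, z in frame:
            tx, ty, tz = x - ox, y - oy, z - oz  # translate to the rotation origin
            joints.append((c * tx - s * ty, s * tx + c * ty, tz))
        result.append(joints)
    return result
```

After this step the initial hips lie symmetrically about the origin on the x-axis, so every sequence starts from a canonical pose regardless of where the actors stood.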
4. The method for recognizing human skeleton behaviors based on the graph convolution network as claimed in claim 2, wherein in step S30, "for each frame of the skeleton sequence to be recognized, constructing the corresponding intrinsic-dependency connection graph of the human joints according to the joint point coordinates, the joint points being the nodes of the graph and the natural connections between the joint points being the intrinsic-dependency edges of the graph; constructing the extrinsic-dependency edges of each individual person and the interactive-dependency edges between the two persons, the three together forming the human joint connection graph of each frame of the skeleton sequence to be recognized" comprises:
regarding each frame of the two-person interaction as a whole, constructing a graph G(x, W) to model the human bodies in each frame, wherein x ∈ ℝ^(2N×3) contains the three-dimensional coordinates of the 2N joint points, and W is a 2N × 2N weighted adjacency matrix with the block structure

W = [ W_1     W_{1,2} ]
    [ W_{2,1} W_2     ]

wherein (w_{1,2})_{mn} = γ when joint point m of the first person and joint point n of the second person are connected by an interactive-dependency edge, and α, β, γ represent the weights of the corresponding intrinsic-dependency, extrinsic-dependency and interactive-dependency edges, respectively.
5. The method according to claim 2, wherein in step S40, "respectively assigning weights to the three kinds of edges of the joint connection graph corresponding to each frame of the skeleton sequence to be recognized, to obtain the corresponding human joint connection graph with different weight values" comprises:
setting α = 3, β = 1 and γ = 5, so as to emphasize the intrinsic connections within each body and, above all, the interactive connections between the two persons relative to the extrinsic connections.
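Assembling the weighted adjacency matrix of claims 4 and 5 can be sketched as follows. The two-block layout (one diagonal block per person, interactive edges in the off-diagonal blocks) is an assumption consistent with the (w_{1,2})_{mn} notation; the function name and edge-list representation are illustrative.

```python
def build_adjacency(n_joints, intrinsic, extrinsic, interactive,
                    alpha=3.0, beta=1.0, gamma=5.0):
    """Assemble the 2N x 2N weighted adjacency matrix W for two persons.

    Edge lists are (m, n) joint-index pairs. Intrinsic/extrinsic edges are
    placed in both persons' diagonal blocks with weights alpha/beta; an
    interactive edge links joint m of person 1 with joint n of person 2,
    i.e. (w_{1,2})_{mn} = gamma. Block layout is assumed, not claimed.
    """
    size = 2 * n_joints
    W = [[0.0] * size for _ in range(size)]
    for weight, edges in ((alpha, intrinsic), (beta, extrinsic)):
        for m, n in edges:
            for off in (0, n_joints):            # same edges for both persons
                W[m + off][n + off] = W[n + off][m + off] = weight
    for m, n in interactive:                     # cross-person block W_{1,2}
        W[m][n + n_joints] = W[n + n_joints][m] = gamma
    return W
```

With the claim's weights α = 3, β = 1, γ = 5, the interactive edges dominate the graph, which is what lets the convolution propagate information between the two interacting bodies.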
6. The method for recognizing human skeleton behaviors based on the graph convolution network as claimed in claim 2, wherein in step S50, "performing a graph convolution operation on the weighted human joint connection graph corresponding to each frame of the skeleton sequence to be recognized, to obtain the spatial features of the skeleton sequence to be recognized" comprises:
given a video of T frames, constructing the graphs [G_1, G_2, ..., G_T]; for each graph G_t constructed at frame t, inputting it into the graph convolution layer:

f_out = g_η * f_t

wherein * represents the graph convolution operation, g_η represents the graph convolution kernel, and W is the weighted adjacency matrix of the human joint connection graph;

the graph convolution kernel is computed as follows: the normalized graph Laplacian over the spectral domain is L = I_n − D^(−1/2) W D^(−1/2), where D is a diagonal matrix with D_ii = Σ_j w_ij; L is rescaled to L̃ = 2L/λ_max − I_n, wherein λ_max is the maximum eigenvalue of L and T_k is the k-th Chebyshev polynomial; the convolution operation can then be expressed as:

g_η * x = Σ_{k=0}^{K−1} η_k T_k(L̃) x

wherein η = [η_0, η_1, ..., η_{K−1}] are the training parameters and K is the size of the graph convolution kernel.
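The Chebyshev-polynomial graph convolution of claim 6 can be sketched on a single scalar feature per node. One simplification is assumed here: instead of computing the exact λ_max, the sketch uses the standard upper bound λ_max ≤ 2 for a normalized Laplacian, so L̃ = L − I_n; everything else follows the claim's formulas directly.

```python
def cheb_graph_conv(W, x, eta):
    """Sketch of g_eta * x = sum_{k=0}^{K-1} eta_k T_k(L~) x.

    W: weighted adjacency matrix (list of lists), x: one feature per node,
    eta: K trainable kernel coefficients. L = I - D^{-1/2} W D^{-1/2};
    with the lambda_max = 2 bound (an assumption), L~ = L - I reduces to
    -D^{-1/2} W D^{-1/2}. T_k follows the Chebyshev recurrence.
    """
    n = len(W)
    d = [sum(row) for row in W]                       # node degrees D_ii
    dinv = [1.0 / (di ** 0.5) if di > 0 else 0.0 for di in d]
    Lt = [[-dinv[i] * W[i][j] * dinv[j] for j in range(n)] for i in range(n)]

    def matvec(M, v):
        return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]

    t_prev, t_cur = x[:], matvec(Lt, x)               # T_0(L~)x, T_1(L~)x
    out = [eta[0] * xi for xi in x]
    for k in range(1, len(eta)):
        out = [o + eta[k] * t for o, t in zip(out, t_cur)]
        # Chebyshev recurrence: T_{k+1}(L~)x = 2 L~ T_k(L~)x - T_{k-1}(L~)x
        t_next = [2 * a - b for a, b in zip(matvec(Lt, t_cur), t_prev)]
        t_prev, t_cur = t_cur, t_next
    return out
```

Because T_k(L̃) is a degree-k polynomial in the adjacency structure, a kernel of size K aggregates information from each joint's K−1-hop neighborhood in one layer.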
7. The method for recognizing human skeleton behaviors based on the graph convolution network as claimed in claim 2, wherein in step S60, "performing time-sequence modeling in the time dimension based on the spatial features of the skeleton sequence to be recognized, to obtain the behavior category of the skeleton sequence to be recognized" comprises:
flattening the spatial feature information f_t of each frame, obtained by the graph convolution operation, through a fully connected layer, feeding it into a long short-term memory network for time-sequence modeling, and classifying with softmax to obtain the final interactive-behavior recognition result.
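The temporal-modeling stage of claim 7 can be sketched with a toy, hidden-size-1 LSTM cell followed by a softmax head. This is a minimal sketch of the standard LSTM gate equations, not the patent's trained network; gate ordering, parameter names, and scalar sizes are all illustrative, and a real system would use a deep-learning framework's LSTM over the per-frame features f_t.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, Wx, Wh, b):
    """One step of a size-1 LSTM cell over the per-frame feature stream.

    x: current input (the flattened spatial feature f_t of one frame);
    h, c: previous hidden and cell state. Wx, Wh, b hold one scalar per
    gate in the order (input, forget, candidate, output) - illustrative.
    """
    i = sigmoid(Wx[0] * x + Wh[0] * h + b[0])    # input gate
    f = sigmoid(Wx[1] * x + Wh[1] * h + b[1])    # forget gate
    g = math.tanh(Wx[2] * x + Wh[2] * h + b[2])  # candidate cell state
    o = sigmoid(Wx[3] * x + Wh[3] * h + b[3])    # output gate
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new

def softmax(scores):
    """Softmax head turning class scores into the final probabilities."""
    m = max(scores)                              # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Running `lstm_step` across the frame sequence and applying `softmax` to a linear readout of the last hidden state mirrors the flatten-LSTM-softmax pipeline the claim describes.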
CN202010146319.0A 2020-03-05 2020-03-05 Human skeleton behavior recognition method based on graph convolution network Active CN111353447B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010146319.0A CN111353447B (en) 2020-03-05 2020-03-05 Human skeleton behavior recognition method based on graph convolution network

Publications (2)

Publication Number Publication Date
CN111353447A true CN111353447A (en) 2020-06-30
CN111353447B CN111353447B (en) 2023-07-04

Family

ID=71194272

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010146319.0A Active CN111353447B (en) 2020-03-05 2020-03-05 Human skeleton behavior recognition method based on graph convolution network

Country Status (1)

Country Link
CN (1) CN111353447B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329562A (en) * 2020-10-23 2021-02-05 江苏大学 Human body interaction action recognition method based on skeleton features and slice recurrent neural network
CN112668550A (en) * 2021-01-18 2021-04-16 沈阳航空航天大学 Double-person interaction behavior recognition method based on joint point-depth joint attention RGB modal data
CN113128425A (en) * 2021-04-23 2021-07-16 上海对外经贸大学 Semantic self-adaptive graph network method for human action recognition based on skeleton sequence
CN113283400A (en) * 2021-07-19 2021-08-20 成都考拉悠然科技有限公司 Skeleton action identification method based on selective hypergraph convolutional network
CN113792712A (en) * 2021-11-15 2021-12-14 长沙海信智能系统研究院有限公司 Action recognition method, device, equipment and storage medium

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150186713A1 (en) * 2013-12-31 2015-07-02 Konica Minolta Laboratory U.S.A., Inc. Method and system for emotion and behavior recognition
WO2017133009A1 (en) * 2016-02-04 2017-08-10 广州新节奏智能科技有限公司 Method for positioning human joint using depth image of convolutional neural network
CN107301370A (en) * 2017-05-08 2017-10-27 上海大学 A kind of body action identification method based on Kinect three-dimensional framework models
CN108304795A (en) * 2018-01-29 2018-07-20 清华大学 Human skeleton Activity recognition method and device based on deeply study
CN108764107A (en) * 2018-05-23 2018-11-06 中国科学院自动化研究所 Behavior based on human skeleton sequence and identity combination recognition methods and device
CN109376720A (en) * 2018-12-19 2019-02-22 杭州电子科技大学 Classification of motion method based on artis space-time simple cycle network and attention mechanism
CN110045823A (en) * 2019-03-12 2019-07-23 北京邮电大学 A kind of action director's method and apparatus based on motion capture
US20190251340A1 (en) * 2018-02-15 2019-08-15 Wrnch Inc. Method and system for activity classification
CN110197195A (en) * 2019-04-15 2019-09-03 深圳大学 A kind of novel deep layer network system and method towards Activity recognition
CN110222611A (en) * 2019-05-27 2019-09-10 中国科学院自动化研究所 Human skeleton Activity recognition method, system, device based on figure convolutional network
US20200042776A1 (en) * 2018-08-03 2020-02-06 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for recognizing body movement

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
CHENYANG SI et al.: "An Attention Enhanced Graph Convolutional LSTM Network for Skeleton-Based Action Recognition", ARXIV *
LEI SHI et al.: "Skeleton-Based Action Recognition with Multi-Stream Adaptive Graph Convolutional Networks", ARXIV *
WU Xiaoying et al.: "Behavior recognition algorithm based on CNN and bidirectional LSTM", Computer Engineering and Design, no. 02 *
CAO Jiangtao et al.: "Two-person interaction behavior recognition based on fusion of whole-body and individual segmentation", Journal of Liaoning Petrochemical University, vol. 39, no. 06 *
DONG An et al.: "Skeleton-based behavior recognition with graph convolution", Modern Computer, no. 02 *
HAN Lili: "Research on human behavior recognition methods based on LSTM", pages 1 - 3 *

Also Published As

Publication number Publication date
CN111353447B (en) 2023-07-04

Similar Documents

Publication Publication Date Title
CN109344701B (en) Kinect-based dynamic gesture recognition method
CN106919920B (en) Scene recognition method based on convolution characteristics and space vision bag-of-words model
CN111353447B (en) Human skeleton behavior recognition method based on graph convolution network
CN108460356B (en) Face image automatic processing system based on monitoring system
CN108520226B (en) Pedestrian re-identification method based on body decomposition and significance detection
Berretti et al. Representation, analysis, and recognition of 3D humans: A survey
CN108596102B (en) RGB-D-based indoor scene object segmentation classifier construction method
CN109086754A (en) A kind of human posture recognition method based on deep learning
CN111339942A (en) Method and system for recognizing skeleton action of graph convolution circulation network based on viewpoint adjustment
WO2021218238A1 (en) Image processing method and image processing apparatus
CN110458235B (en) Motion posture similarity comparison method in video
CN111914643A (en) Human body action recognition method based on skeleton key point detection
Kovač et al. Frame–based classification for cross-speed gait recognition
CN109815923B (en) Needle mushroom head sorting and identifying method based on LBP (local binary pattern) features and deep learning
CN113011253B (en) Facial expression recognition method, device, equipment and storage medium based on ResNeXt network
CN114419732A (en) HRNet human body posture identification method based on attention mechanism optimization
CN114332911A (en) Head posture detection method and device and computer equipment
CN112906520A (en) Gesture coding-based action recognition method and device
CN114170537A (en) Multi-mode three-dimensional visual attention prediction method and application thereof
CN114663807A (en) Smoking behavior detection method based on video analysis
CN110348395B (en) Skeleton behavior identification method based on space-time relationship
CN112149528A (en) Panorama target detection method, system, medium and equipment
CN114330535A (en) Pattern classification method for learning based on support vector regularization dictionary
CN112270228A (en) Pedestrian re-identification method based on DCCA fusion characteristics
CN113011506A (en) Texture image classification method based on depth re-fractal spectrum network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant