CN111353447A - Human skeleton behavior identification method based on graph convolution network - Google Patents
- Publication number: CN111353447A
- Application number: CN202010146319.0A
- Authority: CN (China)
- Prior art keywords: skeleton, graph, joint, sequence, frame
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/045—Combinations of networks
- G06N3/047—Probabilistic or stochastic networks
- G06N3/08—Learning methods
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
A human skeleton behavior recognition method based on a graph convolutional network, in the field of computer vision and deep learning, comprises: acquiring human skeleton video frames and normalizing them; constructing, for each frame, an intrinsic joint dependency graph, an individual extrinsic dependency graph, and an interaction dependency graph; combining these into a joint connection graph of the whole interacting pair; assigning weights to the edges of each joint graph; applying a graph convolution operation to obtain the spatial features of the skeleton sequence; and performing temporal modeling with a long short-term memory network to obtain the class of the interaction. The intrinsic dependency edges learn basic human behavior features, the extrinsic dependency edges learn complementary behavior features, and the interaction dependency edges better capture the relationship between the two persons, so the motion of two-person interactions is better represented and recognition performance is improved.
Description
Technical Field
The invention belongs to the technical field of computer vision and deep learning, and particularly relates to a human skeleton behavior identification method based on a graph convolution network.
Background
Video-based human behavior recognition and understanding is a frontier of image processing and computer vision. With the fusion of deep learning and computer vision, behavior recognition is widely applied to video analysis, intelligent surveillance, human-computer interaction, augmented reality, video retrieval, and other fields. Compared with single-person actions, two-person interactions are more common in daily life and harder to recognize. Research on two-person interaction is mainly divided into RGB-based and skeleton-joint-based approaches. Conventional RGB video suffers from poor robustness to illumination changes, occlusion, complex backgrounds, and other factors, whereas skeleton joint data contain the compact three-dimensional positions of the main body joints and are robust to changes of viewpoint, body scale, and movement speed. Behavior recognition based on skeleton joint data has therefore received increasing attention in recent years.
Skeleton-based two-person interaction recognition methods fall into two categories: hand-crafted features and deep learning. In the first category, Vemulapalli et al. [1] represent the human skeleton as a point in a Lie group and perform temporal modeling and classification in the Lie algebra, and Weng et al. [2] extend the Naive Bayes Nearest Neighbor (NBNN) method to space-time and classify behaviors using a stage-to-class distance. Such methods require complex feature design, and their recognition accuracy is difficult to improve further. Deep-learning methods can be further divided into CNN-based and RNN-based models. CNN-based methods convert the joint data into images and feed them into the network for learning and classification, but ignore the temporal information in the video. RNN-based methods model temporal information effectively, but ignore the dependencies between joints and the interaction relationship between the two persons. (See [1] Raviteja Vemulapalli, Felipe Arrate, and Rama Chellappa. Human action recognition by representing 3D skeletons as points in a Lie group. In CVPR, pages 588-595, 2014. [2] Junwu Weng, Chaoqun Weng, and Junsong Yuan. Spatio-temporal Naive-Bayes Nearest-Neighbor for skeleton-based action recognition. In CVPR, pages 4171-4180, 2017.)
Recently, with the popularity of graph convolutional networks (GCNs), many researchers have applied GCN methods to behavior recognition. However, current research mainly targets single-person behaviors and mostly adopts the natural connection graph of the human body, ignoring dependencies between joints that are not naturally connected. In existing two-person applications, the two persons are modeled separately as two individuals, ignoring the interaction dependency between them.
Disclosure of Invention
Aiming at the problems and deficiencies of the prior art, the invention provides a two-person interaction recognition method based on a graph convolutional network. The recognition method comprises: acquiring a two-person interaction skeleton video; normalizing the joint coordinates of the acquired video; constructing an intrinsic joint dependency graph, an individual extrinsic dependency graph, and an interaction dependency graph; assigning different weights to the edges of the three joint graphs; feeding the data into a graph convolutional network to learn and extract spatial features; feeding the spatial features of each frame into a long short-term memory network for temporal modeling; and obtaining the recognition result of the interaction class.
The method specifically comprises the following steps:
Step S10, capturing video: start a camera, record two-person interaction videos, collect skeleton videos of various interactions performed by different actors as the training videos of those interactions, label each training video with its interaction meaning, and build a video training set.
Step S20, normalize the preset video frames of the acquired skeleton video to obtain the skeleton sequence to be recognized.
Step S30, for each frame of the skeleton sequence to be recognized, construct the corresponding intrinsic joint dependency graph from the joint coordinates, where the joints are the nodes of the graph and the natural connections between joints are its intrinsic dependency edges; construct the extrinsic dependency edges of each individual and the interaction dependency edges between the two persons; together, the three form the human joint connection graph of each frame of the skeleton sequence.
Step S40, assign weights to the edges of the three joint graphs of each frame to obtain the corresponding weighted human joint connection graph.
Step S50, apply graph convolution to the weighted human joint connection graph of each frame to obtain the spatial features of the skeleton sequence to be recognized.
Step S60, perform temporal modeling over the time dimension on the spatial features to obtain the behavior class of the skeleton sequence to be recognized.
Further, "normalize the preset video frames of the acquired skeleton video to obtain the skeleton sequence to be recognized" proceeds as follows:
Step S11, sample the acquired raw skeleton video at preset equal intervals as the training and recognition skeleton sequences.
Step S12, apply rotation, translation, and scale normalization to the joint coordinates of each frame of the obtained sequence to obtain the skeleton sequence to be recognized, specifically:
x̃_i^t = R^(-1)(x_i^t − o_R), i ∈ J, t ∈ T,
where x_i^t is the i-th joint coordinate of the t-th acquired frame, J and T are the sets of joints and acquired frames, and x̃_i^t is the processed coordinate.
The rotation matrix R and rotation origin o_R are defined as
R = [ v_2 − proj_{v_1}(v_2), v_1, v_1 × v_2 ], o_R = (x_lhip + x_rhip)/2,
where v_1 is the vector perpendicular to the ground, v_2 is the difference vector of the left and right hip joints of the initial skeleton in each sequence, proj_{v_1}(v_2) and v_1 × v_2 are the projection of v_2 onto v_1 and the cross product of the two vectors, and x_lhip and x_rhip are the coordinates of the left and right hip joints of the initial skeleton of each sequence.
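As a concrete illustration of Step S12, the NumPy sketch below applies the rotation/translation normalization described above. It is not the patented implementation: the column order of R and the use of the hip midpoint as the origin o_R are assumptions consistent with the stated quantities v_1, v_2, and the hip coordinates, and the joint indices `l_hip` and `r_hip` are illustrative.

```python
import numpy as np

def normalize_sequence(seq, l_hip=0, r_hip=1):
    """seq: (T, J, 3) joint coordinates; returns the rotated/translated sequence.

    A sketch of the normalization recipe: v1 is the ground-normal direction,
    v2 the left-to-right hip vector of the initial skeleton; R's columns form
    an orthonormal frame built from them, and every joint is expressed in
    that frame relative to the assumed hip-centre origin o_R.
    """
    v1 = np.array([0.0, 1.0, 0.0])               # vector perpendicular to the ground
    v2 = seq[0, r_hip] - seq[0, l_hip]           # hip difference vector, initial frame
    v2 = v2 - v2.dot(v1) * v1                    # subtract the projection onto v1
    v2 /= np.linalg.norm(v2)
    v3 = np.cross(v1, v2)                        # completes the orthonormal frame
    R = np.stack([v2, v1, v3], axis=1)           # rotation matrix (orthonormal columns)
    o_R = 0.5 * (seq[0, l_hip] + seq[0, r_hip])  # rotation origin (assumed hip midpoint)
    return (seq - o_R) @ R                       # x' = R^{-1}(x - o_R), since R^{-1} = R^T
```

After normalization the hip midpoint of the initial skeleton sits at the origin and the hip axis aligns with the first coordinate direction, which is what makes the sequences viewpoint-invariant.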
Further, for each frame of the skeleton sequence to be recognized, the corresponding intrinsic joint dependency graph is constructed from the joint coordinates, where the joints are the nodes of the graph and the natural connections between joints are its intrinsic dependency edges; the extrinsic dependency edges of each individual and the interaction dependency edges between the two persons are constructed; together, the three form the human joint connection graph of each frame. The method is as follows:
Each frame of the two-person interaction is regarded as a whole, and a graph G(x, W) is constructed to model the bodies in that frame, where x ∈ R^(2N×3) contains the three-dimensional coordinates of the 2N joints and W is a 2N × 2N weighted adjacency matrix with entries
w_mn = α if joints m and n of the same person are naturally connected (intrinsic dependency edge),
w_mn = β if joints m and n of the same person are not naturally connected (extrinsic dependency edge),
(w_{1,2})_mn = γ if m is a joint of the first person and n a joint of the second person (interaction dependency edge),
w_mn = 0 otherwise,
where α, β, γ are the weights of the corresponding intrinsic, extrinsic, and interaction dependencies, respectively.
Further, "assign weights to the edges of the three joint graphs of each frame to obtain the corresponding weighted human joint connection graphs" proceeds as follows:
The weights are set to α = 3, β = 1, γ = 5, so that the intrinsic connections are emphasized, the extrinsic connections are attached with a small weight, and the interaction connections are highlighted.
Further, "apply graph convolution to the weighted human joint connection graph of each frame to obtain the spatial features of the skeleton sequence to be recognized" proceeds as follows:
y_t = g_η * G_t,
where * denotes the graph convolution operation, g_η is the graph convolution kernel, and W is the weighted adjacency matrix of the human joint connection graph.
The graph convolution kernel is computed as follows. The normalized graph Laplacian over the spectral domain is L = I_n − D^(−1/2) W D^(−1/2), where D is the diagonal degree matrix with D_ii = Σ_j w_ij. L is rescaled to L̃ = 2L/λ_max − I_n, where λ_max is the maximum eigenvalue of L and T_k denotes the k-th Chebyshev polynomial. The convolution operation can then be expressed as
y = Σ_{k=0}^{K−1} η_k T_k(L̃) x,
where η = [η_0, η_1, ..., η_{K−1}] are the training parameters and K is the size of the graph convolution kernel.
Further, "based on the spatial features of the skeleton sequence to be recognized, perform temporal modeling to obtain the behavior class of the skeleton sequence" proceeds as follows:
The spatial features of each frame obtained by the graph convolution are flattened through a fully connected layer, fed into a long short-term memory network for temporal modeling, and classified with softmax to obtain the final interaction recognition result.
The advantages and effects of the invention are as follows:
The two-person interaction recognition method based on a graph convolutional network constructs a weighted joint connection graph that incorporates the two-person interaction dependency, uses a graph convolutional network to obtain discriminative spatial features of the interaction, and feeds them into a long short-term memory network to model the dynamic temporal relationship, thereby improving recognition accuracy.
Drawings
FIG. 1 is a schematic flowchart of the two-person interaction recognition method based on a graph convolutional network according to the invention;
FIG. 2 is a schematic diagram of the intrinsic joint dependency graph, the extrinsic dependency graph, and the interaction dependency graph constructed by the invention;
FIG. 3 is an algorithmic flow chart of the present invention;
FIG. 4 is a diagram of an LSTM module cell;
FIG. 5 is a confusion matrix of results of the test of the NTU RGB + D data set according to the present invention.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
The invention discloses a two-person interaction recognition method based on a graph convolutional network, comprising the following steps:
Step S10, capturing video: start a camera, record two-person interaction videos, collect skeleton videos of various interactions performed by different actors as the training videos of those interactions, label each training video with its interaction meaning, and build a video training set.
Step S20, normalize the preset video frames of the acquired skeleton video to obtain the skeleton sequence to be recognized.
Step S30, for each frame of the skeleton sequence to be recognized, construct the corresponding intrinsic joint dependency graph from the joint coordinates, where the joints are the nodes of the graph and the natural connections between joints are its intrinsic dependency edges; construct the extrinsic dependency edges of each individual and the interaction dependency edges between the two persons; together, the three form the human joint connection graph of each frame of the skeleton sequence.
Step S40, assign weights to the edges of the three joint graphs of each frame to obtain the corresponding weighted human joint connection graph.
Step S50, apply graph convolution to the weighted human joint connection graph of each frame to obtain the spatial features of the skeleton sequence to be recognized.
Step S60, perform temporal modeling over the time dimension on the spatial features to obtain the behavior class of the skeleton sequence to be recognized.
In order to describe the two-person interaction recognition method based on a graph convolutional network more clearly, the steps of an embodiment of the method are described in detail below with reference to FIG. 1.
Step S10, capturing video: start a camera, record two-person interaction videos, collect skeleton videos of various interactions performed by different actors as the training videos of those interactions, label each training video with its interaction meaning, and build a video training set.
With the development of image-processing technology, a Microsoft Kinect camera can directly capture skeleton videos of the two interacting persons, and the corresponding joint data are stored.
Step S20, normalize the preset video frames of the acquired skeleton video to obtain the skeleton sequence to be recognized.
Because the performers and camera viewpoints vary across recordings, the captured data are normalized in the data-processing stage, specifically:
x̃_i^t = R^(-1)(x_i^t − o_R), i ∈ J, t ∈ T,
where x_i^t is the i-th joint coordinate of the t-th acquired frame, J and T are the sets of joints and acquired frames, and x̃_i^t is the processed coordinate.
The rotation matrix R and rotation origin o_R are defined as
R = [ v_2 − proj_{v_1}(v_2), v_1, v_1 × v_2 ], o_R = (x_lhip + x_rhip)/2,
where v_1 is the vector perpendicular to the ground, v_2 is the difference vector of the left and right hip joints of the initial skeleton in each sequence, proj_{v_1}(v_2) and v_1 × v_2 are the projection of v_2 onto v_1 and the cross product of the two vectors, and x_lhip and x_rhip are the coordinates of the left and right hip joints of the initial skeleton of each sequence.
Step S30, for each frame of the skeleton sequence to be recognized, construct the corresponding intrinsic joint dependency graph from the joint coordinates, where the joints are the nodes of the graph and the natural connections between joints are its intrinsic dependency edges; construct the extrinsic dependency edges of each individual and the interaction dependency edges between the two persons; together, the three form the human joint connection graph of each frame. The method is as follows:
Each frame of the two-person interaction is regarded as a whole, and a graph G(x, W) is constructed to model the bodies in that frame, where x ∈ R^(2N×3) contains the three-dimensional coordinates of the 2N joints and W is a 2N × 2N weighted adjacency matrix with entries
w_mn = α if joints m and n of the same person are naturally connected (intrinsic dependency edge),
w_mn = β if joints m and n of the same person are not naturally connected (extrinsic dependency edge),
(w_{1,2})_mn = γ if m is a joint of the first person and n a joint of the second person (interaction dependency edge),
w_mn = 0 otherwise,
where α, β, γ are the weights of the corresponding intrinsic, extrinsic, and interaction dependencies, respectively.
Step S40, assign weights to the edges of the three joint graphs of each frame to obtain the corresponding weighted human joint connection graphs:
The weights are set to α = 3, β = 1, γ = 5, so that the intrinsic connections are emphasized, the extrinsic connections are attached with a small weight, and the interaction connections are highlighted.
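The construction of the 2N × 2N weighted adjacency matrix W can be sketched as follows. This is an illustrative construction, not the patent's code: the edge lists and joint count are placeholders (the embodiment uses the 25-joint NTU skeleton), while the default weights follow α = 3, β = 1, γ = 5.

```python
import numpy as np

def build_weighted_adjacency(n_joints, natural_edges, extra_edges, inter_edges,
                             alpha=3.0, beta=1.0, gamma=5.0):
    """Build the 2N x 2N weighted adjacency matrix for one two-person frame.

    natural_edges / extra_edges index joints of a single person (0..n_joints-1)
    and are replicated for both persons; inter_edges pair a joint of person 1
    with a joint of person 2 (both given in 0..n_joints-1).
    """
    W = np.zeros((2 * n_joints, 2 * n_joints))
    for p in (0, n_joints):                        # both persons share the same layout
        for m, n in natural_edges:                 # intrinsic dependency edges: alpha
            W[p + m, p + n] = W[p + n, p + m] = alpha
        for m, n in extra_edges:                   # extrinsic dependency edges: beta
            W[p + m, p + n] = W[p + n, p + m] = beta
    for m, n in inter_edges:                       # interaction dependency edges: gamma
        W[m, n_joints + n] = W[n_joints + n, m] = gamma
    return W
```

Keeping W symmetric means the graph is undirected, which matches the Laplacian normalization used in the next step.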
Step S50, carrying out graph convolution operation on the human body joint connection graph with different weight values corresponding to each frame graph of the skeleton sequence to be recognized, and obtaining the spatial characteristics of the skeleton sequence to be recognized:
given a video of T frames, constructing a graph [ G ] according to the method of claim 31,G2,...,GT]For each graph G constructed at t framesTIt is input into the graph volume layer:
y_t = g_η * G_t,
where * denotes the graph convolution operation, g_η is the graph convolution kernel, and W is the weighted adjacency matrix of the human joint connection graph.
The graph convolution kernel is computed as follows:
The normalized graph Laplacian over the spectral domain is L = I_n − D^(−1/2) W D^(−1/2), where D is the diagonal degree matrix with D_ii = Σ_j w_ij. L is rescaled to L̃ = 2L/λ_max − I_n, where λ_max is the maximum eigenvalue of L and T_k denotes the k-th Chebyshev polynomial. The convolution operation can then be expressed as
y = Σ_{k=0}^{K−1} η_k T_k(L̃) x,
where η = [η_0, η_1, ..., η_{K−1}] are the training parameters and K is the size of the graph convolution kernel.
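The Chebyshev graph convolution above can be sketched in NumPy as follows; the function name and the dense eigenvalue computation are illustrative simplifications (a practical implementation would bound λ_max cheaply and use sparse matrices).

```python
import numpy as np

def cheb_graph_conv(x, W, eta):
    """Spectral graph convolution y = sum_k eta_k T_k(~L) x.

    x: (nodes, channels) signal; W: weighted adjacency matrix;
    eta: (K,) Chebyshev kernel coefficients.
    """
    d = W.sum(axis=1)                                   # degrees D_ii
    d_inv_sqrt = np.where(d > 0, d, 1.0) ** -0.5
    n = len(W)
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    lam_max = np.linalg.eigvalsh(L).max()               # largest eigenvalue of L
    L_tilde = 2.0 * L / lam_max - np.eye(n)             # rescaled Laplacian
    Tk_prev, Tk = np.eye(n), L_tilde                    # T_0 = I, T_1 = ~L
    y = eta[0] * (Tk_prev @ x)
    if len(eta) > 1:
        y += eta[1] * (Tk @ x)
    for k in range(2, len(eta)):                        # T_k = 2~L T_{k-1} - T_{k-2}
        Tk_prev, Tk = Tk, 2.0 * L_tilde @ Tk - Tk_prev
        y += eta[k] * (Tk @ x)
    return y
```

The Chebyshev recurrence keeps the filter K-localized on the graph: each polynomial order only mixes information between nodes at most K − 1 edges apart.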
Step S60, performing convolution operation in the time dimension based on the spatial features of the skeleton sequence to be recognized, to obtain a behavior category of the skeleton sequence to be recognized:
The spatial feature f_t of each frame obtained by the graph convolution is flattened through a fully connected layer, fed into a long short-term memory network for temporal modeling, and classified with softmax to obtain the final interaction recognition result.
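A minimal NumPy sketch of Step S60, assuming random stand-in weights rather than trained parameters: the per-frame spatial features are passed through an LSTM cell unrolled over time, and the final hidden state is classified with softmax.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_classify(feats, hidden, n_classes, rng=np.random.default_rng(0)):
    """feats: (T, d) per-frame features; returns (n_classes,) probabilities.

    The weight matrices below are random stand-ins for illustration; in the
    method they would be learned jointly with the graph convolution layers.
    """
    d = feats.shape[1]
    Wx = rng.normal(0, 0.1, (4 * hidden, d))       # input weights for gates i, f, g, o
    Wh = rng.normal(0, 0.1, (4 * hidden, hidden))  # recurrent weights
    Wy = rng.normal(0, 0.1, (n_classes, hidden))   # output (classification) layer
    h = c = np.zeros(hidden)
    for f_t in feats:                              # unroll the LSTM over the sequence
        z = Wx @ f_t + Wh @ h
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    logits = Wy @ h
    e = np.exp(logits - logits.max())              # numerically stable softmax
    return e / e.sum()
```

Only the last hidden state feeds the classifier here; averaging the per-step outputs is an equally common design choice for sequence classification.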
The dataset used to validate the algorithm is introduced here. The NTU RGB+D dataset is currently the largest skeleton-based behavior recognition dataset, with 56,000 sequences and 4 million frames covering 60 action classes, each skeleton having 25 joints; it contains both single-person and two-person actions. This embodiment uses the 11 two-person interaction classes of NTU RGB+D as its dataset.
The dataset defines two evaluation protocols: cross-subject (CS) and cross-view (CV). The proposed method is evaluated here under the CV protocol.
Under the CV protocol, data captured by cameras 2 and 3 are used for training and data captured by camera 1 for testing. The final recognition rate is 88%, a remarkable recognition result. The confusion matrix is shown in FIG. 5.
The technical solutions of the invention have thus been described with reference to the preferred embodiments shown in the drawings, but those skilled in the art will readily understand that the scope of the invention is obviously not limited to these specific embodiments. Those skilled in the art may make equivalent changes or substitutions of the relevant technical features without departing from the principle of the invention, and the technical solutions obtained by such changes or substitutions fall within the protection scope of the invention.
Claims (7)
1. A human skeleton behavior recognition method based on a graph convolutional network, characterized in that the recognition method comprises: acquiring a two-person interaction skeleton video; normalizing the joint coordinates of the acquired video; constructing an intrinsic joint dependency graph, an individual extrinsic dependency graph, and an interaction dependency graph; assigning different weights to the edges of the three joint graphs; feeding the data into a graph convolutional network to learn and extract spatial features; feeding the spatial features of each frame into a long short-term memory network for temporal modeling; and obtaining the recognition result of the interaction class.
2. The human skeleton behavior recognition method based on a graph convolutional network according to claim 1, characterized in that the recognition method specifically comprises the following steps:
step S10, capturing video: starting a camera, recording two-person interaction videos, collecting skeleton videos of various interactions performed by different actors as the training videos of those interactions, labeling each training video with its interaction meaning, and building a video training set;
step S20, normalizing the preset video frames of the acquired skeleton video to obtain the skeleton sequence to be recognized;
step S30, for each frame of the skeleton sequence to be recognized, constructing the corresponding intrinsic joint dependency graph from the joint coordinates, where the joints are the nodes of the graph and the natural connections between joints are its intrinsic dependency edges; constructing the extrinsic dependency edges of each individual and the interaction dependency edges between the two persons, the three together forming the human joint connection graph of each frame;
step S40, assigning weights to the edges of the three joint graphs of each frame to obtain the corresponding weighted human joint connection graph;
step S50, applying graph convolution to the weighted human joint connection graph of each frame to obtain the spatial features of the skeleton sequence to be recognized; and
step S60, performing temporal modeling over the time dimension based on the spatial features to obtain the behavior class of the skeleton sequence to be recognized.
3. The human skeleton behavior recognition method based on a graph convolutional network according to claim 2, characterized in that in step S20, "normalizing the preset video frames of the acquired skeleton video to obtain the skeleton sequence to be recognized" comprises:
step S11, sampling the acquired raw skeleton video at preset equal intervals as the training and recognition skeleton sequences;
step S12, applying rotation, translation, and scale normalization to the joint coordinates of each frame of the obtained sequence to obtain the skeleton sequence to be recognized, specifically:
x̃_i^t = R^(-1)(x_i^t − o_R), i ∈ J, t ∈ T,
where x_i^t is the i-th joint coordinate of the t-th acquired frame, J and T are the sets of joints and acquired frames, and x̃_i^t is the processed coordinate; the rotation matrix R and rotation origin o_R are defined as
R = [ v_2 − proj_{v_1}(v_2), v_1, v_1 × v_2 ], o_R = (x_lhip + x_rhip)/2,
where v_1 is the vector perpendicular to the ground, v_2 is the difference vector of the left and right hip joints of the initial skeleton in each sequence, proj_{v_1}(v_2) and v_1 × v_2 are the projection of v_2 onto v_1 and the cross product of the two vectors, and x_lhip and x_rhip are the coordinates of the left and right hip joints of the initial skeleton of each sequence.
4. The method for recognizing human body skeleton behavior based on graph convolution network as claimed in claim 2, wherein in step S30, "for each frame of graph in the skeleton sequence to be recognized, a corresponding human body joint intrinsic dependence connection graph is constructed according to joint point coordinates, joint points are nodes of the graph, and natural connections between joint points are intrinsic dependence connection edges of the graph; the method comprises the following steps of constructing respective external dependence connecting edges of a single person and interactive dependence connecting edges of two persons, and forming a human body joint connection graph of each frame of a skeleton sequence to be recognized together by the three parts, wherein the method comprises the following steps:
regarding the two interacting persons in each frame as a whole, a graph G(X, W) is constructed to model the human bodies of each frame, where X ∈ R^(2N×3) contains the three-dimensional coordinates of the 2N joints and W is a 2N × 2N weighted adjacency matrix whose entries take the value α for an intrinsic-dependence edge, β for an extrinsic-dependence edge, and γ for an interactive-dependence edge; in particular, (W_(1,2))_mn = γ when joint m of the first person and joint n of the second person are connected by an interactive-dependence edge;
α, β and γ represent the weights of the corresponding intrinsic, extrinsic and interactive dependencies, respectively.
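The construction of the 2N × 2N weighted adjacency matrix W can be sketched as follows; the edge lists and default weights (α = 3, β = 1, γ = 5, following claim 5) are illustrative assumptions:

```python
import numpy as np

def build_adjacency(n_joints, intrinsic, extrinsic, interactive,
                    alpha=3.0, beta=1.0, gamma=5.0):
    """Build the 2N x 2N weighted adjacency matrix of the two-person graph.

    intrinsic / extrinsic: edge lists (i, j) over one person's N joints,
    applied identically to both persons; interactive: edges (m, n) linking
    joint m of person 1 to joint n of person 2.
    """
    W = np.zeros((2 * n_joints, 2 * n_joints))
    for offset in (0, n_joints):           # same edges for both persons
        for i, j in intrinsic:
            W[offset + i, offset + j] = W[offset + j, offset + i] = alpha
        for i, j in extrinsic:
            W[offset + i, offset + j] = W[offset + j, offset + i] = beta
    for m, n in interactive:               # cross-person edges
        W[m, n_joints + n] = W[n_joints + n, m] = gamma
    return W
```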
5. The method according to claim 2, wherein in step S40, "assigning weights to the edges of the three joint connection graphs corresponding to each frame graph of the skeleton sequence to be recognized, to obtain the corresponding human joint connection graphs with different weight values" comprises:
setting α = 3, β = 1 and γ = 5, so that intrinsic connections are weighted more heavily than extrinsic ones and interactive connections are emphasized most strongly.
6. The human skeleton behavior recognition method based on a graph convolution network according to claim 2, wherein in step S50, "performing a graph convolution operation on the human joint connection graphs with different weight values corresponding to each frame graph of the skeleton sequence to be recognized, to obtain the spatial features of the skeleton sequence to be recognized" comprises:
given a video of T frames, constructing the graph sequence [G_1, G_2, ..., G_T]; each graph G_t constructed at frame t is input into the graph convolution layer: f_t = g_θ * G_t,
where * denotes the graph convolution operation, g_θ denotes the graph convolution kernel, and W is the weighted adjacency matrix of the human joint connection graph;
the graph convolution kernel is computed as follows:
graph Laplacian normalization over the spectral domain: L = I_n − D^(−1/2) W D^(−1/2), where D is the diagonal degree matrix with D_ii = Σ_j w_ij; L is rescaled as L̃ = 2L/λ_max − I_n, where λ_max is the largest eigenvalue of L; with T_k denoting the Chebyshev polynomials, the convolution operation can be expressed as g_θ * x = Σ_(k=0)^(K−1) θ_k T_k(L̃) x.
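The spectral graph convolution with a Chebyshev kernel described above can be sketched in a few lines; this is a minimal single-channel illustration (the `theta` coefficients would be learned parameters in the actual network):

```python
import numpy as np

def chebyshev_gconv(x, W, theta):
    """Spectral graph convolution with a Chebyshev polynomial kernel.

    x: (N, C) node features; W: (N, N) weighted adjacency matrix;
    theta: (K,) Chebyshev coefficients. Computes
    g_theta * x = sum_k theta_k T_k(L_tilde) x, with
    L = I - D^{-1/2} W D^{-1/2} and L_tilde = 2 L / lambda_max - I.
    """
    n = W.shape[0]
    d = W.sum(axis=1)
    # Guard isolated nodes (degree 0) against division by zero.
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(np.where(d > 0, d, 1.0)), 0.0)
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    lam_max = np.linalg.eigvalsh(L).max()
    L_t = 2.0 * L / lam_max - np.eye(n)
    # Chebyshev recurrence: T0 x = x, T1 x = L_t x,
    # Tk x = 2 L_t T_{k-1} x - T_{k-2} x.
    Tk_prev, Tk = x, L_t @ x
    out = theta[0] * Tk_prev
    for k in range(1, len(theta)):
        out = out + theta[k] * Tk
        Tk_prev, Tk = Tk, 2.0 * L_t @ Tk - Tk_prev
    return out
```

The recurrence keeps the filter K-localized on the graph without an explicit eigendecomposition of every filter application.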
7. The human skeleton behavior recognition method based on a graph convolution network according to claim 2, wherein in step S60, "performing a convolution operation in the time dimension based on the spatial features of the skeleton sequence to be recognized, to obtain the behavior category of the skeleton sequence to be recognized" comprises:
the spatial feature f_t of each frame obtained by the graph convolution operation is flattened through a fully connected layer, fed into a long short-term memory (LSTM) network for temporal modeling, and classified with softmax to obtain the final interactive behavior recognition result.
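The LSTM-plus-softmax temporal stage of step S60 can be sketched with a minimal single-layer LSTM in plain numpy; the parameter shapes and gate ordering (`i, f, o, g`) are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_softmax_classify(feats, params):
    """Run per-frame spatial features through one LSTM layer, then
    classify the final hidden state with a softmax layer.

    feats: (T, D) per-frame features; params: dict with LSTM weights
    W (4H x D), U (4H x H), b (4H,) and classifier Wc (C x H), bc (C,).
    """
    W, U, b = params["W"], params["U"], params["b"]
    H = U.shape[1]
    h = np.zeros(H)
    c = np.zeros(H)
    for x_t in feats:
        z = W @ x_t + U @ h + b
        i, f, o, g = z[:H], z[H:2*H], z[2*H:3*H], z[3*H:]
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g          # update cell state
        h = o * np.tanh(c)         # update hidden state
    logits = params["Wc"] @ h + params["bc"]
    e = np.exp(logits - logits.max())
    return e / e.sum()             # class probabilities
```

In practice the real model would be trained end-to-end (e.g. with an autodiff framework); this sketch only shows the forward pass of the classification stage.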
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010146319.0A CN111353447B (en) | 2020-03-05 | 2020-03-05 | Human skeleton behavior recognition method based on graph convolution network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111353447A true CN111353447A (en) | 2020-06-30 |
CN111353447B CN111353447B (en) | 2023-07-04 |
Family
ID=71194272
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010146319.0A Active CN111353447B (en) | 2020-03-05 | 2020-03-05 | Human skeleton behavior recognition method based on graph convolution network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111353447B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150186713A1 (en) * | 2013-12-31 | 2015-07-02 | Konica Minolta Laboratory U.S.A., Inc. | Method and system for emotion and behavior recognition |
WO2017133009A1 (en) * | 2016-02-04 | 2017-08-10 | 广州新节奏智能科技有限公司 | Method for positioning human joint using depth image of convolutional neural network |
CN107301370A (en) * | 2017-05-08 | 2017-10-27 | 上海大学 | A kind of body action identification method based on Kinect three-dimensional framework models |
CN108304795A (en) * | 2018-01-29 | 2018-07-20 | 清华大学 | Human skeleton Activity recognition method and device based on deeply study |
CN108764107A (en) * | 2018-05-23 | 2018-11-06 | 中国科学院自动化研究所 | Behavior based on human skeleton sequence and identity combination recognition methods and device |
CN109376720A (en) * | 2018-12-19 | 2019-02-22 | 杭州电子科技大学 | Classification of motion method based on artis space-time simple cycle network and attention mechanism |
CN110045823A (en) * | 2019-03-12 | 2019-07-23 | 北京邮电大学 | A kind of action director's method and apparatus based on motion capture |
US20190251340A1 (en) * | 2018-02-15 | 2019-08-15 | Wrnch Inc. | Method and system for activity classification |
CN110197195A (en) * | 2019-04-15 | 2019-09-03 | 深圳大学 | A kind of novel deep layer network system and method towards Activity recognition |
CN110222611A (en) * | 2019-05-27 | 2019-09-10 | 中国科学院自动化研究所 | Human skeleton Activity recognition method, system, device based on figure convolutional network |
US20200042776A1 (en) * | 2018-08-03 | 2020-02-06 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and apparatus for recognizing body movement |
Non-Patent Citations (6)
Title |
---|
CHENYANG SI et al.: "An Attention Enhanced Graph Convolutional LSTM Network for Skeleton-Based Action Recognition", arXiv *
LEI SHI et al.: "Skeleton-Based Action Recognition with Multi-Stream Adaptive Graph Convolutional Networks", arXiv *
WU Xiaoying et al.: "Action recognition algorithm based on CNN and bidirectional LSTM", Computer Engineering and Design, no. 02 *
CAO Jiangtao et al.: "Two-person interaction behavior recognition based on fusion of holistic and individual segmentation", Journal of Liaoning Petrochemical University, vol. 39, no. 06 *
DONG An et al.: "Skeleton-based action recognition based on graph convolution", Modern Computer, no. 02 *
HAN Lili: "Research on human action recognition method based on LSTM", pages 1-3 *
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112329562A (en) * | 2020-10-23 | 2021-02-05 | 江苏大学 | Human body interaction action recognition method based on skeleton features and slice recurrent neural network |
CN112329562B (en) * | 2020-10-23 | 2024-05-14 | 江苏大学 | Human interactive action recognition method based on skeleton characteristics and slicing recurrent neural network |
CN112668550A (en) * | 2021-01-18 | 2021-04-16 | 沈阳航空航天大学 | Double-person interaction behavior recognition method based on joint point-depth joint attention RGB modal data |
CN112668550B (en) * | 2021-01-18 | 2023-12-19 | 沈阳航空航天大学 | Double interaction behavior recognition method based on joint point-depth joint attention RGB modal data |
CN113128425A (en) * | 2021-04-23 | 2021-07-16 | 上海对外经贸大学 | Semantic self-adaptive graph network method for human action recognition based on skeleton sequence |
CN113283400A (en) * | 2021-07-19 | 2021-08-20 | 成都考拉悠然科技有限公司 | Skeleton action identification method based on selective hypergraph convolutional network |
CN113283400B (en) * | 2021-07-19 | 2021-11-12 | 成都考拉悠然科技有限公司 | Skeleton action identification method based on selective hypergraph convolutional network |
CN113792712A (en) * | 2021-11-15 | 2021-12-14 | 长沙海信智能系统研究院有限公司 | Action recognition method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN111353447B (en) | 2023-07-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109344701B (en) | Kinect-based dynamic gesture recognition method | |
CN106919920B (en) | Scene recognition method based on convolution characteristics and space vision bag-of-words model | |
CN111353447B (en) | Human skeleton behavior recognition method based on graph convolution network | |
CN108460356B (en) | Face image automatic processing system based on monitoring system | |
CN108520226B (en) | Pedestrian re-identification method based on body decomposition and significance detection | |
Berretti et al. | Representation, analysis, and recognition of 3D humans: A survey | |
CN108596102B (en) | RGB-D-based indoor scene object segmentation classifier construction method | |
CN109086754A (en) | A kind of human posture recognition method based on deep learning | |
CN111339942A (en) | Method and system for recognizing skeleton action of graph convolution circulation network based on viewpoint adjustment | |
WO2021218238A1 (en) | Image processing method and image processing apparatus | |
CN110458235B (en) | Motion posture similarity comparison method in video | |
CN111914643A (en) | Human body action recognition method based on skeleton key point detection | |
Kovač et al. | Frame–based classification for cross-speed gait recognition | |
CN109815923B (en) | Needle mushroom head sorting and identifying method based on LBP (local binary pattern) features and deep learning | |
CN113011253B (en) | Facial expression recognition method, device, equipment and storage medium based on ResNeXt network | |
CN114419732A (en) | HRNet human body posture identification method based on attention mechanism optimization | |
CN114332911A (en) | Head posture detection method and device and computer equipment | |
CN112906520A (en) | Gesture coding-based action recognition method and device | |
CN114170537A (en) | Multi-mode three-dimensional visual attention prediction method and application thereof | |
CN114663807A (en) | Smoking behavior detection method based on video analysis | |
CN110348395B (en) | Skeleton behavior identification method based on space-time relationship | |
CN112149528A (en) | Panorama target detection method, system, medium and equipment | |
CN114330535A (en) | Pattern classification method for learning based on support vector regularization dictionary | |
CN112270228A (en) | Pedestrian re-identification method based on DCCA fusion characteristics | |
CN113011506A (en) | Texture image classification method based on depth re-fractal spectrum network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |