CN113158970B - Action identification method and system based on fast and slow dual-flow graph convolutional neural network


Info

Publication number
CN113158970B
CN113158970B (application CN202110510781.9A)
Authority
CN
China
Prior art keywords
branch
fast
features
slow
convolution
Prior art date
Legal status
Active
Application number
CN202110510781.9A
Other languages
Chinese (zh)
Other versions
CN113158970A (en)
Inventor
高跃 (Gao Yue)
陈自强 (Chen Ziqiang)
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202110510781.9A
Publication of CN113158970A
Application granted
Publication of CN113158970B


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 - Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/047 - Probabilistic or stochastic networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/049 - Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an action recognition method and system based on a fast-slow dual-flow graph convolutional neural network, wherein the method comprises the following steps: acquiring human skeleton joint features; regularizing the human skeleton joint features, deforming the shape of one batch of the features in the process; copying the processed human skeleton joint features to generate two identical copies, which are input respectively to the fast branch and the slow branch of a fast-slow dual-flow graph convolution network for feature learning; and eliminating dimensions of the features of each action category through a global pooling layer, mapping the pooled features to the corresponding action categories through a fully connected layer, and obtaining the score of each action category through a Softmax function. The method addresses the weak temporal modeling of the prior art and better captures timing information and fast and slow motion information.

Description

Action identification method and system based on fast and slow dual-flow graph convolutional neural network
Technical Field
The invention relates to the technical field of action recognition based on skeleton information, and in particular to an action recognition method and system based on a fast and slow dual-flow graph convolutional neural network.
Background
In the task of motion recognition based on skeletal information, methods based on graph convolutional neural networks are currently the mainstream. The graph convolutional neural network was designed for feature extraction on a single static graph structure and is weak at extracting temporal information. Human skeleton information is time-series, continuous graph-structured data and can also be regarded as dynamic graph data. For motion recognition, capturing only the spatial structure of the static graph (single-frame skeleton information) while ignoring temporal information cannot achieve satisfactory performance. In general, for actions that can be distinguished from a single static frame, graph-convolution-based methods perform well; other actions, however, look similar to one another in static frames and can only be distinguished with the help of temporal motion information, which requires the model to have stronger temporal modeling capability.
Many current graph-convolution-based methods focus their design on capturing spatial structure information, improving model performance by defining adaptive adjacency matrices, new graph-structure modeling methods, new node connections, and so on. Compared with ST-GCN, which first applied GCNs to the task of human skeleton action recognition, these methods achieve certain performance improvements. For modeling temporal information, however, they simply follow the two-dimensional convolution used by ST-GCN and offer little improvement.
In RGB-video-based methods, modeling temporal information and its interaction with spatial information has long been an important issue; researchers have modeled motion information using an optical-flow modality or modeled temporal and spatial information jointly with 3D convolutional networks. In recent years, the convolutional-neural-network-based method SlowFast has achieved great success in RGB-video-based action recognition.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, the invention provides an action recognition method based on a fast and slow dual-flow graph convolutional neural network. Designed on the basis of graph convolutional neural networks, it uses a fast-slow dual-flow structure to better capture timing information and fast and slow motion information, thereby improving the accuracy of action recognition.
The second purpose of the invention is to provide an action recognition system based on a fast and slow dual-flow graph convolutional neural network.
A third object of the invention is to propose a computer device.
A fourth object of the invention is to propose a non-transitory computer-readable storage medium.
In order to achieve the above object, an embodiment of a first aspect of the present invention provides a method for identifying an action based on a fast-slow dual-flow graph convolutional neural network, including the following steps:
step S10, obtaining human skeleton joint characteristics;
step S20, regularization processing is carried out on the human body skeleton joint features, wherein the shapes of a batch of the human body skeleton joint features are deformed, a one-dimensional regularization module is used for regularizing time sequence dimension, and the shapes of a batch of the human body skeleton joint features are deformed into the original shapes again;
step S30, copying the human body skeleton joint features processed in the step S20 to generate two identical human body skeleton joint features, inputting the two identical human body skeleton joint features to a fast branch and a slow branch of a fast-slow double-flow graph convolution network respectively for feature learning, and fusing learning results of the fast branch and the slow branch to obtain features of each action category, wherein the fast branch and the slow branch of the fast-slow double-flow graph convolution network have the same network structure and have different network parameter configurations and input features;
Step S40, performing dimensionality elimination on the features of each action category through the global pooling layer, mapping the features subjected to dimensionality elimination to the corresponding action categories through the full connection layer, and obtaining the score of each action category through a Softmax function.
Optionally, in an embodiment of the present application, the step S10 includes the following steps:
human skeleton joint features are obtained from the data set, and the feature shape of each sample is as follows:
(C,T,M,V)
wherein C is the number of characteristic channels, has a value of 3 and represents the three-dimensional coordinates (x, y, z) of the joint points; t represents the number of frames of the action; m represents the number of persons performing the action; v represents the number of human joint points.
Optionally, in an embodiment of the present application, the step S20 includes the following steps:
carrying out regularization processing on data, using batch training in the training process, wherein the characteristic shape of the tensor of one batch is as follows:
(B,C,T,M,V)
firstly, the batch tensor is deformed into:
(B,M*V*C,T)
and then a one-dimensional batch regularization module is used to regularize the temporal dimension T, after which the features are deformed back into the original shape (B, C, T, M, V).
Optionally, in an embodiment of the present application, the specific steps in step S30 include:
each branch comprises a plurality of continuously superposed graph convolution blocks, and each graph convolution block comprises a space graph convolution layer and a time sequence convolution layer; the time sequence convolution layer is a two-dimensional convolution module, the size of a convolution kernel is (t, 1), and t is the time sequence receptive field of the convolution kernel; after the two convolution layers, a batch regularization layer and a ReLU activation function are attached, so that the characteristics of each channel are ensured to keep the same distribution; the computation of the convolution block is described using the following formula:
$$f_{out} = \sum_{k} W_k\, f_{in}\, (A_k + B_k + C_k)$$
where B_k and C_k are the adaptive adjacency matrices proposed in 2s-AGCN, which change during network training: B_k is initialized to A_k and is used to learn the potential association between any two nodes; C_k is a matrix calculated from the sample features and is used to describe sample-specific node associations.
Optionally, in an embodiment of the present application, the following two formulas describe the feature shapes of the input features of the graph convolution blocks at the same stage of the fast and slow branches, respectively:

f_fast^in = (B, βC, αT, V, M)

f_slow^in = (B, C, T, V, M)

The temporal dimension of the fast branch is always αT_1, where α is a positive integer representing the ratio of the input frame rate of the fast branch to that of the slow branch in the initial input features. In the fast branch, the number of channels β_iC_i is significantly smaller than the number of channels C_i of the graph convolution block at the same stage of the slow branch, where i is the block index and β_i is a value less than 1, e.g., 1/3. The V of the two branches is identical, being the number of graph nodes.
Optionally, in one embodiment of the present application, a lateral connection module is used to share the information learned by the fast and slow branches, fusing from the fast branch to the slow branch. Since f_fast and f_slow have the feature shapes (B, βC, αT, V, M) and (B, C, T, M, V) respectively, a two-dimensional convolution layer is first used for feature shape conversion, a batch regularization layer and a ReLU function are added after it, and the two features are then fused by concatenation or addition.
Optionally, in an embodiment of the application, in step S40, the final features obtained in step S30 are passed through a global pooling layer to eliminate the three dimensions of time sequence T, graph nodes V, and number of persons M; the features are then mapped to each action category through a fully connected layer, and finally the score of each action category is obtained through a Softmax function.
In order to achieve the above object, a second aspect of the present invention provides a system for identifying actions based on a fast-slow dual-flow graph convolutional neural network, including the following modules:
the acquisition module is used for acquiring the characteristics of the human skeleton joints;
the processing module is used for carrying out regularization processing on the human body skeleton joint features, wherein the shapes of a batch of the human body skeleton joint features are deformed, a one-dimensional regularization module is used for regularizing a time sequence dimension, and then the shapes of a batch of the human body skeleton joint features are deformed into the original shapes again;
the generating module is used for copying the human body skeleton joint features processed by the processing module, generating two identical human body skeleton joint features, inputting the two identical human body skeleton joint features to a fast branch and a slow branch of a fast-slow double-flow graph convolution network respectively for feature learning, and fusing learning results of the fast branch and the slow branch to obtain the features of each action category, wherein the fast branch and the slow branch of the fast-slow double-flow graph convolution network have the same network structure and have different network parameter configurations and input features;
and the determining module is used for carrying out dimensionality elimination on the features of each action category through the global pooling layer, mapping the features subjected to dimensionality elimination to the corresponding action categories through the full connection layer, and obtaining the score of each action category through a Softmax function.
In order to achieve the above object, a third aspect of the present application provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the action identification method based on the fast and slow dual-flow graph convolutional neural network described in the first aspect of the present application.
To achieve the above object, a fourth aspect of the present application provides a non-transitory computer-readable storage medium having a computer program stored thereon; when executed by a processor, the computer program implements the action identification method based on the fast and slow dual-flow graph convolutional neural network described in the first aspect of the present application.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flowchart of an action identification method based on a fast-slow dual-flow graph convolutional neural network according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a fast-slow dual-flow graph convolutional neural network according to an embodiment of the present application;
FIG. 3 is a diagram illustrating how the feature shapes of the input features of the fast and slow branches change as the number of graph convolution blocks increases, according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a lateral connection module according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an action recognition system based on a fast-slow dual-flow graph convolutional neural network according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative and intended to explain the present invention and should not be construed as limiting the present invention.
The following describes an action recognition method based on a fast and slow dual-flow graph convolutional neural network according to an embodiment of the present invention with reference to the accompanying drawings.
As shown in fig. 1, to achieve the above object, an embodiment of the first aspect of the present invention provides a method for identifying an action based on a fast-slow dual-flow graph convolutional neural network, including the following steps:
step S10, obtaining the characteristics of human skeleton joints;
step S20, regularization processing is carried out on the human body skeleton joint features, wherein the shapes of a batch of the human body skeleton joint features are deformed, a one-dimensional regularization module is used for regularizing a time sequence dimension, and then the shapes of a batch of the human body skeleton joint features are deformed into the original shapes again;
step S30, copying the human body skeleton joint features processed in the step S20 to generate two identical human body skeleton joint features, inputting the two identical human body skeleton joint features to a fast branch and a slow branch of a fast-slow double-flow graph convolution network respectively for feature learning, and fusing learning results of the fast branch and the slow branch to obtain features of each action category, wherein the fast branch and the slow branch of the fast-slow double-flow graph convolution network have the same network structure and have different network parameter configurations and input features;
Step S40, performing dimensionality elimination on the features of each action category through the global pooling layer, mapping the features subjected to dimensionality elimination to the corresponding action categories through the full connection layer, and obtaining the score of each action category through a Softmax function.
In an embodiment of the present application, further, the step S10 includes the following steps:
human skeletal joint features are obtained from public data sets such as NTU RGB+D, and the feature shape of each sample is as follows:
(C,T,M,V)
wherein C is the number of characteristic channels, has a value of 3 and represents the three-dimensional coordinates (x, y, z) of the joint points; t represents the number of frames of the motion; m represents the number of persons performing the action; v represents the number of human joint points.
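For illustration, the following minimal snippet builds a placeholder sample in this (C, T, M, V) layout. PyTorch is an assumed implementation choice, not specified by the patent, and the values T = 300, M = 2, V = 25 follow common NTU RGB+D preprocessing and are given only as an example:

```python
import torch

# Hypothetical skeleton sample in (C, T, M, V) layout.
# T=300, M=2, V=25 follow common NTU RGB+D preprocessing (assumed values).
C, T, M, V = 3, 300, 2, 25
sample = torch.zeros(C, T, M, V)  # (x, y, z) joint coordinates per frame and person
print(sample.shape)               # torch.Size([3, 300, 2, 25])
```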
In an embodiment of the present application, further, the step S20 includes the following steps:
carrying out regularization processing on data, using batch training in the training process, wherein the characteristic shape of the tensor of one batch is as follows:
(B,C,T,M,V)
firstly, the batch tensor is deformed into:
(B,M*V*C,T)
and then a one-dimensional batch regularization module is used to regularize the temporal dimension T, after which the features are deformed back into the original shape (B, C, T, M, V).
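A minimal sketch of this reshape-normalize-reshape step, assuming PyTorch and an (M, V, C) flattening order; the class name InputNorm is illustrative and not taken from the patent:

```python
import torch
import torch.nn as nn

class InputNorm(nn.Module):
    """Regularize a (B, C, T, M, V) batch along the temporal dimension T by
    flattening to (B, M*V*C, T) and applying BatchNorm1d, as in step S20."""

    def __init__(self, channels: int, num_person: int, num_joint: int):
        super().__init__()
        # BatchNorm1d expects (N, C', L); here C' = M*V*C and L = T.
        self.bn = nn.BatchNorm1d(num_person * num_joint * channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, C, T, M, V = x.shape
        x = x.permute(0, 3, 4, 1, 2).contiguous()  # (B, M, V, C, T)
        x = x.view(B, M * V * C, T)                # flatten non-temporal dims
        x = self.bn(x)                             # normalize over batch and T
        x = x.view(B, M, V, C, T).permute(0, 3, 4, 1, 2).contiguous()
        return x                                   # back to (B, C, T, M, V)
```

A call such as InputNorm(3, 2, 25)(batch) leaves the batch shape unchanged while standardizing the distribution of each channel.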
As shown in fig. 2, our network structure contains two branches, which we call fast and slow branches, respectively.
In an embodiment of the present application, further, the specific steps in step S30 include:
each branch comprises a plurality of continuously stacked graph convolution blocks, and each graph convolution block comprises a spatial graph convolution layer and a temporal convolution layer; the temporal convolution layer is a two-dimensional convolution module whose convolution kernel has size (t, 1), where t is the temporal receptive field of the convolution kernel; each of the two convolution layers is followed by a batch regularization layer and a ReLU activation function, ensuring that the features of each channel keep the same distribution; the computation of the graph convolution block is described by the following formula:
$$f_{out} = \sum_{k} W_k\, f_{in}\, (A_k + B_k + C_k)$$
where B_k and C_k are the adaptive adjacency matrices proposed in 2s-AGCN, which change during network training: B_k is initialized to A_k and is used to learn the potential association between any two nodes; C_k is a matrix calculated from the sample features and is used to describe sample-specific node associations.
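A minimal PyTorch sketch of one such graph convolution block is given below. It implements the spatial aggregation with the fixed adjacency A_k plus the learnable B_k term, followed by the (t, 1) temporal convolution; the data-dependent C_k term of 2s-AGCN is omitted for brevity, and the class name, the kernel size t = 9, and the layer choices are assumptions:

```python
import torch
import torch.nn as nn

class GraphConvBlock(nn.Module):
    """One graph convolution block: a spatial graph convolution over V joints
    followed by a (t, 1) temporal convolution, each with BN and ReLU."""

    def __init__(self, in_ch: int, out_ch: int, A: torch.Tensor,
                 t: int = 9, stride: int = 1):
        super().__init__()
        K, V, _ = A.shape                     # K adjacency subsets, V joints
        self.register_buffer("A", A)          # fixed skeleton adjacency A_k
        self.B = nn.Parameter(A.clone())      # adaptive B_k, initialized to A_k
        self.W = nn.Conv2d(in_ch, out_ch * K, kernel_size=1)  # per-subset W_k
        self.bn_s = nn.BatchNorm2d(out_ch)
        self.tcn = nn.Conv2d(out_ch, out_ch, kernel_size=(t, 1),
                             padding=((t - 1) // 2, 0), stride=(stride, 1))
        self.bn_t = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, C, T, V)
        N, C, T, V = x.shape
        K = self.A.shape[0]
        y = self.W(x).view(N, K, -1, T, V)     # (N, K, C_out, T, V)
        # Aggregate neighbours with (A_k + B_k) for each subset k.
        y = torch.einsum("nkctv,kvw->nctw", y, self.A + self.B)
        y = self.relu(self.bn_s(y))            # spatial conv + BN + ReLU
        y = self.relu(self.bn_t(self.tcn(y)))  # temporal (t, 1) conv + BN + ReLU
        return y
```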
Optionally, in an embodiment of the present application, the following two formulas describe the feature shapes of the input features of the graph convolution blocks at the same stage of the fast and slow branches, respectively:

f_fast^in = (B, βC, αT, V, M)

f_slow^in = (B, C, T, V, M)

The temporal dimension of the fast branch is always αT_1, where α is a positive integer representing the ratio of the input frame rate of the fast branch to that of the slow branch in the initial input features. In the fast branch, the number of channels β_iC_i is significantly smaller than the number of channels C_i of the graph convolution block at the same stage of the slow branch, where i is the block index and β_i is a value less than 1, e.g., 1/3. The V of the two branches is identical, being the number of graph nodes.
In one embodiment of the present application, further, assuming that there are N graph convolution blocks in the network structure: in the slow branch, the frame rate is reduced through the stride of the temporal convolution layer in each graph convolution block, so T_1 ≥ T_2 ≥ … ≥ T_N; on the other hand, the number of output channels of each block gradually increases with block depth to improve the slow branch's ability to capture graph spatial structure information, so C_1 ≤ C_2 ≤ … ≤ C_N. In the fast branch, the stride of the convolution kernel in the temporal convolution layers of all graph convolution blocks is set to 1 to ensure that the frame rate does not drop, so the temporal dimension of the fast branch is always αT_1, where α is a positive integer representing the ratio of the input frame rate of the fast branch to that of the slow branch in the initial input features. In the fast branch, the number of channels β_iC_i is significantly smaller than the number of channels C_i of the graph convolution block at the same stage of the slow branch, where i is the block index and β_i is a value less than 1, such as 1/3. The V of the two branches is identical, being the number of graph nodes.
In one embodiment of the present application, further, as shown in FIG. 4, a lateral connection module is used to share the information learned by the fast and slow branches, fusing from the fast branch to the slow branch, since f_fast and f_slow have the feature shapes (B, βC, αT, V, M) and (B, C, T, M, V), respectively.
We first use a two-dimensional convolutional layer for feature shape transformation, then add a batch regularization layer and a ReLU function, and then fuse the two features by concatenation or addition. The above process can be described by the following formulas:
$$\hat{f}_{fast} = \mathrm{Conv2D}(f_{fast})$$

$$\hat{f}_{fast} = \mathrm{ReLU}(\mathrm{BN}(\hat{f}_{fast}))$$

$$f_{slow} = \mathrm{Fuse}(f_{slow}, \hat{f}_{fast})$$
where Conv2D is a two-dimensional convolutional layer, BN is a batch regularization layer, ReLU is the activation function, and Fuse is the fusion function; the fusion may be performed by summation (Sum) or concatenation (Concatenation), and the two modes give similar performance.
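Under these definitions, one possible PyTorch sketch of the lateral connection follows; the (α, 1) kernel and stride used to match αT to T are an assumption, since the patent only specifies a two-dimensional convolution for the shape conversion:

```python
import torch
import torch.nn as nn

class LateralConnection(nn.Module):
    """Fast-to-slow lateral connection: Conv2D reshapes the fast feature
    (N, beta*C, alpha*T, V) to the slow feature's shape, then BN and ReLU,
    then fusion by summation or concatenation."""

    def __init__(self, fast_ch: int, slow_ch: int, alpha: int, fuse: str = "concat"):
        super().__init__()
        # Kernel and stride (alpha, 1) reduce alpha*T to T (assumed choice).
        self.conv = nn.Conv2d(fast_ch, slow_ch, kernel_size=(alpha, 1),
                              stride=(alpha, 1))
        self.bn = nn.BatchNorm2d(slow_ch)
        self.relu = nn.ReLU(inplace=True)
        self.fuse = fuse

    def forward(self, f_slow: torch.Tensor, f_fast: torch.Tensor) -> torch.Tensor:
        f = self.relu(self.bn(self.conv(f_fast)))  # shape conversion + BN + ReLU
        if self.fuse == "sum":
            return f_slow + f                      # Sum fusion
        return torch.cat([f_slow, f], dim=1)       # Concatenation fusion
```

Note that with concatenation the next slow-branch block must accept the enlarged channel count, while summation leaves it unchanged; the patent reports similar performance for the two modes.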
Further, the present embodiment inserts lateral connection modules between the two branches to share information between them. Specifically, each branch contains 10 graph convolution blocks; the channel numbers of the slow branch are 3, 128, 128, 128, 128, 256, 256, 256, 512, 512, and the channel numbers of the fast branch are 3, 32, 32, 64, 64, 64, 64, 64, 128, 128.
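Purely as an illustrative assembly, the lists below wire these channel configurations into the GraphConvBlock sketch above; the strides, the placeholder adjacency, and the reading of the ten values as the channel sizes between nine stacked blocks are assumptions, not taken from the patent:

```python
import torch
import torch.nn as nn

# Channel configuration from the embodiment above; strides are assumed so
# that T_1 >= ... >= T_N holds in the slow branch only.
slow_ch = [3, 128, 128, 128, 128, 256, 256, 256, 512, 512]
fast_ch = [3, 32, 32, 64, 64, 64, 64, 64, 128, 128]
slow_stride = [1, 1, 1, 2, 1, 1, 2, 1, 1]   # hypothetical frame-rate drops
fast_stride = [1] * 9                        # fast branch keeps alpha*T_1

A = torch.eye(25).repeat(3, 1, 1)            # placeholder adjacency: K=3, V=25
slow_branch = nn.ModuleList(
    GraphConvBlock(slow_ch[i], slow_ch[i + 1], A, stride=slow_stride[i])
    for i in range(len(slow_ch) - 1))
fast_branch = nn.ModuleList(
    GraphConvBlock(fast_ch[i], fast_ch[i + 1], A, stride=fast_stride[i])
    for i in range(len(fast_ch) - 1))
```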
In an embodiment of the application, in step S40, the final features obtained in step S30 are passed through a global pooling layer to eliminate the three dimensions of time sequence T, graph nodes V, and number of persons M; the features are then mapped to each action category through a fully connected layer, and finally the score of each action category is obtained through a Softmax function.
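A minimal sketch of this classification head, assuming PyTorch and global average pooling as the pooling operation:

```python
import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    """Step S40: global pooling removes the T, V and M dimensions, a fully
    connected layer maps to the action classes, and Softmax gives scores."""

    def __init__(self, in_ch: int, num_classes: int):
        super().__init__()
        self.fc = nn.Linear(in_ch, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, T, V, M); averaging over T, V, M leaves (B, C).
        x = x.mean(dim=(2, 3, 4))
        return torch.softmax(self.fc(x), dim=1)  # per-class scores
```

In training one would typically feed the pre-Softmax logits to a cross-entropy loss; the explicit Softmax here mirrors the patent's scoring step.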
To achieve the above object, as shown in FIG. 5, an embodiment of a second aspect of the present invention provides an action recognition system based on a fast-slow dual-flow graph convolutional neural network, which includes the following modules:
the acquisition module is used for acquiring the characteristics of the human skeleton joint;
the processing module is used for carrying out regularization processing on the human body skeleton joint features, wherein the shapes of a batch of the human body skeleton joint features are deformed, a one-dimensional regularization module is used for regularizing time sequence dimension, and the shapes of a batch of the human body skeleton joint features are deformed into the original shapes again;
the generating module is used for copying the human body skeleton joint features processed by the processing module, generating two identical human body skeleton joint features, inputting the two identical human body skeleton joint features into a fast branch and a slow branch of a fast-slow double-flow graph convolution network respectively for feature learning, and fusing learning results of the fast branch and the slow branch to obtain the features of each action category, wherein the fast branch and the slow branch of the fast-slow double-flow graph convolution network have the same network structure and have different network parameter configurations and input features;
and the determining module is used for carrying out dimensionality elimination on the features of each action category through the global pooling layer, mapping the features subjected to dimensionality elimination to the corresponding action categories through the full connection layer, and obtaining the score of each action category through a Softmax function.
In order to implement the foregoing embodiments, the present invention further provides a computer device, where the computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the method implements the method for identifying an action based on a fast-slow dual-flow graph convolutional neural network according to the embodiments of the present application.
In order to implement the foregoing embodiments, the present invention further provides a non-transitory computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the method for identifying an action based on a fast-slow dual-flow graph convolutional neural network according to an embodiment of the present application is implemented.
Although the present application has been disclosed in detail with reference to the accompanying drawings, it is to be understood that such description is merely illustrative and not restrictive of the application of the present application. The scope of the present application is defined by the appended claims and may include various modifications, adaptations, and equivalents of the invention without departing from the scope and spirit of the application.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Moreover, various embodiments or examples and features of various embodiments or examples described in this specification can be combined and combined by one skilled in the art without being mutually inconsistent.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (8)

1. A motion recognition method based on a fast and slow dual-flow graph convolutional neural network is characterized by comprising the following steps:
step S10, obtaining human skeleton joint characteristics;
step S20, regularization processing is carried out on the human body skeleton joint features, wherein the shapes of a batch of the human body skeleton joint features are deformed, a one-dimensional regularization module is used for regularizing time sequence dimension, and the shapes of a batch of the human body skeleton joint features are deformed into the original shapes again;
step S30, copying the human body skeleton joint features processed in the step S20 to generate two identical human body skeleton joint features, inputting the two identical human body skeleton joint features to a fast branch and a slow branch of a fast-slow double-flow graph convolution network respectively for feature learning, and fusing learning results of the fast branch and the slow branch to obtain features of each action category, wherein the fast branch and the slow branch of the fast-slow double-flow graph convolution network have the same network structure and have different network parameter configurations and input features;
step S40, performing dimensionality elimination on the features of each action category through the global pooling layer, mapping the features subjected to dimensionality elimination to the corresponding action categories through the full connection layer, and obtaining the score of each action category through a Softmax function;
the specific steps in step S30 include:
each branch comprises a plurality of continuously superposed graph convolution blocks, and each graph convolution block comprises a space graph convolution layer and a time sequence convolution layer; the time sequence convolution layer is a two-dimensional convolution module, the size of a convolution kernel is (t, 1), and t is the time sequence receptive field of the convolution kernel; a batch regularization layer and a ReLU activation function are attached to the two convolution layers, so that the characteristics of each channel are kept in the same distribution; the calculation of the convolution block is described using the following formula:
$$f_{out} = \sum_{k} W_k\, f_{in}\, (A_k + B_k + C_k)$$
where B_k and C_k are the adaptive adjacency matrices proposed in 2s-AGCN, which change during network training: B_k is initialized to A_k and is used to learn the potential association between any two nodes; C_k is a matrix calculated from the sample features and is used to describe sample-specific node associations;
a lateral connection module is used to share the information learned by the fast branch and the slow branch, fusing from the fast branch to the slow branch; since f_fast and f_slow have the feature shapes (B, βC, αT, V, M) and (B, C, T, M, V) respectively, a two-dimensional convolution layer is first used for feature shape conversion, a batch regularization layer and a ReLU function are added after it, and the two features are then fused by concatenation or addition.
2. The method of claim 1, wherein the step S10 comprises the steps of:
human skeleton joint features are obtained from the data set, and the feature shape of each sample is as follows:
(C,T,M,V)
wherein C is the number of characteristic channels, has a value of 3 and represents the three-dimensional coordinates (x, y, z) of the joint points; t represents the number of frames of the motion; m represents the number of persons performing the action; v represents the number of human joint points.
3. The method of claim 1, wherein the step S20 comprises the steps of:
regularization processing is carried out on data, batch training is used in the training process, and the characteristic shape of the tensor of one batch is as follows:
(B,C,T,M,V)
firstly, the batch tensor is deformed into:
(B,M*V*C,T)
and then a one-dimensional batch regularization module is used to regularize the temporal dimension T, after which the features are deformed back into the original shape (B, C, T, M, V).
4. The method of claim 1, wherein the following two equations describe the characteristic shapes of the input features of the convolution blocks at the same stage respectively:
f_fast^in = (B, βC, αT, V, M)

f_slow^in = (B, C, T, V, M)

the temporal dimension of the fast branch is always αT_1, where α is a positive integer representing the ratio of the input frame rate of the fast branch to that of the slow branch in the initial input features; the number of channels β_iC_i in the fast branch is significantly smaller than the number of channels C_i of the graph convolution block at the same stage of the slow branch, where i is the block index and β_i is a value less than 1; the V of the two branches is identical, being the number of graph nodes.
5. The method of claim 1, wherein in step S40, the final features obtained in step S30 are passed through a global pooling layer to eliminate three dimensions of time sequence T, graph nodes V and number of people M, and the features are mapped to various action categories through a full connection layer, and finally, a score of each action category is obtained through a Softmax function.
6. A motion recognition system based on a fast and slow biflow graph convolutional neural network is characterized by comprising:
the acquisition module is used for acquiring the characteristics of the human skeleton joints;
the processing module is used for carrying out regularization processing on the human body skeleton joint features, wherein the shapes of a batch of the human body skeleton joint features are deformed, a one-dimensional regularization module is used for regularizing time sequence dimension, and the shapes of a batch of the human body skeleton joint features are deformed into the original shapes again;
the generating module is used for copying the human body skeleton joint features processed by the processing module, generating two identical human body skeleton joint features, inputting the two identical human body skeleton joint features to a fast branch and a slow branch of a fast-slow double-flow graph convolution network respectively for feature learning, and fusing learning results of the fast branch and the slow branch to obtain the features of each action category, wherein the fast branch and the slow branch of the fast-slow double-flow graph convolution network have the same network structure and have different network parameter configurations and input features;
the determining module is used for carrying out dimensionality elimination on the features of each action category through the global pooling layer, mapping the features subjected to dimensionality elimination to the corresponding action categories through the full connection layer, and obtaining the score of each action category through a Softmax function;
the generation module is further configured to:
each branch comprises a plurality of continuously superposed graph convolution blocks, and each graph convolution block comprises a space graph convolution layer and a time sequence convolution layer; the time sequence convolution layer is a two-dimensional convolution module, the size of a convolution kernel is (t, 1), and t is a time sequence receptive field of the convolution kernel; after the two convolution layers, a batch regularization layer and a ReLU activation function are attached, so that the characteristics of each channel are ensured to keep the same distribution; the calculation of the convolution block is described using the following formula:
$$f_{out} = \sum_{k} W_k\, f_{in}\, (A_k + B_k + C_k)$$
where B_k and C_k are the adaptive adjacency matrices proposed in 2s-AGCN, which change during network training: B_k is initialized to A_k and is used to learn the potential association between any two nodes; C_k is a matrix calculated from the sample features and is used to describe sample-specific node associations;
a lateral connection module is used to share the information learned by the fast branch and the slow branch, the information being fused from the fast branch to the slow branch; since f_fast and f_slow have the feature shapes (B, βC, αT, V, M) and (B, C, T, M, V) respectively, a two-dimensional convolution layer is first used for feature shape conversion, a batch regularization layer and a ReLU function are added after it, and the two features are then fused by concatenation or addition.
7. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of claims 1-5 when executing the computer program.
8. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the method of any one of claims 1-5.
CN202110510781.9A 2021-05-11 2021-05-11 Action identification method and system based on fast and slow dual-flow graph convolutional neural network Active CN113158970B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110510781.9A CN113158970B (en) 2021-05-11 2021-05-11 Action identification method and system based on fast and slow dual-flow graph convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110510781.9A CN113158970B (en) 2021-05-11 2021-05-11 Action identification method and system based on fast and slow dual-flow graph convolutional neural network

Publications (2)

Publication Number Publication Date
CN113158970A CN113158970A (en) 2021-07-23
CN113158970B 2023-02-07

Family

ID=76874442

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110510781.9A Active CN113158970B (en) 2021-05-11 2021-05-11 Action identification method and system based on fast and slow dual-flow graph convolutional neural network

Country Status (1)

Country Link
CN (1) CN113158970B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114550027A (en) * 2022-01-18 2022-05-27 清华大学 Vision-based motion video fine analysis method and device
CN114201475B (en) * 2022-02-16 2022-05-03 北京市农林科学院信息技术研究中心 Dangerous behavior supervision method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017133009A1 (en) * 2016-02-04 2017-08-10 广州新节奏智能科技有限公司 Method for positioning human joint using depth image of convolutional neural network
CN110059598A (en) * 2019-04-08 2019-07-26 南京邮电大学 The Activity recognition method of the long time-histories speed network integration based on posture artis
CN111860128A (en) * 2020-06-05 2020-10-30 南京邮电大学 Human skeleton behavior identification method based on multi-stream fast-slow graph convolution network

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11544535B2 (en) * 2019-03-08 2023-01-03 Adobe Inc. Graph convolutional networks with motif-based attention
CN112131908B (en) * 2019-06-24 2024-06-11 北京眼神智能科技有限公司 Action recognition method, device, storage medium and equipment based on double-flow network
CN111325099B (en) * 2020-01-21 2022-08-26 南京邮电大学 Sign language identification method and system based on double-current space-time diagram convolutional neural network
CN112183313B (en) * 2020-09-27 2022-03-11 武汉大学 SlowFast-based power operation field action identification method
CN112381004B (en) * 2020-11-17 2023-08-08 华南理工大学 Dual-flow self-adaptive graph rolling network behavior recognition method based on framework

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017133009A1 (en) * 2016-02-04 2017-08-10 广州新节奏智能科技有限公司 Method for positioning human joint using depth image of convolutional neural network
CN110059598A (en) * 2019-04-08 2019-07-26 南京邮电大学 The Activity recognition method of the long time-histories speed network integration based on posture artis
CN111860128A (en) * 2020-06-05 2020-10-30 南京邮电大学 Human skeleton behavior identification method based on multi-stream fast-slow graph convolution network

Also Published As

Publication number Publication date
CN113158970A (en) 2021-07-23

Similar Documents

Publication Publication Date Title
CN113449857B (en) Data processing method and data processing equipment
CN112308200B (en) Searching method and device for neural network
CN109558862B (en) Crowd counting method and system based on attention thinning framework of space perception
JP2018073393A (en) 3d reconstruction of real object from depth map
Zhang et al. Progressive hard-mining network for monocular depth estimation
EP3905194A1 (en) Pose estimation method and apparatus
CN113158970B (en) Action identification method and system based on fast and slow dual-flow graph convolutional neural network
CN111480169A (en) Method, system and apparatus for pattern recognition
CN111667459B (en) Medical sign detection method, system, terminal and storage medium based on 3D variable convolution and time sequence feature fusion
CN110738650B (en) Infectious disease infection identification method, terminal device and storage medium
US20230326173A1 (en) Image processing method and apparatus, and computer-readable storage medium
CN112132739A (en) 3D reconstruction and human face posture normalization method, device, storage medium and equipment
JP2017068608A (en) Arithmetic unit, method and program
CN110781894A (en) Point cloud semantic segmentation method and device and electronic equipment
CN116071300A (en) Cell nucleus segmentation method based on context feature fusion and related equipment
CN116310219A (en) Three-dimensional foot shape generation method based on conditional diffusion model
CN113065529B (en) Motion recognition method and system based on inter-joint association modeling
CN113554656B (en) Optical remote sensing image example segmentation method and device based on graph neural network
Lv et al. Memory‐augmented neural networks based dynamic complex image segmentation in digital twins for self‐driving vehicle
CN112884702A (en) Polyp identification system and method based on endoscope image
JP2023145404A (en) System and method for using pyramid and uniqueness matching priors to identify correspondences between images
JP2021527859A (en) Irregular shape segmentation in an image using deep region expansion
CN113191367B (en) Semantic segmentation method based on dense scale dynamic network
CN113065637B (en) Sensing network and data processing method
CN113516670A (en) Non-mode image segmentation method and device with enhanced feedback attention

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant