CN113239884A - Method for recognizing human body behaviors in elevator car - Google Patents

Method for recognizing human body behaviors in elevator car

Info

Publication number
CN113239884A
CN113239884A (application CN202110625850.0A)
Authority
CN
China
Prior art keywords
joint, skeleton, graph, matrix, points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110625850.0A
Other languages
Chinese (zh)
Inventor
邓海南
詹跃明
余晓毅
王文蝶
陈鸣利
肖渝
赵兴明
蔡润龙
杨利鸿
杨玲
郎仲宽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Tielian Intelligent Technology Co ltd
Chongqing Energy College
Original Assignee
Chongqing Tielian Intelligent Technology Co ltd
Chongqing Energy College
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Tielian Intelligent Technology Co ltd, Chongqing Energy College
Priority to CN202110625850.0A
Publication of CN113239884A
Legal status: Withdrawn

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/25: Fusion techniques
    • G06F18/253: Fusion techniques of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention relates to a method for recognizing human behaviors in an elevator car, based on a behavior recognition method with a skeleton-adaptive and joint-enhanced graph convolution network. First, the connection relationships between the joint points are learned with a Gaussian function of embedded operations, and the joint structure is adaptively adjusted according to the input skeleton data. Second, a soft attention mechanism is introduced to measure the difference in each joint point's contribution, so as to enhance the feature expression of high-contribution joint points and weaken the interference from the features of low-contribution joint points. Finally, an end-to-end graph convolution network is constructed to learn the spatio-temporal co-occurrence features of the skeleton joint points. The method avoids the insufficient modeling of human joint motion in current algorithms and reduces interference. Verification on the public behavior recognition dataset MSR Action 3D shows that the proposed algorithm is feasible and has high recognition accuracy, providing another viable option for 3D human skeleton data modeling and human behavior recognition.

Description

Method for recognizing human body behaviors in elevator car
Technical Field
The invention belongs to the technical field of data identification processing in physics, and particularly relates to a human behavior identification method in an elevator car.
Background
At present, research and engineering development based on convolutional neural networks are increasingly common, as in CN110223706A, CN110222653A, and CN1109740695A, and skeleton-based behavior recognition with deep learning has been studied extensively. As mentioned in the above CN110222653A, human behavior recognition, which determines human behavior patterns through body motion analysis, is likewise an important subject in the field.
In research based on convolutional neural networks, the skeleton data are treated as a pseudo-image in vector-matrix form, and the spatial local features of the joint points are extracted. In research based on recurrent neural networks, the skeleton data are modeled as coordinate vectors to learn the high-level temporal dynamics of skeleton sequences, and a global context attention mechanism is introduced to selectively attend to the important joints in each frame of the skeleton sequence. To extract detailed features from graphs of arbitrary structure, graph convolution network (GCN) techniques have been applied, and the spatio-temporal graph convolution network (TS-GCN) performs spatio-temporal graph modeling of human skeleton data with good recognition results. However, relying on only one fixed skeleton graph is clearly not sufficient to express the flexible and variable joint movements of different behaviors performed by different individuals. Furthermore, from the perspective of human kinematics, the contribution of each skeleton joint's features to distinguishing an action is not uniform, and for some similar behaviors, too many low-contribution features may affect the final classification decision.
Disclosure of Invention
In view of the deficiencies in the prior art, the invention aims to provide a method for recognizing human behaviors in an elevator car that avoids the insufficient modeling of human joint motion in current algorithms, provides another viable option for 3D human skeleton data modeling, and achieves feasible and accurate technical effects.
In order to solve the technical problems, the invention adopts the following technical scheme:
The method for recognizing human behaviors in an elevator car comprises the following steps: 1) designing a human skeleton spatio-temporal topological graph, learning the connection relationships between the joint points with a Gaussian function of embedded operations, and adaptively adjusting the joint structure connections according to the input sample data;
2) introducing a soft attention mechanism to measure the differences between the joint points, enhance the feature expression of high-contribution joint points, weaken the interference from the features of low-contribution joint points, and extract highly discriminative joint features;
3) constructing an end-to-end graph convolution network to learn the spatio-temporal co-occurrence features of the human skeleton joint points.
Furthermore, the predefined human skeleton graph is formed by the joint points and the bone edges connecting every two joints according to the body structure. A complete human action comprises a skeleton sequence of T frames with N joints; the skeleton graph is represented by G = (V, E), where the set of joint points V = {v_ti | t = 1, ..., T; i = 1, ..., N} includes all joints in the skeleton sequence, and the set of bone edges is E = {E_S, E_T};
wherein the spatial edges E_S = {v_ti v_tj | (i, j) ∈ H} represent the connections of adjacent joint points within a frame of the skeleton sequence, H is the set of naturally connected human joints, and the temporal edges E_T = {v_ti v_(t+1)i} connect the same joint point between consecutive frames.
Further, in the spatial dimension, the graph convolution operation for feature extraction at any joint point v_ti in the skeleton graph is expressed as:

f_{out}(v_{ti}) = \sum_{v_{tj} \in B(v_{ti})} \frac{1}{Z_{ti}(v_{tj})} f_{in}(v_{tj}) \cdot w(l_{ti}(v_{tj}))    (1)

where f_in and f_out are the input and extracted features of the joint point v_ti; B(v_ti) = {v_tj | d(v_tj, v_ti) ≤ D} is the neighbor set of v_ti, with D controlling the range of neighbor nodes taken; v_tj denotes a joint point directly connected to v_ti; Z_ti(v_tj) = |{v_tk | l_ti(v_tk) = l_ti(v_tj)}| is a normalization term; and w is the weight function of the neighboring joint points. In order to learn the differentiated features of each neighbor node, the neighbor nodes are divided into K sets according to human motion characteristics, and the label mapping l_ti(v_tj): B(v_ti) → {0, ..., K − 1} assigns each node v_tj a unique weight vector;
the graph convolution operation in the time domain is obtained by extending the graph convolution on the spatial domain, with the parameter Γ controlling the temporal range of the neighbor set, so that the neighbor set over the spatial and temporal dimensions is represented as:

B(v_{ti}) = \{ v_{qj} \mid d(v_{tj}, v_{ti}) \le D,\ |q - t| \le \lfloor \Gamma/2 \rfloor \}    (2)

the label mapping set corresponding to the neighbor nodes is l_ST(v_qj) = l_ti(v_tj) + (q − t + Γ/2) × K, where l_ti(v_tj) is the label mapping for the single-frame case of v_ti;
in the graph convolution network, after each joint point aggregates its own information and the information of its neighbor nodes, the information is propagated to the next joint point until all joint points in the skeleton sequence are traversed; the graph convolution formula is expressed as:

f_{out} = D^{-1/2} (A + I) D^{-1/2} f_{in} W    (3)

the feature map f is a tensor of dimension C × T × N, where C is the number of channels, T is the number of frames in the skeleton graph sequence, and N is the number of joints in a single-frame skeleton graph;
the connection relationship of the joints in the skeleton graph is represented by the N × N adjacency matrix A together with the identity matrix I, D is the degree matrix of the joint points, D^{-1/2}(A + I)D^{-1/2} represents the normalized skeleton graph structure, and W is the weight matrix learned in the graph convolution network. According to the neighbor-node partition strategy, the adjacency matrix A is decomposed into K matrices A_k, and the graph convolution formula is further expressed as:

f_{out} = \sum_{k=1}^{K} D_k^{-1/2} A_k D_k^{-1/2} f_{in} W_k    (4)
dynamically adjusting the connection relationships and connection strengths of the joint points according to the input skeleton data, the formula of the adaptive graph convolution layer is adjusted as:

f_{out} = \sum_{k=1}^{K} (A_k + B_k) f_{in} W_k    (5)

A_k is the original matrix representing the natural connections of the joints in the body, and B_k is the adaptive matrix. A normalized Gaussian function is embedded in the network layer to compute the similarity of two joint points in the skeleton graph, so as to measure their connection relationship, specifically:

f(v_i, v_j) = \frac{\exp(\Phi(v_i)^{T} \Psi(v_j))}{\sum_{j=1}^{N} \exp(\Phi(v_i)^{T} \Psi(v_j))}    (6)

where Φ(v_i) = W_Φ v_i and Ψ(v_j) = W_Ψ v_j are both embedding operations, and W_Φ and W_Ψ are the corresponding weight parameters;
two parallel branches perform 1 × 1 convolution operations through the two embedding functions Φ and Ψ; the output features of Φ and Ψ are each dimension-transformed, multiplied as matrices, and classified with the softmax function to obtain the adaptive matrix:

B_k = \mathrm{softmax}(f_{in}^{T} W_{\Phi k}^{T} W_{\Psi k} f_{in})    (7)

wherein each element B_k^{ij} is normalized to [0, 1].
Further, f_t containing the spatial structure features of each joint is extracted from the t-th frame skeleton graph; the joint features are first transformed through a fully connected layer, and all transformed joint features are aggregated to obtain the query feature:

q_t = \frac{1}{N} \sum_{i=1}^{N} \tanh(W f_{ti})    (8)

where W is a learnable weight matrix; the attention scores of all joint points in the skeleton graph are expressed as:

m_t = \mathrm{softmax}(W_s \tanh(W_f f_t + W_q q_t + b_{fq}) + b_s)    (9)

where W_s, W_f, W_q are all learnable weight matrices, b_fq and b_s are biases, and m_t = (m_t1, m_t2, ..., m_tN) represents the importance of the corresponding joint points in the t-th frame skeleton graph, its values normalized to [0, 1] by the softmax function; the attention layer outputs a skeleton graph with enhanced joint features, and the spatial feature of each joint point v_ti is expressed as:

\hat{f}_{ti} = f_{ti} \cdot (1 + m_{ti})    (10)
further, it is verifiable by public behavior recognition data set MSRAction 3D.
Compared with the prior art, the invention has the following beneficial effects:
the invention relates to a human behavior recognition method in an elevator car, which is based on a behavior recognition method of a framework self-adaptation and joint enhancement graph convolution network; firstly, learning the connection relation among all joint points by using a Gaussian function of embedded operation, and adaptively adjusting the joint structure according to input skeleton data; secondly, introducing a soft attention mechanism, and measuring the difference of the contribution of each joint point so as to enhance the feature expression of the joint points with high contribution and weaken the interference of the features of the joint points with low contribution; and finally, constructing an end-to-end graph convolution network to learn the spatio-temporal co-occurrence characteristics of the skeleton joint points. The problem of insufficient modeling of human joint motion in the current algorithm can be avoided, and the interference is reduced. The algorithm provided by verification on the MSR Action 3D of the public behavior recognition data set is feasible, the recognition accuracy is high, and another feasible choice is provided for 3D human body skeleton data modeling and human body behavior recognition. During the in-service use, use in the elevator car, can gather passenger information, provide the safety guarantee, supervise civilization and take advantage of the ladder.
Drawings
FIG. 1 is a block diagram of the overall framework of the skeleton-adaptive and joint-enhanced graph convolution network in the method according to an embodiment;
FIG. 2 is a schematic diagram of a human spatio-temporal skeleton graph represented by the MSRAction 3D skeleton in an embodiment;
FIG. 3 is a schematic diagram of the partition strategy according to the spatial structure of the MSRAction 3D skeleton graph in an embodiment;
FIG. 4 is a diagram of the skeleton-adaptive graph convolution layer in an embodiment;
FIG. 5 is a schematic view of the joint enhancement attention layer in an embodiment;
FIG. 6 is a sample diagram of some MSRAction 3D skeleton actions in an embodiment;
FIG. 7 is a graph of the variation of training loss values in an embodiment;
FIG. 8 is a graph of the variation of test recognition rate in an embodiment;
FIG. 9 is a diagram of the original adjacency matrix in an embodiment;
FIG. 10 is a diagram of the skeleton-adaptive adjacency matrix in an embodiment;
FIG. 11 is a schematic diagram of the confusion matrix for the baseline network TS-GCN in an embodiment;
FIG. 12 is a schematic diagram of the confusion matrix for the baseline network with the joint enhancement layer, TS-GCN + JE, in an embodiment;
Detailed Description
The following describes embodiments of the present invention in further detail with reference to the accompanying drawings.
The overall framework of the method for recognizing human behaviors in an elevator car of this specific embodiment is shown in fig. 1. Three skeleton-adaptive and joint-enhanced graph convolution network (SA + JE-GCN) layers are stacked; by exploiting the hierarchical characteristics of the network, high-level complex motion features can be learned, and the expressive power and performance of the network model are improved. In the SA + JE-GCN network, the skeleton adaptive layer and the joint attention enhancement layer learn the global and local features of the skeleton graph space, a temporal convolution layer extracts the temporal dynamic features of consecutive frames of the skeleton graph sequence, and a residual structure is introduced to prevent gradient explosion and gradient vanishing during model training and further improve network performance.
MSRAction 3D skeleton graph representation:
The raw skeleton data extracted from the depth map consist of the three-dimensional spatial coordinates of a series of body joint points. Graph convolution networks are based on convolution operations over topological graph structures; in the TS-GCN, a predefined human skeleton graph can be formed by the joint points and the bone edges connecting every two joints according to the body structure. A complete human action comprises a skeleton sequence of T frames with N joints; a skeleton graph may be represented by G = (V, E), where the set of joint points V = {v_ti | t = 1, ..., T; i = 1, ..., N} includes all joints in the skeleton sequence, and the set of bone edges is E = {E_S, E_T}, where the spatial edges E_S = {v_ti v_tj | (i, j) ∈ H} connect adjacent joint points within a frame of the skeleton sequence, H is the set of naturally connected human joints, and the temporal edges E_T = {v_ti v_(t+1)i} connect the same joint point between consecutive frames. Fig. 2 shows the spatio-temporal skeleton graph structure of a running action sequence, including the feature representation of the skeleton sequence in the spatial and temporal dimensions. The circles are joint points, and the two boxes respectively mark a spatial edge connecting a joint point with its adjacent node in a single-frame sequence and a temporal edge connecting the same joint point across consecutive frames.
Spatio-temporal graph convolution modeling:
Based on the predefined human spatio-temporal skeleton graph, the TS-GCN network stacks a nine-layer graph convolution network in a hierarchical structure to learn the complex, high-level spatio-temporal co-occurrence features of the skeleton graph data. In the spatial dimension, the graph convolution operation for feature extraction at any joint point v_ti in the skeleton graph is expressed as:

f_{out}(v_{ti}) = \sum_{v_{tj} \in B(v_{ti})} \frac{1}{Z_{ti}(v_{tj})} f_{in}(v_{tj}) \cdot w(l_{ti}(v_{tj}))    (1)

where f_in and f_out are the input and extracted features of the joint point v_ti; B(v_ti) = {v_tj | d(v_tj, v_ti) ≤ D} denotes the neighbor set of v_ti, D controls the range of neighbor nodes taken, and D = 1 means that only the joint points v_tj directly connected to v_ti are taken. Z_ti(v_tj) = |{v_tk | l_ti(v_tk) = l_ti(v_tj)}| is a normalization term and w is the weight function of the neighboring joint points. In order to learn the differentiated features of each neighbor node, the neighbor nodes are divided into K sets according to human motion characteristics, and the label mapping l_ti(v_tj): B(v_ti) → {0, ..., K − 1} assigns each node v_tj a unique weight vector.
In this embodiment, the spatial-structure partitioning strategy of fig. 3 is adopted. With the No. 4 spine joint point as the gravity center of the human skeleton, the neighbor set can be divided into three subsets: the root node set (the joint point itself, e.g., No. 2), the centripetal set (nodes closer to the gravity center than the root node, e.g., No. 3), and the centrifugal set (nodes farther from the gravity center than the root node, e.g., No. 9).
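The three-subset partition can be sketched as below, assuming "closer to the gravity center" is measured by Euclidean distance of the joint coordinates; the toy 2-D coordinates and the small neighbor table are illustrative, not the MSRAction 3D joints:

```python
import numpy as np

# Toy 2-D joint coordinates and adjacency; the gravity center is taken as
# the mean of all joint coordinates (an assumption for this sketch).
coords = np.array([[0.0, 2.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
neighbors = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
center = coords.mean(axis=0)

def partition(root, nbr):
    """Return 0 (root), 1 (centripetal) or 2 (centrifugal) for neighbor nbr."""
    if nbr == root:
        return 0
    d_root = np.linalg.norm(coords[root] - center)
    d_nbr = np.linalg.norm(coords[nbr] - center)
    return 1 if d_nbr < d_root else 2

labels = {(i, j): partition(i, j) for i in neighbors for j in neighbors[i] + [i]}
print(labels[(0, 0)], labels[(0, 1)], labels[(1, 0)])  # 0 1 2
```

Note the labeling is asymmetric: joint 1 is centripetal as seen from joint 0, while joint 0 is centrifugal as seen from joint 1, which is exactly why each of the K = 3 subsets gets its own weight vector.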
The graph convolution operation in the time domain can be obtained by extending the graph convolution on the spatial domain, with the parameter Γ controlling the temporal range of the neighbor set, so that the neighbor set over the spatial and temporal dimensions can be represented as:

B(v_{ti}) = \{ v_{qj} \mid d(v_{tj}, v_{ti}) \le D,\ |q - t| \le \lfloor \Gamma/2 \rfloor \}    (2)

Then the label mapping set corresponding to the neighbor nodes is l_ST(v_qj) = l_ti(v_tj) + (q − t + Γ/2) × K, where l_ti(v_tj) is the label mapping for the single-frame case of v_ti.
In the graph convolution network, after each joint point's own information is aggregated with its neighbor nodes' information, the information is propagated to the next joint point until all joint points in the skeleton sequence are traversed. Combined with the graph matrices that represent graph topology in graph theory, the graph convolution formula can be written as:

f_{out} = D^{-1/2} (A + I) D^{-1/2} f_{in} W    (3)

The feature map f is a tensor of dimension C × T × N, where C is the number of channels, T is the number of frames in the skeleton graph sequence, and N is the number of joints in a single-frame skeleton graph. The connection relationship of the joints in the skeleton graph is represented by the N × N adjacency matrix A together with the identity matrix I, D is the degree matrix of the joint points, D^{-1/2}(A + I)D^{-1/2} represents the normalized skeleton graph structure, and W is the weight matrix learned in the graph convolution network. According to the neighbor-node partition strategy, the adjacency matrix A is decomposed into K matrices A_k, and the graph convolution formula can be further expressed as:

f_{out} = \sum_{k=1}^{K} D_k^{-1/2} A_k D_k^{-1/2} f_{in} W_k    (4)
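The partitioned, normalized graph convolution of formula (4) can be sketched for a single frame as follows; random adjacency subsets and weights stand in for the learned A_k and W_k, so this is an illustrative sketch rather than the embodiment's trained network:

```python
import numpy as np

rng = np.random.default_rng(0)
N, C_in, C_out, K = 4, 3, 8, 3  # joints, in/out channels, partition subsets

# K random symmetric adjacency sub-matrices A_k with self-loops folded in
# (playing the role of the A + I term of formula (3)).
mask = rng.random((K, N, N)) < 0.5
A_k = np.clip((mask | mask.transpose(0, 2, 1)).astype(float) + np.eye(N), 0.0, 1.0)

def normalize(A):
    """D^{-1/2} A D^{-1/2}, with D the degree matrix of A."""
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    return d_inv_sqrt @ A @ d_inv_sqrt

f_in = rng.standard_normal((N, C_in))      # one frame: N joints, C_in channels
W = rng.standard_normal((K, C_in, C_out))  # one weight matrix per subset

# Formula (4): sum over the K partition subsets.
f_out = sum(normalize(A_k[k]) @ f_in @ W[k] for k in range(K))
print(f_out.shape)  # (4, 8)
```

Stacking this operation over the T frames, plus the temporal convolution, gives the full spatio-temporal layer.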
Skeleton-adaptive graph convolution layer:
In graph convolution formula (3), the adjacency matrix A is an N × N weight matrix in which A_ij represents the connection relationship between joint points i and j, describing the input skeleton graph structure. However, the TS-GCN adopts only one skeleton topology graph following the natural connections of the human joints, and the skeleton graph input at every layer of the hierarchical network structure is fixed and unchangeable; this mode is not optimal for the task of learning the skeleton features of different kinds of behaviors. This embodiment therefore exploits the convolutional network's ability to extract high-level features to turn the original adjacency matrix A_k into a learnable adaptive matrix B_k, dynamically adjusting the connection relationships and connection strengths of the joint points according to the input skeleton data. The formula of the adaptive graph convolution layer can be adjusted as:

f_{out} = \sum_{k=1}^{K} (A_k + B_k) f_{in} W_k    (5)
adaptive matrix BkWith an original matrix A representing the natural connection of the joints in the bodykAnd adding the two joint points to replace a fixed predefined skeleton graph, and embedding a normalized Gaussian function in the network layer to calculate the similarity of the two joint points in the skeleton graph so as to measure the connection relation of the two joint points. The specific operation is as follows:
Figure BDA0003102094200000063
in the formula, phi (v)i)=WΦviAnd Ψ (v)j)=WΨvjAre all embedded operations, WΦAnd WΨIs the corresponding weight parameter.
It should be noted that a non-local neural network can establish the connection relationship between a single joint and the global joints of the skeleton graph through its non-local structure, performing convolution operations in an embedding space to update the states between nodes. The structure of the skeleton-adaptive graph convolution layer is shown in fig. 4: the light grey square parts (marked with four stars) indicate learnable parameters. The input f_in is feature data of dimension C_in × T × N, corresponding respectively to the number of input channels, the number of frames in the skeleton graph sequence, and the number of joints. Two parallel branches perform 1 × 1 convolution operations through the two embedding functions Φ and Ψ; the output features of Φ and Ψ are each dimension-transformed, multiplied as matrices, and classified with the softmax function to obtain the adaptive matrix:

B_k = \mathrm{softmax}(f_{in}^{T} W_{\Phi k}^{T} W_{\Psi k} f_{in})    (7)

wherein each element B_k^{ij} is normalized to [0, 1]. Compared with the predefined skeleton graph, the adaptive matrix can establish new connection relationships according to the input joint point data; it is not limited to the natural connections in the human body and is more flexible and variable. A residual structure is added to the adaptive layer, using a 1 × 1 convolution operation so that the input dimension is consistent with the output dimension, and the original lower-layer features are also retained in the output higher-layer features.
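Formula (7) can be sketched in plain NumPy as below, with the 1 × 1 convolutions realized as per-joint linear maps W_Φ, W_Ψ and the dimensions chosen as illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
C_in, T, N, C_e = 3, 2, 5, 4  # channels, frames, joints, embedding dim

f_in = rng.standard_normal((C_in, T, N))
W_phi = rng.standard_normal((C_e, C_in))  # a 1x1 conv is a per-joint linear map
W_psi = rng.standard_normal((C_e, C_in))

# Embed both branches, fold the embedding and time axes together, then take
# pairwise inner products and softmax-normalize each row so every element
# of B_k lies in [0, 1].
phi = np.einsum('ec,ctn->etn', W_phi, f_in).reshape(-1, N)  # (C_e*T, N)
psi = np.einsum('ec,ctn->etn', W_psi, f_in).reshape(-1, N)
B_k = softmax(phi.T @ psi, axis=1)                          # (N, N)

print(B_k.shape)  # (5, 5)
```

Unlike the fixed A_k, this B_k changes with each input sample, which is what lets the layer connect, say, hand and foot joints for a running sequence even though they are not naturally adjacent.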
Joint enhancement layer based on attention mechanism:
In the process of classifying the various behaviors of the human body with a graph convolution network, extracting high-level effective motion features is the key to improving recognition accuracy. The attention mechanism is an information processing mechanism modeled on human visual characteristics: it gives sufficient attention to important information, weakens interfering information, and improves the signal-to-noise ratio to achieve information enhancement. In skeleton-based behavior recognition, for behaviors such as clapping, waving, and shaking hands, the joint features on the arms are more important than the other joint features of the skeleton, and for some similar behaviors, too many interfering features can also affect the final classification result. The joint enhancement layer with the soft attention mechanism adaptively focuses on the key nodes in the skeleton graph and automatically computes the importance of each joint.
As shown in fig. 5, the attention layer with the joint enhancement function extracts f_t, containing the spatial structure features of each joint, from the t-th frame skeleton graph. The joint features are first transformed through a fully connected layer, and all transformed joint features are aggregated to obtain the query feature:

q_t = \frac{1}{N} \sum_{i=1}^{N} \tanh(W f_{ti})    (8)

where W is a learnable weight matrix. The attention scores of all joint points in the skeleton graph can be written as:

m_t = \mathrm{softmax}(W_s \tanh(W_f f_t + W_q q_t + b_{fq}) + b_s)    (9)

where W_s, W_f, W_q are all learnable weight matrices, b_fq and b_s are biases, and m_t = (m_t1, m_t2, ..., m_tN) represents the importance of the corresponding joint points in the t-th frame skeleton graph, its values normalized to [0, 1] by the softmax function.
Thus, the attention layer can output a skeleton graph with enhanced joint features, and the spatial feature of each joint point v_ti can be expressed as:

\hat{f}_{ti} = f_{ti} \cdot (1 + m_{ti})    (10)
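Formulas (8) to (10) can be sketched as follows. Mean aggregation for the query and the residual-style enhancement f_ti · (1 + m_ti) are assumptions where the original figures leave the exact form open; all dimensions are illustrative:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(2)
N, C, C_h = 5, 8, 6  # joints per frame, feature channels, hidden dim

f_t = rng.standard_normal((N, C))       # frame-t joint features
W = rng.standard_normal((C, C_h))       # query transform (formula 8)
W_f = rng.standard_normal((C, C_h))     # learnable weights of formula (9)
W_q = rng.standard_normal((C_h, C_h))
W_s = rng.standard_normal((C_h,))
b_fq = rng.standard_normal((C_h,))
b_s = 0.1

q_t = np.tanh(f_t @ W).mean(axis=0)            # aggregated query feature, (C_h,)
u = np.tanh(f_t @ W_f + q_t @ W_q + b_fq)      # (N, C_h)
m_t = softmax(u @ W_s + b_s)                   # per-joint attention in [0, 1]
f_enh = f_t * (1.0 + m_t[:, None])             # formula (10): enhanced features

print(m_t.shape, f_enh.shape)  # (5,) (5, 8)
```

The 1 + m_ti form keeps the original feature and adds an attention-weighted copy, so low-scoring joints are de-emphasized relative to high-scoring ones without being zeroed out.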
and (3) comparison, verification and analysis:
To verify the effectiveness of the behavior recognition model of this method, three groups of control experiments are carried out on the representative MSRAction 3D skeleton dataset, with the spatio-temporal graph convolution network (TS-GCN) as the baseline network, and the results are compared with other existing methods.
Dataset, MSRAction 3D: the MSRAction 3D dataset is based on the 3D coordinates of 20 skeleton joints obtained with a Kinect depth camera and is commonly used to verify the validity of human behavior recognition algorithms. For the specific joint names, refer to fig. 3. The dataset was performed by 10 subjects over the 20 action categories in Table 1; each action was repeated 2-3 times, giving 567 action sequence samples, with each action sequence 10-100 frames long.
TABLE 1 The 20 action classes of MSRAction 3D
high arm wave, horizontal arm wave, hammer, hand catch, forward punch, high throw, draw x, draw tick, draw circle, hand clap, two hand wave, side boxing, bend, forward kick, side kick, jogging, tennis swing, tennis serve, golf swing, pickup & throw
FIG. 6 shows skeleton samples of some of the actions. The actions in this dataset have very high similarity, and the number of captured frames differs greatly between actions, which poses a great challenge to the validation of the model. In the data sample preprocessing stage, 20 action sequences suffered severe data loss, so the total data for this experiment is 547 samples. A cross-validation protocol split by subject is used to test the performance of the model: subjects 1, 3, 5, 7, 9 are used for training, and subjects 2, 4, 6, 8, 10 for testing.
The experiment is based on the skeleton-adaptive and joint-enhanced graph convolution network shown in fig. 1. The baseline network is a stack of 3 spatio-temporal graph convolution (TS-GCN) layers, whose (input channels, output channels, stride) are (3, 32, 1), (32, 64, 2), (64, 128, 2), respectively. A stochastic gradient descent (SGD) optimization strategy with Nesterov momentum 0.9 is adopted, and the cross-entropy loss function is used to compute the back-propagated gradient error. The training period is 120 epochs, the initial learning rate is set to 0.1 and decays by a factor of 0.1 at epochs 50 and 80, the weight decay coefficient is 0.0001, dropout is 0.25, and both the training and testing batch sizes are 16. A skeleton adaptive layer (SA), a joint enhancement layer (JE), and both together (SA + JE) are added to the baseline network to test the performance of each component.
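The step decay described above (initial learning rate 0.1, multiplied by 0.1 at epochs 50 and 80 over 120 epochs) can be sketched as a plain function; treating it as a standard multi-step schedule is an assumption about the exact decay form:

```python
def lr_at(epoch, base_lr=0.1, milestones=(50, 80), gamma=0.1):
    """Learning rate at a given epoch under multi-step decay."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# Schedule over the 120-epoch training period described above.
schedule = [lr_at(e) for e in range(120)]
print(schedule[0], schedule[49], schedule[50], schedule[80])
```

So epochs 0-49 train at 0.1, epochs 50-79 at 0.01, and epochs 80-119 at 0.001.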
Four groups of human behavior recognition experiments are carried out on the MSRAction 3D dataset, using the baseline network TS-GCN and the models trained with the two network layers, respectively, to verify the effectiveness of the method. Figs. 7 and 8 plot, over the training period (epochs), the loss values (Loss) of the four experiments during network training and the recognition accuracy (Acc) of the network tests, respectively. Fig. 7 is a grouped plot with the curves offset by 0.1; in it, the loss of the baseline network TS-GCN converges slowest, with a lowest loss value of 0.0313, whereas adding the skeleton adaptive layer and the joint enhancement layer achieves faster convergence, with a lowest loss value of 0.013, fitting the 3D skeleton data better.
The behavior recognition accuracies of the four experiments in fig. 8 become essentially stable by around 100 epochs; the recognition rate of the baseline network oscillates more strongly, while the networks with the skeleton adaptive layer and the joint enhancement layer are more stable. Table 2 gives the recognition accuracy of each of the four experimental tests.
TABLE 2 Comparison of recognition rates of the method on the MSRAction 3D dataset
Model | Recognition accuracy
TS-GCN (baseline) | 92.05%
TS-GCN + SA | 93.72%
TS-GCN + JE | 93.28%
TS-GCN + SA + JE | 95.36%
With the spatio-temporal graph convolution network TS-GCN as the baseline, the accuracy is 92.05%. Adding the skeleton adaptive layer (SA) or the joint enhancement layer (JE) improves the recognition accuracy by 1.67% and 1.23%, respectively, and fusing the two reaches a recognition rate of 95.36%, an improvement of 3.31% over the baseline TS-GCN. In detail, the skeleton adaptive layer adaptively adjusts the adjacency matrix expressing the skeleton graph structure.
The predefined adjacency matrix in fig. 9 shows that the connection relationship between the joint points is fixed and invariant, which cannot effectively capture the potential dependencies between the position and movement information of the joint points across different actions. For a simple action such as running, the cooperative dependency between joints in the foot and hand regions is important and cannot be neglected. The skeleton-adaptive adjacency matrix in fig. 10 is more flexible in form and, as the skeleton data change, adapts better to different behavior recognition tasks.
Similarly, the attention layer with joint enhancement assigns higher weights to the joint points that contribute strongly to a skeletal action and correspondingly reduces the weights of low-contribution nodes. Compared with fig. 11, fig. 12 shows that adding the joint-enhancement layer to the reference network increases the recognition rates of hand catch (HCh), draw X (DX), side boxing (SB), and pickup-and-throw (PT) by 0.17, 0.16, 0.22, and 0.15, respectively. Notably, the recognition rates of high throw (HT) and tennis serve (TSr) instead decrease by 0.17 and 0.12; these actions are misclassified as forward punch (FP) and hammer (H), which are extremely similar to them. Because the high-contribution joints of similar actions also resemble each other, the two pairs become confused in classification once the joint-enhancement layer is added. This suggests that for similar actions attention should be spread over more joint points, rather than concentrated on a few nodes while the information of the low-contribution nodes is discarded.
To verify the performance of the algorithm proposed in this embodiment more fully, it is further compared with current state-of-the-art (SOTA) behavior recognition methods on the MSR Action3D dataset; the comparison results are shown in table 3.
TABLE 3 Comparison of recognition rates with other methods on the MSR Action3D dataset
(Table 3 is rendered as an image in the original publication.)
The recognition accuracy of the graph convolution network based on skeleton adaptation and joint enhancement in this experiment is 95.36%, an improvement of 6.89% over the adaptive skeleton center-point method [17] and of 3.33% over the popular differential recurrent neural network algorithm, and it is superior to most existing behavior recognition methods. The feature-fusion method Features combination [23] achieves the highest recognition accuracy, but it uses more than thirty parameters and requires a large amount of time to tune their values manually in order to obtain the optimal recognition result, which increases the training difficulty of the model. The experimental results in table 3 show that the proposed algorithm is highly feasible for modeling 3D human skeleton data and is competitive with existing behavior recognition methods, providing a viable option for 3D human skeleton data modeling and human behavior recognition.
Finally, the above embodiments are intended only to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art will understand that modifications or equivalent substitutions may be made to the technical solutions without departing from their spirit and scope, and all such modifications are intended to be covered by the claims of the present invention.

Claims (5)

1. A method for recognizing human behavior in an elevator car, characterized by comprising: designing a human skeleton spatio-temporal topological graph, learning the connection relations among the joint points with an embedded Gaussian function, and adaptively adjusting the joint structure connections according to the input sample data;
introducing a soft attention mechanism to measure the differences between the joint points, enhancing the feature expression of high-contribution joint points and weakening the interference of low-contribution joint point features, so as to extract highly discriminative joint features;
and constructing an end-to-end graph convolution network to learn the spatio-temporal co-occurrence features of the human skeleton joint points.
2. The method for recognizing human behavior in an elevator car according to claim 1, characterized in that: the predefined human skeleton graph is formed by the joint points and the skeleton edges connecting pairs of joints according to the body structure; a complete body motion comprises a T-frame skeleton sequence with N joints, and the skeleton graph is denoted G = (V, E), where the joint point set V = {v_ti | t = 1, ..., T; i = 1, ..., N} contains all joints in the skeleton sequence, and the skeleton edge set is E = {E_S, E_T};
wherein E_S = {v_ti v_tj | (i, j) ∈ H} is the set of spatial edges connecting adjacent joint points within one frame of the skeleton sequence, H being the set of naturally connected human joint pairs, and E_T = {v_ti v_(t+1)i} is the set of temporal edges connecting the same joint point between consecutive frames.
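The spatio-temporal skeleton graph G = (V, E) described above can be built in a few lines. This is an illustrative sketch: the function name and the 0-based joint indexing are assumptions, and `bones` stands in for the set H of naturally connected joint pairs.

```python
def skeleton_graph(num_frames, bones):
    """Build the joint set V, spatial edges E_S (within a frame, along
    the bone pairs in H) and temporal edges E_T (same joint across
    consecutive frames) of the spatio-temporal skeleton graph."""
    joints = sorted({i for pair in bones for i in pair})
    V = [(t, i) for t in range(num_frames) for i in joints]
    E_S = [((t, i), (t, j)) for t in range(num_frames) for (i, j) in bones]
    E_T = [((t, i), (t + 1, i)) for t in range(num_frames - 1) for i in joints]
    return V, E_S, E_T
```

For a 3-frame sequence of a 3-joint chain this yields 9 nodes, 6 spatial edges, and 6 temporal edges.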
3. The method for recognizing human behavior in an elevator car according to claim 2, characterized in that: in the spatial dimension, the graph convolution operation for feature extraction at any joint point v_ti of the skeleton graph is expressed as:
f_out(v_ti) = Σ_{v_tj ∈ B(v_ti)} (1 / Z_ti(v_tj)) · f_in(v_tj) · w(l_ti(v_tj))    (1)
where f is the extracted feature of joint point v_ti; B(v_ti) = {v_tj | d(v_tj, v_ti) ≤ D} is the neighbor set of v_ti, D controls the range of neighbor nodes taken, and v_tj denotes the joint points directly connected with v_ti; Z_ti(v_tj) = |{v_tk | l_ti(v_tk) = l_ti(v_tj)}| is a normalization term, and w is the weight function of the neighboring joint points; to learn the differentiated features of each neighbor node, the neighbor nodes are divided into K sets according to human motion characteristics, and the label mapping l_ti(v_tj): B(v_ti) → {0, ..., K − 1} assigns a unique weight vector to each node v_tj;
the graph convolution operation in the time domain is obtained by extending the spatial graph convolution, with the parameter Γ controlling the temporal range of the neighbor set, so that the neighbor set over the spatial and temporal dimensions is expressed as:
B(v_qj) = {v_qj | d(v_tj, v_ti) ≤ D, |q − t| ≤ ⌊Γ/2⌋}    (2)
and the corresponding label mapping of the neighbor nodes is l_ST(v_qj) = l_ti(v_tj) + (q − t + Γ/2) × K, where l_ti(v_tj) is the single-frame label mapping at v_ti;
in the graph convolution network, after self-node information aggregation and neighbor-node information aggregation are performed at each joint point, the information is propagated to the next joint point until all joint points of the skeleton sequence have been traversed; the graph convolution formula is expressed as:
f_out = D^(−1/2) (A + I) D^(−1/2) f_in W    (3)
the feature map f is a tensor of dimension C × T × N, where C is the number of channels, T is the number of frames of the skeleton graph sequence, and N is the number of joints of a single-frame skeleton graph;
the connection relationship of the joints in the skeleton graph is represented by the N × N adjacency matrix A and the identity matrix I; D is the degree matrix of the joint points, D^(−1/2)(A + I)D^(−1/2) represents the normalized skeleton graph structure, and W is the weight matrix learned in the graph convolution network; according to the neighbor-node partition strategy, the adjacency matrix A is decomposed into K matrices A_k, and the graph convolution formula is further expressed as:
f_out = Σ_{k=1}^{K} D_k^(−1/2) A_k D_k^(−1/2) f_in W_k    (4)
the connection relationship and connection strength of the joint points are adjusted dynamically according to the input skeleton data, and the formula of the adaptive graph convolution layer becomes:
f_out = Σ_{k=1}^{K} (A_k + B_k) f_in W_k    (5)
where A_k is the original matrix representing the natural articulation of the joints of the body and B_k is the adaptive matrix; a normalized Gaussian function is embedded in the network layer to compute the similarity of two joint points of the skeleton graph, so as to measure their connection relationship, specifically:
f(v_i, v_j) = exp(Φ(v_i)^T Ψ(v_j)) / Σ_{j=1}^{N} exp(Φ(v_i)^T Ψ(v_j))    (6)
where Φ(v_i) = W_Φ v_i and Ψ(v_j) = W_Ψ v_j are both embedding operations, and W_Φ and W_Ψ are the corresponding weight parameters;
the two embedding functions Φ and Ψ are realized as two parallel 1 × 1 convolutions; their output features are dimension-transformed, matrix-multiplied, and normalized with a softmax function to obtain the adaptive matrix:
B_k = softmax(f_in^T W_Φk^T W_Ψk f_in)    (7)
wherein each element of B_k is normalized to [0, 1].
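A minimal sketch of the adaptive matrix of eq. (7), assuming plain matrix multiplies in place of the 1 × 1 convolutions (equivalent when features are per-joint column vectors). All names, shapes, and the row-wise softmax convention are illustrative assumptions.

```python
import numpy as np

def softmax_rows(x):
    """Row-wise softmax, numerically stabilized."""
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def adaptive_matrix(f_in, W_phi, W_psi):
    """Adaptive adjacency B_k = softmax(f_in^T W_phi^T W_psi f_in):
    embed the (C x N) joint features with the two embedding weights,
    take dot-product similarities between every pair of joints, and
    row-normalize so each row sums to 1."""
    phi = W_phi @ f_in              # (d_e, N) embedded features
    psi = W_psi @ f_in              # (d_e, N)
    return softmax_rows(phi.T @ psi)  # (N, N), entries in [0, 1]
```

The softmax guarantees the normalization to [0, 1] stated above, and B_k changes with the input skeleton data, unlike the fixed A_k.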
4. The method for recognizing human behavior in an elevator car according to claim 3, characterized in that: the features f_t containing the spatial structure of each joint are taken from the t-th frame of the skeleton graph; the joint features are first transformed by a fully connected layer, and all transformed joint features are aggregated to obtain the query feature:
q_t = Σ_{i=1}^{N} W f_ti    (8)
where W is a learnable weight matrix; the attention scores of the joint points of the skeleton graph are expressed as:
m_t = softmax(W_s tanh(W_f f_t + W_q q_t + b_fq) + b_s)    (9)
where W_s, W_f, W_q are all learnable weight matrices, b_fq and b_s are biases, and m_t = (m_t1, m_t2, ..., m_tN) denotes the importance of the corresponding joint points of the t-th frame skeleton graph, normalized to [0, 1] by the softmax function; the attention layer outputs a skeleton graph with enhanced joint features, the enhanced feature of each joint point v_ti being its input feature weighted by the attention score m_ti.
5. The method for recognizing human behavior in an elevator car according to claim 4, characterized in that: 3D verification is performed on the public behavior recognition dataset MSR Action3D.
CN202110625850.0A 2021-06-04 2021-06-04 Method for recognizing human body behaviors in elevator car Withdrawn CN113239884A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110625850.0A CN113239884A (en) 2021-06-04 2021-06-04 Method for recognizing human body behaviors in elevator car

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110625850.0A CN113239884A (en) 2021-06-04 2021-06-04 Method for recognizing human body behaviors in elevator car

Publications (1)

Publication Number Publication Date
CN113239884A true CN113239884A (en) 2021-08-10

Family

ID=77136825

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110625850.0A Withdrawn CN113239884A (en) 2021-06-04 2021-06-04 Method for recognizing human body behaviors in elevator car

Country Status (1)

Country Link
CN (1) CN113239884A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113688765A (en) * 2021-08-31 2021-11-23 南京信息工程大学 Attention mechanism-based action recognition method for adaptive graph convolution network
CN113688765B (en) * 2021-08-31 2023-06-27 南京信息工程大学 Action recognition method of self-adaptive graph rolling network based on attention mechanism
CN114613011A (en) * 2022-03-17 2022-06-10 东华大学 Human body 3D (three-dimensional) bone behavior identification method based on graph attention convolutional neural network
CN115984787A (en) * 2023-03-20 2023-04-18 齐鲁云商数字科技股份有限公司 Intelligent vehicle-mounted real-time alarm method for industrial brain public transport
CN116524601A (en) * 2023-06-21 2023-08-01 深圳市金大智能创新科技有限公司 Self-adaptive multi-stage human behavior recognition model for assisting in monitoring of pension robot
CN116524601B (en) * 2023-06-21 2023-09-12 深圳市金大智能创新科技有限公司 Self-adaptive multi-stage human behavior recognition model for assisting in monitoring of pension robot

Similar Documents

Publication Publication Date Title
CN113239884A (en) Method for recognizing human body behaviors in elevator car
Cui et al. Learning dynamic relationships for 3d human motion prediction
Jonschkowski et al. Pves: Position-velocity encoders for unsupervised learning of structured state representations
Jordan et al. Hierarchies of adaptive experts
CN109919122A (en) A kind of timing behavioral value method based on 3D human body key point
CN111652124A (en) Construction method of human behavior recognition model based on graph convolution network
CN103514443B (en) A kind of single sample recognition of face transfer learning method based on LPP feature extraction
CN107437077A (en) A kind of method that rotation face based on generation confrontation network represents study
CN112949647B (en) Three-dimensional scene description method and device, electronic equipment and storage medium
CN105787557A (en) Design method of deep nerve network structure for computer intelligent identification
CN109886072A (en) Face character categorizing system based on two-way Ladder structure
CN106778796A (en) Human motion recognition method and system based on hybrid cooperative model training
CN110826698A (en) Method for embedding and representing crowd moving mode through context-dependent graph
CN113688765B (en) Action recognition method of self-adaptive graph rolling network based on attention mechanism
CN107066951A (en) A kind of recognition methods of spontaneous expression of face and system
CN114998525A (en) Action identification method based on dynamic local-global graph convolutional neural network
CN113378656A (en) Action identification method and device based on self-adaptive graph convolution neural network
CN112990154A (en) Data processing method, computer equipment and readable storage medium
CN117373116A (en) Human body action detection method based on lightweight characteristic reservation of graph neural network
CN116229179A (en) Dual-relaxation image classification method based on width learning system
Cao et al. QMEDNet: A quaternion-based multi-order differential encoder–decoder model for 3D human motion prediction
Praditia Physics-informed neural networks for learning dynamic, distributed and uncertain systems
Faulkner et al. Dyna planning using a feature based generative model
Nandal et al. A Synergistic Framework Leveraging Autoencoders and Generative Adversarial Networks for the Synthesis of Computational Fluid Dynamics Results in Aerofoil Aerodynamics
CN111739168A (en) Large-scale three-dimensional face synthesis method with suppressed sample similarity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210810
