CN114943324B - Neural network training method, human motion recognition method and device, and storage medium - Google Patents

Neural network training method, human motion recognition method and device, and storage medium Download PDF

Info

Publication number
CN114943324B
CN114943324B (application CN202210585190.2A)
Authority
CN
China
Prior art keywords
training
neural network
graph
data
human motion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210585190.2A
Other languages
Chinese (zh)
Other versions
CN114943324A (en)
Inventor
颜延
廖天正
赵金津
任旭超
赵瑞麒
马良
王磊
刘语诗
熊璟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN202210585190.2A priority Critical patent/CN114943324B/en
Priority to PCT/CN2022/108857 priority patent/WO2023226186A1/en
Publication of CN114943324A publication Critical patent/CN114943324A/en
Application granted granted Critical
Publication of CN114943324B publication Critical patent/CN114943324B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/02 Preprocessing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/02 Preprocessing
    • G06F2218/04 Denoising
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12 Classification; Matching
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses a neural network training method, a human motion recognition method, a device and a storage medium. The neural network training method comprises the following steps: acquiring a training data set, and preprocessing the training data in the training data set to obtain a plurality of training graph data, wherein each training graph datum is the graph data of one time slice in the training data; inputting the training graph data into a graph neural network for training, wherein the graph neural network comprises a plurality of sequentially connected graph convolution layers; and, based on a training result, acquiring the weight matrix of the final graph neural network to complete the training of the neural network, wherein the weight matrix of the graph neural network consists of the final weights of the plurality of graph convolution layers. In this way, the training data set is preprocessed into the training graph data fed to the graph neural network, which improves the training efficiency and accuracy of the neural network.

Description

Neural network training method, human motion recognition method and device, and storage medium
Technical Field
The application relates to the technical field of neural networks, in particular to a neural network training method, a human motion recognition method, equipment and a storage medium.
Background
As a typical pattern recognition problem, the sensor-based HAR (human activity recognition) problem has long been addressed with many conventional machine learning algorithms, including decision trees, random forests, support vector machines, Bayesian networks, Markov models and the like. Under strictly controlled environments and with limited input, these traditional algorithms achieve good classification performance, but the traditional hand-crafted feature approach is time-consuming, and the extracted features lack incremental and unsupervised learning capability as well as generalization capability.
Disclosure of Invention
The application mainly provides a neural network training method, a human motion recognition method, a device and a storage medium, which are used to solve the problems that the traditional hand-crafted feature approach is time-consuming and that the extracted features lack incremental and unsupervised learning capability and generalization capability.
In order to solve the technical problems, the application adopts a technical scheme that: provided is a neural network training method, including:
acquiring a training data set, and preprocessing the training data in the training data set to obtain a plurality of training graph data, wherein each training graph datum is the graph data of one time slice in the training data;
inputting the training graph data into a graph neural network for training, wherein the graph neural network comprises a plurality of graph convolution layers which are sequentially connected;
and based on a training result, acquiring a weight matrix of a final graph neural network to complete the training of the neural network, wherein the weight matrix of the graph neural network consists of final weights of the plurality of graph convolution layers.
According to an embodiment of the present application, the inputting of the training graph data into the graph neural network for training includes:
inputting each training graph datum into a first graph convolution layer of the graph neural network, and obtaining a first output of the first graph convolution layer;
and inputting the first output into a next graph convolution layer of the first graph convolution layer, so that the first output is used as the input of the next graph convolution layer to train until the training of all the graph convolution layers of the graph neural network is completed.
According to an embodiment of the present application, the step of inputting the first output into a next graph convolution layer of the first graph convolution layer to train the first output as the input of the next graph convolution layer includes:
superposing the first output and the training graph data to obtain fusion data;
and inputting the fusion data into the next graph convolution layer to train the fusion data as the input of the next graph convolution layer.
According to an embodiment of the present application, the first output is computed from the training graph data and the training weights of the first graph convolution layer;
the first output is converted to the input of the next graph convolutional layer by an activation function.
According to an embodiment of the present application, the inputting of the training graph data into the graph neural network for training includes:
extracting spatial features from node features of the training graph data by using a Laplacian operator;
constructing a diagonal matrix by using training weights of the graph neural network as diagonal elements;
and forming the output of the graph neural network for training by using the spatial characteristics and the diagonal matrix.
According to an embodiment of the present application, the forming of the output of the graph neural network for training by using the spatial features and the diagonal matrix includes:
acquiring the spatial characteristics of each node characteristic;
based on a preset convolution kernel receptive field, updating the spatial characteristics of each node characteristic;
and forming the output of the graph neural network for training by using the updated spatial characteristics and the diagonal matrix.
According to an embodiment of the present application, the updating the spatial feature of each node feature based on the preset convolution kernel receptive field includes:
setting a chebyshev polynomial recursion equation according to the preset convolution kernel receptive field;
and inputting the spatial characteristics of each node characteristic into the Chebyshev polynomial recursion equation, and recursively obtaining the spatial characteristics of each node characteristic after updating.
According to an embodiment of the present application, after the plurality of graph convolution layers the graph neural network is further connected with at least one fully connected layer, and the at least one fully connected layer is used for training classification tasks.
According to an embodiment of the present application, after the weight matrix of the final graph neural network is acquired based on the training result and the training of the neural network is completed, the neural network training method further includes:
migrating the trained graph neural network into another neural network as part of that network's structure, thereby forming a migrated neural network;
and training the migrated neural network again.
In order to solve the technical problems, the application adopts another technical scheme that: provided is a human motion recognition method including:
acquiring human body motion data of a user by using a wearable sensor;
preprocessing the human motion data to obtain human motion graph data;
inputting the human motion graph data into a pre-trained graph neural network, and acquiring the prediction information of the graph neural network on the human motion of the user based on the human motion graph data;
acquiring the motion state of the user based on the prediction information;
the graph neural network is obtained through training by the neural network training method.
In order to solve the technical problems, the application adopts another technical scheme that: providing a terminal device comprising a memory and a processor coupled to the memory;
the memory is used for storing program data, and the processor is used for executing the program data to realize the neural network training method and/or the human body motion recognition method.
In order to solve the technical problems, the application adopts another technical scheme that: there is provided a computer storage medium for storing program data which, when executed by a computer, is adapted to carry out a neural network training method and/or a human motion recognition method as described above.
The application provides a neural network training method, a human motion recognition method, a device and a storage medium. The neural network training method comprises the following steps: acquiring a training data set, and preprocessing the training data in the training data set to obtain a plurality of training graph data, wherein each training graph datum is the graph data of one time slice in the training data; inputting the training graph data into a graph neural network for training, wherein the graph neural network comprises a plurality of sequentially connected graph convolution layers; and, based on a training result, acquiring the weight matrix of the final graph neural network to complete the training of the neural network, wherein the weight matrix of the graph neural network consists of the final weights of the plurality of graph convolution layers. In this way, the training data set is preprocessed into the training graph data fed to the graph neural network, which improves the training efficiency and accuracy of the neural network.
Drawings
For a clearer description of the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the description below are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art, wherein:
FIG. 1 is a flow chart of an embodiment of a neural network training method provided by the present application;
FIG. 2 is a schematic diagram of the framework of the neural network provided by the present application;
FIG. 3 is a schematic diagram of a main flow of the neural network training method provided by the present application;
FIG. 4 is a schematic diagram of a learning process of the transfer learning provided by the present application;
FIG. 5 is a schematic flow chart of an embodiment of a human motion recognition method according to the present application;
fig. 6 is a schematic structural diagram of an embodiment of a terminal device provided by the present application;
fig. 7 is a schematic structural diagram of an embodiment of a computer storage medium according to the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be noted that, if directional indications (such as up, down, left, right, front, and rear … …) are included in the embodiments of the present application, the directional indications are merely used to explain the relative positional relationship, movement conditions, etc. between the components in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indications are correspondingly changed.
In addition, if descriptions of "first", "second", etc. appear in the embodiments of the present application, they are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features. Thus, a feature defined by "first" or "second" may explicitly or implicitly include at least one such feature. The technical solutions of the embodiments may be combined with each other, provided that the combination can be realized by those skilled in the art; when a combination of technical solutions is contradictory or cannot be realized, it should be considered absent and outside the scope of protection claimed by the present application.
The daily exercise behavior of the human body is closely related to its health indices and energy balance. For example, an individual's energy consumption can be calculated by monitoring exercise behaviors such as running and walking, which has positive significance for personal healthy exercise and bodily energy balance. In addition, people in dangerous situations can be helped effectively and promptly through the identification of abnormal movement behaviors of the human body (such as falls).
Early human motion behavior recognition (human activity recognition, HAR) based on machine vision was a popular direction: images or video streams are captured and human behavior is detected with image/video processing techniques, as in the video-based HAR field. However, this approach is limited by complex scenes and the uncertainty of actions, must take into account the privacy problems cameras raise, and is only suitable for certain specific scenarios. In contrast, the wearable sensor is not easily disturbed by the environment, and the signals it acquires are more continuous and accurate and can serve a wider range of scenarios.
Sensor technology has made remarkable progress in computing power, size, accuracy and manufacturing cost over the past decade. These advances allow most sensors to be integrated into smartphones and other portable devices, making these devices more intelligent and practical. The wearable sensors commonly used for HAR are accelerometers, magnetometers, gyroscopes and integrated inertial measurement units (IMUs).
Research based on deep learning has gradually achieved excellent results and now dominates the field of human motion behavior recognition. Automatic feature extraction by multi-layer neural networks significantly reduces feature preprocessing, and deep learning architectures have proven to perform well in unsupervised learning and reinforcement learning.
The present application proposes a scheme that addresses the sensor-based HAR problem from the perspective of graph construction. A person's limbs cooperate with one another during movement; the collected data are mapped into a graph through the correlation of sensors at different positions on the body, the graph is modeled with a graph neural network grounded in graph theory, and actions are classified from the action information and the inter-sensor relationships that the graph network learns from the graph.
In this regard, the application constructs a complete HAR framework and selects the graph neural network to model human motion; it confirms that the GNN (Graph Neural Network) has strong transfer-learning capability and multi-angle learning capability in the HAR field, effectively makes up for the inability of traditional deep learning to capture the graph-structure data relationships of non-Euclidean spaces, and proposes a new idea of modeling sensor-based human motion graph-structure data.
Based on the technical foundation, the application provides a specific training method of the graph neural network. Referring to fig. 1 to 3, fig. 1 is a schematic flow chart of an embodiment of a neural network training method provided by the present application, fig. 2 is a schematic frame diagram of the neural network provided by the present application, and fig. 3 is a schematic flow chart of the neural network training method provided by the present application.
As shown in fig. 1, the neural network training method according to the embodiment of the present application may specifically include the following steps:
step S11: and acquiring a training data set, and preprocessing training data in the training data set to obtain a plurality of training image data, wherein each training image data is training image data of a time slice in the training data.
In the embodiment of the present application, the data sets adopted may be the MHEALTH data set and the PAMAP2 data set; in other embodiments other data sets may also be adopted, which is not limited herein. The two data sets are described below:
MHEALTH dataset
The dataset includes data from 10 participants in a laboratory environment. Each subject wears wearable sensors attached to the chest, right wrist and left ankle. Physical activities such as standing, sitting, lying, walking, climbing stairs, bending the waist forward, lifting the forearms, bending the knees, cycling, jogging, running and jumping forward all take part in the experiment. The sampling rate of the recorded data is 50 Hz. The MHEALTH dataset thus contains 12 activity categories and a total of 21 channels of sensor signals; in this method the perceived information of the user's body is captured by the chest sensor together with the two other sensors.
PAMAP2 data set
The dataset includes data obtained from 9 participants aged 24 to 30. The participants wear IMUs (inertial measurement units) on the wrist of the dominant side, the ankle and the chest. The activities performed by each person comprise ten actions: lying, sitting, standing, walking, running, cycling, brisk walking, going upstairs, going downstairs and rope skipping. Each IMU contains two 3D acceleration sensors, one gyroscope and one magnetometer, with a sampling frequency of 100 Hz. Each IMU provides nine-axis sensor information, for a total of 27 channels of sensor signals; in this method the application only uses the information of 3 sensors from the dataset (the right waist, left ankle and back) to maintain consistency of sensor positions.
Before the training-set data are input into the graph neural network for training, the terminal equipment needs to preprocess them and convert them into graph data. The specific preprocessing process is as follows:
Firstly, the terminal equipment performs noise-filtering normalization on the training data acquired by all sensors in time order and resamples the training data to 50 Hz. Secondly, the training data are windowed with a sliding window of fixed length 128 and an overlap rate of 50%; in other embodiments, sliding windows of different lengths may be used, which will not be described herein.
According to the sampling frequency of each data set, the terminal device may obtain 5361 activity time-series segments from the MHEALTH data set and 11784 from the PAMAP2 data set; at the 50 Hz sampling frequency of MHEALTH a 128-sample window spans 2.56 seconds, and at the 100 Hz sampling frequency of PAMAP2 it spans 1.28 seconds.
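The windowing step above can be sketched in a few lines; the following minimal NumPy example is an illustration, with the function name and dummy data being assumptions rather than taken from the patent's implementation.

```python
import numpy as np

def segment_windows(signal, window_len=128, overlap=0.5):
    """Split a (time, channels) recording into fixed-length windows.

    Sketch of the windowing described above: fixed length of 128
    samples with 50% overlap between consecutive windows.
    """
    step = int(window_len * (1 - overlap))  # 64 samples at 50% overlap
    n_windows = (signal.shape[0] - window_len) // step + 1
    return np.stack([signal[i * step : i * step + window_len]
                     for i in range(n_windows)])

# e.g. 60 s of 21-channel data at 50 Hz: each window then spans 2.56 s
dummy = np.random.randn(3000, 21)
windows = segment_windows(dummy)  # shape (n_windows, 128, 21)
```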
The terminal device regards each activity time-series segment as a training sample and builds graph data for each sample as the input to the GNN network. Each sensor channel is regarded as a node; the Pearson correlation coefficient is computed between every pair of nodes to obtain a correlation coefficient matrix, two nodes whose correlation coefficient exceeds 0.2 are regarded as highly correlated and are connected by an edge, and the length-128 data of each channel are embedded into the corresponding sensor-channel node to form the graph data of one time slice. The length of the graph data is determined by the length of the sliding window.
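For illustration, a minimal sketch of this graph-building step follows, assuming the plain 0.2 correlation threshold stated above; the helper name is hypothetical.

```python
import numpy as np

def build_graph(window, threshold=0.2):
    """Build one time slice of graph data from a (128, channels) window.

    Each sensor channel is a node; two nodes are connected when the
    Pearson correlation of their signals exceeds the threshold, and the
    raw length-128 signal is embedded as the node feature vector.
    """
    corr = np.corrcoef(window.T)                 # (channels, channels) Pearson matrix
    adj = (corr > threshold).astype(np.float32)  # edges for highly correlated pairs
    np.fill_diagonal(adj, 0.0)                   # drop trivial self-correlations
    features = window.T.astype(np.float32)       # node features: (channels, 128)
    return adj, features
```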
As shown in fig. 3, the terminal device performs related preprocessing on the data of the human body sensor to filter out unnecessary noise information and interference information, and then windows the data to construct a graph for each time sequence segment as input of the GNN network.
Step S12: training the training graph data into a graph neural network, wherein the graph neural network comprises a plurality of graph convolution layers which are connected in sequence.
In an embodiment of the application, the graph convolutional network (Graph Convolutional Network, GCN) is a deep learning model that, unlike conventional deep learning, operates on non-Euclidean spaces. The application adopts a GCN as the network to be trained because GCNs have shown clear advantages over other deep models in non-Euclidean settings such as video-based human motion recognition, and because sensor-based human motion recognition carries a latent graph-structure relationship.
In this regard, the present application proposes a new ResGCNN framework that includes a residual graph network structure with shared training weights. As shown in fig. 2, the neural network of the present application includes a plurality of sequentially connected graph convolution layers (ChebNet layers), and the output of each graph convolution layer together with that of the previous layer serves as the next input. In addition, the terminal device can connect a fully connected layer after the sequentially connected graph convolution layers; the graph convolution layers are used for feature extraction and the fully connected layer is used for the classification task.
In the embodiment of the application, the terminal device constructs a 16-layer ResChebNet model for sensor-based human motion recognition. The ResGCNN framework includes four ResChebNet blocks and two additional fully connected (FC) layers to counter over-smoothing and vanishing gradients. It also involves an intra-block residual structure that adds the inputs of the four blocks to the output of the last block as the final output of the ResChebNet block.
For sensor-based human motion recognition, compared with traditional deep models (CNN, LSTM, Deep-LSTM, etc.), the multi-layer ResChebNet modeling shown in fig. 2 effectively learns the non-Euclidean structural relationships among the sensors. It introduces a residual structure and the graph normalization PairNorm to address over-smoothing and vanishing gradients, and it introduces a local residual structure to fully learn local structure awareness, so that the graph-structure relationships of sensor-based human motion are fully learned and the results are more accurate and robust.
Based on the ResChebNet model shown in fig. 2, assume a training graph G is given, composed of N vertices and the edges between them, where an edge between any two vertices i and j represents their similarity. The adjacency matrix A of the graph data is a sparse matrix whose entry (i, j) is 1 when vertices i and j are connected by an edge and 0 otherwise.
In addition, each node of the graph has an F-dimensional feature vector, and X ∈ R^(N×F) denotes the feature matrix of all N nodes; the dimension of a node's feature vector is determined by the length of the graph data. An L-layer graph convolutional network (GCN) consists of L graph convolution layers, 16 in the network shown in fig. 2. Each convolution layer constructs the input of each of its nodes from the outputs of the nodes of the previous layer, in the form:
Z^(l+1) = Â X^(l) W^(l), X^(l+1) = σ(Z^(l+1))
where X^(l) ∈ R^(N×F) is the input of the l-th graph convolution layer over the N nodes, with X^(0) = X; Â = D^(-1/2) A D^(-1/2) is the normalized adjacency matrix; σ(·) is the activation function, typically a ReLU; and D is the degree matrix, computed as:
D_ii = Σ_j A_ij
W^(l) is the learnable weight matrix, which transforms the features for the downstream learning task.
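As a concrete reading of the propagation rule above, here is a minimal PyTorch sketch of one graph convolution layer; adding self-loops before the symmetric normalization is a common practical choice and an assumption here, not something the formula above spells out.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One dense graph convolution: Z = Â X W, X' = ReLU(Z)."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)  # W^(l)

    def forward(self, x, adj):
        # Â = D^(-1/2) (A + I) D^(-1/2), with self-loops added (assumption)
        a_hat = adj + torch.eye(adj.size(0), device=adj.device)
        d_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)
        a_hat = d_inv_sqrt.unsqueeze(1) * a_hat * d_inv_sqrt.unsqueeze(0)
        return torch.relu(a_hat @ self.weight(x))  # σ(Â X W)
```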
Further, in the feature extraction and feature transformation of each convolution layer, the terminal device can generalize the traditional Fourier transform to the Fourier transform on the graph through spectral graph theory and the convolution theorem:
h = (f * g)_G = U((Uᵀ g) ⊙ (Uᵀ f))
where U is the eigenvector matrix obtained by decomposing the Laplacian matrix L (i.e., the Laplacian operator), f is the node features of the input graph data, g is the convolution kernel, ⊙ is the element-wise product, and h is the topological-space feature extracted by the trainable, parameter-shared convolution kernel.
The core of the GCN convolution operation is a trainable, parameter-shared convolution kernel. The GCN replaces the diagonal elements of the spectral kernel with learnable parameters, g(θ) = diag(θ), which are then adjusted by back-propagation during training, so the forward formula of the GCN network can be expressed as:
Y = σ(U g(θ) Uᵀ x)
where x is the representation vector of each node feature in the graph data and Y is the output of each node feature after convolution by the GCN network; each node feature is convolved with the convolution kernel to extract the corresponding topological-space feature, which is then propagated to the next layer through the activation function σ.
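To make the cost of this exact spectral form concrete, the following NumPy sketch evaluates Y = σ(U g(θ) Uᵀ x) directly; the eigendecomposition it performs on every call is exactly the expensive step that the Chebyshev approximation below avoids.

```python
import numpy as np

def spectral_conv(x, laplacian, theta):
    """Exact spectral convolution Y = σ(U g(θ) Uᵀ x), for illustration only."""
    _, U = np.linalg.eigh(laplacian)   # L = U Λ Uᵀ; costly full decomposition
    g = np.diag(theta)                 # learnable diagonal kernel g(θ)
    y = U @ g @ U.T @ x                # filter the node signal in the spectral domain
    return np.maximum(y, 0.0)          # ReLU as the activation σ
```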
Further, the GCN network has drawbacks: it requires an eigendecomposition of the Laplacian matrix, a matrix multiplication must be computed in every forward propagation, and when the graph data are large the time complexity is O(n²), which is time-consuming. Here n is the number of convolution kernel parameters of the graph neural network, and when n is large the node features update slowly. Moreover, a multi-layer GCN network suffers from over-smoothing: the representation vectors of the node features tend to become identical, making the nodes hard to distinguish.
Therefore, the application approximates the convolution kernel with a K-order Chebyshev polynomial and substitutes it into the graph Fourier transform:
Y = σ( Σ_{k=0..K} θ_k T_k(L̃) x ), L̃ = (2/λ_max) L - I_N
where θ_k is the weight parameter of order k. The k-th power of the Laplacian reaches the nodes within k hops of a center node, i.e., whether an element of L^k is 0 indicates whether one node of the graph can reach another through k hops; k therefore indicates the size of the convolution-kernel receptive field, the feature representation of each center node is updated by aggregating the neighboring nodes within its k hops, and θ_k is the weight of the k-hop adjacency. The final formula requires no matrix decomposition, only a transformation (rescaling) of the Laplacian matrix L, so the amount of computation is markedly reduced. In general, k < n.
The recursion of the Chebyshev polynomials above is defined as:
T_k(x) = 2x T_(k-1)(x) - T_(k-2)(x), with T_0(x) = 1 and T_1(x) = x
the number n of convolution kernel parameters of the GCN network is reduced to k, and the calculation complexity is reduced through iterative definition from the original global convolution to the current local convolution, namely, the node which is away from the central node k-hop is used as an adjacent node.
Step S13: based on the training result, obtaining a weight matrix of the final graph neural network to complete the training of the neural network, wherein the weight matrix of the graph neural network consists of final weights of a plurality of graph convolution layers.
In the embodiment of the application, training a neural network is a process of adjusting parameters: the more layers the network has, the more parameters (weights and biases) can be adjusted, which means greater freedom of adjustment and a better approximation. Deepening neural networks has long been a hot topic, and graph neural networks (GCNs) are no exception. Past experiments, analyzed from different aspects (e.g., from a dynamical-systems perspective), show that as the number of GCN layers grows the node representations become more global and smoother; each convolution layer effectively pushes the node representations toward uniformity, leaving little distinction in the dense parts of the graph, while in the sparse parts relatively little information is obtained. This is the over-smoothing problem.
Because deep GCNs over-smooth, the application introduces the ResChebNet model shown in fig. 2, whose residual formula is:
X^(l+1) = σ(Z^(l+1)) + X^(l)
in the application, chebNet (Chebyshev polynomial approximation graph convolution kernel) is used, and structures such as Pair Norm standardization and the like are introduced to control the sum of the distances of the feature vectors between every two nodes to be a constant, so that the distance of the feature vector of the node with a longer distance is also longer.
Further, transfer learning is a very important deep learning strategy. It reuses the knowledge obtained in solving one problem by applying it to a different but related problem, i.e., it migrates knowledge from a source domain to a target domain, which has a great positive impact on many domains that are hard to improve owing to insufficient training data. The learning process of transfer learning is shown in fig. 4.
Deep transfer learning is divided into four categories: instance-based, mapping-based, network-based, and adversarial-based deep transfer learning. The application uses parameter-based deep transfer learning. Because the sensor types used in the experiments are the same and the acquired data are of the same kind, identical input dimensions yield an identical residual network structure, which makes parameter-based transfer learning well suited to improving the learning efficiency of the residual GNN.
The present application contemplates deep migration learning between different data sets with different sensor settings or activity types. The ResGCNN deep transfer learning includes three main phases, including:
1) The source domain training is performed on the network using the large-scale training dataset.
2) Part of the network pre-trained on the source domain is migrated into a new network designed for the target domain.
3) The transferred sub-network is updated with a fine-tuning strategy for the new training task.
First, data from a single-position sensor (9 channels) or from three position sensors (27 channels) are selected from the PAMAP2 data set and input into the ResGCNN network for learning and classification, while the parameters learned by the residual-network part of the structure are retained.
The other three data sets are then separately input into the network for classification testing, taking care that their sensor counts (i.e., channel counts) are identical so that the input dimensions match. A residual network structure identical to that used for the PAMAP2 data set is constructed, and the fully connected layers are modified or added according to the classification requirements of the different data sets. The new data set is trained by directly transferring the previously trained PAMAP2 residual-network parameters into the new training and locking them, so that only the final fully connected layer is iteratively optimized for the new task. To demonstrate the small-sample transfer-learning ability of the ResGCNN network, the application takes 30% of each new sample set for testing.
As shown in fig. 3, the terminal device adaptively optimizes the fully connected layers of the target model with samples of the target data set, and the last part of the ResGCNN uses a Softmax layer as the HAR classifier; the data sets are respectively input into the network for training so that the weights of each layer are continuously optimized. Finally, the terminal device uses the pre-trained blocks of the ResGCNN structure learned on the source domain as feature extractors in the target domain to perform transfer learning of the ResGCNN.
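The parameter-freezing scheme described above might look as follows; `blocks` and `classifier` are assumed attribute names used for illustration, not the actual API of the patent's model.

```python
import torch.nn as nn

def transfer_resgcnn(pretrained, feat_dim, num_target_classes):
    """Lock the residual blocks pre-trained on the source domain and
    attach a fresh fully connected Softmax head for the target task."""
    for p in pretrained.blocks.parameters():
        p.requires_grad = False                # freeze the transferred sub-network
    pretrained.classifier = nn.Sequential(     # only this head is optimized
        nn.Linear(feat_dim, num_target_classes),
        nn.Softmax(dim=-1),                    # Softmax HAR classifier, as above
    )
    return pretrained
```

In practice the explicit Softmax is often omitted and left to a cross-entropy loss, which applies log-softmax internally; it is kept here only to mirror the Softmax classifier named above.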
Further, for the classification tasks, the classification accuracy, precision, recall, F1 score and confusion matrix are used to describe the results. For each activity category in a dataset, the predictions of the model are compared with the ground-truth labels to count the true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). The overall accuracy ACC is:
ACC = (TP + TN) / (TP + TN + FP + FN)
The precision and recall of a given class can be calculated as:
Precision = TP / (TP + FP), Recall = TP / (TP + FN)
F1-Score is a balanced combination of precision and recall:
F1 = 2 × Precision × Recall / (Precision + Recall)
the average of these activity signatures was used to evaluate each experiment. Furthermore, the confusion matrix relates to the visualization of the model performance.
In the embodiment of the application, a terminal device acquires a training data set and preprocesses the training data in it to obtain a plurality of training graph data, wherein each training graph datum is the graph data of one time slice in the training data; the training graph data are input into a graph neural network for training, wherein the graph neural network comprises a plurality of sequentially connected graph convolution layers; and, based on the training result, the weight matrix of the final graph neural network is acquired to complete the training, wherein the weight matrix consists of the final weights of the plurality of graph convolution layers. In this way, the training data set is preprocessed into the training graph data fed to the graph neural network, which improves the training efficiency and accuracy of the neural network.
With continued reference to fig. 5, fig. 5 is a flowchart illustrating an embodiment of a human motion recognition method according to the present application.
As shown in fig. 5, the human motion recognition method according to the embodiment of the present application may specifically include the following steps:
step S21: human motion data of a user is acquired using a wearable sensor.
In the embodiment of the application, the terminal equipment acquires the human body motion data of the user through the wearable sensor on the user.
Step S22: preprocessing the human motion data to obtain human motion map data.
In the embodiment of the present application, for the specific data preprocessing process of step S22, refer to step S11 in the above embodiment, which is not repeated here.
Step S23: and inputting the human body movement map data into a pre-trained map neural network, and acquiring the prediction information of the map neural network on the human body movement of the user based on the human body movement map data.
In the embodiment of the present application, the pre-trained graph neural network may specifically be the graph neural network trained in the foregoing embodiment, and the training process is not repeated here.
Step S24: based on the prediction information, a motion state of the user is obtained.
In the embodiment of the application, a scheme is provided that addresses the sensor-based HAR problem from the perspective of graph construction. The method maps the collected data into a graph through the correlation of sensors at different positions on the body, models it with a graph neural network grounded in graph theory, and classifies actions from the action information and the inter-sensor relationships that the graph network learns from the graph. The method constructs a complete HAR framework, selects the graph neural network to model human motion, confirms that the GNN has strong transfer-learning and multi-angle learning capability in the HAR field, effectively makes up for the inability of traditional deep learning to capture graph-structure data relationships in non-Euclidean spaces, and offers a new line of thought for modeling sensor-based human motion graph-structure data.
The application demonstrates that the graph neural network is feasible for sensor-based human motion recognition, provides a data preprocessing method that converts the information collected by sensors into a graph structure, obtains results comparable to or better than traditional deep models (CNN, RNN, LSTM, Deep-LSTM) on the data sets of the method, and offers a new line of thought for graph neural networks in sensor-based human motion recognition.
It will be appreciated by those skilled in the art that in the above-described method of the specific embodiments, the written order of steps is not meant to imply a strict order of execution but rather should be construed according to the function and possibly inherent logic of the steps.
With continued reference to fig. 6, fig. 6 is a schematic structural diagram of an embodiment of a terminal device according to the present application. The terminal device 500 of the embodiment of the present application includes a processor 51, a memory 52, an input-output device 53, and a bus 54.
The processor 51, the memory 52, and the input/output device 53 are respectively connected to the bus 54, and the memory 52 stores program data, and the processor 51 is configured to execute the program data to implement the neural network training method and/or the human motion recognition method according to the above embodiments.
In an embodiment of the present application, the processor 51 may also be referred to as a CPU (Central Processing Unit). The processor 51 may be an integrated circuit chip with signal processing capability. The processor 51 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The general-purpose processor may be a microprocessor, or the processor 51 may be any conventional processor or the like.
The present application further provides a computer storage medium, and referring to fig. 7, fig. 7 is a schematic structural diagram of an embodiment of the computer storage medium provided by the present application, in which program data 61 is stored in the computer storage medium 600, and the program data 61 is used to implement the neural network training method and/or the human motion recognition method according to the above embodiments when being executed by a processor.
Embodiments of the present application may be stored in a computer readable storage medium when implemented in the form of software functional units and sold or used as a stand alone product. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor (processor) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing description is only illustrative of the present application and is not intended to limit the scope of the application, and all equivalent structures or equivalent processes or direct or indirect application in other related technical fields are included in the scope of the present application.

Claims (10)

1. A human motion recognition method, characterized in that the human motion recognition method comprises:
acquiring human body motion data of a user by using a wearable sensor;
preprocessing the human motion data to obtain human motion graph data;
inputting the human motion graph data into a pre-trained graph neural network, and acquiring the prediction information of the graph neural network on the human motion of the user based on the human motion graph data;
acquiring the motion state of the user based on the prediction information;
the graph neural network is obtained through training by a neural network training method, and the neural network training method comprises the following steps:
acquiring a training data set, preprocessing training data in the training data set to obtain a plurality of training graph data, wherein each training graph datum is the graph data of one time slice in the training data, and the training data set comprises a MHEALTH data set and a PAMAP2 data set;
inputting the training graph data into a graph neural network for training, wherein the graph neural network comprises a plurality of graph convolution layers which are sequentially connected;
based on a training result, acquiring a weight matrix of a final graph neural network to complete the training of the neural network, wherein the weight matrix of the graph neural network consists of final weights of the plurality of graph convolution layers;
the inputting of the training graph data into the graph neural network for training comprises the following steps:
inputting each training graph data into a first graph convolution layer of the graph neural network, and obtaining a first output of the first graph convolution layer;
and inputting the first output into a next graph convolution layer of the first graph convolution layer, so that the first output is used as the input of the next graph convolution layer to train until the training of all the graph convolution layers of the graph neural network is completed.
2. The method for recognizing human motion according to claim 1, wherein,
the step of inputting the first output into a next graph convolution layer of the first graph convolution layer to train the first output as the input of the next graph convolution layer comprises the following steps:
superposing the first output and the training graph data to obtain fusion data;
and inputting the fusion data into the next graph convolution layer to train the fusion data as the input of the next graph convolution layer.
3. The method for recognizing human motion according to claim 1, wherein,
the first output is computed from the training graph data and the training weights of the first graph convolution layer;
the first output is converted to the input of the next graph convolutional layer by an activation function.
4. The method for recognizing human motion according to claim 1, wherein,
the step of inputting the training graph data into the graph neural network for training comprises the following steps:
extracting spatial features from node features of the training graph data by using a Laplacian operator;
constructing a diagonal matrix by using training weights of the graph neural network as diagonal elements;
and forming the output of the graph neural network for training by using the spatial characteristics and the diagonal matrix.
5. The method for recognizing human motion according to claim 4, wherein,
the training to form the output of the graph neural network by using the spatial features and the diagonal matrix comprises the following steps:
acquiring the spatial characteristics of each node characteristic;
based on a preset convolution kernel receptive field, updating the spatial characteristics of each node characteristic;
and forming the output of the graph neural network for training by using the updated spatial characteristics and the diagonal matrix.
6. The method for recognizing human motion according to claim 5, wherein,
the updating the spatial feature of each node feature based on the preset convolution kernel receptive field comprises the following steps:
setting a chebyshev polynomial recursion equation according to the preset convolution kernel receptive field;
and inputting the spatial characteristics of each node characteristic into the Chebyshev polynomial recursion equation, and recursively obtaining the spatial characteristics of each node characteristic after updating.
7. The method for recognizing human motion according to claim 1, wherein,
and the graphic neural network is further connected with at least one full-connection layer after the plurality of graphic convolution layers, and the at least one full-connection layer is used for training classification tasks.
8. The method for recognizing human motion according to claim 1, wherein,
after the weight matrix of the final graph neural network is acquired based on the training result and the training of the neural network is completed, the neural network training method further comprises the following steps:
migrating the trained graph neural network into another neural network as part of that network's structure, thereby forming a migrated neural network;
and training the migrated neural network again.
9. A terminal device, comprising a memory and a processor coupled to the memory;
wherein the memory is for storing program data and the processor is for executing the program data to implement the human motion recognition method according to any one of claims 1 to 8.
10. A computer storage medium for storing program data which, when executed by a computer, is adapted to carry out the human motion recognition method according to any one of claims 1 to 8.
CN202210585190.2A 2022-05-26 2022-05-26 Neural network training method, human motion recognition method and device, and storage medium Active CN114943324B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210585190.2A CN114943324B (en) 2022-05-26 2022-05-26 Neural network training method, human motion recognition method and device, and storage medium
PCT/CN2022/108857 WO2023226186A1 (en) 2022-05-26 2022-07-29 Neural network training method, human activity recognition method, and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210585190.2A CN114943324B (en) 2022-05-26 2022-05-26 Neural network training method, human motion recognition method and device, and storage medium

Publications (2)

Publication Number Publication Date
CN114943324A CN114943324A (en) 2022-08-26
CN114943324B (en) 2023-10-13

Family

ID=82908434

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210585190.2A Active CN114943324B (en) 2022-05-26 2022-05-26 Neural network training method, human motion recognition method and device, and storage medium

Country Status (2)

Country Link
CN (1) CN114943324B (en)
WO (1) WO2023226186A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115471910A (en) * 2022-09-06 2022-12-13 中国科学院深圳先进技术研究院 Model training method and device for motion activity recognition model based on FPGA
CN115907001B (en) * 2022-11-11 2023-07-04 中南大学 Knowledge distillation-based federal graph learning method and automatic driving method

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171134A (en) * 2017-12-20 2018-06-15 中车工业研究院有限公司 A kind of operational motion discrimination method and device
CN109215036A (en) * 2018-08-01 2019-01-15 浙江深眸科技有限公司 Human body segmentation's method based on convolutional neural networks
US10416755B1 (en) * 2018-06-01 2019-09-17 Finch Technologies Ltd. Motion predictions of overlapping kinematic chains of a skeleton model used to control a computer system
CN110334573A (en) * 2019-04-09 2019-10-15 北京航空航天大学 A kind of human motion state method of discrimination based on intensive connection convolutional neural networks
CN110929029A (en) * 2019-11-04 2020-03-27 中国科学院信息工程研究所 Text classification method and system based on graph convolution neural network
KR102196962B1 (en) * 2020-03-05 2020-12-31 강윤 Motion recognition of human body using matrix pressure sensor and human body motion prediction system
CN112633482A (en) * 2020-12-30 2021-04-09 广州大学华软软件学院 Efficient width map convolution neural network model and training method thereof
CN112767553A (en) * 2021-02-02 2021-05-07 华北电力大学 Self-adaptive group clothing animation modeling method
CN113240714A (en) * 2021-05-17 2021-08-10 浙江工商大学 Human motion intention prediction method based on context-aware network
CN113255798A (en) * 2021-06-02 2021-08-13 苏州浪潮智能科技有限公司 Classification model training method, device, equipment and medium
CN113326930A (en) * 2020-02-29 2021-08-31 华为技术有限公司 Data processing method, neural network training method, related device and equipment
CN113642432A (en) * 2021-07-30 2021-11-12 南京师范大学 Method for identifying human body posture by convolutional neural network based on covariance matrix transformation
CN113642379A (en) * 2021-05-18 2021-11-12 北京航空航天大学 Human body posture prediction method and system based on attention mechanism fusion multi-flow graph
CN113705772A (en) * 2021-07-21 2021-11-26 浪潮(北京)电子信息产业有限公司 Model training method, device and equipment and readable storage medium
CN114330670A (en) * 2022-01-04 2022-04-12 京东科技信息技术有限公司 Graph neural network training method, device, equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304795B (en) * 2018-01-29 2020-05-12 清华大学 Human skeleton behavior identification method and device based on deep reinforcement learning
CN109766895A (en) * 2019-01-03 2019-05-17 京东方科技集团股份有限公司 The training method and image Style Transfer method of convolutional neural networks for image Style Transfer
CN110222653B (en) * 2019-06-11 2020-06-16 中国矿业大学(北京) Skeleton data behavior identification method based on graph convolution neural network
CN112183315B (en) * 2020-09-27 2023-06-27 哈尔滨工业大学(深圳) Action recognition model training method and action recognition method and device

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171134A (en) * 2017-12-20 2018-06-15 中车工业研究院有限公司 A kind of operational motion discrimination method and device
US10416755B1 (en) * 2018-06-01 2019-09-17 Finch Technologies Ltd. Motion predictions of overlapping kinematic chains of a skeleton model used to control a computer system
CN109215036A (en) * 2018-08-01 2019-01-15 浙江深眸科技有限公司 Human body segmentation's method based on convolutional neural networks
CN110334573A (en) * 2019-04-09 2019-10-15 北京航空航天大学 A kind of human motion state method of discrimination based on intensive connection convolutional neural networks
CN110929029A (en) * 2019-11-04 2020-03-27 中国科学院信息工程研究所 Text classification method and system based on graph convolution neural network
CN113326930A (en) * 2020-02-29 2021-08-31 华为技术有限公司 Data processing method, neural network training method, related device and equipment
KR102196962B1 (en) * 2020-03-05 2020-12-31 강윤 Motion recognition of human body using matrix pressure sensor and human body motion prediction system
CN112633482A (en) * 2020-12-30 2021-04-09 广州大学华软软件学院 Efficient width map convolution neural network model and training method thereof
CN112767553A (en) * 2021-02-02 2021-05-07 华北电力大学 Self-adaptive group clothing animation modeling method
CN113240714A (en) * 2021-05-17 2021-08-10 浙江工商大学 Human motion intention prediction method based on context-aware network
CN113642379A (en) * 2021-05-18 2021-11-12 北京航空航天大学 Human body posture prediction method and system based on attention mechanism fusion multi-flow graph
CN113255798A (en) * 2021-06-02 2021-08-13 苏州浪潮智能科技有限公司 Classification model training method, device, equipment and medium
CN113705772A (en) * 2021-07-21 2021-11-26 浪潮(北京)电子信息产业有限公司 Model training method, device and equipment and readable storage medium
CN113642432A (en) * 2021-07-30 2021-11-12 南京师范大学 Method for identifying human body posture by convolutional neural network based on covariance matrix transformation
CN114330670A (en) * 2022-01-04 2022-04-12 京东科技信息技术有限公司 Graph neural network training method, device, equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Body Pose Prediction Based on Motion Sensor Data and Recurrent Neural Network; Marcin Wozniak et al.; IEEE Transactions on Industrial Informatics; pp. 2101-2111 *
Zhang Jin et al.; Symmetric Residual Network for Human Motion Prediction; Robot; 2022, pp. 291-298 *

Also Published As

Publication number Publication date
CN114943324A (en) 2022-08-26
WO2023226186A1 (en) 2023-11-30

Similar Documents

Publication Publication Date Title
Mutegeki et al. A CNN-LSTM approach to human activity recognition
Ha et al. Convolutional neural networks for human activity recognition using multiple accelerometer and gyroscope sensors
Dua et al. Multi-input CNN-GRU based human activity recognition using wearable sensors
CN110309861B (en) Multi-modal human activity recognition method based on generation of confrontation network
CN108960337B (en) Multi-modal complex activity recognition method based on deep learning model
CN114943324B (en) Neural network training method, human motion recognition method and device, and storage medium
Ahmed The impact of filter size and number of filters on classification accuracy in CNN
Gil-Martín et al. Improving physical activity recognition using a new deep learning architecture and post-processing techniques
CN111539941B (en) Parkinson&#39;s disease leg flexibility task evaluation method and system, storage medium and terminal
CN106570522B (en) Object recognition model establishing method and object recognition method
Hou A study on IMU-based human activity recognition using deep learning and traditional machine learning
Yu et al. A multi-layer parallel lstm network for human activity recognition with smartphone sensors
CN112990211A (en) Neural network training method, image processing method and device
CN106909938A (en) Viewing angle independence Activity recognition method based on deep learning network
Nafea et al. Multi-sensor human activity recognition using CNN and GRU
CN113011562A (en) Model training method and device
Banjarey et al. Human activity recognition using 1D convolutional neural network
Han et al. GraphConvLSTM: Spatiotemporal learning for activity recognition with wearable sensors
Chowdhury et al. hActNET: an improved neural network based method in recognizing human activities
Singh et al. Har using bi-directional lstm with rnn
CN108009512A (en) A kind of recognition methods again of the personage based on convolutional neural networks feature learning
Cao et al. QMEDNet: A quaternion-based multi-order differential encoder–decoder model for 3D human motion prediction
Li et al. Multi-convLSTM neural network for sensor-based human activity recognition
Alghazzawi et al. Sensor-based human activity recognition in smart homes using depthwise separable convolutions
Qin et al. NDGCN: network in network, dilate convolution and graph convolutional networks based transportation mode recognition

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant