CN114943324B - Neural network training method, human motion recognition method and device, and storage medium - Google Patents

Neural network training method, human motion recognition method and device, and storage medium Download PDF

Info

Publication number
CN114943324B
CN114943324B (application CN202210585190.2A)
Authority
CN
China
Prior art keywords
training
neural network
graph
data
human motion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210585190.2A
Other languages
Chinese (zh)
Other versions
CN114943324A (en)
Inventor
颜延
廖天正
赵金津
任旭超
赵瑞麒
马良
王磊
刘语诗
熊璟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN202210585190.2A priority Critical patent/CN114943324B/en
Priority to PCT/CN2022/108857 priority patent/WO2023226186A1/en
Publication of CN114943324A publication Critical patent/CN114943324A/en
Application granted granted Critical
Publication of CN114943324B publication Critical patent/CN114943324B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/02 Preprocessing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/02 Preprocessing
    • G06F2218/04 Denoising
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12 Classification; Matching
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses a neural network training method, a human motion recognition method, a device and a storage medium. The neural network training method comprises the following steps: acquiring a training data set, and preprocessing the training data in the training data set to obtain a plurality of training graph data, wherein each training graph datum is the graph data of one time slice in the training data; inputting the training graph data into a graph neural network for training, wherein the graph neural network comprises a plurality of sequentially connected graph convolution layers; and, based on a training result, acquiring the weight matrix of the final graph neural network to complete the training of the neural network, wherein the weight matrix of the graph neural network consists of the final weights of the plurality of graph convolution layers. In this way, the training data set is preprocessed into the training graph data fed to the graph neural network, which improves the training efficiency and accuracy of the neural network.

Description

Neural network training method, human motion recognition method and device, and storage medium
Technical Field
The application relates to the technical field of neural networks, in particular to a neural network training method, a human motion recognition method, equipment and a storage medium.
Background
As a typical pattern recognition problem, the sensor-based HAR (human activity recognition) problem has long been addressed with many conventional machine learning algorithms, including decision trees, random forests, support vector machines, Bayesian networks, Markov models and the like. Under strictly controlled environments and with limited input, these traditional algorithms achieve good classification performance, but the traditional hand-crafted feature approach is time-consuming, and the extracted features lack incremental and unsupervised learning capability as well as generalization capability.
Disclosure of Invention
The application mainly provides a neural network training method, a human motion recognition method, a device and a storage medium, which are used to solve the problems that the traditional hand-crafted feature approach is time-consuming and that the extracted features lack incremental and unsupervised learning capability and generalization capability.
In order to solve the technical problems, the application adopts a technical scheme that: provided is a neural network training method, including:
acquiring a training data set, and preprocessing the training data in the training data set to obtain a plurality of training graph data, wherein each training graph datum is the graph data of one time slice in the training data;
inputting the training graph data into a graph neural network for training, wherein the graph neural network comprises a plurality of graph convolution layers which are sequentially connected;
and based on a training result, acquiring a weight matrix of a final graph neural network to complete the training of the neural network, wherein the weight matrix of the graph neural network consists of final weights of the plurality of graph convolution layers.
According to an embodiment of the present application, the inputting of the training graph data into the graph neural network for training includes:
inputting each training graph datum into a first graph convolution layer of the graph neural network, and obtaining a first output of the first graph convolution layer;
and inputting the first output into a next graph convolution layer of the first graph convolution layer, so that the first output is used as the input of the next graph convolution layer to train until the training of all the graph convolution layers of the graph neural network is completed.
According to an embodiment of the present application, the step of inputting the first output into a next graph convolution layer of the first graph convolution layer to train the first output as the input of the next graph convolution layer includes:
superposing the first output and the training graph data to obtain fusion data;
and inputting the fusion data into the next graph convolution layer to train the fusion data as the input of the next graph convolution layer.
According to an embodiment of the present application, the first output is computed from the training graph data and the training weights of the first graph convolution layer;
the first output is converted to the input of the next graph convolutional layer by an activation function.
According to an embodiment of the present application, the inputting of the training graph data into the graph neural network for training includes:
extracting spatial features from node features of the training graph data by using a Laplacian operator;
constructing a diagonal matrix by using training weights of the graph neural network as diagonal elements;
and forming the output of the graph neural network for training by using the spatial characteristics and the diagonal matrix.
According to an embodiment of the present application, the forming of the output of the graph neural network for training by using the spatial features and the diagonal matrix includes:
acquiring the spatial characteristics of each node characteristic;
based on a preset convolution kernel receptive field, updating the spatial characteristics of each node characteristic;
and forming the output of the graph neural network for training by using the updated spatial characteristics and the diagonal matrix.
According to an embodiment of the present application, the updating the spatial feature of each node feature based on the preset convolution kernel receptive field includes:
setting a chebyshev polynomial recursion equation according to the preset convolution kernel receptive field;
and inputting the spatial characteristics of each node characteristic into the Chebyshev polynomial recursion equation, and recursively obtaining the spatial characteristics of each node characteristic after updating.
According to an embodiment of the present application, after the plurality of graph convolution layers the graph neural network is further connected with at least one fully connected layer, and the at least one fully connected layer is used for training classification tasks.
According to an embodiment of the present application, after the weight matrix of the final graph neural network is acquired based on the training result and the training of the neural network is completed, the neural network training method further includes:
migrating the trained graph neural network into another neural network as part of that network's structure, thereby forming a migrated neural network;
and training the migrated neural network again.
In order to solve the technical problems, the application adopts another technical scheme that: provided is a human motion recognition method including:
acquiring human body motion data of a user by using a wearable sensor;
preprocessing the human motion data to obtain human motion graph data;
inputting the human motion graph data into a pre-trained graph neural network, and acquiring the prediction information of the graph neural network on the human motion of the user based on the human motion graph data;
acquiring the motion state of the user based on the prediction information;
the graph neural network is obtained through training by the neural network training method.
In order to solve the technical problems, the application adopts another technical scheme that: providing a terminal device comprising a memory and a processor coupled to the memory;
the memory is used for storing program data, and the processor is used for executing the program data to realize the neural network training method and/or the human body motion recognition method.
In order to solve the technical problems, the application adopts another technical scheme that: there is provided a computer storage medium for storing program data which, when executed by a computer, is adapted to carry out a neural network training method and/or a human motion recognition method as described above.
The application provides a neural network training method, a human motion recognition method, a device and a storage medium. The neural network training method comprises the following steps: acquiring a training data set, and preprocessing the training data in the training data set to obtain a plurality of training graph data, wherein each training graph datum is the graph data of one time slice in the training data; inputting the training graph data into a graph neural network for training, wherein the graph neural network comprises a plurality of sequentially connected graph convolution layers; and, based on a training result, acquiring the weight matrix of the final graph neural network to complete the training of the neural network, wherein the weight matrix of the graph neural network consists of the final weights of the plurality of graph convolution layers. In this way, the training data set is preprocessed into the training graph data fed to the graph neural network, which improves the training efficiency and accuracy of the neural network.
Drawings
For a clearer description of the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the description below are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art, wherein:
FIG. 1 is a flow chart of an embodiment of a neural network training method provided by the present application;
FIG. 2 is a schematic diagram of the framework of the neural network provided by the present application;
FIG. 3 is a schematic diagram of a main flow of the neural network training method provided by the present application;
FIG. 4 is a schematic diagram of a learning process of the transfer learning provided by the present application;
FIG. 5 is a schematic flow chart of an embodiment of a human motion recognition method according to the present application;
fig. 6 is a schematic structural diagram of an embodiment of a terminal device provided by the present application;
fig. 7 is a schematic structural diagram of an embodiment of a computer storage medium according to the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be noted that, if directional indications (such as up, down, left, right, front, and rear … …) are included in the embodiments of the present application, the directional indications are merely used to explain the relative positional relationship, movement conditions, etc. between the components in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indications are correspondingly changed.
In addition, if descriptions of "first", "second", etc. appear in the embodiments of the present application, they are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features. Thus, a feature defined by "first" or "second" may explicitly or implicitly include at least one such feature. The technical solutions of the embodiments may be combined with each other, provided that the combination can be realized by those skilled in the art; when a combination of technical solutions is contradictory or cannot be realized, it should be considered absent and outside the scope of protection claimed by the present application.
The daily exercise behavior of the human body is closely related to its health indices and energy balance. For example, an individual's energy consumption can be calculated by monitoring exercise behaviors such as running and walking, which has positive significance for personal healthy exercise and bodily energy balance. In addition, people in dangerous situations can be helped effectively and promptly through the identification of abnormal movement behaviors of the human body (such as falls).
Early human motion behavior recognition (human activity recognition, HAR) based on machine vision was a popular direction: images or video streams are captured and human behavior is detected with image/video processing techniques, as in the video-based HAR field. However, this approach is limited by complex scenes and the uncertainty of actions, must take into account the privacy problems cameras raise, and is only suitable for certain specific scenarios. In contrast, the wearable sensor is not easily disturbed by the environment, and the signals it acquires are more continuous and accurate and can serve a wider range of scenarios.
Sensor technology has made remarkable progress in computing power, size, accuracy and manufacturing cost over the past decade. These advances allow most sensors to be integrated into smartphones and other portable devices, making these devices more intelligent and practical. The wearable sensors commonly used for HAR are accelerometers, magnetometers, gyroscopes and integrated inertial measurement units (IMUs).
Research based on deep learning has gradually achieved excellent results and now dominates the field of human motion behavior recognition. Automatic feature extraction by multi-layer neural networks significantly reduces feature preprocessing, and deep learning architectures have proven to perform well in unsupervised learning and reinforcement learning.
The present application proposes a scheme that addresses the sensor-based HAR problem from the perspective of graph construction. A person's limbs cooperate with one another during movement; the collected data are mapped into a graph through the correlation of sensors at different positions on the body, the graph is modeled with a graph neural network grounded in graph theory, and actions are classified from the action information and the inter-sensor relationships that the graph network learns from the graph.
In this regard, the application constructs a complete HAR framework and selects the graph neural network to model human motion; it confirms that the GNN (Graph Neural Network) has strong transfer-learning capability and multi-angle learning capability in the HAR field, effectively makes up for the inability of traditional deep learning to capture the graph-structure data relationships of non-Euclidean spaces, and proposes a new idea of modeling sensor-based human motion graph-structure data.
Based on the technical foundation, the application provides a specific training method of the graph neural network. Referring to fig. 1 to 3, fig. 1 is a schematic flow chart of an embodiment of a neural network training method provided by the present application, fig. 2 is a schematic frame diagram of the neural network provided by the present application, and fig. 3 is a schematic flow chart of the neural network training method provided by the present application.
As shown in fig. 1, the neural network training method according to the embodiment of the present application may specifically include the following steps:
step S11: and acquiring a training data set, and preprocessing training data in the training data set to obtain a plurality of training image data, wherein each training image data is training image data of a time slice in the training data.
In the embodiment of the present application, the data sets adopted may be the MHEALTH data set and the PAMAP2 data set; in other embodiments other data sets may also be adopted, which is not limited herein. The two data sets are described below:
MHEALTH dataset
The dataset includes data from 10 participants in a laboratory environment. Each subject wears wearable sensors attached to the chest, right wrist and left ankle. Physical activities such as standing, sitting, lying, walking, climbing stairs, bending the waist forward, lifting the forearms, bending the knees, cycling, jogging, running and jumping forward all take part in the experiment. The sampling rate of the recorded data is 50 Hz. The MHEALTH dataset thus contains 12 activity categories and a total of 21 channels of sensor signals; in this method the perceived information of the user's body is captured by the chest sensor together with the two other sensors.
PAMAP2 data set
The dataset includes data obtained from 9 participants aged 24 to 30. The participants wear IMUs (inertial measurement units) on the wrist of the dominant side, the ankle and the chest. The activities performed by each person comprise ten actions: lying, sitting, standing, walking, running, cycling, brisk walking, going upstairs, going downstairs and rope skipping. Each IMU contains two 3D acceleration sensors, one gyroscope and one magnetometer, with a sampling frequency of 100 Hz. Each IMU provides nine-axis sensor information, for a total of 27 channels of sensor signals; in this method the application only uses the information of 3 sensors from the dataset (the right waist, left ankle and back) to maintain consistency of sensor positions.
Before the training-set data are input into the graph neural network for training, the terminal equipment needs to preprocess them and convert them into graph data. The specific preprocessing process is as follows:
Firstly, the terminal equipment performs noise-filtering normalization on the training data acquired by all sensors in time order and resamples the training data to 50 Hz. Secondly, the training data are windowed with a sliding window of fixed length 128 and an overlap rate of 50%; in other embodiments, sliding windows of different lengths may be used, which will not be described herein.
According to the sampling frequency of each data set, the terminal device may obtain 5361 activity time-series segments from the MHEALTH data set and 11784 from the PAMAP2 data set; at the 50 Hz sampling frequency of MHEALTH a 128-sample window spans 2.56 seconds, and at the 100 Hz sampling frequency of PAMAP2 it spans 1.28 seconds.
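The windowing step above can be sketched in a few lines; the following minimal NumPy example is an illustration, with the function name and dummy data being assumptions rather than taken from the patent's implementation.

```python
import numpy as np

def segment_windows(signal, window_len=128, overlap=0.5):
    """Split a (time, channels) recording into fixed-length windows.

    Sketch of the windowing described above: fixed length of 128
    samples with 50% overlap between consecutive windows.
    """
    step = int(window_len * (1 - overlap))  # 64 samples at 50% overlap
    n_windows = (signal.shape[0] - window_len) // step + 1
    return np.stack([signal[i * step : i * step + window_len]
                     for i in range(n_windows)])

# e.g. 60 s of 21-channel data at 50 Hz: each window then spans 2.56 s
dummy = np.random.randn(3000, 21)
windows = segment_windows(dummy)  # shape (n_windows, 128, 21)
```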
The terminal device regards each activity time-series segment as a training sample and builds graph data for each sample as the input to the GNN network. Each sensor channel is regarded as a node; the Pearson correlation coefficient is computed between every pair of nodes to obtain a correlation coefficient matrix, two nodes whose correlation coefficient exceeds 0.2 are regarded as highly correlated and are connected by an edge, and the length-128 data of each channel are embedded into the corresponding sensor-channel node to form the graph data of one time slice. The length of the graph data is determined by the length of the sliding window.
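For illustration, a minimal sketch of this graph-building step follows, assuming the plain 0.2 correlation threshold stated above; the helper name is hypothetical.

```python
import numpy as np

def build_graph(window, threshold=0.2):
    """Build one time slice of graph data from a (128, channels) window.

    Each sensor channel is a node; two nodes are connected when the
    Pearson correlation of their signals exceeds the threshold, and the
    raw length-128 signal is embedded as the node feature vector.
    """
    corr = np.corrcoef(window.T)                 # (channels, channels) Pearson matrix
    adj = (corr > threshold).astype(np.float32)  # edges for highly correlated pairs
    np.fill_diagonal(adj, 0.0)                   # drop trivial self-correlations
    features = window.T.astype(np.float32)       # node features: (channels, 128)
    return adj, features
```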
As shown in fig. 3, the terminal device performs related preprocessing on the data of the human body sensor to filter out unnecessary noise information and interference information, and then windows the data to construct a graph for each time sequence segment as input of the GNN network.
Step S12: training the training graph data into a graph neural network, wherein the graph neural network comprises a plurality of graph convolution layers which are connected in sequence.
In an embodiment of the application, the graph convolutional network (Graph Convolutional Network, GCN) is a deep learning model that, unlike conventional deep learning, operates on non-Euclidean spaces. The application adopts a GCN as the network to be trained because GCNs have shown clear advantages over other deep models in non-Euclidean settings such as video-based human motion recognition, and because sensor-based human motion recognition carries a latent graph-structure relationship.
In this regard, the present application proposes a new ResGCNN framework that includes a residual graph network structure with shared training weights. As shown in fig. 2, the neural network of the present application includes a plurality of sequentially connected graph convolution layers (ChebNet layers), and the output of each graph convolution layer together with that of the previous layer serves as the next input. In addition, the terminal device can connect a fully connected layer after the sequentially connected graph convolution layers; the graph convolution layers are used for feature extraction and the fully connected layer is used for the classification task.
In the embodiment of the application, the terminal device constructs a 16-layer ResChebNet model for sensor-based human motion recognition. The ResGCNN framework includes four ResChebNet blocks and two additional fully connected (FC) layers to counter over-smoothing and vanishing gradients. It also involves an intra-block residual structure that adds the inputs of the four blocks to the output of the last block as the final output of the ResChebNet block.
For sensor-based human motion recognition, compared with traditional deep models (CNN, LSTM, Deep-LSTM, etc.), the multi-layer ResChebNet modeling shown in fig. 2 effectively learns the non-Euclidean structural relationships among the sensors. It introduces a residual structure and the graph normalization PairNorm to address over-smoothing and vanishing gradients, and it introduces a local residual structure to fully learn local structure awareness, so that the graph-structure relationships of sensor-based human motion are fully learned and the results are more accurate and robust.
Based on the ResChebNet model shown in fig. 2, assume a training graph G is given, composed of N vertices and the edges between them, where an edge between any two vertices i and j represents their similarity. The adjacency matrix A of the graph data is a sparse matrix whose entry (i, j) is 1 when vertices i and j are connected by an edge and 0 otherwise.
In addition, each node of the graph has an F-dimensional feature vector, and X ∈ R^(N×F) denotes the feature matrix of all N nodes; the dimension of a node's feature vector is determined by the length of the graph data. An L-layer graph convolutional network (GCN) consists of L graph convolution layers, 16 in the network shown in fig. 2. Each convolution layer constructs the input of each of its nodes from the outputs of the nodes of the previous layer, in the form:
Z^(l+1) = Â X^(l) W^(l), X^(l+1) = σ(Z^(l+1))
where X^(l) ∈ R^(N×F) is the input of the l-th graph convolution layer over the N nodes, with X^(0) = X; Â = D^(-1/2) A D^(-1/2) is the normalized adjacency matrix; σ(·) is the activation function, typically a ReLU; and D is the degree matrix, computed as:
D_ii = Σ_j A_ij
W^(l) is the learnable weight matrix, which transforms the features for the downstream learning task.
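As a concrete reading of the propagation rule above, here is a minimal PyTorch sketch of one graph convolution layer; adding self-loops before the symmetric normalization is a common practical choice and an assumption here, not something the formula above spells out.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One dense graph convolution: Z = Â X W, X' = ReLU(Z)."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)  # W^(l)

    def forward(self, x, adj):
        # Â = D^(-1/2) (A + I) D^(-1/2), with self-loops added (assumption)
        a_hat = adj + torch.eye(adj.size(0), device=adj.device)
        d_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)
        a_hat = d_inv_sqrt.unsqueeze(1) * a_hat * d_inv_sqrt.unsqueeze(0)
        return torch.relu(a_hat @ self.weight(x))  # σ(Â X W)
```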
Further, in the feature extraction and feature transformation of each convolution layer, the terminal device can generalize the traditional Fourier transform to the Fourier transform on the graph through spectral graph theory and the convolution theorem:
h = (f * g)_G = U((Uᵀ g) ⊙ (Uᵀ f))
where U is the eigenvector matrix obtained by decomposing the Laplacian matrix L (i.e., the Laplacian operator), f is the node features of the input graph data, g is the convolution kernel, ⊙ is the element-wise product, and h is the topological-space feature extracted by the trainable, parameter-shared convolution kernel.
The core of the GCN convolution operation is a trainable, parameter-shared convolution kernel. The GCN replaces the diagonal elements of the spectral kernel with learnable parameters, g(θ) = diag(θ), which are then adjusted by back-propagation during training, so the forward formula of the GCN network can be expressed as:
Y = σ(U g(θ) Uᵀ x)
where x is the representation vector of each node feature in the graph data and Y is the output of each node feature after convolution by the GCN network; each node feature is convolved with the convolution kernel to extract the corresponding topological-space feature, which is then propagated to the next layer through the activation function σ.
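To make the cost of this exact spectral form concrete, the following NumPy sketch evaluates Y = σ(U g(θ) Uᵀ x) directly; the eigendecomposition it performs on every call is exactly the expensive step that the Chebyshev approximation below avoids.

```python
import numpy as np

def spectral_conv(x, laplacian, theta):
    """Exact spectral convolution Y = σ(U g(θ) Uᵀ x), for illustration only."""
    _, U = np.linalg.eigh(laplacian)   # L = U Λ Uᵀ; costly full decomposition
    g = np.diag(theta)                 # learnable diagonal kernel g(θ)
    y = U @ g @ U.T @ x                # filter the node signal in the spectral domain
    return np.maximum(y, 0.0)          # ReLU as the activation σ
```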
Further, the GCN network has drawbacks: it requires an eigendecomposition of the Laplacian matrix, a matrix multiplication must be computed in every forward propagation, and when the graph data are large the time complexity is O(n²), which is time-consuming. Here n is the number of convolution kernel parameters of the graph neural network, and when n is large the node features update slowly. Moreover, a multi-layer GCN network suffers from over-smoothing: the representation vectors of the node features tend to become identical, making the nodes hard to distinguish.
Therefore, the application approximates the convolution kernel with a K-order Chebyshev polynomial and substitutes it into the graph Fourier transform:
Y = σ( Σ_{k=0..K} θ_k T_k(L̃) x ), L̃ = (2/λ_max) L - I_N
where θ_k is the weight parameter of order k. The k-th power of the Laplacian reaches the nodes within k hops of a center node, i.e., whether an element of L^k is 0 indicates whether one node of the graph can reach another through k hops; k therefore indicates the size of the convolution-kernel receptive field, the feature representation of each center node is updated by aggregating the neighboring nodes within its k hops, and θ_k is the weight of the k-hop adjacency. The final formula requires no matrix decomposition, only a transformation (rescaling) of the Laplacian matrix L, so the amount of computation is markedly reduced. In general, k < n.
The recursion of the Chebyshev polynomials above is defined as:
T_k(x) = 2x T_(k-1)(x) - T_(k-2)(x), with T_0(x) = 1 and T_1(x) = x
the number n of convolution kernel parameters of the GCN network is reduced to k, and the calculation complexity is reduced through iterative definition from the original global convolution to the current local convolution, namely, the node which is away from the central node k-hop is used as an adjacent node.
Step S13: based on the training result, obtaining a weight matrix of the final graph neural network to complete the training of the neural network, wherein the weight matrix of the graph neural network consists of final weights of a plurality of graph convolution layers.
In the embodiment of the application, training a neural network is a process of adjusting parameters: the more layers the network has, the more parameters (weights and biases) can be adjusted, which means greater freedom of adjustment and a better approximation. Deepening neural networks has long been a hot topic, and graph neural networks (GCNs) are no exception. Past experiments, analyzed from different aspects (e.g., from a dynamical-systems perspective), show that as the number of GCN layers grows the node representations become more global and smoother; each convolution layer effectively pushes the node representations toward uniformity, leaving little distinction in the dense parts of the graph, while in the sparse parts relatively little information is obtained. This is the over-smoothing problem.
Because deep GCNs over-smooth, the application introduces the ResChebNet model shown in fig. 2, whose residual formula is:
X^(l+1) = σ(Z^(l+1)) + X^(l)
in the application, chebNet (Chebyshev polynomial approximation graph convolution kernel) is used, and structures such as Pair Norm standardization and the like are introduced to control the sum of the distances of the feature vectors between every two nodes to be a constant, so that the distance of the feature vector of the node with a longer distance is also longer.
Further, transfer learning is a very important deep learning strategy. It reuses the knowledge obtained in solving one problem by applying it to a different but related problem, i.e., it migrates knowledge from a source domain to a target domain, which has a great positive impact on many domains that are hard to improve owing to insufficient training data. The learning process of transfer learning is shown in fig. 4.
Deep transfer learning is divided into four categories: instance-based, mapping-based, network-based, and adversarial-based deep transfer learning. The application uses parameter-based deep transfer learning. Because the sensor types used in the experiments are the same and the acquired data are of the same kind, identical input dimensions yield an identical residual network structure, which makes parameter-based transfer learning well suited to improving the learning efficiency of the residual GNN.
The present application contemplates deep migration learning between different data sets with different sensor settings or activity types. The ResGCNN deep transfer learning includes three main phases, including:
1) The source domain training is performed on the network using the large-scale training dataset.
2) Part of the network pre-trained on the source domain is migrated into a new network designed for the target domain.
3) The transferred sub-network is updated with a fine-tuning strategy for the new training task.
First, data from a single-position sensor (9 channels) or from three position sensors (27 channels) are selected from the PAMAP2 data set and input into the ResGCNN network for learning and classification, while the parameters learned by the residual-network part of the structure are retained.
The other three data sets are then separately input into the network for classification testing, taking care that their sensor counts (i.e., channel counts) are identical so that the input dimensions match. A residual network structure identical to that used for the PAMAP2 data set is constructed, and the fully connected layers are modified or added according to the classification requirements of the different data sets. The new data set is trained by directly transferring the previously trained PAMAP2 residual-network parameters into the new training and locking them, so that only the final fully connected layer is iteratively optimized for the new task. To demonstrate the small-sample transfer-learning ability of the ResGCNN network, the application takes 30% of each new sample set for testing.
As shown in fig. 3, the terminal device adaptively optimizes the fully connected layers of the target model with samples of the target data set, and the last part of the ResGCNN uses a Softmax layer as the HAR classifier; the data sets are respectively input into the network for training so that the weights of each layer are continuously optimized. Finally, the terminal device uses the pre-trained blocks of the ResGCNN structure learned on the source domain as feature extractors in the target domain to perform transfer learning of the ResGCNN.
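The parameter-freezing scheme described above might look as follows; `blocks` and `classifier` are assumed attribute names used for illustration, not the actual API of the patent's model.

```python
import torch.nn as nn

def transfer_resgcnn(pretrained, feat_dim, num_target_classes):
    """Lock the residual blocks pre-trained on the source domain and
    attach a fresh fully connected Softmax head for the target task."""
    for p in pretrained.blocks.parameters():
        p.requires_grad = False                # freeze the transferred sub-network
    pretrained.classifier = nn.Sequential(     # only this head is optimized
        nn.Linear(feat_dim, num_target_classes),
        nn.Softmax(dim=-1),                    # Softmax HAR classifier, as above
    )
    return pretrained
```

In practice the explicit Softmax is often omitted and left to a cross-entropy loss, which applies log-softmax internally; it is kept here only to mirror the Softmax classifier named above.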
Further, for the classification tasks, the classification accuracy, precision, recall, F1 score and confusion matrix are used to describe the results. For each activity category in a dataset, the predictions of the model are compared with the ground-truth labels to count the true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). The overall accuracy ACC is:
ACC = (TP + TN) / (TP + TN + FP + FN)
The precision and recall of a given class can be calculated as:
Precision = TP / (TP + FP), Recall = TP / (TP + FN)
F1-Score is a balanced combination of precision and recall:
F1 = 2 × Precision × Recall / (Precision + Recall)
the average of these activity signatures was used to evaluate each experiment. Furthermore, the confusion matrix relates to the visualization of the model performance.
In the embodiment of the application, a terminal device acquires a training data set and preprocesses the training data in it to obtain a plurality of training graph data, wherein each training graph datum is the graph data of one time slice in the training data; the training graph data are input into a graph neural network for training, wherein the graph neural network comprises a plurality of sequentially connected graph convolution layers; and, based on the training result, the weight matrix of the final graph neural network is acquired to complete the training, wherein the weight matrix consists of the final weights of the plurality of graph convolution layers. In this way, the training data set is preprocessed into the training graph data fed to the graph neural network, which improves the training efficiency and accuracy of the neural network.
With continued reference to fig. 5, fig. 5 is a flowchart illustrating an embodiment of a human motion recognition method according to the present application.
As shown in fig. 5, the human motion recognition method according to the embodiment of the present application may specifically include the following steps:
step S21: human motion data of a user is acquired using a wearable sensor.
In the embodiment of the application, the terminal equipment acquires the human body motion data of the user through the wearable sensor on the user.
Step S22: preprocessing the human motion data to obtain human motion map data.
In the embodiment of the present application, for the specific data preprocessing process of step S22, refer to step S11 in the above embodiment, which is not repeated here.
Step S23: and inputting the human body movement map data into a pre-trained map neural network, and acquiring the prediction information of the map neural network on the human body movement of the user based on the human body movement map data.
In the embodiment of the present application, the pre-trained graph neural network may specifically be the graph neural network trained in the foregoing embodiment, and the training process is not repeated here.
Step S24: based on the prediction information, a motion state of the user is obtained.
In the embodiment of the application, a scheme is provided that addresses the sensor-based HAR problem from the perspective of graph construction. The method maps the collected data into a graph through the correlation of sensors at different positions on the body, models it with a graph neural network grounded in graph theory, and classifies actions from the action information and the inter-sensor relationships that the graph network learns from the graph. The method constructs a complete HAR framework, selects the graph neural network to model human motion, confirms that the GNN has strong transfer-learning and multi-angle learning capability in the HAR field, effectively makes up for the inability of traditional deep learning to capture graph-structure data relationships in non-Euclidean spaces, and offers a new line of thought for modeling sensor-based human motion graph-structure data.
The application demonstrates that the graph neural network is feasible for sensor-based human motion recognition, provides a data preprocessing method that converts the information collected by sensors into a graph structure, obtains results comparable to or better than traditional deep models (CNN, RNN, LSTM, Deep-LSTM) on the data sets of the method, and offers a new line of thought for graph neural networks in sensor-based human motion recognition.
It will be appreciated by those skilled in the art that in the above-described method of the specific embodiments, the written order of steps is not meant to imply a strict order of execution but rather should be construed according to the function and possibly inherent logic of the steps.
With continued reference to fig. 6, fig. 6 is a schematic structural diagram of an embodiment of a terminal device according to the present application. The terminal device 500 of the embodiment of the present application includes a processor 51, a memory 52, an input-output device 53, and a bus 54.
The processor 51, the memory 52, and the input/output device 53 are respectively connected to the bus 54, and the memory 52 stores program data, and the processor 51 is configured to execute the program data to implement the neural network training method and/or the human motion recognition method according to the above embodiments.
In an embodiment of the present application, the processor 51 may also be referred to as a CPU (Central Processing Unit). The processor 51 may be an integrated circuit chip with signal processing capability. The processor 51 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The general-purpose processor may be a microprocessor, or the processor 51 may be any conventional processor or the like.
The present application further provides a computer storage medium, and referring to fig. 7, fig. 7 is a schematic structural diagram of an embodiment of the computer storage medium provided by the present application, in which program data 61 is stored in the computer storage medium 600, and the program data 61 is used to implement the neural network training method and/or the human motion recognition method according to the above embodiments when being executed by a processor.
Embodiments of the present application may be stored in a computer readable storage medium when implemented in the form of software functional units and sold or used as a stand alone product. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor (processor) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing description is only illustrative of the present application and is not intended to limit the scope of the application, and all equivalent structures or equivalent processes or direct or indirect application in other related technical fields are included in the scope of the present application.

Claims (10)

1. A human motion recognition method, characterized in that the human motion recognition method comprises:
acquiring human body motion data of a user by using a wearable sensor;
preprocessing the human motion data to obtain human motion graph data;
inputting the human motion graph data into a pre-trained graph neural network, and acquiring the prediction information of the graph neural network on the human motion of the user based on the human motion graph data;
acquiring the motion state of the user based on the prediction information;
the graph neural network is obtained through training by a neural network training method, and the neural network training method comprises the following steps:
acquiring a training data set, preprocessing training data in the training data set to obtain a plurality of training graph data, wherein each training graph datum is the graph data of one time slice in the training data, and the training data set comprises a MHEALTH data set and a PAMAP2 data set;
inputting the training graph data into a graph neural network for training, wherein the graph neural network comprises a plurality of graph convolution layers which are sequentially connected;
based on a training result, acquiring a weight matrix of a final graph neural network to complete the training of the neural network, wherein the weight matrix of the graph neural network consists of final weights of the plurality of graph convolution layers;
the inputting of the training graph data into the graph neural network for training comprises the following steps:
inputting each training graph data into a first graph convolution layer of the graph neural network, and obtaining a first output of the first graph convolution layer;
and inputting the first output into a next graph convolution layer of the first graph convolution layer, so that the first output is used as the input of the next graph convolution layer to train until the training of all the graph convolution layers of the graph neural network is completed.
2. The method for recognizing human motion according to claim 1, wherein,
the step of inputting the first output into a next graph convolution layer of the first graph convolution layer to train the first output as the input of the next graph convolution layer comprises the following steps:
superposing the first output and the training graph data to obtain fusion data;
and inputting the fusion data into the next graph convolution layer to train the fusion data as the input of the next graph convolution layer.
3. The method for recognizing human motion according to claim 1, wherein,
the first output is computed from the training graph data and the training weights of the first graph convolution layer;
the first output is converted to the input of the next graph convolutional layer by an activation function.
4. The method for recognizing human motion according to claim 1, wherein,
the step of inputting the training graph data into the graph neural network for training comprises the following steps:
extracting spatial features from node features of the training graph data by using a Laplacian operator;
constructing a diagonal matrix by using training weights of the graph neural network as diagonal elements;
and forming the output of the graph neural network for training by using the spatial characteristics and the diagonal matrix.
5. The method for recognizing human motion according to claim 4, wherein,
the training to form the output of the graph neural network by using the spatial features and the diagonal matrix comprises the following steps:
acquiring the spatial characteristics of each node characteristic;
based on a preset convolution kernel receptive field, updating the spatial characteristics of each node characteristic;
and forming the output of the graph neural network for training by using the updated spatial characteristics and the diagonal matrix.
6. The method for recognizing human motion according to claim 5, wherein,
the updating the spatial feature of each node feature based on the preset convolution kernel receptive field comprises the following steps:
setting a chebyshev polynomial recursion equation according to the preset convolution kernel receptive field;
and inputting the spatial characteristics of each node characteristic into the Chebyshev polynomial recursion equation, and recursively obtaining the spatial characteristics of each node characteristic after updating.
7. The method for recognizing human motion according to claim 1, wherein,
and the graphic neural network is further connected with at least one full-connection layer after the plurality of graphic convolution layers, and the at least one full-connection layer is used for training classification tasks.
8. The method for recognizing human motion according to claim 1, wherein,
after the weight matrix of the final graph neural network is acquired based on the training result and the training of the neural network is completed, the neural network training method further comprises the following steps:
migrating the trained graph neural network into another neural network as part of that network's structure, thereby forming a migrated neural network;
and training the migrated neural network again.
9. A terminal device, comprising a memory and a processor coupled to the memory;
wherein the memory is for storing program data and the processor is for executing the program data to implement the human motion recognition method according to any one of claims 1 to 8.
10. A computer storage medium for storing program data which, when executed by a computer, is adapted to carry out the human motion recognition method according to any one of claims 1 to 8.
CN202210585190.2A 2022-05-26 2022-05-26 Neural network training method, human motion recognition method and device, and storage medium Active CN114943324B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210585190.2A CN114943324B (en) 2022-05-26 2022-05-26 Neural network training method, human motion recognition method and device, and storage medium
PCT/CN2022/108857 WO2023226186A1 (en) 2022-05-26 2022-07-29 Neural network training method, human activity recognition method, and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210585190.2A CN114943324B (en) 2022-05-26 2022-05-26 Neural network training method, human motion recognition method and device, and storage medium

Publications (2)

Publication Number Publication Date
CN114943324A CN114943324A (en) 2022-08-26
CN114943324B (en) 2023-10-13

Family

ID=82908434

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210585190.2A Active CN114943324B (en) 2022-05-26 2022-05-26 Neural network training method, human motion recognition method and device, and storage medium

Country Status (2)

Country Link
CN (1) CN114943324B (en)
WO (1) WO2023226186A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115471910A (en) * 2022-09-06 2022-12-13 中国科学院深圳先进技术研究院 Model training method and device for motion activity recognition model based on FPGA
CN115907001B (en) * 2022-11-11 2023-07-04 中南大学 Knowledge distillation-based federal graph learning method and automatic driving method

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171134A (en) * 2017-12-20 2018-06-15 中车工业研究院有限公司 A kind of operational motion discrimination method and device
CN109215036A (en) * 2018-08-01 2019-01-15 浙江深眸科技有限公司 Human body segmentation's method based on convolutional neural networks
US10416755B1 (en) * 2018-06-01 2019-09-17 Finch Technologies Ltd. Motion predictions of overlapping kinematic chains of a skeleton model used to control a computer system
CN110334573A (en) * 2019-04-09 2019-10-15 北京航空航天大学 A kind of human motion state method of discrimination based on intensive connection convolutional neural networks
CN110929029A (en) * 2019-11-04 2020-03-27 中国科学院信息工程研究所 Text classification method and system based on graph convolution neural network
KR102196962B1 (en) * 2020-03-05 2020-12-31 강윤 Motion recognition of human body using matrix pressure sensor and human body motion prediction system
CN112633482A (en) * 2020-12-30 2021-04-09 广州大学华软软件学院 Efficient width map convolution neural network model and training method thereof
CN112767553A (en) * 2021-02-02 2021-05-07 华北电力大学 Self-adaptive group clothing animation modeling method
CN113240714A (en) * 2021-05-17 2021-08-10 浙江工商大学 Human motion intention prediction method based on context-aware network
CN113255798A (en) * 2021-06-02 2021-08-13 苏州浪潮智能科技有限公司 Classification model training method, device, equipment and medium
CN113326930A (en) * 2020-02-29 2021-08-31 华为技术有限公司 Data processing method, neural network training method, related device and equipment
CN113642432A (en) * 2021-07-30 2021-11-12 南京师范大学 Method for identifying human body posture by convolutional neural network based on covariance matrix transformation
CN113642379A (en) * 2021-05-18 2021-11-12 北京航空航天大学 Human body posture prediction method and system based on attention mechanism fusion multi-flow graph
CN113705772A (en) * 2021-07-21 2021-11-26 浪潮(北京)电子信息产业有限公司 Model training method, device and equipment and readable storage medium
CN114330670A (en) * 2022-01-04 2022-04-12 京东科技信息技术有限公司 Graph neural network training method, device, equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304795B (en) * 2018-01-29 2020-05-12 清华大学 Human skeleton behavior identification method and device based on deep reinforcement learning
CN109766895A (en) * 2019-01-03 2019-05-17 京东方科技集团股份有限公司 The training method and image Style Transfer method of convolutional neural networks for image Style Transfer
CN110222653B (en) * 2019-06-11 2020-06-16 中国矿业大学(北京) Skeleton data behavior identification method based on graph convolution neural network
CN112183315B (en) * 2020-09-27 2023-06-27 哈尔滨工业大学(深圳) Action recognition model training method and action recognition method and device

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171134A (en) * 2017-12-20 2018-06-15 中车工业研究院有限公司 A kind of operational motion discrimination method and device
US10416755B1 (en) * 2018-06-01 2019-09-17 Finch Technologies Ltd. Motion predictions of overlapping kinematic chains of a skeleton model used to control a computer system
CN109215036A (en) * 2018-08-01 2019-01-15 浙江深眸科技有限公司 Human body segmentation's method based on convolutional neural networks
CN110334573A (en) * 2019-04-09 2019-10-15 北京航空航天大学 A kind of human motion state method of discrimination based on intensive connection convolutional neural networks
CN110929029A (en) * 2019-11-04 2020-03-27 中国科学院信息工程研究所 Text classification method and system based on graph convolution neural network
CN113326930A (en) * 2020-02-29 2021-08-31 华为技术有限公司 Data processing method, neural network training method, related device and equipment
KR102196962B1 (en) * 2020-03-05 2020-12-31 강윤 Motion recognition of human body using matrix pressure sensor and human body motion prediction system
CN112633482A (en) * 2020-12-30 2021-04-09 广州大学华软软件学院 Efficient width map convolution neural network model and training method thereof
CN112767553A (en) * 2021-02-02 2021-05-07 华北电力大学 Self-adaptive group clothing animation modeling method
CN113240714A (en) * 2021-05-17 2021-08-10 浙江工商大学 Human motion intention prediction method based on context-aware network
CN113642379A (en) * 2021-05-18 2021-11-12 北京航空航天大学 Human body posture prediction method and system based on attention mechanism fusion multi-flow graph
CN113255798A (en) * 2021-06-02 2021-08-13 苏州浪潮智能科技有限公司 Classification model training method, device, equipment and medium
CN113705772A (en) * 2021-07-21 2021-11-26 浪潮(北京)电子信息产业有限公司 Model training method, device and equipment and readable storage medium
CN113642432A (en) * 2021-07-30 2021-11-12 南京师范大学 Method for identifying human body posture by convolutional neural network based on covariance matrix transformation
CN114330670A (en) * 2022-01-04 2022-04-12 京东科技信息技术有限公司 Graph neural network training method, device, equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Body Pose Prediction Based on Motion Sensor Data and Recurrent Neural Network; Marcin Wozniak et al.; IEEE Transactions on Industrial Informatics; pp. 2101-2111 *
Zhang Jin et al.; Symmetric Residual Network for Human Motion Prediction; Robot; 2022, pp. 291-298 *

Also Published As

Publication number Publication date
CN114943324A (en) 2022-08-26
WO2023226186A1 (en) 2023-11-30

Similar Documents

Publication Publication Date Title
Mutegeki et al. A CNN-LSTM approach to human activity recognition
Ha et al. Convolutional neural networks for human activity recognition using multiple accelerometer and gyroscope sensors
Dua et al. Multi-input CNN-GRU based human activity recognition using wearable sensors
CN110309861B (en) Multi-modal human activity recognition method based on generation of confrontation network
CN108960337B (en) Multi-modal complex activity recognition method based on deep learning model
CN114943324B (en) Neural network training method, human motion recognition method and device, and storage medium
Ahmed The impact of filter size and number of filters on classification accuracy in CNN
Gil-Martín et al. Improving physical activity recognition using a new deep learning architecture and post-processing techniques
CN111539941B (en) Parkinson&#39;s disease leg flexibility task evaluation method and system, storage medium and terminal
CN106570522B (en) Object recognition model establishing method and object recognition method
Hou A study on IMU-based human activity recognition using deep learning and traditional machine learning
Yu et al. A multi-layer parallel lstm network for human activity recognition with smartphone sensors
CN112990211A (en) Neural network training method, image processing method and device
CN106909938A (en) Viewing angle independence Activity recognition method based on deep learning network
Nafea et al. Multi-sensor human activity recognition using CNN and GRU
CN113011562A (en) Model training method and device
Banjarey et al. Human activity recognition using 1D convolutional neural network
Han et al. GraphConvLSTM: Spatiotemporal learning for activity recognition with wearable sensors
Chowdhury et al. hActNET: an improved neural network based method in recognizing human activities
Singh et al. Har using bi-directional lstm with rnn
CN108009512A (en) A kind of recognition methods again of the personage based on convolutional neural networks feature learning
Cao et al. QMEDNet: A quaternion-based multi-order differential encoder–decoder model for 3D human motion prediction
Li et al. Multi-convLSTM neural network for sensor-based human activity recognition
Alghazzawi et al. Sensor-based human activity recognition in smart homes using depthwise separable convolutions
Qin et al. NDGCN: network in network, dilate convolution and graph convolutional networks based transportation mode recognition

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant