CN114638285A - Multi-mode identification method for mobile phone inertial sensor data - Google Patents


Info

Publication number
CN114638285A
CN114638285A
Authority
CN
China
Prior art keywords
task
data
mobile phone
model
size
Prior art date
Legal status
Granted
Application number
CN202210179112.2A
Other languages
Chinese (zh)
Other versions
CN114638285B (en)
Inventor
张沪寅
苏今腾
郭迟
Current Assignee
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202210179112.2A
Publication of CN114638285A
Application granted
Publication of CN114638285B
Legal status: Active

Classifications

    • G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/241: Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N3/044: Neural networks; recurrent networks, e.g. Hopfield networks
    • G06N3/045: Neural networks; combinations of networks
    • G06N3/08: Neural networks; learning methods
    • Y02D30/70: Reducing energy consumption in wireless communication networks


Abstract

The invention provides a multi-mode identification method for mobile phone inertial sensor data. Training sample data are collected while the carrier of the inertial sensor uses a smartphone; the raw sensor data are cut with a sliding window into a number of samples, and a label is set for each sample. A single-task deep neural network model is constructed and trained for each task, where the number of modes of the corresponding task is, respectively, the number of moving directions, the number of pedestrian identities, and the number of mobile phone carrying modes. A multi-task deep network model is then constructed, initialized by transfer learning from the single-task models, and trained. Test data are collected and input into the multi-task model for calculation; the model outputs a probability prediction for each task, the maximum value of each prediction is detected, and if it is larger than the corresponding threshold, the corresponding label is returned as the prediction result for that task. The invention can effectively mine the information hidden in the sensor data and accurately identify the moving direction of the mobile phone carrier, verify the carrier's identity, and recognize the mobile phone carrying mode.

Description

Multi-mode identification method for mobile phone inertial sensor data
Technical Field
The invention discloses a multi-mode identification method for mobile phone inertial sensor data (comprising motion direction identification, pedestrian identity verification, and mobile phone carrying mode identification), together with a method for transfer learning among sensor-data classification tasks, and belongs to the technical field of artificial intelligence.
Background
With the development of embedded technology, sensors and wearable devices have gained more and more functions and are now applied very widely. Beyond navigation and positioning, data from inertial sensors can help identify specific patterns. When a pedestrian walks wearing an inertial sensor, the moving direction can be recognized and the pedestrian's identity can be verified; and if the pedestrian carries a smartphone with an inertial sensor, the way the phone is carried can also be identified.
Motion direction recognition refers to recognizing the moving direction of a pedestrian (forward, backward, left, and right). It belongs to human behavior recognition technology, which now brings growing benefits in scientific research, the production economy, and everyday services, has attracted wide attention from scientists and scholars, and is mainly divided into two types: image-based and inertial-sensor-based. Identity verification uses inherent physiological or behavioral characteristics of the human body to authenticate personal identity; however, most existing biometric identification technologies recognize the iris, fingerprint, or face, are mostly image-based, and run slowly. Human behavior recognition and identity verification based on inertial sensors have good application prospects because the sensors are low in cost, consume little energy, produce small data volumes that are easy to compute on, and are not easily affected by the environment. Mobile phone carrying mode identification refers to identifying the way a human body carries a mobile phone, generally including pocket, swing, holding and locking; this technology is of great value for estimating the relative angle between the mobile phone coordinate system and the human body coordinate system, and can be applied to navigation and positioning based on mobile phone inertial sensors.
At present, scholars have applied various traditional machine learning methods to the above three classification tasks and achieved good results. However, these machine learning methods cannot, to a certain extent, effectively mine informative features, resulting in low accuracy, so some researchers have begun to use deep learning methods to build neural networks for classification tasks on inertial sensor data; the effect of these methods is greatly improved compared with traditional machine learning. However, whether with machine learning or deep learning, the model may become trapped in a local optimum during training, limiting the effect of the final model; how to avoid local optima has therefore become a difficulty in the field of artificial intelligence.
Disclosure of Invention
The invention aims to provide an effective deep learning method for multi-class recognition of inertial sensor data. Based on transfer learning, the model avoids local optima during training; the final model can effectively mine hidden information from the inertial sensor data of a smartphone and complete motion direction identification, identity verification, and carrying mode identification for the mobile phone carrier.
To achieve the above object, the technical solution proposed by the present invention is a multi-mode identification method for mobile phone inertial sensor data, comprising the following steps:
step 1, collecting training sample data when an inertial sensor carrier uses a smart phone;
step 2, cutting the raw sensor data with a sliding window to generate a number of samples, where each sample contains several frames of data and adjacent samples overlap; a label is set for each sample as the samples are generated;
step 3, constructing one single-task deep neural network model for each classification task, three models in total, where each model comprises three convolutional layers, two LSTM units, an attention mechanism module, and a fully connected layer; the number of neurons in the output layer of the fully connected layer equals the number of modes of the corresponding task, which for the three models is, respectively, the number of moving directions, the number of pedestrian identities, and the number of mobile phone carrying modes;
step 4, inputting the samples generated in step 2, with the labels of the corresponding task, into the deep network model of that task in step 3, and training the models until convergence;
step 5, constructing a multi-task deep network model comprising three parallel convolutional branches, two temporal convolutional layers, and three parallel decoding layers, where each decoding layer comprises two LSTM units, an attention mechanism module, and a fully connected layer; the model performs transfer learning from the single-task network models of step 4 and is trained on the samples of step 2 until convergence;
step 6, collecting test data while a user uses a smartphone with an inertial sensor;
step 7, inputting the test data into the multi-task model trained in step 5 for calculation; the model outputs three vectors corresponding to the probability predictions of the tasks; the maximum value in each probability prediction vector is detected, and if it is greater than the corresponding threshold, the corresponding label is returned as the prediction result for that task; otherwise −1 is returned, indicating that the data belong to an illegal type for that task.
Moreover, when the multi-task model performs transfer learning from the single-task models in step 5, the convolutional layers of the three single-task models (for moving direction identification, identity verification, and mobile phone carrying mode identification) are migrated into the multi-task model to form three parallel convolutional branches, so that the multi-task model can mine more features and mitigate local optima.
In step 2, a sliding window of length 128 and step 64 is used to cut the raw sensor data; a single generated sample contains 128 frames of data, and each frame contains 6 floating-point numbers, corresponding respectively to the x-, y-, and z-axis data of the accelerometer and the x-, y-, and z-axis data of the gyroscope. Each sample is given three labels, whose contents are the identity number of the data collector, the moving direction, and the mobile phone carrying mode corresponding to the sample.
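The slicing described here (length 128, step 64, hence 50% overlap) can be sketched in a few lines of NumPy; the function and variable names below are illustrative, not from the patent.

```python
import numpy as np

def sliding_windows(stream, window=128, step=64):
    """Cut a (T, 6) sensor stream into overlapping (window, 6) samples.

    window=128 and step=64 reproduce the 50% overlap described in step 2.
    """
    starts = range(0, stream.shape[0] - window + 1, step)
    return np.stack([stream[s:s + window] for s in starts])

# 10 s of 50 Hz data: 500 frames of [acc x, y, z, gyro x, y, z]
stream = np.arange(500 * 6, dtype=np.float32).reshape(500, 6)
samples = sliding_windows(stream)
print(samples.shape)  # (6, 128, 6)
# adjacent samples share half a window (50% overlap)
print(np.array_equal(samples[0][64:], samples[1][:64]))  # True
```

A real pipeline would attach the three per-sample labels at the same point, since each window inherits the collector, direction, and carrying mode of the recording it was cut from.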
Moreover, in the single-task deep neural network model in step 3, the first convolutional layer contains 64 one-dimensional convolution kernels of length 25, the second and third convolutional layers each contain 64 one-dimensional convolution kernels of length 21, and the number of hidden-layer neurons in the two LSTM units is 128. The processing procedure is as follows.
A sample of size (128, 6) is input into the first convolutional layer to obtain a feature map FM1 of size (104, 6, 64); FM1 is input into the second convolutional layer to obtain a feature map FM2 of size (84, 6, 64); FM2 is input into the third convolutional layer to obtain a feature map FM3 of size (64, 6, 64). FM3 is then reshaped into a two-dimensional matrix of size (64, 6 × 64), i.e. 64 vectors of length 384, which are input into the first LSTM unit to produce 64 outputs, each a vector of length 128. These 64 vectors are input into the second LSTM unit, which again produces 64 vectors of length 128, i.e. a two-dimensional matrix of size (64, 128), denoted h_lstm. h_lstm is input into the attention mechanism module for a score-weighted summation, calculated as follows:

s_i = v^T tanh(W h_i + b)

α_i = exp(s_i) / Σ_{j=1}^{N} exp(s_j)

h_attention = Σ_{i=1}^{N} α_i h_i

where h_i is the i-th vector of h_lstm, s_i is its attention score, α_i its normalized weight, v is a column vector of length 80, W is a two-dimensional matrix of size (80, 128), b is a column vector of length 80, N is the number of vectors in h_lstm, and tanh is the hyperbolic tangent function.
The output of the attention mechanism module is a vector h_attention of length 128, which is input into the fully connected layer and passed through a softmax transformation to finally obtain a vector representing the recognition result, each value in the vector corresponding to the predicted probability of one pedestrian identity.
Furthermore, the multi-task network model in step 5 processes the data as follows.
After entering the network, a sample of size (128, 6) is copied into three copies, which respectively enter the three parallel convolutional branches Conv_direction, Conv_id, and Conv_pose, yielding three feature maps of size (64, 6, 64); these three feature maps are stacked along the third dimension to obtain a feature map of size (64, 6, 192), denoted FM_all.
FM_all sequentially passes through two temporal convolutional layers with dilation coefficients 2 and 4 and 96 and 48 convolution kernels respectively, finally yielding a feature map FM_tcn of size (64, 6, 48). FM_tcn is then reshaped into a two-dimensional matrix of size (64, 6 × 48) and simultaneously input into three parallel decoders. Data entering a decoder first pass through a two-layer LSTM unit with 64 neurons per layer; each LSTM layer outputs 64 vectors of length 64, forming a two-dimensional matrix of size (64, 64). This matrix is input into an attention module, and the output of each attention module enters a fully connected layer to obtain a probability distribution vector.
the multi-task model outputs three probability distribution vectors which correspond to the prediction results of moving direction identification, identity authentication and mobile phone carrying mode identification.
The invention uses a deep learning method to complete moving direction identification, pedestrian identity verification, and mobile phone carrying mode identification, and, based on the idea of transfer learning, migrates the single-task models into a multi-task model, thereby avoiding local optima during training to a certain extent. Compared with the prior art, the technique offers high recognition accuracy, robustness to the environment, and low cost. It can also be used to assist indoor positioning technology and improve positioning precision.
The scheme of the invention is simple and convenient to implement and highly practical; it solves the problems of low practicality and inconvenient application in related technologies, can improve the user experience, and has important market value.
Drawings
FIG. 1 is a diagram of a single-tasking neural network architecture according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a sliding window cut-to-produce sample according to an embodiment of the present invention;
FIG. 3 is a diagram of a multitasking neural network architecture according to an embodiment of the present invention.
Detailed Description
The technical solution of the present invention is specifically described below with reference to the accompanying drawings and examples.
The invention provides a deep neural network method that can simultaneously complete motion direction identification, pedestrian identity verification, and mobile phone carrying mode identification based on a mobile phone inertial sensor, together with a training method based on transfer learning to avoid local optima. The core of the technique is a deep neural network that, after learning and training, can effectively mine the information hidden in sensor data and accurately identify the moving direction and identity of a pedestrian and the carrying mode of the mobile phone.
The technical core of the invention is two deep neural network models (a single-task network model and a multi-task network model) and a training method in which the multi-task model learns by transfer from the single-task models.
1) Single task network model
Referring to fig. 1, the model consists of three convolutional layers, two LSTM units, an attention mechanism module, and a fully connected layer. The first convolutional layer contains 64 one-dimensional convolution kernels of length 25, the second and third convolutional layers each contain 64 one-dimensional convolution kernels of length 21, the number of hidden-layer neurons in the two LSTM units is 128, and the number of neurons in the output layer of the fully connected layer equals the number of classes. After a sample of size (128, 6) is input into the first convolutional layer, a feature map FM1 of size (104, 6, 64) is obtained; FM1 is input into the second convolutional layer to obtain a feature map FM2 of size (84, 6, 64); FM2 is input into the third convolutional layer to obtain a feature map FM3 of size (64, 6, 64). FM3 is then reshaped into a two-dimensional matrix of size (64, 6 × 64), i.e. 64 vectors of length 384, which are input into the first LSTM unit to produce 64 outputs, each a vector of length 128. These 64 vectors are input into the second LSTM unit, which again produces 64 vectors of length 128, i.e. a two-dimensional matrix of size (64, 128), denoted h_lstm. h_lstm is input into the attention mechanism module for a score-weighted summation, calculated as follows:

s_i = v^T tanh(W h_i + b)

α_i = exp(s_i) / Σ_{j=1}^{N} exp(s_j)

h_attention = Σ_{i=1}^{N} α_i h_i

where h_i is the i-th vector of h_lstm, s_i is its attention score, α_i its normalized weight, v is a column vector of length 80, W is a two-dimensional matrix of size (80, 128), b is a column vector of length 80 (v, W, and b are learnable network parameters), N is the number of vectors in h_lstm, and tanh is the hyperbolic tangent function.
The output of the attention mechanism module is a vector h_attention of length 128, which is input into the fully connected layer and passed through a softmax transformation to finally obtain a vector representing the recognition result, each value in the vector corresponding to the predicted probability of one category.
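The feature-map time sizes quoted in this architecture (128 → 104 → 84 → 64) follow from the standard output-length formula for an unpadded one-dimensional convolution; a quick arithmetic check:

```python
def conv1d_out_len(length, kernel, stride=1, dilation=1, padding=0):
    """Output length of a 1-D convolution ('valid' convention by default)."""
    return (length + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

# Three unpadded convolutions along the 128-frame time axis,
# with kernel lengths 25, 21, 21 as stated in the text:
lengths = [128]
for kernel in (25, 21, 21):
    lengths.append(conv1d_out_len(lengths[-1], kernel))
print(lengths)  # [128, 104, 84, 64] -> time sizes of the input, FM1, FM2, FM3
```

This confirms the sizes (104, 6, 64), (84, 6, 64), and (64, 6, 64) are internally consistent with kernel lengths 25, 21, and 21.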
2) Multitasking model
Referring to fig. 3, the multi-task model is formed by splitting and merging three different single-task models. After the three single-task models (the motion direction identification model, the identity verification model, and the mobile phone carrying mode identification model) have been trained, their convolutional layer parts are denoted, in order, Conv_direction, Conv_id, and Conv_pose. The invention migrates these three convolutional layer parts into the multi-task model. An input sample simultaneously enters Conv_direction, Conv_id, and Conv_pose, yielding three feature maps of size (64, 6, 64), denoted FM_direction, FM_id, and FM_pose, which are stacked along the third dimension to obtain a feature map of size (64, 6, 192), denoted FM_all.
Then, based on the idea of dilated convolution, the invention applies temporal convolution to reduce the spatial dimension of FM_all. FM_all sequentially passes through two temporal convolutional layers with dilation coefficients 2 and 4 and 96 and 48 convolution kernels respectively, finally yielding a feature map FM_tcn of size (64, 6, 48). FM_tcn is then reshaped into a two-dimensional matrix of size (64, 6 × 48) and simultaneously input into three parallel decoders. Data entering a decoder first pass through a two-layer LSTM unit (64 neurons per layer); each LSTM layer outputs 64 vectors of length 64, i.e. a two-dimensional matrix of size (64, 64).
The three two-dimensional matrices are respectively input into attention modules (with the same structure as the attention module in the single-task model), and the output of each attention module enters the fully connected layer to obtain a probability distribution vector representing the recognition result.
The multi-task model outputs three vectors which respectively correspond to the prediction results of motion direction identification, identity authentication and mobile phone carrying mode identification.
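The channel-wise stacking of the three transferred branches, and the shapes through the temporal convolution stage, can be checked with a small NumPy sketch. It assumes (as the quoted sizes imply) that the temporal convolutions use same-style padding so the 64-step axis is preserved while the kernel counts set the channel depth.

```python
import numpy as np

# Each parallel branch yields a (64, 6, 64) feature map; the multi-task
# model stacks the three maps along the channel (third) dimension.
fm_direction = np.zeros((64, 6, 64))
fm_id = np.zeros((64, 6, 64))
fm_pose = np.zeros((64, 6, 64))
fm_all = np.concatenate([fm_direction, fm_id, fm_pose], axis=2)
print(fm_all.shape)  # (64, 6, 192)

# The two temporal convolutional layers change only the channel depth
# (192 -> 96 -> 48), giving FM_tcn of size (64, 6, 48), which is then
# reshaped to (64, 6 * 48) = (64, 288) before entering the decoders.
fm_tcn = np.zeros((64, 6, 48))
print(fm_tcn.reshape(64, -1).shape)  # (64, 288)
```

The stacking is why the transfer helps: each branch keeps the features its single-task model learned, and the shared temporal layers fuse them for all three decoders.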
When a multi-task model is used for multi-classification (motion direction identification, identity authentication and mobile phone carrying mode identification), the method mainly comprises the following steps.
The first step is as follows: collecting training sample data when an inertial sensor carrier uses a smartphone: in specific implementation, a mobile phone carrier can use the smart phone to collect data for neural network learning. During collection, the collection frequency is set to be 50 Hz. The collected data includes four movement directions (forward, backward, left and right), four mobile phone carrying modes (pocket, swing, holding, and sitting), and a plurality of collectors.
The second step: the raw sensor data are cut with a sliding window to generate a number of samples, where each sample contains n frames of data and adjacent samples overlap by P%; a label is made for each sample as it is generated. Concretely, the raw sensor data are sliced with a sliding window of length 128 and step 64, generating samples of 128 frames each, with 50% data overlap between adjacent samples. While the samples are generated, each sample is labeled with the corresponding moving direction, mobile phone carrying mode, and collector number.
Referring to fig. 2, in the embodiment, it is preferable to set the generated single sample to have 128 frames of data, each frame of data has 6 floating point numbers, which correspond to the x, y, and z axis data of the accelerometer and the x, y, and z axis data of the gyroscope, respectively. When the label is made, each sample has three labels, and the content of the label is the identity number, the moving direction and the mobile phone carrying mode of the data acquirer corresponding to the sample.
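A single labeled sample as described here can be represented as a (128, 6) array plus three task labels; the key names below are illustrative, not from the patent.

```python
import numpy as np

# One training sample: 128 frames x 6 floats (acc x/y/z + gyro x/y/z),
# plus the three labels set during slicing.
sample = np.zeros((128, 6), dtype=np.float32)
labels = {
    "identity": 3,    # data collector's identity number
    "direction": 0,   # moving direction label
    "pose": 2,        # mobile phone carrying mode label
}
print(sample.shape)   # (128, 6)
print(len(labels))    # 3
```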
The third step: constructing a single-task deep neural network model, wherein each classified task is constructed by one model, and the three models are total, each model comprises three convolution layers, two LSTM units, an attention mechanism module and a full connection layer, wherein the first convolution layer contains 64 one-dimensional convolution kernels with the length of 25, the second convolution layer and the third convolution layer respectively contain 64 one-dimensional convolution kernels with the length of 21, the number of neurons in hidden layers in the two LSTM units is 128, the number of neurons in output layers of the full connection layer is equal to the number of modes of corresponding tasks, and the number of modes of corresponding tasks of the three models is respectively the number of moving directions, the number of pedestrians and the number of mobile phone carrying modes;
the specific single-task deep neural network model implementation is seen in section 1) above.
The fourth step: and inputting the sample set generated in the third step into each single task model for learning and training, and setting appropriate training parameters (learning rate, training round number and the like) to train the model until the model converges.
The fifth step: constructing a multi-task deep network model, wherein the model comprises three parallel multi-convolution layers, two layers of time sequence convolution layers and three parallel decoding layers, and each decoding layer comprises two LSTM units, an attention mechanism module and a full connection layer; enabling the model to perform transfer learning on the single task network model in the fourth step and training based on the samples in the second step until convergence;
and loading the multi-task model into the convolution layer parameters of each single-task model, reading the sample set, and training until convergence. And storing the trained multi-task model to the rear end of the server, so that the server can calculate the received sensor data in real time, and finish moving direction identification, identity authentication and mobile phone carrying mode identification.
And a sixth step: collecting test data when a user uses a smartphone with an inertial sensor: in specific implementation, a test user can acquire data through a WeChat applet on the smart phone, and each group of data (with the period of 2.56s) is automatically sent to the back end of the server.
The seventh step: the test data are input into the multi-task model trained in the fifth step for calculation; the model outputs three vectors corresponding to the probability predictions of the tasks. The maximum value in each probability prediction vector is detected; if it is greater than the corresponding threshold, the corresponding label is returned as the prediction result for that task; otherwise −1 is returned, indicating that the data belong to an illegal type for that task. In the specific implementation, the multi-task model on the server processes the data and returns the recognition result of each task to the mobile phone terminal.
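The thresholded decision rule of this step is simple to sketch; the per-task threshold values below are illustrative, since the patent does not specify them.

```python
import numpy as np

def predict_with_threshold(prob, threshold):
    """Return the arg-max label if its probability exceeds the task's
    threshold, otherwise -1 (illegal type for this task)."""
    best = int(np.argmax(prob))
    return best if prob[best] > threshold else -1

# Illustrative per-task probability vectors with a 0.5 threshold:
direction_probs = np.array([0.05, 0.85, 0.06, 0.04])  # confident prediction
identity_probs = np.array([0.34, 0.33, 0.33])         # ambiguous prediction
print(predict_with_threshold(direction_probs, 0.5))   # 1
print(predict_with_threshold(identity_probs, 0.5))    # -1
```

The −1 branch is what lets the identity task reject users not seen in training rather than forcing a match.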
In specific implementation, a person skilled in the art can implement the automatic operation process by using a computer software technology, and a system device for implementing the method, such as a computer-readable storage medium storing a corresponding computer program according to the technical solution of the present invention and a computer device including a corresponding computer program for operating the computer program, should also be within the scope of the present invention.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.

Claims (5)

1. A multi-mode identification method for mobile phone inertial sensor data is characterized by comprising the following steps:
step 1, collecting training sample data when an inertial sensor carrier uses a smart phone;
step 2, cutting the raw sensor data with a sliding window to generate a number of samples, wherein each sample contains several frames of data and adjacent samples overlap, and a label is set for each sample as the samples are generated;
step 3, constructing one single-task deep neural network model for each classification task, three models in total, wherein each model comprises three convolutional layers, two LSTM units, an attention mechanism module, and a fully connected layer; the number of neurons in the output layer of the fully connected layer equals the number of modes of the corresponding task, which for the three models is, respectively, the number of moving directions, the number of pedestrian identities, and the number of mobile phone carrying modes;
step 4, inputting the samples generated in step 2, with the labels of the corresponding task, into the deep network model of that task in step 3, and training the models until convergence;
step 5, constructing a multi-task deep network model comprising three parallel convolutional branches, two temporal convolutional layers, and three parallel decoding layers, wherein each decoding layer comprises two LSTM units, an attention mechanism module, and a fully connected layer; the model performs transfer learning from the single-task network models of step 4 and is trained on the samples of step 2 until convergence;
step 6, collecting test data when a user uses the smart phone with the inertial sensor;
step 7, inputting the test data into the multi-task model trained in step 5 for calculation, wherein the model outputs three vectors respectively corresponding to the probability prediction of each task; detecting the maximum value in each probability prediction vector, and if the maximum value is greater than the corresponding threshold, returning the corresponding label as the prediction result of the task; otherwise returning −1, which indicates that the data is of an illegal type in the task.
2. The multi-mode identification method for mobile phone inertial sensor data according to claim 1, characterized in that: in step 5, when the multi-task model performs transfer learning from the single-task models, the multi-convolution layers of the three single-task models for moving direction identification, identity authentication and mobile phone carrying mode identification are migrated into the multi-task model to form the three parallel multi-convolution layers, so that the multi-task model can extract more features and is less prone to falling into local optima.
3. The multi-mode identification method for mobile phone inertial sensor data according to claim 1 or 2, characterized in that: in step 2, a sliding window with a length of 128 and a step length of 64 is adopted to cut the original sensor data; each generated sample contains 128 frames of data, and each frame contains 6 floating-point numbers corresponding respectively to the x-, y- and z-axis data of the accelerometer and the x-, y- and z-axis data of the gyroscope; each sample is assigned three labels, whose contents are the identity number of the data collector, the moving direction and the mobile phone carrying mode corresponding to the sample.
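The segmentation of claim 3 (window length 128, step 64, hence a 64-frame overlap between adjacent samples) can be sketched with NumPy; the function name and the dummy zero input are illustrative:

```python
import numpy as np

def sliding_window_samples(data, window=128, step=64):
    """Cut raw (T, 6) sensor data into overlapping (window, 6) samples.

    Each row holds one frame: accelerometer x/y/z and gyroscope x/y/z.
    Adjacent samples overlap by window - step frames.
    """
    samples = []
    for start in range(0, len(data) - window + 1, step):
        samples.append(data[start:start + window])
    return np.stack(samples)

raw = np.zeros((640, 6))      # e.g. 640 frames of 6-channel IMU data
samples = sliding_window_samples(raw)
print(samples.shape)          # (9, 128, 6): window starts 0, 64, ..., 512
```

In practice each of the 9 windows would also be paired with its three labels (identity, moving direction, carrying mode) at generation time, as the claim states.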
4. The multi-mode identification method for mobile phone inertial sensor data according to claim 3, characterized in that: in the single-task deep neural network model of step 3, the first convolutional layer contains 64 one-dimensional convolution kernels of length 25, the second and third convolutional layers each contain 64 one-dimensional convolution kernels of length 21, and the number of neurons in the hidden layer of each of the two LSTM units is 128; the processing procedure is as follows,
after a sample of size (128,6) is input into the first convolutional layer, a feature map FM_1 of size (104,6,64) is obtained; FM_1 is input into the second convolutional layer to obtain a feature map FM_2 of size (84,6,64), and FM_2 is input into the third convolutional layer to obtain a feature map FM_3 of size (64,6,64); FM_3 is then reduced to a two-dimensional matrix of size (64, 6×64), i.e. 64 vectors of length 384, and input into the first LSTM unit, which generates 64 outputs, each an output vector of length 128; these 64 vectors are then input into the second LSTM unit, generating 64 vectors of length 128, i.e. a two-dimensional matrix of size (64,128), denoted h_lstm; h_lstm is input into the attention mechanism module for score-weighted summation, calculated as follows:
$$u_i = \tanh(W h_i + b)$$
$$\alpha_i = \frac{\exp(V^\top u_i)}{\sum_{j=1}^{N}\exp(V^\top u_j)}$$
$$h_{attention} = \sum_{i=1}^{N} \alpha_i h_i$$
wherein h_i is the i-th vector of h_lstm and α_i is its attention weight, V is a column vector of length 80, W is a two-dimensional matrix of size (80,128), b is a column vector of length 80, N is the number of vectors in h_lstm, and tanh is the hyperbolic tangent function;
the output of the attention mechanism module is a vector h_attention of length 128; h_attention is then input into the fully connected layer and transformed by softmax, finally obtaining a vector representing the recognition result, wherein each value in the vector corresponds to the prediction probability of one pedestrian identity.
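The formula images in this record are not legible, but the stated dimensions (W of size (80,128), b and V of length 80, h_lstm of size (64,128), attention output of length 128) are consistent with a standard additive attention, which the NumPy sketch below checks with random tensors; the random weights are placeholders, not trained parameters. Note also that the feature-map lengths in claim 4 follow valid 1-D convolution arithmetic: 128−25+1=104, 104−21+1=84, 84−21+1=64.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shapes from claim 4: h_lstm (64,128); W (80,128); b, V of length 80
h_lstm = rng.standard_normal((64, 128))
W = rng.standard_normal((80, 128))
b = rng.standard_normal(80)
V = rng.standard_normal(80)

u = np.tanh(h_lstm @ W.T + b)                  # (64, 80): u_i = tanh(W h_i + b)
scores = u @ V                                 # (64,): V^T u_i
alpha = np.exp(scores) / np.exp(scores).sum()  # softmax attention weights
h_attention = alpha @ h_lstm                   # (128,): weighted sum of the h_i

print(h_attention.shape)   # (128,), matching the length-128 output in the claim
```

The weights α_i sum to 1, so h_attention is a convex combination of the 64 LSTM output vectors.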
5. The multi-mode identification method for mobile phone inertial sensor data according to claim 4, characterized in that: the processing of the data by the multi-task network model in step 5 is as follows,
after a sample of size (128,6) enters the network, it is copied into three copies, which enter the three parallel multi-convolution layers Conv_direction, Conv_id and Conv_pose, obtaining three feature maps of size (64,6,64); these three feature maps are stacked along the third dimension to obtain a feature map of size (64,6,192), denoted FM_all;
FM_all sequentially enters the two temporal convolutional layers, whose dilation coefficients are 2 and 4 and whose numbers of convolution kernels are 96 and 48 respectively, finally obtaining a feature map FM_tcn of size (64,6,48); FM_tcn is then reshaped into a two-dimensional matrix of size (64, 6×48) and input simultaneously into the three parallel decoders; data entering a decoder first passes through a two-layer LSTM unit with 64 neurons per layer, each LSTM layer outputting 64 vectors of length 64 to form a two-dimensional matrix of size (64,64); this two-dimensional matrix is input into an attention module, and the output of each attention module then enters a fully connected layer to obtain a probability distribution vector;
the multi-task model outputs three probability distribution vectors, corresponding respectively to the prediction results of moving direction identification, identity authentication and mobile phone carrying mode identification.
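The tensor shapes through the multi-task model of claim 5 can be traced with dummy arrays; only the shape bookkeeping from the claim is reproduced here, the actual convolution and LSTM arithmetic being stubbed out with zero tensors:

```python
import numpy as np

# One input sample, per claims 3 and 5: 128 frames x 6 IMU channels
sample = np.zeros((128, 6))

# Three parallel multi-convolution branches each map (128,6) -> (64,6,64)
fm_direction = np.zeros((64, 6, 64))
fm_id = np.zeros((64, 6, 64))
fm_pose = np.zeros((64, 6, 64))

# Stack along the third dimension -> FM_all of size (64, 6, 192)
fm_all = np.concatenate([fm_direction, fm_id, fm_pose], axis=2)

# Two temporal conv layers (dilations 2, 4; 96 then 48 kernels) -> (64,6,48)
fm_tcn = np.zeros((64, 6, 48))

# Reshape to (64, 6*48) before feeding the three parallel decoders
decoder_in = fm_tcn.reshape(64, 6 * 48)
print(fm_all.shape, decoder_in.shape)   # (64, 6, 192) (64, 288)
```

Each decoder then reduces its (64, 288) input through the two 64-unit LSTM layers and an attention module to a single probability vector, giving the three task outputs.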
CN202210179112.2A 2022-02-25 2022-02-25 Multi-mode identification method for mobile phone inertial sensor data Active CN114638285B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210179112.2A CN114638285B (en) 2022-02-25 2022-02-25 Multi-mode identification method for mobile phone inertial sensor data


Publications (2)

Publication Number Publication Date
CN114638285A true CN114638285A (en) 2022-06-17
CN114638285B CN114638285B (en) 2024-04-19

Family

ID=81947129

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210179112.2A Active CN114638285B (en) 2022-02-25 2022-02-25 Multi-mode identification method for mobile phone inertial sensor data

Country Status (1)

Country Link
CN (1) CN114638285B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109740052A (en) * 2018-12-26 2019-05-10 武汉大学 The construction method and device of network behavior prediction model, network behavior prediction technique
CN110929243A (en) * 2019-11-22 2020-03-27 武汉大学 Pedestrian identity recognition method based on mobile phone inertial sensor
CN111079547A (en) * 2019-11-22 2020-04-28 武汉大学 Pedestrian moving direction identification method based on mobile phone inertial sensor
US20210281918A1 (en) * 2019-04-23 2021-09-09 Tencent Technology (Shenzhen) Company Limited Video recommendation method and device, computer device and storage medium

Non-Patent Citations (1)

Title
KUANG Xiaohua; HE Jun; HU Zhaohua; ZHOU Yuan: "Comparison of deep feature learning methods for human activity recognition", Application Research of Computers, no. 09, 28 August 2017 (2017-08-28) *


Similar Documents

Publication Publication Date Title
CN110287880A (en) A kind of attitude robust face identification method based on deep learning
CN109146921B (en) Pedestrian target tracking method based on deep learning
CN108629288B (en) Gesture recognition model training method, gesture recognition method and system
CN111325664B (en) Style migration method and device, storage medium and electronic equipment
CN110427867A (en) Human facial expression recognition method and system based on residual error attention mechanism
CN106909938B (en) Visual angle independence behavior identification method based on deep learning network
CN111079547B (en) Pedestrian moving direction identification method based on mobile phone inertial sensor
Su et al. HDL: Hierarchical deep learning model based human activity recognition using smartphone sensors
Jing et al. Spatiotemporal neural networks for action recognition based on joint loss
CN110443309A (en) A kind of electromyography signal gesture identification method of combination cross-module state association relation model
CN109508686A (en) A kind of Human bodys' response method based on the study of stratification proper subspace
Ying et al. Processor free time forecasting based on convolutional neural network
CN111291713B (en) Gesture recognition method and system based on skeleton
Savio et al. Image processing for face recognition using HAAR, HOG, and SVM algorithms
Liu et al. Lightweight monocular depth estimation on edge devices
Li et al. Multimodal gesture recognition using densely connected convolution and blstm
Liang et al. A lightweight method for face expression recognition based on improved MobileNetV3
CN113743247A (en) Gesture recognition method based on Reders model
CN116823868A (en) Melanin tumor image segmentation method
Li et al. [Retracted] Human Motion Representation and Motion Pattern Recognition Based on Complex Fuzzy Theory
Song et al. Track foreign object debris detection based on improved YOLOv4 model
CN114638285A (en) Multi-mode identification method for mobile phone inertial sensor data
CN114140524B (en) Closed loop detection system and method for multi-scale feature fusion
Wu et al. Infrared target detection based on deep learning
CN114492732A (en) Lightweight model distillation method for automatic driving visual inspection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant