WO2022236579A1

WO2022236579A1 - Gait recognition method and system based on lightweight attention convolutional neural network

Info

Publication number: WO2022236579A1
Application number: PCT/CN2021/092775
Authority: WO
Inventors: 孙方敏; 李烨; 黄浩华
Original assignee: 中国科学院深圳先进技术研究院
Priority date: 2021-05-10
Filing date: 2021-05-10
Publication date: 2022-11-17

Abstract

Disclosed in the present invention are a gait recognition method and system based on a lightweight attention convolutional neural network. The method comprises: inputting collected three-axis acceleration gait data and three-axis angular velocity gait data into a lightweight convolutional neural network, so as to extract gait features, wherein the lightweight convolutional neural network performs a one-dimensional convolution calculation on a time axis, respectively extracts features in an acceleration single-axis signal and an angular velocity single-axis signal, and fuses extracted six-axis signal features by means of two-dimensional convolution; for a feature map output by the lightweight convolutional neural network, calculating an attention weight parameter of each channel according to context encoding information of each channel; and for a feature map of each channel that is output by the lightweight convolutional neural network, further extracting features by means of depthwise separable convolution, and then multiplying the extracted features by the attention weight parameter of the corresponding channel, so as to enhance the features, wherein the enhanced features are used for classification, thereby realizing gait recognition. By means of the present invention, the complexity of a model can be reduced, and the accuracy of gait recognition can be improved.

Description

A Gait Recognition Method and System Based on Lightweight Attention Convolutional Neural Network

technical field

The present invention relates to the field of computer application technology, more specifically, to a gait recognition method and system based on a lightweight attentional convolutional neural network.

Background technique

In recent years, the types and quantities of wearable smart devices (such as smartphones, smart watches, etc.) Office and health testing, etc. The popularity of wearable smart devices has brought great convenience to people's lives, but because they may store and collect personal sensitive information during use, there is a high risk of privacy leakage, which makes wearable The security of smart devices has attracted much attention and attention. As the first hurdle to protect information security, identity recognition plays a pivotal role. Gait recognition based on wearable smart devices is an effective identification method, which identifies individuals through their unique walking styles, and has the advantages of long-distance, active, real-time and continuous recognition. Currently, adopting deep learning techniques for gait recognition has achieved significant performance improvements and has become a new promising trend. However, most existing studies only focus on improving recognition accuracy, and their network models usually have high complexity, ignoring the importance of lightweight models for wearable smart devices with limited computing power and storage resources.

Among the existing technologies, biometric technology is the latest technology for access control of wearable smart devices, which identifies individuals based on unique, stable and measurable physiological or behavioral characteristics of human beings. Physiological characteristics mainly include face, fingerprint and iris, etc., while behavioral characteristics are related to a person's behavior patterns, including gait, signature and so on. Although biometric technologies based on physiological characteristics have been widely used, they also have many insurmountable shortcomings. First, the sensors used to acquire physiological characteristics (such as fingerprint scanners, cameras, etc.) are expensive and large in size, which increases the weight and cost of wearable smart devices. Secondly, there is a risk of copying physiological features such as fingerprints and faces. For example, 3D printing can easily copy users' fingerprints to unlock devices. Finally, biometric technology based on physiological characteristics requires explicit interaction between the user and the device, and cannot achieve long-distance, active, real-time and continuous identification. When the device is lost in the unlocked state, the security risk is huge.

As a behavioral characteristic, gait refers to the walking posture of the human body. Research has shown that each individual's gait is unique and stable, making it difficult to imitate or replicate. Gait-based identification (gait recognition) does not require explicit interaction between the user and the device, and is an active, real-time and continuous identification method with high security. With the development of microelectronics technology, inertial sensors with small size, low power consumption and low cost are integrated in almost all wearable smart devices, which makes it possible to use wearable smart devices to obtain gait information and use corresponding algorithms to It is possible to realize user identification. Gait identification technology based on wearable smart devices has received extensive attention and research from scholars at home and abroad. At present, gait recognition methods based on wearable smart devices mainly include three categories: template matching methods, machine learning methods and deep learning methods.

The template matching method identifies the identity of the user by calculating and comparing the similarity between the gait template stored in the wearable smart device and the gait cycle to be detected. If the similarity is higher than the preset threshold, the user is identified for legitimate users. The methods used to calculate the similarity mainly include Dynamic Time Warping (DTW), Pearson Correlation Coefficient (PCC) and cross-correlation etc. Currently, many studies have proposed different template matching methods and achieved good performance under laboratory conditions. However, template matching methods need to detect gait cycles to construct gait templates and test samples, and gait cycle detection is a challenging task because it is sensitive to noise and device location, pace, road conditions and device Any change in position can easily lead to failure of gait cycle detection or loss of phase within a gait cycle, which will lead to wrong recognition decisions. Therefore, the robustness and accuracy of the template matching method can not meet the needs of practical applications.

Machine learning methods achieve identity recognition by extracting features of gait signals for classification. Existing studies have used algorithms such as support vector machine (SVM), nearest neighbor (KNN) and random forest (RF) for gait identification, and achieved better performance than template matching methods. However, the model recognition accuracy of machine learning methods is greatly affected by manually extracted features. Manually extracting features requires researchers to have rich professional knowledge and experience in related fields, with professionalism and a certain degree of subjectivity. Preprocessing, feature engineering, and continuous experimental verification and improvement are required to obtain good results, which is time-consuming and difficult.

Recent studies have shown that adopting deep learning models such as convolutional neural networks (CNNs) for gait recognition has achieved significant performance improvements and has become a new promising trend. Deep learning networks have powerful nonlinear representation learning capabilities, which can automatically extract useful features from input data for classification and other tasks. Existing studies have proposed many deep learning-based gait recognition methods, which have been extensively compared with traditional machine learning algorithms and template matching algorithms, and have achieved better performance improvements in recognition accuracy. Although the deep learning method can automatically extract useful features from the data, it has better robustness and higher recognition performance than the template matching method and the machine learning method, but the models proposed by the existing research have high complexity. It is not suitable for wearable smart devices with limited computing power and capacity.

Contents of the invention

An object of the present invention is to provide a lightweight attentional convolutional neural network for gait recognition based on wearable smart devices, which can achieve better performance improvement while occupying less memory resources.

According to a first aspect of the present invention, a gait recognition method based on a lightweight attentional convolutional neural network is provided. The method includes the following steps:

Step S1: Input the collected triaxial acceleration and triaxial angular velocity gait data into a lightweight convolutional neural network to extract gait features. The lightweight convolutional neural network performs one-dimensional convolution calculations on the time axis to extract The features in the single-axis acceleration signal and the single-axis angular velocity signal, and use two-dimensional convolution to fuse the extracted six-axis signal features to obtain the output feature map;

Step S2: For the feature map output by the lightweight convolutional neural network, calculate the attention weight parameters of each channel according to the context coding information of each channel;

Step S3: For the feature map of each channel output by the lightweight convolutional neural network, use depth separable convolution to further extract features and multiply them with the attention weight parameters of the corresponding channel, and then perform gait recognition, wherein The depthwise separable convolution performs convolution operations only in the spatial dimension.

According to a second aspect of the present invention, there is provided a gait recognition system based on a lightweight attention convolutional neural network. The system includes:

Lightweight convolutional neural network: used to extract gait features and obtain output feature maps with triaxial acceleration and triaxial angular velocity gait data as input, where the lightweight convolutional neural network performs one-dimensional convolution on the time axis Product calculation, respectively extracting the features in the acceleration single-axis signal and angular velocity single-axis signal, and using two-dimensional convolution to fuse the extracted six-axis signal features;

Attention module: used to calculate the attention weight parameters of each channel according to the context encoding information of each channel for the feature map output by the lightweight convolutional neural network; and for the output of the lightweight convolutional neural network The feature map of each channel is further extracted by depth-separable convolution and then multiplied by the attention weight parameter of the corresponding channel to obtain enhanced features, wherein the depth-separable convolution only performs convolution operations in the spatial dimension;

Prediction output module: used for gait recognition according to the enhanced features.

Compared with the prior art, the present invention has the advantage of proposing a lightweight neural network model suitable for wearable smart devices, which can obtain higher recognition accuracy while occupying less memory resources, solving the problem of Existing research requires a high-complexity model to obtain high recognition accuracy.

Other features of the present invention and advantages thereof will become apparent from the following detailed description of exemplary embodiments of the present invention with reference to the accompanying drawings.

Description of drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention.

Fig. 1 is the flowchart of the gait recognition method based on lightweight attention convolutional neural network according to one embodiment of the present invention;

Fig. 2 is a structural diagram of a lightweight attention convolutional neural network according to one embodiment of the present invention;

Fig. 3 is a schematic diagram of extracting a feature map of each channel through depthwise separable convolution according to an embodiment of the present invention.

Detailed ways

Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that the relative arrangements of components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.

The following description of at least one exemplary embodiment is merely illustrative in nature and in no way taken as limiting the invention, its application or uses.

Techniques, methods and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods and devices should be considered part of the description.

In all examples shown and discussed herein, any specific values should be construed as exemplary only, and not as limitations. Therefore, other instances of the exemplary embodiment may have different values.

It should be noted that like numerals and letters denote like items in the following figures, therefore, once an item is defined in one figure, it does not require further discussion in subsequent figures.

The present invention proposes a lightweight attentional convolutional neural network, which is a new technical solution for realizing gait recognition based on wearable smart devices. In short, the technical solution first uses a lightweight convolutional neural network (CNN) to extract gait features from the three-axis acceleration and three-axis angular velocity data collected by wearable smart devices. Then, a new attention weight calculation method is proposed, and an attention module is designed based on the attention weight calculation method, contextual encoding information and depthwise separable convolution, which is embedded into a lightweight CNN to enhance step dynamic features and simplify the complexity of the model. Finally, the enhanced gait features are input into, for example, a Softmax classifier for classification, and then the gait recognition result is output.

Specifically, as shown in FIG. 1 , the provided gait recognition method based on a lightweight attention convolutional neural network includes the following steps.

Step S110, taking the three-axis acceleration and three-axis angular velocity gait data as input, and using a lightweight convolutional neural network to extract features.

In one embodiment, the lightweight attention convolutional neural network is shown in Figure 2, which generally includes an input layer, a convolutional neural network, an attention module (marked as Attention) and an output layer (to obtain a predicted output module).

The input layer receives the three-axis acceleration and three-axis angular velocity gait data collected by wearable smart devices, the convolutional neural network is used to extract gait features from the gait data, and the attention module is used to process the extracted gait features Enhancement, the enhanced features are input into the Softmax classifier for classification and output recognition results.

The convolutional neural network in Figure 2 is designed as a lightweight network structure, hereinafter referred to as L-CNN (Lightweight CNN), which is the front end of the entire network and is used to extract features from input data. For example, L-CNN contains four convolutional layers and two pooling layers. Two pooling layers are respectively placed after the first and third convolutional layers to further extract the main features of the convolutional layers. Set a batch normalization layer (BatchNormalization, BN) and a ReLU activation layer after each convolution layer or pooling layer. The BN layer and the ReLU layer can speed up network training and convergence, and prevent gradient disappearance or explosion and overfitting question. The first three convolutional layers of L-CNN use 1D convolution, that is, convolution calculations are performed on the time axis, and the features in the single-axis signals of acceleration and angular velocity are extracted respectively. This method is conducive to obtaining better feature representation of single-axis signals. . The last convolutional layer of L-CNN uses 2D convolution to fuse the 6-axis signal features extracted by the previous three convolutional layers to obtain more useful potential advanced features, which will help the network to obtain better recognition performance . The hierarchical structure and parameter settings of L-CNN are shown in Table 1 below.

Table 1: L-CNN hierarchy and parameter settings

Step S120, for the feature map output by the lightweight convolutional neural network, use the channel attention mechanism to extract enhanced features.

In Figure 2, the attention module uses the channel attention mechanism to learn the correlation between a single channel information and all channel information, and uses this correlation as the weight of different channels to multiply the original feature map , so as to enhance the feature map of the important channel, the larger the weight value, the more important the information contained in the feature map of the channel. In the prior art, an attention weight calculation module is usually composed of Global Average Pooling (GAP) and Fully Connected Layers (FC) to obtain the weights of different channels, but this method is due to the The presence of connected layers increases the model parameters of the network.

In one embodiment, a new channel weight calculation method is proposed. Let F∈R ^H×W×C be a set of feature maps output by L-CNN, where H, W and C denote the height, width and channel dimension of the feature map respectively, and the weight calculation formula of the i-th channel is defined as:

The molecule F _i represents the context encoding information contained in the i-th channel, which can be represented by a value or the sum of a set of data,

Represents the sum of all channel context encoding information.

In order to obtain context encoding information, preferably, a Context Encoding Module (Context Encoding Module, CEM) is used to capture global context information and selectively highlight feature maps associated with categories. This module can be found in (“Deep TEN: Texture Encoding Network”. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, Hawaii, USA, July 21-26, 2017, Zhang, H.; Xue, J.; Dana , K.). Since this module combines dictionary learning and residual encoding, it carries domain-specific information and can be transferred to the processing of gait timing signals. The whole CEM is differentiable, and embedding this module into a convolutional neural network can achieve end-to-end learning optimization.

For example, CEM contains K D-dimensional encoding vectors. The values of these encoding vectors are generally initialized randomly, and reasonable values are automatically learned during the continuous training of the network. After a single feature map is processed by CEM, a new vector set E with a fixed length of K is obtained, and each element in E contains context coding information. In one embodiment, let the inherent dictionary only contain one coding vector, that is, K=1. At this time, a single feature map only obtains one weight parameter containing context coding information after passing through the context coding module, and contains input features of C channels The graph will get _C weight parameters, recorded as γ={E ₁ ,E ₂ ,...,EC }. The attention weight of each channel can be calculated according to formula (1) and γ={E ₁ , E ₂ ,...,E _C }.

After obtaining the channel attention weight, the present invention does not directly multiply the channel weight parameter and the original feature map, but makes some improvements to it. The channel attention mechanism can determine which channels are important and enhance the feature maps of important channels, so as to select feature maps that are more relevant to the target task. But the channel attention mechanism ignores that there may still be some useless or redundant features in the feature maps of important channels. To solve this problem in the channel attention mechanism, the feature map of each channel is further extracted and then multiplied by the attention weight parameter.

In one embodiment, depthwise separable convolution (Depthwise Separable Convolution, DS-Conv) is used to implement feature extraction on the feature map of each channel. Depth-separable convolution is a model lightweight technology. Unlike traditional convolution, which performs convolution operations in two dimensions, space and channel, depth-separable convolution only performs convolution operations in the spatial dimension. Convolution calculations are performed only in the spatial dimension so that the depthwise separable convolution does not need to specify the number of convolution kernels, thereby significantly reducing the number of parameters that the model needs to learn.

Based on formula (1), context coding module and depthwise separable convolution, the embodiment of the present invention proposes a channel attention method that can effectively improve model recognition performance and simplify model complexity, which is named CEDS-A (Attention with Context Encoding and Depthwise Separable Convolution), its structure is shown in Figure 3. Input F _(H,W,C) represents a set of feature maps output by L-CNN, DS-Conv represents depth separable convolution, γ _(1,1,C) represents channel attention weights, Y _{(H',W ', C)} is a new set of feature maps obtained.

Equation (2) is a mathematical description of Figure 3, where D _C represents a depthwise separable convolution operation, for example, the size of its convolution kernel is set to 1x3, and δ _N represents BN+Sigmoid.

Step S130, using enhanced features for gait recognition.

Based on the enhanced features extracted above, a classifier, such as a Softmax classifier, can be further used to judge whether the corresponding gait features are legal, thereby realizing personal identity verification. The invention can effectively improve the recognition rate of gait identity authentication, and can be applied to monitoring systems in various occasions.

In order to further verify the effect of the present invention, experiment is carried out. The results show that, compared with the recognition accuracy and model parameter quantity of the existing similar research, the model proposed by the present invention obtains higher recognition performance under the condition that the complexity is reduced by 87.8% on average. The specific experimental process is as follows.

1), experimental data

Experiments are conducted on the whuGait dataset collected in real scenarios and the OU-ISIR dataset with the largest number of experimental participants to evaluate the performance of the proposed network model. The whuGait data set contains gait data collected by 118 experimenters outdoors through smartphones in a completely unrestrained environment. It is not clear when, where and how each experimenter walks. The whuGait dataset consists of 8 sub-datasets: dataset #1 to dataset #4 for identification, dataset #5 and dataset #6 for authentication, dataset #7 and dataset #8 for Walking data and non-walking data are separated. The present invention only uses two sub-data sets from data set #1 to data set #2. The OU-ISIR dataset is currently the inertial sensor-based gait dataset with the largest number of experimental participants, and it includes gait data of 744 experimenters (389 males, 355 females, ranging in age from 2 to 78 years old).

The OU-ISIR and whuGait datasets can refer to the open source processed datasets on the GitHub website (https://github.com/qinnzou/). See Table 2 for details of the datasets used in the experiments. There is no intersection between the training set and the test set used in the experiment, and the overlap rate of samples refers to the overlap between the internal samples of the training set and the test set.

Table 2: Experimental dataset information

2), experimental method

The network model uses Early Stopping to control the number of iterations of network training. The early stopping method is a widely used model training method, which means that during the network training process, if the performance of the network on the verification set has not been improved for N consecutive iterations, the learning and training of the network will be stopped. The early stopping method saves the model or model parameters with the best performance of the network on the verification set during the training process by monitoring whether the performance indicators (such as accuracy rate, average error, etc.) have improved, which can prevent the network from over-fitting and improve the generality of the model. performance. In the invention, the accuracy rate is used as a monitoring index, and N is set to 50 to control the training of the network, that is, if the accuracy rate of the network on the verification set has not improved for 50 consecutive iterations, the training of the network will end.

3), evaluation indicators

To evaluate the performance of the model, accuracy, recall and F1-score are used as evaluation indicators. The larger the value of these three evaluation indicators, the better the performance of the model.

4), Experimental results and analysis

On the whuGait and OU-ISIR data sets, the method proposed by the present invention is mainly consistent with the existing technical solution "Deep Learning-Based Gait Recognition Using Smartphones in the Wild" (IEEE Transactions on Information Forensics and Security, 2020, 15, 3197- 3212], Zou, Q.; Wang, Y.; Wang, Q.; etc.) for comparison.

The experimental comparison results are shown in Table 3 below, and it can be seen that:

(1) On recognition accuracy, the method proposed by the present invention (marked as L-CNN+CEDS-A) is higher than the experimental results of existing CNN+LSTM on data set #1 and data set #2 1.39% and 0.95%, and 25.16% higher on the OU-ISIR dataset.

(2) On the amount of model parameters, the parameter amount of the model of the present invention is reduced by 87.8% on average compared with the parameter amount of the existing CNN+LSTM model, which shows that the memory resource occupied on the model size is less.

The above experimental results show that the method proposed in the present invention achieves higher recognition accuracy with a lighter model than the existing research methods, which is important for wearable smart devices with limited resources. and meaningful.

Table 3: Comparison with Existing Research Results

Correspondingly, the present invention also provides a gait recognition system based on a lightweight attentional convolutional neural network, which is used to realize one or more aspects of the above method. For example, the system includes: a lightweight convolutional neural network, which is used to use three-axis acceleration and three-axis angular velocity gait data as input, extract gait features, and obtain an output feature map, wherein the lightweight convolutional neural network is used in Perform one-dimensional convolution calculation on the time axis, extract the features in the acceleration single-axis signal and angular velocity single-axis signal, and use two-dimensional convolution to fuse the extracted six-axis signal features; the attention module is used for For the feature map output by the lightweight convolutional neural network, calculate the attention weight parameters of each channel according to the context coding information of each channel; and for the feature map of each channel output by the lightweight convolutional neural network , use the depth separable convolution to further extract features and multiply the corresponding channel attention weight parameters to obtain enhanced features, wherein the depth separable convolution only performs convolution operations in the spatial dimension; the prediction output module is used for Perform gait recognition according to the enhanced features.

In summary, the present invention proposes a new channel attention weight calculation method, which is simple and effective, and hardly increases the number of parameters of the model. Based on the proposed channel attention weight calculation method, context encoding module and depthwise separable convolution, the present invention proposes a channel attention module that can effectively improve model recognition performance and simplify model complexity. The lightweight convolutional neural network and the channel attention module designed by the present invention are combined to form a complete gait recognition network, which achieves better performance improvement while occupying less memory resources.

The present invention can be a system, method and/or computer program product. A computer program product may include a computer readable storage medium having computer readable program instructions thereon for causing a processor to implement various aspects of the present invention.

A computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. A computer readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: portable computer diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or flash memory), static random access memory (SRAM), compact disc read only memory (CD-ROM), digital versatile disc (DVD), memory stick, floppy disk, mechanically encoded device, such as a printer with instructions stored thereon A hole card or a raised structure in a groove, and any suitable combination of the above. As used herein, computer-readable storage media are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., pulses of light through fiber optic cables), or transmitted electrical signals.

Computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or downloaded to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or a network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in each computing/processing device .

Computer program instructions for carrying out operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or Source or object code written in any combination, including object-oriented programming languages—such as Smalltalk, C++, Python, etc., and conventional procedural programming languages—such as the “C” language or similar programming languages. Computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server implement. In cases involving a remote computer, the remote computer can be connected to the user computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (such as via the Internet using an Internet service provider). connect). In some embodiments, an electronic circuit, such as a programmable logic circuit, field programmable gate array (FPGA), or programmable logic array (PLA), can be customized by utilizing state information of computer-readable program instructions, which can Various aspects of the invention are implemented by executing computer readable program instructions.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine such that when executed by the processor of the computer or other programmable data processing apparatus , producing an apparatus for realizing the functions/actions specified in one or more blocks in the flowchart and/or block diagram. These computer-readable program instructions can also be stored in a computer-readable storage medium, and these instructions cause computers, programmable data processing devices and/or other devices to work in a specific way, so that the computer-readable medium storing instructions includes An article of manufacture comprising instructions for implementing various aspects of the functions/acts specified in one or more blocks in flowcharts and/or block diagrams.

It is also possible to load computer-readable program instructions into a computer, other programmable data processing device, or other equipment, so that a series of operational steps are performed on the computer, other programmable data processing device, or other equipment to produce a computer-implemented process , so that instructions executed on computers, other programmable data processing devices, or other devices implement the functions/actions specified in one or more blocks in the flowcharts and/or block diagrams.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a portion of a program segment, or an instruction that includes one or more Executable instructions. In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved. It should also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified function or action , or may be implemented by a combination of dedicated hardware and computer instructions. It is well known to those skilled in the art that implementation by means of hardware, implementation by means of software, and implementation by a combination of software and hardware are all equivalent.

Having described various embodiments of the present invention, the foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and alterations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen to best explain the principle of each embodiment, practical application or technical improvement in the market, or to enable other ordinary skilled in the art to understand each embodiment disclosed herein. The scope of the invention is defined by the appended claims.

Claims

A gait recognition method based on a lightweight attention convolutional neural network, comprising the following steps:

Step S1: Input the collected triaxial acceleration and triaxial angular velocity gait data into a lightweight convolutional neural network to extract gait features. The lightweight convolutional neural network performs one-dimensional convolution calculations on the time axis to extract The features in the single-axis acceleration signal and the single-axis angular velocity signal, and use two-dimensional convolution to fuse the extracted six-axis signal features to obtain the output feature map;

Step S2: For the feature map output by the lightweight convolutional neural network, calculate the attention weight parameters of each channel according to the context coding information of each channel;

Step S3: For the feature map of each channel output by the lightweight convolutional neural network, use depth separable convolution to further extract features and multiply them with the attention weight parameters of the corresponding channel, and then perform gait recognition, wherein The depthwise separable convolution performs convolution operations only in the spatial dimension.
The method according to claim 1, characterized in that, in step S2, for a set of feature maps F∈R H×W×C , the weight calculation formula of the i-th channel is expressed as:

Among them, H, W and C represent the height of the feature map respectively, F i represents the context encoding information contained in the i-th channel,
Represents the sum of all channel context encoding information.
The method according to claim 1, wherein the context coding information is obtained according to the following steps::

After a single feature map is processed by context encoding, a new vector set E with a fixed length of K is obtained, and each element in E contains context encoding information.
The method according to claim 1, wherein the lightweight convolutional neural network comprises four convolutional layers and two pooling layers, and the two pooling layers are respectively arranged in the first convolutional layer and the third convolutional layer. After the convolutional layer, a batch normalization layer and a ReLU activation layer are set after each convolutional layer or pooling layer. The first three convolutional layers of the lightweight convolutional neural network use one-dimensional convolution. On the time axis Carry out convolution calculations to extract the features in the acceleration and angular velocity single-axis signals; the last convolutional layer uses two-dimensional convolution to fuse the six-axis signal features extracted by the first three convolutional layers.
The method according to claim 1, wherein the triaxial acceleration and triaxial angular velocity gait data are collected by a wearable smart device.
A gait recognition system based on a lightweight attentional convolutional neural network, including:

Lightweight convolutional neural network: used to extract gait features and obtain output feature maps with triaxial acceleration and triaxial angular velocity gait data as input, where the lightweight convolutional neural network performs one-dimensional convolution on the time axis Product calculation, respectively extracting the features in the acceleration single-axis signal and angular velocity single-axis signal, and using two-dimensional convolution to fuse the extracted six-axis signal features;

Attention module: used to calculate the attention weight parameters of each channel according to the context encoding information of each channel for the feature map output by the lightweight convolutional neural network; and for the output of the lightweight convolutional neural network The feature map of each channel is further extracted by depth-separable convolution and then multiplied by the attention weight parameter of the corresponding channel to obtain enhanced features, wherein the depth-separable convolution only performs convolution operations in the spatial dimension;

Prediction output module: used for gait recognition according to the enhanced features.
The system according to claim 6, wherein the attention module includes an input layer, a depth separable convolution layer, a context encoding module, a batch normalization layer and an activation layer, and the relationship between input and output is expressed as :

Among them, F (H, W, C) represents a set of feature maps, D C represents the depth separable convolution operation, the convolution kernel size is set to 1x3, γ (1, 1, C) represents the channel attention weight, Y ( H′, W′, C) are a new set of feature maps obtained, and δN denotes batch normalization and activation processing.
The system according to claim 6, wherein the prediction output module is implemented by a softmax classifier.
A computer-readable storage medium, on which a computer program is stored, wherein, when the program is executed by a processor, the steps of the method according to any one of claims 1 to 5 are implemented.
A computer device comprising a memory and a processor, wherein a computer program capable of running on the processor is stored in the memory, wherein any one of claims 1 to 5 is implemented when the processor executes the program The steps of the method described in the item.