CN112101472A - Shoe identification method, device and system, computer storage medium and electronic equipment


Info

Publication number
CN112101472A
CN112101472A
Authority
CN
China
Prior art keywords
attention network
gait data
identification
layer
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010993739.2A
Other languages
Chinese (zh)
Inventor
姚昱旻 (Yao Yumin)
文雅 (Wen Ya)
温岚 (Wen Lan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changsha Yumin Information Technology Co ltd
Original Assignee
Changsha Yumin Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changsha Yumin Information Technology Co ltd filed Critical Changsha Yumin Information Technology Co ltd
Priority to CN202010993739.2A priority Critical patent/CN112101472A/en
Publication of CN112101472A publication Critical patent/CN112101472A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

The application provides a method of identifying footwear, comprising: acquiring gait data of an identification object; and identifying the shoe type of the identification object according to the gait data and a pre-established deep attention network Sensing-HH. By adopting this method, footwear identification becomes more effective and accurate. The application also provides a footwear identification device, a system, a computer storage medium and an electronic device.

Description

Shoe identification method, device and system, computer storage medium and electronic equipment
Technical Field
The present application relates to computer technologies, and in particular, to a method, an apparatus, a system, a computer storage medium, and an electronic device for identifying footwear.
Background
With the rapid development of wearable technology, billions of intelligent terminals with built-in high-precision motion sensors, including smartphones and wearable devices, have become people's most intimate, inseparable tools and most faithful "recorders" of daily life. Research with these motion sensors has usually investigated the recognition of a person's own intrinsic characteristics, such as motion, gesture and identity, with little attention paid to the fact that extrinsic factors can also affect gait.
Problems existing in the prior art:
gait-based biometric identification is largely affected by the footwear worn by the user.
Disclosure of Invention
The embodiments of the application provide a method, a device and a system for identifying shoes, a computer storage medium and electronic equipment, so as to solve the above technical problem.
According to a first aspect of embodiments of the present application, there is provided a method of identifying footwear, comprising the steps of:
acquiring gait data of an identification object;
and identifying the shoe type of the identification object according to the gait data and a pre-established deep attention network Sensing-HH.
According to a second aspect of embodiments of the present application, there is provided a footwear identification device comprising:
a data acquisition module configured to acquire gait data of an identification subject;
a footwear identification module configured to identify a footwear type of the recognition object based on the gait data and a pre-established deep attention network Sensing-HH.
According to a third aspect of embodiments of the present application, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of a method of footwear identification as described above.
According to a fourth aspect of embodiments herein, there is provided an electronic device comprising one or more processors, and memory configured to store one or more programs; the one or more programs, when executed by the one or more processors, implement a method of footwear identification as described above.
According to a fifth aspect of embodiments of the present application, there is provided a footwear identification system comprising: a mobile terminal and the footwear identification device described above, the mobile terminal comprising:
a motion sensor configured to acquire gait data of a recognition object;
a data communication module configured to transmit the gait data to the footwear identification device.
According to the shoe identification method, the device and the system, the computer storage medium and the electronic equipment, after the gait data of the user in daily life are acquired, the type of the shoe worn by the user is identified by utilizing the pre-established deep attention network Sensing-HH, so that the shoe identification is more effective and accurate.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic flow chart illustrating an implementation of a method of identifying footwear in an embodiment of the present application;
FIG. 2 shows a schematic view of the structure of a footwear identification device in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an electronic device in an embodiment of the present application;
FIG. 4 shows a schematic representation of the structure of a footwear identification system in an embodiment of the present application.
Detailed Description
In the process of implementing the present application, the inventors found that:
although footwear changes gait, there has been little research on footwear identification. Existing mainstream methods typically rely on RGB cameras, specialized motion-capture systems, ground-reaction-force sensors or Microsoft Kinect sensors, all of which are confined to the laboratory. Researchers have also begun to use specialized wearable motion sensors to study the effects of different shoes on gait parameters, but their research on general gait analysis with motion sensors has focused on model-based methods, which require first modeling the gait and then converting sensor signals into certain gait-related physiological parameters based on a full understanding of the gait mechanism.
In summary, existing research is still confined to constrained laboratory environments; there is no scheme for identifying shoes in daily life, and in particular no research on high-heeled shoes.
In view of the above problems, embodiments of the present application provide a method, an apparatus, a system, a computer storage medium, and an electronic device for identifying footwear.
In order to make the technical solutions and advantages of the embodiments of the present application more apparent, exemplary embodiments of the present application are described below in further detail with reference to the accompanying drawings. Clearly, the described embodiments are only a part of the embodiments of the present application, not an exhaustive list of all embodiments. It should be noted that, in the absence of conflict, the embodiments in the present application and the features of the embodiments may be combined with each other.
Fig. 1 shows a schematic flow chart of an implementation of a method for identifying footwear according to an embodiment of the present application.
As shown, the footwear identification method includes:
step 101, acquiring gait data of an identification object;
and 102, identifying the shoe type of the identification object according to the gait data and the pre-established deep attention network Sensing-HH.
According to the shoe identification method provided by the embodiment of the application, after the gait data of the user in daily life is acquired, the type of the shoe worn by the user is identified by using the pre-established deep attention network Sensing-HH, so that the shoe identification is more effective and accurate.
In one embodiment, acquiring gait data identifying a subject comprises:
acquiring gait data of an identification object by using a motion sensor carried by the identification object;
the motion sensor includes a three-axis accelerometer configured to measure acceleration and gravity values in the X, Y and Z directions, and a three-axis gyroscope configured to acquire the angular velocity of spatial rotation.
In one embodiment, the motion sensor carried by the identification object is held in the identification object's hand, or tied to the identification object's waist, or placed in the identification object's pocket.
In one embodiment, identifying the shoe type of the recognition object according to the gait data and the pre-established deep attention network Sensing-HH comprises:
transmitting acceleration data in the gait data to a first deep attention network in the pre-established deep attention network Sensing-HH to obtain acceleration characteristics; transmitting angular velocity data in the gait data to a second deep attention network in the pre-established deep attention network Sensing-HH to obtain angular velocity characteristics;
and inputting the acceleration characteristic and the angular speed characteristic into a classification layer in a pre-established deep attention network Sensing-HH to obtain the shoe type of the identification object.
In one embodiment, the pre-established deep attention network Sensing-HH includes a first deep attention network, a second deep attention network, and a classification layer; the first deep attention network is configured to learn acceleration characteristics of the gait data, the second deep attention network is configured to learn angular velocity characteristics of the gait data, and the classification layer is configured to identify a shoe type from the acceleration characteristics and the angular velocity characteristics.
In one embodiment, before transmitting the gait data to the deep attention network, further comprising:
converting the gait data into an equally sampled time series;
and dividing the equally sampled time series with a sliding window according to a preset time window and overlap, segmenting the equally sampled time series into subsequences.
In one embodiment, converting the sensor data into an equally sampled time series includes:
interpolating the gait data with a cubic spline method, and converting the original signal sequence of the gait data into an equally sampled time series.
In one embodiment, before transmitting the gait data to the deep attention network, further comprising:
and filtering the gravity component in the gait data based on the combination of empirical mode decomposition and wavelet threshold.
In one embodiment, after segmenting the equally sampled time series into sub-sequences, the method further comprises:
each value in the subsequence is scaled with the mean and standard deviation of all values of that feature.
In one embodiment, the first deep attention network and/or the second deep attention network comprises:
a deep hybrid connection network, and an attention network configured to divert spatial attention to the deep hybrid connection network.
In one embodiment, the deep hybrid connection network comprises a convolutional layer (CNN), a weighted pooling layer, a bidirectional long short-term memory layer (BiLSTM) and a classification output layer,
the convolutional layer is configured to extract spatial features from the gait data;
the weighted pooling layer is configured to weight-pool the spatial features according to an output of the attention network;
the bidirectional long short-term memory layer is configured to learn bidirectional long-term dependencies of significant features in the spatial features;
the classification output layer is configured to output a feature map.
In one embodiment, extracting spatial features from gait data comprises:
convolving the feature map of the previous layer with a predetermined number of convolution kernels;
according to the output of the convolution operation and a pre-learned bias, processing with an activation function to obtain the feature map of the next layer:

$$a_{m}^{l}(c,t)=\sigma\Big(b_{m}^{l}+\sum_{m'=1}^{M'}\sum_{x=1}^{X}\sum_{y=1}^{Y}w_{m,m'}^{x,y}\,a_{m'}^{l-1}(c+x,\,t+y)\Big)$$

where X and Y are the sizes of the 2D convolution kernel along the spatial and temporal axes respectively, M' is the number of feature maps in convolutional layer (l-1), $w_{m,m'}^{x,y}$ is the local filter weight tensor, and $b_{m}^{l}$ is the bias.
In one embodiment, learning a bi-directional long-term dependence of a significant feature in a spatial feature comprises:
inputting the same data into forward LSTM and backward LSTM respectively;
two hidden states are connected in series to calculate the final output $y_t$ of the Bi-LSTM:

$$\overrightarrow{h}_t=\mathrm{LSTM}(x_t,\overrightarrow{h}_{t-1})$$
$$\overleftarrow{h}_t=\mathrm{LSTM}(x_t,\overleftarrow{h}_{t+1})$$
$$y_t=W_{\overrightarrow{h}}\,\overrightarrow{h}_t+W_{\overleftarrow{h}}\,\overleftarrow{h}_t+b$$

wherein $\overrightarrow{h}_t$ is the forward LSTM hidden state and $\overleftarrow{h}_t$ is the backward LSTM hidden state at each time step t, $\mathrm{LSTM}(\cdot)$ represents the LSTM operation, $W_{\overrightarrow{h}}$ and $W_{\overleftarrow{h}}$ represent the weights of the forward LSTM and backward LSTM respectively, and b is the bias at the output layer.
In one embodiment, an attention network, comprising: the system comprises a convolutional layer, a global average pooling layer and a classification output layer;
the convolutional layer is configured to extract spatial features from the gait data;
the global average pooling layer is configured to generate a class activation graph according to the spatial features; the global average pool outputs the spatial average value of the feature map of each unit of the last convolutional layer;
and the classification output layer is configured to output the spatial average value after weighting processing.
In one embodiment, the spatial average values are weighted to produce the output; specifically, the score of an output class m is obtained with the following formula:

$$S_m=\sum_{k}w_{k}^{m}F_k=\sum_{k}w_{k}^{m}\sum_{c,t}f_k(c,t)$$

where m is the class, $S_m$ indicates the overall importance of the convolution activations for class m, $w_{k}^{m}$ is the weight for unit k in the last convolutional layer for class m (the corresponding input of the classification layer), $F_k=\sum_{c,t}f_k(c,t)$, and $f_k(c,t)$ represents the activation of unit k in the last convolutional layer at spatial location (c, t), c denoting the signal channel and t the time stamp of the signal.
In one embodiment, inputting acceleration characteristics and angular velocity characteristics into a classification layer in a pre-established deep attention network Sensing-HH to obtain a shoe type of an identification object, includes:
determining an acceleration characteristic and a weight thereof, and an angular velocity characteristic and a weight thereof;
and predicting the acceleration characteristic and the angular velocity characteristic through a classification layer in a pre-established deep attention network Sensing-HH to obtain the shoe type of the identification object.
Based on the same inventive concept, the embodiment of the application provides a shoe identification device, the principle of the device for solving the technical problem is similar to a shoe identification method, and repeated parts are not repeated.
Fig. 2 shows a schematic view of the structure of a footwear identification device according to a second embodiment of the present application.
As shown, the footwear identification device includes:
a data acquisition module 201 configured to acquire gait data of an identification subject;
a footwear identification module 202 configured to identify a footwear type of the recognition object based on the gait data and a pre-established deep attention network Sensing-HH.
According to the shoe identification device provided by the embodiment of the application, after the gait data of the user in daily life is acquired, the type of the shoe worn by the user is identified by utilizing the pre-established deep attention network Sensing-HH, so that the shoe identification is more effective and accurate.
In one embodiment, the data acquisition module is configured to acquire gait data of the identification object by using a motion sensor carried by the identification object;
the motion sensor includes a three-axis accelerometer configured to measure acceleration and gravity values in the X, Y and Z directions, and a three-axis gyroscope configured to acquire the angular velocity of spatial rotation.
In one embodiment, the motion sensor carried by the identification object is held in the identification object's hand, or tied to the identification object's waist, or placed in the identification object's pocket.
In one embodiment, a footwear identification module, comprising:
an acceleration characteristic unit configured to transmit acceleration data in the gait data to a first deep attention network in a pre-established deep attention network Sensing-HH to obtain an acceleration characteristic;
an angular velocity feature unit configured to transmit angular velocity data in the gait data to a second deep attention network in a pre-established deep attention network Sensing-HH, resulting in an angular velocity feature;
a footwear identification unit configured to input the acceleration characteristic and the angular velocity characteristic to a classification layer in a pre-established deep attention network Sensing-HH, resulting in a footwear category of the identification object.
In one embodiment, the apparatus may further comprise:
a preprocessing module configured to convert the gait data into an equally sampled time series prior to transmission to the deep attention network, divide the equally sampled time series with a sliding window according to a preset time window and overlap, and segment it into subsequences.
In one embodiment, the preprocessing module is further configured to filter the gravity component of the gait data based on a combination of empirical mode decomposition and wavelet thresholding prior to transmitting the gait data to the deep attention network.
In one embodiment, the preprocessing module is further configured to scale each value in each subsequence with the mean and standard deviation of all values of that feature after segmenting the equally sampled time series into subsequences.
In one embodiment, the footwear identification unit is configured to determine acceleration characteristics and weights thereof, and angular velocity characteristics and weights thereof; and predicting the acceleration characteristic and the angular velocity characteristic through a classification layer in a pre-established deep attention network Sensing-HH to obtain the shoe type of the identification object.
Based on the same inventive concept, embodiments of the present application further provide a computer storage medium, which is described below.
A computer storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the footwear identification method as described above.
By adopting the computer storage medium provided by the embodiment of the application, after the gait data of the user's daily life is acquired, the type of shoe worn by the user is identified using the pre-established deep attention network Sensing-HH, making footwear identification more effective and accurate.
Based on the same inventive concept, the embodiment of the present application further provides an electronic device, which is described below.
Fig. 3 shows a schematic structural diagram of an electronic device in the fourth embodiment of the present application.
As shown, the electronic device includes memory 301 configured to store one or more programs, and one or more processors 302; the one or more programs, when executed by the one or more processors, implement a method of footwear identification as described above.
By adopting the electronic equipment provided by the embodiment of the application, after the gait data of the user's daily life is acquired, the type of shoe worn by the user is identified using the pre-established deep attention network Sensing-HH, making footwear identification more effective and accurate.
Based on the same inventive concept, the embodiment of the present application further provides a footwear identification system, which is described below.
Fig. 4 shows a schematic view of a shoe identification system according to a fifth embodiment of the present application.
As shown, the footwear identification system comprises: a mobile terminal 401, and a footwear identification device 402 as described above; the mobile terminal comprises:
a motion sensor configured to acquire gait data of a recognition object;
a data communication module configured to transmit the gait data to the footwear identification device.
By adopting the shoe identification system provided by the embodiment of the application, after the gait data of the user's daily life is acquired, the type of shoe worn by the user is identified using the pre-established deep attention network Sensing-HH, making footwear identification more effective and accurate.
In one embodiment, the motion sensor may include an acceleration sensor, an angular velocity sensor, and the like.
In one embodiment, the mobile terminal is a handheld communication device (e.g., a smart phone, etc.) or a wearable device.
In order to show the usage scenario of the present application more clearly, the embodiment of the present application is described as a specific example.
The embodiment of the application divides high-heeled shoes according to the range of heel height, obtaining the following table:

Heel height      Footwear category
0-2.54 cm        Flat heel
2.54-7.62 cm     Middle heel
>7.62 cm         Ultra-high heel
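Purely as an illustration, this division can be encoded as a small classification helper; treat it as a sketch, since the boundary handling at exactly 2.54 cm and 7.62 cm is our own choice rather than something the table specifies:

```python
def heel_category(heel_height_cm: float) -> str:
    """Map a heel height in centimeters onto the three categories
    of the table above (boundary convention assumed, not specified)."""
    if heel_height_cm <= 2.54:
        return "flat heel"
    if heel_height_cm <= 7.62:
        return "middle heel"
    return "ultra-high heel"
```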
In the embodiment of the application, a certain number of participants in a certain age range and with similar BMIs are selected; all participants carry smart devices with a data-recording program installed and walk on flat ground for a certain time to obtain gait data.
In particular, the smart device with the data-recording program installed can be held in the hand, strapped to the waist, or placed in a handbag.
The data-recording program records the acceleration of the smart device in the X, Y and Z directions (including gravity) using the three-axis accelerometer, and obtains the angular velocity of the smart device as it rotates in space using the three-axis gyroscope.
Take the motion sensor sequence S as an example; after resampling and interpolation it is a multi-dimensional time series with fixed sampling intervals. For the m-th subject, his (or her) motion sensor sequence can be defined as

$$S_m=\{s_m^{(1)},s_m^{(2)},\ldots,s_m^{(T_m)}\}$$

where $T_m$ is the total number of sampling points of the sequence and $s_m^{(i)}$ represents the data at the i-th sampling point of the subject, $i\in[1,T_m]$. As shown below, each sampling point refers specifically to the 6 values collected by the three-axis accelerometer and the three-axis gyroscope:

$$s_m^{(i)}=\big(a_x^{(i)},a_y^{(i)},a_z^{(i)},\;\omega_x^{(i)},\omega_y^{(i)},\omega_z^{(i)}\big)$$
in the embodiment of the application, the sequence Sm is divided into a plurality of subsequences in a sliding window mode with a fixed size and is used as a sample for subsequent pattern recognition.
SmK sub-sequence of
Figure BDA0002691722360000112
Figure BDA0002691722360000113
Wherein the content of the first and second substances,
Figure BDA0002691722360000114
w is the size of the subsequence and θ is the overlap of two adjacent subsequences. Subsequence(s)
Figure BDA0002691722360000115
The corresponding sampling point is from
Figure BDA0002691722360000116
To
Figure BDA0002691722360000117
Figure BDA0002691722360000118
The embodiment of the application provides a method for identifying the category of thin-heeled high-heeled shoes by introducing a spatial attention mechanism; specifically, a deep attention network is created.
The deep attention network comprises two streams, which process the acceleration data and the angular velocity data respectively, and each comprises a signal preprocessing module, a deep hybrid connection network module, a spatial attention module and a fusion module.
Wherein:
(1) A signal preprocessing module, comprising:

interpolating with a cubic spline method and unifying the sampling rate to 200 Hz, converting the original signal sequence into an equally sampled time series by resampling and interpolation;

since the signal sequence is continuous streaming data, the embodiment of the present application uses a sliding window to segment the processed data into subsequences;

filtering out the gravity component with a gravity-filtering method combining empirical mode decomposition and wavelet thresholding;

scaling each value in each subsequence with the mean and standard deviation of all values of that feature, normalizing the training data set.
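As a minimal sketch of this preprocessing module in Python, assuming gait data arrives as irregularly timed 6-channel samples (the window size and overlap are illustrative values, and the EMD-plus-wavelet gravity filter is deliberately left out because its parameters are not specified here):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def resample_uniform(timestamps, values, rate_hz=200.0):
    """Cubic-spline interpolation onto an equally sampled 200 Hz grid."""
    grid = np.arange(timestamps[0], timestamps[-1], 1.0 / rate_hz)
    return np.stack([CubicSpline(timestamps, values[:, ch])(grid)
                     for ch in range(values.shape[1])], axis=1)

def sliding_windows(seq, w=400, theta=200):
    """Segment the equally sampled series into subsequences of size w
    with an overlap of theta samples between adjacent subsequences."""
    step = w - theta
    return np.stack([seq[i:i + w]
                     for i in range(0, len(seq) - w + 1, step)])

def standardize(windows):
    """Scale each value with the mean and standard deviation of all
    values of that feature (per-channel z-score normalization)."""
    mean = windows.mean(axis=(0, 1), keepdims=True)
    std = windows.std(axis=(0, 1), keepdims=True)
    return (windows - mean) / (std + 1e-8)
```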
(2) A deep hybrid connectivity network module comprising:
the deep hybrid connection network comprises two CNN-BiLSTM sub-networks, responsible for analyzing and processing the acceleration and angular velocity signals respectively. Each sub-network specifically comprises stacked convolutional layers and a bidirectional long short-term memory layer, deconstructing the signals comprehensively across the spatial and temporal dimensions while learning a deep representation of the signals.
There are two main components in CNN-BiLSTM. The first is a stacked two-dimensional CNN configured to extract spatial features from the processed sensor data (e.g., acceleration and angular velocity). The second is the BiLSTM, which is responsible for learning the bidirectional long-term dependencies of the significant features extracted by the CNN.
The embodiment of the present application processes these subsequences with the CNN in much the same way as (static) image data is processed; the specific process is as follows:
the feature map of the previous layer is convolved with several convolution kernels (learned during training); the output of the convolution operation is added to a bias (also learned) and then processed through an activation function to form the feature map of the next layer:

$$a_{m}^{l}(c,t)=\sigma\Big(b_{m}^{l}+\sum_{m'=1}^{M'}\sum_{x=1}^{X}\sum_{y=1}^{Y}w_{m,m'}^{x,y}\,a_{m'}^{l-1}(c+x,\,t+y)\Big)$$

where X and Y are the sizes of the 2D convolution kernel along the spatial and temporal axes respectively, M' is the number of feature maps in convolutional layer (l-1), $w_{m,m'}^{x,y}$ is the local filter weight tensor, and $b_{m}^{l}$ is the bias.
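For illustration, a stacked two-dimensional CNN of this kind can be sketched in PyTorch as follows; the number of feature maps and the kernel size (X, Y) are assumptions of ours, not values fixed above:

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """One convolutional stage: learned kernels plus a learned bias,
    followed by an activation, mapping layer (l-1) maps to layer l."""
    def __init__(self, in_maps, out_maps, kernel=(3, 5)):
        super().__init__()
        # kernel = (X, Y): X along the channel axis, Y along time
        self.conv = nn.Conv2d(in_maps, out_maps, kernel, padding="same")
        self.act = nn.ReLU()

    def forward(self, x):                 # x: (batch, maps, channels, time)
        return self.act(self.conv(x))

# Two stacked stages over a 6-channel sensor window (1 input feature map)
cnn = nn.Sequential(ConvBlock(1, 32), ConvBlock(32, 64))
window = torch.randn(8, 1, 6, 400)        # a batch of preprocessed subsequences
spatial_features = cnn(window)            # shape: (8, 64, 6, 400)
```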
The embodiment of the present application also incorporates Long Short-Term Memory (LSTM) units, which can handle temporal coherence. A disadvantage of the conventional LSTM is that it can only use the previous context, so the embodiment of the present application uses an improved model, the Bi-LSTM, which first feeds the same data into a forward LSTM and a backward LSTM respectively, and then concatenates the two hidden states to compute the final output $y_t$ of the Bi-LSTM:

$$\overrightarrow{h}_t=\mathrm{LSTM}(x_t,\overrightarrow{h}_{t-1})$$
$$\overleftarrow{h}_t=\mathrm{LSTM}(x_t,\overleftarrow{h}_{t+1})$$
$$y_t=W_{\overrightarrow{h}}\,\overrightarrow{h}_t+W_{\overleftarrow{h}}\,\overleftarrow{h}_t+b$$

wherein $\overrightarrow{h}_t$ is the forward LSTM hidden state and $\overleftarrow{h}_t$ is the backward LSTM hidden state at each time step t, $\mathrm{LSTM}(\cdot)$ represents the LSTM operation, $W_{\overrightarrow{h}}$ and $W_{\overleftarrow{h}}$ represent the weights of the forward LSTM and backward LSTM respectively, and b is the bias at the output layer.
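A hedged PyTorch sketch of this Bi-LSTM stage follows. `bidirectional=True` runs the forward and backward passes, and a linear layer over the concatenated hidden states realizes the two weight matrices and the shared bias of the output equation; the hidden size, class count, and the read-out of the last time step are our assumptions:

```python
import torch
import torch.nn as nn

class BiLSTMHead(nn.Module):
    """Bidirectional LSTM whose forward/backward hidden states are
    concatenated and combined by a linear output layer."""
    def __init__(self, feat_dim, hidden=128, n_classes=3):
        super().__init__()
        self.bilstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_classes)   # [W_fwd | W_bwd] and b

    def forward(self, x):                  # x: (batch, time, feat_dim)
        states, _ = self.bilstm(x)         # (batch, time, 2 * hidden)
        return self.out(states[:, -1])     # read out the last time step

# Example: flatten CNN output (batch, maps, channels, time) into a sequence
feats = torch.randn(8, 64, 6, 400).flatten(1, 2).transpose(1, 2)
logits = BiLSTMHead(feat_dim=64 * 6)(feats)   # shape: (8, 3)
```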
(3) A spatial attention module, comprising:
according to the embodiment of the application, an attention network is constructed through a deep hybrid connection network, a global average pool in a CNN part is used for generating a class activation graph, the global average pool outputs a spatial average value of a feature graph of each unit of the last convolutional layer, and the values are weighted and configured to generate a final output; likewise, a weighted sum of the feature maps of the last convolutional layer is computed to obtain the class activation map.
The following describes the case of classification using softmax.
In the embodiment of the present application, the weights of the softmax layer are propagated back to the convolutional layer to decompose the multidimensional time series into significant and non-significant regions; the significant regions are considered to contain information on the discriminative gait patterns of wearing high-heeled shoes, which provides important information about the predefined footwear.
For a given sensor signal instance, the embodiment of the present application uses $f_k(c,t)$ to represent the activation of unit k in the last convolutional layer at spatial location (c, t), where c denotes the signal channel and t the time stamp of the signal. For a class m, the weight of unit k corresponding to the input of the softmax layer is denoted $w_k^m$. The result of global average pooling is denoted $F_k$:

$$F_k=\sum_{c,t}f_k(c,t)$$

For a given class m, the softmax input $S_m$ indicates the overall importance of the convolution activations for class m:

$$S_m=\sum_{k}w_{k}^{m}F_k$$

The class activation map $\mathrm{Att}_m$ for class m can also be defined, which directly indicates the importance of the activation at spatial location (c, t) for class m:

$$\mathrm{Att}_m(c,t)=\sum_{k}w_{k}^{m}f_k(c,t)$$

$$S_m=\sum_{c,t}\mathrm{Att}_m(c,t)$$

Finally, after all these processes are completed, the embodiment of the present application obtains a set of compatibility scores for the m output classes through the softmax function:

$$P_m=\frac{\exp(S_m)}{\sum_{m'}\exp(S_{m'})}$$
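To make the computation concrete, here is a small sketch under the definitions above; the tensor layouts are assumptions, and `class_weights` stands in for the softmax-layer weights $w_k^m$:

```python
import torch

def class_scores(feature_maps, class_weights):
    """S_m = sum_k w_k^m F_k, with F_k the pooled activation of unit k.

    feature_maps : (K, C, T) activations f_k(c, t) of the last conv layer
    class_weights: (M, K) softmax-layer weights w_k^m
    """
    F = feature_maps.sum(dim=(1, 2))       # (K,) pooled over (c, t)
    return class_weights @ F               # (M,) scores S_m

def class_activation_map(feature_maps, class_weights, m):
    """Att_m(c, t) = sum_k w_k^m f_k(c, t) for one class m."""
    return torch.einsum("k,kct->ct", class_weights[m], feature_maps)

# Compatibility scores P_m via the softmax over the class scores
scores = class_scores(torch.rand(64, 6, 50), torch.rand(3, 64))
probs = torch.softmax(scores, dim=0)       # P_m for m = 0, 1, 2
```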
the embodiments of the present application shift spatial attention to deep hybrid connectivity networks, emphasizing significant areas with discriminative information.
(4) Fusion module
The present embodiment uses a fully connected (FC) layer on top of the two CNN-BiLSTM subnets (branches) to generate probability scores over the target labels, and then combines the designed fusion module with an attention-weighted learning strategy.
The embodiment of the present application modifies the attention mechanism to take the two sources as inputs and obtain a calculated attention weight from each source, so as to generate a prediction for the current input through the softmax layer.
The outputs of the two CNN-BiLSTM subnets are taken as inputs and weights are calculated for each subnet output as follows:
$$s=W_1'x_1+W_2'x_2$$
$$e_i=v^{\top}\tanh(s+W_i'x_i)$$
$$\alpha_i=\frac{\exp(e_i)}{\sum_{j}\exp(e_j)}$$
$$\tilde{x}=\sum_{i}\alpha_i x_i$$

wherein $W_1'$ and $W_2'$ are the weighting parameters of the different branches, $x_1$ and $x_2$ are the features learned from the acceleration data and the angular velocity data respectively, $\alpha_i$ is the attention weight of branch i, and $\tilde{x}$ is the fused feature passed to the softmax layer.
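The attention-weighted fusion can be sketched as below; reading the fused output as the weighted sum of the two branch features, and leaving out the final FC-plus-softmax prediction, are our interpretation of the equations above:

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """s = W1'x1 + W2'x2; e_i = v^T tanh(s + Wi'xi); alpha = softmax(e);
    the output is the attention-weighted sum of the branch features."""
    def __init__(self, dim):
        super().__init__()
        self.W = nn.ModuleList(nn.Linear(dim, dim, bias=False)
                               for _ in range(2))
        self.v = nn.Parameter(torch.randn(dim))

    def forward(self, x1, x2):             # each: (batch, dim)
        proj = [self.W[0](x1), self.W[1](x2)]
        s = proj[0] + proj[1]
        e = torch.stack([torch.tanh(s + p) @ self.v for p in proj], dim=1)
        alpha = torch.softmax(e, dim=1)    # (batch, 2) attention weights
        return alpha[:, :1] * x1 + alpha[:, 1:] * x2

# x1, x2: features learned from acceleration and angular velocity data
fused = AttentionFusion(dim=256)(torch.randn(8, 256), torch.randn(8, 256))
```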
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The scheme in the embodiment of the application can be implemented by adopting various computer languages, such as object-oriented programming language Java and transliterated scripting language JavaScript.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps configured to implement the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (28)

1. A method of identifying footwear, comprising:
acquiring gait data of an identification object;
and identifying the shoe type of the identification object according to the gait data and a pre-established deep attention network Sensing-HH.
2. The method of claim 1, wherein the acquiring gait data identifying a subject comprises:
acquiring gait data of the identification object by using a motion sensor carried by the identification object;
the motion sensor includes a three-axis accelerometer configured to measure acceleration and gravity values in the X, Y and Z directions, and a three-axis gyroscope configured to acquire the angular velocity of spatial rotation.
3. The method of claim 2, wherein the motion sensor carried by the identification object is held in the identification object's hand, or is strapped to the identification object's waist, or is placed in the identification object's pocket.
4. The method according to claim 1, wherein said identifying a shoe type of the recognition target based on the gait data and a pre-established deep attention network Sensing-HH comprises:
transmitting acceleration data in the gait data to a first deep attention network in a pre-established deep attention network Sensing-HH to obtain acceleration characteristics; transmitting angular velocity data in the gait data to a second deep attention network in the pre-established deep attention network Sensing-HH to obtain angular velocity characteristics;
and inputting the acceleration characteristic and the angular speed characteristic into a classification layer in a pre-established deep attention network Sensing-HH to obtain the shoe type of the identification object.
5. The method of claim 4, further comprising, prior to transmitting the gait data to the deep attention network:
converting the gait data into an equally sampled time series;
and dividing the equally sampled time series with a sliding window according to a preset time window and overlap, segmenting the equally sampled time series into subsequences.
6. The method of claim 5, wherein converting the sensor data into an equally sampled time series comprises:
interpolating the gait data with a cubic spline method, and converting the original signal sequence of the gait data into an equally sampled time series.
7. The method of claim 5, further comprising, prior to transmitting the gait data to the deep attention network:
and filtering the gravity component in the gait data based on the combination of empirical mode decomposition and wavelet threshold.
8. The method of claim 5, wherein after said segmenting the equally sampled time series into subsequences, the method further comprises:
scaling each value in the subsequence with the mean and standard deviation of all values of that feature.
9. The method of claim 4, wherein the first deep attention network and/or the second deep attention network comprises:
a deep hybrid connection network, and an attention network configured to divert spatial attention to the deep hybrid connection network.
10. The method of claim 9, wherein the deep hybrid connection network comprises a convolutional layer (CNN), a weighted pooling layer, a bidirectional long short-term memory layer (BiLSTM), and a classification output layer,
the convolutional layer is configured to extract spatial features from gait data;
the weighted pooling layer is configured to weight-pool the spatial features according to an output of the attention network;
the bidirectional long short-term memory layer is configured to learn bidirectional long-term dependencies of significant features in the spatial features;
the classification output layer is configured to output a feature map.
11. The method of claim 10, wherein said extracting spatial features from gait data comprises:
convolving the feature map of the previous layer with a predetermined number of convolution kernels;
according to the output of the convolution operation and a pre-learned bias, processing with an activation function to obtain the feature map of the next layer:

$$a_{m}^{l}(c,t)=\sigma\Big(b_{m}^{l}+\sum_{m'=1}^{M'}\sum_{x=1}^{X}\sum_{y=1}^{Y}w_{m,m'}^{x,y}\,a_{m'}^{l-1}(c+x,\,t+y)\Big)$$

where X and Y are the sizes of the 2D convolution kernel along the spatial and temporal axes respectively, M' is the number of feature maps in convolutional layer (l-1), $w_{m,m'}^{x,y}$ is the local filter weight tensor, and $b_{m}^{l}$ is the bias.
12. The method of claim 10, wherein learning the bi-directional long-term dependence of significant ones of the spatial features comprises:
inputting the same data into forward LSTM and backward LSTM respectively;
two hidden states are connected in series to calculate the final output $y_t$ of the Bi-LSTM:

$$\overrightarrow{h}_t=\mathrm{LSTM}(x_t,\overrightarrow{h}_{t-1})$$
$$\overleftarrow{h}_t=\mathrm{LSTM}(x_t,\overleftarrow{h}_{t+1})$$
$$y_t=W_{\overrightarrow{h}}\,\overrightarrow{h}_t+W_{\overleftarrow{h}}\,\overleftarrow{h}_t+b$$

wherein $\overrightarrow{h}_t$ is the forward LSTM hidden state and $\overleftarrow{h}_t$ is the backward LSTM hidden state at each time step t, $\mathrm{LSTM}(\cdot)$ represents the LSTM operation, $W_{\overrightarrow{h}}$ and $W_{\overleftarrow{h}}$ represent the weights of the forward LSTM and backward LSTM respectively, and b is the bias at the output layer.
13. The method of claim 9, wherein the attention network comprises: the system comprises a convolutional layer, a global average pooling layer and a classification output layer;
the convolutional layer is configured to extract spatial features from gait data;
the global average pooling layer is configured to generate a class activation graph according to the spatial features; the global average pool outputs the spatial average value of the feature map of each unit of the last convolutional layer;
the classification output layer is configured to output the spatial average value after weighting processing.
14. The method according to claim 13, wherein the spatial average values are weighted to produce the output; specifically, the score of an output class m is obtained with the following formula:

$$S_m=\sum_{k}w_{k}^{m}F_k=\sum_{k}w_{k}^{m}\sum_{c,t}f_k(c,t)$$

where m is the class, $S_m$ indicates the overall importance of the convolution activations for class m, $w_{k}^{m}$ is the weight for unit k in the last convolutional layer for class m (the corresponding input of the classification layer), $F_k=\sum_{c,t}f_k(c,t)$, and $f_k(c,t)$ represents the activation of unit k in the last convolutional layer at spatial location (c, t), c denoting the signal channel and t the time stamp of the signal.
15. The method according to claim 4, wherein the inputting the acceleration characteristic and the angular velocity characteristic into a classification layer in a pre-established deep attention network Sensing-HH to obtain the shoe type of the recognition object comprises:
determining the acceleration characteristic and the weight thereof, and the angular velocity characteristic and the weight thereof;
and predicting the acceleration characteristic and the angular velocity characteristic through a classification layer in a pre-established deep attention network Sensing-HH to obtain the shoe type of the identification object.
16. A footwear identification device, comprising:
a data acquisition module configured to acquire gait data of an identification subject;
a footwear identification module configured to identify a footwear type of the recognition object based on the gait data and a pre-established deep attention network Sensing-HH.
17. The apparatus of claim 16, wherein the data acquisition module is configured to acquire gait data of an identification subject using a motion sensor carried by the identification subject;
the motion sensor includes a three-axis accelerometer configured to measure acceleration and gravity values in the X, Y and Z directions, and a three-axis gyroscope configured to acquire the angular velocity of spatial rotation.
18. The apparatus of claim 17, wherein the motion sensor carried by the identification object is held in the identification object's hand, or is tied to the identification object's waist, or is placed in the identification object's pocket.
19. The apparatus of claim 16, wherein the footwear identification module comprises:
an acceleration characteristic unit configured to transmit acceleration data in the gait data to a first deep attention network in a pre-established deep attention network Sensing-HH to obtain an acceleration characteristic;
an angular velocity feature unit configured to transmit angular velocity data in the gait data to a second deep attention network in a pre-established deep attention network Sensing-HH, resulting in an angular velocity feature;
a footwear identification unit configured to input the acceleration characteristics and the angular velocity characteristics to a classification layer in a pre-established deep attention network Sensing-HH, resulting in a footwear category of the identification object.
20. The apparatus of claim 19, further comprising:
a preprocessing module configured to convert gait data into an equally sampled time series prior to transmission to the deep attention network; and to divide the equally sampled time series with a sliding window according to a preset time window and overlap, segmenting the equally sampled time series into subsequences.
21. The apparatus of claim 20, wherein the preprocessing module is further configured to filter the gravity component of the gait data based on a combination of empirical mode decomposition and wavelet thresholding prior to transmitting the gait data to the deep attention network.
22. The apparatus of claim 20, wherein the preprocessing module is further configured to scale each value in the subsequence with the mean and standard deviation of all values of that feature after the segmenting of the equally sampled time series into subsequences.
23. The apparatus according to claim 19, wherein the footwear identification unit is configured to determine the acceleration characteristic and its weight, and the angular velocity characteristic and its weight; and predicting the acceleration characteristic and the angular velocity characteristic through a classification layer in a pre-established deep attention network Sensing-HH to obtain the shoe type of the identification object.
24. A computer storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 15.
25. An electronic device comprising one or more processors, and memory configured to store one or more programs; the one or more programs, when executed by the one or more processors, implement the method of any of claims 1 to 15.
26. A footwear identification system, comprising: a mobile terminal and comprising a footwear identification device according to any of claims 16 to 23; the mobile terminal includes:
a motion sensor configured to acquire gait data of a recognition object;
a data communication module configured to transmit the gait data to the footwear identification device.
27. The system of claim 26, wherein the motion sensor comprises an acceleration sensor and an angular velocity sensor.
28. The system of claim 26, wherein the mobile terminal is a handheld communication device or a wearable device.
CN202010993739.2A 2020-09-21 2020-09-21 Shoe identification method, device and system, computer storage medium and electronic equipment Pending CN112101472A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010993739.2A CN112101472A (en) 2020-09-21 2020-09-21 Shoe identification method, device and system, computer storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010993739.2A CN112101472A (en) 2020-09-21 2020-09-21 Shoe identification method, device and system, computer storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN112101472A (en) 2020-12-18

Family

ID=73754653

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010993739.2A Pending CN112101472A (en) 2020-09-21 2020-09-21 Shoe identification method, device and system, computer storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN112101472A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150112603A1 (en) * 2013-10-22 2015-04-23 Bae Systems Information And Electronic Systems Integration Inc. Mobile device based gait biometrics
EP3257437A1 (en) * 2016-06-13 2017-12-20 Friedrich-Alexander-Universität Erlangen-Nürnberg Method and system for analyzing human gait
US20180078179A1 (en) * 2014-09-25 2018-03-22 Bae Systems Information And Electronic Systems Integration Inc. Gait authentication system and method thereof
US20180082113A1 (en) * 2016-09-19 2018-03-22 King Fahd University Of Petroleum And Minerals Apparatus and method for gait recognition
CN108683724A (en) * 2018-05-11 2018-10-19 江苏舜天全圣特科技有限公司 A kind of intelligence children's safety and gait health monitoring system
CN108814617A (en) * 2018-04-26 2018-11-16 深圳市臻络科技有限公司 Freezing of gait recognition methods and device and gait detector
US20190156113A1 (en) * 2017-11-22 2019-05-23 King Fahd University Of Petroleum And Minerals Multi-kernel fuzzy local gabor feature extraction method for automatic gait recognition
CN111513731A (en) * 2020-04-28 2020-08-11 华东师范大学 Flexible intelligent detection device based on sufficient state monitoring attention

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150112603A1 (en) * 2013-10-22 2015-04-23 Bae Systems Information And Electronic Systems Integration Inc. Mobile device based gait biometrics
US20180078179A1 (en) * 2014-09-25 2018-03-22 Bae Systems Information And Electronic Systems Integration Inc. Gait authentication system and method thereof
EP3257437A1 (en) * 2016-06-13 2017-12-20 Friedrich-Alexander-Universität Erlangen-Nürnberg Method and system for analyzing human gait
US20180082113A1 (en) * 2016-09-19 2018-03-22 King Fahd University Of Petroleum And Minerals Apparatus and method for gait recognition
US20190156113A1 (en) * 2017-11-22 2019-05-23 King Fahd University Of Petroleum And Minerals Multi-kernel fuzzy local gabor feature extraction method for automatic gait recognition
CN108814617A (en) * 2018-04-26 2018-11-16 深圳市臻络科技有限公司 Freezing of gait recognition methods and device and gait detector
CN108683724A (en) * 2018-05-11 2018-10-19 江苏舜天全圣特科技有限公司 A kind of intelligence children's safety and gait health monitoring system
CN111513731A (en) * 2020-04-28 2020-08-11 华东师范大学 Flexible intelligent detection device based on sufficient state monitoring attention

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
DERLATKA, MARCIN: "Human gait recognition based on ground reaction forces in case of sport shoes and high heels", 2017 IEEE International Conference on Innovations in Intelligent Systems and Applications (INISTA), IEEE, 7 August 2017 (2017-08-07), pages 247-252 *
JINDONG WANG et al.: "Deep learning for sensor-based activity recognition: A survey", Pattern Recognition Letters, vol. 119, 1 March 2019 (2019-03-01), pages 3-11, XP085603394, DOI: 10.1016/j.patrec.2018.02.010 *
MING ZENG et al.: "Convolutional Neural Networks for human activity recognition using mobile sensors", 6th International Conference on Mobile Computing, Applications and Services (MobiCASE), IEEE, 29 January 2015 (2015-01-29), pages 197-205 *
WENCHAO JIANG et al.: "Human Activity Recognition Using Wearable Sensors by Deep Convolutional Neural Networks", MM '15: Proceedings of the 23rd ACM International Conference on Multimedia, ACM, 13 October 2015 (2015-10-13), page 1307, XP058076524, DOI: 10.1145/2733373.2806333 *
YU GUAN et al.: "Ensembles of Deep LSTM Learners for Activity Recognition using Wearables", ACM Journals, vol. 1, no. 2, 30 June 2017 (2017-06-30), pages 1-28 *
YUMIN YAO; YA WEN; JIANXIN WANG: "Sensing-HH: A Deep Hybrid Attention Model for Footwear Recognition", Electronics, vol. 9, no. 9, 22 September 2020 (2020-09-22), pages 1-18 *
WANG, FEIYUE: "Research on footprint extraction and dynamic recognition methods based on gait tactile information", China Master's Theses Full-text Database (Information Science and Technology), no. 10, 15 October 2015 (2015-10-15), pages 138-413 *
ZHAO, GUOSHUN; FANG, JIAN'AN; QU, BINJIE; SAMAH A.F. MANSSOR; SUN, SHAOYUAN: "Gait recognition method based on frequency-domain attention spatio-temporal convolutional network", Information Technology and Network Security, vol. 39, no. 06, 30 June 2020 (2020-06-30), pages 13-18 *

Similar Documents

Publication Publication Date Title
Gu et al. Accurate step length estimation for pedestrian dead reckoning localization using stacked autoencoders
Gao et al. Deep neural networks for sensor-based human activity recognition using selective kernel convolution
CN110532996B (en) Video classification method, information processing method and server
Ihianle et al. A deep learning approach for human activities recognition from multimodal sensing devices
CN111126258B (en) Image recognition method and related device
Zhang et al. Human activity recognition based on time series analysis using U-Net
Hnoohom et al. An Efficient ResNetSE Architecture for Smoking Activity Recognition from Smartwatch.
CN113449573A (en) Dynamic gesture recognition method and device
CN113111842A (en) Action recognition method, device, equipment and computer readable storage medium
Rassem et al. Cross-country skiing gears classification using deep learning
SE1250065A1 (en) Method and apparatus for analysis of periodic motion
Jeong et al. Sensor-data augmentation for human activity recognition with time-warping and data masking
CN105844204B (en) Human behavior recognition method and device
CN113449548A (en) Method and apparatus for updating object recognition model
CN110659641B (en) Text recognition method and device and electronic equipment
Maurya et al. Complex human activities recognition based on high performance 1D CNN model
Adel et al. Gait-based Person Identification using Multiple Inertial Sensors.
CN112101472A (en) Shoe identification method, device and system, computer storage medium and electronic equipment
CN113916223B (en) Positioning method and device, equipment and storage medium
Sadouk et al. Convolutional neural networks for human activity recognition in time and frequency-domain
CN113887501A (en) Behavior recognition method and device, storage medium and electronic equipment
Sezavar et al. Smartphone-based gait recognition using convolutional neural networks and dual-tree complex wavelet transform
Swathi et al. A Deep Learning-Based Object Detection System for Blind People
CN113850160A (en) Method and device for counting repeated actions
Sowmiya et al. A hybrid approach using bidirectional neural networks for human activity recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination