CN114499712B - Gesture recognition method, device and storage medium - Google Patents
- Publication number
- CN114499712B CN114499712B CN202111578495.2A CN202111578495A CN114499712B CN 114499712 B CN114499712 B CN 114499712B CN 202111578495 A CN202111578495 A CN 202111578495A CN 114499712 B CN114499712 B CN 114499712B
- Authority
- CN
- China
- Prior art keywords
- gesture
- csi
- data
- model
- environment
- Prior art date
- Legal status
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B17/00—Monitoring; Testing
- H04B17/30—Monitoring; Testing of propagation channels
- H04B17/309—Measuring or estimating channel quality parameters
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W24/00—Supervisory, monitoring or testing arrangements
- H04W24/08—Testing, supervising or monitoring using real traffic
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The application relates to a gesture recognition method, device and storage medium. The disclosed gesture recognition method comprises the following steps: acquiring first CSI gesture data from a first environment; training an improved adversarial discriminative domain adaptation (ADDA) model using the first CSI gesture data, wherein the ADDA model is improved by using a CNN-BiLSTM network model, i.e., a convolutional neural network (CNN) combined with an attention-based bidirectional long short-term memory (BiLSTM) network, as the feature extractor; and using the ADDA model for gesture recognition based on second CSI gesture data from a second environment, wherein the first environment and the second environment are different.
Description
Technical Field
The invention relates to the field of communication technology, and in particular to a method for gesture recognition based on the Channel State Information (CSI) of a WIFI system.
Background
With the continuous development of the Internet of Things, human-computer interaction technology for smart home and medical health scenarios is becoming increasingly important. Gestures have become an important mode of human-computer interaction because they are convenient, easy to understand, and rich in meaning. Meanwhile, wireless sensing technology keeps advancing and WIFI devices are widely deployed in living environments, so gesture recognition based on the channel state information of WIFI systems has become a hot research direction. Gesture recognition using the CSI of a WIFI system has the advantages of being unaffected by illumination and requiring no special wearable equipment.
However, current gesture recognition methods based on the CSI of WIFI systems mainly face the following two problems:
1) Susceptibility to environmental interference
Although existing WIFI-based gesture recognition techniques achieve very high recognition rates in a single environment, changes in the WIFI environment, such as different placement of indoor objects or even a change of experimenter, affect the multipath effect of the WIFI channel, so the feature distribution of the collected gesture data differs and gesture recognition accuracy drops sharply. That is, training a model on gesture data from one environment and then applying it directly to other, different environments greatly reduces gesture recognition accuracy.
2) Single gesture action in the gesture data
When existing WIFI-based gesture recognition methods collect WIFI CSI activity data, each human-activity CSI sample contains a single action, i.e., the experimental data are idealized, and the temporal characteristics of the WIFI CSI data are not considered, so the experimental results overfit.
Existing technical solutions for gesture recognition based on WIFI system CSI mainly address the classification and recognition of gesture features. Patent CN109766951A provides a WIFI gesture recognition method based on time-frequency statistics. The method extracts CSI amplitude data from gesture data received by the network card, preprocesses the amplitude data with low-pass filtering to reduce environmental noise, reduces its dimensionality with a Singular Value Decomposition (SVD) algorithm, extracts the time-frequency characteristics of the signal with a Short-Time Fourier Transform (STFT), obtains statistical features usable for classification through statistical feature extraction and feature standardization, and finally classifies gestures with a k-Nearest Neighbor (kNN) algorithm. The method can effectively classify and recognize gesture features and solves the problem of gesture recognition in a single complex indoor environment, but it does not solve the problem of poor generalizability.
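The prior-art pipeline described above (SVD dimensionality reduction followed by kNN classification) can be sketched in a few lines. This is an illustrative toy, not the patented method; the function names and the two-stage structure are assumptions for illustration, and the low-pass filtering and STFT stages are omitted:

```python
import numpy as np

def svd_reduce(X, k):
    """Project flattened CSI amplitude samples onto their top-k singular directions."""
    Xc = X - X.mean(axis=0)                      # center the data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                         # (n_samples, k) reduced features

def knn_predict(train_X, train_y, test_X, k=3):
    """Plain k-nearest-neighbour majority vote on Euclidean distance."""
    preds = []
    for x in test_X:
        d = np.linalg.norm(train_X - x, axis=1)  # distances to all training samples
        nearest = train_y[np.argsort(d)[:k]]     # labels of the k closest
        vals, counts = np.unique(nearest, return_counts=True)
        preds.append(vals[np.argmax(counts)])    # majority label
    return np.array(preds)
```

In the real pipeline the SVD basis would be fitted on training data only and the statistical time-frequency features would replace the raw amplitudes.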
With the development of the Internet of Things, WIFI gesture recognition technology is widely applied in smart home and medical health scenarios, and the susceptibility to environmental interference and poor generalizability of existing WIFI gesture recognition methods hinder its large-scale application.
Disclosure of Invention
In view of the above, embodiments of the invention provide a domain-adaptation-based WIFI gesture recognition method and device, so as to solve the problems that existing WIFI gesture recognition methods are easily disturbed by the environment and generalize poorly.
According to a first aspect, an embodiment of the present invention provides a gesture recognition method, including: acquiring first CSI gesture data from a first environment; training an improved Adversarial Discriminative Domain Adaptation (ADDA) model using the first CSI gesture data, wherein the ADDA model is improved by using a CNN-BiLSTM network model as the feature extractor, the CNN-BiLSTM network model being a convolutional neural network (CNN) combined with an attention-mechanism-based bidirectional long short-term memory (BiLSTM) network; the ADDA model is used for gesture recognition based on second CSI gesture data from a second environment, wherein the first environment and the second environment are different.
Existing WIFI CSI gesture recognition techniques train the model in a single environment; if the trained model is applied directly to other environments, gesture recognition accuracy drops and labeled data must be collected again for retraining.
The present application applies an adversarial domain adaptation model to the field of CSI gesture recognition and replaces the feature extraction network in the adversarial discriminative domain adaptation (ADDA) model with CNN-BiLSTM, which removes the need to re-collect training data in each new environment. In addition, the improved ADDA model improves the generalizability of the gesture recognition model and reduces the influence of environmental factors.
With reference to the first aspect, in a first implementation manner of the first aspect, the method further includes: acquiring the second CSI gesture data; and classifying the second CSI gesture data by using the trained improved ADDA model to obtain a corresponding gesture class.
The ADDA model can be trained using only the first CSI gesture data from the first environment, and the trained ADDA model can then classify gestures in CSI gesture data from another, different environment without collecting labeled data from that environment for retraining.
With reference to the first aspect, in a second implementation manner of the first aspect, the ADDA model is further improved by using a Wasserstein-distance-based loss function as the adversarial loss function.
The application further improves the ADDA model by replacing the generative-adversarial loss function with a loss function based on the Wasserstein distance. Using the Wasserstein-distance-based loss function in the ADDA model allows a model trained on data from one environment to retain high accuracy when applied to other, different environments, which further addresses the poor generalizability of existing gesture recognition methods.
With reference to the first aspect, in a third implementation manner of the first aspect, the CNN-BiLSTM network model is trained using third CSI gesture data from the first environment or the second environment.
That is, the CNN-BiLSTM network model may be trained using gesture data from a single environment, and the third CSI gesture data may be the same as or different from the first or second CSI gesture data. Specifically, the CNN-BiLSTM network model can be trained separately with data from different environments, and a chosen metric then determines which environment's data yields the best-performing model.
With reference to the first aspect, in a fourth implementation manner of the first aspect, training the improved adversarial discriminative domain adaptation (ADDA) model using the first CSI gesture data includes: dividing the first CSI gesture data into source domain data and target domain data; performing feature extraction and gesture classification on the source domain data using the CNN-BiLSTM network model to obtain gesture categories and extracted features; and training the improved ADDA model using the gesture categories, the extracted features, and the target domain data.
The feature extraction and gesture classification of the source domain data by using the CNN-BiLSTM network model are preprocessing of the source domain data, so as to improve the quality of the data.
With reference to the first aspect, in a fifth implementation manner of the first aspect, the first CSI gesture data and/or the second CSI gesture data includes CSI gesture data associated with gesture actions performed in a plurality of different directions.
Existing gesture recognition techniques generally adopt a gesture recognition algorithm for a single environment. Such algorithms impose strict constraints on the experimenters' gesture actions and on the equipment, for example restricting the experimenters' orientation and the placement of devices, which differs from real living scenarios.
The CSI gesture data acquired by the method not only relate to a plurality of different environments, but also relate to gesture actions performed in a plurality of different directions, so that the diversity of gestures is increased and the gesture actions in a gesture database are enriched.
With reference to the first aspect, in a sixth implementation manner of the first aspect, acquiring the channel state information CSI gesture data includes: collecting raw CSI gesture data, wherein the raw CSI gesture data are CSI data packets with three data dimensions; denoising the raw CSI gesture data; and extracting gesture segments from the denoised raw CSI gesture data.
With reference to the first aspect, in a seventh implementation manner of the first aspect, extracting the gesture segment from the denoised raw CSI gesture data includes: determining the size of the sliding window; calculating the variance of the denoised raw CSI gesture data in each window; and extracting gesture segments according to the variance and the size of the buffer.
A buffer-based gesture segment interception algorithm is used to intercept gesture CSI data. It improves on sliding-window-based gesture segment interception by adding a buffer that smooths the decision of where the beginning and ending stages of the gesture data lie.
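A minimal sketch of the variance-plus-buffer idea described above, operating on a one-dimensional amplitude stream: windows whose variance exceeds a threshold mark gesture activity, and a run of quiet windows (the buffer) must pass before the segment is declared finished. Window size, threshold, and buffer length are hypothetical parameters, not values from the patent:

```python
import numpy as np

def extract_gesture_segment(signal, win, var_thresh, buffer_len):
    """Return (start, end) sample indices of one gesture segment, or (None, None)."""
    n_win = len(signal) // win
    variances = [np.var(signal[i * win:(i + 1) * win]) for i in range(n_win)]
    start = end = None
    quiet = 0
    for i, v in enumerate(variances):
        if v > var_thresh:                 # active window: gesture in progress
            if start is None:
                start = i * win
            end = (i + 1) * win
            quiet = 0                      # reset the buffer
        elif start is not None:
            quiet += 1
            if quiet >= buffer_len:        # enough quiet windows: gesture ended
                break
    return start, end
```

The buffer prevents a brief pause inside a gesture from prematurely terminating the segment, which is the stated motivation for improving on the plain sliding-window cut.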
According to a second aspect, an embodiment of the present invention provides an electronic device, including: the gesture recognition system comprises a memory and a processor, wherein the memory and the processor are in communication connection, the memory stores computer instructions, and the processor executes the computer instructions so as to execute the gesture recognition method in the first aspect or any implementation manner of the first aspect.
According to a third aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer instructions for causing a computer to perform the gesture recognition method of the first aspect or any implementation manner of the first aspect.
Drawings
The features and advantages of the present invention will be more clearly understood by reference to the accompanying drawings, which are illustrative and should not be construed as limiting the invention in any way, in which:
FIG. 1 illustrates a flow chart of a gesture recognition method according to an embodiment of the present application;
FIG. 2 illustrates a flowchart of a method for acquiring CSI gesture data according to an embodiment of the present application;
FIG. 3 shows a gesture collection device layout;
FIG. 4 illustrates a flowchart of a method for collecting raw CSI gesture data according to an embodiment of the present application;
FIG. 5 illustrates a flowchart of a method for denoising raw CSI gesture data according to an embodiment of the present application;
FIG. 6 illustrates a flowchart of a method for extracting gesture segments from de-noised raw CSI gesture data according to an embodiment of the present application;
FIG. 7 illustrates a flowchart for determining a beginning of a gesture segment according to one embodiment of the present application;
FIG. 8 illustrates a gesture recognition apparatus according to an embodiment of the present application;
fig. 9 shows an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
Fig. 1 shows a flowchart of a domain-adaptation-based WIFI gesture recognition method according to an embodiment of the present application. The domain adaptation model is applied to the field of WIFI CSI gesture recognition: unlabeled data in one environment is predicted by a model trained on data from another environment, which reduces the cost of data collection, improves the generalizability of the gesture recognition method, and reduces the influence of environmental factors. Specifically, the domain adaptation model is an improved adversarial discriminative domain adaptation (ADDA) model whose feature extraction network is replaced with CNN-BiLSTM. This removes the need to re-collect training data in each new environment; furthermore, the improved ADDA improves the generalizability of the gesture recognition model and reduces the influence of environmental factors.
As shown in fig. 1, the gesture recognition method may include:
s11: first CSI gesture data from a first environment is acquired.
S12: training an improved challenge-discrimination domain adaptive ADDA model using the first CSI gesture data, wherein the ADDA model is improved by using a Convolutional Neural Network (CNN) -two-way long-short-term memory (BiLSTM) network model as a feature extractor, the CNN-BiLSTM network model being a network model of a combination of CNN and BiLSTM based on an attention mechanism; the ADDA model is used for gesture recognition based on second CSI gesture data from a second environment, wherein the first environment and the second environment are different.
The feature extractor in this improved ADDA model uses the CNN-BiLSTM network model, in which the BiLSTM is augmented with an attention mechanism. That is, the present application employs a network structure combining a CNN with an attention-based BiLSTM to extract gesture features. The CNN-BiLSTM network model may be trained using gesture data from a single environment (e.g., the first environment in S11), specifically as follows:
The preprocessed data fed into the network first pass through the CNN for feature extraction. After the high-dimensional features are extracted, the high-dimensional feature maps are mapped into sequence features, globally relevant feature representations are learned by the attention-based BiLSTM, the features are then fed into a fully connected layer, and finally gesture classification is performed. At the initial stage of model training, labeled training data are input into the CNN-BiLSTM network to obtain predicted labels, the loss function value is computed from the predicted values and the true label values, and the model parameters are updated and optimized through backpropagation with a gradient-based optimization method.
The loss function for gesture feature extraction may be the cross-entropy loss, computed as:

L = -(1/N) Σ_i [ y_i · log(p_i) + (1 - y_i) · log(1 - p_i) ]

where p_i represents the actual output of the neuron and y_i the desired output, with 1 indicating the positive class and 0 the negative class.
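The binary cross-entropy loss defined by p_i and y_i above can be checked numerically. A minimal sketch (the clipping constant is a standard numerical-stability assumption, not part of the patent):

```python
import numpy as np

def binary_cross_entropy(p, y):
    """Mean binary cross-entropy: p are predicted probabilities, y are 0/1 labels."""
    p = np.clip(p, 1e-12, 1 - 1e-12)   # avoid log(0)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))
```

For a confident correct prediction (p = 0.9, y = 1) the per-sample loss is -log(0.9) ≈ 0.105, and it grows without bound as p moves away from the true label.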
In an embodiment, the gesture recognition method further includes:
acquiring the second CSI gesture data;
and classifying the second CSI gesture data by using the trained improved ADDA model to obtain a corresponding gesture class.
The ADDA model can be trained using only the first CSI gesture data from the first environment, and the trained ADDA model can then classify gestures in the second CSI gesture data from the different second environment without collecting labeled data from the second environment for retraining.
In a preferred embodiment, the ADDA model is further improved by using a Wasserstein-distance-based loss function as the adversarial loss function; specifically, the generative-adversarial loss function in the ADDA model is replaced with a loss function based on the Wasserstein distance. That is, in the preferred embodiment the application improves the ADDA model in two ways: the feature extractor uses the CNN-BiLSTM network model, and the adversarial loss function is exchanged for a Wasserstein-distance-based loss function.
As described above, using the Wasserstein-distance-based loss function in the ADDA model allows a model trained on data from one environment to retain high accuracy when applied to other, different environments, further addressing the poor generalizability of existing gesture recognition methods.
More specifically, the Wasserstein-distance-based loss function is an improvement over the original GAN objective function. The ADDA model may include a source domain feature extractor, a target domain feature extractor, a domain discriminator, a classifier, and a generator. The improvement removes the sigmoid from the last layer of the discriminator, and the loss functions of the generator and discriminator no longer take the logarithm. The discriminator loss is:

L_D = E_{x_t}[ D(M_t(x_t)) ] - E_{x_s}[ D(M_s(x_s)) ]

where E denotes the expectation, M_s the source domain feature mapping, M_t the target domain feature mapping, and x_s and x_t samples from the source and target domains, respectively.

The discriminator D must satisfy the 1-Lipschitz constraint:

|D(x_1) - D(x_2)| ≤ |x_1 - x_2|

where x_1 and x_2 are any two points.

To satisfy the 1-Lipschitz constraint, a gradient penalty term is added to the discriminator loss:

L_gp = E_x̂[ ( ||∇_x̂ D(x̂)||_2 - 1 )² ]

where x̂ is sampled between the source and target feature distributions. Finally, the optimization objective of the domain discriminator is:

min_D  L_D + λ · L_gp

where λ is a constant between 0 and 1.

The final optimization objective of the feature extractor is:

min_{M_t}  -E_{x_t}[ D(M_t(x_t)) ]
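The discriminator objective above can be illustrated with a toy linear critic D(x) = w·x, for which the input gradient is the constant vector w, so the gradient penalty can be written exactly instead of via automatic differentiation. The function name and λ value are hypothetical, and the scalar feature mappings are identity maps for simplicity:

```python
import numpy as np

def critic_loss_wgan_gp(w, src_feats, tgt_feats, lam=0.1):
    """Wasserstein critic loss with gradient penalty for a linear critic D(x) = w.x.
    Scores target features against source features; the penalty pushes the gradient
    norm toward 1, enforcing the 1-Lipschitz constraint."""
    d_src = src_feats @ w
    d_tgt = tgt_feats @ w
    grad_norm = np.linalg.norm(w)          # for a linear critic, grad D(x) = w everywhere
    gp = lam * (grad_norm - 1.0) ** 2      # gradient penalty term
    return float(d_tgt.mean() - d_src.mean() + gp)
```

With a unit-norm w the penalty vanishes and the loss reduces to the difference of mean critic scores, the quantity whose supremum over 1-Lipschitz critics is the Wasserstein-1 distance between the two feature distributions.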
further, the process of training the improved challenge-discrimination domain adaptive adds model with the first CSI gesture data may include:
the first CSI gesture data is divided into source domain data and target domain data. As an example, such partitioning may be performed randomly at a preset ratio.
And performing feature extraction and gesture classification on the source domain data by using a CNN-BiLSTM network model to obtain gesture types (for example, gesture labels) and extracted features.
An improved ADDA model is trained using the gesture categories, the extracted features, and the target domain data. Specifically, the preprocessed source domain data (i.e., the gesture categories and extracted features) and the target domain data are fed into the source domain and target domain feature extractors respectively, and a domain discriminator determines which domain each feature comes from. Furthermore, during model training the parameters of the source domain feature extractor are fixed, and the weights of the target domain feature extractor are initialized from the parameters of the source domain feature extractor.
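The initialization step described above (source extractor frozen, target extractor initialized from it) can be sketched with a toy linear extractor; the class and function names are hypothetical stand-ins for the real networks:

```python
import numpy as np

class LinearExtractor:
    """Toy stand-in for a feature extractor: a single linear map."""
    def __init__(self, W):
        self.W = W.copy()          # own an independent copy of the weights
    def __call__(self, x):
        return x @ self.W

def init_adda_extractors(source_W):
    """ADDA setup: the source extractor is frozen; the target extractor starts
    from the source extractor's weights and is the only one updated afterwards."""
    source = LinearExtractor(source_W)   # parameters fixed during adaptation
    target = LinearExtractor(source_W)   # initialized from the source weights
    return source, target
```

During adversarial training the domain discriminator would then score source(x_s) against target(x_t), and only the target extractor's weights would be updated.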
S13: and classifying the second CSI gesture data by using the trained improved ADDA model to obtain a corresponding gesture class. Specifically, the second CSI gesture data is classified by using a trained target domain feature extractor and classifier, so as to obtain a target domain gesture tag.
Fig. 2 shows a flowchart of a method for acquiring CSI gesture data (the first CSI gesture data and/or the second CSI gesture data described above) according to an embodiment of the present application. The method comprises the following steps:
s111: raw CSI gesture data is collected. Specific embodiments thereof will be described below in connection with fig. 3 and 4, wherein fig. 3 shows a gesture acquisition device layout and fig. 4 shows a method flow diagram of one specific embodiment for collecting raw CSI gesture data.
S112: denoising the original CSI gesture data. Specific embodiments thereof will be described below in connection with fig. 5.
S113: gesture segments are extracted from the denoised raw CSI gesture data. Specific embodiments thereof will be described below in connection with fig. 6.
FIG. 3 shows a gesture collection device layout. In FIG. 3, RX denotes a receiving end and TX a transmitting end. The placement of RX and TX may be the same in different environments; only the surroundings differ. The two receivers RX may be spaced 2 m apart, the transmitter may be 1.5 m from the midpoint between the two receivers, and the experimenter may stand at point O, 0.75 m from the transmitter. To match real conditions as closely as possible, the experimenter, facing TX, performs gesture actions toward each of the five arrow directions in the figure, and the receiving end collects gesture data using the CSI-Tool. The five arrow directions are merely an example, and the present application is not limited in this respect.
FIG. 4 illustrates a flowchart of a method for collecting raw CSI gesture data according to an embodiment of the present application. The method comprises the following steps:
s1111: the receiving and transmitting ends are connected. Specifically, the process utilizes a CSI-Tool software package to configure WIFI through terminal commands.
S1112: and setting a mode. Specifically, a WIFI communication channel and a data sampling frequency are selected.
Existing WIFI communication channels typically include 2.4 GHz and 5 GHz. Because devices using the 2.4 GHz channel cause greater interference, the present application preferably operates on the less-interfered 5 GHz channel.
In an example, the gesture actions performed by the experimenter may be four gestures commonly used in daily life, such as "swing up", "swing down", "swing left", and "swing right", which are recognized after collection. The experimenter's gesture action takes about 1.5 s, and the acquisition of the whole action takes about 2 s. In this example, since a gesture action is simple and completes in about 1.5 s, too low a sampling frequency would fail to capture the exact gesture action and would also cause packet loss, so the sampling frequency may be set to 1000 Hz.
Further, in this example, to improve the accuracy of gesture recognition, the present application may assume that the experimenter's initial state has the hands perpendicular to the torso; when making the "swing up" and "swing down" gestures the arm bends upward or downward perpendicular to the ground, and when making the "swing left" and "swing right" gestures the arm bends leftward or rightward parallel to the ground. Meanwhile, when the four actions are performed in the five different directions, they are kept as uniform in speed as possible, with a short pause held before and after each action.
S1113: and (5) data receiving and transmitting. As a specific embodiment, the method and the device enable four persons to do four different gesture actions between a transmitting end (TX) and a receiving end (RX) respectively, and collect multipath effects generated by refraction, reflection and the like of signals on a propagation path.
S1114: and (5) data storage. Specifically, the matlab may be used to parse the collected CSI gesture data into a CSI packet with three dimensions.
FIG. 5 illustrates a flowchart of a method for denoising raw CSI gesture data according to an embodiment of the present application. The method selects the discrete wavelet transform (Discrete Wavelet Transform, DWT) to perform denoising preprocessing on the gesture CSI data. The method comprises the following steps:
s1121: a wavelet function is selected. In discrete wavelet transformation, a proper wavelet function is selected first and then the signal is subjected to multi-scale decomposition. The wavelet basis functions are not unique, the denoising effect based on different wavelet bases is different, and the wavelet basis functions need to be selected according to specific situations. As a preferred embodiment, the present application selects Symlets wavelet basis functions that result in finer granularity of CSI.
S1122: wavelet transform multi-scale decomposition. Specifically, the signal is divided into an approximate coefficient vector and a detail coefficient vector at this stage, the approximate coefficient and the detail coefficient are obtained after the appropriate layer number is selected for decomposition, and then the data is reconstructed by using the two coefficients. Let the discrete signal be denoted H (t), its decomposition can be expressed as:
H(t)=A n +D n +D n-1 +…+D 1
where n represents the number of decomposition layers, A represents the low-frequency approximation, and D represents the high-frequency detail. The coefficients of each decomposition layer are given by:
a_n[k] = ⟨x_n, φ_{n,k}⟩,  d_n[k] = ⟨x_n, ψ_{n,k}⟩
where a_n[k] represents the approximation coefficients, d_n[k] represents the detail coefficients, x_n represents the input of the nth layer, ⟨·,·⟩ denotes the dot product, φ_{n,k} and ψ_{n,k} are two sets of orthogonal wavelet basis functions, and k refers to a point in the band.
S1123: and carrying out threshold quantization processing on wavelet coefficients on each scale. Specifically, the present application may choose a dynamic threshold to remove noise components from detail coefficients. The coefficients decomposed after wavelet transformation are compared with a set threshold, if the absolute value exceeds the threshold, the processing is not performed, otherwise, the value of the wavelet coefficient is set to 0.
S1124: the inverse wavelet transform reconstructs the signal. The present application may employ wavelet inverse transforms to reconstruct signals. The inverse transform of the discrete wavelet transform is expressed as:
FIG. 6 illustrates a flowchart of a method for extracting gesture segments from denoised raw CSI gesture data, according to an embodiment of the present application. In this specific embodiment, a buffer-based gesture-segment interception algorithm is adopted to intercept the gesture CSI data. The buffer-based algorithm improves on sliding-window-based interception by adding buffers that cache candidate windows when judging the start and end stages of the gesture data. The method comprises the following steps:
s1131: the size of the sliding window is determined. In particular, the size of the sliding window may be determined in dependence on the frequency of the actual data samples.
The size of the sliding window can influence the extraction of the gesture segment, and if the window value is too large, redundant data are contained at two ends of the detected gesture segment; if the window value is selected too small, the middle part of the gesture segment is detected as a termination frame by mistake. After the buffer mechanism is adopted, the defect of too small window is avoided. In the case of a sample of experimentally collected data of 1000HZ, the window size may be selected to be 100 data packets, i.e. the sliding window size corresponds to a time length of 0.1s.
S1132: the variance in each window is calculated. Specifically, the mean and variance in each window are calculated as follows:
wherein S (i) represents the amplitude value of the ith window, mean (i) n Representing in the ith windowThe average amplitude value of the nth subcarrier, K, represents the window size, and Var (i) represents the window variance.
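The per-window statistic of step S1132 can be sketched as follows (a sketch under the assumption that the CSI amplitudes are arranged as packets × subcarriers and that the window variance is averaged over subcarriers):

```python
import numpy as np

def window_variance(csi: np.ndarray, K: int) -> np.ndarray:
    """Per-window variance of CSI amplitudes.

    csi: array of shape (packets, subcarriers); K: window size in
    packets. Returns one variance value per window, computed per
    subcarrier around that subcarrier's window mean and then
    averaged over subcarriers."""
    n_win = csi.shape[0] // K
    windows = csi[: n_win * K].reshape(n_win, K, -1)
    mean = windows.mean(axis=1, keepdims=True)          # Mean(i)_n
    return ((windows - mean) ** 2).mean(axis=(1, 2))    # Var(i)
```

A window containing a gesture burst yields a markedly larger variance than the idle windows around it, which is what the threshold test in the next step exploits.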
S1133: gesture segments are extracted from the denoised raw CSI gesture data according to the variance and the size of the buffer. Specifically, the window numbers corresponding to the start and end positions of the gesture segment may be determined by a variance threshold, a size of the buffer, and a corresponding threshold. A specific manner of determination will be described below in conjunction with fig. 7, where fig. 7 shows a flowchart for determining the beginning of a gesture segment according to an embodiment of the present application.
As shown in FIG. 7, the parameters are initialized first: the variance threshold thresh, the sizes of the two buffers, and the corresponding thresholds (buf1, buf2, θ1, θ2) are set, and then the window numbers corresponding to the start and end positions of the gesture segment are determined.
While traversing the windows in turn, let the segment corresponding to the current window be S(i) and its average variance be Var(i). When Var(i) exceeds the threshold thresh for the first time, the window is stored in buffer 1; if the count in buffer 1 exceeds the threshold θ1, the window sequence number at the start of the gesture is the current window number minus the counts held in buffer 1 and buffer 2. Otherwise, the window is stored in buffer 2, and if the count in buffer 2 exceeds the threshold θ2, both buffers are emptied and the preceding steps are repeated for the next window. Through the above operation, the window sequence number t1 corresponding to the start of the gesture is determined.
After the start of the gesture is determined, windows are traversed from that point onward. If the average variance Var(i) of a window exceeds the threshold thresh, first check whether the buffer is empty; if it is not, add the window values held in the buffer to the gesture segment and then empty the buffer. Otherwise, the window number is stored in the buffer, and if the buffer is full, the judgment ends. These steps are repeated until all windows are traversed or the algorithm terminates. Integrating the window sequence numbers corresponding to the start and end positions yields the complete gesture segment.
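The start-of-gesture detection described above can be sketched as follows. The buffer semantics are an interpretation of the text: buffer 1 counts windows whose variance exceeds thresh, buffer 2 counts sub-threshold windows that interrupt the run, and an overflow of buffer 2 resets the search; the default θ1 and θ2 are assumptions:

```python
def detect_gesture_start(var, thresh, theta1=3, theta2=2):
    """Buffer-based start detection over per-window variances `var`.

    A window with variance above `thresh` goes into buffer 1;
    sub-threshold windows that follow go into buffer 2. Once
    buffer 1 holds more than `theta1` windows, the gesture start is
    the current index minus the contents of both buffers; if
    buffer 2 overflows first, both buffers are cleared and the
    search restarts. Returns the start window index, or None."""
    buf1 = buf2 = 0
    for i, v in enumerate(var):
        if v > thresh:
            buf1 += 1
            if buf1 > theta1:
                return i - buf1 - buf2 + 1
        elif buf1 > 0:          # a gap after supra-threshold windows
            buf2 += 1
            if buf2 > theta2:   # gap too long: discard the candidate
                buf1 = buf2 = 0
    return None
```

A short spurious spike followed by a long quiet stretch is discarded by the buffer-2 overflow, so only a sustained run of high-variance windows is reported as the gesture start.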
Accordingly, referring to FIG. 8, an embodiment of the present application provides an electronic device, which includes: a data acquisition unit 801, configured to acquire first CSI gesture data from a first environment; and a model training unit 802, configured to train an improved adversarial discriminative domain adaptation (ADDA) model using the first CSI gesture data, wherein the ADDA model is improved by using a convolutional neural network (CNN)-bidirectional long short-term memory (BiLSTM) network model as the feature extractor, the CNN-BiLSTM network model combining a CNN with an attention-mechanism-based BiLSTM, and the ADDA model is used for gesture recognition based on second CSI gesture data from a second environment, wherein the first environment and the second environment are different.
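The adversarial objective used by the model training unit (a Wasserstein-distance discriminator loss over the source and target feature mappings M_s and M_t, as named in the claims) can be sketched as follows; the linear critic and toy feature arrays are assumptions for illustration, not the application's actual network:

```python
import numpy as np

def wasserstein_disc_loss(D, feat_s, feat_t):
    """Discriminator (critic) loss in WGAN form: the critic D is
    trained to widen the gap between its mean score on source
    features M_s(x_s) and target features M_t(x_t); that gap
    approximates the Wasserstein distance between the two feature
    distributions."""
    return np.mean(D(feat_s)) - np.mean(D(feat_t))

# Toy critic: a fixed linear scoring function.
w = np.array([0.5, -0.2, 0.1])
D = lambda f: f @ w

rng = np.random.default_rng(0)
feat_s = rng.normal(0.0, 1.0, size=(64, 3))  # source-domain features
feat_t = rng.normal(0.5, 1.0, size=(64, 3))  # shifted target features
loss = wasserstein_disc_loss(D, feat_s, feat_t)
```

When the two feature distributions coincide, the critic cannot separate them and the loss vanishes, which is exactly the state the target-domain feature extractor is trained toward.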
Further functional descriptions of the above respective modules are the same as those of the above corresponding embodiments, and are not repeated here.
The embodiment of the present invention further provides an electronic device, as shown in FIG. 9, which may include a processor and a memory. The processor and the memory may be connected by a bus or by other means; in FIG. 9, a bus connection is taken as an example.
The processor may be a central processing unit (Central Processing Unit, CPU). The processor may also be any other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof.
The memory, as a non-transitory computer readable storage medium, may be used to store a non-transitory software program, a non-transitory computer executable program, and modules, such as program instructions/modules corresponding to the gesture recognition method in the embodiments of the present invention. The processor executes the non-transitory software programs, instructions and modules stored in the memory to perform various functional applications and data processing of the processor, i.e., to implement the gesture recognition method in the above-described method embodiments.
The memory may include a program storage area and a data storage area, wherein the program storage area may store an operating system and at least one application program required for a function, and the data storage area may store data created by the processor, etc. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory may optionally include memory located remotely from the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more modules are stored in the memory and, when executed by the processor, perform the gesture recognition method described above.
The specific details of the electronic device are the same as those of the corresponding embodiments, and are not repeated here.
It will be appreciated by those skilled in the art that all or part of the methods in the above embodiments may be implemented by a computer program instructing related hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the flows of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the storage medium may also comprise a combination of the above kinds of memories.
Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations are within the scope of the invention as defined by the appended claims.
Claims (9)
1. A method of gesture recognition, comprising:
acquiring first CSI gesture data from a first environment;
training an improved adversarial discriminative domain adaptation (ADDA) model using the first CSI gesture data to obtain a predictive label, wherein the ADDA model is improved by using a convolutional neural network (CNN)-bidirectional long short-term memory (BiLSTM) network model as the feature extractor, the CNN-BiLSTM network model combining a CNN with an attention-mechanism-based BiLSTM, the adversarial loss function of the ADDA model being a Wasserstein distance loss function, the ADDA model further comprising a discriminator loss function:
wherein E represents the expectation used in the machine-learning loss function, M_s represents the source-domain feature mapping, M_t represents the target-domain feature mapping, and x_s and x_t represent two points in the space;
the ADDA model being used to perform gesture recognition based on second CSI gesture data from a second environment so as to classify the gestures in the second CSI gesture data, wherein the first environment and the second environment are different.
2. The gesture recognition method of claim 1, wherein the method further comprises:
acquiring the second CSI gesture data;
and classifying the second CSI gesture data by using the trained improved ADDA model to obtain a corresponding gesture class.
3. The gesture recognition method of claim 1, wherein the CNN-BiLSTM network model is trained with third CSI gesture data from the first environment or the second environment.
4. The method of gesture recognition of claim 1, wherein training an improved adversarial discriminative domain adaptation (ADDA) model using the first CSI gesture data comprises:
dividing the first CSI gesture data into source domain data and target domain data;
extracting features and classifying gestures from the source domain data by using the CNN-BiLSTM network model to obtain gesture types and extracted features; and
the improved ADDA model is trained using the gesture class and the extracted features and the target domain data.
5. The gesture recognition method of claim 1, wherein the first CSI gesture data and/or the second CSI gesture data comprises CSI gesture data associated with gesture actions performed in a plurality of different directions.
6. The gesture recognition method of claim 2, wherein the acquiring first CSI gesture data from a first environment and/or the acquiring the second CSI gesture data comprises:
collecting original CSI gesture data, wherein the original CSI gesture data are CSI data packets with three-dimensional data dimensions;
denoising the original CSI gesture data; and
gesture segments are extracted from the denoised raw CSI gesture data.
7. The gesture recognition method of claim 6, wherein the extracting gesture segments from the denoised raw CSI gesture data comprises:
determining the size of the sliding window;
calculating the variance of the denoised raw CSI gesture data in each window; and
and extracting gesture segments according to the variance and the size of the buffer.
8. An electronic device, comprising: a memory and a processor, the memory and the processor being communicatively coupled to each other, the memory having stored therein computer instructions that, when executed, perform the gesture recognition method of any of claims 1-7.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions for causing a computer to perform the gesture recognition method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111578495.2A CN114499712B (en) | 2021-12-22 | 2021-12-22 | Gesture recognition method, device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111578495.2A CN114499712B (en) | 2021-12-22 | 2021-12-22 | Gesture recognition method, device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114499712A CN114499712A (en) | 2022-05-13 |
CN114499712B true CN114499712B (en) | 2024-01-05 |
Family
ID=81493510
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111578495.2A Active CN114499712B (en) | 2021-12-22 | 2021-12-22 | Gesture recognition method, device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114499712B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117311513B (en) * | 2023-10-26 | 2024-03-08 | 昆明理工大学 | Low sampling rate myoelectric gesture recognition method combining convolutional neural network with subdomain adaptation |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108805194A (en) * | 2018-06-04 | 2018-11-13 | 上海交通大学 | A kind of hand-written recognition method and system based on WIFI channel state informations |
CN110287863A (en) * | 2019-06-24 | 2019-09-27 | 桂林电子科技大学 | A kind of gesture identification method based on WiFi signal |
CN111209861A (en) * | 2020-01-06 | 2020-05-29 | 浙江工业大学 | Dynamic gesture action recognition method based on deep learning |
CN111651980A (en) * | 2020-05-27 | 2020-09-11 | 河南科技学院 | Wheat cold resistance identification method with hybrid neural network fused with Attention mechanism |
WO2020215915A1 (en) * | 2019-04-24 | 2020-10-29 | 腾讯科技(深圳)有限公司 | Identity verification method and apparatus, computer device and storage medium |
CN111898634A (en) * | 2020-06-22 | 2020-11-06 | 西安交通大学 | Intelligent fault diagnosis method based on depth-to-reactance-domain self-adaption |
CN112215868A (en) * | 2020-09-10 | 2021-01-12 | 湖北医药学院 | Method for removing gesture image background based on generation countermeasure network |
KR20210030063A (en) * | 2019-09-09 | 2021-03-17 | 서강대학교산학협력단 | System and method for constructing a generative adversarial network model for image classification based on semi-supervised learning |
CN112733609A (en) * | 2020-12-14 | 2021-04-30 | 中山大学 | Domain-adaptive Wi-Fi gesture recognition method based on discrete wavelet transform |
CN112990026A (en) * | 2021-03-19 | 2021-06-18 | 西北大学 | Wireless signal perception model construction and perception method and system based on countermeasure training |
WO2021143353A1 (en) * | 2020-01-13 | 2021-07-22 | 腾讯科技(深圳)有限公司 | Gesture information processing method and apparatus, electronic device, and storage medium |
CN113449587A (en) * | 2021-04-30 | 2021-09-28 | 北京邮电大学 | Human behavior recognition and identity authentication method and device and electronic equipment |
CN113609976A (en) * | 2021-08-04 | 2021-11-05 | 燕山大学 | Direction-sensitive multi-gesture recognition system and method based on WiFi (Wireless Fidelity) equipment |
US11194972B1 (en) * | 2021-02-19 | 2021-12-07 | Institute Of Automation, Chinese Academy Of Sciences | Semantic sentiment analysis method fusing in-depth features and time sequence models |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107633227B (en) * | 2017-09-15 | 2020-04-28 | 华中科技大学 | CSI-based fine-grained gesture recognition method and system |
Non-Patent Citations (3)
Title |
---|
Joint Adversarial Domain Adaptation for Resilient WiFi-Enabled Device-Free Gesture Recognition; Han Zou et al.; 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA); full text *
Research on Gait Recognition Algorithms Based on Wi-Fi CSI; Ming Xingxia; China Masters' Theses Full-text Database (Electronic Journal), Information Science and Technology; full text *
Gesture Action Recognition Based on Deep Convolutional Neural Networks; Zhang Chaozhu et al.; Radio Engineering; Vol. 49, No. 7; full text *
Also Published As
Publication number | Publication date |
---|---|
CN114499712A (en) | 2022-05-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10061389B2 (en) | Gesture recognition system and gesture recognition method | |
CN111103976B (en) | Gesture recognition method and device and electronic equipment | |
EP3445539A1 (en) | Methods and apparatus for pruning experience memories for deep neural network-based q-learning | |
EP2518661A3 (en) | System and method for human detection and counting using background modeling, hog and haar features | |
CN113609976B (en) | Direction-sensitive multi-gesture recognition system and method based on WiFi equipment | |
CN111597991A (en) | Rehabilitation detection method based on channel state information and BilSTM-Attention | |
CN110399846A (en) | A kind of gesture identification method based on multichannel electromyography signal correlation | |
CN104361345A (en) | Electroencephalogram signal classification method based on constrained extreme learning machine | |
CN114499712B (en) | Gesture recognition method, device and storage medium | |
CN107067031A (en) | A kind of calligraphy posture automatic identifying method based on Wi Fi signals | |
CN114781463A (en) | Cross-scene robust indoor tumble wireless detection method and related equipment | |
CN116343261A (en) | Gesture recognition method and system based on multi-modal feature fusion and small sample learning | |
CN113051972A (en) | Gesture recognition system based on WiFi | |
CN110141211B (en) | Steady-state visual evoked potential classification method based on empirical mode decomposition | |
CN114384999B (en) | User-independent myoelectric gesture recognition system based on self-adaptive learning | |
CN111199187A (en) | Animal behavior identification method based on algorithm, corresponding storage medium and electronic device | |
Liu et al. | A method of plant classification based on wavelet transforms and support vector machines | |
CN110738129B (en) | End-to-end video time sequence behavior detection method based on R-C3D network | |
CN106562771B (en) | A kind of pet sleep recognition methods of Embedded platform | |
CN114676727B (en) | CSI-based human body activity recognition method irrelevant to position | |
Zhang et al. | WiNum: A WiFi finger gesture recognition system based on CSI | |
CN115687894A (en) | Tumble detection system and method based on small sample learning | |
CN115438691A (en) | Small sample gesture recognition method based on wireless signals | |
CN115240647A (en) | Sound event detection method and device, electronic equipment and storage medium | |
CN114764580A (en) | Real-time human body gesture recognition method based on no-wearing equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||