CN112733609A - Domain-adaptive Wi-Fi gesture recognition method based on discrete wavelet transform - Google Patents


Info

Publication number
CN112733609A
Authority
CN
China
Prior art keywords
domain
csi
gesture
bvp
neural network
Prior art date
Legal status: Granted
Application number
CN202011468272.6A
Other languages
Chinese (zh)
Other versions
CN112733609B (en)
Inventor
吴迪 (Wu Di)
吴宇杰 (Wu Yujie)
黄志川 (Huang Zhichuan)
胡淼 (Hu Miao)
Current Assignee
Sun Yat-sen University
Original Assignee
Sun Yat-sen University
Priority date
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202011468272.6A
Publication of CN112733609A
Application granted
Publication of CN112733609B
Status: Active

Classifications

    • G06V 40/28 — Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G06N 3/045 — Combinations of networks
    • G06N 3/049 — Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/084 — Backpropagation, e.g. using gradient descent
    • H04W 24/08 — Testing, supervising or monitoring using real traffic
    • G06F 2218/06 — Denoising by applying a scale-space analysis, e.g. using wavelet analysis
    • G06F 2218/08 — Feature extraction
    • G06F 2218/12 — Classification; Matching
    • Y02D 30/70 — Reducing energy consumption in wireless communication networks


Abstract

The invention relates to a domain-adaptive Wi-Fi gesture recognition method based on the discrete wavelet transform, comprising the following steps. S1: collect the CSI signals of gestures together with their gesture categories and domain categories, and use them as training data. S2: extract the corresponding features from the collected CSI signals with a wavelet-transform-based BVP feature extraction method. S3: train a deep neural network with the extracted features, gesture categories and domain categories to obtain a deep neural network model. S4: extract the corresponding features from newly collected CSI signals to be classified with the same wavelet-transform-based BVP feature extraction method, and input them into the deep neural network model to obtain the corresponding gesture classification. A new wavelet-transform-based BVP feature extraction method is proposed: the corresponding BVP features are extracted from the collected CSI signals, and a deep neural network builds a gesture-recognition model from the extracted features so as to enhance the cross-domain recognition capability of the model.

Description

Domain-adaptive Wi-Fi gesture recognition method based on discrete wavelet transform
Technical Field
The invention relates to the field of pattern recognition, and in particular to a domain-adaptive Wi-Fi gesture recognition method based on the discrete wavelet transform.
Background
With the spread of computers through society, developing technologies that improve human-computer interaction has a positive impact on how computers are used, and increasing emphasis is therefore placed on new technologies that lower the barriers between humans and machines. The ultimate goal of this research is to make human-computer interaction as natural as interaction between people. Gestures have long been recognized as an interaction technique that offers more natural, creative and intuitive communication with our computers. For this reason, adding gestures to human-computer interaction is an important research area.
The term gesture recognition refers to the entire process of tracking human gestures, recognizing their representation, and translating it into semantically meaningful commands. Research in gesture recognition aims to design and develop systems that take recognized gestures as inputs for device control and map them to command outputs. In general, gesture information is collected either by contact or contactlessly, so gesture interaction systems can be divided into two types: those based on contact sensors and those based on contactless sensors.
In the prior art, reference is made to Kevin N Y Y, Ranganath S, Ghosh D. Trajectory modeling in gesture recognition using CyberGloves and magnetic trackers [J]. IEEE TENCON, 2004, 10: 571–.
This technique performs gesture recognition with the wireless instrumented glove CyberGlove II. CyberGlove II has 22 sensors: three flex sensors per finger, four abduction sensors, a palm-arch sensor, and sensors measuring wrist flexion and abduction. Each sensor is very thin and elastic, and in actual use its presence is hardly perceptible. The glove is light and comfortable to wear; the palm is meshed for ventilation and the fingertips are exposed, which makes it convenient for the user to grasp and write. The sensors are thin and flexible and offer essentially no resistance to bending, and the mounting position has little influence on the bending radius of the fingers, which ensures that the sensors measure hand motion accurately and repeatably and that calibration is consistent across users. However, this technique requires a glove fitted with many sensors, which is expensive. The glove must be put on before use, so convenience and flexibility are poor, and wearing it for a long time is uncomfortable.
In the prior art, reference is made to Côté M, Payeur P, Comeau G. Comparative study of adaptive segmentation techniques for gesture analysis in unconstrained environments [J]. IEEE International Workshop on Imaging Systems and Techniques, 2006: 28–33.
This technique uses computer-vision processing to perform recognition on captured video or images. Most complete hand-interaction mechanisms, the building blocks of vision-based gesture recognition systems, comprise three basic stages: detection, tracking and recognition. Hand detection, and segmentation of the corresponding image regions, is the primary task of the gesture recognition system; this segmentation is crucial because it separates the task-relevant data from the image background before passing it on to the subsequent tracking and recognition stages. However, because this technique is based on computer vision, a camera must capture an image before it can be recognized. The recognition is therefore largely limited by the camera position, the shooting conditions and the angle: if the gesture does not face the camera, recognition quality drops severely. Likewise, poor lighting, or jitter that degrades the captured image, reduces the algorithm's recognition performance. In addition, recognition with a camera raises privacy-leakage and related concerns.
In the prior art, reference is made to Yue Zheng, Yi Zhang, Kun Qian, Guidong Zhang, Yunhao Liu, Chenshu Wu, Zheng Yang. Zero-Effort Cross-Domain Gesture Recognition with Wi-Fi [P]. Mobile Systems, Applications, and Services, 2019.
This technique performs recognition from changes in Wi-Fi signals. A Wi-Fi transmitter and receiver are placed at fixed locations; because the user's gesture motion causes a Doppler frequency shift, a characteristic spectrogram can be extracted by detecting changes in the Wi-Fi signal, after which a deep neural network performs feature extraction and recognition. Since no extra equipment is worn, the method is convenient and comfortable to use, and using Wi-Fi signals protects user privacy to a greater extent. However, Wi-Fi signal recognition is sensitive to environmental changes: a trained model generally recognizes well only in the environment where it was trained, and additional operations are needed before it can be transferred to other environments. This technique uses compressed sensing to map the Doppler spectral features onto a sparse feature matrix so as to increase their cross-domain recognition capability. In practice, however, a large number of samples is still required for training, the accuracy of cross-domain recognition still degrades, and domain-adaptation algorithms must be added for improvement. In addition, the algorithm separates Doppler features with the short-time Fourier transform, which has certain limitations, such as lacking localized analysis capability and being unable to analyze non-stationary signals.
Disclosure of Invention
To address the technical defects that cross-domain recognition of gesture motion has low accuracy and that the existing transform lacks localized analysis capability and cannot analyze non-stationary signals, the invention provides a domain-adaptive Wi-Fi gesture recognition method based on the discrete wavelet transform.
To achieve this object, the technical scheme is as follows:
A domain-adaptive Wi-Fi gesture recognition method based on the discrete wavelet transform comprises the following steps:
S1: collect the CSI signals of gestures together with their gesture categories and domain categories, and use them as training data;
S2: extract the corresponding features from the collected CSI signals with the wavelet-transform-based BVP feature extraction method;
S3: train a deep neural network with the extracted features, gesture categories and domain categories to obtain a deep neural network model;
S4: extract the corresponding features from newly collected CSI signals to be classified with the wavelet-transform-based BVP feature extraction method, and input them into the deep neural network model to obtain the corresponding gesture classification.
In this scheme, a new wavelet-transform-based BVP feature extraction method is proposed: the corresponding BVP features are extracted from the collected CSI signals, and a deep neural network builds a gesture-recognition model from the extracted features so as to enhance the cross-domain recognition capability of the model.
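The four steps S1–S4 can be sketched end-to-end as a toy pipeline (all function names and the nearest-centroid "model" below are stand-ins invented for illustration; the patent's actual model is the CNN–GRU–DANN network described later):

```python
import numpy as np

def extract_bvp_features(csi):
    """S2/S4 stand-in: reduce a CSI sample to its mean magnitude."""
    return float(np.abs(np.asarray(csi, dtype=complex)).mean())

def train_model(features, gesture_labels, domain_labels):
    """S3 stand-in: one centroid (mean feature) per gesture class."""
    classes = sorted(set(gesture_labels))
    return {c: float(np.mean([f for f, y in zip(features, gesture_labels) if y == c]))
            for c in classes}

def classify(model, feature):
    """S4 stand-in: nearest-centroid gesture classification."""
    return min(model, key=lambda c: abs(model[c] - feature))

# toy data: two "gestures" whose CSI magnitudes differ clearly
rng = np.random.default_rng(0)
csi_a = 1.0 + 0.01 * rng.standard_normal((5, 30))   # gesture 0 samples
csi_b = 3.0 + 0.01 * rng.standard_normal((5, 30))   # gesture 1 samples
feats = [extract_bvp_features(x) for x in np.vstack([csi_a, csi_b])]
labels = [0] * 5 + [1] * 5
model = train_model(feats, labels, domain_labels=[0] * 10)
pred = classify(model, extract_bvp_features(2.9 + np.zeros(30)))
```

The toy classifier only illustrates the data flow: features extracted from new CSI are matched against a model trained on labelled features.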
In step S2, the wavelet-transform-based BVP feature extraction method comprises the following steps:
S21: collect signals through the Wi-Fi network interface card (NIC) and extract the CSI signals;
S22: compute the conjugate multiplication of the CSI signals of two antennas on the same NIC, filter out out-of-band noise and the static offset, eliminate the random offset, and keep only the dominant multipath components with non-zero DFS;
S23: decompose and reconstruct the measured signal on the DFS with the MALLAT algorithm, applying the db1 wavelet filters H, G, h and g, to extract the wavelet power spectrum, whose peaks are the frequency shifts caused by motion;
S24: obtain the current position of the user with a passive tracking system, synthesize the peaks of the wavelet power spectrum according to the relative positions of the user and the Wi-Fi receivers, and obtain the BVP feature matrix with a compressed sensing technique.
CSI stands for Channel State Information; it is a data format (part of the protocol) used to represent the CFR samples, at subcarrier granularity within the system band, obtained from the physical layer by commercial OFDM-based IEEE 802.11a/g/n wireless network cards.
The basic measurement unit of CSI is one packet.
In step S21, at time t_0 the extracted CSI signal is

x(f, t_0) = \sum_{i=1}^{L} A_i e^{-j\phi_i} + n(f)

where L is the total number of reflected signal paths, e^{-j\phi_i} represents the phase caused by the Doppler effect on the i-th path, and A_i represents the attenuation factor of the path.

The sampling intervals of the samples with respect to the first sample are [0, \Delta t_2, \ldots, \Delta t_M]. Within the sampling window, the attenuation differences between different CSI samples can be ignored and the path-change speed can be regarded as constant, so the phase difference between the m-th CSI sample and the first CSI sample on the i-th path is

\Delta\phi_{i,m} = 2\pi f v_i \Delta t_m / c

where f is the original carrier frequency of the signal and c is the speed of light. Collecting these terms, the Doppler vector of the i-th path is

d_i = [1, e^{-j 2\pi f v_i \Delta t_2 / c}, \ldots, e^{-j 2\pi f v_i \Delta t_M / c}]^T

and the CSI sampling matrix with M samples can be expressed as

S(f) = \sum_{i=1}^{L} s_i(f, t_0) d_i + n_i(f)

where v_i denotes the change speed of the i-th path, s_i(f, t_0) is the CSI of the i-th path measured at the first sampling time t_0, and n_i(f) is the noise sampled on the i-th path.
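As a numerical sanity check of the constant-speed model above: the phase of successive CSI samples advances linearly, and the slope recovers the Doppler frequency f·v/c (the carrier frequency, speed and sampling interval below are assumed toy values, not taken from the patent):

```python
import numpy as np

c = 3.0e8        # speed of light, m/s
f = 5.18e9       # assumed Wi-Fi carrier frequency, Hz
v = 0.5          # assumed constant path-change speed, m/s
dt = np.arange(8) * 1e-3                 # sampling offsets [0, dt2, ..., dtM], s

phase = 2 * np.pi * f * v * dt / c       # phase difference vs. the first sample
samples = np.exp(-1j * phase)            # unit-amplitude samples (a Doppler vector)

# the Doppler frequency implied by the linear phase slope equals f*v/c
f_doppler = (phase[1] - phase[0]) / (2 * np.pi * (dt[1] - dt[0]))
```

For these values f·v/c is below 10 Hz, which is why a slowly sampled CSI stream can still capture gesture-induced Doppler shifts.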
In step S22, the different antennas on a Wi-Fi network card share the same RF oscillator, so the time-varying random phase offsets of the different antennas are identical. Conjugate multiplication between two antennas on the same card is therefore used to eliminate the time-varying random phase offset:

x_{cm}(f, t_0 + t) = x_1(f, t_0 + t) \cdot x_2^*(f, t_0 + t)

where x_{cm}(f, t_0 + t) is the output of the conjugate multiplication, x_1(f, t_0 + t) is the CSI of the first antenna, x_2^*(f, t_0 + t) is the conjugate of the CSI of the second antenna, and G_{m1} and G_{m2} are the gains of the first and second antennas respectively. The term x_{1,s}(f, t_0) \cdot x_{2,s}^*(f, t_0) is the product of the static path components of the two antennas; it can be regarded as constant over a short time, and the DFS is recovered by adjusting the antenna power.
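A minimal simulation of this cancellation, under an assumed toy signal model (one strong static path plus one 20 Hz dynamic path on antenna 1, a static path only on antenna 2, and a shared random phase offset on both):

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(200) / 200.0                          # 1 s at 200 Hz sampling
phi = rng.uniform(0, 2 * np.pi, size=t.size)        # shared random phase offset

static = 5.0                                        # strong static component
dynamic = 0.3 * np.exp(-1j * 2 * np.pi * 20.0 * t)  # 20 Hz Doppler component

x1 = (static + dynamic) * np.exp(-1j * phi)   # antenna 1: static + dynamic paths
x2 = static * np.exp(-1j * phi)               # antenna 2: static path only

# the shared offset cancels: exp(-j*phi) * conj(exp(-j*phi)) = 1
x_cm = x1 * np.conj(x2)

# after removing the static (DC) part, the spectrum shows a clean 20 Hz peak
spec = np.abs(np.fft.fft(x_cm - x_cm.mean()))
freqs = np.fft.fftfreq(t.size, d=t[1] - t[0])
peak_hz = abs(freqs[int(np.argmax(spec))])
```

Without the conjugate multiplication, the random phase offset would smear the spectrum; with it, the Doppler peak survives exactly.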
In step S23, decomposition is performed with the MALLAT algorithm using the db1 wavelet; the DFS is expressed as

f(t) = A_j(t) + \sum_{i=1}^{j} D_i(t)

where t is the time series, f(t) is the original signal, j is the number of decomposition levels, A_j is the wavelet coefficient of the low-frequency part of the signal f(t) at level j, and D_i is the wavelet coefficient of the high-frequency part of the signal f(t) at level i.

Signal reconstruction is performed with the MALLAT algorithm; the reconstruction from level j to level j−1 is expressed as

A_{j-1} = H^* A_j + G^* D_j

where H^* and G^* are the synthesis (dual) operators of the filters H and G.
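One analysis/synthesis step of the MALLAT algorithm with the db1 (Haar) filters can be written out directly. This is a sketch of a single level only, not the patent's full multi-level implementation:

```python
import numpy as np

def haar_decompose(x):
    """One MALLAT analysis step with the db1 filters: low-pass and high-pass
    filtering followed by down-sampling by 2."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # approximation (low-frequency) part
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # detail (high-frequency) part
    return a, d

def haar_reconstruct(a, d):
    """One MALLAT synthesis step: upsample and filter back to the signal."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2.0)
    x[1::2] = (a - d) / np.sqrt(2.0)
    return x

signal = np.array([4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 2.0])
a1, d1 = haar_decompose(signal)      # level-1 coefficients A1, D1
rec = haar_reconstruct(a1, d1)       # reconstruction of the original signal
```

Applying `haar_decompose` again to `a1` would give the level-2 coefficients, which is exactly the recursion the MALLAT algorithm performs.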
Since the Doppler frequency shift computed at a receiver is only the shift caused by the radial component of the current velocity, in step S24 a passive tracking system is used to obtain the current position of the user, and the reconstructed DFS extracted in the previous step is synthesized.

Let the positions of the Wi-Fi transmitter and of the i-th Wi-Fi receiver be l_t = (x_t, y_t) and l_r^{(i)} = (x_r^{(i)}, y_r^{(i)}). The frequency shift caused at the i-th receiver by a velocity component v = (v_x, v_y) can then be expressed as

f^{(i)}(v) = a_x^{(i)} v_x + a_y^{(i)} v_y

where the coefficients a_x^{(i)} and a_y^{(i)} are determined by the positions of the person, the transmitter and the i-th receiver, and \lambda is the wavelength of the Wi-Fi signal. Because the positions of the person and the transmitter fix the projection relationship at each receiver, the assignment matrix A^{(i)} can be expressed as

A^{(i)}_{j,k} = 1 if f^{(i)}(v_k) falls in the j-th frequency sampling point, and 0 otherwise

where j is the j-th frequency sampling point and v_k is the frequency-shift component corresponding to the k-th element of the vectorized BVP V. The DFS profile then satisfies

D^{(i)} = c^{(i)} A^{(i)} V

and the BVP feature matrix is obtained with a compressed sensing technique by solving the sparse recovery problem

\min_V \sum_i | A^{(i)} V - D^{(i)} | + \eta \| V \|_0

where \eta weights the sparsity of V.
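The compressed-sensing recovery of a sparse vector V from D = c·A·V can be sketched with ISTA (iterative soft-thresholding), using an l1 relaxation of the sparsity term; the matrix sizes and all values below are assumptions for illustration, not the patent's actual solver:

```python
import numpy as np

def ista(A, d, lam=0.5, n_iter=2000):
    """Minimise 0.5*||A v - d||^2 + lam*||v||_1 by soft-thresholded
    gradient steps (an l1 relaxation of the ||V||_0 sparsity term)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2        # step = 1/L with L = ||A||_2^2
    v = np.zeros(A.shape[1])
    for _ in range(n_iter):
        v = v - step * (A.T @ (A @ v - d))                      # gradient step
        v = np.sign(v) * np.maximum(np.abs(v) - lam * step, 0)  # shrinkage
    return v

rng = np.random.default_rng(2)
A = rng.standard_normal((20, 40))     # toy projection matrix (stands in for c*A)
v_true = np.zeros(40)
v_true[3], v_true[17] = 1.0, -0.5     # sparse "BVP" ground truth
d = A @ v_true                        # observed DFS profile
v_hat = ista(A, d)                    # recovered sparse vector
```

Even though the system is underdetermined (20 equations, 40 unknowns), the sparsity prior lets the solver recover the support of the true vector — the essence of the compressed-sensing step.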
in step S3, the domain categories include a source domain and a target domain.
In step S3, the deep neural network model performs further feature extraction on the extracted BVP features with a convolutional neural network (CNN), abstracts them into high-level features with a gated recurrent unit (GRU) for gesture recognition, and performs domain-adaptive learning by adding domain-adversarial training of neural networks (DANN); the domain-adaptive capability of the model is improved by extracting domain-independent features.
In domain-adversarial training (DANN), the source-domain classification error term is minimized while the domain classification error term is maximized: a negative sign is placed in front of the domain classification error term, and a hyperparameter λ is introduced as a weight-balance parameter, yielding the loss function of the whole network.
The loss function of the overall network is

E(\theta_f, \theta_y, \theta_d) = \sum_{i=1}^{n} L_y^{(i)}(\theta_f, \theta_y) - \lambda \sum_{i=1}^{N} L_d^{(i)}(\theta_f, \theta_d)

and the network is trained towards the saddle point

(\hat{\theta}_f, \hat{\theta}_y) = \arg\min_{\theta_f, \theta_y} E(\theta_f, \theta_y, \hat{\theta}_d)

\hat{\theta}_d = \arg\max_{\theta_d} E(\hat{\theta}_f, \hat{\theta}_y, \theta_d)

where \theta_f is the parameter of the feature-extraction part of the neural network, \theta_d is the parameter of the adversarial-learning part, \theta_y is the parameter of the fully connected layer of the gesture-classification part, L_y is the loss value of the gesture classification, and L_d is the loss value of the domain classification.
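The sign structure of this objective can be checked numerically: with toy prediction probabilities (all numbers below are invented), the total loss E = L_y − λ·L_d rewards a gesture classifier that is confident and a domain discriminator that is confused:

```python
import numpy as np

def cross_entropy(p, y):
    """Mean negative log-likelihood of one-hot labels y under probabilities p."""
    return float(-np.mean(np.log(np.sum(p * y, axis=1))))

# toy predicted probabilities and one-hot labels
p_gesture = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
y_gesture = np.array([[1, 0, 0], [0, 1, 0]])
p_domain = np.array([[0.5, 0.5], [0.5, 0.5]])   # discriminator fully confused
y_domain = np.array([[1, 0], [0, 1]])

lam = 0.1                                    # the weight-balance hyperparameter
L_y = cross_entropy(p_gesture, y_gesture)    # gesture-classification loss
L_d = cross_entropy(p_domain, y_domain)      # domain-classification loss
E = L_y - lam * L_d   # minimised over theta_f/theta_y, maximised over theta_d
```

A fully confused discriminator gives L_d = ln 2 per two-class sample, the maximum a balanced discriminator can reach, which is exactly the state the adversarial term drives the feature extractor towards.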
In the above scheme, the convolutional neural network has representation-learning capability and can classify input information in a shift-invariant manner according to its hierarchical structure; it is therefore also called a "shift-invariant artificial neural network".
A gated recurrent unit (GRU) is a type of recurrent neural network. Like the LSTM, it was proposed to address long-term memory and the gradient problems encountered in backpropagation.
When a moving object travels in a certain direction at a constant speed, differences in the propagation path cause phase and frequency changes, generally called the Doppler frequency shift (DFS). It reveals how wave properties change during motion: when the motion is towards the wave source, the waves are compressed, the wavelength shortens and the frequency rises (blue shift); when the motion is away from the source, the opposite occurs, the wavelength lengthens and the frequency falls (red shift).
If the extracted features are used for domain classification, an accuracy of about 70% can still be obtained, which shows that the extracted features still carry domain differences that negatively affect cross-domain recognition. The aim of the invention is to improve the cross-domain recognition of a trained model; in this application scenario, the training data and the cross-domain data to be recognized are exactly in the relationship of a source domain and a target domain. The source domain is a domain different from that of the test samples but with rich supervision information, while the target domain is the domain of the test samples, with no labels or only a few labels. The source and target domains usually belong to the same class of task but are distributed differently. If the data features of different domains (for example, two different data sets) can be mapped to the same feature space, source-domain data can be used to recognize the target domain. The invention therefore introduces an adversarial-learning network module. Because the source-domain and target-domain labels differ in domain classification, maximizing the domain-classification error means that the domain discriminator cannot distinguish the source domain from the target domain, so the distributions of the extracted features of the source and target domains become aligned.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a field self-adaptive Wi-Fi gesture recognition method based on discrete wavelet transformation, provides a novel BVP feature extraction method based on wavelet transformation, extracts corresponding BVP features by combining collected CSI signals, and uses a deep neural network to construct a model for gesture recognition by using the extracted features so as to enhance the cross-field recognition capability of the model.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a flow chart of the discrete wavelet transform of the present invention;
FIG. 3 is a diagram of a neural network architecture of the present invention;
fig. 4 is a diagram of frequency shift component synthesis according to the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
the invention is further illustrated below with reference to the figures and examples.
Example 1
As shown in fig. 1 and fig. 4, a domain-adaptive Wi-Fi gesture recognition method based on the discrete wavelet transform comprises the following steps:
S1: collect the CSI signals of gestures together with their gesture categories and domain categories, and use them as training data;
S2: extract the corresponding features from the collected CSI signals with the wavelet-transform-based BVP feature extraction method;
S3: train a deep neural network with the extracted features, gesture categories and domain categories to obtain a deep neural network model;
S4: extract the corresponding features from newly collected CSI signals to be classified with the wavelet-transform-based BVP feature extraction method, and input them into the deep neural network model to obtain the corresponding gesture classification.
In this scheme, the corresponding BVP features are extracted from the collected CSI signals with the wavelet-transform-based BVP feature extraction method described above, and a deep neural network builds a gesture-recognition model from the extracted features so as to enhance the cross-domain recognition capability of the model.
In step S2, the BVP feature extraction method of wavelet transform includes the steps of:
s21: collecting signals through a Wi-Fi network card NIC, and extracting CSI signals;
s22: by calculating conjugate multiplication of CSI signals of two antennas on the same NIC, filtering out-of-band noise and static offset, eliminating random offset and only keeping main multipath components with non-zero DFS;
s23: decomposing and reconstructing the measured signals by using a MALLAT algorithm on DFS and applying db1 wavelet filters H, G, h and g to extract a wavelet power spectrum, wherein the peak of the wavelet power spectrum is the frequency shift caused by motion;
s24: and acquiring the current position of a user by using a passive tracking system, synthesizing the peak of the wavelet power spectrum according to the relative position of the user and the Wi-Fi receiver, and acquiring the BVP characteristic matrix by using a compressed sensing technology.
In step S21, t0At time, extracting the CSI signals as:
Figure RE-GDA0002979666830000081
l is the total reflected path of the signal,
Figure RE-GDA0002979666830000082
representing the phase difference, A, caused by the Doppler effectiAn attenuation factor representing the path;
the sampling interval of each sample with respect to the first sample is 0, Δ t2,…,…,ΔtM]In the sampling window, we can ignore the attenuation difference between different CSI samples and consider the path change speed as a constant, and the phase difference between the ith CSI sample and the first CSI sample is
Figure RE-GDA0002979666830000083
Where f is the original carrier frequency of the signal, the phase difference between the mth CSI sample and the first CSI sample is expressed as follows:
Figure RE-GDA0002979666830000091
Figure RE-GDA0002979666830000092
for a doppler vector, the CSI sampling matrix with M samples can be expressed as follows:
Figure RE-GDA0002979666830000093
viindicating the speed of change of the ith path, si(f,t0) Is at the first sampling time t0At measured CSI signal of the i-th path, ni(f) Is the noise sampled by the ith path.
In step S22, different antennas on the Wi-Fi card share the same RF oscillator, the time-varying random phase offsets of the different antennas are the same, and on the Wi-Fi card, conjugate multiplication is used between the two antennas to eliminate the time-varying random phase offsets:
Figure RE-GDA0002979666830000094
wherein x iscm(f,t0+ t) is the output of the conjugate multiplication, x1(f,t0+ t) is the CSI for the first antenna,
Figure RE-GDA0002979666830000095
is the conjugate of the CSI of the second antenna, Gm1And Gm2Respectively, of the first and second antennas. Wherein x1,s(f,t0)
Figure RE-GDA0002979666830000096
Is the product of the two antenna static path components, which is considered constant in a short time, and the DFS is restored by adjusting the antenna power.
In step S23, decomposition is performed using MALLAT algorithm db1 wavelet, DFS is expressed as:
Figure RE-GDA0002979666830000097
in the formula, t is a time sequence, f (t) is an original signal, j is a decomposition layer number, Aj is a wavelet coefficient of a low-frequency part of the signal f (t) in the j layer, and Dj is a wavelet coefficient of a high-frequency part of the signal f (t) in the j layer;
signal reconstruction is performed using the MALLAT algorithm, and the DFS after reconstruction is expressed as:
Figure RE-GDA0002979666830000101
since the doppler frequency shift calculated by the receiver is only the frequency shift caused by the radial component of the current velocity, in step S24, as shown in fig. 4, the complex passive tracking system is used to obtain the current position of the user, and the reconstructed DFS extracted in the previous step is synthesized;
the positions of a Wi-Fi transmitting end and a Wi-Fi receiving end are assumed as follows:
Figure RE-GDA0002979666830000102
the current speed may be expressed as:
Figure RE-GDA0002979666830000103
wherein ax, ay is
Figure RE-GDA0002979666830000104
Figure RE-GDA0002979666830000105
Determined, λ is the wavelength of the Wi-Fi signal, the projected relationship of the receiver position is fixed due to the position of the person and the transmitter, and the matrix a is expressed as:
Figure RE-GDA0002979666830000106
where j is the jth frequency sampling point and vk is the frequency shift component corresponding to the kth element of the vectored BVPV
D(i)=c(i)A(i)V
Obtaining a BVP feature matrix by using a compressed sensing technology:
Figure RE-GDA0002979666830000107
in step S3, the domain categories include a source domain and a target domain.
In step S3, the deep neural network model performs further feature extraction using the deep neural network CNN according to the extracted features, and performs gesture recognition by using the gating loop unit GRU, i.e., features that can be highly abstracted, and performs domain adaptive learning by adding to the domain confrontation training DANN of the neural network, and improves the domain adaptive capability of the model by extracting features that are irrelevant to the domain.
In the field of neural networks, the domain confrontation training DANN is subjected to minimization of a source domain classification error term, a domain classification error term is maximized, a negative sign is added in front of the domain classification error term, and a hyperparameter lambda is introduced as a weight balance parameter to obtain a loss function of the whole network.
The loss function of the overall network is
E(θ_f, θ_y, θ_d) = Σ_{i=1,…,N; d_i=0} L_y^(i)(θ_f, θ_y) - λ Σ_{i=1,…,N} L_d^(i)(θ_f, θ_d),
optimized at the saddle point
(θ̂_f, θ̂_y) = argmin_{θ_f, θ_y} E(θ_f, θ_y, θ̂_d),
θ̂_d = argmax_{θ_d} E(θ̂_f, θ̂_y, θ_d),
where θ_f is the parameter of the neural network of the feature extraction part, θ_d is the parameter of the adversarial learning network part, θ_y is the parameter of the fully connected layer of the gesture classification and recognition part, L_y is the loss value of the gesture classification, and L_d is the loss value of the domain classification.
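As a minimal numerical sketch of this objective (illustrative values only, not the patent's implementation), the combined loss E = L_y - λ·L_d can be computed directly; in a real network the minus sign is realized by a gradient reversal layer between the feature extractor and the domain classifier:

```python
import numpy as np

def cross_entropy(probs, label):
    """Negative log-likelihood of the true label."""
    return -np.log(probs[label])

def dann_loss(gesture_probs, gesture_label, domain_probs, domain_label, lam):
    """Total network loss: minimize the gesture error, maximize the domain error.

    The hyperparameter lam weights the domain term, whose negative sign
    makes the feature extractor fight the domain classifier.
    """
    L_y = cross_entropy(gesture_probs, gesture_label)
    L_d = cross_entropy(domain_probs, domain_label)
    return L_y - lam * L_d

# Example with made-up softmax outputs: 3 gesture classes, 2 domains.
loss = dann_loss(np.array([0.7, 0.2, 0.1]), 0,
                 np.array([0.6, 0.4]), 0, lam=0.1)
```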
CSI represents the amplitude attenuation and phase change on each subcarrier caused by the propagation of the signal from the transmitter to the receiver, and the phase change rate reflects the Doppler shift of the signal.
Example 2
As shown in FIG. 2, the DWT provides high time resolution for the high-frequency activity in the Doppler signature and high frequency resolution for the slower activity. x[n] is the discrete input signal of length N. g[n] is the low-pass filter: it removes the high-frequency part of the input signal and outputs the low-frequency part. h[n] is the high-pass filter: in contrast to the low-pass filter, it removes the low-frequency part and outputs the high-frequency part. Down-sampling thins out the sample points; down-sampling by a factor of 2 keeps one of every 2 sampled data points. The Doppler time-frequency feature map is obtained after the wavelet transform.
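The filter-and-downsample step of Example 2 can be sketched with the db1 (Haar) filter pair; the filter values and the 4-point input below are illustrative assumptions, not data from the patent:

```python
import numpy as np

# db1 (Haar) analysis filters: g keeps the slow part, h keeps the fast part.
g = np.array([1.0, 1.0]) / np.sqrt(2.0)   # low-pass g[n]
h = np.array([1.0, -1.0]) / np.sqrt(2.0)  # high-pass h[n]

def dwt_level(x):
    """One DWT level: convolve with g/h, then down-sample by 2 (keep one of two)."""
    lo = np.convolve(x, g)[1::2]  # approximation (low-frequency) coefficients
    hi = np.convolve(x, h)[1::2]  # detail (high-frequency) coefficients
    return lo, hi

x = np.array([4.0, 4.0, 8.0, 8.0])  # toy input of length N = 4
lo, hi = dwt_level(x)
# Constant pairs carry no high-frequency energy, so the detail output is zero.
```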
Example 3
As shown in fig. 3, BVP features are further extracted by a deep convolutional neural network (CNN) module and then passed through a GRU module to obtain highly abstract features that can be used for gesture recognition. A DANN module is added for domain-adaptive learning, and the domain-adaptation capability of the model is improved by extracting domain-independent features. The fully connected layer mainly realizes classification. A convolutional neural network contains 3 common structures: convolutional layers, pooling layers and fully connected layers; more modern architectures may also contain complex structures such as Inception modules and residual blocks. Among the common structures, the convolutional and pooling layers are characteristic of convolutional neural networks. The convolution kernels in a convolutional layer contain weight coefficients while a pooling layer does not, so in the literature a pooling layer is sometimes not counted as a separate layer. The 3 types of common structures are typically assembled into the hidden layers in the order: BVP feature input, convolutional layer, pooling layer, fully connected layer, output. A Flatten layer is used to "flatten" the input, i.e., to turn a multidimensional input into one dimension, and is often used in the transition from a convolutional layer to a fully connected layer; Flatten does not affect the batch size. The neural network structure is shown in fig. 3.
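The layer order described in Example 3 (BVP input, convolution, pooling, Flatten, fully connected) can be sketched as a shape walkthrough; the 20x20 BVP grid, 3x3 kernel and 6 gesture classes are assumptions for illustration, not sizes taken from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_valid(x, k):
    """Naive 'valid' 2-D convolution (cross-correlation, as in most CNN layers)."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def maxpool2(x):
    """2x2 max pooling with stride 2."""
    H, W = x.shape
    return x[:H // 2 * 2, :W // 2 * 2].reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

bvp = rng.standard_normal((20, 20))                  # one BVP frame (assumed 20x20)
feat = np.maximum(conv2d_valid(bvp, rng.standard_normal((3, 3))), 0)  # conv + ReLU -> 18x18
pooled = maxpool2(feat)                              # pooling -> 9x9
flat = pooled.reshape(-1)                            # Flatten -> 81 values, batch untouched
logits = flat @ rng.standard_normal((81, 6))         # fully connected -> 6 gesture scores
```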
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaust all embodiments here. Any modification, equivalent replacement and improvement made within the spirit and principle of the present invention shall be included in the protection scope of the claims of the present invention.

Claims (10)

1. A domain-adaptive Wi-Fi gesture recognition method based on discrete wavelet transform, characterized by comprising the following steps:
s1: collecting CSI signals of gestures together with their gesture categories and domain categories, and using them as training data;
s2: extracting the acquired CSI signals into corresponding BVP feature matrices according to a wavelet-transform-based BVP feature extraction method;
s3: training a deep neural network with the extracted BVP feature matrices, gesture categories and domain categories to obtain a deep neural network model;
s4: extracting the corresponding features from newly acquired CSI signals to be classified according to the wavelet-transform-based BVP feature extraction method, and inputting them into the deep neural network model to obtain the corresponding gesture classification.
2. The discrete wavelet transform-based domain adaptive Wi-Fi gesture recognition method of claim 1, wherein in the step S2, the method for extracting BVP features of wavelet transform comprises the following steps:
s21: collecting signals through a Wi-Fi network card NIC, and extracting CSI signals;
s22: by calculating conjugate multiplication of CSI signals of two antennas on the same NIC, filtering out-of-band noise and static offset, eliminating random offset and only keeping main multipath components with non-zero DFS;
s23: decomposing and reconstructing the measured signals by using a MALLAT algorithm on DFS and applying db1 wavelet filters H, G, h and g to extract a wavelet power spectrum, wherein the peak of the wavelet power spectrum is the frequency shift caused by motion;
s24: and acquiring the current position of a user by using a passive tracking system, synthesizing the peak of the wavelet power spectrum according to the relative position of the user and the Wi-Fi receiver, and acquiring the BVP characteristic matrix by using a compressed sensing technology.
3. The domain-adaptive Wi-Fi gesture recognition method based on discrete wavelet transform of claim 2, wherein in step S21, the CSI signal extracted at time t_0 is:
s(f, t_0) = Σ_{i=1}^{L} A_i · e^{-jφ_i},
where L is the total number of reflected signal paths, φ_i represents the phase difference caused by the Doppler effect, and A_i represents the attenuation factor of the i-th path;
the sampling intervals of the samples with respect to the first sample are [0, Δt_2, …, Δt_M]; within the sampling window, the attenuation difference between different CSI signal samples can be ignored and the path change speed can be regarded as a constant, so the phase difference between the i-th CSI signal sample and the first sample is
Δφ_i = 2π · f · v · Δt_i / c,
where f is the original carrier frequency of the signal and c is the speed of light; collecting the phase differences up to the M-th CSI signal sample gives the Doppler vector
a(v) = [1, e^{-j2πf·v·Δt_2/c}, …, e^{-j2πf·v·Δt_M/c}]^T,
so the CSI signal sampling matrix with M samples can be expressed as follows:
S(f) = Σ_{i=1}^{L} s_i(f, t_0) · a(v_i) + n(f),
where v_i indicates the change speed of the i-th path, s_i(f, t_0) is the CSI signal of the i-th path measured at the first sampling time t_0, and n_i(f) is the noise sampled on the i-th path.
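Under the constant-speed assumption in this claim, a single-path toy simulation (all numbers illustrative) shows that the path change speed can be recovered from the phase slope of the CSI samples:

```python
import numpy as np

c = 3e8                            # speed of light, m/s
f = 5.32e9                         # carrier frequency (assumed 5 GHz Wi-Fi band)
v = 1.2                            # path change speed, m/s (ground truth)
dt = np.arange(8) * 1e-3           # sampling offsets [0, dt2, ..., dtM]
s0 = 0.8 * np.exp(1j * 0.3)        # CSI of the path at the first sample

# Each sample differs from the first by the phase 2*pi*f*v*dt_i/c.
samples = s0 * np.exp(-1j * 2 * np.pi * f * v * dt / c)

# Recover v from the slope of the unwrapped phase.
phase = np.unwrap(np.angle(samples))
slope = (phase[-1] - phase[0]) / (dt[-1] - dt[0])   # equals -2*pi*f*v/c
v_est = -slope * c / (2 * np.pi * f)
```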
4. The domain-adaptive Wi-Fi gesture recognition method based on discrete wavelet transform of claim 3, wherein in step S22, different antennas on the Wi-Fi card share the same RF oscillator, so the time-varying random phase offsets of different antennas are identical, and conjugate multiplication between two antennas on the same card is used to eliminate the time-varying random phase offset:
x_cm(f, t_0 + t) = x_1(f, t_0 + t) · x_2^*(f, t_0 + t),
wherein x_cm(f, t_0 + t) is the output of the conjugate multiplication, x_1(f, t_0 + t) is the CSI of the first antenna, x_2^*(f, t_0 + t) is the conjugate of the CSI of the second antenna, and G_m1 and G_m2 are the gains of the first and second antennas respectively. The product x_{1,s}(f, t_0) · x_{2,s}^*(f, t_0) of the static path components of the two antennas is considered constant within a short time, and the DFS is restored by adjusting the antenna power.
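The cancellation of the shared random phase offset by conjugate multiplication can be checked numerically; the channel values below are made-up illustrations, not measurements:

```python
import numpy as np

rng = np.random.default_rng(1)

# Both antennas share the RF oscillator, so the random phase offset
# theta(t) is identical on both and cancels in x1 * conj(x2).
theta = rng.uniform(0, 2 * np.pi, size=16)   # common random phase per sample
h1 = 0.9 * np.exp(1j * 0.4)                  # true channel, antenna 1
h2 = 0.7 * np.exp(-1j * 0.2)                 # true channel, antenna 2

x1 = h1 * np.exp(1j * theta)                 # measured CSI, antenna 1
x2 = h2 * np.exp(1j * theta)                 # measured CSI, antenna 2
x_cm = x1 * np.conj(x2)                      # conjugate multiplication output
# x_cm is constant: the time-varying offset is gone, only h1*conj(h2) remains.
```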
5. The method of claim 4, wherein in step S23, the decomposition is performed using the MALLAT algorithm with the db1 wavelet, and the DFS is expressed as:
A_j = H · A_{j-1}, D_j = G · A_{j-1}, with A_0 = f(t), j = 1, 2, …, J,
in the formula, t is the time sequence, f(t) is the original signal, j is the number of decomposition layers, A_j is the wavelet coefficient of the low-frequency part of the signal f(t) at layer j, and D_j is the wavelet coefficient of f(t) at the high-frequency part of layer j;
signal reconstruction is performed using the MALLAT algorithm with the reconstruction filters h and g, and the reconstructed DFS is expressed as:
f(t) = A_J + Σ_{j=1}^{J} D_j.
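The decompose-then-reconstruct relation can be verified numerically with a simple pairwise Haar (db1) transform; this is a sketch of the idea, using unnormalized mean/difference pairs rather than the MALLAT filter-bank implementation itself:

```python
import numpy as np

def haar_split(a):
    """One analysis level: approximation and detail of consecutive pairs."""
    return (a[0::2] + a[1::2]) / 2.0, (a[0::2] - a[1::2]) / 2.0

def haar_merge(approx, detail):
    """Inverse of haar_split: rebuild the finer level exactly."""
    out = np.empty(2 * approx.size)
    out[0::2] = approx + detail
    out[1::2] = approx - detail
    return out

f = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])  # toy signal
a1, d1 = haar_split(f)      # layer 1: A_1, D_1
a2, d2 = haar_split(a1)     # layer 2: A_2, D_2
rec = haar_merge(haar_merge(a2, d2), d1)  # reconstruction is exact
```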
6. The method of claim 5, wherein in step S24, a passive tracking system is used to obtain the current position of the user, and the reconstructed DFS extracted in the previous step is synthesized;
the positions of the Wi-Fi transmitting end and the Wi-Fi receiving end are assumed to be l_t = (x_t, y_t) and l_r = (x_r, y_r);
the current speed may be expressed through the Doppler frequency shift it produces:
f_D(v_x, v_y) = (a_x · v_x + a_y · v_y) / λ,
wherein the coefficients a_x, a_y are determined by the positions l_t and l_r together with the position of the person, and λ is the wavelength of the Wi-Fi signal; since the positions of the person, the transmitter and the receiver are fixed, the projection relationship is fixed, and the assignment matrix A is expressed as:
A_{jk} = 1 if the frequency shift f_D(v_k) falls into the j-th frequency bin, and A_{jk} = 0 otherwise,
where j is the j-th frequency sampling point and v_k is the frequency shift component corresponding to the k-th element of the vectorized BVP matrix V, so that
D^(i) = c^(i) A^(i) V;
the BVP feature matrix is obtained by using a compressed sensing technique, i.e., by sparse recovery:
V = argmin_V Σ_i ‖c^(i) A^(i) V - D^(i)‖ + η ‖V‖_0.
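A toy instance of the relation D^(i) = c^(i) A^(i) V (sizes and values assumed purely for illustration): V is the sparse vectorized BVP, A is the binary assignment matrix sending each velocity component's Doppler shift into a frequency bin, and compressed sensing recovers the sparse V from the observed D:

```python
import numpy as np

F, N2 = 4, 6                     # frequency bins, vectorized BVP length (toy sizes)
V = np.zeros(N2)
V[1], V[4] = 2.0, 1.0            # two active velocity components (sparse BVP)

A = np.zeros((F, N2))            # binary assignment matrix
A[0, 1] = 1.0                    # component 1's Doppler shift lands in bin 0
A[3, 4] = 1.0                    # component 4's Doppler shift lands in bin 3

c_scale = 0.5                    # per-link scale factor c^(i)
D = c_scale * A @ V              # predicted DFS profile for this link
```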
7. the discrete wavelet transform-based domain adaptive Wi-Fi gesture recognition method of claim 6, wherein in step S3, the domain categories comprise a source domain and a target domain.
8. The discrete wavelet transform-based domain-adaptive Wi-Fi gesture recognition method of claim 7, wherein in step S3, the deep neural network model performs further feature extraction on the extracted BVP features with a convolutional neural network (CNN), then obtains highly abstract features through a gated recurrent unit (GRU) for gesture recognition; domain-adaptive learning is performed by adding domain-adversarial training (DANN) to the neural network, and the domain-adaptation capability of the model is improved by extracting domain-independent features.
9. The method as claimed in claim 8, wherein in the domain-adversarial training (DANN) of the neural network, the source-domain classification error term is minimized and the domain classification error term is maximized; a negative sign is added in front of the domain classification error term, and a hyperparameter λ is introduced as a weight-balance parameter to obtain the loss function of the whole network.
10. The method of claim 9, wherein the loss function of the whole network is
E(θ_f, θ_y, θ_d) = Σ_{i=1,…,N; d_i=0} L_y^(i)(θ_f, θ_y) - λ Σ_{i=1,…,N} L_d^(i)(θ_f, θ_d),
optimized at the saddle point
(θ̂_f, θ̂_y) = argmin_{θ_f, θ_y} E(θ_f, θ_y, θ̂_d),
θ̂_d = argmax_{θ_d} E(θ̂_f, θ̂_y, θ_d),
where θ_f is the parameter of the neural network of the feature extraction part, θ_d is the parameter of the adversarial learning network part, θ_y is the parameter of the fully connected layer of the gesture classification and recognition part, L_y is the loss value of the gesture classification, and L_d is the loss value of the domain classification.
CN202011468272.6A 2020-12-14 2020-12-14 Domain-adaptive Wi-Fi gesture recognition method based on discrete wavelet transform Active CN112733609B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011468272.6A CN112733609B (en) 2020-12-14 2020-12-14 Domain-adaptive Wi-Fi gesture recognition method based on discrete wavelet transform


Publications (2)

Publication Number Publication Date
CN112733609A (en) 2021-04-30
CN112733609B (en) 2023-08-18

Family

ID=75599816

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011468272.6A Active CN112733609B (en) 2020-12-14 2020-12-14 Domain-adaptive Wi-Fi gesture recognition method based on discrete wavelet transform

Country Status (1)

Country Link
CN (1) CN112733609B (en)


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105654037A (en) * 2015-12-21 2016-06-08 浙江大学 Myoelectric signal gesture recognition method based on depth learning and feature images
CN105807935A (en) * 2016-04-01 2016-07-27 中国科学技术大学苏州研究院 Gesture control man-machine interactive system based on WiFi
CN105844216A (en) * 2016-03-11 2016-08-10 南京航空航天大学 Detection and matching mechanism for recognition of handwritten letters using WiFi signals
CN106060811A (en) * 2016-07-05 2016-10-26 西北大学 User behavior privacy protection method based on channel interference
CN106446828A (en) * 2016-09-22 2017-02-22 西北工业大学 User identity identification method based on Wi-Fi signal
CN107633227A (en) * 2017-09-15 2018-01-26 华中科技大学 A kind of fine granularity gesture identification method and system based on CSI
CN108932500A (en) * 2018-07-09 2018-12-04 广州智能装备研究院有限公司 A kind of dynamic gesture identification method and system based on deep neural network
CN111860410A (en) * 2020-07-29 2020-10-30 南京邮电大学 Myoelectric gesture recognition method based on multi-feature fusion CNN


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113792573A (en) * 2021-07-13 2021-12-14 浙江理工大学 Static gesture recognition method for wavelet transformation low-frequency information and Xception network
CN113971437A (en) * 2021-09-24 2022-01-25 西北大学 Cross-domain gesture recognition method based on commercial Wi-Fi equipment
CN113971437B (en) * 2021-09-24 2024-01-19 西北大学 Cross-domain gesture recognition method based on commercial Wi-Fi equipment
CN113837122A (en) * 2021-09-28 2021-12-24 重庆邮电大学 Wi-Fi channel state information-based non-contact human body behavior identification method and system
CN113837122B (en) * 2021-09-28 2023-07-25 重庆邮电大学 Wi-Fi channel state information-based contactless human body behavior recognition method and system
CN114499712A (en) * 2021-12-22 2022-05-13 天翼云科技有限公司 Gesture recognition method, device and storage medium
CN114499712B (en) * 2021-12-22 2024-01-05 天翼云科技有限公司 Gesture recognition method, device and storage medium
CN116434334A (en) * 2023-03-28 2023-07-14 湖南工商大学 WiFi human body gesture recognition method based on transducer, electronic equipment and storage medium
CN116434334B (en) * 2023-03-28 2024-02-06 湖南工商大学 WiFi human body gesture recognition method based on transducer, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112733609B (en) 2023-08-18

Similar Documents

Publication Publication Date Title
CN112733609B (en) Domain-adaptive Wi-Fi gesture recognition method based on discrete wavelet transform
Ding et al. WiFi CSI-based human activity recognition using deep recurrent neural network
Hsieh et al. Deep learning-based indoor localization using received signal strength and channel state information
Li et al. WiHF: Enable user identified gesture recognition with WiFi
Wang et al. Wi-Fi CSI-based behavior recognition: From signals and actions to activities
CN105844216B (en) Detection and matching mechanism for recognizing handwritten letters by WiFi signals
CN108924736B (en) PCA-Kalman-based passive indoor personnel state detection method
Li et al. Towards domain-independent and real-time gesture recognition using mmwave signal
Khan et al. A deep learning framework using passive WiFi sensing for respiration monitoring
Zhang et al. WiFi-based cross-domain gesture recognition via modified prototypical networks
Zou et al. Wi-Fi radar: Recognizing human behavior with commodity Wi-Fi
Ding et al. Wihi: WiFi based human identity identification using deep learning
Tang et al. WiFi CSI gesture recognition based on parallel LSTM-FCN deep space-time neural network
CN109902554B (en) Sign language identification method based on commercial Wi-Fi
Xu et al. Attention-based gait recognition and walking direction estimation in wi-fi networks
Fang et al. Writing in the air: recognize letters using deep learning through WiFi signals
Shang et al. LSTM-CNN network for human activity recognition using WiFi CSI data
CN110353693A (en) A kind of hand-written Letter Identification Method and system based on WiFi
CN114781463A (en) Cross-scene robust indoor tumble wireless detection method and related equipment
Uysal et al. RF-Wri: An efficient framework for RF-based device-free air-writing recognition
Chen et al. Dynamic gesture recognition using wireless signals with less disturbance
CN114048773A (en) Behavior identification method and system based on transfer learning and WiFi
Feng et al. Wi-learner: Towards one-shot learning for cross-domain wi-fi based gesture recognition
CN114397963B (en) Gesture recognition method and device, electronic equipment and storage medium
CN109766951A (en) A kind of WiFi gesture identification based on time-frequency statistical property

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant