CN114692679B - Meta-learning gesture recognition method based on frequency modulation continuous wave - Google Patents
Meta-learning gesture recognition method based on frequency modulation continuous wave
- Publication number
- CN114692679B (application CN202210256419.8A)
- Authority
- CN
- China
- Prior art keywords
- meta
- gesture
- feature
- learning
- frequency
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2218/00—Aspects of pattern recognition specially adapted for signal processing
- G06F2218/12—Classification; Matching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2218/00—Aspects of pattern recognition specially adapted for signal processing
- G06F2218/08—Feature extraction
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Linguistics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Software Systems (AREA)
- Signal Processing (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to a meta-learning gesture recognition method based on frequency-modulated continuous wave radar. By using a meta-learning network, the method achieves high-accuracy gesture recognition with only a small number of labeled samples. In addition, the method takes into account the angle feature of the gesture and the internal relations among multiple features, and extracts the key features for gesture recognition with a dual-channel fusion feature extraction network based on a 3D convolutional neural network, which effectively improves recognition accuracy. Moreover, in the multi-dimensional feature fusion stage, a Hadamard-product-based feature fusion method accounts for the spatial correlation among the multi-dimensional features, further improving gesture recognition accuracy.
Description
Technical Field
The invention relates to the technical field of the Internet of Things, and in particular to a meta-learning gesture recognition method based on frequency-modulated continuous wave (FMCW) radar.
Background
With the widespread adoption of the Internet of Things and the rapid development of human-computer interaction technology, contactless interaction has become increasingly important. Gesture interaction, one of the oldest forms of human interaction, has received much attention in recent years because of its many advantages, and gesture recognition technology is now widely used in many fields. For example, in virtual reality and augmented reality, gesture recognition enhances the interactive experience. Some recently marketed automobiles are also equipped with contactless vehicle control based on gesture recognition, which offers customers an alternative control method in noisy environments, where the built-in speech recognition is severely affected by noise. In addition, gesture recognition is applied in many vertical industries, such as the industrial Internet of Things, robot control, and smart homes. Gesture recognition technology significantly improves the efficiency of human-computer interaction and the user experience.
Currently, gesture recognition methods can be classified into three types according to the kind of data they use: vision-based, sensor-based, and radar-based. Among these, vision-based methods are easily affected by the illumination conditions of the environment, and their recognition accuracy degrades severely under poor lighting. Sensor-based methods require users to wear specialized sensors to collect data, which is inconvenient and unsuitable for large-scale deployment of gesture recognition technology. In recent years, radar-based gesture recognition has attracted growing attention because of its long detection range, insensitivity to illumination, and other advantages; its low deployment cost also makes commercial products feasible. The most significant advantage of radar-based methods over the mainstream vision-based methods is their all-weather operation: they can still provide high-accuracy recognition in rain or in poorly lit night-time environments. Radar-based gesture recognition has therefore attracted much attention from academia and industry, and many companies, including Google and Texas Instruments, have equipped their newly marketed products with gesture recognition technology.
With the rapid development of machine learning, machine-learning-based methods have shown great potential in the field of gesture recognition. However, they often require a large number of labeled samples to achieve satisfactory accuracy; insufficient data causes over-fitting, which seriously degrades recognition. Radar data, moreover, are difficult to acquire: collection is time-consuming and labor-intensive, and gestures collected in advance can hardly cover all categories that may appear in a real scene. In recent years, some methods have begun to address few-shot learning in gesture recognition, but they generally apply generic few-shot learning directly to the gesture recognition task without considering its characteristics, such as temporal correlation and the high similarity between gestures, and therefore achieve only limited improvement. In addition, conventional methods often neglect the angle feature, which contains abundant information about the gesture and helps improve recognition accuracy, and they often ignore the correlation among the multiple features of a gesture.
Disclosure of Invention
The invention aims to provide a meta-learning gesture recognition method based on frequency-modulated continuous wave radar, which uses a meta-learning network to achieve high-accuracy gesture recognition with only a small number of labeled samples. In addition, the method considers the angle feature of the gesture and the internal relations among multiple features, and extracts the key features for gesture recognition with a dual-channel fusion feature extraction network based on a 3D convolutional neural network, effectively improving the accuracy of gesture recognition.
To achieve the above purpose, the technical scheme of the invention is as follows: a meta-learning gesture recognition method based on frequency-modulated continuous wave radar, comprising the following steps:
Step S1: acquire raw radar data of a gesture with a frequency-modulated continuous wave (FMCW) radar, and then obtain the range, Doppler frequency, and angle features of the gesture through signal processing;

Step S2: construct the three gesture features obtained in step S1, together with the time dimension, into a fourth-order feature tensor; according to the physical relationship among the features, reconstruct the fourth-order feature tensor into two third-order feature cubes by multiplexing the Doppler frequency feature;

Step S3: divide the data set into a training set D_train and a test set D_test, and partition them into several subtask sets, each containing a support set S and a query set Q; input the gesture types that contain a large number of labeled samples into the meta-learning model for the meta-training stage, in order to extract transferable knowledge and give the meta-learning model better generalization when facing new tasks;

Step S4: input the gesture types that contain only a small number of samples into the meta-learning model for the meta-testing stage; the meta-learning model can then accurately classify gestures using a limited number of samples.
In an embodiment of the present invention, in step S1 the range R, Doppler frequency f_d, and angle feature θ of the gesture are obtained as follows:
Step S11: mix the received radar echo signal with the transmitted signal to obtain the beat frequency f_b, where B, T, and τ denote the sweep bandwidth, sweep period, and propagation delay, respectively. A range-Doppler matrix (RDM) method is introduced to process the beat signal and obtain the gesture features; f_b can then be decomposed into f_movingBeat and f_staticBeat, the beat-frequency contributions of the target in the moving and stationary states, where c, f_0, and v denote the speed of light, the starting frequency, and the motion speed of the gesture, respectively. When the rapid motion of the target is considered, the influence of the Doppler frequency can be ignored because the sweep period of a single chirp is very short; the range R is then obtained from the frequency of the spectral peak found by a fast Fourier transform (FFT) along the fast-time dimension of each frame of data, also referred to as the Range-FFT. Along the slow-time dimension, however, the effect of the Doppler frequency on the measured frequency cannot be neglected, and the Doppler frequency f_d is obtained by a second FFT along the slow-time dimension (standard forms of these relations are sketched below).
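As a hedged reference, the standard FMCW relations consistent with the variables defined above (assuming a linear chirp of slope B/T and a round-trip delay τ = 2R/c; the notation here is an assumption, not necessarily the exact expressions of the original filing) take the form:

$$f_b = \frac{B}{T}\,\tau, \qquad f_b = f_{staticBeat} + f_{movingBeat}, \qquad f_{staticBeat} = \frac{2BR}{cT}, \qquad f_{movingBeat} \approx \frac{2 f_0 v}{c}$$

$$R = \frac{c\,T\,f_b}{2B} \quad \text{(Range-FFT peak, Doppler neglected)}, \qquad f_d \approx \frac{2 f_0 v}{c} \quad \text{(second FFT along slow time)}$$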
Unlike the range R and the Doppler frequency f_d, the angle feature θ of the gesture lies in another spatial dimension. A propagation path difference Δd arises as the radar signal reaches the different receiving antennas, and from the geometry it can be written as
Δd=l sin(θ)
where l is the spacing between two adjacent receiving antennas. Because of the propagation path difference Δd, a phase difference exists between the signals received by different antennas; it can be obtained by an FFT over the phase sequence of the multiple antennas, also known as the angle-FFT. The angle feature θ of the gesture can then be recovered from this phase difference (standard forms are sketched below).
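Similarly, a hedged sketch of the standard angle-estimation relations, writing the inter-antenna phase difference as Δφ (a symbol introduced here for illustration) and using the wavelength λ = c/f_0:

$$\Delta\varphi = \frac{2\pi\,\Delta d}{\lambda} = \frac{2\pi f_0\, l \sin(\theta)}{c}, \qquad \theta = \arcsin\!\left(\frac{c\,\Delta\varphi}{2\pi f_0\, l}\right)$$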
Step S12: because the wrist turns while a gesture is being performed, the reflecting area of the hand seen by the radar changes, and the strength of interfering signals changes accordingly. Therefore, before the angle feature θ is extracted, a greatest-of constant false alarm rate (GO-CFAR) algorithm is applied to the angle-Doppler map to suppress the interference and improve recognition accuracy.
In an embodiment of the present invention, step S2 is implemented as follows:

Step S21: normalize the three features obtained in step S1, map them into image form, and construct them, together with the time dimension, into a fourth-order feature tensor χ;
Step S22: from the Doppler frequency calculation formula (a standard form is sketched below), the Doppler frequency f_d is related to the velocity v of the target and to the incidence angle θ of the radar echo. The motion speed v and motion direction are in turn related to the range R, and the incidence angle θ is exactly the angle feature θ; the Doppler feature f_d can therefore be regarded as correlated with both the range feature R and the angle feature θ. Accordingly, the fourth-order feature tensor χ constructed in step S21 is reconstructed, by multiplexing the Doppler and time dimensions, into two three-dimensional feature cubes χ_RDM and χ_ADM of size 32 × 32.
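A hedged form of the Doppler relation referred to in step S22, assuming θ denotes the angle between the target motion and the radar line of sight and f_0 the starting (carrier) frequency:

$$f_d = \frac{2 f_0\, v \cos(\theta)}{c}$$

This relation is what ties the Doppler feature to both the motion (and hence the range change) and the angle feature, motivating the multiplexed cube construction.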
In an embodiment of the present invention, step S3 is implemented as follows:
Step S31: divide the whole data set into two disjoint subsets, a training set D_train and a test set D_test. The training set D_train contains a large number of labeled samples, while D_test contains only a small number of labeled samples together with some unlabeled samples used for testing. A number of tasks are then sampled from the training set for training, all tasks being assumed to obey the same task distribution. Each task contains its own training set and test set, named the support set S and the query set Q. Furthermore, if the support set S contains N gesture classes and each class contains K samples, the problem is defined as an N-way K-shot meta-learning problem;
Step S32: input the gesture data in the training set D_train into the meta-learning network in N-way K-shot form to extract transferable knowledge in the meta-training stage. First, the two feature cubes χ_RDM and χ_ADM each pass through one channel of a dual-channel fusion network based on a 3D convolutional neural network (CNN) to extract the gesture features; the feature maps extracted by the two channels are fused into one by a Hadamard-product-based feature fusion scheme, in which the spatial correlation of the features is captured through element-wise products. The fused output feature is obtained from z_RDM and z_ADM, the feature maps extracted by the two channels. Then, the feature maps corresponding to a sample x_i from the support set S and a sample x_j from the query set Q are concatenated by a concatenation operation C and fed to a relation module g_φ, which measures their similarity with a convolutional neural network; gestures are classified according to the computed relation score r_{i,j}. The mean square error (MSE) is chosen as the objective function for training the meta-learning model (hedged standard forms are sketched below).
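For reference, hedged standard forms of the fusion, relation score, and training objective described in step S32, using an illustrative symbol z_fusion for the fused feature and ⊙ for the Hadamard (element-wise) product:

$$z_{fusion} = z_{RDM} \odot z_{ADM}, \qquad r_{i,j} = g_{\varphi}\big(\mathcal{C}(z_i, z_j)\big), \qquad \mathcal{L}_{MSE} = \sum_{i,j} \big(r_{i,j} - \mathbf{1}[y_i = y_j]\big)^2$$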
In an embodiment of the present invention, step S4 is implemented as follows:

In the meta-testing stage, the gesture categories in the test set D_test, each containing only a small number of samples, are input in N-way K-shot form into the meta-learning model trained in step S32, so that the model can perform gesture recognition after only a few training iterations.
Compared with the prior art, the invention has the following beneficial effects:
The invention uses a meta-learning network to solve the few-shot learning problem in radar-based gesture recognition, significantly reducing the number of training samples while achieving high recognition accuracy. In addition, the invention considers the angle feature and the physical relationships among the multiple gesture features, feeds the gesture features into the dual-channel fusion network in the form of feature cubes for feature extraction, and fuses the extracted features with a Hadamard-product-based scheme, which improves the utilization efficiency of a limited number of samples and further improves the accuracy of gesture recognition.
Drawings
FIG. 1 is a schematic overall flow diagram of a system in accordance with an embodiment of the present invention.
FIG. 2 is a schematic diagram of ten gesture categories collected in the present invention.
Fig. 3 is a schematic view of the fourth-order feature tensor reconstructed into two third-order feature cubes in an example of the present invention.
FIG. 4 is a diagram illustrating the partitioning of the dataset of step S3 and step S4 in an example of the present invention.
FIG. 5 is a diagram of the meta-learning gesture recognition model for a 5-way 1-shot problem in an example of the present invention.
Fig. 6 is a two-channel fused feature extraction network based on 3D CNN in an example of the invention.
FIG. 7 is a schematic diagram showing the comparison of the accuracy of the invention provided in one example of the invention with other methods.
Detailed Description
The technical scheme of the invention is specifically described below with reference to the accompanying drawings.
As shown in Fig. 1, the meta-learning gesture recognition method based on frequency-modulated continuous wave radar comprises the following steps:

Step S1: the ten gesture types shown in Fig. 2 are collected with a frequency-modulated continuous wave radar hardware system, and the range, Doppler, and angle features of the gesture are then obtained through signal processing.

Step S2: as shown in Fig. 3, the three gesture features obtained in step S1 are constructed into a fourth-order feature tensor together with the time dimension, which is reconstructed into two third-order feature cubes by multiplexing the Doppler feature according to the physical relations among the features.

Step S3: as shown in Fig. 4, the data set is divided into a training set D_train and a test set D_test and partitioned into several subtask sets, each containing a support set S and a query set Q. The gesture types containing a large number of labeled samples are input into the meta-learning model shown in Fig. 5 for the meta-training stage, so as to extract transferable knowledge and give the model better generalization when facing new tasks. The samples first pass through the dual-channel fusion network based on a 3D convolutional neural network shown in Fig. 6 to extract features; the feature maps corresponding to the samples from the support set and the query set are then concatenated by a concatenation module and fed to a relation module, which measures their similarity and computes a relation score to realize gesture recognition.

Step S4: the gesture types containing only a small number of samples are input into the meta-learning model for the meta-testing stage, where the model accurately classifies gestures using a limited number of samples.
Further, step S1 is implemented as follows:
Raw radar data for the ten gesture types shown in Fig. 2 are collected with a frequency-modulated continuous wave radar hardware system; during collection, four testers sit 20 cm in front of the radar, and 150 groups of data are collected for each gesture. The received radar echo signal is mixed with the transmitted signal to obtain the beat frequency f_b, where B, T, and τ are the sweep bandwidth, sweep period, and propagation delay, respectively. A range-Doppler matrix (RDM) method is introduced to process the beat signal and obtain the gesture features; f_b can then be decomposed into f_movingBeat and f_staticBeat, the beat-frequency contributions of the target in the moving and stationary states, where c, f_0, and v denote the speed of light (c = 3 × 10^8 m/s), the starting frequency, and the motion speed of the gesture, respectively. When the rapid motion of the target is considered, the influence of the Doppler frequency can be ignored because the sweep period of a single chirp is very short. The range R is then obtained from the frequency of the spectral peak found by a fast Fourier transform (FFT) along the fast-time dimension of each frame of data, also called the Range-FFT. Along the slow-time dimension, however, the effect of the Doppler frequency cannot be neglected, and the Doppler frequency f_d is obtained by a second FFT along the slow-time dimension.
Unlike the range R and the Doppler frequency f_d, the angle feature θ of the gesture lies in another spatial dimension. A propagation path difference Δd arises as the radar signal reaches the different receiving antennas; from the geometry it can be written as:
Δd=l sin(θ)
where l is the spacing between two adjacent receiving antennas. Because of the propagation path difference Δd, a phase difference exists between the signals received by different antennas; it can be obtained by an FFT over the phase sequence of the multiple antennas, also known as the angle-FFT. The angle feature θ of the gesture is then recovered from this phase difference (a processing sketch is given below).
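The signal-processing chain described above (Range-FFT along fast time, Doppler-FFT along slow time, angle-FFT across antennas) can be sketched as follows; the array shapes and names (`radar_cube`, `rdm_and_adm`, the 64-point angle FFT) are illustrative assumptions rather than parameters taken from the original.

```python
import numpy as np

def rdm_and_adm(radar_cube):
    """Compute a range-Doppler map and an angle-Doppler map from one FMCW frame.

    radar_cube: complex array of shape (n_rx, n_chirps, n_samples)
                (receive antennas x chirps per frame x ADC samples per chirp).
    Shapes, normalization, and the angle-FFT length are illustrative.
    """
    # Range-FFT along the fast-time (sample) axis.
    range_fft = np.fft.fft(radar_cube, axis=2)

    # Doppler-FFT along the slow-time (chirp) axis, shifted so zero velocity sits in the middle.
    doppler_fft = np.fft.fftshift(np.fft.fft(range_fft, axis=1), axes=1)

    # Range-Doppler map: combine the antennas non-coherently.
    range_doppler_map = np.abs(doppler_fft).sum(axis=0)          # (n_chirps, n_samples)

    # Angle-FFT across the antenna axis (zero-padded to 64 angle bins).
    angle_fft = np.fft.fftshift(np.fft.fft(doppler_fft, n=64, axis=0), axes=0)

    # Angle-Doppler map: take the strongest range bin as a simple choice.
    strongest_range = np.argmax(range_doppler_map.sum(axis=0))
    angle_doppler_map = np.abs(angle_fft[:, :, strongest_range])  # (n_angle_bins, n_chirps)

    return range_doppler_map, angle_doppler_map
```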
Because the wrist turns while a gesture is being performed, the reflecting area of the hand seen by the radar changes, and the strength of interfering signals changes accordingly; in some cases the echo intensity of the interference approaches or even exceeds that of the palm, so that the extracted features cannot accurately reflect the gesture and subsequent recognition is affected. Therefore, before the angle feature θ is extracted, the invention applies a greatest-of constant false alarm rate (GO-CFAR) algorithm to the angle-Doppler map (ADM) to suppress the interference and improve recognition accuracy.
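A minimal one-dimensional sketch of the greatest-of CFAR idea mentioned above, applied row-wise to a map; the window sizes and the scaling factor `alpha` are illustrative assumptions.

```python
import numpy as np

def go_cfar_1d(power, n_train=8, n_guard=2, alpha=3.0):
    """Greatest-of CFAR: the noise level at each cell is estimated as the larger of the
    mean power in the leading and trailing training windows; cells below the resulting
    threshold are suppressed. `power` is a 1-D array of non-negative values."""
    out = np.zeros_like(power)
    n = len(power)
    for i in range(n):
        lead = power[max(0, i - n_guard - n_train): max(0, i - n_guard)]
        trail = power[i + n_guard + 1: i + n_guard + 1 + n_train]
        if len(lead) == 0 or len(trail) == 0:
            continue  # skip edge cells that lack one of the two windows
        noise = max(lead.mean(), trail.mean())   # "greatest of" the two estimates
        if power[i] > alpha * noise:
            out[i] = power[i]                    # keep detections, suppress the rest
    return out
```

Applying `go_cfar_1d` along the Doppler axis of the angle-Doppler map (e.g. with `np.apply_along_axis`) would suppress the weaker interference returns before the angle feature is extracted.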
Further, the step S2 specifically includes:
The three features obtained in step S1 are normalized and mapped into image form, and then constructed into a fourth-order feature tensor χ together with the time dimension, as shown in Fig. 3.
The Doppler shift refers to the change in phase and frequency caused by the difference in propagation path when a target moves in a certain direction at a constant rate. According to the Doppler frequency calculation formula, the Doppler frequency f_d is related to the velocity v of the target and to the incidence angle θ of the radar echo. The motion speed v and motion direction are in turn related to the range feature R, and the incidence angle θ is exactly the angle feature θ described above; the Doppler feature f_d can therefore be regarded as correlated with both the range feature R and the angle feature θ. Accordingly, the fourth-order feature tensor χ constructed in step S21 is reconstructed, by multiplexing the Doppler and time dimensions, into two three-dimensional feature cubes χ_RDM and χ_ADM of size 32 × 32 (a sketch of this reconstruction follows).
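A sketch of one way to realize the reconstruction described in step S22: per-frame range-Doppler and angle-Doppler maps are stacked along the time dimension, so that the Doppler axis is shared (multiplexed) between the two cubes. The 32 × 32 per-frame size and the stacking order are assumptions for illustration.

```python
import numpy as np

def build_feature_cubes(rdm_frames, adm_frames):
    """rdm_frames, adm_frames: lists of per-frame range-Doppler and angle-Doppler maps,
    each normalized and resized to 32 x 32 (sizes assumed).  Stacking them along the
    time dimension yields the two third-order feature cubes used by the two channels;
    the Doppler axis appears in both cubes."""
    chi_rdm = np.stack(rdm_frames, axis=0)   # (n_frames, 32, 32): range x Doppler over time
    chi_adm = np.stack(adm_frames, axis=0)   # (n_frames, 32, 32): angle x Doppler over time
    return chi_rdm, chi_adm
```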
Further, step S3 is implemented as follows:
As shown in Fig. 4, the whole data set is divided into two disjoint subsets, a training set D_train and a test set D_test. The training set D_train contains a large number of labeled samples, while D_test contains only a small number of labeled samples together with some unlabeled samples used for testing. A number of tasks are then sampled from the training set for training, all tasks being assumed to obey the same task distribution. Each task contains its own training set and test set, named the support set S and the query set Q. Furthermore, if the support set S contains N gesture classes and each class contains K samples, the problem is defined as an N-way K-shot meta-learning problem.
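A minimal sketch of the N-way K-shot episode construction just described; the function and variable names (`sample_episode`, `dataset`, `n_query`) are illustrative assumptions, not names taken from the original.

```python
import random
from collections import defaultdict

def sample_episode(dataset, n_way=5, k_shot=1, n_query=15):
    """Sample one meta-learning task (episode): a support set S and a query set Q.

    `dataset` is assumed to be a list of (feature_cube_pair, label) tuples, where
    feature_cube_pair = (chi_RDM, chi_ADM).  Shapes and names are illustrative.
    """
    by_class = defaultdict(list)
    for x, y in dataset:
        by_class[y].append(x)

    # Choose N gesture classes for this task.
    classes = random.sample(list(by_class.keys()), n_way)

    support, query = [], []
    for new_label, cls in enumerate(classes):
        samples = random.sample(by_class[cls], k_shot + n_query)
        # K labeled samples per class go to the support set S ...
        support += [(x, new_label) for x in samples[:k_shot]]
        # ... and the remaining samples form the query set Q.
        query += [(x, new_label) for x in samples[k_shot:]]
    return support, query
```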
The gesture data in the training set D_train are then input into the meta-learning network shown in Fig. 5 in N-way K-shot form, to extract transferable knowledge in the meta-training phase. First, the two feature cubes χ_RDM and χ_ADM each pass through one channel of the dual-channel fusion network based on a 3D convolutional neural network (CNN) shown in Fig. 6 to extract the gesture features; the feature maps extracted by the two channels are fused into one by a Hadamard-product-based feature fusion scheme, in which the spatial correlation of the features is captured through element-wise products.

The fused output feature is obtained from z_RDM and z_ADM, the feature maps extracted by the two channels. Next, the feature maps corresponding to a sample x_i from the support set S and a sample x_j from the query set Q are concatenated by a concatenation operation C and fed to the relation module g_φ, which measures their similarity with a convolutional neural network; gestures are classified according to the computed relation score r_{i,j}. The mean square error (MSE) is chosen as the objective function for training the model (a hedged sketch of this dual-channel network and relation module follows).
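A hedged PyTorch sketch of the dual-channel architecture just described: two 3D-CNN embedding channels, Hadamard-product fusion, concatenation of support and query features, and a relation module producing r_{i,j}. Layer sizes, channel counts, and names are illustrative assumptions, not values from the original.

```python
import torch
import torch.nn as nn

def conv3d_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv3d(c_in, c_out, kernel_size=3, padding=1),
        nn.BatchNorm3d(c_out), nn.ReLU(), nn.MaxPool3d(2))

class DualChannelEmbedding(nn.Module):
    """Two 3D-CNN channels (for chi_RDM and chi_ADM) fused by a Hadamard product."""
    def __init__(self):
        super().__init__()
        self.rdm_branch = nn.Sequential(conv3d_block(1, 32), conv3d_block(32, 64))
        self.adm_branch = nn.Sequential(conv3d_block(1, 32), conv3d_block(32, 64))

    def forward(self, chi_rdm, chi_adm):
        z_rdm = self.rdm_branch(chi_rdm)       # (B, 64, T', H', W')
        z_adm = self.adm_branch(chi_adm)
        return z_rdm * z_adm                   # element-wise (Hadamard) fusion

class RelationModule(nn.Module):
    """Maps concatenated support/query feature maps to a relation score in [0, 1]."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(conv3d_block(128, 64), nn.AdaptiveAvgPool3d(1), nn.Flatten())
        self.fc = nn.Sequential(nn.Linear(64, 8), nn.ReLU(), nn.Linear(8, 1), nn.Sigmoid())

    def forward(self, z_support, z_query):
        pair = torch.cat([z_support, z_query], dim=1)   # concatenation operation C
        return self.fc(self.conv(pair))                 # relation score r_ij

# Training would use MSE between r_ij and 1 (same class) or 0 (different class):
# loss = nn.functional.mse_loss(r_ij, (y_i == y_j).float())
```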
Further, step S4 is implemented as follows:
In the meta-testing stage, the gesture categories in the test set D_test, each containing only a small number of samples, are input in N-way K-shot form into the model that has acquired a certain generalization capability through step S32, so that the model achieves good recognition results on these categories after only a few training iterations.
FIG. 7 is a schematic diagram comparing the accuracy of the method provided in one example of the invention with that of other methods.
The invention provides a meta-learning gesture recognition method based on frequency-modulated continuous wave radar, which uses a meta-learning network to achieve high-accuracy gesture recognition with only a small number of labeled samples. In addition, the method considers the angle feature of the gesture and the internal relations among multiple features, and extracts the key features for gesture recognition with a dual-channel fusion feature extraction network based on a 3D convolutional neural network, effectively improving the accuracy of gesture recognition.
The above is a preferred embodiment of the present invention; all changes made according to the technical solution of the present invention, provided that the resulting functional effects do not exceed the scope of the technical solution, fall within the protection scope of the present invention.
Claims (3)
1. A meta-learning gesture recognition method based on frequency-modulated continuous wave radar, characterized by comprising the following steps:
Step S1: acquiring raw radar data of a gesture with a frequency-modulated continuous wave (FMCW) radar, and then obtaining the range, Doppler frequency, and angle features of the gesture through signal processing;

Step S2: constructing the three gesture features obtained in step S1, together with the time dimension, into a fourth-order feature tensor, and reconstructing the fourth-order feature tensor into two third-order feature cubes by multiplexing the Doppler frequency feature according to the physical relationship among the features;

Step S3: dividing the data set into a training set D_train and a test set D_test and partitioning them into several subtask sets, each subtask set comprising a support set S and a query set Q; inputting the gesture types that contain a large number of labeled samples into the meta-learning model for the meta-training stage, in order to extract transferable knowledge and give the meta-learning model better generalization when facing new tasks;

Step S4: inputting the gesture types that contain only a small number of samples into the meta-learning model for the meta-testing stage, enabling the meta-learning model to accurately classify gestures using a limited number of samples;
wherein step S2 is implemented as follows:

Step S21: normalizing the three features obtained in step S1, mapping them into image form, and constructing them, together with the time dimension, into a fourth-order feature tensor χ;

Step S22: according to the Doppler frequency calculation formula, the Doppler frequency f_d is related to the velocity v of the target and to the incidence angle θ of the radar echo; in addition, the motion speed v and motion direction are related to the range R, and the incidence angle θ is the angle feature θ; the Doppler feature f_d is therefore regarded as correlated with both the range feature R and the angle feature θ; accordingly, the fourth-order feature tensor χ constructed in step S21 is reconstructed, by multiplexing the Doppler and time dimensions, into two three-dimensional feature cubes χ_RDM and χ_ADM of size 32 × 32;
wherein step S3 is implemented as follows:
Step S31: dividing the whole data set into two disjoint subsets, a training set D_train and a test set D_test, wherein the training set D_train contains a large number of labeled samples, while D_test contains only a small number of labeled samples together with some unlabeled samples used for testing; then sampling a number of tasks from the training set for training, all tasks being assumed to obey the same task distribution, each task containing its own training set and test set, named the support set S and the query set Q; furthermore, if the support set S contains N gesture classes and each class contains K samples, the problem is defined as an N-way K-shot meta-learning problem;
Step S32: inputting the gesture data in the training set D_train into the meta-learning network in N-way K-shot form to extract transferable knowledge in the meta-training stage; first, the two feature cubes χ_RDM and χ_ADM each pass through one channel of a dual-channel fusion network based on a 3D convolutional neural network (CNN) to extract the gesture features, and the feature maps extracted by the two channels are fused into one by a Hadamard-product-based feature fusion scheme, in which the spatial correlation of the features is captured through element-wise products; the fused output feature is obtained from z_RDM and z_ADM, the feature maps extracted by the two channels; then, the feature maps corresponding to a sample x_i from the support set S and a sample x_j from the query set Q are concatenated by a concatenation operation C and fed to a relation module g_φ, which measures their similarity with a convolutional neural network, and gestures are classified according to the computed relation score r_{i,j}; the mean square error (MSE) is chosen as the objective function for training the meta-learning model.
2. The frequency-modulated continuous wave based meta-learning gesture recognition method according to claim 1, wherein in step S1 the range R, Doppler frequency f_d, and angle feature θ of the gesture are obtained as follows:

Step S11: mixing the received radar echo signal with the transmitted signal to obtain the beat frequency f_b, wherein B, T, and τ are the sweep bandwidth, sweep period, and propagation delay, respectively; a range-Doppler matrix (RDM) method is introduced to process the beat signal and obtain the gesture features, in which case f_b is decomposed into f_movingBeat and f_staticBeat, the beat-frequency contributions of the target in the moving and stationary states, where c, f_0, and v denote the speed of light, the starting frequency, and the motion speed of the gesture, respectively; when the rapid motion of the target is considered, the influence of the Doppler frequency is ignored because the sweep period of a single chirp is very short; the range R is then obtained from the frequency of the spectrum peak found by a fast Fourier transform (FFT) along the fast-time dimension of each frame of data, called the Range-FFT; furthermore, along the slow-time dimension the effect of the Doppler frequency cannot be neglected, and the Doppler frequency f_d is obtained by a second FFT;

unlike the range R and the Doppler frequency f_d, the angle feature θ of the gesture lies in another spatial dimension; a propagation path difference Δd arises as the radar signal reaches the different receiving antennas, and from the geometry it is written as
Δd = l sin(θ)
where l is the spacing between two adjacent receiving antennas; because of the propagation path difference Δd, a phase difference exists between the signals received by different antennas, which is obtained by an FFT over the phase sequence of the multiple antennas, also known as the angle-FFT, and the angle feature θ of the gesture is thereby obtained;

Step S12: because the wrist turns while a gesture is being performed, the reflecting area of the hand seen by the radar changes, and the strength of interfering signals changes accordingly; therefore, before the angle feature θ is extracted, a greatest-of constant false alarm rate (GO-CFAR) algorithm is applied to the angle-Doppler map to suppress the interference and improve recognition accuracy.
3. The frequency-modulated continuous wave based meta-learning gesture recognition method according to claim 1, wherein step S4 is implemented as follows:

in the meta-testing stage, the gesture categories in the test set D_test, each containing only a small number of samples, are input in N-way K-shot form into the meta-learning model trained in step S32, so that the meta-learning model can perform gesture recognition after only a few training iterations.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210256419.8A CN114692679B (en) | 2022-03-16 | 2022-03-16 | Meta-learning gesture recognition method based on frequency modulation continuous wave |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210256419.8A CN114692679B (en) | 2022-03-16 | 2022-03-16 | Meta-learning gesture recognition method based on frequency modulation continuous wave |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114692679A CN114692679A (en) | 2022-07-01 |
CN114692679B (en) | 2024-07-12
Family
ID=82138339
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210256419.8A Active CN114692679B (en) | 2022-03-16 | 2022-03-16 | Meta-learning gesture recognition method based on frequency modulation continuous wave |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114692679B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116595443B (en) * | 2023-07-17 | 2023-10-03 | 山东科技大学 | Wireless signal book gesture recognition method based on meta learning |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113454481B (en) * | 2019-02-28 | 2024-07-02 | 谷歌有限责任公司 | Smart device based radar system for detecting user gestures in the presence of saturation |
CN113408328B (en) * | 2020-03-16 | 2023-06-23 | 哈尔滨工业大学(威海) | Gesture segmentation and recognition algorithm based on millimeter wave radar |
KR102322817B1 (en) * | 2020-09-10 | 2021-11-08 | 한국항공대학교산학협력단 | Convolutional neural network based human machine interface system using doppler radar and voice sensor, device for processing sensor data of the human machine interface system, method for operating the sames |
CN113466852B (en) * | 2021-06-08 | 2023-11-24 | 江苏科技大学 | Millimeter wave radar dynamic gesture recognition method applied to random interference scene |
CN114185039A (en) * | 2021-12-08 | 2022-03-15 | 中国人民解放军海军航空大学 | Radar target one-dimensional range profile intelligent identification method |
- 2022-03-16 CN CN202210256419.8A patent/CN114692679B/en active Active
Non-Patent Citations (1)
Title |
---|
"ML-HGR-Net: A Meta-Learning Network for FMCW Radar Based Hand Gesture Recognition";郑海峰;《IEEE SENSORS JOURNAL》;20220617;全文 * |
Also Published As
Publication number | Publication date |
---|---|
CN114692679A (en) | 2022-07-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhao et al. | Cubelearn: End-to-end learning for human motion recognition from raw mmwave radar signals | |
Xia et al. | Time-space dimension reduction of millimeter-wave radar point-clouds for smart-home hand-gesture recognition | |
CN113313040B (en) | Human body posture identification method based on FMCW radar signal | |
CN113837131B (en) | Multi-scale feature fusion gesture recognition method based on FMCW millimeter wave radar | |
Shen et al. | ML-HGR-Net: A meta-learning network for FMCW radar based hand gesture recognition | |
CN114692679B (en) | Meta-learning gesture recognition method based on frequency modulation continuous wave | |
CN113050797A (en) | Method for realizing gesture recognition through millimeter wave radar | |
WO2023029390A1 (en) | Millimeter wave radar-based gesture detection and recognition method | |
CN116824629A (en) | High-robustness gesture recognition method based on millimeter wave radar | |
CN110084209A (en) | A kind of real-time gesture identification method based on father and son's classifier | |
Dong et al. | Review of research on gesture recognition based on radar technology | |
Gao et al. | Hybrid SVM-CNN classification technique for moving targets in automotive FMCW radar system | |
CN115705757A (en) | Radar-based gesture classification using variational autoencoder neural network algorithms | |
Pho et al. | Radar-based face recognition: One-shot learning approach | |
CN118334736A (en) | Multi-target identity recognition and behavior monitoring method based on millimeter wave radar | |
Palmer et al. | Reviewing 3d object detectors in the context of high-resolution 3+ 1d radar | |
Zhao et al. | Interference suppression based gesture recognition method with FMCW radar | |
CN116524537A (en) | Human body posture recognition method based on CNN and LSTM combination | |
Fan et al. | A meta-learning-based approach for hand gesture recognition using FMCW radar | |
Wang et al. | Rammar: RAM assisted mask R-CNN for FMCW sensor based HGD system | |
Yan et al. | A New Method of Video SAR Ground Moving Target Detection and Tracking Based on the Inter-frame Amplitude Temporal Curve | |
Li et al. | Dynamic gesture recognition method based on millimeter-wave radar | |
Wang et al. | Micro gesture recognition with terahertz radar based on diagonal profile of range-doppler map | |
Guo et al. | Deep Model Based Road User Classification Using mm-Wave Radar | |
Li et al. | Multiscale Feature Fusion for Gesture Recognition Using Commodity Millimeter-Wave Radar. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |