US12307821B2 - Radar-based gesture classification using a variational auto-encoder neural network - Google Patents
- Publication number
- US12307821B2 (application US17/886,264)
- Authority
- US
- United States
- Prior art keywords
- gesture
- positional
- spectrograms
- neural network
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S13/00—Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
- G01S13/02—Systems using reflection of radio waves, e.g. primary radar systems; Analogous systems
- G01S13/50—Systems of measurement based on relative movement of target
- G01S13/58—Velocity or trajectory determination systems; Sense-of-movement determination systems
- G01S13/583—Velocity or trajectory determination systems; Sense-of-movement determination systems using transmission of continuous unmodulated waves, amplitude-, frequency-, or phase-modulated waves and based upon the Doppler effect resulting from movement of targets
- G01S13/584—Velocity or trajectory determination systems; Sense-of-movement determination systems using transmission of continuous unmodulated waves, amplitude-, frequency-, or phase-modulated waves and based upon the Doppler effect resulting from movement of targets adapted for simultaneous range and velocity measurements
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S7/00—Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
- G01S7/02—Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00
- G01S7/41—Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00 using analysis of echo signal for target characterisation; Target signature; Target cross-section
- G01S7/415—Identification of targets based on measurements of movement associated with the target
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S7/00—Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
- G01S7/02—Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00
- G01S7/41—Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00 using analysis of echo signal for target characterisation; Target signature; Target cross-section
- G01S7/417—Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00 using analysis of echo signal for target characterisation; Target signature; Target cross-section involving the use of neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/762—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
- G06V10/763—Non-hierarchical techniques, e.g. based on statistics of modelling distributions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S13/00—Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
- G01S13/02—Systems using reflection of radio waves, e.g. primary radar systems; Analogous systems
- G01S13/06—Systems determining position data of a target
- G01S13/42—Simultaneous measurement of distance and other co-ordinates
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S13/00—Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
- G01S13/02—Systems using reflection of radio waves, e.g. primary radar systems; Analogous systems
- G01S13/50—Systems of measurement based on relative movement of target
- G01S13/58—Velocity or trajectory determination systems; Sense-of-movement determination systems
- G01S13/589—Velocity or trajectory determination systems; Sense-of-movement determination systems measuring the velocity vector
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/12—Acquisition of 3D measurements of objects
Definitions
- Various examples of the disclosure are broadly concerned with recognizing gestures based on a radar measurement.
- Human-machine interaction (HMI) can be facilitated by gesture classification (sometimes also referred to as gesture recognition), where hand or finger gestures are recognized and classified.
- Gesture classification finds applications in smartphones, sign language interfaces, automotive infotainment systems, augmented reality-virtual reality systems and smart appliances. Further, gesture classification can facilitate HMI for vending and ticketing machines at public places.
- gesture classification is based on camera images. See, e.g., S. Rautaray and A. Agrawal. 2015. Vision based hand gesture classification for human computer interaction: a survey. Artificial Intelligence Review 43, 1 (2015), 1-54. https://doi.org/10.1007/s10462-012-9356-9.
- camera-based gesture classification suffers from several problems: it requires proper illumination conditions, and is affected by occlusions from clothing, obstruction at the camera lens opening, and privacy-intruding features.
- radar-based gesture classification provides an attractive alternative. Further, the processing and memory footprint of the solution can be relatively thin, making it favorable for embedded implementations.
- a variational-autoencoder neural network algorithm is employed.
- the algorithm can be trained using a triplet loss and center loss. A statistical distance can be considered for these losses.
- a specific type of architecture of a neural network algorithm can be used to facilitate the gesture recognition.
- Specific training techniques for training the variational auto-encoder neural network algorithm are disclosed. These techniques disclosed herein facilitate robust gesture recognition, also for scenarios where radar signals are exposed to noise and/or where inter-user variability of motion patterns associated with the various gestures is encountered. Further, unknown motion patterns—not associated with any predefined gesture class—can be reliably detected as such and rejected.
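The triplet loss and center loss mentioned above can be sketched as follows. This is a minimal NumPy illustration using plain Euclidean distances; the disclosure considers a statistical distance for these losses, and all function names here are chosen for illustration only.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on embedding vectors: pull the anchor
    toward the positive (same gesture class), push it away from the
    negative (different class) by at least the margin."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(d_pos - d_neg + margin, 0.0)

def center_loss(embeddings, labels, centers):
    """Mean squared distance of each embedding to its class center,
    encouraging close-knit clusters in feature space."""
    diffs = embeddings - centers[labels]
    return np.mean(np.sum(diffs ** 2, axis=1))
```

Together, the two terms separate clusters of different gesture classes while tightening each cluster around its center.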
- a computer-implemented method includes obtaining one or more positional time spectrograms of a radar measurement of a scene.
- the scene includes an object.
- the computer-implemented method also includes predicting a gesture class of a gesture performed by the object based on one or more positional time spectrograms and based on a feature embedding of a variational auto-encoder neural network algorithm.
- a computer program or a computer-program product or a computer-readable storage medium includes program code.
- the program code can be loaded and executed by a processor.
- Upon executing the program code, the processor performs a method.
- the computer-implemented method includes obtaining one or more positional time spectrograms of a radar measurement of a scene.
- the scene includes an object.
- the computer-implemented method also includes predicting a gesture class of a gesture performed by the object based on one or more positional time spectrograms and based on a feature embedding of a variational auto-encoder neural network algorithm.
- a device includes a processor and a memory.
- the processor can load program code from a memory and execute the program code. Upon loading and executing the program code, the processor is configured to obtain one or more positional time spectrograms of a radar measurement of a scene.
- the scene includes an object.
- the processor is also configured to predict a gesture class of a gesture performed by the object based on one or more positional time spectrograms and based on a feature embedding of a variational auto-encoder neural network algorithm.
- a computer-implemented method of training a variational auto-encoder neural network algorithm for predicting a gesture class of a gesture performed by an object of a scene, the gesture class being selected from a plurality of gesture classes includes obtaining multiple training sets of one or more training positional time spectrograms of a radar measurement of the scene including the object. Each one of the multiple training sets is associated with a respective ground-truth label indicative of a respective gesture class. Also, the computer-implemented method includes training the variational auto-encoder neural network algorithm based on the multiple training sets and the associated ground-truth labels.
- a computer program or a computer-program product or a computer-readable storage medium includes program code.
- the program code can be loaded and executed by a processor.
- Upon executing the program code, the processor performs a method of training a variational auto-encoder neural network algorithm for predicting a gesture class of a gesture performed by an object of a scene, the gesture class being selected from a plurality of gesture classes.
- the method includes obtaining multiple training sets of one or more training positional time spectrograms of a radar measurement of the scene including the object. Each one of the multiple training sets is associated with a respective ground-truth label indicative of a respective gesture class.
- the computer-implemented method includes training the variational auto-encoder neural network algorithm based on the multiple training sets and the associated ground-truth labels.
- a device includes a processor and a memory.
- the processor can load program code from a memory and execute the program code.
- Upon loading and executing the program code, the processor is configured to obtain multiple training sets of one or more training positional time spectrograms of a radar measurement of a scene including an object.
- Each one of the multiple training sets is associated with a respective ground-truth label indicative of a respective gesture class of a gesture performed by the object, the gesture class being selected from a plurality of gesture classes.
- the processor is also configured to train a variational auto-encoder neural network algorithm based on the multiple training sets and the associated ground-truth labels.
- FIG. 1 schematically illustrates a system including a radar sensor and a processing device according to various examples.
- FIG. 2 schematically illustrates the radar sensor of FIG. 1 in further detail according to various examples.
- FIG. 3 schematically illustrates multiple gestures and associated gesture classes according to various examples.
- FIG. 4 schematically illustrates a processing pipeline for gesture classification using a variational auto-encoder neural network algorithm according to various examples.
- FIG. 5 schematically illustrates a flowchart of a method according to various examples.
- FIG. 6 schematically illustrates details of the variational auto-encoder neural network algorithm according to various examples.
- FIG. 7 schematically illustrates aspects of the variational auto-encoder neural network algorithm according to various examples.
- FIG. 8 is a flowchart of a method according to various examples.
- FIG. 9 schematically illustrates a data frame including data samples of radar measurement data according to various examples.
- FIG. 10 schematically illustrates a time dependency of a range estimate obtained from the radar measurement data and in presence of a gesture being performed according to various examples.
- FIG. 11 schematically illustrates raw and filtered positional time spectrograms for a “circle-clockwise” gesture according to various examples.
- FIG. 12 schematically illustrates raw and filtered positional time spectrograms for a “finger wave” gesture according to various examples.
- FIG. 13 schematically illustrates a processing pipeline for determining positional time spectrograms according to various examples.
- FIG. 14 schematically illustrates a processing pipeline for determining positional time spectrograms according to various examples.
- FIG. 15 schematically illustrates a processing pipeline for training a variational auto-encoder neural network algorithm according to various examples.
- circuits and other electrical devices generally provide for a plurality of circuits or other electrical devices. All references to the circuits and other electrical devices and the functionality provided by each are not intended to be limited to encompassing only what is illustrated and described herein. While particular labels may be assigned to the various circuits or other electrical devices disclosed, such labels are not intended to limit the scope of operation for the circuits and the other electrical devices. Such circuits and other electrical devices may be combined with each other and/or separated in any manner based on the particular type of electrical implementation that is desired.
- any circuit or other electrical device disclosed herein may include any number of microcontrollers, a graphics processor unit (GPU), integrated circuits, memory devices (e.g., FLASH, random access memory (RAM), read only memory (ROM), electrically programmable read only memory (EPROM), electrically erasable programmable read only memory (EEPROM), or other suitable variants thereof), and software which co-act with one another to perform operation(s) disclosed herein.
- any one or more of the electrical devices may be configured to execute a program code that is embodied in a non-transitory computer readable medium programmed to perform any number of the functions as disclosed.
- In particular, using the techniques described herein, hand gestures or finger gestures or gestures performed using a handheld object can be recognized. Such an object can perform the gesture in free space. I.e., the gesture may be defined by a 3-D motion of the object, e.g., along a trajectory and/or including self-rotation. It would also be possible to recognize other kinds and types of gestures, e.g., body-pose gestures or facial expression gestures.
- gesture classification can be used to predict a gesture class of a gesture. For example, there can be a predefined set of gesture classes. Then, once such an object performs a gesture, it can be judged whether this gesture is part of one of the gesture classes. For this, it can be judged whether certain features of the gesture match respective feature ranges associated with the gesture class.
- a gesture may also not be part of any one of the gesture classes, but, e.g., rather be part of a yet-to-be-defined gesture class or correspond to a general object movement not resembling a gesture. I.e., new gesture classes can be identified.
- various gesture classes are conceivable.
- the particular choice of the set of gesture classes used for the gesture classification is not germane for the functioning of the techniques described herein. Nonetheless, hereinafter, a few examples will be given for possible gesture classes:
- the following are examples of gesture classes that could be used in a respective predefined set to implement the gesture classification.
- These gesture classes define the gestures that can be recognized. Further details with respect to such gesture classes will be explained in connection with FIG. 3.
- (1) Swipe left to right; (2) Swipe right to left; (3) Swipe top to down; (4) Swipe down to top; (5) Circle clockwise; (6) Circle anti-clockwise; (7) Swipe back to front; (8) Swipe front to back; (9) Finger wave (waving single fingers); (10) Finger rub (thumb sliding over fingers)
- an HMI can be implemented. It is possible to control a machine. For instance, different actions could be triggered depending on the gesture class of the recognized gesture.
- the HMI can facilitate control of a machine by a user.
- Example use cases include: gesture-controlled wearable and mobile devices, gesture-controlled smart TVs, projectors, gesture-controlled smart homes and smart devices, automotive infotainment systems, augmented reality-virtual reality (AR-VR), feedback systems.
- Gesture classification can alleviate the need for touch and clicks in HMI.
- Various techniques disclosed herein employ a radar measurement of a scene including an object—e.g., a hand or finger or handheld object such as a stylus or beacon—to acquire data based on which the gesture classification can be implemented.
- radar chirps can be used to measure a position of one or more objects in a scene having extents of tens of centimeters or meters.
- a millimeter-wave radar unit may be used to perform the radar measurement; the radar unit operates as a frequency-modulated continuous-wave (FMCW) radar that includes a millimeter-wave radar sensor circuit, one or more transmitters, and one or more receivers.
- a millimeter-wave radar unit may transmit and receive signals in the 20 GHz to 122 GHz range. Alternatively, frequencies outside of this range, such as frequencies between 1 GHz and 20 GHz, or frequencies between 122 GHz and 300 GHz, may also be used.
- a radar unit can transmit a plurality of radar pulses, such as chirps, towards a scene. This refers to a pulsed operation.
- the chirps are linear chirps, i.e., the instantaneous frequency of the chirps varies linearly with time.
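A linear chirp of this kind can be sketched numerically. The parameter values below (baseband frequencies, 1 s duration) are illustrative only and far from the actual radar band; they merely show the linear frequency ramp.

```python
import numpy as np

# A linear chirp: the instantaneous frequency ramps linearly from f0
# to f0 + B over the chirp duration T. Values are illustrative only.
f0 = 0.0       # start frequency, Hz (baseband for illustration)
B = 100.0      # sweep bandwidth, Hz
T = 1.0        # chirp duration, s
fs = 1000.0    # sampling rate, Hz

t = np.arange(0.0, T, 1.0 / fs)
slope = B / T                                   # frequency slope, Hz/s
chirp = np.cos(2 * np.pi * (f0 * t + 0.5 * slope * t ** 2))

# The instantaneous frequency (phase derivative / 2*pi) is linear in t:
inst_freq = f0 + slope * t
```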
- a Doppler frequency shift can be used to determine a velocity of the target.
- Measurement data provided by the radar unit can thus indicate depth positions of multiple objects of a scene. It would also be possible that velocities are indicated.
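The Doppler relation underlying the velocity estimate can be illustrated with a short calculation; the 60 GHz carrier is an assumed example value within the bands mentioned above.

```python
# Doppler shift of a moving target: a radial velocity v shifts a
# carrier at f0 by f_d = 2 * v * f0 / c. The 60 GHz carrier is an
# assumed example, not a value specified in the text.
c = 3e8        # speed of light, m/s
f0 = 60e9      # carrier frequency, Hz (assumption)
v = 1.0        # radial target velocity, m/s

f_doppler = 2 * v * f0 / c               # Doppler shift, Hz
v_recovered = f_doppler * c / (2 * f0)   # invert to recover velocity
```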
- gesture classification based on radar measurements can have advantages such as: invariance to illumination conditions; robustness against hand visibility occlusions; preservation of privacy; capability of capturing subtle hand gesture motions.
- Various techniques described herein employ a machine-learning algorithm to predict a gesture class of a gesture performed by an object. This is based on measurement data obtained from the radar measurement.
- the machine-learning algorithm could, accordingly, be referred to as a classification algorithm or a gesture classification algorithm.
- An example implementation of the ML algorithm is a neural network algorithm (hereinafter, simply neural network, NN).
- An NN generally includes a plurality of nodes that can be arranged in multiple layers. Nodes of a given layer are connected with one or more nodes of a subsequent layer. Skip connections between non-adjacent layers are also possible. Generally, connections are also referred to as edges. The output of each node can be computed based on the values of each one of the one or more nodes connected to its input. Nonlinear calculations are possible. Different layers can perform different transformations such as, e.g., pooling, max-pooling, weighted or unweighted summing, non-linear activation, convolution, etc.
- the NN can include multiple hidden layers, arranged between an input layer and an output layer.
- the calculations performed by the nodes are set by respective weights associated with the nodes.
- the weights can be determined in a training of the NN. For this, a numerical optimization can be used to set the weights.
- a loss function can be defined between an output of the NN in its current training state and a ground truth; the training can then minimize the loss function. For this, a gradient descent technique may be employed where weights are adjusted from back to front of the NN.
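The weight update described above can be sketched for a toy linear model; a real NN applies the same principle layer by layer via backpropagation. Function names are illustrative.

```python
import numpy as np

def mse_loss(w, x, y):
    """Mean-squared-error loss between model output x @ w and target y."""
    return np.mean((x @ w - y) ** 2)

def gradient_step(w, x, y, lr=0.1):
    """One gradient-descent update for a toy linear model; the analytic
    gradient of the MSE loss is used to adjust the weights downhill."""
    grad = 2.0 * x.T @ (x @ w - y) / len(y)
    return w - lr * grad
```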
- the x-y-resolution of the input data and the output data may be decreased (increased) from layer to layer along the one or more encoder branches (decoder branches).
- the encoder branch provides a contraction of the input data
- the decoder branch provides an expansion.
- feature channels can increase and decrease along the one or more encoder branches and the one or more decoder branches, respectively.
- the one or more encoder branches and the one or more decoder branches are connected via a bottleneck.
- the autoencoder neural network includes an encoder branch and a decoder branch that are sequentially arranged and connected via a bottleneck. Away from the input layer and the output layer and specifically at the bottleneck, latent feature representations (feature embeddings) are obtained.
- the feature embedding can specify the presence or absence of certain features.
- the feature embedding thus can be seen as a compressed form of the input.
- the aim is to re-construct the input.
- a loss function can be defined during training of the auto-encoder NN that penalizes differences between the input and the output. Accordingly, it is possible to train the auto-encoder NN using unsupervised learning.
- a variational autoencoder NN (VAENN) can be used in the various examples disclosed herein.
- a VAENN has a feature embedding that represents the latent features in a probabilistic manner. Specifically, a probability distribution of each latent feature of the feature embedding can be determined, typically a Gaussian distribution. The probability distribution can be defined by its mean and width.
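The probabilistic feature embedding can be sketched with the standard reparameterization trick, where the encoder outputs a mean and a log-variance per latent feature. This is a generic VAE sketch, not the specific architecture of the disclosure.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Draw a latent sample z ~ N(mu, sigma^2) via the reparameterization
    trick: z = mu + sigma * eps with eps ~ N(0, I). In a VAENN, mu and
    log_var would be produced by the encoder branch per latent feature."""
    sigma = np.exp(0.5 * log_var)
    return mu + sigma * rng.standard_normal(mu.shape)

def kl_divergence(mu, log_var):
    """KL divergence of N(mu, sigma^2) from the standard normal prior,
    the regularization term of the VAE objective."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var))
```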
- VAENNs are generally known to the skilled person, e.g., from Kingma, Diederik P., and Max Welling. "Auto-encoding variational bayes." arXiv preprint arXiv:1312.6114 (2013).
- the feature space can be structured into regions—the regions being associated with the different gesture classes and obtained from training of the VAENN—and it can be checked whether the distribution of the feature embedding of the VAENN obtained for certain measurement data falls within one of such regions.
- the regions can have an N-dimensional hyperspheroidal surface, where N is the dimension of the feature space.
- the raw data samples obtained from an analog-to-digital converter (ADC) of a radar sensor can be pre-processed to obtain the input data. It has been found that a certain type of input data is particularly helpful to facilitate accurate gesture classification.
- a spectrogram can encode the intensity of a respective contribution associated with a specific value of the positional observable in the raw data.
- a spectrogram can be a 2-D spectrogram which associates two positional observables, e.g., range and Doppler, or a positional observable and time, e.g., range and time.
- a positional time spectrogram provides positional information of the object in the scene as a function of time. I.e., the positional time spectrogram illustrates a change of one or more positional degrees of freedom of the object in the scene with respect to the radar sensor (positional observable) over the course of time.
- the positional time spectrograms are an image-like representation of the time-dependency of positional observables.
- the radar sensor is able to measure radial distance, velocity and the angle of arrival of a target. Such information is encoded in the frequencies of the raw data obtained from the radar sensor. To unveil those physical observables, pre-processing of the raw data is applied, which yields the image-like representation of the physical observables over time.
- the pre-processing of the raw data output from the ADC could include a 2-D Fast Fourier Transformation (FFT) of a data frame structured into fast-time dimension, slow-time dimension and antenna channels.
- the data frame includes data samples over a certain sampling time for multiple radar pulses, specifically chirps. Slow time is incremented from chirp-to-chirp; fast time is incremented for subsequent samples.
- the 2-D FFT can be along the fast-time and slow-time dimensions. This yields a range-Doppler image (RDI). It would then be possible to select a range bin in the RDI. Based on the RDI, it is then possible to extract the various positional observables.
- the mean or maximum range and/or mean or maximum velocity/Doppler shift can be extracted. This yields the range and velocity (as mean or maximum value, or as intensity vector) as positional observables for a certain point in time associated with the sampling time of the data frame. It is also possible to apply beamforming to determine the mean/maximum elevation angle or azimuthal angle. This yields the elevation angle and azimuth angle (as mean or maximum value, or as intensity vector) as positional observables. It would be possible to aggregate multiple such positional observables to generate a positional time spectrogram.
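The 2-D FFT pre-processing described above can be sketched on a synthetic data frame; the frame sizes and target bins below are illustrative. Stacking the extracted range (or Doppler, angle) observables of successive frames over time then yields a positional time spectrogram.

```python
import numpy as np

# Synthetic data frame: (slow time = chirp index) x (fast time = sample
# index), containing a single target tone. Sizes/bins are illustrative.
n_chirps, n_samples = 64, 128
doppler_bin_true, range_bin_true = 5, 20

slow = np.arange(n_chirps)[:, None]
fast = np.arange(n_samples)[None, :]
frame = np.cos(2 * np.pi * (range_bin_true * fast / n_samples
                            + doppler_bin_true * slow / n_chirps))

rdi = np.abs(np.fft.fft2(frame))    # 2-D FFT along slow/fast time
rdi = rdi[:, :n_samples // 2]       # keep positive range frequencies

# The peak bin yields the range/Doppler observables for this frame;
# repeating this per frame and stacking over time gives a spectrogram.
doppler_bin, range_bin = np.unravel_index(np.argmax(rdi), rdi.shape)
```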
- positional information can be provided with respect to, e.g.: range, velocity, azimuthal angle, or elevation angle.
- gesture classification using radar has several major challenges to be addressed for deployment in a practically viable solution: For instance, the gesture classification should be able to handle large inter-class and low intra-class differences of gestures. The gesture classification should be able to reject arbitrary motions or unknown gestures. The gesture classification should be able to work in unknown background environments.
- an improved classification accuracy—by obtaining more discriminative classes—and random gesture rejection accuracy can be achieved, e.g., if compared to state-of-art deep metric learning approaches.
- a Softmax classifier, as often used in the prior art, provides separability of classes, but no discriminative class boundaries. Hence, many background motions or other disturbances are erroneously predicted as one of the known classes with a high confidence using conventional techniques. As a result, reference techniques of gesture classification based on Softmax classifiers perform poorly in real environments. Such limitations are overcome by using the VAENN.
- the training is significantly simplified. For instance, it is not required to obtain training datasets including all random motions that may appear in real-world scenarios. This is because specific class extents are inherently obtained from the VAENN.
- the VAENN makes the feature space of the feature embedding continuous as well as implicitly makes the gesture classifications robust to mis-detection and noise spurs.
- the training can consider one or more losses that help to group gesture classes into closed-knit clusters in feature space, leading to better discriminative properties that facilitate rejection of arbitrary and random motions.
- the clusters can be constrained to be hyper-spheroidal, allowing a simple strategy to reject arbitrary motion images.
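The hyper-spheroidal rejection strategy can be sketched as a nearest-center test with per-class radii (simplified here to hyperspheres); centers and radii would be obtained from training, and the values used in practice would differ.

```python
import numpy as np

def classify_or_reject(embedding, centers, radii):
    """Assign an embedding to the nearest class cluster if it lies inside
    that cluster's boundary (simplified here to a hypersphere with a
    per-class radius); otherwise reject it as unknown motion. Centers and
    radii would be obtained from training."""
    dists = np.linalg.norm(centers - embedding, axis=1)
    k = int(np.argmin(dists))
    return k if dists[k] <= radii[k] else None   # None = rejected
```

Embeddings of arbitrary background motions fall outside every class region and are rejected rather than forced into a known class.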
- the radar sensor 70 could output 2-D spectrograms such as an RDI or an azimuth-elevation spectrogram or a range time spectrogram or a Doppler time spectrogram or an azimuth time spectrogram or an elevation time spectrogram.
- the processor 62 may load program code from a memory 63 and execute the program code.
- the processor 62 can then perform techniques as disclosed herein, e.g., pre-processing measurement data 64 , predicting a gesture class based on the measurement data 64 , controlling an HMI, etc. Details with respect to such processing will be explained hereinafter in greater detail; first, however, details with respect to the radar sensor 70 will be explained.
- FIG. 2 illustrates aspects with respect to the radar sensor 70 .
- the radar sensor 70 includes a processor 72 (labeled digital signal processor, DSP) that is coupled with a memory 73 . Based on program code that is stored in the memory 73 , the processor 72 can perform various functions with respect to transmitting radar pulses 86 using a transmit antenna 77 and a digital-to-analog converter (DAC) 75 .
- the processor 72 can process raw data samples obtained from the ADC 76 to some larger or smaller degree. For instance, data frames could be determined and output. Also, spectrograms may be determined.
- the radar measurement can be implemented using a basic frequency-modulated continuous wave (FMCW) principle.
- a frequency chirp can be used to implement the radar pulse 86 .
- the frequency of the chirp can be swept across a frequency range of 57 GHz to 64 GHz.
- the transmitted signal is backscattered and, with a time delay corresponding to the distance of the reflecting object, captured by all three receiving antennas.
- the received signal is then mixed with the transmitted signal and afterwards low-pass filtered to obtain the intermediate signal. This signal is of significantly lower frequency than the transmitted signal, and therefore the sampling rate of the ADC 76 can be reduced accordingly.
- the ADC may work with a sampling frequency of 2 MHz and a 12-bit accuracy.
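For illustration, the range resolution implied by the 57-64 GHz sweep, and the maximum range whose beat frequency still fits the 2 MHz ADC, can be sketched as follows; the chirp duration used here is an assumed value, not taken from the description:

```python
# Back-of-envelope FMCW quantities for the example configuration
# (57-64 GHz chirp, 2 MHz ADC). The chirp duration is an assumption.
C0 = 3e8  # speed of light (m/s)

def range_resolution(bandwidth_hz: float) -> float:
    """Range resolution of an FMCW radar: c / (2 * B)."""
    return C0 / (2.0 * bandwidth_hz)

def max_beat_range(chirp_duration_s: float, bandwidth_hz: float,
                   adc_rate_hz: float) -> float:
    """Maximum range whose beat frequency still fits the ADC bandwidth.

    A target at range R produces a beat frequency f_b = 2 * R * S / c with
    chirp slope S = B / T_chirp; real sampling resolves up to adc_rate / 2.
    """
    slope = bandwidth_hz / chirp_duration_s
    return (adc_rate_hz / 2.0) * C0 / (2.0 * slope)

bw = 64e9 - 57e9                               # 7 GHz sweep from the example
print(round(range_resolution(bw) * 100, 2))    # 2.14 (cm)
print(round(max_beat_range(32e-6, bw, 2e6), 2))  # 0.69 (m, for assumed 32 us chirp)
```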
- a scene 80 includes multiple objects 81 - 83 .
- the objects 81 , 82 may correspond to background, whereas the object 83 could pertain to a hand of a user.
- gestures performed by the hand can be recognized. Some gestures are illustrated in FIG. 3 .
- FIG. 3 schematically illustrates such gestures 501 - 510 and corresponding labels of gesture classes 520 , but other gestures are possible. According to the techniques described herein, it is possible to reliably and accurately classify the gestures 501 - 510 . Details with respect to such gesture classification are explained in connection with FIG. 4 .
- FIG. 4 schematically illustrates a processing pipeline for implementing the gesture classification. For instance, such processing may be implemented by the processor 62 upon loading program code from the memory 63 .
- the VAENN 111 performs gesture classification based on the measurement data 64 .
- the VAENN 111 provides, as output, a label 115 that is indicative of the particular gesture class 520 of the gesture recognized in the positional time spectrograms.
- the measurement data 64 may be pre-processed. As illustrated in FIG. 4 , multiple positional time spectrograms 101 - 104 can be provided as input to a VAENN 111 .
- while in FIG. 4 a count of four positional time spectrograms 101 - 104 is illustrated, as a general rule, fewer or more positional time spectrograms can be used as input to the VAENN 111 .
- one or more positional time spectrograms can be used which are selected from the group including: range time spectrogram, velocity time spectrogram, an azimuthal angle time spectrogram, or an elevation angle time spectrogram.
- raw positional time spectrograms and/or filtered positional time spectrograms are provided as an input to the VAENN 111 .
- an appropriate filter may be applied.
- a smoothing filter could be applied. Such filtering may be achieved by using an unscented Kalman filter, as will be described later in greater detail.
- the different positional time spectrograms 101 - 104 can be provided as different channels to the VAENN 111 .
- next, details with respect to the VAENN 111 will be explained. First, training of the VAENN 111 and inference using the VAENN 111 will be explained in connection with FIG. 5 .
- FIG. 5 is a flowchart of a method according to various examples. The method can be executed by at least one processor, e.g., upon loading program code from a memory. For instance, the method of FIG. 5 could be executed by the processor 72 and/or the processor 62 (cf. FIGS. 1 and 2 ).
- the method of FIG. 5 pertains to operation and maintenance of an NN such as the VAENN 111 of FIG. 4 .
- a training of the NN is implemented.
- values of multiple parameters of the NN are set. This is generally based on one or more losses defined by a respective loss function. Each loss can correspond to a respective contribution of the loss function. Each loss can be determined based on a difference between a prediction of the NN in its current training state and a corresponding ground truth. Different losses use different metrics to quantify such difference and/or use different predictions and/or inputs.
- An iterative optimization can be implemented.
- multiple elements of a training set can be used to adjust the weights in multiple iterations.
- Each iteration can include a backpropagation training algorithm to adjust the weights starting from an output layer of the NN towards an input layer.
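The iterative optimization with backpropagation described above can be sketched for a toy single-layer network; the synthetic training set, learning rate, and iteration count are illustrative only, not taken from the description:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: inputs x and ground-truth targets y = 2x + 1 (illustrative).
x = rng.normal(size=(64, 1))
y = 2.0 * x + 1.0

# Single linear "layer": its weights are adjusted over multiple iterations.
w, b = 0.0, 0.0
lr = 0.1
for _ in range(500):
    pred = w * x + b                     # forward pass (prediction)
    err = pred - y                       # difference to the ground truth
    loss = float(np.mean(err ** 2))      # loss: squared-difference metric
    # Backpropagation for this single layer: gradients of the loss w.r.t. w, b.
    grad_w = float(np.mean(2.0 * err * x))
    grad_b = float(np.mean(2.0 * err))
    w -= lr * grad_w                     # gradient-descent weight update
    b -= lr * grad_b

print(round(w, 2), round(b, 2))          # converges towards 2.0 and 1.0
```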
- inference can be implemented at box 3010 .
- a prediction of a gesture class of a gesture of a scene can be made without relying on corresponding ground truth.
- the weights of the NN as determined in the training of box 3005 are used.
- FIG. 6 schematically illustrates aspects with respect to the VAENN 111 .
- the VAENN 111 includes an encoder branch 141 and a decoder branch 142 .
- the decoder branch 142 operates on a feature embedding 149 representing latent features of the positional time spectrograms 101 - 104 provided as input to the encoder branch 141 .
- the decoder branch 142 may not be required during inference at box 3010 (cf. FIG. 5 ), but rather only during training at box 3005 to calculate a respective reconstruction loss.
- the gesture class 520 is predicted based on the feature embedding 149 .
- the label 115 , e.g., “L-R Swipe”, identifying the gesture class 520 is thus extracted from the feature embedding 149 . This is explained in detail next.
- each data point 201 - 204 describes a respective observation of a respective gesture.
- These data points 201 - 204 can correspond to the mean values 144 . More generally, they could be determined based on the distribution, e.g., based on the mean values 144 and the width 143 .
- predefined regions 211 - 213 are defined in the feature space 200 , and it can be checked whether the data points 201 - 204 are within one of these predefined regions 211 - 213 .
- These predefined regions 211 - 213 can be obtained from the training of the VAENN, as will be disclosed in detail below.
- also illustrated in FIG. 6 is a scenario in which multiple data points 204 form a cluster 214 that is offset from any one of the predefined regions 211 - 213 . It would be possible to use such a cluster 214 to define a new gesture class, as will also be explained in further detail below.
- FIG. 7 illustrates aspects with respect to the VAENN 111 .
- FIG. 7 illustrates a specific example of an implementation of the encoder branch 141 and the decoder branch 142 .
- the encoder branch 141 includes three convolutional layers using filter sizes of (5×5), (3×3) and (3×3) and 32, 32, and 64 channels, followed by Dropout layers with rate 0.5. To reduce the data size, two max-pooling layers with pooling sizes (2,2) are added after the first two convolutional layers. Afterwards, the tensor is flattened and projected to the feature space 200 using a fully connected layer with an output size of 32.
- the decoder branch 142 generally corresponds to the encoder branch 141 ; here, the max-pooling layers are replaced by up-sampling layers and the convolutional layers are replaced by transposed convolutional layers.
- the VAENN 111 uses two fully connected layers in parallel.
- the output of one fully connected layer is interpreted as the mean 144 of the Gaussian distribution of the feature embedding 149 and the other as the width 143 of the Gaussian distribution.
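The sampling of the feature embedding from the mean and width outputs can be sketched as follows; interpreting the width output as a log-variance ("reparameterization" convention) is an assumption for illustration, and the 32-dimensional feature space matches the example encoder above:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_embedding(mean: np.ndarray, log_var: np.ndarray) -> np.ndarray:
    """Draw a latent feature vector z = mean + sigma * eps.

    `mean` and `log_var` stand in for the outputs of the two parallel fully
    connected layers; treating the width output as a log-variance is a
    common convention, not mandated by the text.
    """
    eps = rng.standard_normal(mean.shape)
    return mean + np.exp(0.5 * log_var) * eps

mu = np.zeros(32)   # feature space of size 32, as in the example encoder
lv = np.zeros(32)   # log-variance 0 -> unit width
z = sample_embedding(mu, lv)
print(z.shape)      # (32,)
```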
- the VAENN 111 learns the mapping of embedded features generated by a continuous distribution to the same filtered image label. As a result, the feature space 200 is forced to be continuous and close-by embedded features are reconstructed to the same output. Therefore, the VAENN architecture already implicitly enforces close-knit class-clusters in the feature space 200 . Due to the generative aspect of the architecture, smooth and compact class clusters are obtained. Thus, the proposed architecture is well suited to recognize embedded features produced by background motion.
- FIG. 8 is a flowchart of a method according to various examples.
- the method of FIG. 8 can be executed by a processor, upon loading program code from a memory.
- the method of FIG. 8 could be executed by the system 65 .
- at least parts of the method of FIG. 8 could be implemented by the processor 62 of the device 60 . It would also be possible that at least some parts of the method are executed by the DSP 72 of the radar sensor 70 (cf. FIG. 2 ).
- the method of FIG. 8 implements gesture classification. Based on a radar measurement, it is possible to classify an observed gesture. Accordingly, the method of FIG. 8 can implement the inference according to box 3010 of the method of FIG. 5 .
- the gesture classification can be implemented using the VAENN 111 as explained above.
- the method of FIG. 8 includes multiple boxes 3105 , 3110 , 3115 , and 3120 that together implement obtaining—box 3140 —input data for performing a gesture classification at box 3150 .
- obtaining the input data for performing the gesture classification at box 3140 can be configured differently in various examples. For instance, depending on the kinds and type of the input data, box 3140 can be implemented differently. For illustration, in a simple scenario, the input data could be pre-acquired and stored in a memory. It would also be possible that the pre-processing is performed by the radar sensor.
- one example implementation of box 3140 will be explained in connection with boxes 3105 , 3110 , 3115 , and 3120 , but other implementations are possible. This implementation will be described by making reference to FIG. 8 , as well as to FIGS. 9 to 14 .
- the implementation that will be explained below facilitates predicting a gesture class of a gesture based on one or more positional time spectrograms, as explained above in connection with FIG. 4 .
- raw data is acquired by using a radar measurement. This can include triggering transmission of radar chirps and reading data samples output from an ADC (cf. FIG. 2 : ADC 76 ).
- the data samples 49 are illustrated in FIG. 9 .
- FIG. 9 schematically illustrates aspects with respect to the measurement data 64 .
- FIG. 9 schematically illustrates a structure of raw data in form of a data frame 45 .
- a data frame 45 is defined by arranging data samples 49 obtained as raw data from the ADC (as explained in connection with FIG. 2 ) with respect to a fast-time dimension 42 and a slow-time dimension 41 ( FIG. 9 is a schematic illustrative drawing; instead of sampling the received signal directly, the ADC samples a processed signal obtained from mixing the transmitted signal with the received signal; this is generally referred to as a Frequency-Modulated Continuous Wave, FMCW, radar measurement).
- a position along the fast time dimension 42 is incremented for each subsequent readout from the ADC (this is illustrated in the circular inset in FIG. 9 ), whereas a position along the slow time dimension 41 is incremented with respect to subsequent radar chirps 48 .
- the duration of the data frames 45 is typically defined by a measurement protocol.
- the measurement protocol can be configured to use 32 chirps within a data frame 45 .
- the frame repetition frequency may be set to 30 frames per second.
- the duration of the data frames 45 is much shorter than the duration of a gesture (gesture duration). Accordingly, it can be helpful to aggregate data from multiple subsequent data frames 45 to determine the time duration covered by each positional time spectrogram 101 - 103 . Aspects related to such aggregation are illustrated in FIG. 10 .
- FIG. 10 schematically illustrates the dependency of the measured range 251 of an object 83 on time.
- FIG. 10 also illustrates a time duration 250 during which a gesture 501 - 510 is performed (gesture duration).
- during the gesture duration 250 , the range 251 observed by the radar measurement changes significantly as a function of time, i.e., at a large change rate.
- outside the gesture duration 250 , the range 251 is comparably static. While FIG. 10 illustrates, for illustrative purposes, the range 251 as an example of an observable of the radar measurement, other observables, such as velocity or angle, could exhibit such qualitative behavior.
- the gesture duration 250 could correspond to a time duration of increased activity observed in the scene 80 .
- the gesture detection, different from the gesture classification implemented at box 3150 , does not need to discriminate between different types of gestures, but rather solely identifies that a gesture is performed.
- it would then be possible to discard (e.g., set to zero) data outside the gesture duration 250 , which defines a start time and stop time of the gesture being performed. This can be referred to as time gating.
- the positional time spectrograms can be obtained by time gating the measurement data of the radar measurement based on one or more corresponding trigger events. These one or more trigger events can be associated with the gesture detection.
- for instance, a trigger event can be detected when a change rate of a positional observable (e.g., range or velocity or azimuth or elevation angle) exceeds a predefined threshold; e.g., there are sudden changes at the beginning of the gesture duration 250 and only small or no changes towards the end of the measurement time duration.
- a start time of the gesture duration 250 can be defined with respect to significant changes in the range or another positional coordinate such as velocity or angle.
- the stop time of the gesture duration 250 can be defined with respect to changes in the range or another positional coordinate such as velocity or angle falling below a respective threshold.
- a gesture detection algorithm can also be used. For instance, a respectively trained NN may be implemented that detects absence or presence of a gesture, e.g., based on range data.
- the gesture duration 250 per gesture is pre-set or initialized at 2 s. Within this time, the test person has to perform the gesture. Some gestures, like the swipes, are performed in a much shorter time period, and therefore, after recording, the start and end of a gesture are detected based on the trigger events. Thus, the gesture duration 250 is refined. The data samples within the refined gesture duration 250 are preserved, whereas the remaining data samples are set to zero. The start of a gesture is detected if, for example, the energy within 10 frames 45 increases over a threshold compared to the energy of the first frame 45 in the series. The end of the gesture is similarly detected when a drop of energy larger than the threshold is detected, as the trigger event.
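The energy-based refinement of the gesture duration can be sketched as follows; the exact energy measure and the synthetic per-frame energies are illustrative assumptions, and the detection window is shortened for the example:

```python
import numpy as np

def time_gate(frame_energy: np.ndarray, threshold: float, window: int = 10):
    """Refine the gesture duration from per-frame energies.

    Start trigger: the energy `window` frames ahead rises above the first
    frame's energy by more than `threshold`. End trigger: a comparable drop
    between consecutive frames. The text describes these trigger events only
    qualitatively, so the exact rules here are assumptions.
    """
    ref = frame_energy[0]
    start = next((i for i in range(len(frame_energy) - window)
                  if frame_energy[i + window] - ref > threshold), 0)
    end = next((i for i in range(start + 1, len(frame_energy))
                if frame_energy[i - 1] - frame_energy[i] > threshold),
               len(frame_energy) - 1)
    return start, end

# Synthetic energies: quiet, a burst of activity, then quiet again.
energy = np.array([1.0, 1.1, 1.0, 5.0, 6.0, 6.2, 6.1, 1.2, 1.1, 1.0])
print(time_gate(energy, threshold=2.0, window=2))   # (1, 7)
```

Data samples outside the returned interval would then be set to zero before forming the positional time spectrograms.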
- the measurement data is optionally preprocessed at box 3115 , to obtain the positional time spectrograms.
- the range, Doppler (velocity), azimuth and elevation time spectrograms are obtained from the measurement data 64 .
- Such spectrograms show the temporal progress of the respective physical observable and allow a unique identification of a specific gesture.
- FIG. 11 and FIG. 12 illustrate the positional time spectrograms 101 - 104 , 101 *- 104 * for range, velocity, azimuthal and elevation angle, respectively. This is for two gestures 505 , 509 .
- the contrast of the 2-D image-like representations encodes the intensity of a respective positional observable.
- FIG. 11 and FIG. 12 (upper rows) illustrate unfiltered positional time spectrograms 101 - 104 , while FIG. 11 and FIG. 12 (lower rows) illustrate filtered positional time spectrograms 101 *- 104 * (the filtering will be explained in detail below).
- the range and Doppler spectrograms of some gestures have similar signatures.
- the right-left and left-right swipes are performed along the azimuth direction, whereas the backward and forward swipes are performed along the elevation direction.
- estimates of the azimuth and elevation angle can thus be used to differentiate those gestures.
- the range-Doppler image (RDI) of each frame is generated in a first step (cf. FIG. 13 : 7005 ; FIG. 14 : boxes 7105 , 7110 , 7115 ). This is done by 2-D windowing of each data frame followed by a 2-D FFT defined as $X(p,q) = \sum_{m=0}^{N_{st}-1} \sum_{n=0}^{N_{ft}-1} x(m,n)\, e^{-2\pi i \left( \frac{m p}{N_{st}} + \frac{n q}{N_{ft}} \right)}$,
- where $N_{st}$ and $N_{ft}$ are the number of chirps 48 and the number of samples 49 per chirp 48 , respectively (cf. FIG. 9 ).
- a moving target indication (MTI) in form of an exponentially weighted moving average (EWMA) filter may be applied on the RDIs (cf. FIG. 13 : 7010 ).
- a range and a Doppler vector can be extracted (cf. FIGS. 13 : 7020 and 7025 ; FIG. 14 : boxes 7120 and 7125 ).
- once the range vectors and, correspondingly, the Doppler vectors are selected based on marginalization along each axis, they are appended across time to generate the range spectrogram and the Doppler spectrogram, respectively (cf. FIG. 14 : boxes 7130 and 7135 ).
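The per-frame RDI generation and the marginalization into range and Doppler vectors can be sketched with NumPy; the Hann window and the frame sizes are illustrative assumptions (the text only specifies "2D windowing"), and the MTI filtering step is omitted for brevity:

```python
import numpy as np

def range_doppler_image(frame: np.ndarray) -> np.ndarray:
    """2-D windowing followed by a 2-D FFT over one data frame.

    `frame` has shape (N_st, N_ft): chirps (slow time) x samples (fast time).
    A Hann window in both dimensions is an assumed choice of window.
    """
    n_st, n_ft = frame.shape
    win = np.outer(np.hanning(n_st), np.hanning(n_ft))
    # Shift the Doppler (slow-time) axis so zero velocity sits in the middle.
    return np.fft.fftshift(np.fft.fft2(frame * win), axes=0)

def marginal_vectors(rdi: np.ndarray):
    """Range and Doppler vectors by marginalizing the RDI along each axis."""
    mag = np.abs(rdi)
    return mag.sum(axis=0), mag.sum(axis=1)  # range vector, Doppler vector

# Appending the per-frame range vectors across time yields the range spectrogram.
rng = np.random.default_rng(2)
frames = [rng.normal(size=(32, 64)) for _ in range(5)]
range_spec = np.stack(
    [marginal_vectors(range_doppler_image(f))[0] for f in frames], axis=1)
print(range_spec.shape)   # (64, 5): range bins x time (frames)
```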
- the range-Doppler bin with maximum energy is selected on which digital beam forming over multiple receiving channels—i.e., the antenna dimension 43 —is applied (cf. FIG. 13 : box 7035 ; FIG. 14 , boxes 7140 and 7145 ). This is done by multiplying the selected range-Doppler data x with phase shifts swept across the field of view, i.e., $X(\hat{\theta}) = \sum_{j} x_j\, e^{-i \pi (j-1) \sin \hat{\theta}}$ (written here for half-wavelength antenna spacing), where
- $x_j$ is the complex-valued selected range-Doppler bin of the j-th channel, and
- $\hat{\theta}$ is the estimated angle swept across the field of view at predefined angular steps.
- for the azimuth angle, the data of receiving antennas 1 and 3 is used; for the elevation angle, the data of antennas 2 and 3 is used.
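The angle estimation by sweeping phase shifts over the field of view can be sketched for a two-antenna channel pair; half-wavelength antenna spacing and the one-degree sweep grid are assumptions not stated in the text:

```python
import numpy as np

def angle_estimate(x: np.ndarray, angles_deg: np.ndarray) -> float:
    """Sweep phase shifts across the field of view and return the angle
    that maximizes the beamformed power.

    `x` holds the complex selected range-Doppler bin of each receiving
    channel; half-wavelength antenna spacing is assumed here.
    """
    theta = np.deg2rad(angles_deg)
    # Steering vectors for a uniform array with lambda/2 spacing
    # (antenna index 0, 1, ... corresponds to j-1 in the text).
    steer = np.exp(-1j * np.pi * np.outer(np.arange(len(x)), np.sin(theta)))
    power = np.abs(steer.conj().T @ x) ** 2
    return float(angles_deg[int(np.argmax(power))])

# Synthetic two-channel bin for a target at 20 degrees.
true_angle = 20.0
x = np.exp(-1j * np.pi * np.arange(2) * np.sin(np.deg2rad(true_angle)))
print(angle_estimate(x, np.arange(-60, 61)))   # 20.0
```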
- FIG. 11 and FIG. 12 illustrate respective filtered positional time spectrograms 101 *- 104 *.
- the gesture class is predicted based on a comparison of the mean 144 of the distribution of the feature embedding 149 of the positional time spectrograms with one or more of the predefined regions 211 - 213 defined in the feature space 200 of the feature embedding 149 (cf. FIG. 6 ).
- These predefined regions 211 - 213 can be obtained from the training of the VAENN (cf. FIG. 5 : box 3005 ). Next, techniques with respect to the training will be described.
- FIG. 15 schematically illustrates aspects with respect to the training of the VAENN 111 .
- FIG. 15 illustrates a processing pipeline for implementing the training.
- the processing pipeline can thus implement box 3005 .
- the training of the VAENN 111 is based on multiple training sets 109 of training positional time spectrograms 101 - 104 , 101 *- 104 * and associated ground-truth labels 107 .
- These training positional time spectrograms 101 - 104 , 101 *- 104 * can be obtained using the pre-processing described in connection with box 3115 of the method of FIG. 8 ; in particular, the UKF can be used to obtain the filtered positional time spectrograms 101 *- 104 *.
- the VAENN 111 receives, as input, the raw and filtered positional time spectrograms 101 - 104 , 101 *- 104 * (in FIG. 15 , only the raw spectrograms 101 - 104 are shown as input).
- the ground-truth labels 107 denote the gesture class 520 of the gesture 501 - 510 captured by the respective positional time spectrograms 101 - 103 .
- a first loss 191 is based on a difference between the reconstructions 181 - 184 of the respective input positional time spectrograms 101 - 104 and data associated with the input positional time spectrograms 101 - 104 . More specifically, in the illustrated example, the input (raw) positional time spectrograms 101 - 104 are filtered at box 7005 , e.g., using an Unscented Kalman filter, to obtain respective filtered positional time spectrograms 101 *- 104 * (cf. FIG. 11 and FIG. 12 ). These filtered positional time spectrograms 101 *- 104 * are then compared with the reconstructions 181 - 184 . For instance, a pixel-wise difference could be calculated (cf. Eq. 12). Accordingly, the VAENN 111 is trained to reconstruct filtered positional time spectrograms 101 *- 104 *.
- filtering can be helpful for training (cf. FIG. 5 : box 3005 ).
- filtering may sometimes also be used for inference (cf. FIG. 5 : Box 3010 ) and then be executed as part of the pre-processing (cf. FIG. 8 : box 3115 ).
- the VAENN 111 is trained to reconstruct the filtered positional time spectrograms 101 *- 104 *. Then, during inference, it may not be required to implement the filtering. Not having to rely on the filtering during inference (by appropriately training the VAENN 111 ) makes the implementation of the gesture classification fast and robust.
- an unscented Kalman filter may be applied to the positional time spectrograms.
- the maximum value of each positional time spectrogram is extracted, which serves as the measurement vector for the UKF. Due to filtering, outliers and measurement errors are mitigated, but on the other hand also “micro” features are removed. Especially for the gestures finger-wave and finger-rub these micro features can be important since the hand is kept static and only small finger movements define the gesture.
- filtering emphasizes the overall movement of the hand and removes outliers ( FIG. 11 and FIG. 12 : lower rows). Especially the angle estimation using only two antennas tends to have large variances in its results. Thus, the filtering is helpful to remove outliers.
- class-specific (and thus generally desirable) “micro” features can also be filtered out. For instance, this is apparent when comparing the filtered elevation angle time spectrograms 104 * for the gesture classes “circuit clockwise” and “finger wave” according to FIG. 11 and FIG. 12 : both spectrograms 104 * have a comparable qualitative shape (peak-plateau-dip); micro features distinguishing these spectrograms 104 * are removed due to the filtering.
- the unscented transformation used in the UKF tries to approximate the distribution of a random variable that undergoes a non-linear transformation.
- here, the non-linear transformation f(·) represents both the process model and the measurement model h(·).
- the unscented transform is used to generate sigma points $\chi_i$ such that the distribution after the transformation can be approximated by the mean and covariance defined as $\mu = \sum_i w_i\, f(\chi_i)$ and $\Sigma = \sum_i w_i \left( f(\chi_i) - \mu \right) \left( f(\chi_i) - \mu \right)^T$, where $w_i$ are the sigma-point weights.
- the UKF assumes a Gaussian random variable for the distribution of the state vector.
- the process model defines the non-linear state transition or prediction into the next time step.
- the process model transformation for x can be defined as (other motion models are possible):
- $r_p = r + \Delta t\, v + 0.5\, \Delta t^2\, a_r$ (6)
- $v_p = v + \Delta t\, a_r$
- $a_e$ and $a_a$ are random angular accelerations (for elevation and azimuth, respectively) drawn from a normal distribution with zero mean and a variance of π/180.
- the measurement and process noise matrices are set using the normalized innovation squared test, ensuring that the chi-square statistic is within the 95% confidence interval.
- the output of the UKF is a series of filtered state vectors. These can be concatenated to obtain a respective filtered positional time spectrogram 101 *- 104 *. Each vector in the spectrogram is constructed by generating a Gaussian with mean and variance of its corresponding UKF filtered output state.
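The unscented transformation underlying the UKF can be sketched as follows; the Julier-style weights and the kappa parameter are conventional choices, not reproduced from the text:

```python
import numpy as np

def unscented_transform(mu, cov, f, kappa=1.0):
    """Approximate the distribution of f(x) for x ~ N(mu, cov) via sigma points.

    Classic Julier-Uhlmann formulation (illustrative sketch only): 2n + 1
    sigma points are propagated through the non-linearity f, then a weighted
    mean and covariance are recomputed.
    """
    n = len(mu)
    s = np.linalg.cholesky((n + kappa) * cov)
    sigma = np.vstack([mu, mu + s.T, mu - s.T])   # 2n + 1 sigma points
    w = np.full(2 * n + 1, 1.0 / (2 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    y = np.array([f(p) for p in sigma])           # propagate each sigma point
    mean = w @ y
    diff = y - mean
    covy = (w[:, None] * diff).T @ diff
    return mean, covy

# Sanity check: a linear map keeps a Gaussian exactly Gaussian
# (mean doubles, covariance scales by 4).
m, c = unscented_transform(np.array([1.0, 0.0]), np.eye(2), lambda p: 2 * p)
print(np.round(m, 6))   # [2. 0.]
```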
- a second loss 192 is based on a difference between the prediction of the gesture class 520 and the respective ground-truth labels 107 .
- the second loss 192 is the triplet loss which maximizes inter-class distance.
- the triplet loss is generally known from W. Ge, W. Huang, D. Dong, and M. R. Scott. 2018. Deep Metric Learning with Hierarchical Triplet Loss. CoRR abs/1810.06951 (2018). arXiv:1810.06951.
- the idea of the triplet loss is to feed three samples (i.e., three training sets 109 of positional time spectrograms 101 - 104 ) into VAENN 111 .
- the first training set 109 is the anchor,
- the second training set 109 is a random sample of the same gesture class
- the third training set 109 is a random sample of another gesture class.
- the embedding is modeled as Gaussian distribution, as explained above.
- $d(\chi_1, \chi_2) = (\mu_1 - \mu_2)^T (\mu_1 - \mu_2)$.
- ⁇ 1 , ⁇ 2 denote the means 144 of the respective distributions of the samples.
- a statistical distance could be considered. For example, the Mahalanobis distance between the anchor distribution and the mean of either the positive or negative distribution may be evaluated.
- the triplet loss evaluates the distance between single embedded feature vectors of anchor, positive and negative, whereas the statistical distance triplet loss operates on distributions.
- the statistical triplet loss is determined based on the statistical distances between the distribution of the feature embedding 149 obtained for the anchor set and the means of the distributions obtained for the positive and negative sets 109 , respectively.
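A statistical triplet loss of the kind described, evaluating the Mahalanobis distance between the anchor distribution and the positive/negative means, might be sketched as follows; the hinge form and the margin value are conventional assumptions, not stated in the text:

```python
import numpy as np

def mahalanobis_sq(mu_a, cov_a, point):
    """Squared Mahalanobis distance of `point` from the anchor distribution."""
    d = point - mu_a
    return float(d @ np.linalg.inv(cov_a) @ d)

def statistical_triplet_loss(mu_a, cov_a, mu_pos, mu_neg, margin=1.0):
    """Triplet loss on distributions: pull the positive mean towards the
    anchor distribution, push the negative mean away by at least `margin`.
    """
    d_pos = mahalanobis_sq(mu_a, cov_a, mu_pos)
    d_neg = mahalanobis_sq(mu_a, cov_a, mu_neg)
    return max(d_pos - d_neg + margin, 0.0)

cov = np.eye(2)  # anchor covariance (illustrative)
loss = statistical_triplet_loss(np.zeros(2), cov,
                                mu_pos=np.array([0.1, 0.0]),
                                mu_neg=np.array([3.0, 0.0]))
print(loss)   # 0.0: the negative is already far enough from the anchor
```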
- the reconstruction loss 191 aims to minimize the difference between the reconstructed images and the label images, e.g., the filtered positional time spectrograms 101 *- 104 *.
- the mean squared error, defined as $L_{rec} = \frac{1}{CNM} \sum_{c=1}^{C} \sum_{n=1}^{N} \sum_{m=1}^{M} \left( Y_{rec}(c,n,m) - Y_{lab}(c,n,m) \right)^2$ (12), is used, where
- C is the number of channels
- N and M are the dimensions of the input/output images (here, the filtered positional time spectrograms 101 *- 104 *; and the respective reconstructions 181 - 184 )
- Y rec are the reconstructions 181 - 184
- Y lab are the label images (here, the filtered positional time spectrograms 101 *- 104 *).
- the feature embedding 149 of an input sample is modeled as a multivariate Gaussian distributed random variable X.
- the underlying and unknown distribution is approximated by a multivariate standard Gaussian distribution.
- the difference between the underlying distribution of the feature embedding 149 and the multivariate standard Gaussian distribution is evaluated using the Kullback-Leibler (KL) divergence, which for a diagonal Gaussian takes the closed form $L_{KL} = \frac{1}{2} \sum_k \left( \sigma_k^2 + \mu_k^2 - 1 - \log \sigma_k^2 \right)$ (13), where $\mu_k$ and $\sigma_k^2$ denote the mean and variance of the k-th dimension of the feature embedding.
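For a feature embedding with diagonal Gaussian distribution, the KL divergence to the multivariate standard Gaussian has the well-known closed form, which can be sketched as:

```python
import numpy as np

def kl_to_standard_normal(mu: np.ndarray, var: np.ndarray) -> float:
    """KL divergence of N(mu, diag(var)) from the multivariate standard
    normal: 0.5 * sum(var + mu^2 - 1 - log(var))."""
    return float(0.5 * np.sum(var + mu ** 2 - 1.0 - np.log(var)))

# Identical distributions give zero divergence.
print(kl_to_standard_normal(np.zeros(3), np.ones(3)))   # 0.0
```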
- the center loss minimizes the Euclidean intra-class distances and therefore leads to more discriminative classes.
- the standard center loss is defined as
- $L_{center} = \sum_{c \in C} (\hat{\mu}_c - x_c)^T (\hat{\mu}_c - x_c)$ (14)
- C is the set of all classes
- $\hat{\mu}_c$ is the estimated mean of class c, and
- x c is the embedded feature vector of a set 109 associated to class c.
- since the VAENN 111 operates with distributions in the feature space 200 , a re-formulation of the classical center loss towards a statistical-distance-based center loss, which minimizes the spread of samples according to its underlying class distribution, is possible.
- a class distribution is defined by a combination of the multiple distributions of the feature embedding 149 of the VAENN 111 obtained for all sets of input data associated with a given class.
- the class distribution can be estimated by the mean/average of the embedded distributions of all samples associated to the same class.
- $\hat{\mu}_c = \frac{1}{|X_c|} \sum_{x \in X_c} \mu_x$, and the variance is defined as
- $\hat{\sigma}_c^2 = \frac{1}{|X_c|^2} \sum_{x \in X_c} \sigma_x^2$, where $X_c$ is the set of embedded feature distributions belonging to class c.
- the covariance matrix $\hat{\Sigma}_c$ is defined as a diagonal matrix with $\hat{\sigma}_c^2$ entries.
- the Mahalanobis distance (or another statistical distance) can be used to evaluate the statistical distance center loss defined as
- $L_{center}^{stat} = \sum_{c \in C} (\hat{\mu}_c - x_c)^T \hat{\Sigma}_c^{-1} (\hat{\mu}_c - x_c)$ (15)
- C is the set of all classes
- ⁇ circumflex over ( ⁇ ) ⁇ c is the estimated mean of class c
- ⁇ circumflex over ( ⁇ ) ⁇ c is the estimated covariance matrix of class c
- x c is the embedded mean of a sample belonging to class c.
- the statistical distance center loss is determined based on the statistical distance between the class distribution of each gesture class and the respective means of the distribution of the feature embeddings of the VAENN obtained for all training samples associated with this gesture class.
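The statistical distance center loss can be sketched as follows; the dictionary-based bookkeeping of per-class sample means and variances is an illustrative choice, and the diagonal covariance follows the definitions above:

```python
import numpy as np

def statistical_center_loss(class_samples: dict) -> float:
    """Statistical-distance center loss (Eq. 15 style): sum of Mahalanobis
    distances of each sample's embedded mean from its class distribution.

    `class_samples` maps a class label to (means, variances), each of shape
    (num_samples, dim); the class covariance is diagonal, as in the text.
    """
    loss = 0.0
    for means, variances in class_samples.values():
        mu_c = means.mean(axis=0)                            # estimated class mean
        var_c = variances.sum(axis=0) / len(variances) ** 2  # estimated class variance
        for x in means:
            d = mu_c - x
            loss += float(d @ (d / var_c))                   # diagonal Mahalanobis
    return loss

# Illustrative class with two samples of unit per-dimension variance.
samples = {"swipe": (np.array([[0.0, 0.0], [2.0, 0.0]]), np.ones((2, 2)))}
print(statistical_center_loss(samples))   # 4.0
```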
- the (statistical) triplet loss helps to maximize the inter-class distance
- the (statistical distance) center loss helps to minimize the intra-class distance
- based on the class distributions of the feature embedding 149 of the VAENN 111 obtained for the training sets 109 of training positional time spectrograms 101 - 104 belonging to each gesture class, it is possible to determine the regions 211 - 213 in the feature space 200 used during gesture classification in the inference phase at box 3010 . Each region 211 - 213 is thus associated with a respective gesture class 520 .
- these regions 211 - 213 could be centered around the respective means 144 of the class distributions and have a size that is determined in accordance with the standard deviations 143 .
- regions 211 - 213 may be stored as parameters along with the VAENN 111 and then used during inference at box 3010 . It can be decided whether the mean 144 of a certain instance of the feature embedding is inside or outside such regions 211 - 213 .
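The region-membership check during inference might be sketched as follows; the per-dimension k-sigma acceptance rule and the value of k are illustrative assumptions (the text only states that region size is set in accordance with the standard deviations):

```python
import numpy as np

def classify_or_reject(z_mean, regions, k=3.0):
    """Assign an embedded mean to a gesture class region, or reject it.

    `regions` maps a label to (center, std) per feature dimension; a sample
    counts as inside a region if it lies within k standard deviations of the
    center in every dimension (an illustrative acceptance rule).
    """
    for label, (center, std) in regions.items():
        if np.all(np.abs(z_mean - center) <= k * std):
            return label
    return None  # offset from every trained region: arbitrary/random motion

regions = {"L-R Swipe": (np.zeros(2), np.ones(2)),
           "Circle":    (np.array([10.0, 0.0]), np.ones(2))}
print(classify_or_reject(np.array([0.5, -0.2]), regions))   # L-R Swipe
print(classify_or_reject(np.array([5.0, 5.0]), regions))    # None
```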
- it would be possible that a further gesture, not covered by any gesture class of the sets 109 of training positional time spectrograms, is performed multiple times. I.e., a further gesture class may be observed. This is illustrated in FIG. 6 by data points 204 in the feature space 200 included in a cluster 214 that is offset from any of the pre-trained regions 211 - 213 .
- the gesture classification is based on a radar measurement.
- a VAENN is used to predict the gesture class.
- using a VAENN, it is possible to add variations to the input data during training, using a sampling operation, without the need of augmenting the input data. Thereby, a robustness against noise or clutter is increased. Also, user-specific variations of the gestures can be captured.
- further, techniques have been described for training the VAENN. Specifically, techniques have been disclosed which rely on a statistical distance, such as the Mahalanobis distance, when determining a respective loss.
- the class distributions can be determined, e.g., based on the assumption of an underlying Gaussian distribution. This can be done under the assumption that the distributions of the feature embedding of the samples of a gesture class are independent and distributed identically across the class. Accordingly, it is possible to calculate the class distribution as the average of all distributions of the feature embedding obtained for all training sets of a specific gesture class.
- the techniques based on a VAENN can also be applied for achieving a robust gesture classification using other sensors, such as vision or ultra-sonic sensors, and any other sensor capable of receiving gesture feedback.
- the raw data obtained from the radar sensor undergoes a preprocessing step (cf., e.g., box 3115 ) to obtain features relevant for the purpose of gesture classification.
- a similar gesture-specific feature extraction process can be performed for other sensors, using, e.g., velocity and range information where applicable.
- the possible gesture classification is not limited to just hand gestures but covers virtually any form of gesture feedback, such as body pose or facial expressions.
- a statistical distance is determined and considered in a loss for training the VAENN.
- the disclosed embodiments are not limited to a statistical distance between a distribution and a point (mean), but can also be applied to a distance between two distributions.
- S. Ahmed, F. Khan, A. Ghaffar, F. Hussain, and S. Cho. 2019. Finger-Counting-Based Gesture Recognition within Cars Using Impulse Radar with Convolutional Neural Network. Sensors 19, 6 (March 2019), 1429;
- S. Hazra and A. Santra. 2018. Robust Gesture Recognition Using Millimetric-Wave Radar System. IEEE Sensors Letters 2, 4 (2018), 1-4;
- S. Hazra and A. Santra. 2019. Short-Range Radar-Based Gesture Recognition System Using 3D CNN With Triplet Loss. IEEE Access 7 (2019), 125623-125633;
- Y. Kim and B. Toomajian. 2016. Hand Gesture Recognition Using Micro-Doppler Signatures With Convolutional Neural Network. IEEE Access 4 (2016), 7125-7130;
- G. Li, R. Zhang, M. Ritchie, and H. Griffiths. 2018. Sparsity-Driven Micro-Doppler Feature Extraction for Dynamic Hand Gesture Recognition. IEEE Trans. Aerospace Electron. Systems 54, 2 (2018), 655-665;
- J. Lien, N. Gillian, E. Karagozler, P. Amihood, C. Schwesig, E. Olson, H. Raja, and I. Poupyrev. 2016. Soli: Ubiquitous Gesture Sensing with Millimeter Wave Radar. ACM Trans. Graph. 35, 4, Article 142 (July 2016);
- P. Molchanov, S. Gupta, K. Kim, and K. Pulli. 2015. Short-range FMCW monopulse radar for hand-gesture sensing. In 2015 IEEE Radar Conference (RadarCon). 1491-1496;
- P. Molchanov, X. Yang, S. Gupta, K. Kim, S. Tyree, and J. Kautz. 2016. Online Detection and Classification of Dynamic Hand Gestures with Recurrent 3D Convolutional Neural Networks. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 4207-4215;
- K. A. Smith, C. Csech, D. Murdoch, and G. Shaker. 2018. Gesture Recognition Using mm-Wave Sensor for Human-Car Interface. IEEE Sensors Letters 2, 2 (2018), 1-4;
- Y. Sun, T. Fei, S. Gao, and N. Pohl. 2019. Automatic Radar-based Gesture Detection and Classification via a Region-based Deep Convolutional Neural Network. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 4300-4304;
- Yuliang Sun, Tai Fei, Xibo Li, Alexander Warnecke, Ernst Warsitz, and Nils Pohl. 2020. Real-time radar-based gesture detection and recognition built in an edge-computing platform. IEEE Sensors Journal 20, 18 (2020), 10706-10716;
- Q. Wan, Y. Li, C. Li, and R. Pal. 2014. Gesture recognition for smart home applications using portable radar sensors. In 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society. 6414-6417;
- S. Wang, J. Song, J. Lien, I. Poupyrev, and O. Hilliges. 2016. Interacting with Soli: Exploring Fine-Grained Dynamic Gesture Recognition in the Radio-Frequency Spectrum. In Proceedings of the 29th Annual Symposium on User Interface Software and Technology (Tokyo, Japan) (UIST'16). Association for Computing Machinery, New York, NY, USA, 851-860;
- Z. Zhang, Z. Tian, and M. Zhou. 2018. Latern: Dynamic Continuous Hand Gesture Recognition Using FMCW Radar Sensor. IEEE Sensors Journal 18 (2018), 3278-3289;
- J. Deng, J. Guo, N. Xue, and S. Zafeiriou. 2019. Arcface: Additive angular margin loss for deep face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4690-4699;
- W. Liu, Y. Wen, Z. Yu, M. Li, B. Raj, and L. Song. 2017. SphereFace: Deep Hypersphere Embedding for Face Recognition. CoRR abs/1704.08063 (2017). arXiv: 1704.08063;
- F. Wang, J. Cheng, W. Liu, and H. Liu. 2018. Additive margin softmax for face verification. IEEE Signal Processing Letters 25, 7 (2018), 926-930;
- H. Wang, Y. Wang, Z. Zhou, X. Ji, Z. Li, D. Gong, J. Zhou, and W. Liu. 2018. CosFace: Large Margin Cosine Loss for Deep Face Recognition. CoRR abs/1801.09414 (2018). arXiv: 1801.09414;
- L. He, Z. Wang, Y. Li, and S. Wang. 2020. Softmax Dissection: Towards Understanding Intra- and Inter-class Objective for Embedding Learning. ArXiv abs/1908.01281 (2020);
- Y. Wen, K. Zhang, Z. Li, and Y. Qiao. 2016. A discriminative feature learning approach for deep face recognition. In European conference on computer vision. Springer, 499-515; and
- T. Stadelmayer, M. Stadelmayer, A. Santra, R. Weigel, and F. Lurz. 2020. Human Activity Classification Using Mm-Wave FMCW Radar by Improved Representation Learning. In Proceedings of the 4th ACM Workshop on Millimeter-Wave Networks and Sensing Systems (London, United Kingdom) (mmNets'20). Association for Computing Machinery, New York, NY, USA, Article 1, 6 pages.
TABLE 1. Examples of various gesture classes that could be used in a respective predefined set to implement the gesture classification. These gesture classes define the gestures that can be recognized. Further details with respect to such gesture classes will be explained in connection with FIG. 3.

| No. | Gesture class |
|---|---|
| (1) | Swipe left to right |
| (2) | Swipe right to left |
| (3) | Swipe top to down |
| (4) | Swipe down to top |
| (5) | Circle clockwise |
| (6) | Circle anti-clockwise |
| (7) | Swipe back to front |
| (8) | Swipe front to back |
| (9) | Finger wave - wave single fingers |
| (10) | Finger rub - thumb sliding over fingers |
where w(m, n) is the 2D weighting function along the fast-time and slow-time dimensions and s(m, n) is the signal within a data frame. The indices n, m sweep along the fast-time and slow-time dimensions, respectively.
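As a hedged illustration of this step, a range-Doppler image (RDI) can be obtained by applying the 2D weighting w(m, n) to one data frame s(m, n) and then taking FFTs along fast time and slow time. The separable Hann window and the frame dimensions below are illustrative assumptions, not taken from the text:

```python
import numpy as np

def range_doppler_image(frame, window_fn=np.hanning):
    """Windowed 2D FFT of one radar data frame.

    frame: (M, N) array with M chirps (slow time) x N samples (fast time).
    The separable Hann window stands in for the 2D weighting w(m, n).
    """
    m, n = frame.shape
    w = np.outer(window_fn(m), window_fn(n))        # 2D weighting w(m, n)
    rp = np.fft.fft(frame * w, axis=1)              # range FFT along fast time
    rdi = np.fft.fft(rp, axis=0)                    # Doppler FFT along slow time
    return np.abs(np.fft.fftshift(rdi, axes=0))     # center zero Doppler

rdi = range_doppler_image(np.random.randn(32, 64))
```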
x_MTI = x_i − x_avg (1)

x_avg = a·x_i + (1 − a)·x_avg (2)

where x_MTI is the MTI-filtered RDI, x_i is the RDI of the current time step, and x_avg is the running average RDI of the filter.
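Equations (1) and (2) can be sketched as a small stateful filter. Whether the running average is updated before or after the subtraction is not fully specified in the text, so the subtract-then-update order below is an assumption:

```python
import numpy as np

class ExponentialMTI:
    """MTI filter per Eqs. (1)-(2): subtract a running exponential average
    of past RDIs to suppress static targets. `a` is the smoothing factor."""
    def __init__(self, a=0.5):
        self.a = a
        self.x_avg = None

    def __call__(self, x_i):
        if self.x_avg is None:
            self.x_avg = np.zeros_like(x_i)
        x_mti = x_i - self.x_avg                               # Eq. (1)
        self.x_avg = self.a * x_i + (1 - self.a) * self.x_avg  # Eq. (2)
        return x_mti

# feeding a constant (static-target) RDI drives the output toward zero
mti = ExponentialMTI(a=0.5)
static = np.ones((4, 4))
out = None
for _ in range(50):
    out = mti(static)
```

Feeding a constant RDI makes the filter output decay toward zero, which is the intended clutter suppression.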
where xj is the complex-valued selected range-Doppler bin of the jth channel and {circumflex over (θ)} is the estimated angle, swept across the field of view at predefined angular steps. To estimate the azimuthal angle, the data of receiving antennas 1 and 3 is used; for the elevation angle, the data of antennas 2 and 3 is used.
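The angle sweep can be illustrated with a conventional beamforming search: for each candidate angle in the field of view, correlate the selected complex range-Doppler bins of one antenna pair against a steering vector and keep the angle of maximum response. The half-wavelength spacing and the 1° grid are illustrative assumptions, not values from the text:

```python
import numpy as np

def estimate_angle(x, d_over_lambda=0.5, angles_deg=np.arange(-60, 61)):
    """Sweep candidate angles and return the one whose steering vector
    best matches the complex bins `x` of one antenna pair."""
    angles = np.deg2rad(angles_deg)
    k = np.arange(len(x))                               # antenna indices
    # steering matrix: one row per candidate angle
    a = np.exp(-2j * np.pi * d_over_lambda * np.outer(np.sin(angles), k))
    power = np.abs(a.conj() @ x) ** 2                   # beamformer output power
    return int(angles_deg[np.argmax(power)])

# a single reflector at +20 degrees seen by a two-antenna pair
theta = np.deg2rad(20.0)
x = np.exp(-2j * np.pi * 0.5 * np.sin(theta) * np.arange(2))
est = estimate_angle(x)
```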
where χ(i) are the 'sigma points' and W_i are the associated weights. In total, 2n_η+1 'sigma points' are generated, with n_η being the dimension of the state; the i-th sigma point is computed from the i-th column of the Cholesky decomposition of the matrix Ω. The state vector of the UKF is defined as x=[r, v, θ, θ̇, ϕ, ϕ̇], where r and v are the radial position and velocity, respectively, θ and ϕ are the azimuth and elevation angles, and θ̇ and ϕ̇ are the respective angular velocities. The UKF assumes a Gaussian random variable for the distribution of the state vector. The linear measurement model H(.) accounts for the trivial transformation of the state vector into the measurement domain: H(.) only extracts the range, velocity, azimuth and elevation angles from the state vector x. Hence, the measurement vector is defined as z=Hx. The process model defines the non-linear state transition, or prediction, into the next time step. The process model transformation for x can be defined as follows (other motion models are possible):
where a_θ and a_ϕ are random angular accelerations drawn from a normal distribution with zero mean and a variance of π/180.
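The sigma-point generation described above can be sketched as follows. The scaling parameter `lam` is a placeholder, since the text does not state the value of the scaling used; the weights follow the common unscented-transform convention:

```python
import numpy as np

def sigma_points(mu, omega, lam=1.0):
    """Unscented-transform sigma points chi(i) and weights W_i.

    Points are placed along the columns of the Cholesky factor of the
    scaled covariance (n + lam) * Omega, per the text's description of
    using the i-th column of the Cholesky decomposition of Omega.
    """
    n = len(mu)                                   # n_eta, state dimension
    s = np.linalg.cholesky((n + lam) * omega)     # matrix square root
    chi = [mu] + [mu + s[:, i] for i in range(n)] + [mu - s[:, i] for i in range(n)]
    weights = np.r_[lam / (n + lam), np.full(2 * n, 0.5 / (n + lam))]
    return np.array(chi), weights

mu = np.zeros(6)        # x = [r, v, theta, theta_dot, phi, phi_dot]
omega = np.eye(6)
chi, w = sigma_points(mu, omega)
```

With this convention, the weighted mean of the sigma points recovers the state mean exactly.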
d(x_1, x_2) = (x_1 − x_2)^T (x_1 − x_2) (7)

where x_1 is the anchor and x_2 is either the positive or the negative sample.

d(μ_1, μ_2) = (μ_1 − μ_2)^T (μ_1 − μ_2). (8)

d_stat(μ_a, Σ_a, μ_2) = (μ_a − μ_2)^T Σ_a^−1 (μ_a − μ_2) (9)

where μ_a and Σ_a are the mean and covariance matrix of the anchor distribution X_a, and μ_2 is the mean of either the positive or the negative sample distribution, respectively.
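The two distances, Eq. (7)/(8) and Eq. (9), can be sketched directly; the example vectors are illustrative:

```python
import numpy as np

def d(x1, x2):
    """Squared Euclidean distance of Eqs. (7)/(8)."""
    diff = x1 - x2
    return float(diff @ diff)

def d_stat(mu_a, sigma_a, mu2):
    """Statistical distance of Eq. (9): the anchor covariance whitens the
    difference between the anchor mean and the other sample's mean."""
    diff = mu_a - mu2
    return float(diff @ np.linalg.inv(sigma_a) @ diff)

mu_a = np.array([0.0, 0.0])
sigma_a = np.diag([1.0, 4.0])
mu2 = np.array([1.0, 2.0])
```

With an identity anchor covariance, Eq. (9) reduces to the squared Euclidean distance of Eq. (7).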
L_triplet = max(d(μ_a, μ_p) − d(μ_a, μ_n) + α, 0), (10)

L_triplet^stat = max(d_stat(μ_a, Σ_a, μ_p) − d_stat(μ_a, Σ_a, μ_n) + α, 0) (11)

where μ_a and Σ_a define the anchor distribution X_a, μ_p and μ_n are the mean feature vectors of the positive and negative samples, respectively, and α is a hyper-parameter (the margin). Both the triplet loss and the statistical triplet loss may be used in examples disclosed herein.
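Eqs. (10) and (11) can be sketched as plain NumPy functions; the margin value is an illustrative default:

```python
import numpy as np

def triplet_loss(mu_a, mu_p, mu_n, alpha=0.2):
    """Eq. (10): hinge on the anchor-positive vs. anchor-negative gap."""
    d_ap = float(np.sum((mu_a - mu_p) ** 2))
    d_an = float(np.sum((mu_a - mu_n) ** 2))
    return max(d_ap - d_an + alpha, 0.0)

def stat_triplet_loss(mu_a, sigma_a, mu_p, mu_n, alpha=0.2):
    """Eq. (11): the same hinge, using the anchor-covariance-weighted
    statistical distance of Eq. (9)."""
    inv = np.linalg.inv(sigma_a)
    d_ap = float((mu_a - mu_p) @ inv @ (mu_a - mu_p))
    d_an = float((mu_a - mu_n) @ inv @ (mu_a - mu_n))
    return max(d_ap - d_an + alpha, 0.0)
```

When the negative is far from the anchor, the hinge clamps the loss to zero; when the positive is farther than the negative, the margin α is added on top of the distance gap.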
where C is the number of channels, N and M are the dimensions of the input/output images (here, the filtered
where K is the dimension of a random variable X, and μ(X)_k and Σ(X)_k are the mean and variance of its kth dimension. The resulting divergence defines the KL-divergence loss. Optimizing the KL-divergence maximizes the variational lower bound. Next, a further example of the
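The KL-divergence loss can be sketched with the standard closed form for a diagonal Gaussian against the standard normal prior; the exact expression in the text is behind a missing equation image, so this form is an assumption based on the usual VAE regularizer:

```python
import numpy as np

def kl_loss(mu, var):
    """KL divergence of a diagonal Gaussian N(mu, diag(var)) from the
    standard normal prior, summed over the K latent dimensions."""
    return 0.5 * float(np.sum(var + mu ** 2 - 1.0 - np.log(var)))
```

The loss vanishes exactly when the posterior matches the prior (zero mean, unit variance) and grows as either moment departs from it.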
where C is the set of all classes, μ̂_c is the estimated mean of class c, and x_c is the embedded feature vector of a sample belonging to class c,
and the variance is defined as
where X_c is the set of embedded feature distributions belonging to class c. The covariance matrix Σ_c is defined as a diagonal matrix with σ_c² entries.
where C is the set of all classes, μ̂_c is the estimated mean of class c, Σ̂_c is the estimated covariance matrix of class c, and x_c is the embedded mean of a sample belonging to class c.
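The (statistical) center loss described above can be sketched as follows. The per-class mean and diagonal variance are estimated from the embedded features, and each sample's (optionally variance-whitened) squared distance to its class center is averaged; the exact normalization is an assumption, since the corresponding equations are behind missing images:

```python
import numpy as np

def center_loss(features, labels, statistical=False):
    """Center-loss sketch: pull each embedded sample toward the estimated
    mean of its own class; the statistical variant whitens the distance
    with the diagonal per-class variance (Sigma_c)."""
    loss, n = 0.0, len(features)
    for c in np.unique(labels):
        xc = features[labels == c]
        mu_c = xc.mean(axis=0)                     # estimated class mean
        diff = xc - mu_c
        if statistical:
            var_c = xc.var(axis=0) + 1e-8          # diagonal of Sigma_c
            diff = diff / np.sqrt(var_c)
        loss += float(np.sum(diff ** 2))
    return loss / n

feats = np.array([[0.0, 0.0], [2.0, 0.0], [5.0, 5.0], [5.0, 7.0]])
labs = np.array([0, 0, 1, 1])
```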
L_VAE = α_1 L_triplet^(stat) + α_2 L_MSE + α_3 L_KL + α_4 L_center^(stat) (16)

where α_1 to α_4 are hyper-parameters that can be predefined.
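Eq. (16) is a simple weighted sum of the four partial losses; the default weight values below are illustrative only, since the text leaves α_1..α_4 as predefined hyper-parameters:

```python
def vae_total_loss(l_triplet, l_mse, l_kl, l_center,
                   a1=1.0, a2=1.0, a3=0.1, a4=0.01):
    """Eq. (16): weighted combination of the (statistical) triplet loss,
    reconstruction (MSE) loss, KL-divergence loss, and (statistical)
    center loss. a1..a4 stand for alpha_1..alpha_4."""
    return a1 * l_triplet + a2 * l_mse + a3 * l_kl + a4 * l_center
```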
Claims (20)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP21190926.2A EP4134924B1 (en) | 2021-08-12 | 2021-08-12 | Radar-based gesture classification using a variational auto-encoder neural network algorithm |
| EP21190926 | 2021-08-12 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20230068523A1 US20230068523A1 (en) | 2023-03-02 |
| US12307821B2 true US12307821B2 (en) | 2025-05-20 |
Family
ID=77316831
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/886,264 Active 2043-06-21 US12307821B2 (en) | 2021-08-12 | 2022-08-11 | Radar-based gesture classification using a variational auto-encoder neural network |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US12307821B2 (en) |
| EP (1) | EP4134924B1 (en) |
| CN (1) | CN115705757A (en) |
Families Citing this family (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP4134924B1 (en) * | 2021-08-12 | 2025-10-01 | Infineon Technologies AG | Radar-based gesture classification using a variational auto-encoder neural network algorithm |
| US12487599B2 (en) | 2022-03-24 | 2025-12-02 | Dell Products L.P. | Efficient event-driven object detection at the forklifts at the edge in warehouse environments |
| US12530619B2 (en) * | 2022-07-14 | 2026-01-20 | Dell Products L.P. | Feature-aware open set multi-model for trajectory classification in mobile edge devices |
| US12524886B2 (en) | 2022-07-27 | 2026-01-13 | Dell Products L.P. | Object-driven event detection from fixed cameras in edge environments |
| US20240230841A9 (en) * | 2022-10-21 | 2024-07-11 | Texas Instruments Incorporated | Frequency modulated continuous wave radar system with object classifier |
| EP4432162A1 (en) | 2023-03-13 | 2024-09-18 | Infineon Technologies Dresden GmbH & Co . KG | Early-exit neural networks for radar processing |
| CN116524537A (en) * | 2023-04-26 | 2023-08-01 | 东南大学 | Human body posture recognition method based on CNN and LSTM combination |
| CN116482680B (en) * | 2023-06-19 | 2023-08-25 | 精华隆智慧感知科技(深圳)股份有限公司 | Body interference identification method, device, system and storage medium |
| EP4650991A1 (en) | 2024-05-15 | 2025-11-19 | Infineon Technologies AG | Data augmentation for object-specific kinematic observables obtained from radar measurement data |
| CN119622679B (en) * | 2024-11-25 | 2025-10-21 | 重庆大学 | A continuous identity authentication method based on multi-level memory-enhanced autoencoder |
| CN119690319B (en) * | 2024-12-19 | 2025-11-25 | 清华大学 | Symbol input method and device based on tapping gesture recognition |
Citations (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10205457B1 (en) * | 2018-06-01 | 2019-02-12 | Yekutiel Josefsberg | RADAR target detection system for autonomous vehicles with ultra lowphase noise frequency synthesizer |
| US10466772B2 (en) * | 2017-01-09 | 2019-11-05 | Infineon Technologies Ag | System and method of gesture detection for a remote device |
| US10699538B2 (en) * | 2016-07-27 | 2020-06-30 | Neosensory, Inc. | Method and system for determining and providing sensory experiences |
| US20210034160A1 (en) * | 2019-07-29 | 2021-02-04 | Qualcomm Incorporated | Gesture detection in interspersed radar and network traffic signals |
| US10937438B2 (en) * | 2018-03-29 | 2021-03-02 | Ford Global Technologies, Llc | Neural network generative modeling to transform speech utterances and augment training data |
| US11036303B2 (en) * | 2019-03-29 | 2021-06-15 | Tata Consultancy Services Llc | Systems and methods for three-dimensional (3D) reconstruction of human gestures from radar based measurements |
| US11354823B2 (en) * | 2017-07-11 | 2022-06-07 | Deepmind Technologies Limited | Learning visual concepts using neural networks |
| US11432753B2 (en) * | 2018-08-08 | 2022-09-06 | Tata Consultancy Services Limited | Parallel implementation of deep neural networks for classifying heart sound signals |
| US11475910B2 (en) * | 2020-02-11 | 2022-10-18 | Purdue Research Foundation | System and methods for machine anomaly detection based on sound spectrogram images and neural networks |
| US20230068523A1 (en) * | 2021-08-12 | 2023-03-02 | Infineon Technologies Ag | Radar-Based Gesture Classification Using a Variational Auto-Encoder Neural Network |
| US11640208B2 (en) * | 2019-11-21 | 2023-05-02 | Infineon Technologies Ag | Gesture feedback in distributed neural network system |
| US11774553B2 (en) * | 2020-06-18 | 2023-10-03 | Infineon Technologies Ag | Parametric CNN for radar processing |
2021
- 2021-08-12 EP EP21190926.2A patent/EP4134924B1/en active Active
2022
- 2022-08-11 US US17/886,264 patent/US12307821B2/en active Active
- 2022-08-11 CN CN202210966956.1A patent/CN115705757A/en active Pending
Patent Citations (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10699538B2 (en) * | 2016-07-27 | 2020-06-30 | Neosensory, Inc. | Method and system for determining and providing sensory experiences |
| US10466772B2 (en) * | 2017-01-09 | 2019-11-05 | Infineon Technologies Ag | System and method of gesture detection for a remote device |
| US11354823B2 (en) * | 2017-07-11 | 2022-06-07 | Deepmind Technologies Limited | Learning visual concepts using neural networks |
| US10937438B2 (en) * | 2018-03-29 | 2021-03-02 | Ford Global Technologies, Llc | Neural network generative modeling to transform speech utterances and augment training data |
| US10205457B1 (en) * | 2018-06-01 | 2019-02-12 | Yekutiel Josefsberg | RADAR target detection system for autonomous vehicles with ultra lowphase noise frequency synthesizer |
| US11432753B2 (en) * | 2018-08-08 | 2022-09-06 | Tata Consultancy Services Limited | Parallel implementation of deep neural networks for classifying heart sound signals |
| US11036303B2 (en) * | 2019-03-29 | 2021-06-15 | Tata Consultancy Services Llc | Systems and methods for three-dimensional (3D) reconstruction of human gestures from radar based measurements |
| US20210034160A1 (en) * | 2019-07-29 | 2021-02-04 | Qualcomm Incorporated | Gesture detection in interspersed radar and network traffic signals |
| US11640208B2 (en) * | 2019-11-21 | 2023-05-02 | Infineon Technologies Ag | Gesture feedback in distributed neural network system |
| US11475910B2 (en) * | 2020-02-11 | 2022-10-18 | Purdue Research Foundation | System and methods for machine anomaly detection based on sound spectrogram images and neural networks |
| US11774553B2 (en) * | 2020-06-18 | 2023-10-03 | Infineon Technologies Ag | Parametric CNN for radar processing |
| US20230068523A1 (en) * | 2021-08-12 | 2023-03-02 | Infineon Technologies Ag | Radar-Based Gesture Classification Using a Variational Auto-Encoder Neural Network |
Non-Patent Citations (29)
| Title |
|---|
| Ahmed, S. et al., "Finger-Counting-Based Gesture Recognition within Cars Using Impulse Radar with Convolutional Neural Network," sensors, MDPI, Mar. 23, 2019, 14 pages. |
| Deng, J. et al., "ArcFace: Additive Angular Margin Loss for Deep Face Recognition," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jan. 9, 2020, 10 pages. |
| Dubey, A. et al., "A Bayesian Framework for Integrated Deep Metric Learning and Tracking of Vulnerable Road Users Using Automotive Radars", IEEE Access, May 5, 2021, 20 pages. |
| Erol, B. et al., "Exploitation of motion capture data for improved synthetic micro-doppler signature generation with adversarial learning", SPIE Proceedings, May 18, 2020, 8 pages. |
| Gurbuz, S. et al., "Radar-Based Human-Motion Recognition With Deep Learning: Promising applications for indoor monitoring", IEEE Signal Processing Magazine, Jul. 2019, 13 pages. |
| Hazra, S. et al., "Robust Gesture Recognition Using Millimetric-Wave Radar System," IEEE Sensors Letters, vol. 2, No. 4, Dec. 2018, 4 pages. |
| Hazra, S. et al., "Short-Range Radar-Based Gesture Recognition System Using 3D CNN With Triplet Loss," IEEE Access, Sep. 17, 2019, 11 pages. |
| He, L. et al., "Softmax Dissection: Towards Understanding Intra- and Inter-Class Objective for Embedding Learning," The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), Feb. 12, 2020, 8 pages. |
| Kim, Y. et al., "Hand Gesture Recognition Using Micro-Doppler Signatures With Convolutional Neural Network," IEEE Access, Oct. 13, 2016, 6 pages. |
| Kingma, D. et al., "Auto-Encoding Variational Bayes," arXiv:1312.6114v10, May 1, 2014, 14 pages. |
| Li, G. et al., "Sparsity-Driven Micro-Doppler Feature Extraction for Dynamic Hand Gesture Recognition," IEEE Transactions on Aerospace and Electronic Systems, Apr. 2018, 21 pages. |
| Lien, J. et al., "Soli: Ubiquitous Gesture Sensing with Millimeter Wave Radar," ACM Trans. Graph., vol. 35, No. 4, Article 142, Jul. 2016, 19 pages. |
| Liu, W. et al., "SphereFace: Deep Hypersphere Embedding for Face Recognition," Computer Vision Foundation, Apr. 26, 2017, 9 pages. |
| Molchanov, P. et al., "Online Detection and Classification of Dynamic Hand Gestures with Recurrent 3D Convolutional Neural Networks," Computer Vision Foundation, Dec. 12, 2016, 9 pages. |
| Molchanov, P. et al., "Short-Range FMCW Monopulse Radar for Hand-Gesture Sensing," 2015 IEEE Radar Conference (RadarCon), Jun. 25, 2015, 6 pages. |
| Rautaray, S. et al., "Vision based hand gesture recognition for human computer interaction: a survey," Artificial Intelligence Review, Nov. 6, 2012, 54 pages. |
| Smith, K. et al., "Gesture Recognition Using mm-Wave Sensor for Human-Car Interface," ResearchGate, Sensors Letters, Feb. 2018, 5 pages. |
| Stadelmayer, T. et al., "Human Activity Classification Using mm-Wave FMCW Radar by Improved Representation Learning," mmNets'20: Proceedings of the 4th ACM Workshop on Millimeter-Wave Networks and Sensing Systems, Sep. 2020, 6 pages. |
| Sun, Y. et al., "Automatic Radar-based Gesture Detection and Classification via a Region-based Deep Convolutional Neural Network," ResearchGate, May 2019, 7 pages. |
| Sun, Y. et al., "Real-Time Radar-Based Gesture Detection and Recognition Built in an Edge-Computing Platform," IEEE Sensors Journal, arXiv:2005.10145v1, May 20, 2020, 10 pages. |
| Wan, Q. et al., "Gesture Recognition for Smart Home Applications using Portable Radar Sensors," ResearchGate, 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Aug. 2014, 5 pages. |
| Wang, F. et al., "Additive Margin Softmax for Face Verification," IEEE Signal Processing Letters, arXiv:1801.05599v4, May 30, 2018, 7 pages. |
| Wang, H. et al., "CosFace: Large Margin Cosine Loss for Deep Face Recognition," CVPR, IEEE, Dec. 16, 2018, 10 pages. |
| Wang, S. et al., "Interacting with Soli: Exploring Fine-Grained Dynamic Gesture Recognition in the Radio-Frequency Spectrum," UIST '16: Proceedings of the 29th Annual Symposium on User Interface Software and Technology, Oct. 2016, 10 pages. |
| Wen, Y. et al., "A Discriminative Feature Learning Approach for Deep Face Recognition," Lecture Notes in Computer Science, Sep. 16, 2016, 17 pages. |
| Wikipedia, "Electromagnetic coil," https://en.wikipedia.org/w/index.php?title=Electromagnetic_coil&oldid=776415501, Jul. 11, 2018, 6 pages. |
| Wikipedia, "Qi (standard)," https://en.wikipedia.org/w/index.php?title=Qi_(standard)&oldid=803427516, Jul. 11, 2018, 5 pages. |
| Zhang, J. et al., "Doppler-Radar Based Hand Gesture Recognition System Using Convolutional Neural Networks", arXiv:1711.02254v3, Nov. 22, 2017, 8 pages. |
| Zhang, Z. et al., "Latern: Dynamic Continuous Hand Gesture Recognition Using FMCW Radar Sensor," IEEE Sensors Journal, vol. 18, No. 8, Apr. 15, 2018, 12 pages. |
Also Published As
| Publication number | Publication date |
|---|---|
| EP4134924A1 (en) | 2023-02-15 |
| EP4134924B1 (en) | 2025-10-01 |
| US20230068523A1 (en) | 2023-03-02 |
| CN115705757A (en) | 2023-02-17 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US12307821B2 (en) | Radar-based gesture classification using a variational auto-encoder neural network | |
| Arsalan et al. | Character recognition in air-writing based on network of radars for human-machine interface | |
| Pegoraro et al. | Real-time people tracking and identification from sparse mm-wave radar point-clouds | |
| Majumder et al. | Deep learning for radar and communications automatic target recognition | |
| Hazra et al. | Short-range radar-based gesture recognition system using 3D CNN with triplet loss | |
| Seyfioglu et al. | DNN transfer learning from diversified micro-Doppler for motion classification | |
| Sun et al. | Real-time radar-based gesture detection and recognition built in an edge-computing platform | |
| CN111722706B (en) | Method and system for identifying aerial writing characters based on radar network | |
| Dubey et al. | A bayesian framework for integrated deep metric learning and tracking of vulnerable road users using automotive radars | |
| US20220269926A1 (en) | Radar-Based Object Tracking Using a Neural Network | |
| US12529777B2 (en) | Radar-based motion classification using one or more time series | |
| Arsalan et al. | Air-writing with sparse network of radars using spatio-temporal learning | |
| Arsalan et al. | Radar trajectory-based air-writing recognition using temporal convolutional network | |
| Sun et al. | Multi-feature encoder for radar-based gesture recognition | |
| Chen et al. | MMHTSR: In-air handwriting trajectory sensing and reconstruction based on mmWave radar | |
| Li et al. | Dynamic gesture recognition method based on millimeter-wave radar | |
| Zhai et al. | A multialignment task-adaptive method for MFR mode recognition on few-shot open-set learning | |
| WO2022093565A1 (en) | Feature extraction for remote sensing detections | |
| Wang et al. | mPCT-LSTM: A lightweight human activity recognition model for 3D point clouds in millimeter-wave radar | |
| Wu et al. | A scalable gesture interaction system based on mm-wave radar | |
| Bai et al. | Deep CNN for micromotion recognition of space targets | |
| CN120468793A (en) | An adaptive gesture recognition method based on FMCW radar | |
| Wang et al. | Multi-human activity recognition based on sequential 4D point clouds using frequency-modulated continuous wave radar | |
| Stadelmayer et al. | Out-of-distribution detection for radar-based gesture recognition using metric-learning | |
| Chang et al. | Efficient human action recognition with fine-grained spatiotemporal feature extraction from millimeter-wave point clouds |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
| AS | Assignment |
Owner name: INFINEON TECHNOLOGIES AG, GERMANY Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SANTRA, AVIK;HAZRA, SOUVIK;STADELMAYER, THOMAS REINHOLD;SIGNING DATES FROM 20220812 TO 20220816;REEL/FRAME:061284/0423 |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: AWAITING TC RESP., ISSUE FEE NOT PAID |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
|
| STCF | Information on status: patent grant |
Free format text: PATENTED CASE |