CN115294656A - FMCW radar-based hand key point tracking method - Google Patents

FMCW radar-based hand key point tracking method

Info

Publication number
CN115294656A
CN115294656A (application CN202211013101.3A)
Authority
CN
China
Prior art keywords
key point
network
radio frequency
hand
radar
Prior art date
Legal status
Pending
Application number
CN202211013101.3A
Other languages
Chinese (zh)
Inventor
韩崇
李帮杰
孙力娟
郭剑
薛景
王娟
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority: CN202211013101.3A
Publication: CN115294656A
Legal status: Pending

Classifications

    • G06V 40/28 — Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G01S 13/72 — Radar-tracking systems for two-dimensional tracking, e.g. combination of angle and range tracking, track-while-scan radar
    • G01S 13/867 — Combination of radar systems with cameras
    • G06N 3/08 — Learning methods (neural networks)
    • G06V 10/22 — Image preprocessing by selection of a specific region containing or referencing a pattern
    • G06V 10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82 — Image or video recognition or understanding using neural networks


Abstract

A method for tracking hand key points based on an FMCW radar addresses the limitations of existing optical-camera approaches, such as sensitivity to lighting conditions and the risk of privacy leakage. Using cross-modal supervision, images from a camera and data from the radar are both fed to neural networks during training: the radar data and the video stream are synchronized, hand key point information is extracted from the video stream and preprocessed, and it then serves as the supervision signal for the radio-frequency signals processed by the neural network. Once trained, the system outputs hand key point tracks from radio-frequency signals alone. The method improves gesture-recognition accuracy while protecting personal privacy and remaining independent of lighting conditions, and it is robust, stable, real-time, and efficient.

Description

FMCW radar-based hand key point tracking method
Technical Field
The invention belongs to the intersection of wireless sensing and computer vision, relates to the technical fields of millimeter-wave radar and neural networks, and in particular relates to a method for tracking hand key points based on an FMCW radar.
Background
In recent years, with advances in science and technology, human-computer interaction has become closely intertwined with daily life. The pursuit of more efficient and easier information exchange is at the core of human-computer interaction research. New technologies such as face recognition, gesture recognition, lip reading, speech recognition, and posture recognition are gradually replacing the computer-centric interaction mode of the past, making the user the true center of human-computer interaction.
Acquiring gesture information with an optical camera is a mature gesture-recognition approach. Although high-resolution cameras push the recognition rate of visual gesture recognition above 90%, the lighting conditions of the environment strongly affect that rate, and the ability to describe gesture information degrades sharply when light is too strong or too dim. Current technology can mitigate this by adding a night-vision camera and the like, but this raises the cost and greatly narrows the range of application. Second, the approach suffers from privacy leakage: the nature of optical images and video makes leakage of image and video information possible, and in an era extremely sensitive to personal privacy, such leakage can severely affect a product and its technical development. Finally, the approach has high energy consumption and demands substantial computing resources, so it cannot be deployed at scale on systems with relatively simple hardware.
Disclosure of Invention
To address these technical problems, the invention provides a hand key point tracking method that performs gesture recognition with an FMCW radar. Its advantage is that the data stream consists of radar signals rather than optical image signals, so even if the signals leak, an attacker can hardly extract any directly useful information, which provides a measure of security for the system. In addition, FMCW-radar-based dynamic gesture recognition can be integrated on a low-power, compact, high-speed processing chip, which makes embedding in portable devices possible.
A method for tracking hand key points based on an FMCW radar comprises the following steps:
step 1, initializing the FMCW radar system and configuring the parameters for hand-information sampling, including the transmit/receive antenna pairs, the number of sampling points, and the sampling time, while simultaneously filming the complete hand motion trajectory with a camera;
step 2, preprocessing the acquired image information and radio-frequency signals accordingly; for the image information, converting the BGR images stored by the camera into RGB images; for the radio-frequency signals, first performing clutter suppression, then applying a Fourier transform (FFT) along the range and velocity dimensions to form a range-Doppler heat map RDI, and processing the signals in the directions horizontal and vertical to the ground to form a horizontal heat map H_l and a vertical heat map H_v;
step 3, for the stored image information, processing the pictures, capturing the hand information in each picture, automatically marking each hand key point from the obtained information, and obtaining a hand key point confidence map from the video;
step 4, encoding the obtained H_l and H_v with a radio-frequency encoding network and feeding the encodings into a CNN; passing the feature maps through different convolutional layers of the CNN for feature extraction, then decoding the encoded heat map information with a radio-frequency decoding network to obtain a key point confidence map from the radio-frequency data;
step 5, using cross-modal learning and supervised learning to let image information and radio-frequency information of different modalities interact: the network producing the video key point confidence map is called the teacher network, the network producing the radio-frequency key point confidence map is called the student network, and together they form a cross-supervised teacher-student network that checks the reliability of the key point confidence map obtained from the radio-frequency signals; the positions of the hand key points derived from the radio-frequency signals are recognized and tracked through cross-supervised learning;
and step 6, after training, the system tracks the hand key points using radio-frequency signals alone, without video assistance.
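Steps 1-2 presuppose that radar frames and camera frames are time-aligned. A minimal sketch of such synchronization, assuming simple nearest-timestamp matching (the patent does not specify the mechanism, so the rates, offsets, and tolerance below are illustrative):

```python
import numpy as np

# Match each radar frame to the camera frame with the nearest timestamp.
# Frame rates, the clock offset, and the skew tolerance are assumptions
# made for this sketch, not values taken from the patent.

def synchronize(radar_ts, video_ts, max_skew=0.02):
    """Return (radar_idx, video_idx) pairs whose timestamps differ by < max_skew s."""
    video_ts = np.asarray(video_ts)
    pairs = []
    for i, tr in enumerate(radar_ts):
        j = int(np.argmin(np.abs(video_ts - tr)))  # nearest camera frame
        if abs(video_ts[j] - tr) < max_skew:
            pairs.append((i, j))
    return pairs

radar_ts = np.arange(0.0, 1.0, 1 / 30)    # radar frames at ~30 Hz
video_ts = np.arange(0.005, 1.0, 1 / 25)  # camera at 25 fps, slight clock offset
pairs = synchronize(radar_ts, video_ts)
print(len(pairs))  # number of usable synchronized frame pairs
```

Only the paired frames would then enter training, so each radio-frequency sample has a matching camera label.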
Further, in step 1, the raw signal of the dynamic gesture is acquired by the FMCW radar; let t be the period of each frequency-modulated continuous-wave pulse (chirp), S the slope of the frequency ramp, τ the round-trip delay of the signal from the radar to the gesture and back, and f the carrier frequency of the radar. The transmitted signal S₁ of the radar is expressed as:

S₁ = sin(2πft + πSt²)

the received signal S₂ is expressed as:

S₂ = sin[2πf(t−τ) + πS(t−τ)²]

and after the mixer and the low-pass filter, the output intermediate-frequency signal S is:

S = sin(2πSτt + 2πfτ − πSτ²)

A one-dimensional Fourier transform of this signal yields the intermediate frequency f_IF; with d the distance from the gesture target to the radar and c the speed of light:

d = (c · f_IF) / (2S)

The same processing is repeated for each chirp, and the processed signals are concatenated into one frame of data, yielding the radio-frequency signal from the radar.
Further, in step 2, the hand images acquired by the camera are converted into corresponding RGB images of size 200 × 200 and stored. For the radio-frequency signals, a frequency-domain feature extraction method is used: the signals, whose time domain is complex, are transformed to the frequency domain by Fourier transform in the horizontal and vertical directions, the frequency components of the signals are inspected, and features are extracted in the frequency domain. The Fourier transform of a continuous signal is

F(ω) = ∫_{−∞}^{+∞} f(t) e^{−jωt} dt

FFT processing produces a spectrum with distinct separated peaks, each peak indicating an object at a particular range; taking the phase of each valid sample at the same range and applying an FFT again distinguishes multiple targets with different velocities at the same range. After the phase FFT, the phase differences ω₁, ω₂ of the targets are obtained, yielding targets at different velocities; the hand feature map, i.e. the range-Doppler image RDI, is obtained at this point. The horizontal heat map is the projection of the signal reflections onto a plane parallel to the ground, and the vertical heat map is the projection onto a plane perpendicular to the ground.
Further, in step 4, the radio-frequency encoding network uses 10 layers of 9 × 5 × 5 spatio-temporal convolutions, each with stride 1 × 2 × 2, and batch normalization is applied to the input; after each layer the ReLU activation function f(x) = max(0, x) is used, and the encoded data is fed into the CNN.
Further, in step 4, the radio-frequency decoding network decodes the encoded heat map information; the decoding network has 5 layers, with the last layer using one stride and the other layers another [the stride values appear only as formula images in the original]; a ReLU function follows each layer, and the last layer uses a sigmoid function as the output layer.
Further, in step 5, the image information and the radio-frequency signals are fed into the teacher network and the student network respectively; the student network receives the key point confidence maps labeled by the teacher network and compares them with its own predicted confidence maps, and the teacher's confidence maps provide cross-modal supervision for the student network, so that the student network learns from the teacher and comes to predict the key point confidence maps successfully.
Further, in step 5, the goal of student-network training is to minimize the difference between its prediction and the teacher network's prediction, with the loss defined as the sum of the binary cross-entropy losses over the pixels of the confidence maps:

L(T, S) = −Σ_c Σ_{(i,j)} [ T_c^{(i,j)} log S_c^{(i,j)} + (1 − T_c^{(i,j)}) log(1 − S_c^{(i,j)}) ]

where T_c^{(i,j)} and S_c^{(i,j)} are the teacher's and the student's confidence values at pixel (i, j) of confidence map c. The student network receives the key point confidence maps labeled by the teacher network and compares them with its own predictions; the teacher's maps provide cross-modal supervision, so that the student network learns to predict the key point confidence maps successfully.
Further, in step 6, after training is complete, tracking the hand key points requires only placing the hand in front of the radar to obtain the key points' position coordinates; tracking is achieved from the radio-frequency signals alone, without video images as auxiliary labels.
The invention has the beneficial effects that:
(1) The FMCW radar is used to recognize and track hand key points; the radar's electromagnetic waves are unaffected by lighting, smoke, visibility, and similar factors, so the environmental requirements are low, and motion sensing remains reliable and accurate even when environmental conditions change;
(2) The method recognizes and tracks hand key points with an FMCW radar whose data stream consists of radar signals rather than optical images; even if the signals leak, an attacker can hardly extract any directly useful information, which provides a measure of security for the system;
(3) The invention recognizes and tracks hand key points with an FMCW radar, and FMCW-radar-based dynamic key point tracking can be integrated on a low-power, compact, high-speed processing chip, giving the method high portability and availability.
Drawings
FIG. 1 is a flowchart illustrating a method for tracking a key point of a hand according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a cross-supervisor teacher-student network for key point tracking according to an embodiment of the present invention.
Fig. 3 is a schematic diagram illustrating completion of image information calibration according to an embodiment of the present invention.
Detailed Description
The technical scheme of the invention is further explained in detail by combining the drawings in the specification.
The invention provides a hand key point tracking method that performs gesture recognition with an FMCW radar. Its advantage is that the data stream consists of radar signals rather than optical image signals, so even if the signals leak, an attacker can hardly extract any directly useful information, which provides a measure of security for the system. In addition, FMCW-radar-based dynamic gesture recognition can be integrated on a low-power, compact, high-speed processing chip, which makes embedding in portable devices possible.
As shown in fig. 1, the main steps of the method are as follows:
step 1: initializing an FMCW radar system, and synchronously acquiring a hand image acquired by a camera and a radio frequency signal acquired by a radar:
The raw signal of the dynamic gesture is acquired by the FMCW radar; let t be the period of each frequency-modulated continuous-wave pulse (chirp), S the slope of the frequency ramp, τ the round-trip delay of the signal from the radar to the gesture and back, and f the carrier frequency of the radar. The transmitted signal S₁ of the radar is expressed as:

S₁ = sin(2πft + πSt²)

the received signal S₂ is expressed as:

S₂ = sin[2πf(t−τ) + πS(t−τ)²]

and after the mixer and the low-pass filter, the output intermediate-frequency signal S is:

S = sin(2πSτt + 2πfτ − πSτ²)

A one-dimensional Fourier transform of this signal yields the intermediate frequency f_IF; if the distance from the gesture target to the radar is d and the speed of light is c, then:

d = (c · f_IF) / (2S)

The same processing is repeated for each chirp, and the processed signals are concatenated into one frame of data, yielding the radio-frequency signal from the radar.
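As a runnable illustration of this range estimation (all radar parameters below are assumptions for the sketch, not the patent's configuration), one can simulate the intermediate-frequency tone at f_IF = 2Sd/c and recover d from the peak of its FFT:

```python
import numpy as np

# Illustrative FMCW range estimation: an IF tone at f_IF = 2*S*d/c,
# recovered by a 1D "range FFT". Parameter values are assumptions.
c = 3e8            # speed of light, m/s
S = 30e12          # frequency ramp slope, Hz/s (30 MHz/us)
fs = 10e6          # ADC sampling rate, Hz
n_samples = 1024   # samples per chirp
d_true = 0.40      # hand held 0.40 m from the radar

t = np.arange(n_samples) / fs
f_if = 2 * S * d_true / c                 # intermediate frequency of the target
s_if = np.sin(2 * np.pi * f_if * t)       # mixer output (noise-free sketch)

# Range FFT: the peak bin gives f_IF, hence the distance d = c*f_IF/(2S).
spectrum = np.abs(np.fft.rfft(s_if))
peak_bin = int(np.argmax(spectrum))
f_est = peak_bin * fs / n_samples
d_est = c * f_est / (2 * S)
print(f"estimated range: {d_est:.3f} m")  # close to 0.40 m, within one range bin
```

The range resolution here is set by the FFT bin width, c·(fs/n_samples)/(2S) ≈ 5 cm, which is why the estimate lands near but not exactly on 0.40 m.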
And 2, step: preprocessing the collected image information and radio frequency signals
For the hand images collected by the camera: the camera stores BGR images, so they must be converted to RGB images by an algorithm, which makes the later labeling of hand key points convenient. The radio-frequency signals are handled with a frequency-domain feature extraction method; because the radio-frequency signal is complex-valued, each pixel in the mapping has a real and an imaginary component, so the complex time-domain signal can be transformed to the frequency domain by Fourier transform in the horizontal and vertical directions, the frequency components of the signal can be observed clearly, and features can be extracted in the frequency domain. The Fourier transform of a continuous signal is

F(ω) = ∫_{−∞}^{+∞} f(t) e^{−jωt} dt

FFT processing produces a spectrum with distinct separated peaks, each peak indicating an object at a particular range; taking the phase of each valid sample at the same range and applying an FFT again distinguishes multiple targets with different velocities at the same range. After the phase FFT, the phase differences ω₁, ω₂ of the targets are obtained, yielding targets at different velocities, i.e. the hand feature map, the range-Doppler image RDI. One antenna array each is kept for the horizontal and the vertical direction: the horizontal heat map is the projection of the signal reflections onto a plane parallel to the ground, and the vertical heat map is the projection onto a plane perpendicular to the ground.
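The two-stage FFT described above can be sketched numerically. In this toy example (all values illustrative, not the patent's radar configuration), two targets sharing a range bin but moving at different speeds are separated by the phase FFT across chirps:

```python
import numpy as np

# Toy range-Doppler image (RDI): two targets at the same range but different
# velocities are separated by a second FFT across chirps (the "phase FFT").
n_chirps, n_samples = 64, 256
fs = 2e6                           # ADC sample rate within a chirp, Hz
f_range = 40e3                     # IF tone: both targets share one range bin
t = np.arange(n_samples) / fs
k = np.arange(n_chirps)[:, None]
phi1 = 2 * np.pi * 2 / n_chirps    # per-chirp phase step of target 1
phi2 = 2 * np.pi * 9 / n_chirps    # per-chirp phase step of target 2 (faster)

frame = (np.exp(1j * (2 * np.pi * f_range * t + phi1 * k)) +
         np.exp(1j * (2 * np.pi * f_range * t + phi2 * k)))  # (chirps, samples)

range_fft = np.fft.fft(frame, axis=1)        # range FFT along each chirp
rdi = np.abs(np.fft.fft(range_fft, axis=0))  # Doppler FFT across chirps

rng_bin = int(np.argmax(np.abs(range_fft).sum(axis=0)))      # shared range bin
doppler_peaks = sorted(int(b) for b in np.argsort(rdi[:, rng_bin])[-2:])
print(doppler_peaks)  # two Doppler bins, one per target velocity
```

Because the per-chirp phase steps were chosen on exact Doppler bins, the two peaks fall at bins 2 and 9 of the selected range column.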
And step 3: construction of teacher network using image information
As shown in fig. 2, once image acquisition is complete, the images are fed into the teacher network to obtain the position information of the hand key points. The teacher network is built mainly on Google's MediaPipe Hands model. After the MediaPipe Hands model is imported, its parameters are set: the input type is set to sequences of static pictures, the model's confidence threshold is set to 0.5, and its tracking threshold is set to 0.5. Picture data containing continuous hand motion is then fed into the model, which processes the hand information in the pictures, captures it, automatically marks each hand key point from the obtained picture information, and simultaneously obtains each key point's pixel position relative to the picture. The processed image, shown in fig. 3, contains the position information of 21 hand key points; once processing is complete, a hand key point confidence map from the video is obtained. The teacher network built on the MediaPipe Hands model can locate the horizontal and vertical coordinates of the 21 hand key points, supervise the learning of the radio-frequency network (the student network) using these coordinates as labels, and check the radio-frequency network's prediction quality.
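The rendering of the 21 labeled key points into per-point confidence maps can be sketched as follows; the map size and Gaussian width are illustrative choices, since the patent does not specify how the teacher's confidence maps are rasterized:

```python
import numpy as np

# Turn 21 (x, y) key point coordinates into per-key-point confidence maps:
# one 2D Gaussian "heat" per key point, peaking at 1.0 at the key point.
# Map size and sigma are assumptions made for this sketch.

def keypoints_to_confidence_maps(keypoints_xy, size=64, sigma=2.0):
    """keypoints_xy: iterable of 21 (x, y) pixel positions -> (21, size, size)."""
    ys, xs = np.mgrid[0:size, 0:size].astype(float)
    return np.stack([
        np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        for x, y in keypoints_xy
    ])

# Example: 21 dummy key points scattered over the map.
kps = [(3 * i % 64, (2 * i + 5) % 64) for i in range(21)]
maps = keypoints_to_confidence_maps(kps)
print(maps.shape)  # (21, 64, 64): one confidence map per key point
```

The student network is then trained to reproduce exactly this kind of per-key-point map from the radio-frequency heat maps.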
And 4, step 4: building student network using radio frequency information
As shown in fig. 2, processing the radio-frequency signals yields a horizontal heat map H_l and a vertical heat map H_v, which are encoded by radio-frequency encoding networks; each encoding network takes 100 frames (3.3 seconds) of radio-frequency heat maps as input. The radio-frequency encoding network uses 10 layers of 9 × 5 × 5 spatio-temporal convolutions with stride 1 × 2 × 2 per layer, and batch normalization is applied to the input. Because each layer's gradient is multiplied by the first derivative of its activation function during backpropagation, the gradient decays layer by layer, and in a deep network the gradient G would keep shrinking until it vanishes; to counter this, the ReLU activation function f(x) = max(0, x) is used after each layer, and the completed encoding is fed into the CNN. The feature maps are passed through different convolutional layers for feature extraction, and the encoded heat map information is then decoded by a radio-frequency decoding network. The decoding network has 5 layers, with the last layer using one stride and the other layers another [the stride values appear only as formula images in the original]. A ReLU function also follows each layer, and the last layer uses a sigmoid function as the output layer. Throughout student-network training, two real-valued channels, storing the real part and the imaginary part, represent the complex-valued radio-frequency heat map; the whole network is implemented in PyTorch, and a key point confidence map is obtained from the radio-frequency data through the encode-decode process.
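A minimal PyTorch sketch of the encoder-decoder pair follows. It keeps the patent's 9 × 5 × 5 kernels, 1 × 2 × 2 encoder strides, batch normalization, ReLU activations, two real/imaginary input channels, and sigmoid output, but uses only 2 encoder layers and small channel widths for brevity, and assumes stride-2 transposed convolutions in the decoder because the original stride values survive only as images:

```python
import torch
import torch.nn as nn

# Reduced sketch of the encode-decode path (the patent's encoder has 10 such
# layers; the decoder strides here are an assumption, not the patent's values).

class RFEncoder(nn.Module):
    def __init__(self, in_ch=2, ch=16, n_layers=2):
        super().__init__()
        layers, c = [], in_ch
        for _ in range(n_layers):
            layers += [nn.Conv3d(c, ch, kernel_size=(9, 5, 5),
                                 stride=(1, 2, 2), padding=(4, 2, 2)),
                       nn.BatchNorm3d(ch),
                       nn.ReLU()]            # f(x) = max(0, x) after each layer
            c = ch
        self.net = nn.Sequential(*layers)

    def forward(self, x):                    # x: (batch, 2, frames, H, W)
        return self.net(x)                   # 2 channels hold real/imag parts

class RFDecoder(nn.Module):
    def __init__(self, in_ch=16, n_keypoints=21):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose3d(in_ch, 16, (3, 4, 4),
                               stride=(1, 2, 2), padding=(1, 1, 1)),
            nn.ReLU(),
            nn.ConvTranspose3d(16, n_keypoints, (3, 4, 4),
                               stride=(1, 2, 2), padding=(1, 1, 1)),
            nn.Sigmoid(),                    # per-pixel confidences in [0, 1]
        )

    def forward(self, z):
        return self.net(z)

x = torch.randn(1, 2, 12, 32, 32)            # 12 RF frames of 32x32 heat maps
y = RFDecoder()(RFEncoder()(x))
print(tuple(y.shape))                        # one confidence map per key point
```

Each encoder layer halves the two spatial dimensions while preserving the time axis, and the two transposed-convolution layers restore the original 32 × 32 resolution, so the output is one confidence map per key point per frame.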
And 5: a teacher-student network is constructed by using a cross-supervised learning mode.
Hand key points are tracked using the synchronized images and radio-frequency signals as a bridge. The image information and the radio-frequency signals are fed into the teacher network and the student network respectively; the goal of student-network training is to minimize the difference between its prediction and the teacher network's prediction, with the loss defined as the sum of the binary cross-entropy losses over the pixels of the confidence maps:

L(T, S) = −Σ_c Σ_{(i,j)} [ T_c^{(i,j)} log S_c^{(i,j)} + (1 − T_c^{(i,j)}) log(1 − S_c^{(i,j)}) ]

where T_c^{(i,j)} and S_c^{(i,j)} are the teacher's and the student's confidence values at pixel (i, j) of confidence map c. The student network receives the key point confidence maps labeled by the teacher network and compares them with its own predictions; the teacher's maps provide cross-modal supervision, so that the student network learns to predict the key point confidence maps successfully.
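A minimal sketch of this loss (shapes and values are illustrative; the names T and S for the teacher's and student's confidences are this sketch's own):

```python
import numpy as np

# Cross-modal loss: sum over confidence maps and pixels of the binary
# cross-entropy between teacher confidences T and student confidences S.

def cross_modal_bce(teacher, student, eps=1e-7):
    s = np.clip(student, eps, 1.0 - eps)     # guard against log(0)
    return float(-np.sum(teacher * np.log(s) +
                         (1.0 - teacher) * np.log(1.0 - s)))

rng = np.random.default_rng(0)
teacher = rng.random((21, 64, 64))           # 21 key point confidence maps
perfect = cross_modal_bce(teacher, teacher)      # student matches teacher
wrong = cross_modal_bce(teacher, 1.0 - teacher)  # student anti-correlated
print(perfect < wrong)  # the loss rewards agreement with the teacher
```

Per pixel, the cross-entropy is minimized exactly when the student's confidence equals the teacher's, which is what drives the student to imitate the teacher's maps.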
And 6: the system after the training realizes the tracking of the key points of the hands through radio frequency signals
After system training is complete, tracking the hand key points requires only placing the hand in front of the radar to obtain the key points' position coordinates; the key points are tracked from the radio-frequency signals alone, without video images as auxiliary labels.
The above description is only a preferred embodiment of the present invention, and the scope of the invention is not limited to this embodiment; equivalent modifications or changes made by those skilled in the art in light of this disclosure shall fall within the scope of protection set forth in the appended claims.

Claims (8)

1. A method for tracking hand key points based on an FMCW radar is characterized in that: the method comprises the following steps:
step 1, initializing the FMCW radar system and configuring the parameters for hand-information sampling, including the transmit/receive antenna pairs, the number of sampling points, and the sampling time, while simultaneously filming the complete hand motion trajectory with a camera;
step 2, preprocessing the acquired image information and radio-frequency signals accordingly; for the image information, converting the BGR images stored by the camera into RGB images; for the radio-frequency signals, first performing clutter suppression, then applying a Fourier transform (FFT) along the range and velocity dimensions to form a range-Doppler heat map RDI, and processing the signals in the directions horizontal and vertical to the ground to form a horizontal heat map H_l and a vertical heat map H_v;
step 3, for the stored image information, processing the pictures, capturing the hand information in each picture, automatically marking each hand key point from the obtained information, and obtaining a hand key point confidence map from the video;
step 4, encoding the obtained H_l and H_v with a radio-frequency encoding network and feeding the encodings into the CNN; passing the feature maps through different convolutional layers of the CNN for feature extraction, then decoding the encoded heat map information with a radio-frequency decoding network to obtain a key point confidence map from the radio-frequency data;
step 5, using cross-modal learning and supervised learning to let image information and radio-frequency information of different modalities interact: the network producing the video key point confidence map is called the teacher network, the network producing the radio-frequency key point confidence map is called the student network, and together they form a cross-supervised teacher-student network that checks the reliability of the key point confidence map obtained from the radio-frequency signals; the positions of the hand key points derived from the radio-frequency signals are recognized and tracked through cross-supervised learning;
and step 6, after training, the system tracks the hand key points using radio-frequency signals alone, without video assistance.
2. The FMCW radar-based hand key point tracking method according to claim 1, characterized in that: in step 1, the raw signal of the dynamic gesture is acquired by the FMCW radar; let t be the period of each frequency-modulated continuous-wave pulse (chirp), S the slope of the frequency ramp, τ the round-trip delay of the signal from the radar to the gesture and back, and f the carrier frequency of the radar; the transmitted signal S₁ of the radar is expressed as:

S₁ = sin(2πft + πSt²)

the received signal S₂ is expressed as:

S₂ = sin[2πf(t−τ) + πS(t−τ)²]

after the mixer and the low-pass filter, the output intermediate-frequency signal S is:

S = sin(2πSτt + 2πfτ − πSτ²)

a one-dimensional Fourier transform of this signal yields the intermediate frequency f_IF; with d the distance from the gesture target to the radar and c the speed of light:

d = (c · f_IF) / (2S)

the same processing is repeated for each chirp, and the processed signals are concatenated into one frame of data to obtain the radio-frequency signal from the radar.
3. The FMCW radar-based hand key point tracking method according to claim 1, wherein: in step 2, the hand images collected by the camera are resized and stored as 200 × 200 RGB images; for the radio frequency signals, a frequency-domain feature extraction method is used: the signals, whose time-domain form is complex, are transformed to the frequency domain by Fourier transform in the horizontal and vertical directions, the frequency components of the signals are observed, and features are extracted in the frequency domain; the Fourier transform of a continuous signal is
F(ω) = ∫_{−∞}^{+∞} f(t) e^{−jωt} dt
a frequency spectrum with distinct separate peaks is generated by FFT processing, each peak indicating the presence of an object at a particular distance; the phases of the valid data at the same distance are then taken and a second FFT is performed to distinguish multiple targets with different speeds at the same distance; after this phase FFT, the phase differences ω₁ and ω₂ of the respective targets are obtained, thereby separating targets with different speeds, at which point the hand feature map, namely the range-Doppler image RDI, is obtained; the horizontal heat map is the projection of the signal reflections onto a plane parallel to the ground, and the vertical heat map is the projection of the signal reflections onto a plane perpendicular to the ground.
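The two-stage FFT of this claim can be sketched as below; the radar parameters and the single simulated target are hypothetical. A range FFT over each chirp, followed by a second FFT across the per-chirp phases at each range bin, separates same-range targets by velocity and yields the range-Doppler image (RDI):

```python
import numpy as np

n_chirps, n_samples = 64, 128
fs, S, f_carrier, c = 5e6, 20e12, 77e9, 3e8   # hypothetical radar parameters
T_chirp = n_samples / fs

t = np.arange(n_samples) / fs
d, v = 0.4, 1.0                               # one target: 0.4 m away, receding at 1 m/s
frame = np.zeros((n_chirps, n_samples), dtype=complex)
for k in range(n_chirps):
    tau = 2 * (d + v * k * T_chirp) / c       # delay grows chirp by chirp
    # IF phase = beat term S*tau*t plus carrier phase f_carrier*tau (Doppler)
    frame[k] = np.exp(2j * np.pi * (S * tau * t + f_carrier * tau))

# FFT along samples -> range axis; FFT along chirps -> Doppler axis
rdi = np.abs(np.fft.fftshift(np.fft.fft2(frame), axes=0))
dop_bin, rng_bin = np.unravel_index(np.argmax(rdi), rdi.shape)
```

The peak of `rdi` sits at the range bin of the target and, on the Doppler axis, offset from the zero-velocity row in proportion to its speed.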
4. The FMCW radar-based hand key point tracking method according to claim 1, wherein: in step 4, the radio frequency encoding network uses 10 layers of 9 × 5 × 5 spatio-temporal convolutions, each layer with a step length of 1 × 2 × 2; batch normalization is applied after the input, and the ReLU activation function f(x) = max(0, x) is used after each layer before the encoded features are passed into the CNN.
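The layer counts in this claim imply a fixed shape trace, sketched below; the padding values are assumptions (the claim does not state them). With 9 × 5 × 5 kernels and stride 1 × 2 × 2, the temporal axis is preserved while each spatial axis halves per layer:

```python
def conv_out(size, kernel, stride, pad):
    # standard convolution output-size formula
    return (size + 2 * pad - kernel) // stride + 1

def encoder_shapes(t, h, w, layers=10):
    """Trace the (T, H, W) feature-map shape through the 10-layer encoder."""
    shapes = [(t, h, w)]
    for _ in range(layers):
        t = conv_out(t, 9, 1, 4)  # temporal: kernel 9, stride 1, pad 4 (assumed)
        h = conv_out(h, 5, 2, 2)  # spatial: kernel 5, stride 2, pad 2 (assumed)
        w = conv_out(w, 5, 2, 2)
        shapes.append((t, h, w))
    return shapes
```

For example, a 16 × 1024 × 1024 input collapses to 16 × 1 × 1 after the 10 layers; in a real implementation, batch normalization and the ReLU f(x) = max(0, x) would follow each convolution.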
5. The FMCW radar-based hand key point tracking method according to claim 1, wherein: in step 4, the radio frequency decoding network decodes the encoded heat map information; the decoding network has 5 layers, the last layer having a fractional step length of 1 × 1/4 × 1/4 and all other layers a fractional step length of 1 × 1/2 × 1/2; a ReLU function is used after each layer, and a sigmoid function is used as the output layer for the last layer.
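The decoder of this claim can be sketched as below, reading the fractional step lengths as transposed (upsampling) convolutions; the specific factors (×2 per layer, ×4 for the last) follow the 1 × 1/2 × 1/2 and 1 × 1/4 × 1/4 strides, which are reconstructed values rather than confirmed by the original formula images:

```python
import math

def sigmoid(x):
    # output nonlinearity of the last decoder layer
    return 1.0 / (1.0 + math.exp(-x))

def decoder_spatial_sizes(h, w, layers=5):
    """Trace the spatial size through the 5-layer decoder."""
    sizes = [(h, w)]
    for i in range(layers):
        factor = 4 if i == layers - 1 else 2   # last layer upsamples by 4
        h, w = h * factor, w * factor
        sizes.append((h, w))
    return sizes
```

A 4 × 4 encoded map grows to 256 × 256 over the five layers; the sigmoid squashes the final feature values into (0, 1) so they can be read as per-pixel keypoint confidences.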
6. The FMCW radar-based hand key point tracking method according to claim 1, wherein: in step 5, the image information and the radio frequency signals are input into the teacher network and the student network respectively; the student network receives the key point confidence maps labelled by the teacher network and compares them with its own predicted key point confidence maps; the key point confidence maps from the teacher network provide cross-modal supervision for the student network, so that the student network learns from them and successfully predicts the key point confidence maps.
7. The FMCW radar-based hand key point tracking method according to claim 6, wherein: in step 5, the goal of student network training is to minimize the difference between its predictions and the teacher network's predictions, and the loss is defined as the sum of the binary cross-entropy losses of each pixel in the confidence maps:
L(T̂, Ŝ) = −Σ_c Σ_{i,j} [ T̂_c^{(i,j)} log Ŝ_c^{(i,j)} + (1 − T̂_c^{(i,j)}) log(1 − Ŝ_c^{(i,j)}) ]
wherein T̂_c^{(i,j)} and Ŝ_c^{(i,j)} are the confidences of pixel (i, j) on the teacher and student confidence maps for key point c, respectively; the student network receives the key point confidence maps labelled by the teacher network and compares them with its predicted key point confidence maps, and the key point confidence maps from the teacher network provide cross-modal supervision for the student network, so that the student network learns from them to successfully predict the key point confidence maps.
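The loss of this claim can be sketched directly in NumPy; the array names are illustrative. It sums the binary cross-entropy of every pixel of every key point confidence map against the teacher network's map, which serves as the pseudo-label:

```python
import numpy as np

def cross_modal_loss(teacher, student, eps=1e-7):
    """teacher, student: arrays of shape (n_keypoints, H, W), values in [0, 1]."""
    s = np.clip(student, eps, 1 - eps)  # clip to avoid log(0)
    # per-pixel binary cross-entropy, summed over keypoints and pixels
    return -np.sum(teacher * np.log(s) + (1 - teacher) * np.log(1 - s))
```

A student map that agrees with the teacher incurs a much smaller loss than one that contradicts it, which is what drives the cross-modal supervision of step 5.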
8. The FMCW radar-based hand key point tracking method according to claim 1, wherein: in step 6, after training is completed, the position coordinates of the hand key points are obtained simply by placing a hand in front of the radar, realizing tracking of the hand key points through radio frequency signals alone, without video image auxiliary labels.
CN202211013101.3A 2022-08-23 2022-08-23 FMCW radar-based hand key point tracking method Pending CN115294656A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211013101.3A CN115294656A (en) 2022-08-23 2022-08-23 FMCW radar-based hand key point tracking method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211013101.3A CN115294656A (en) 2022-08-23 2022-08-23 FMCW radar-based hand key point tracking method

Publications (1)

Publication Number Publication Date
CN115294656A true CN115294656A (en) 2022-11-04

Family

ID=83831785

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211013101.3A Pending CN115294656A (en) 2022-08-23 2022-08-23 FMCW radar-based hand key point tracking method

Country Status (1)

Country Link
CN (1) CN115294656A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115856881A (en) * 2023-01-12 2023-03-28 南京邮电大学 Millimeter wave radar behavior sensing method based on dynamic lightweight network


Similar Documents

Publication Publication Date Title
CN113506317B (en) Multi-target tracking method based on Mask R-CNN and apparent feature fusion
Ren et al. Overview of object detection algorithms using convolutional neural networks
CN110348288A (en) A kind of gesture identification method based on 77GHz MMW RADAR SIGNAL USING
KR20210080291A (en) Method, electronic device, and storage medium for recognizing license plate
CN112669350A (en) Adaptive feature fusion intelligent substation human body target tracking method
Wang et al. SSS-YOLO: Towards more accurate detection for small ships in SAR image
Li et al. Remote sensing image scene classification based on object relationship reasoning CNN
Wang et al. Multiple-environment Self-adaptive Network for Aerial-view Geo-localization
CN115294656A (en) FMCW radar-based hand key point tracking method
Sun et al. A target recognition algorithm of multi-source remote sensing image based on visual Internet of Things
CN116844056A (en) SAR target detection method combining self-supervision learning and knowledge distillation
Decourt et al. A recurrent CNN for online object detection on raw radar frames
Zhang et al. A review of recent advance of ship detection in single-channel SAR images
Ke et al. Dense small face detection based on regional cascade multi‐scale method
CN113901931A (en) Knowledge distillation model-based behavior recognition method for infrared and visible light videos
CN117218345A (en) Semantic segmentation method for electric power inspection image
Huang et al. Multi‐scale feature combination for person re‐identification
US20230168361A1 (en) Real time object motion state recognition method using millimeter wave radar
Zheng et al. Unsupervised human contour extraction from through-wall radar images using dual UNet
Ding et al. Novel Pipeline Integrating Cross-Modality and Motion Model for Nearshore Multi-Object Tracking in Optical Video Surveillance
Wu et al. Multimodal Collaboration Networks for Geospatial Vehicle Detection in Dense, Occluded, and Large-Scale Events
Qiu et al. Effective object proposals: size prediction for pedestrian detection in surveillance videos
Yue et al. Improving multi‐object tracking by full occlusion handle and adaptive feature fusion
Wang et al. Attention-based vision transformer for human activity classification using mmwave radar
Zhang et al. An end-to-end framework for real-time violent behavior detection based on 2D CNNs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination